Data Flow Analysis

Inkog uses advanced data flow techniques to trace vulnerabilities through complex codebases.

Taint Analysis

Taint analysis tracks “tainted” (untrusted) data from sources to sinks.

Sources (Where Taint Originates)


# User input sources
user_input = request.json["query"]        # HTTP request → TAINTED
cli_arg = sys.argv[1]                     # CLI argument → TAINTED
file_content = open("upload.txt").read()  # File upload → TAINTED
rag_doc = retriever.get_relevant(query)   # RAG retrieval → TAINTED

Sinks (Where Taint Causes Harm)


# Dangerous sinks
llm.generate(tainted_prompt)      # Prompt injection
tool.execute(tainted_args)        # Arbitrary tool use
exec(tainted_code)                # Code execution
memory.save(tainted_data)         # Memory poisoning

Taint Propagation

Taint flows through operations:


user_input = request.json["query"]  # TAINTED
 
# Propagation through string operations
prompt = f"Answer: {user_input}"    # TAINTED (concatenation)
upper = user_input.upper()          # TAINTED (string method)
parts = user_input.split()          # TAINTED (all elements)
 
# Propagation through data structures
data = {"query": user_input}        # data["query"] is TAINTED
items = [user_input, "safe"]        # items[0] is TAINTED

Backward Slicing

Backward slicing answers: “What code affected this vulnerable sink?”

Example


def process_request(request):
    # Line 1: Source
    user_query = request.json["query"]
 
    # Line 2: Transform
    formatted = format_query(user_query)
 
    # Line 3: Conditional
    if len(formatted) > 100:
        formatted = formatted[:100]
 
    # Line 4: More processing
    enhanced = add_context(formatted)
 
    # Line 5: SINK - Vulnerability here
    response = llm.generate(enhanced)
 
    return response

Backward slice from line 5:


Line 5: llm.generate(enhanced)
    ↑
Line 4: enhanced = add_context(formatted)
    ↑
Line 3: formatted = formatted[:100] (conditional)
    ↑
Line 2: formatted = format_query(user_query)
    ↑
Line 1: user_query = request.json["query"]  ← SOURCE

Inkog reports the complete taint path, helping you understand exactly how untrusted data reaches sensitive operations.

Inter-Procedural Analysis

Inkog tracks data flow across function boundaries:


# file: utils.py
def process_input(raw):
    cleaned = raw.strip()
    return cleaned
 
# file: handler.py
def handle_query(request):
    query = request.json["query"]
    processed = process_input(query)   # Cross-function tracking
    return agent.run(processed)
 
# file: agent.py
class Agent:
    def run(self, input_text):
        prompt = self.build_prompt(input_text)
        return self.llm.generate(prompt)  # SINK reached
 
    def build_prompt(self, text):
        return f"Query: {text}"           # Taint preserved

Inkog traces:


request.json["query"] (handler.py:3)
    ↓ (parameter passing)
process_input.raw (utils.py:2)
    ↓ (return value)
handler.processed (handler.py:4)
    ↓ (method call)
Agent.run.input_text (agent.py:3)
    ↓ (method call)
Agent.build_prompt.text (agent.py:7)
    ↓ (f-string)
Agent.run.prompt (agent.py:4)
    ↓ (LLM call)
llm.generate() ← VULNERABILITY

Sanitizers

Sanitizers remove taint when properly applied:


# Built-in sanitizer recognition
from inkog.sanitizers import sanitize_for_llm
 
user_input = request.json["query"]  # TAINTED
 
# Sanitization removes taint
safe_input = sanitize_for_llm(user_input)  # NOT TAINTED
 
# Now safe to use
llm.generate(f"Query: {safe_input}")  # OK

Custom Sanitizers

Define your own sanitizers in configuration:

.inkog.yaml


sanitizers:
  - function: "my_app.security.sanitize_prompt"
    removes_taint: true
 
  - function: "my_app.validation.validate_input"
    removes_taint: conditional  # Only if validation passes
    condition: "return_value == True"
 
  - function: "html.escape"
    removes_taint: partial  # Removes XSS taint, not prompt injection
    taint_types: [xss]

Conditional Taint

Some operations conditionally affect taint:


user_input = request.json["query"]  # TAINTED
 
if is_safe(user_input):
    # Taint REMOVED in this branch (validation passed)
    process_safe(user_input)
else:
    # Taint PRESERVED in this branch
    reject(user_input)

Inkog understands common validation patterns:


# Allowlist validation
if user_input in ALLOWED_QUERIES:
    safe_input = user_input  # Taint removed
 
# Regex validation
if re.match(r'^[a-zA-Z0-9\s]+$', user_input):
    safe_input = user_input  # Taint removed
 
# Length validation (partial)
if len(user_input) <= 100:
    bounded_input = user_input  # Taint partially reduced

Path Sensitivity

Inkog is path-sensitive, meaning it considers control flow:


def process(request):
    query = request.json["query"]
    mode = request.json["mode"]
 
    if mode == "safe":
        # Path 1: Safe mode - uses template
        response = llm.generate(SAFE_TEMPLATE, query=query)
    else:
        # Path 2: Unsafe mode - direct concatenation
        response = llm.generate(f"Answer: {query}")
 
    return response

Inkog reports:


VULNERABILITY in Path 2 (line 9):
  Prompt injection via direct string concatenation.
  Path: mode != "safe"

Path 1 (line 6) is SAFE:
  Uses parameterized template.

Visualization

Generate a visual data flow graph:

Terminal

$inkog scan . --output-graph flow.html

Scanning... Found 3 vulnerabilities Generated interactive graph: flow.html

The generated HTML shows:

Green nodes: Safe operations
Red nodes: Vulnerable sinks
Orange nodes: Tainted data
Blue edges: Data flow paths
Dashed edges: Conditional flows

Performance Optimizations

Incremental Analysis


# First scan builds full graph
inkog scan .  # 5.2s
 
# Subsequent scans only analyze changes
inkog scan .  # 0.8s (incremental)

Scope Limiting

.inkog.yaml


analysis:
  # Limit call depth
  max_call_depth: 10
 
  # Limit path exploration
  max_paths: 1000
 
  # Focus on specific entry points
  entry_points:
    - "api.handlers.*"
    - "cli.main"

Reducing analysis scope may cause false negatives. Use judiciously.