
Data Flow Analysis

Inkog uses data flow analysis to trace how untrusted data moves through complex codebases, from the points where it enters to the operations it can compromise.

Taint Analysis

Taint analysis tracks “tainted” (untrusted) data from sources to sinks.

Sources (Where Taint Originates)

# User input sources
user_input = request.json["query"]        # HTTP request → TAINTED
cli_arg = sys.argv[1]                     # CLI argument → TAINTED
file_content = open("upload.txt").read()  # File upload → TAINTED
rag_doc = retriever.get_relevant(query)   # RAG retrieval → TAINTED

Sinks (Where Taint Causes Harm)

# Dangerous sinks
llm.generate(tainted_prompt)  # Prompt injection
tool.execute(tainted_args)    # Arbitrary tool use
exec(tainted_code)            # Code execution
memory.save(tainted_data)     # Memory poisoning

Taint Propagation

Taint propagates through operations that derive new values from tainted data:

user_input = request.json["query"]  # TAINTED

# Propagation through string operations
prompt = f"Answer: {user_input}"    # TAINTED (f-string interpolation)
upper = user_input.upper()          # TAINTED (string method)
parts = user_input.split()          # TAINTED (all elements)

# Propagation through data structures
data = {"query": user_input}        # data["query"] is TAINTED
items = [user_input, "safe"]        # items[0] is TAINTED
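To make the rule concrete, propagation can be modeled as a pass over assignments: a target becomes tainted if any value it is derived from is tainted. The sketch below is purely illustrative (it is not Inkog's engine, and the assignment list is hand-written):

# Illustrative sketch of the propagation rule, not Inkog's implementation.
# An assignment taints its target if any operand is already tainted.
SOURCES = {'request.json["query"]'}

assignments = [
    ("user_input", ['request.json["query"]']),  # read from an HTTP source
    ("prompt", ["user_input"]),                 # f-string interpolation
    ("upper", ["user_input"]),                  # string method
    ("data['query']", ["user_input"]),          # stored in a dict
]

tainted = set(SOURCES)
for target, operands in assignments:
    if any(op in tainted for op in operands):
        tainted.add(target)

print(sorted(tainted))
# ["data['query']", 'prompt', 'request.json["query"]', 'upper', 'user_input']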

Backward Slicing

Backward slicing answers: “What code affected this vulnerable sink?”

Example

def process_request(request):
    # Line 1: Source
    user_query = request.json["query"]

    # Line 2: Transform
    formatted = format_query(user_query)

    # Line 3: Conditional
    if len(formatted) > 100:
        formatted = formatted[:100]

    # Line 4: More processing
    enhanced = add_context(formatted)

    # Line 5: SINK - vulnerability here
    response = llm.generate(enhanced)
    return response

Backward slice from line 5:

Line 5: llm.generate(enhanced)
Line 4: enhanced = add_context(formatted)
Line 3: formatted = formatted[:100] (conditional)
Line 2: formatted = format_query(user_query)
Line 1: user_query = request.json["query"] ← SOURCE

Inkog reports the complete taint path, helping you understand exactly how untrusted data reaches sensitive operations.
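Conceptually, a backward slice is a reachability walk over definition-use edges, starting at the sink. A minimal sketch of the idea (illustrative only; here the deps map is written by hand, whereas Inkog derives it from the code):

# Illustrative backward slice over a hand-written def-use map.
# deps maps each variable to the variables its definition reads.
deps = {
    "response": ["enhanced"],     # line 5: llm.generate(enhanced)
    "enhanced": ["formatted"],    # line 4: add_context(formatted)
    "formatted": ["user_query"],  # lines 2-3: format_query + truncation
    "user_query": [],             # line 1: the source
}

def backward_slice(var):
    """Collect every variable that can influence `var`."""
    seen, stack = set(), [var]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(deps.get(v, []))
    return seen

print(backward_slice("response"))
# {'response', 'enhanced', 'formatted', 'user_query'} (set order may vary)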

Inter-Procedural Analysis

Inkog tracks data flow across function boundaries:

# file: utils.py
def process_input(raw):
    cleaned = raw.strip()
    return cleaned

# file: handler.py
def handle_query(request):
    query = request.json["query"]
    processed = process_input(query)  # Cross-function tracking
    return agent.run(processed)

# file: agent.py
class Agent:
    def run(self, input_text):
        prompt = self.build_prompt(input_text)
        return self.llm.generate(prompt)  # SINK reached

    def build_prompt(self, text):
        return f"Query: {text}"  # Taint preserved

Inkog traces:

request.json["query"] (handler.py:3)
  ↓ (parameter passing)
process_input.raw (utils.py:2)
  ↓ (return value)
handler.processed (handler.py:4)
  ↓ (method call)
Agent.run.input_text (agent.py:3)
  ↓ (method call)
Agent.build_prompt.text (agent.py:7)
  ↓ (f-string)
Agent.run.prompt (agent.py:4)
  ↓ (LLM call)
llm.generate() ← VULNERABILITY
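One common way to make cross-function tracking scale (sketched below for illustration; Inkog's internals may differ) is to compute a per-function summary of whether a tainted argument taints the return value, then apply the summary at each call site instead of re-analyzing the callee:

# Illustrative function summaries: does a tainted argument taint the return value?
summaries = {
    "process_input": True,       # returns raw.strip(): taint preserved
    "Agent.build_prompt": True,  # returns an f-string of its argument
    "len": False,                # returns a length, not the data itself
}

def call_returns_tainted(func, arg_tainted):
    """Apply a summary at a call site; unknown functions assume taint is preserved."""
    return arg_tainted and summaries.get(func, True)

print(call_returns_tainted("process_input", True))  # True: taint crosses the call
print(call_returns_tainted("len", True))            # False: taint stops here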

Sanitizers

Sanitizers remove taint when properly applied:

# Built-in sanitizer recognition
from inkog.sanitizers import sanitize_for_llm

user_input = request.json["query"]         # TAINTED

# Sanitization removes taint
safe_input = sanitize_for_llm(user_input)  # NOT TAINTED

# Now safe to use
llm.generate(f"Query: {safe_input}")       # OK

Custom Sanitizers

Define your own sanitizers in configuration:

.inkog.yaml
sanitizers:
  - function: "my_app.security.sanitize_prompt"
    removes_taint: true
  - function: "my_app.validation.validate_input"
    removes_taint: conditional  # Only if validation passes
    condition: "return_value == True"
  - function: "html.escape"
    removes_taint: partial  # Removes XSS taint, not prompt injection
    taint_types: [xss]
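For instance, the my_app.validation.validate_input entry above could point at an ordinary boolean validator. With removes_taint: conditional, the value is treated as clean only on the branch where the call returned True. The function body here is hypothetical:

# Hypothetical validator matching the conditional sanitizer entry above.
import re

def validate_input(value: str) -> bool:
    """Return True only for short alphanumeric queries."""
    return bool(re.fullmatch(r"[a-zA-Z0-9\s]{1,200}", value))

query = request.json["query"]        # TAINTED
if validate_input(query):            # condition: return_value == True
    llm.generate(f"Query: {query}")  # taint removed on this branch
else:
    reject(query)                    # still TAINTED on this branch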

Conditional Taint

Some operations affect taint only on particular control-flow branches:

user_input = request.json["query"]  # TAINTED

if is_safe(user_input):
    # Taint REMOVED in this branch (validation passed)
    process_safe(user_input)
else:
    # Taint PRESERVED in this branch
    reject(user_input)

Inkog understands common validation patterns:

# Allowlist validation
if user_input in ALLOWED_QUERIES:
    safe_input = user_input  # Taint removed

# Regex validation
if re.match(r'^[a-zA-Z0-9\s]+$', user_input):
    safe_input = user_input  # Taint removed

# Length validation (partial)
if len(user_input) <= 100:
    bounded_input = user_input  # Taint partially reduced

Path Sensitivity

Inkog is path-sensitive: it tracks taint along each control-flow path separately instead of merging facts where branches rejoin:

def process(request):
    query = request.json["query"]
    mode = request.json["mode"]
    if mode == "safe":
        # Path 1: Safe mode - uses template
        response = llm.generate(SAFE_TEMPLATE, query=query)
    else:
        # Path 2: Unsafe mode - direct concatenation
        response = llm.generate(f"Answer: {query}")
    return response

Inkog reports:

VULNERABILITY in Path 2 (line 9):
  Prompt injection via direct string concatenation.
  Path condition: mode != "safe"

Path 1 (line 6) is SAFE: uses a parameterized template.
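A path-insensitive analyzer would merge facts where the branches rejoin, either missing Path 2 or flagging Path 1 as well. The toy sketch below (illustrative only, with hand-written path facts) shows why keeping per-path facts pinpoints the vulnerable branch:

# Toy illustration of path sensitivity, not Inkog's engine.
# Each path carries its own taint fact instead of merging at the join point.
paths = [
    {"condition": 'mode == "safe"', "prompt_tainted": False},  # template call
    {"condition": 'mode != "safe"', "prompt_tainted": True},   # f-string call
]

for path in paths:
    verdict = "VULNERABLE" if path["prompt_tainted"] else "SAFE"
    print(f"{verdict} on path: {path['condition']}")
# SAFE on path: mode == "safe"
# VULNERABLE on path: mode != "safe"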

Visualization

Generate a visual data flow graph:

Terminal
$ inkog scan . --output-graph flow.html
Scanning... Found 3 vulnerabilities
Generated interactive graph: flow.html

The generated HTML shows:

  • Green nodes: Safe operations
  • Red nodes: Vulnerable sinks
  • Orange nodes: Tainted data
  • Blue edges: Data flow paths
  • Dashed edges: Conditional flows

Performance Optimizations

Incremental Analysis

# First scan builds full graph
inkog scan .  # 5.2s

# Subsequent scans only analyze changes
inkog scan .  # 0.8s (incremental)

Scope Limiting

.inkog.yaml
analysis:
  # Limit call depth
  max_call_depth: 10

  # Limit path exploration
  max_paths: 1000

  # Focus on specific entry points
  entry_points:
    - "api.handlers.*"
    - "cli.main"

Reducing analysis scope may cause false negatives. Use judiciously.
