Data Flow Analysis
Inkog uses advanced data flow techniques to trace vulnerabilities through complex codebases.
Taint Analysis
Taint analysis tracks “tainted” (untrusted) data from sources to sinks.
Sources (Where Taint Originates)
# User input sources
user_input = request.json["query"] # HTTP request → TAINTED
cli_arg = sys.argv[1] # CLI argument → TAINTED
file_content = open("upload.txt").read() # File upload → TAINTED
rag_doc = retriever.get_relevant(query) # RAG retrieval → TAINTEDSinks (Where Taint Causes Harm)
# Dangerous sinks
llm.generate(tainted_prompt) # Prompt injection
tool.execute(tainted_args) # Arbitrary tool use
exec(tainted_code) # Code execution
memory.save(tainted_data) # Memory poisoningTaint Propagation
Taint flows through operations:
user_input = request.json["query"] # TAINTED
# Propagation through string operations
prompt = f"Answer: {user_input}" # TAINTED (concatenation)
upper = user_input.upper() # TAINTED (string method)
parts = user_input.split() # TAINTED (all elements)
# Propagation through data structures
data = {"query": user_input} # data["query"] is TAINTED
items = [user_input, "safe"] # items[0] is TAINTEDBackward Slicing
Backward slicing answers: “What code affected this vulnerable sink?”
Example
def process_request(request):
# Line 1: Source
user_query = request.json["query"]
# Line 2: Transform
formatted = format_query(user_query)
# Line 3: Conditional
if len(formatted) > 100:
formatted = formatted[:100]
# Line 4: More processing
enhanced = add_context(formatted)
# Line 5: SINK - Vulnerability here
response = llm.generate(enhanced)
return responseBackward slice from line 5:
Line 5: llm.generate(enhanced)
↑
Line 4: enhanced = add_context(formatted)
↑
Line 3: formatted = formatted[:100] (conditional)
↑
Line 2: formatted = format_query(user_query)
↑
Line 1: user_query = request.json["query"] ← SOURCEInkog reports the complete taint path, helping you understand exactly how untrusted data reaches sensitive operations.
Inter-Procedural Analysis
Inkog tracks data flow across function boundaries:
# file: utils.py
def process_input(raw):
cleaned = raw.strip()
return cleaned
# file: handler.py
def handle_query(request):
query = request.json["query"]
processed = process_input(query) # Cross-function tracking
return agent.run(processed)
# file: agent.py
class Agent:
def run(self, input_text):
prompt = self.build_prompt(input_text)
return self.llm.generate(prompt) # SINK reached
def build_prompt(self, text):
return f"Query: {text}" # Taint preservedInkog traces:
request.json["query"] (handler.py:3)
↓ (parameter passing)
process_input.raw (utils.py:2)
↓ (return value)
handler.processed (handler.py:4)
↓ (method call)
Agent.run.input_text (agent.py:3)
↓ (method call)
Agent.build_prompt.text (agent.py:7)
↓ (f-string)
Agent.run.prompt (agent.py:4)
↓ (LLM call)
llm.generate() ← VULNERABILITYSanitizers
Sanitizers remove taint when properly applied:
# Built-in sanitizer recognition
from inkog.sanitizers import sanitize_for_llm
user_input = request.json["query"] # TAINTED
# Sanitization removes taint
safe_input = sanitize_for_llm(user_input) # NOT TAINTED
# Now safe to use
llm.generate(f"Query: {safe_input}") # OKCustom Sanitizers
Define your own sanitizers in configuration:
sanitizers:
- function: "my_app.security.sanitize_prompt"
removes_taint: true
- function: "my_app.validation.validate_input"
removes_taint: conditional # Only if validation passes
condition: "return_value == True"
- function: "html.escape"
removes_taint: partial # Removes XSS taint, not prompt injection
taint_types: [xss]Conditional Taint
Some operations conditionally affect taint:
user_input = request.json["query"] # TAINTED
if is_safe(user_input):
# Taint REMOVED in this branch (validation passed)
process_safe(user_input)
else:
# Taint PRESERVED in this branch
reject(user_input)Inkog understands common validation patterns:
# Allowlist validation
if user_input in ALLOWED_QUERIES:
safe_input = user_input # Taint removed
# Regex validation
if re.match(r'^[a-zA-Z0-9\s]+$', user_input):
safe_input = user_input # Taint removed
# Length validation (partial)
if len(user_input) <= 100:
bounded_input = user_input # Taint partially reducedPath Sensitivity
Inkog is path-sensitive, meaning it considers control flow:
def process(request):
query = request.json["query"]
mode = request.json["mode"]
if mode == "safe":
# Path 1: Safe mode - uses template
response = llm.generate(SAFE_TEMPLATE, query=query)
else:
# Path 2: Unsafe mode - direct concatenation
response = llm.generate(f"Answer: {query}")
return responseInkog reports:
VULNERABILITY in Path 2 (line 9):
Prompt injection via direct string concatenation.
Path: mode != "safe"
Path 1 (line 6) is SAFE:
Uses parameterized template.Visualization
Generate a visual data flow graph:
The generated HTML shows:
- Green nodes: Safe operations
- Red nodes: Vulnerable sinks
- Orange nodes: Tainted data
- Blue edges: Data flow paths
- Dashed edges: Conditional flows
Performance Optimizations
Incremental Analysis
# First scan builds full graph
inkog scan . # 5.2s
# Subsequent scans only analyze changes
inkog scan . # 0.8s (incremental)Scope Limiting
analysis:
# Limit call depth
max_call_depth: 10
# Limit path exploration
max_paths: 1000
# Focus on specific entry points
entry_points:
- "api.handlers.*"
- "cli.main"Reducing analysis scope may cause false negatives. Use judiciously.