Data Exposure
Data exposure vulnerabilities occur when sensitive information (credentials, PII, internal data) is leaked through logs, prompts, or cross-tenant access.
Hardcoded Credentials (CRITICAL)
CWE-798 | OWASP LLM06
API keys, passwords, or tokens hardcoded in source files instead of using environment variables.
Vulnerable:

```python
from openai import OpenAI

# CRITICAL: Hardcoded API key
client = OpenAI(api_key="sk-proj-abc123xyz789...")

def chat(message):
    return client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": message}]
    )
```

Fixed:

```python
import os
from openai import OpenAI

# Load the key from an environment variable and fail fast if it is missing
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY environment variable not set")
client = OpenAI(api_key=api_key)

def chat(message):
    return client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": message}]
    )
```

Inkog detects hardcoded secrets including (a simplified detection sketch follows the list):
- API keys (OpenAI, AWS, Google, etc.)
- OAuth tokens
- Private keys
- Database connection strings
- Webhook secrets
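For illustration, here is a minimal sketch of the kind of pattern matching a secret scanner performs. The regexes below are simplified assumptions, not Inkog's actual detection rules:

```python
import re

# Illustrative patterns only; real scanners use far more precise rules
# plus entropy checks to reduce false positives
SECRET_PATTERNS = {
    "openai_key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "postgres_dsn": re.compile(r"postgres(?:ql)?://\S+:\S+@\S+"),
}

def scan_for_secrets(source: str):
    # Return (pattern name, offset) for every match in the source text
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(source):
            findings.append((name, match.start()))
    return findings
```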
Inkog’s Hybrid Privacy Model automatically redacts detected secrets before sending code to the server. Your actual credentials never leave your machine.
Logging Sensitive Data (MEDIUM)
CWE-532, CWE-200 | OWASP LLM06
LLM responses or user input containing secrets/PII logged without sanitization.
Vulnerable:

```python
import logging

def chat(user_input):
    response = llm.invoke(user_input)
    # Logs may contain PII, API keys, passwords
    logging.info(f"User: {user_input}")
    logging.info(f"Response: {response}")
    return response
```

Fixed:

```python
import logging

def redact_sensitive(text):
    # Implement PII redaction (see the sketch below)
    return redact_pii(text)

def chat(user_input):
    response = llm.invoke(user_input)
    # Log redacted versions only
    logging.info(f"User: {redact_sensitive(user_input)}")
    logging.info(f"Response: {redact_sensitive(response)}")
    return response
```
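The `redact_pii` helper above is left unimplemented; here is a minimal regex-based sketch. The patterns are illustrative assumptions, not a complete PII rule set:

```python
import re

# Illustrative patterns; production redaction needs broader coverage
# (names, addresses, phone numbers, provider-specific key formats, ...)
REDACTIONS = [
    (re.compile(r"sk-[A-Za-z0-9_-]{20,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]

def redact_pii(text: str) -> str:
    # Apply each pattern in turn, replacing matches with a fixed token
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```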
Cross-Tenant Data Leakage (CRITICAL)
CWE-639
Data from one tenant accessible to another due to improper isolation in multi-tenant systems.
Vulnerable:

```python
# Shared vector store for all tenants
vectorstore = Chroma(collection_name="documents")

def query(user_id, question):
    # CRITICAL: No tenant filtering
    # User can retrieve other tenants' documents
    docs = vectorstore.similarity_search(question)
    return llm.invoke(f"Context: {docs}\nQuestion: {question}")
```

Fixed:

```python
def query(tenant_id, user_id, question):
    # Tenant-isolated collection
    vectorstore = Chroma(
        collection_name=f"tenant_{tenant_id}_documents"
    )
    # Additional metadata filtering
    docs = vectorstore.similarity_search(
        question,
        filter={"tenant_id": tenant_id}
    )
    return llm.invoke(f"Context: {docs}\nQuestion: {question}")
```

Isolation strategies:
- Namespace isolation - Separate collections per tenant
- Metadata filtering - Filter by tenant_id on every query
- Row-level security - Database enforced access control (see the sketch after this list)
- Encryption - Tenant-specific encryption keys
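As a sketch of the row-level-security option, here is one way to enforce tenant isolation in PostgreSQL from Python. The `documents` table, `tenant_isolation` policy, and `app.tenant_id` setting are illustrative assumptions:

```python
import os
import psycopg2

# One-time setup, run as an admin (shown here for context):
#   ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
#   CREATE POLICY tenant_isolation ON documents
#       USING (tenant_id = current_setting('app.tenant_id'));

def fetch_documents(tenant_id, doc_ids):
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with conn.cursor() as cur:
            # Scope this session to one tenant; the policy above makes
            # other tenants' rows invisible to every query that follows
            cur.execute(
                "SELECT set_config('app.tenant_id', %s, false)",
                (tenant_id,),
            )
            cur.execute(
                "SELECT id, content FROM documents WHERE id = ANY(%s)",
                (doc_ids,),
            )
            return cur.fetchall()
    finally:
        conn.close()
```

Unlike application-side filtering, the database enforces the policy even if a query forgets the tenant clause.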
Secrets in Prompts (HIGH)
Sensitive information embedded directly in prompt templates.
SYSTEM_PROMPT = """You are a database assistant.
Connection string: postgresql://admin:secretpass@db.company.com/prod
Help users write queries."""
# If prompt is leaked, credentials are exposedimport os
SYSTEM_PROMPT = """You are a database assistant.
Help users write queries. You do not have direct database access."""
# Credentials only in secure execution context
def execute_query(query):
conn = psycopg2.connect(os.environ["DATABASE_URL"])
# ... execute with proper parameterizationBest Practices
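The execution step is elided above; a minimal way to complete it, assuming psycopg2 and a `DATABASE_URL` environment variable as in the snippet:

```python
import os
import psycopg2

def execute_query(query, params=()):
    # Credentials stay in the execution environment, never in the prompt
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with conn.cursor() as cur:
            # Parameterized execution; never interpolate user input into SQL
            cur.execute(query, params)
            return cur.fetchall()
    finally:
        conn.close()
```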
Best Practices
- Never hardcode secrets - Use environment variables or secret managers (see the sketch after this list)
- Redact logs - Scrub PII and credentials before logging
- Isolate tenants - Namespace all data by tenant ID
- Audit access - Log who accessed what data
- Encrypt at rest - Encrypt sensitive data in storage
- Rotate credentials - Regular rotation limits blast radius
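For the secret-manager option, a minimal sketch using AWS Secrets Manager via boto3; the secret name `llm-app/openai-api-key` is an illustrative assumption:

```python
import boto3

def get_openai_api_key():
    # Fetch the key at runtime; nothing sensitive lives in the codebase
    client = boto3.client("secretsmanager")
    secret = client.get_secret_value(SecretId="llm-app/openai-api-key")
    return secret["SecretString"]
```

A secret manager also makes the rotation practice above operational: rotate the value in one place and every deployment picks it up on the next fetch.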