
Data Exposure

Data exposure vulnerabilities occur when sensitive information (credentials, PII, internal data) is leaked through source code, logs, prompts, or cross-tenant access.

Hardcoded Credentials (CRITICAL)

CWE-798 | OWASP LLM06

API keys, passwords, or tokens are hardcoded in source files instead of being loaded from environment variables or a secret manager.

Vulnerable
API key hardcoded in source code
from openai import OpenAI

# CRITICAL: Hardcoded API key
client = OpenAI(api_key="sk-proj-abc123xyz789...")

def chat(message):
  return client.chat.completions.create(
      model="gpt-4",
      messages=[{"role": "user", "content": message}]
  )
Secure
Credentials from environment variables
import os
from openai import OpenAI

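# Load from an environment variable and fail fast if it is missing.
# Check before building the client: OpenAI() itself raises when no key is found,
# so a check on client.api_key after construction would never run.
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
  raise ValueError("OPENAI_API_KEY environment variable not set")

client = OpenAI(api_key=api_key)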

def chat(message):
  return client.chat.completions.create(
      model="gpt-4",
      messages=[{"role": "user", "content": message}]
  )

Inkog detects hardcoded secrets including:

  • API keys (OpenAI, AWS, Google, etc.)
  • OAuth tokens
  • Private keys
  • Database connection strings
  • Webhook secrets

Inkog’s Hybrid Privacy Model automatically redacts detected secrets before sending code to the server. Your actual credentials never leave your machine.
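To illustrate the general idea of redacting secrets before source code leaves the machine, here is a minimal pattern-based sketch. It is not Inkog's implementation, whose detection rules are not described on this page. `SECRET_PATTERNS` and `redact_secrets` are hypothetical names, and the four regular expressions are simplified examples covering a few of the secret types listed above.

```python
import re

# Hypothetical, simplified rules; real scanners use much larger rule sets
# and usually add entropy checks to catch unknown key formats.
SECRET_PATTERNS = {
    "openai_key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
    # URLs carrying inline user:password credentials
    "connection_string": re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@]+@[^\s'\"]+"),
}


def redact_secrets(source: str) -> tuple[str, list[str]]:
    """Replace secret-like substrings with placeholders.

    Returns the redacted text and the rule name for every match found.
    """
    found: list[str] = []
    for name, pattern in SECRET_PATTERNS.items():
        source, count = pattern.subn(f"[REDACTED:{name}]", source)
        found.extend([name] * count)
    return source, found
```

Only the redacted text would then be sent anywhere; the list of rule names is enough to report a finding without echoing the secret itself.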


Logging Sensitive Data (MEDIUM)

CWE-532, CWE-200 | OWASP LLM06

LLM responses or user input containing secrets or PII are written to logs without sanitization.

Vulnerable
Full conversation logged including PII
import logging

def chat(user_input):
  response = llm.invoke(user_input)

  # Logs may contain PII, API keys, passwords
  logging.info(f"User: {user_input}")
  logging.info(f"Response: {response}")

  return response
Secure
Redacted logging with PII removal
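import logging
import re

# Illustrative patterns only; extend them or use a dedicated PII-detection library
REDACTION_PATTERNS = [
  (re.compile(r"sk-[A-Za-z0-9_-]{10,}"), "[REDACTED_API_KEY]"),
  (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[REDACTED_EMAIL]"),
  (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]

def redact_sensitive(text):
  # str() also handles message objects returned by the LLM client
  text = str(text)
  for pattern, placeholder in REDACTION_PATTERNS:
    text = pattern.sub(placeholder, text)
  return text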

def chat(user_input):
  response = llm.invoke(user_input)

  # Log redacted versions only
  logging.info(f"User: {redact_sensitive(user_input)}")
  logging.info(f"Response: {redact_sensitive(response)}")

  return response
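Calling `redact_sensitive` at every call site works, but a log statement added later can easily skip it. One option, sketched below under the assumption that the standard `logging` module is in use, is to redact centrally with a `logging.Filter` attached to the handler. `RedactingFilter` and `PATTERNS` are hypothetical names; `logging.Filter`, `Handler.addFilter` and `LogRecord.getMessage` are standard library APIs.

```python
import logging
import re

# Hypothetical patterns for illustration; tune them for your own data
PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9_-]{10,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[REDACTED_EMAIL]"),
]


class RedactingFilter(logging.Filter):
    """Rewrites each record's message before the handler formats and emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg with any %-style args
        for pattern, placeholder in PATTERNS:
            message = pattern.sub(placeholder, message)
        record.msg = message
        record.args = None  # args are already merged into msg
        return True  # keep the record, now redacted


# Attach to the handler, not a logger: handler filters see every record the
# handler processes, including ones propagated from child loggers.
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
```

With this in place, the original `logging.info(f"User: {user_input}")` calls are redacted on output even if no one remembers to wrap them.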

Cross-Tenant Data Leakage (CRITICAL)

CWE-639

In a multi-tenant system, improper isolation lets one tenant access another tenant's data.

Vulnerable
Shared vector store without tenant isolation
# Shared vector store for all tenants
vectorstore = Chroma(collection_name="documents")

def query(user_id, question):
  # CRITICAL: No tenant filtering
  # User can retrieve other tenants' documents
  docs = vectorstore.similarity_search(question)
  return llm.invoke(f"Context: {docs}\nQuestion: {question}")
Secure
Tenant-scoped queries with namespace isolation
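def query(tenant_id, user_id, question):
  # tenant_id must come from the authenticated session, never from user-supplied input;
  # otherwise a caller can simply pass another tenant's ID (CWE-639)
  # Tenant-isolated collection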
  vectorstore = Chroma(
      collection_name=f"tenant_{tenant_id}_documents"
  )

  # Additional metadata filtering
  docs = vectorstore.similarity_search(
      question,
      filter={"tenant_id": tenant_id}
  )

  return llm.invoke(f"Context: {docs}\nQuestion: {question}")

Isolation strategies:

  1. Namespace isolation - Separate collections per tenant
  2. Metadata filtering - Filter by tenant_id on every query
  3. Row-level security - Database-enforced access control
  4. Encryption - Tenant-specific encryption keys

Secrets in Prompts (HIGH)

Sensitive information such as credentials is embedded directly in prompt templates, where prompt extraction or logging can expose it.

Vulnerable
Database credentials in prompt
SYSTEM_PROMPT = """You are a database assistant.
Connection string: postgresql://admin:secretpass@db.company.com/prod
Help users write queries."""

# If prompt is leaked, credentials are exposed
Secure
Credentials isolated from prompt context
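import os
import psycopg2

SYSTEM_PROMPT = """You are a database assistant.
Help users write queries. You do not have direct database access."""

# Credentials exist only in the secure execution context, never in the prompt
def execute_query(query, params=None):
  conn = psycopg2.connect(os.environ["DATABASE_URL"])
  try:
    # Pass values via params so the driver parameterizes them
    with conn, conn.cursor() as cur:
      cur.execute(query, params)
      return cur.fetchall() if cur.description else None
  finally:
    conn.close()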
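A cheap safeguard is to check prompt templates for embedded credentials when they are loaded, so a leaky template fails at startup or in a CI test instead of reaching the model. This is a minimal sketch under the assumption that simple patterns are acceptable for your templates: `CREDENTIAL_PATTERNS` and `assert_prompt_has_no_credentials` are hypothetical names, and the two rules (URLs with inline `user:password`, and `password=`/`api_key:`-style assignments) are illustrative rather than exhaustive.

```python
import re

# Hypothetical rules: URLs with inline user:password, and key/password assignments
CREDENTIAL_PATTERNS = [
    re.compile(r"[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@]+@", re.IGNORECASE),
    re.compile(r"\b(?:password|passwd|api[_-]?key|secret)\s*[:=]\s*\S+", re.IGNORECASE),
]


def assert_prompt_has_no_credentials(prompt: str) -> str:
    """Raise if the template appears to embed a credential; otherwise return it unchanged."""
    for pattern in CREDENTIAL_PATTERNS:
        match = pattern.search(prompt)
        if match:
            # Report only the offset so the secret is not echoed into logs
            raise ValueError(f"possible credential in prompt at offset {match.start()}")
    return prompt


SYSTEM_PROMPT = assert_prompt_has_no_credentials(
    """You are a database assistant.
Help users write queries. You do not have direct database access."""
)
```

Wrapping the constant at definition time means the check runs on import, so the vulnerable template shown earlier would raise immediately.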

Best Practices

  1. Never hardcode secrets - Use environment variables or secret managers
  2. Redact logs - Scrub PII and credentials before logging
  3. Isolate tenants - Namespace all data by tenant ID
  4. Audit access - Log who accessed what data
  5. Encrypt at rest - Encrypt sensitive data in storage
  6. Rotate credentials - Regular rotation limits the blast radius of a leaked credential
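For "Audit access", one lightweight pattern is a decorator that writes a structured entry for every data access. This is a sketch assuming the standard `logging` module and a dedicated `audit` logger. `audited` and `read_document` are hypothetical names, and the entry records identifiers only, never payloads, so the audit trail does not become another leak.

```python
import functools
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("audit")
audit_log.setLevel(logging.INFO)


def audited(resource: str):
    """Hypothetical decorator: record who accessed which resource, and when."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user_id, tenant_id, *args, **kwargs):
            entry = {
                "ts": datetime.now(timezone.utc).isoformat(),
                "user_id": user_id,
                "tenant_id": tenant_id,
                "resource": resource,
                "action": func.__name__,
            }
            # Logged before the call, so failed attempts are recorded too
            audit_log.info(json.dumps(entry))
            return func(user_id, tenant_id, *args, **kwargs)
        return wrapper
    return decorator


@audited("documents")
def read_document(user_id, tenant_id, doc_id):
    # Placeholder body for the example
    return f"{tenant_id}/{doc_id}"
```

Emitting one JSON object per line keeps the entries easy to ship to whatever log store or SIEM you already use.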