Resource Exhaustion
Resource exhaustion vulnerabilities allow attackers to consume excessive compute, memory, or API tokens, leading to denial of service or runaway costs.
Infinite Loop (CRITICAL)
CVSS 9.0 | CWE-835, CWE-400 | OWASP LLM10
Loop condition depends on LLM output without deterministic termination guarantee.
Vulnerable:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI()
result = ""
# Dangerous: no termination guarantee - the loop runs until the model happens to say "done"
while "done" not in result.lower():
    result = llm.invoke(f"Continue task: {result}").content
    print(result)

Secure:
from langchain.chat_models import ChatOpenAI
import time
llm = ChatOpenAI()
result = ""
MAX_ITERATIONS = 10
start_time = time.time()
TIMEOUT = 60  # seconds
for i in range(MAX_ITERATIONS):
    if time.time() - start_time > TIMEOUT:
        break
    result = llm.invoke(f"Continue task: {result}").content
    if "done" in result.lower():
        break
Compliance:
- EU AI Act: Article 15 (Accuracy & Cybersecurity)
- NIST AI RMF: MAP 1.3 (System Reliability)
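The iteration cap and wall-clock check above only run between calls; a single call can still hang inside llm.invoke. A per-request timeout on the client is a complementary control. A minimal sketch, assuming the request_timeout and max_retries parameters exposed by langchain's ChatOpenAI wrapper (parameter values are illustrative):

from langchain.chat_models import ChatOpenAI

# Assumption: request_timeout/max_retries are forwarded to the underlying OpenAI client,
# so one hung request cannot stall the loop for more than ~30 seconds per attempt
llm = ChatOpenAI(request_timeout=30, max_retries=2)

Combined with MAX_ITERATIONS, this bounds the worst-case wall-clock time of the loop even when the timeout check between iterations is never reached.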
Context Exhaustion (HIGH)
CVSS 7.5 | CWE-400 | OWASP LLM09, LLM10
Unbounded accumulation of chat history means the full conversation is resent on every call, so token consumption grows rapidly until the context window is exhausted.
Vulnerable:
messages = []

def chat(user_input):
    messages.append({"role": "user", "content": user_input})
    response = llm.invoke(messages).content
    messages.append({"role": "assistant", "content": response})
    return response
# After 1000 messages, the context window is exhausted

Secure:
from collections import deque

MAX_MESSAGES = 20
messages = deque(maxlen=MAX_MESSAGES)

def chat(user_input):
    messages.append({"role": "user", "content": user_input})
    response = llm.invoke(list(messages)).content
    messages.append({"role": "assistant", "content": response})
    return response
# Old messages are automatically removed
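A fixed message count is a coarse proxy for context size; a handful of very long messages can still overflow the window. A variant that trims by token budget instead is sketched below; the trim_to_budget helper and the 3,000-token budget are illustrative assumptions, using tiktoken's cl100k_base encoding:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
TOKEN_BUDGET = 3000  # illustrative budget, well under the model's context window

def trim_to_budget(history):
    # Keep the most recent messages whose combined size fits the token budget
    kept, total = [], 0
    for msg in reversed(history):
        n = len(enc.encode(msg["content"]))
        if total + n > TOKEN_BUDGET:
            break
        kept.append(msg)
        total += n
    return list(reversed(kept))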
Token Bombing (CRITICAL)
CVSS 9.0 | CWE-770 | OWASP LLM10
Crafted oversized inputs cause excessive token consumption and resource exhaustion.
Vulnerable:
def process_document(content):
    # No validation - an attacker can send a 100MB document
    return llm.invoke(f"Summarize: {content}")

Secure:
import tiktoken

MAX_TOKENS = 4000
enc = tiktoken.get_encoding("cl100k_base")

def process_document(content):
    tokens = enc.encode(content)
    if len(tokens) > MAX_TOKENS:
        raise ValueError(f"Input exceeds {MAX_TOKENS} tokens")
    return llm.invoke(f"Summarize: {content}")
return llm.invoke(f"Summarize: {content}")Missing Rate Limits (HIGH)
CVSS 7.5 | CWE-400
API endpoints lack rate limiting, allowing abuse and denial of service.
@app.post("/chat")
def chat(request):
# No rate limit - attacker can spam requests
return llm.invoke(request.message)from slowapi import Limiter
from slowapi.util import get_remote_address
limiter = Limiter(key_func=get_remote_address)
@app.post("/chat")
@limiter.limit("10/minute")
def chat(request):
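Request-rate limits alone do not bound spend: ten maximum-length prompts per minute is still expensive. A per-user token budget can sit alongside the rate limiter; a rough sketch in which the in-memory usage dict, the 50,000-token daily cap, and the charge_tokens helper are illustrative assumptions (a production system would track usage in Redis or a database with a scheduled reset):

from collections import defaultdict
from fastapi import HTTPException

DAILY_TOKEN_CAP = 50_000      # illustrative per-user budget
usage = defaultdict(int)      # user_id -> tokens consumed today

def charge_tokens(user_id: str, tokens: int):
    # Reject the request before calling the LLM if the user's budget is exhausted
    if usage[user_id] + tokens > DAILY_TOKEN_CAP:
        raise HTTPException(status_code=429, detail="Daily token budget exceeded")
    usage[user_id] += tokens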
RAG Over-fetching (MEDIUM)
CVSS 5.0 | CWE-400
Retrieval-Augmented Generation fetches excessive documents causing context bloat.
Vulnerable:
def rag_query(question):
    # No explicit cap on how many documents are retrieved or how relevant they are
    docs = vectorstore.similarity_search(question)
    context = "\n".join([d.page_content for d in docs])
    return llm.invoke(f"Context: {context}\nQuestion: {question}")

Secure:
def rag_query(question):
    # Limit to the 3 most relevant documents above a relevance threshold
    docs = vectorstore.similarity_search(
        question,
        k=3,
        score_threshold=0.7
    )
    context = "\n".join([d.page_content for d in docs])
    return llm.invoke(f"Context: {context}\nQuestion: {question}")
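Even with k=3, a few very long documents can still bloat the prompt. The retrieved context can additionally be capped by token count before it reaches the model; a sketch in which the build_context helper and the 2,000-token budget are illustrative assumptions:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
CONTEXT_BUDGET = 2000  # illustrative cap on retrieved context

def build_context(docs):
    # Add documents in relevance order until the token budget is spent
    parts, total = [], 0
    for d in docs:
        n = len(enc.encode(d.page_content))
        if total + n > CONTEXT_BUDGET:
            break
        parts.append(d.page_content)
        total += n
    return "\n".join(parts)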
return llm.invoke(f"Context: {context}\nQuestion: {question}")Recursive Tool Calling (HIGH)
CVSS 7.5 | CWE-674
A tool that recursively calls itself can consume resources without bound.
Vulnerable:
@tool
def research(topic):
    """Research a topic."""
    result = llm.invoke(f"Research: {topic}").content
    # Dangerous: can trigger itself recursively with no depth limit
    if "need more info" in result:
        return research(result)  # Potentially unbounded recursion
    return result

Secure:
@tool
def research(topic, depth=0, max_depth=3):
    """Research a topic, following up at most max_depth times."""
    if depth >= max_depth:
        return "Max research depth reached"
    result = llm.invoke(f"Research: {topic}").content
    if "need more info" in result:
        return research(result, depth + 1, max_depth)
    return result
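When tools run inside an agent, the per-tool depth cap can be reinforced at the agent level. A sketch assuming langchain's AgentExecutor, whose max_iterations and max_execution_time parameters bound the entire tool-calling loop (the agent variable is a placeholder for any agent constructed elsewhere):

from langchain.agents import AgentExecutor

executor = AgentExecutor(
    agent=agent,             # placeholder: any previously constructed agent
    tools=[research],
    max_iterations=10,       # hard cap on tool calls per run
    max_execution_time=60,   # hard cap on wall-clock seconds per run
)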
All resource exhaustion vulnerabilities map to OWASP LLM10: Unbounded Consumption and should be addressed with hard limits, timeouts, and rate limiting.