Confidence Calibration
Inkog uses Bayesian confidence calibration to improve detection accuracy over time. As you provide feedback on findings, the system learns which patterns are accurate in your specific codebase.
Why Calibration Matters
Static analysis tools often produce false positives - warnings that aren’t actual vulnerabilities. Traditional tools have fixed confidence scores that never improve. Inkog is different:
- Learns from feedback: Each time you mark a finding as true/false positive, calibration improves
- Per-rule learning: Different rules calibrate independently based on your feedback
- Reliability tracking: The system tells you how confident it is in its calibration
How It Works
```
Base Confidence → User Feedback → Bayesian Update → Calibrated Confidence
     (0.85)           (FP)              ↓                  (0.78)
```
Each finding includes both the original and calibrated confidence:
| Field | Description |
|---|---|
| confidence | Base confidence from the pattern definition |
| calibrated_confidence | Adjusted confidence based on feedback |
| calibration_reliability | How stable the calibration is |
| calibration_samples | Number of feedback samples collected |
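For example, a finding might carry calibration fields like these (the values and the rule_id field name are illustrative; see the Feedback API Reference for the exact response shape):

```python
finding = {
    "rule_id": "universal_prompt_injection",  # illustrative; the exact field name may differ
    "confidence": 0.85,                       # base confidence from the pattern definition
    "calibrated_confidence": 0.78,            # adjusted by Bayesian updates from your feedback
    "calibration_reliability": "moderate",
    "calibration_samples": 23,
}
```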
Reliability Levels
| Level | Samples | Meaning |
|---|---|---|
| insufficient | < 5 | Not enough data - use base confidence |
| low | 5-10 | Early calibration, may shift significantly |
| moderate | 11-30 | Reasonably stable, calibration is useful |
| high | 31-100 | Stable calibration, high confidence |
| very_high | > 100 | Very stable, unlikely to change |
When calibration_reliability is “moderate” or higher, use calibrated_confidence for decision-making. It better reflects real-world accuracy in your environment.
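A minimal sketch of that decision rule, assuming findings are dicts with the fields shown above (effective_confidence is a name chosen here for illustration, not part of Inkog's API):

```python
STABLE_LEVELS = {"moderate", "high", "very_high"}

def effective_confidence(finding: dict) -> float:
    """Prefer calibrated confidence once calibration is reasonably stable."""
    if finding.get("calibration_reliability") in STABLE_LEVELS:
        return finding["calibrated_confidence"]
    # insufficient or low reliability: fall back to the base confidence
    return finding["confidence"]
```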
Providing Feedback
You can submit feedback through:
- Dashboard: Click “Mark as False Positive” or “Confirm” on any finding
- API: Use the Feedback API to programmatically submit feedback (see the sketch after this list)
- CLI: Use the inkog feedback command:

```bash
inkog feedback --finding-id <id> --type false_positive
```
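For the API option, the sketch below assumes feedback is submitted as a POST to the /v1/feedback endpoint with finding_id, type, and notes fields; the exact route and payload schema are defined in the Feedback API Reference.

```python
import requests  # third-party HTTP client, assumed available

# Hypothetical payload shape -- confirm against the Feedback API Reference.
resp = requests.post(
    "https://api.inkog.io/v1/feedback",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "finding_id": "<id>",
        "type": "false_positive",
        "notes": "Fixed system prompt; no untrusted input reaches it.",
    },
    timeout=10,
)
resp.raise_for_status()
```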
Feedback Types
| Type | When to Use |
|---|---|
| true_positive | The finding is a real vulnerability that needs fixing |
| false_positive | The finding is not actually vulnerable |
| uncertain | You’re not sure - this still helps calibration |
Example: Improving Prompt Injection Detection
Suppose the universal_prompt_injection pattern flags a legitimate system prompt:
```python
# This is flagged but it's safe - it's a fixed system prompt
SYSTEM_PROMPT = "You are a helpful assistant."
```

- You mark it as false_positive
- The calibrated confidence drops (e.g., 0.85 → 0.78)
- Future scans weight this pattern slightly lower
- After 20+ samples, the system knows this pattern’s real accuracy in your codebase
Calibration is per-organization in cloud deployments. Your feedback trains the model for your specific codebase patterns, not others.
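Inkog's exact prior and update rule aren't spelled out here, but the 0.85 → 0.78 shift above is the kind of result a standard Beta-Binomial update produces. The sketch below is illustrative only; prior_strength is an assumed tuning value, not an Inkog parameter.

```python
def calibrated_confidence(base_confidence: float,
                          true_positives: int,
                          false_positives: int,
                          prior_strength: float = 10.0) -> float:
    """Posterior mean of a Beta prior centered on the base confidence."""
    alpha = base_confidence * prior_strength          # prior pseudo-counts for "real vulnerability"
    beta = (1.0 - base_confidence) * prior_strength   # prior pseudo-counts for "false positive"
    alpha += true_positives                           # confirmed findings push confidence up
    beta += false_positives                           # false positives pull it down
    return alpha / (alpha + beta)

print(round(calibrated_confidence(0.85, 0, 1), 2))   # 0.77 -- a single false positive
print(round(calibrated_confidence(0.85, 15, 5), 2))  # 0.78 -- 20 samples at ~75% observed precision
```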
Viewing Calibration Stats
Dashboard
Navigate to Settings → Calibration to see:
- Calibration status for all rules
- Sample counts and reliability levels
- Recommendations for rules needing more feedback
API
```bash
curl https://api.inkog.io/v1/feedback \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Returns all calibration data, including recommendations for rules that need more feedback samples.
Best Practices
- Be consistent: Same type of code should get the same feedback
- Provide context: Add notes explaining why something is/isn’t vulnerable
- Review regularly: Check calibration stats monthly
- Focus on high-volume rules: Prioritize calibrating rules that fire frequently
Integration with CI/CD
You can set confidence thresholds in your CI pipeline:
```yaml
# inkog.yaml
policy:
  fail_on:
    min_confidence: 0.7  # Use calibrated confidence if available
    severity: HIGH
```

As calibration improves, borderline findings will be handled more accurately, reducing alert fatigue without missing real vulnerabilities.
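One way such a gate might evaluate findings, reusing the effective-confidence rule sketched earlier (repeated here so the snippet stands alone). The severity scale and field names are assumptions, and Inkog's actual policy engine may differ.

```python
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]  # assumed severity scale

def effective_confidence(finding: dict) -> float:
    # Prefer calibrated confidence once calibration is at least "moderate".
    if finding.get("calibration_reliability") in ("moderate", "high", "very_high"):
        return finding["calibrated_confidence"]
    return finding["confidence"]

def should_fail(findings: list[dict], min_confidence: float = 0.7, min_severity: str = "HIGH") -> bool:
    """Fail the pipeline if any finding at or above min_severity meets the confidence threshold."""
    cutoff = SEVERITY_ORDER.index(min_severity)
    return any(
        SEVERITY_ORDER.index(f["severity"]) >= cutoff
        and effective_confidence(f) >= min_confidence
        for f in findings
    )
```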
Related
- Feedback API Reference - API for submitting feedback
- Security Scoring - How findings are scored
- Configuration - Policy configuration