Findings Validation
Every finding generated by any of the 58 scanning agents passes through a multi-layered validation pipeline before being reported. This pipeline is responsible for the 94%+ true positive rate.
Validation Architecture
Layer 1: Always-Reliable Bypass
Certain vulnerability types have deterministic detection logic that almost never produces false positives. These types bypass the LLM validation entirely to save cost and avoid incorrect LLM rejections:
- jwt_none_algorithm -- Algorithm set to "none", server accepted the forged token
- jwt_algorithm_confusion -- HS256/RS256 confusion, server accepted the wrong algorithm
- jwt_weak_secret -- Dictionary attack found the signing secret
- sqli_error -- Database engine error message in response body
- sensitive_file_exposed -- Known sensitive file (.env, .git/config, etc.) returned with content
All other finding types proceed to heuristic filtering.
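As a rough illustration, the bypass decision can be modeled as a set-membership check. This is a minimal sketch, not the product's code: the five type identifiers come from the list above, while the constant and function names are hypothetical.

```python
# Type identifiers are taken from the list above; the names
# ALWAYS_RELIABLE_TYPES and bypasses_llm_validation are illustrative assumptions.
ALWAYS_RELIABLE_TYPES = frozenset({
    "jwt_none_algorithm",
    "jwt_algorithm_confusion",
    "jwt_weak_secret",
    "sqli_error",
    "sensitive_file_exposed",
})


def bypasses_llm_validation(finding_type: str) -> bool:
    """Return True when a finding type skips LLM validation."""
    return finding_type in ALWAYS_RELIABLE_TYPES
```

A frozen set keeps the lookup constant-time and makes the list of trusted types hard to mutate by accident.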
Layer 2: Heuristic Filters (H3-H19)
Deterministic rules that auto-reject known false positive patterns. Each rule targets a specific FP pattern identified through scan result analysis.
| Rule | Name | What It Rejects |
|---|---|---|
| H3 | NoSQL Target SQLi | SQL injection findings when the target uses MongoDB, DynamoDB, or other NoSQL databases -- SQL payloads are meaningless against document databases |
| H4 | Undeclared Params | SQLi findings on parameters that do not exist in the OpenAPI specification -- if the spec declares no such parameter, the server likely ignores it |
| H4b | Type Validation Errors | SQLi error-based findings caused by framework type validation (e.g., MethodArgumentTypeMismatchException, Invalid UUID) rather than actual SQL engine errors |
| H4c | SQLi on File Upload | sqli_auth_bypass findings on file-upload endpoints or PUT method -- file operations produce status codes that mimic auth bypass patterns |
| H4d | Credential Auth-Init | sensitive_data_credential findings on authentication-initiation endpoints (login, register, token) -- credentials are expected on these endpoints |
| H4e | Non-Transactional Race | race_double_submit findings on endpoints without transactional side effects -- race conditions only matter on state-changing operations |
| H4f | Non-LDAP LDAPi | ldap_injection findings on targets without LDAP stack presence -- LDAP payloads are meaningless without an LDAP directory |
| H5 | GET Race Conditions | Race condition findings on GET endpoints -- read-only operations have no exploitable side effects |
| H6 | IDOR Collection | IDOR findings on GET endpoints that return all records regardless of the parameter value -- the endpoint is a collection endpoint, not an individual resource |
| H7 | Default Data Exposure | default_data_exposure findings triggered only by adding query parameters -- not a real exposure |
| H8 | File Upload on JSON | file_upload findings on endpoints that expect JSON or XML bodies rather than multipart form data |
| H9 | Business Logic JWT | Business logic findings on JWT-related endpoints -- these are auth operations, not business logic flaws |
| H10 | Low-Confidence XSS | XSS findings with very low confidence where the agent itself flagged the finding as unverified |
| H11 | Auth Bypass Non-Admin | AUTH_BYPASS findings on non-admin endpoints -- auth bypass is only meaningful on privileged resources |
| H12 | Cache Poison No-Cache | cache_poisoning findings without evidence that the target actually uses caching (no Cache-Control, X-Cache, or Age headers) |
| H13 | IDOR Non-Exploitable | IDOR variant findings on endpoints where the response data does not change with different IDs |
| H14 | Content Type Confusion | content_type_confusion findings when the server returned an error or the content type is expected |
| H15 | Non-Callback Endpoints | callback_manipulation on endpoints that do not handle callbacks or webhooks |
| H16 | File Upload Error | unrestricted_file_upload findings when the server returned an error response (upload was rejected) |
| H17 | Standard API Response | default_data_exposure on standard API response patterns (pagination, metadata, etc.) |
| H18 | XSS in JSON | XSS findings where the server response content type is application/json -- JSON is not an HTML execution context |
| H19 | Auth Bypass 401/403 | Auth bypass findings where the attack response status code is 401 or 403 -- the auth check actually worked |
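To make the rule table more concrete, three of the simpler rules (H5, H18, H19) can be sketched as predicates over a finding. This is a sketch under stated assumptions: the `Finding` fields, the way vulnerability types are matched by name, and the first-match ordering are all hypothetical and not taken from the actual filter code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding shape; field names are assumptions.
    vuln_type: str
    method: str
    response_status: int
    response_content_type: str


def h5_get_race(f: Finding) -> bool:
    # H5: race condition findings on GET endpoints (assumes a "race_" type prefix).
    return f.vuln_type.lower().startswith("race_") and f.method.upper() == "GET"


def h18_xss_in_json(f: Finding) -> bool:
    # H18: XSS findings where the response content type is application/json.
    return ("xss" in f.vuln_type.lower()
            and f.response_content_type.lower().startswith("application/json"))


def h19_auth_bypass_denied(f: Finding) -> bool:
    # H19: auth bypass findings where the attack response was 401 or 403.
    return f.vuln_type.upper() == "AUTH_BYPASS" and f.response_status in (401, 403)


RULES: list[tuple[str, Callable[[Finding], bool]]] = [
    ("H5", h5_get_race),
    ("H18", h18_xss_in_json),
    ("H19", h19_auth_bypass_denied),
]


def first_rejecting_rule(f: Finding) -> Optional[str]:
    """Return the ID of the first rule that rejects the finding, or None."""
    for rule_id, predicate in RULES:
        if predicate(f):
            return rule_id
    return None
```

Returning the rule ID rather than a bare boolean keeps each rejection traceable to the rule that caused it.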
Layer 3: Multi-Probe Confirmation
Findings that pass heuristic filtering are replayed with payload variations to confirm exploitability:
- Original payload replay -- Resend the exact attack to verify the behavior is reproducible
- Payload variation -- Send modified payloads to confirm the vulnerability is not a one-off response quirk
- Baseline comparison -- Compare attack response against the stored baseline response to verify differential behavior
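The three probes above could be combined roughly as follows. This is a hedged sketch: the `send` and `shows_vulnerability` callables are hypothetical stand-ins for the HTTP replay and the per-type detection logic, and the acceptance threshold (at least one variation must reproduce) is an assumption, since the page does not state one.

```python
from typing import Callable, Sequence


def confirm_finding(
    send: Callable[[str], str],
    original_payload: str,
    variations: Sequence[str],
    baseline_body: str,
    shows_vulnerability: Callable[[str], bool],
) -> bool:
    """Return True when the finding survives all three probes."""
    # Baseline comparison: if the normal response already matches the
    # vulnerable pattern, the attack response is not differential.
    if shows_vulnerability(baseline_body):
        return False

    # Original payload replay: the behavior must be reproducible.
    if not shows_vulnerability(send(original_payload)):
        return False

    # Payload variation (assumed threshold): at least one modified payload
    # must trigger the same behavior, ruling out a one-off response quirk.
    return any(shows_vulnerability(send(p)) for p in variations)
```

Passing `send` in as a parameter keeps the sketch free of network code and lets the confirmation logic be exercised with a fake transport.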
Layer 4: AI Validation
The findingValidator sends surviving findings to an LLM (Claude or GPT) for semantic analysis. The LLM receives:
- The attack request (what payload was sent)
- The attack response (what the server returned)
- A baseline response (what the server normally returns)
- The vulnerability type and a true-positive description explaining what a real positive looks like
- Application context from the blackboard (tech stack, target type)
The LLM returns:
- Confidence adjustment (may increase or decrease the agent's original confidence)
- FP flag (whether the LLM considers this a false positive)
- Reasoning (explanation for the decision)
Performance constraints:
- Only validates medium, high, and critical severity findings -- low/info findings are auto-passed
- Timeout of 25 seconds per validation -- if the LLM times out, the finding passes through unchanged
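The severity gate and the fail-open timeout can be sketched with `asyncio.wait_for`. This is an illustration, not the `findingValidator` implementation: the `Finding` and `Verdict` shapes, the `ask_llm` callable, and the clamping of scores to 0-100 are assumptions. Only the severity list and the 25-second timeout come from the text above.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Awaitable, Callable

LLM_SEVERITIES = {"medium", "high", "critical"}  # low/info are auto-passed
LLM_TIMEOUT_SECONDS = 25.0


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding shape; field names are assumptions.
    vuln_type: str
    severity: str
    confidence: int
    false_positive: bool = False
    reasoning: str = ""


@dataclass(frozen=True)
class Verdict:
    # Hypothetical shape of the LLM's answer.
    confidence_delta: int
    false_positive: bool
    reasoning: str


async def validate_with_llm(
    finding: Finding,
    ask_llm: Callable[[Finding], Awaitable[Verdict]],
    timeout: float = LLM_TIMEOUT_SECONDS,
) -> Finding:
    # Low and informational findings skip the LLM entirely.
    if finding.severity.lower() not in LLM_SEVERITIES:
        return finding
    try:
        verdict = await asyncio.wait_for(ask_llm(finding), timeout=timeout)
    except asyncio.TimeoutError:
        # Fail open: a timed-out validation leaves the finding unchanged.
        return finding
    # Assumption: adjusted scores are clamped to the 0-100 range.
    score = max(0, min(100, finding.confidence + verdict.confidence_delta))
    return replace(
        finding,
        confidence=score,
        false_positive=verdict.false_positive,
        reasoning=verdict.reasoning,
    )
```

Failing open on timeout trades some precision for completeness: a slow LLM cannot silently drop a real finding.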
Layer 5: Deduplication
The findingDeduplicator removes duplicate findings at two levels:
- APP_WIDE Deduplication -- For vulnerability types that apply to the entire application (e.g., missing security headers), only one finding is kept per application
- Endpoint-Level Deduplication -- For endpoint-specific findings, semantic similarity is used to merge findings that describe the same vulnerability on the same endpoint from different agents
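A simplified version of the two deduplication levels is sketched below. Several parts are assumptions: the `missing_security_headers` identifier and the `Finding` fields are hypothetical, and `difflib.SequenceMatcher` is only a lexical stand-in for whatever semantic similarity measure the `findingDeduplicator` actually uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical identifier for an application-wide vulnerability type.
APP_WIDE_TYPES = {"missing_security_headers"}


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding shape; field names are assumptions.
    vuln_type: str
    endpoint: str
    description: str


def _similar(a: str, b: str, threshold: float = 0.8) -> bool:
    # Lexical similarity as a stand-in for semantic similarity.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def deduplicate(findings: list[Finding]) -> list[Finding]:
    kept: list[Finding] = []
    seen_app_wide: set[str] = set()
    for f in findings:
        if f.vuln_type in APP_WIDE_TYPES:
            # APP_WIDE level: keep one finding per type for the whole application.
            if f.vuln_type not in seen_app_wide:
                seen_app_wide.add(f.vuln_type)
                kept.append(f)
            continue
        # Endpoint level: drop findings that describe the same issue
        # on the same endpoint as one already kept.
        duplicate = any(
            k.endpoint == f.endpoint
            and k.vuln_type not in APP_WIDE_TYPES
            and _similar(k.description, f.description)
            for k in kept
        )
        if not duplicate:
            kept.append(f)
    return kept
```

This sketch keeps the first occurrence and discards later duplicates. A real merge step might instead combine evidence from both agents.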
Confidence Tiers
Final findings are assigned a confidence tier based on their score:
| Tier | Score Range | Meaning |
|---|---|---|
| Confirmed | 90-100 | Exploitability verified with evidence |
| High | 70-89 | Strong indicators, high likelihood of true positive |
| Medium | 50-69 | Moderate evidence, may need manual review |
| Low | 30-49 | Weak indicators, likely requires manual verification |
| Informational | 0-29 | Observations that may warrant investigation |
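The tier table maps directly onto a threshold lookup. The function name below is hypothetical, and rejecting out-of-range scores is an assumption; the boundaries themselves are taken from the table.

```python
def confidence_tier(score: int) -> str:
    """Map a 0-100 confidence score to its tier name."""
    if not 0 <= score <= 100:
        # Assumption: scores outside the documented range are invalid.
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "Confirmed"
    if score >= 70:
        return "High"
    if score >= 50:
        return "Medium"
    if score >= 30:
        return "Low"
    return "Informational"
```

For example, a score of 89 falls in the High tier, while 90 is the lowest score that counts as Confirmed.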
Related Pages
- Scanner Agents Catalog -- Complete list of all 58 agents
- Phase Pipeline -- How agents are orchestrated across six phases
- AppSec Overview -- Module overview and architecture