ADR-006: Authentication — JWT + SSO + Fail-Secure KB Site
Status: ✅ Accepted
Date: 2026-04-01
Decision Makers: Tilak Kumar
Context
ThreatWeaver has two distinct authentication surfaces:
- Main application — the dashboard and scanner used by security teams
- KB site (this documentation site) — contains internal architecture, sales collateral, and product roadmap that should not be publicly accessible
Each surface has different requirements:
Main app:
- Customers expect SSO with their corporate identity provider
- Internal team uses local email/password during development
- Multi-tenant: each login must resolve to the correct tenant's schema
- Scan sensors (Docker agents) need non-interactive service token auth
KB site:
- Must not be publicly accessible or indexed by search engines
- Sales team needs access without creating accounts
- Engineering needs access without a separate login system
- Must keep working when the main ThreatWeaver backend is down, and must fail secure (deny access) if auth is misconfigured
Decision
Main application: JWT (ES256) for humans + service tokens for scan sensors. Microsoft Entra ID SSO for enterprise customers.
KB site: Dual-layer auth — Microsoft SSO (MSAL.js) OR shared access code — with fail-secure lockdown.
Main Application Auth Architecture:
Human users:
POST /auth/login → validates email+password → issues JWT (ES256, 8h)
JWT stored in localStorage, sent as Bearer token on every API request
Enterprise SSO:
Microsoft Entra ID (SAML/OIDC) → SAML assertion or OIDC ID token → backend issues ThreatWeaver JWT
Role mapped from AD groups → ThreatWeaver RBAC roles
Scan sensors:
POST /auth/service-token → tenant admin issues long-lived service JWT
Sensor authenticates each scan with this token
Token scoped to: scan operations only (no admin endpoints)
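The scoping rule for sensor tokens can be sketched as a check that runs on every request. This is a minimal sketch under stated assumptions, not ThreatWeaver's actual middleware: the `TokenClaims` shape, the `scope` values, and the `/scans` path prefix are hypothetical names chosen for illustration, and signature verification is assumed to have happened before this step.

```typescript
// Hypothetical claim shape; the ADR does not specify the real claim names.
type TokenScope = "user" | "scan";

interface TokenClaims {
  sub: string;       // user ID or sensor ID
  tenantId: string;  // tenant whose schema the request resolves to
  scope: TokenScope; // "scan" for service tokens issued to sensors
  exp: number;       // expiry, in seconds since the Unix epoch
}

// Assumed route prefix that a scan-scoped token may call.
const SCAN_PREFIXES: string[] = ["/scans"];

function isRequestAllowed(claims: TokenClaims, path: string, nowSeconds: number): boolean {
  // Expired tokens never pass, whatever their scope.
  if (claims.exp <= nowSeconds) return false;

  if (claims.scope === "scan") {
    // Allow-list: anything outside the scan prefixes is denied,
    // which covers admin endpoints without naming them.
    return SCAN_PREFIXES.some((p) => path === p || path.startsWith(p + "/"));
  }

  // Human tokens continue to role-based checks, which are not shown here.
  return true;
}
```

An allow-list is used rather than a deny-list of admin routes so that a newly added endpoint is closed to sensor tokens by default.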
KB Site Auth Design:
Priority order:
1. Microsoft SSO (MSAL.js) — if MSAL_CLIENT_ID + MSAL_TENANT_ID configured
2. Access code — if KB_ACCESS_CODE configured
3. LOCKDOWN — if neither configured, show "Access Denied" page
Session: 8h, localStorage-based
Brute-force protection: 5 failed attempts → 60s lockout
Local dev: auth skipped via isLocalDev() check
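The session and lockout rules above can be sketched as a small state machine. This is an illustrative sketch only: `GateState`, `submitCode`, and the other names are hypothetical, persistence to localStorage is left out, and resetting the failure counter once a lockout starts is an assumption the ADR does not state.

```typescript
const SESSION_MS = 8 * 60 * 60 * 1000; // 8h session
const MAX_FAILURES = 5;                // failed attempts before lockout
const LOCKOUT_MS = 60 * 1000;          // 60s lockout

// Hypothetical state shape; all timestamps are epoch milliseconds.
interface GateState {
  failedAttempts: number;
  lockedUntil: number;      // 0 when not locked
  sessionExpiresAt: number; // 0 when signed out
}

const initialState: GateState = { failedAttempts: 0, lockedUntil: 0, sessionExpiresAt: 0 };

function isLocked(state: GateState, now: number): boolean {
  return now < state.lockedUntil;
}

function hasSession(state: GateState, now: number): boolean {
  return now < state.sessionExpiresAt;
}

// Returns the next state after one access-code attempt.
function submitCode(state: GateState, codeIsCorrect: boolean, now: number): GateState {
  // Attempts made during a lockout are ignored, even correct ones.
  if (isLocked(state, now)) return state;

  if (codeIsCorrect) {
    return { failedAttempts: 0, lockedUntil: 0, sessionExpiresAt: now + SESSION_MS };
  }

  const failedAttempts = state.failedAttempts + 1;
  if (failedAttempts >= MAX_FAILURES) {
    // Assumption: the counter resets when the lockout begins.
    return { ...state, failedAttempts: 0, lockedUntil: now + LOCKOUT_MS };
  }
  return { ...state, failedAttempts };
}
```

Passing `now` in as a parameter, instead of calling `Date.now()` inside, keeps the functions pure and easy to test.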
Fail-secure principle: if no auth is configured, the KB site does NOT default to open access. It shows a lockdown page. This prevents accidental exposure if env vars are missing in production.
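The priority order and the fail-secure default can be sketched as one pure function. This is a hedged sketch, not the KB site's actual code: `KbAuthConfig`, `resolveKbAuthMethods`, and `isLockedDown` are hypothetical names, the config fields stand in for the `MSAL_CLIENT_ID`, `MSAL_TENANT_ID`, and `KB_ACCESS_CODE` variables, and treating blank values as unset is an assumption. Because the ADR does not say whether both methods are offered at once, the sketch returns every configured method in priority order.

```typescript
type KbAuthMethod = "sso" | "access-code";

// Hypothetical config shape mirroring the three environment variables.
interface KbAuthConfig {
  msalClientId?: string;
  msalTenantId?: string;
  accessCode?: string;
}

// Assumption: a blank or whitespace-only value counts as "not configured".
function isSet(value?: string): boolean {
  return value !== undefined && value.trim() !== "";
}

// Returns the enabled methods in priority order: SSO first, then access code.
function resolveKbAuthMethods(cfg: KbAuthConfig): KbAuthMethod[] {
  const methods: KbAuthMethod[] = [];
  // SSO requires both the client ID and the tenant ID.
  if (isSet(cfg.msalClientId) && isSet(cfg.msalTenantId)) methods.push("sso");
  if (isSet(cfg.accessCode)) methods.push("access-code");
  return methods;
}

// Fail-secure: no configured method means lockdown, never open access.
function isLockedDown(cfg: KbAuthConfig): boolean {
  return resolveKbAuthMethods(cfg).length === 0;
}
```

Lockdown is derived from the absence of any method, so there is no code path in which a missing variable results in open access.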
Consequences
Positive:
- ES256 (asymmetric) JWT means the private signing key stays on the backend; the frontend and scan sensors hold only tokens, never a shared secret
- SSO eliminates password management for enterprise customers
- KB access code is simple enough for sales team (no accounts, no passwords to remember)
- Fail-secure design means misconfiguration locks the site instead of exposing it
- Service tokens are scoped — a compromised sensor token cannot access admin APIs
Negative / Trade-offs:
- MSAL.js adds ~200KB to the KB site bundle (mitigated with dynamic import)
- JWT stored in localStorage is readable by any script running on the page, so an XSS flaw can steal it. Accepted as a trade-off against the alternative (cookie complexity with CORS)
- ES256 key rotation requires reissuing all active tokens
- KB access code is a shared secret: if it leaks, the only remedy is to rotate it, which invalidates every existing session
- SAML/OIDC SSO setup requires IT involvement from enterprise customers (onboarding friction)
- The 8h session lifetime means users who start early must re-authenticate before their workday ends
Alternatives Considered
| Option | Why Rejected |
|---|---|
| Session cookies instead of JWT | CORS complexity across Vercel + Render domains; SameSite cookie configuration is error-prone |
| HS256 JWT (symmetric) | Every service shares the signing secret, so compromising any one service lets an attacker forge tokens for all of them |
| OAuth2 for everything | Over-engineered for scan sensors; adds complexity without benefit for machine-to-machine auth |
| Open KB site (no auth) | Sales collateral, pricing, and competitive analysis should not be public-facing |
| Single auth method for KB | Sales team doesn't have Microsoft accounts; access code covers them without IT involvement |
| Session-based auth for KB | Requires a database connection from the KB site — we wanted KB site to be fully static / independent |