Version: Local · In Progress

Disaster Recovery Runbook

On-call engineer? Jump straight to the Quick-Reference Runbook Checklists.

This document defines disaster scenario categories, backup strategy, step-by-step recovery procedures, health check monitoring, RTO/RPO targets, and escalation contacts for ThreatWeaver. It is designed to be usable by an on-call engineer at 2am with no prior context.


Disaster Scenario Categories

ThreatWeaver failures fall into five categories. Identify your scenario first, then follow the corresponding runbook.

| # | Category | Symptoms |
| --- | --- | --- |
| 1 | Backend service crash | API returning 502/503, /health unreachable, Render shows service stopped |
| 2 | Database corruption or accidental data deletion | 500 errors referencing DB, missing assets/findings, TypeORM sync errors on startup |
| 3 | Failed deployment | New deploy broke the app, build logs show errors, API regression after push |
| 4 | Security incident | Unauthorized access detected, tokens leaked, suspicious SecurityAuditLog entries |
| 5 | Accidental deletion of scan data | Findings missing, assessment results gone, user reports data loss |

Backup Strategy

PostgreSQL: Render Managed Backups

ThreatWeaver's production database runs on Render PostgreSQL (Pro plan). Render automatically creates daily snapshots with a 7-day retention window.

What is backed up:

  • All tenant schemas (public schema + per-tenant schemas)
  • All application data: assets, vulnerabilities, findings, assessments, users, API keys, entitlements
  • Migration state table (migrations)

What is NOT backed up automatically:

  • In-memory state (none: ThreatWeaver is stateless between restarts)
  • Redis cache (none in local; in production, Redis is ephemeral by design)
  • Local .env files (these live only on the engineer's machine)

Render backup retention by plan:

| Plan | Retention | Point-in-time recovery |
| --- | --- | --- |
| Free | None | No |
| Starter | 1 day | No |
| Pro | 7 days | No |
| Pro Plus | 7 days | Yes (PITR) |

How to trigger a manual backup via Render dashboard

  1. Log in to Render Dashboard
  2. Navigate to Databases in the left sidebar
  3. Select the ThreatWeaver PostgreSQL instance (dpg-d6vc8nnfte5s73dppuqg-a for UAT/dev)
  4. Click the Backups tab
  5. Click Create Backup (button in the top-right of the Backups panel)
  6. The backup will appear in the list within 1-2 minutes
  7. Note the backup timestamp; you will need it for restore operations

Before any risky operation

Always trigger a manual backup before running migrations, bulk deletes, schema changes, or any potentially destructive admin operation.

Export Data from ThreatWeaver (Admin UI)

For targeted exports without a full DB restore:

  1. Log in as an admin (admin@company.com locally, testingadmin@blucypher.com on UAT)
  2. Navigate to Admin → Archives → Export
  3. Select the data type: Findings, Assets, Assessments, or full tenant export
  4. Choose date range if applicable
  5. Click Export; a CSV/JSON file will download to your browser
  6. Store the export in a secure location (not in the repository)

Exports contain sensitive data

Exported files may contain vulnerability details, IP addresses, and credentials used in test scans. Handle with the same care as production credentials. Never commit exports to Git.

Git-Based Source Recovery

The source code is always recoverable from GitHub. Any deployment can be rebuilt from scratch:

git clone git@github.com:BluCypher1/ThreatWeaver.git
cd ThreatWeaver
git checkout dev # or main for the last stable release

  • Frontend: Rebuild and redeploy to Vercel
  • Backend: Rebuild and redeploy to Render
  • Database schema: Recreate by running npm run migrate:production against a fresh PostgreSQL instance

Recovery Procedures

Scenario 1: Backend Service Crash (Render)

Indicators: API returns 502/503, /health endpoint unreachable, users report the app is down.

Expected recovery time: 2 minutes (auto-restart) to 15 minutes (manual intervention)

Step 1: Assess the crash reason

  1. Open Render Dashboard
  2. Navigate to Services → ThreatWeaver Backend (kina-vulnerability-management-uq1t)
  3. Check the Status badge:
    • Restarting: Render is already recovering; wait 60-90 seconds
    • Failed: requires manual action
  4. Click Logs tab and scroll to the crash point
  5. Identify the crash reason:

| Log pattern | Cause | Action |
| --- | --- | --- |
| JavaScript heap out of memory | OOM kill | Increase Render instance size or fix memory leak |
| ECONNREFUSED to DB | Database unreachable | Check DB service status separately |
| Cannot find module | Bad deploy | Rollback to previous build (Step 3) |
| TypeORM EntityMetadata error | Entity definition mismatch | Check recent entity file changes |
| Port already in use | Zombie process | Render will auto-kill and restart |
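
The log patterns in the table lend themselves to a quick first-pass triage. The following is a minimal sketch in POSIX shell, not part of ThreatWeaver: classify_crash is a hypothetical helper that reads a log excerpt on standard input and prints the matching cause and action. The match strings are taken from the table as written, so real log lines may be worded differently and the result should be treated as a hint only.

```shell
# classify_crash (hypothetical helper): read log text on stdin and print
# the first matching cause/action from the crash-pattern table.
classify_crash() {
  logs=$(cat)
  case "$logs" in
    *"JavaScript heap out of memory"*)
      echo "OOM kill: increase the Render instance size or fix the memory leak" ;;
    *ECONNREFUSED*)
      echo "Database unreachable: check the DB service status separately" ;;
    *"Cannot find module"*)
      echo "Bad deploy: roll back to the previous build (Step 3)" ;;
    *EntityMetadata*)
      echo "Entity definition mismatch: check recent entity file changes" ;;
    *"already in use"*)
      echo "Zombie process: Render will auto-kill and restart" ;;
    *)
      echo "No known pattern: read the logs manually" ;;
  esac
}
```

For example, after saving the tail of the crash logs to a file (crash.log is a hypothetical name), run `classify_crash < crash.log`.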

Step 2: Monitor the auto-restart

Render automatically restarts crashed services. Monitor recovery:

# Poll the health endpoint every 10 seconds
watch -n 10 curl -s https://kina-vulnerability-management-uq1t.onrender.com/health

Expected healthy response:

{"status":"healthy","database":"connected","timestamp":"2026-04-05T..."}

If the service restarts cleanly within 2-3 minutes, the incident is resolved. Log it in docs/incidents/.
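
If you would rather have the terminal tell you when recovery is complete than watch the output, the polling can be wrapped in a loop with a time budget. This is a minimal sketch, assuming the health endpoint returns compact JSON containing "status":"healthy" as in the expected response above. wait_for_healthy is a hypothetical helper, and the defaults (18 attempts, 10 seconds apart, roughly 3 minutes) are illustrative.

```shell
# wait_for_healthy URL [MAX_ATTEMPTS] [INTERVAL_SECONDS]  (hypothetical helper)
# Returns 0 as soon as the response body contains "status":"healthy",
# or 1 once the attempt budget is exhausted.
wait_for_healthy() {
  url=$1
  max_attempts=${2:-18}
  interval=${3:-10}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # A failed connection yields an empty body, which simply does not match.
    body=$(curl -s --max-time 5 "$url" || true)
    case "$body" in
      *'"status":"healthy"'*)
        echo "healthy after $attempt attempt(s)"
        return 0
        ;;
    esac
    echo "attempt $attempt/$max_attempts: not healthy yet" >&2
    attempt=$((attempt + 1))
    [ "$attempt" -le "$max_attempts" ] && sleep "$interval"
  done
  echo "still unhealthy after $max_attempts attempts" >&2
  return 1
}
```

Usage: `wait_for_healthy https://kina-vulnerability-management-uq1t.onrender.com/health`. A non-zero exit status means the service did not recover within the budget and you should move on to Step 3.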

Step 3: If persistent, redeploy last known good build

  1. In Render Dashboard → Services → ThreatWeaver Backend
  2. Click Deploys tab
  3. Find the last deploy with status Live before the incident started
  4. Click Redeploy on that commit
  5. Monitor build logs; the build typically takes 3-5 minutes
  6. After deploy: verify /health returns {"status":"healthy","database":"connected"}

Step 4: If OOM, check for runaway scan

Scans with large target APIs can spike memory. If a scan is running:

  1. Open ThreatWeaver Admin → AppSec → Active Assessments
  2. Cancel any running assessment
  3. Wait for memory to normalize (Render metrics tab)
  4. Restart the service manually if needed: Render → Service → Restart

Step 5: Verify full recovery

# Backend health
curl -s https://kina-vulnerability-management-uq1t.onrender.com/health

# Auth service health
curl -s https://kina-vulnerability-management-uq1t.onrender.com/api/auth/health

# Full API health (requires auth token)
curl -s -H "Authorization: Bearer <token>" \
https://kina-vulnerability-management-uq1t.onrender.com/api/health

All three should return 200 with healthy status before declaring recovery complete.
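
The three checks can be combined so that a single exit status tells you whether recovery can be declared. The sketch below is an assumption-laden convenience, not an official tool: check_endpoint and verify_recovery are hypothetical helper names, and only the HTTP status code is checked, so the response bodies should still be read once by eye.

```shell
# check_endpoint URL [extra curl args...]  (hypothetical helper)
# Prints "<status code> <url>" and succeeds only when the code is 200.
check_endpoint() {
  url=$1
  shift
  # On connection failure curl reports 000 as the status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$@" "$url" || true)
  echo "$code $url"
  [ "$code" = "200" ]
}

# verify_recovery BASE_URL TOKEN  (hypothetical helper)
# Runs all three checks even if an early one fails, then reports once.
verify_recovery() {
  base=$1
  token=$2
  failed=0
  check_endpoint "$base/health" || failed=1
  check_endpoint "$base/api/auth/health" || failed=1
  check_endpoint "$base/api/health" -H "Authorization: Bearer $token" || failed=1
  if [ "$failed" -eq 0 ]; then
    echo "recovery verified"
  else
    echo "recovery NOT verified"
    return 1
  fi
}
```

Usage: `verify_recovery https://kina-vulnerability-management-uq1t.onrender.com "$TOKEN"`, where TOKEN holds a valid bearer token.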


Scenario 2: Database Corruption or Accidental Data Deletion

Indicators: 500 errors with DB references in logs, missing records, TypeORM errors on startup, users report data gone.

Expected recovery time: 30-60 minutes
Expected data loss: Up to 24 hours (last daily snapshot) unless PITR is enabled

Stop all writes immediately

Before restoring, stop the backend service to prevent further writes to a corrupt state. Writes during restore can create inconsistency.

Step 1: Stop the backend service

  1. Render Dashboard → Services → ThreatWeaver Backend
  2. Click Suspend (or use the Render MCP tool mcp__render__update_web_service)
  3. Confirm the service shows Suspended status

Step 2: Identify the last known good backup

  1. Render Dashboard → Databases → ThreatWeaver PostgreSQL
  2. Click Backups tab
  3. Review the backup list with timestamps
  4. Identify the backup created before the incident:
    • If the incident just happened, use the most recent backup, provided its timestamp predates the incident
    • If data was deleted hours or days ago, use the newest backup taken before the deletion
  5. Note the backup ID and timestamp
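
Choosing the right backup comes down to finding the newest timestamp that is strictly earlier than the incident. A minimal sketch, assuming you have copied the backup timestamps from the Backups tab into a list in ISO 8601 UTC form (for example 2026-04-05T02:00:00Z), one per line. pick_backup is a hypothetical helper, and it relies on the fact that timestamps in this fixed format sort correctly as plain strings.

```shell
# pick_backup INCIDENT_TS  (hypothetical helper)
# Reads backup timestamps (ISO 8601 UTC, one per line) on stdin and prints
# the newest one strictly before INCIDENT_TS. Exits 1 if none qualifies.
pick_backup() {
  incident=$1
  LC_ALL=C sort | awk -v incident="$incident" '
    # Appending "" forces a string comparison rather than a numeric one.
    ($0 "") < (incident "") { best = $0 }
    END { if (best != "") print best; else exit 1 }'
}
```

Usage: `pick_backup 2026-04-04T15:30:00Z < backups.txt`, where backups.txt is a hypothetical file holding the copied timestamps.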

Step 3: Restore from backup

  1. In the Backups tab, find the target backup
  2. Click Restore next to the backup
  3. Render will prompt: "This will overwrite the current database. Confirm?"
  4. Type the confirmation string and click Restore
  5. The restore typically takes 5-20 minutes depending on database size
  6. Render will show a progress indicator

All data since backup is lost

Any data written between the backup timestamp and the restore will be lost permanently. Communicate this window to affected tenants.

Step 4: Restart the backend

  1. Render Dashboard → Services → ThreatWeaver Backend
  2. Click Resume (or Deploy if the service was deleted)
  3. On startup, the backend runs npm run migrate:production automatically
  4. Migrations are forward-only and idempotent: they will detect the restored state and apply only missing migrations
  5. Monitor startup logs for any migration errors

Step 5: Verify data integrity

After the backend is running, verify key data is present:

# Check asset count per tenant
curl -s -H "Authorization: Bearer <admin-token>" \
https://kina-vulnerability-management-uq1t.onrender.com/api/admin/stats

# Check vulnerability count
curl -s -H "Authorization: Bearer <admin-token>" \
"https://kina-vulnerability-management-uq1t.onrender.com/api/vulnerabilities?limit=1"

# Check user list
curl -s -H "Authorization: Bearer <admin-token>" \
https://kina-vulnerability-management-uq1t.onrender.com/api/admin/users

Verify against your pre-incident knowledge of record counts. If counts are wrong, you may need an older backup.

Step 6: Inform affected tenants

Send an incident notification to affected tenant admins. Use the template in Communication Templates.


Scenario 3: Failed Deployment

Indicators: App was working before a push, now returns errors. Build succeeded but runtime fails, or build itself failed.

Expected recovery time: 10-20 minutes

Step 1: Confirm this is a deployment regression

  1. Note the exact time the push happened (check git log or Render deploy history)
  2. Confirm the issue started at or after that time
  3. Check if the issue affects all users or only specific features

Step 2: Rollback to last successful deploy

  1. Render Dashboard → Services → ThreatWeaver Backend → Deploys tab
  2. Find the deploy immediately before the broken one (status: Live)
  3. Click Redeploy on that commit
  4. Monitor build logs carefully

While waiting for rollback, check what changed:

# What changed in the broken commit?
git log --oneline -5
git show HEAD --stat
git diff HEAD~1 HEAD -- backend/src/

Step 3: Diagnose the build failure (if build failed)

Common build failures and fixes:

TypeScript compile error:

# Reproduce locally
cd backend && rm -rf dist && npm run build 2>&1
# Fix the TS error, commit, push

npm install failure (missing package):

# Check if package.json has the dependency
# Check if node_modules is being cached incorrectly
# Clear cache in Render: Service → Environment → Clear Build Cache

Missing environment variable:

# Render Dashboard → Service → Environment
# Compare against .env.example and add any missing vars
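
Comparing variable lists by eye is error-prone at 2am. The following sketch lists the names present in .env.example but absent from a second file. It assumes you have copied the variables from the Render Environment page into a local file in NAME=value form (render.env is a hypothetical name). env_names and missing_env_vars are hypothetical helpers, and only variable names are compared, never values.

```shell
# env_names FILE  (hypothetical helper)
# Prints the sorted, de-duplicated variable names from NAME=value lines,
# skipping comments and blank lines.
env_names() {
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | LC_ALL=C sort -u
}

# missing_env_vars EXAMPLE_FILE ACTUAL_FILE  (hypothetical helper)
# Prints names defined in EXAMPLE_FILE that ACTUAL_FILE does not define.
missing_env_vars() {
  example_names=$(mktemp)
  actual_names=$(mktemp)
  env_names "$1" > "$example_names"
  env_names "$2" > "$actual_names"
  # comm -23 keeps lines that appear only in the first (sorted) file.
  LC_ALL=C comm -23 "$example_names" "$actual_names"
  rm -f "$example_names" "$actual_names"
}
```

Usage: `missing_env_vars .env.example render.env`. Delete the copied file afterwards, since it contains production secrets.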

Migration fails on startup:

# Check migration logs in Render
# If a migration is stuck, you may need to manually mark it as run
# DB Shell (local): docker-compose exec -T postgres psql -U tenable -d tenable_dashboard

Step 4: Hotfix workflow

If a rollback is not sufficient and you need to fix the broken code:

# Create a hotfix on local branch
git checkout local
git pull origin local

# Make the minimal fix
# ... edit files ...

# Test the build locally FIRST
cd backend && rm -rf dist && npm run build
npm start # verify it runs

# Stage the fix (tracked files only) and commit with a clear message
git add -u
git commit -m "fix(deploy): resolve TypeScript error in routes/scan.ts [hotfix]"

# Push to local for review
git push origin local

# Only after explicit confirmation: push to dev
# git push origin local:dev

Step 5: Verify full recovery

After rollback or hotfix deploy:

curl -s https://kina-vulnerability-management-uq1t.onrender.com/health
# Expected: {"status":"healthy","database":"connected"}

Test the specific feature that was broken to confirm it is working.


Scenario 4: Security Incident (Compromised Credentials)

Indicators: Unauthorized access in SecurityAuditLog, leaked JWT_SECRET, compromised Tenable/Anthropic API keys, suspicious logins.

Expected recovery time: 30-60 minutes for initial containment

Act immediately

Every minute of delay increases the blast radius. Follow these steps in order; do not skip or reorder.

Step 1: Revoke all active JWT sessions

  1. Log in to ThreatWeaver Admin (if admin credentials are still valid)
  2. Navigate to Admin → Security → Revoke All Sessions
  3. Click Revoke All; this invalidates all existing JWT tokens immediately
  4. All users will be logged out and must re-authenticate

If you cannot log in (credentials compromised):

# No direct DB intervention is needed: rotating the JWT secret invalidates all sessions
# Do this immediately in Render environment variables (Step 2)

Step 2: Rotate JWT_SECRET

  1. Render Dashboard → Services → ThreatWeaver Backend → Environment
  2. Find JWT_SECRET
  3. Generate a new strong secret:
    node -e "console.log(require('crypto').randomBytes(64).toString('hex'))"
  4. Update the value in Render → Save Changes
  5. Render will automatically redeploy with the new secret
  6. All existing tokens signed with the old secret are now invalid
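
If Node.js is not available on the machine you are responding from, an equivalent secret can be produced with standard Unix tools. This is a sketch rather than the project's prescribed method: generate_secret is a hypothetical helper that reads 64 random bytes from /dev/urandom and prints them as 128 hexadecimal characters, the same shape as the output of the node command in step 3.

```shell
# generate_secret [BYTES]  (hypothetical helper)
# Prints BYTES random bytes (default 64) from /dev/urandom as lowercase hex.
generate_secret() {
  bytes=${1:-64}
  # od emits space-separated hex bytes across several lines; tr joins them.
  head -c "$bytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  echo
}
```

Usage: run `generate_secret` and paste the value straight into the Render environment variable. Avoid writing it to a file or leaving it in shell history.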

Step 3: Force password reset for all users

  1. After the new deploy is live, log in with admin credentials
  2. Admin → Users → Force Password Reset (All)
  3. Users will receive reset emails (if email is configured) or be prompted on next login

Step 4: Check SecurityAuditLog for breach scope

# Via DB shell (local) or Render PostgreSQL query
docker-compose exec -T postgres psql -U tenable -d tenable_dashboard -c \
"SELECT * FROM security_audit_log WHERE created_at > NOW() - INTERVAL '48 hours' ORDER BY created_at DESC LIMIT 100;"

Look for:

  • Logins from unexpected IP addresses
  • Access to admin endpoints by non-admin users
  • Bulk data exports
  • Unusual API key usage

Step 5: Rotate all external secrets

Rotate these in order (highest impact first):

| Secret | Where to rotate | Impact of not rotating |
| --- | --- | --- |
| JWT_SECRET | Render env vars (done in Step 2) | Token forgery |
| TENABLE_API_KEY | Tenable.io dashboard | Unauthorized Tenable API access |
| ANTHROPIC_API_KEY | console.anthropic.com | AI cost abuse |
| DATABASE_URL | Render DB → Reset credentials | DB access |
| SESSION_SECRET | Render env vars | Session hijacking |

After rotating each secret, update Render environment variables and trigger a redeploy.

Step 6: Review IP access patterns

Check Render access logs for unusual patterns:

  1. Render Dashboard → Service → Logs → filter by [ERROR] and unusual HTTP methods
  2. Look for: mass enumeration (sequential IDs), admin endpoint access outside business hours, requests from unexpected geos

If patterns indicate active scanning or exfiltration:

  1. Enable IP allowlisting in Render (Service → Settings → IP Allowlist)
  2. Add only known office/VPN IP ranges
  3. Block all other traffic temporarily

Step 7: Post-incident documentation

  1. Create an incident report in docs/incidents/INCIDENT-<DATE>-<NAME>.md
  2. Document: timeline, root cause, blast radius, remediation steps taken
  3. Update docs/audits/ISSUE_TRACKER.md with a security finding entry
  4. Notify all tenants of the incident (see Communication Templates)

Scenario 5: Accidental Deletion of Scan Data

Indicators: Assessment results gone, findings missing, user reports losing work.

Expected recovery time: 5 minutes (from archive) to 60 minutes (from DB backup)

Step 1: Check the Archives first

ThreatWeaver archives scan data before deletion. Check here before doing a DB restore:

  1. Log in as admin
  2. Navigate to Admin → Archives
  3. Search for the assessment by name, date, or target URL
  4. If found: click Restore to bring the data back
  5. Verify the restored data is complete

This is the fastest path. Always check archives before escalating to a DB restore.

Step 2: If not in archives, restore from DB backup

Follow Scenario 2 steps.

Key consideration for scan data restore: Scan findings are tenant-scoped. If restoring a specific tenant's data, consider whether a full DB restore is warranted or if the data can be re-generated.

Step 3: Re-run the assessment (last resort)

For AppSec scanner assessments, scans are fully reproducible:

  1. Note the original assessment configuration (target URL, scan type, auth profile)
  2. Create a new assessment with identical settings
  3. Re-run the scan β€” results will be regenerated
  4. The new findings will differ slightly from the originals (different timestamps, possibly slightly different findings depending on target state)

Health Check Endpoints

Monitor these endpoints to detect issues proactively:

Endpoint Reference

| Endpoint | What it checks | Expected response |
| --- | --- | --- |
| GET /health | Service alive (no auth required) | {"status":"healthy"} |
| GET /api/health | DB connected | {"status":"healthy","database":"connected"} |
| GET /api/auth/health | Auth service alive | {"status":"healthy","auth":"operational"} |

Set up uptime monitoring (e.g., UptimeRobot, BetterStack, or Render's built-in health checks) to poll GET /health every 60 seconds.

Alert thresholds:

  • Warning: Response time > 2000ms
  • Critical: Status code != 200 for 2 consecutive checks
  • Down: Status code != 200 for 5 consecutive checks
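
To make the thresholds concrete, the sketch below applies them to a short history of checks. It is an illustration of the rules, not the configuration of any particular monitoring product: classify_alert is a hypothetical helper that reads one check per line as "STATUS_CODE RESPONSE_MS", oldest first, and reports the state as of the latest check. A single isolated failure is reported as OK because the Critical rule requires two consecutive failures.

```shell
# classify_alert  (hypothetical helper)
# Input on stdin: "STATUS_CODE RESPONSE_MS" per line, oldest check first.
# Output: OK, WARNING, CRITICAL, or DOWN for the most recent state.
classify_alert() {
  awk '
    {
      # Count consecutive non-200 results ending at the latest check.
      if ($1 != 200) fails++; else fails = 0
      last_code = $1
      last_ms = $2
    }
    END {
      if (fails >= 5) print "DOWN"
      else if (fails >= 2) print "CRITICAL"
      else if (last_code == 200 && last_ms > 2000) print "WARNING"
      else print "OK"
    }'
}
```

Usage: `printf '200 180\n503 0\n503 0\n' | classify_alert` prints CRITICAL.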

Local Health Verification

# Quick health check (paste this into your terminal)
echo "=== Service Health ===" && \
curl -s http://localhost:4005/health | python3 -m json.tool && \
echo "=== DB Health ===" && \
curl -s http://localhost:4005/api/health | python3 -m json.tool && \
echo "=== Auth Health ===" && \
curl -s http://localhost:4005/api/auth/health | python3 -m json.tool

Production Health Verification

BASE_URL="https://kina-vulnerability-management-uq1t.onrender.com"

echo "=== Service Health ===" && \
curl -s "$BASE_URL/health" | python3 -m json.tool && \
echo "=== DB Health ===" && \
curl -s "$BASE_URL/api/health" | python3 -m json.tool && \
echo "=== Auth Health ===" && \
curl -s "$BASE_URL/api/auth/health" | python3 -m json.tool

Recovery Time and Point Objectives

| Scenario | RTO | RPO | Notes |
| --- | --- | --- | --- |
| Backend service crash (transient) | ~2 min | 0 (stateless) | Render auto-restart |
| Backend service crash (persistent) | ~15 min | 0 (stateless) | Manual redeploy required |
| Database restore (daily backup) | ~30-60 min | 24 hours | Last daily snapshot |
| Database restore (PITR, Pro Plus) | ~30-60 min | Minutes | Point-in-time recovery |
| Deployment rollback | ~10 min | 0 (code in Git) | Render redeploy |
| Security incident containment | ~30 min | N/A | Depends on breach scope |
| Scan data from archive | ~5 min | 0 | If archived |
| Scan data re-run | ~30-120 min | N/A | Reproducible |

Quick-Reference Runbook Checklists

Copy these into a text editor or incident management tool at 2am.

Runbook A: Backend Service Down

[ ] 1. Check Render Dashboard → Service status
[ ] 2. Read crash logs → identify OOM / runtime error / bad deploy
[ ] 3. Wait 2 min for auto-restart
[ ] 4. If still down: Render → Deploys → Redeploy last Live commit
[ ] 5. Monitor build logs (~3-5 min build time)
[ ] 6. Verify: curl /health returns 200
[ ] 7. Verify: curl /api/health returns database:connected
[ ] 8. Notify users if downtime > 5 min
[ ] 9. Document in docs/incidents/

Runbook B: Database Restore

[ ] 1. Stop backend: Render → Service → Suspend
[ ] 2. Trigger manual DB backup NOW (documents current state even if corrupt)
[ ] 3. Identify target backup (last known good, from Render → DB → Backups)
[ ] 4. Click Restore on target backup
[ ] 5. Wait 5-20 min for restore
[ ] 6. Resume backend: Render → Service → Resume
[ ] 7. Monitor startup logs for migration errors
[ ] 8. Verify: curl /api/health returns database:connected
[ ] 9. Check asset count / vuln count / user list via API
[ ] 10. Notify affected tenants of data loss window
[ ] 11. Document in docs/incidents/

Runbook C: Deployment Rollback

[ ] 1. Confirm issue started after recent push (check git log + Render deploy time)
[ ] 2. Render → Service → Deploys → find last Live deploy before incident
[ ] 3. Click Redeploy
[ ] 4. Wait for build (~3-5 min)
[ ] 5. Verify: curl /health returns 200
[ ] 6. Test the specific feature that was broken
[ ] 7. Investigate root cause in parallel
[ ] 8. Create hotfix on local branch, test locally, get confirmation before pushing to dev

Runbook D: Security Incident Containment

[ ] 1. Admin → Security → Revoke All Sessions (immediate)
[ ] 2. Generate new JWT_SECRET: node -e "console.log(require('crypto').randomBytes(64).toString('hex'))"
[ ] 3. Render → Service → Environment → Update JWT_SECRET → Save (triggers redeploy)
[ ] 4. After redeploy: Admin → Users → Force Password Reset (All)
[ ] 5. Query SecurityAuditLog for breach scope (last 48 hours)
[ ] 6. Rotate TENABLE_API_KEY in Tenable.io dashboard
[ ] 7. Rotate ANTHROPIC_API_KEY in console.anthropic.com
[ ] 8. Rotate DATABASE_URL credentials (Render DB → Reset credentials) and SESSION_SECRET (Render env vars)
[ ] 9. If active attack: enable IP allowlisting in Render → Service → Settings
[ ] 10. Notify all tenant admins
[ ] 11. Document full timeline in docs/incidents/INCIDENT-<DATE>-<NAME>.md
[ ] 12. Update ISSUE_TRACKER.md

Runbook E: Scan Data Missing

[ ] 1. Admin → Archives → Search for assessment by name/date
[ ] 2. If found: click Restore → verify data is complete
[ ] 3. If not found: determine if DB restore is warranted vs re-scan
[ ] 4. If re-scan: create new assessment with identical settings, re-run
[ ] 5. If DB restore needed: follow Runbook B
[ ] 6. Notify user who reported the loss

Environment-Specific Notes

Local Development

  • No managed backups: your Docker volume is the only copy
  • Back up local DB manually before destructive operations:
    docker-compose exec -T postgres pg_dump -U tenable tenable_dashboard > backup-$(date +%Y%m%d).sql
  • Restore local DB:
    docker-compose exec -T postgres psql -U tenable tenable_dashboard < backup-20260405.sql
  • Local service crash: just restart with npm run dev

UAT / Dev Environment (dev.threatweaver.ai)

  • Backend: threatweaver-backend.onrender.com (Singapore, Pro Plus)
  • DB: dpg-d6vc8nnfte5s73dppuqg-a
  • Credentials: testingadmin@blucypher.com / TestAdmin@Blu2026!
  • Same runbooks apply, but treat UAT data loss as lower severity than production

Production (kinavulnerabilitymanagement.vercel.app)

  • Backend: kina-vulnerability-management-uq1t.onrender.com
  • All incidents require tenant notification within 1 hour of discovery
  • Any DB restore requires written approval from the project owner

Escalation Contacts

| Role | Responsibility | When to escalate |
| --- | --- | --- |
| On-call engineer | Initial triage, Scenarios 1-3 | Immediately on alert |
| Project owner (Tilak) | Scenarios 4-5, DB restore approval | Security incidents, data loss > 1 hour |
| Render Support | DB restore failures, platform issues | When Render UI/API is unresponsive |
| Supabase Support | Production DB issues (if on Supabase) | DB connection failures not resolved in 15 min |

Render Support: https://render.com/support
Render Status Page: https://status.render.com (bookmark this)
GitHub Repo: https://github.com/BluCypher1/ThreatWeaver


Communication Templates

Template 1: Service Disruption Notification

Subject: ThreatWeaver Service Disruption: [DATE] [TIME UTC]

Team,

We are currently experiencing a service disruption affecting ThreatWeaver.

Impact: [API unavailable / Slow response times / Data access issues]
Started: [TIME UTC]
Affected services: [Backend API / Dashboard / Scanner]
Cause: [Known / Under investigation]

Current status: [Investigating / Restoring / Monitoring recovery]

Next update: [TIME UTC]

We apologize for the inconvenience. Our team is working to resolve this as quickly as possible.

- ThreatWeaver Operations

Template 2: Data Loss Notification

Subject: ThreatWeaver: Data Restoration Notice for Your Tenant

[Tenant Admin Name],

We are writing to inform you of a data restoration that affected your ThreatWeaver tenant.

What happened: [Brief description]
Data affected: [Findings / Assets / Assessments / User data]
Time window of data loss: [FROM timestamp] to [TO timestamp]
Data restored to: [Backup timestamp]

Action required: [None, your data has been restored / Please re-run assessments created after DATE]

We are sorry for any disruption this caused. Please contact us if you have any questions or notice any data discrepancies.

- ThreatWeaver Operations

Template 3: Security Incident Notification

Subject: IMPORTANT: ThreatWeaver Security Notice (Action Required)

[Tenant Admin Name],

We are writing to notify you of a security incident that may have affected your ThreatWeaver account.

What happened: [Brief, factual description; do not speculate]
When: [DATE/TIME UTC]
What we did: Revoked all active sessions, rotated credentials, forced password resets
What you need to do:
1. Re-authenticate to ThreatWeaver at [URL]
2. Set a new password when prompted
3. Review your SecurityAuditLog for any unauthorized activity
4. Rotate any API keys your team stored in ThreatWeaver

If you see any suspicious activity in your audit log, please reply to this email immediately.

We take security very seriously and are conducting a full post-incident review. We will share a detailed report within 72 hours.

- ThreatWeaver Security Team

Post-Incident Review Checklist

After every incident, regardless of severity:

[ ] Write incident report: docs/incidents/INCIDENT-<YYYY-MM-DD>-<slug>.md
[ ] Timeline documented: detection time, response time, resolution time
[ ] Root cause identified and documented
[ ] Blast radius determined: which tenants/data were affected
[ ] Remediation steps documented
[ ] Prevention: what change prevents recurrence?
[ ] Update ISSUE_TRACKER.md if a code fix is needed
[ ] Update this runbook if the steps were unclear or incomplete
[ ] Share report with project owner within 24 hours of resolution
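
Starting the report from a skeleton makes it harder to skip a checklist item. The following is a minimal sketch, assuming the docs/incidents/INCIDENT-<YYYY-MM-DD>-<slug>.md naming from the first item. new_incident_report is a hypothetical helper and the section titles simply mirror the checklist; adjust both to the team's actual report format.

```shell
# new_incident_report SLUG [DIR]  (hypothetical helper)
# Creates DIR/INCIDENT-<UTC date>-SLUG.md with one section per checklist
# item and prints the path. Refuses to overwrite an existing report.
new_incident_report() {
  slug=$1
  dir=${2:-docs/incidents}
  today=$(date -u +%Y-%m-%d)
  file="$dir/INCIDENT-$today-$slug.md"
  mkdir -p "$dir"
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  cat > "$file" <<EOF
Incident: $slug
Date (UTC): $today

Timeline
- Detected:
- Response started:
- Resolved:

Root cause

Blast radius (tenants and data affected)

Remediation steps taken

Prevention (what change prevents recurrence?)
EOF
  echo "$file"
}
```

Usage: from the repository root, `new_incident_report db-restore` creates the file and prints its path.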