# Performance Tuning
This guide covers performance tuning for ThreatWeaver deployments experiencing slow dashboards, sync timeouts, scan lag, or high database load. It addresses backend configuration, frontend caching, AppSec scanner throughput, database maintenance, and capacity planning.
## When to Start Tuning
Start investigating when you observe any of the following:
| Symptom | Likely area |
|---|---|
| Dashboard KPIs take more than 3–5 seconds to load | Database query performance or missing indexes |
| Tenable sync jobs time out or fail with 504 | Sync engine configuration or DB connection pool exhaustion |
| AppSec scans take much longer than expected | Scanner concurrency or AI validation overhead |
| High memory usage (>1.5 GB for the Node.js process) | Large vulnerability dataset, missing pagination |
| Database CPU spikes during dashboard load | Aggregation queries, missing indexes on filter columns |
| Redis connection errors or cache misses | Redis connectivity or TTL misconfiguration |
| Rate limit errors (429) from dashboard polling | Global rate limiter needs tuning for high-traffic deployments |
Before tuning, baseline the problem:
- Check backend logs for slow query warnings
- Check PostgreSQL `pg_stat_activity` for long-running queries
- Check Node.js memory via `process.memoryUsage()` (exposed at `/health`)
- Check Redis connectivity via `redis-cli ping`
## Backend Performance
### Database Connection Pool
ThreatWeaver uses a bounded connection pool managed by the pg driver under TypeORM. The default is conservative (10 connections) to avoid overwhelming Supabase's Transaction Pooler.
Configuration (environment variables):
| Variable | Default | Description |
|---|---|---|
| `DB_POOL_MAX` | 10 | Maximum concurrent database connections per backend instance |
| `DB_SSL_REJECT_UNAUTHORIZED` | true | Set to false only if using a self-signed certificate in a trusted internal network |
Guidelines:
- For self-hosted PostgreSQL (not Supabase), you can raise `DB_POOL_MAX` to 25–50 depending on PostgreSQL's `max_connections` setting
- For the Supabase Transaction Pooler (PgBouncer), keep `DB_POOL_MAX` at 10–15 per backend instance; the pooler has its own connection limit (typically 20–60 on Supabase Pro)
- If deploying multiple backend instances (horizontal scaling), each instance has its own pool. With 3 instances at `DB_POOL_MAX=10`, the database receives up to 30 concurrent connections total
- The connection timeout is 5 seconds (`connectionTimeoutMillis: 5000`), which is appropriate for cloud database handshakes. If you see frequent timeout errors, increase it to 10000
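The mapping from these environment variables to pool options can be sketched as follows. This is an illustrative helper, not ThreatWeaver's actual wiring; `buildPoolConfig` and the `PoolEnv` shape are assumptions, but the defaults mirror the documented values.

```typescript
// Illustrative sketch: translating the env vars above into pg Pool options.
interface PoolEnv {
  DB_POOL_MAX?: string;
  DB_SSL_REJECT_UNAUTHORIZED?: string;
}

function buildPoolConfig(env: PoolEnv) {
  return {
    max: parseInt(env.DB_POOL_MAX ?? "10", 10), // bounded pool size per instance
    connectionTimeoutMillis: 5000,              // cloud handshake budget
    ssl: {
      // Only relax certificate checks on a trusted internal network.
      rejectUnauthorized: env.DB_SSL_REJECT_UNAUTHORIZED !== "false",
    },
  };
}

// With no overrides, the documented defaults apply:
const defaults = buildPoolConfig({});
console.log(defaults.max, defaults.ssl.rejectUnauthorized); // 10 true
```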
Diagnosing pool exhaustion:
Pool exhaustion manifests as requests hanging for several seconds before completing or timing out with a "connection acquisition timeout" error. Check:
```sql
-- Check active connections
SELECT count(*), state FROM pg_stat_activity GROUP BY state;

-- Check long-running queries (running > 10 seconds)
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '10 seconds';
```
### Query Optimization and Index Recommendations
Dashboard KPI aggregation queries run across large tables (vulnerabilities, assets). These queries are filter-heavy, so adding indexes on commonly filtered columns dramatically reduces query time.
Recommended indexes (run in PostgreSQL on your tenant schema):
```sql
-- Vulnerabilities: most common filter patterns
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_severity ON vulnerabilities(severity);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_state ON vulnerabilities(state);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_plugin_id ON vulnerabilities(plugin_id);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_asset_uuid ON vulnerabilities(asset_uuid);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_first_found ON vulnerabilities(first_found);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_last_found ON vulnerabilities(last_found);

-- Assets: common filter columns
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_asset_fqdn ON assets(fqdn);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_asset_ipv4 ON assets(ipv4 text_ops);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_asset_last_seen ON assets(last_seen);

-- Composite index for dashboard severity breakdown by date range
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_severity_last_found
  ON vulnerabilities(severity, last_found DESC);
```
`CREATE INDEX CONCURRENTLY` builds the index without locking the table. Avoid `CREATE INDEX` without `CONCURRENTLY` on tables with active reads/writes.
The aggregation service (`aggregation.service.ts`) is the core of all KPI calculations; it is ~2,936 lines and runs complex multi-table queries. If you observe slow dashboard loads:

- Set the `SLOW_QUERY_LOG=true` environment variable to log queries exceeding 1 second
- Run `EXPLAIN ANALYZE` on the slow query in PostgreSQL to identify sequential scans
- Add the appropriate index before retrying
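For intuition, slow-query instrumentation of this kind can be sketched as a timing wrapper around a query function. This is a hypothetical sketch (the `withSlowQueryLog` name and `QueryFn` shape are assumptions), not ThreatWeaver's actual `SLOW_QUERY_LOG` implementation:

```typescript
// Hypothetical sketch: wrap a query function and log anything exceeding a
// threshold (the documented SLOW_QUERY_LOG behavior uses 1 second).
type QueryFn = (sql: string, params?: unknown[]) => Promise<unknown>;

function withSlowQueryLog(run: QueryFn, thresholdMs = 1000, log = console.warn): QueryFn {
  return async (sql, params) => {
    const start = Date.now();
    try {
      return await run(sql, params);
    } finally {
      const elapsed = Date.now() - start;
      if (elapsed >= thresholdMs) {
        // Truncate the SQL so huge statements don't flood the logs.
        log(`slow query (${elapsed} ms): ${sql.slice(0, 200)}`);
      }
    }
  };
}
```

Any query surfaced this way is a candidate for `EXPLAIN ANALYZE` and one of the indexes above.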
### Caching with Redis
ThreatWeaver includes an optional Redis caching layer. When Redis is unavailable, the application falls back to direct database queries; Redis is not required for operation.
When Redis helps most:
- Dashboard KPI widgets that many users load simultaneously
- Tenable sync status checks that run on every dashboard page load
- Tenant configuration lookups (permissions, feature flags)
Configuration:
| Variable | Default | Description |
|---|---|---|
| `REDIS_URL` | redis://localhost:6379 | Redis connection URL |
| `REDIS_ENABLED` | true | Set to false to disable Redis entirely |
| `CACHE_TTL` | 3600 (1 hour) | Default cache TTL in seconds |
| `API_CACHE_TTL` | 21600 (6 hours) | TTL for the API response cache (Tenable data, etc.) |
| `TENANT_CACHE_TTL_SECONDS` | 300 (5 minutes) | TTL for the tenant configuration cache |
TTL recommendations by use case:
| Data | Recommended TTL | Why |
|---|---|---|
| Dashboard KPI totals | 5–15 minutes | Data changes during Tenable syncs, not continuously |
| Tenable sync status | 30–60 seconds | Users poll this frequently |
| Tenant config / feature flags | 5 minutes (300s) | Changes are rare but should propagate within 5 minutes |
| Asset inventory | 15–30 minutes | Updated only during sync windows |
| Vulnerability trends | 1 hour | Historical data; changes infrequently |
Disabling Redis locally: Redis is not used in local development by default. The cache layer degrades gracefully: if Redis is unreachable, all cache reads return null and all writes are silently no-ops.
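The fail-open behavior described above can be sketched as a wrapper around a cache client. The `CacheClient` interface is illustrative, not ThreatWeaver's actual cache module:

```typescript
// Sketch of fail-open caching: reads return null and writes are no-ops
// whenever Redis is unreachable, so the app falls back to the database.
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

function failOpen(client: CacheClient): CacheClient {
  return {
    async get(key) {
      // A connection error is reported as a cache miss, never an error.
      try { return await client.get(key); } catch { return null; }
    },
    async set(key, value, ttl) {
      // Writes are silently dropped when the cache is down.
      try { await client.set(key, value, ttl); } catch { /* no-op */ }
    },
  };
}
```

Callers then treat `null` as "not cached" and query the database, which is why Redis outages degrade performance but never availability.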
### Sync Engine Tuning
The Tenable sync engine pulls vulnerability and asset data from Tenable.io (or Tenable.sc). Sync frequency and behavior are configurable.
Sync frequency trade-off:
| Sync interval | Freshness | DB load |
|---|---|---|
| Every 15 minutes | Near-realtime | High (heavy queries on every sync) |
| Every 1 hour | Good for most use cases | Moderate |
| Every 6 hours | Stale for fast-moving environments | Low |
| Manual only | User-controlled | Minimal |
Sync frequency is configured in Admin → Settings → Sync Configuration. For environments with large asset counts (>50,000 assets), consider:
- Scheduling syncs during off-peak hours (evenings/weekends)
- Using incremental sync mode (syncs only assets/vulns changed since last sync) rather than full sync
- Increasing `DB_POOL_MAX` before large syncs if connection timeout errors appear in the sync logs
Monitoring sync performance: Sync job duration and record counts are logged. Access them via Admin → Sync → Sync History. If a sync job consistently takes longer than 10 minutes, consider splitting by asset group or reducing sync frequency.
### Rate Limiting Configuration
The default global rate limit (100 requests/minute per IP) is appropriate for moderate-traffic deployments. High-traffic deployments with many users polling dashboards simultaneously should raise this limit.
Configuration:
| Variable | Default | Description |
|---|---|---|
| `RATE_LIMIT_WINDOW_MS` | 60000 (1 minute) | Rate limit window in milliseconds |
| `RATE_LIMIT_MAX_REQUESTS` | 100 | Maximum requests per IP per window |
Guidelines for high-traffic deployments:
- Each dashboard page load issues approximately 10–15 API requests
- A team of 20 simultaneous users can generate 200–300 requests per minute
- For 20+ concurrent users, raise `RATE_LIMIT_MAX_REQUESTS` to 300–500
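The sizing arithmetic above can be captured in a small helper: N simultaneous users at roughly 15 requests per dashboard load need about N × 15 requests of headroom per minute. `suggestedRateLimit` is purely illustrative:

```typescript
// Illustrative sizing helper based on the guidance above: ~10-15 API requests
// per dashboard load, with the documented default (100) kept as a floor.
function suggestedRateLimit(concurrentUsers: number, requestsPerLoad = 15): number {
  const expected = concurrentUsers * requestsPerLoad;
  // Round up to a clean hundred and never go below the default.
  return Math.max(100, Math.ceil(expected / 100) * 100);
}

console.log(suggestedRateLimit(20)); // 300 — matches the 300–500 guidance for 20+ users
```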
Note: The global rate limiter already exempts dashboard routes (`/api/dashboard/*`), AppSec routes, and health check endpoints from counting against the limit. Auth endpoints have their own separate stricter limiters (10 per 15 minutes) that should not be relaxed.
### Node.js Heap Settings for Large Datasets
Node.js defaults to a heap size of ~1.5 GB (on 64-bit systems). Large vulnerability datasets loaded into memory during aggregation can hit this limit.
Symptoms of heap pressure:
- Process memory continuously climbing
- Occasional `FATAL ERROR: Reached heap limit Allocation failed` crashes
- Garbage-collection pauses causing request latency spikes
Fix: increase the heap via the `NODE_OPTIONS` environment variable:
```bash
# For deployments with >100,000 vulnerabilities
NODE_OPTIONS="--max-old-space-size=4096" node dist/index.js

# For very large deployments (>500,000 vulnerabilities)
NODE_OPTIONS="--max-old-space-size=8192" node dist/index.js
```
On Render, set `NODE_OPTIONS` as an environment variable in the service settings.
A Node.js process should not need more than 4–6 GB for the application itself. If it does, investigate whether large data arrays are being held in memory. The aggregation service streams data in batches rather than loading entire tables; if you see large heap usage, check for queries that return unbounded result sets.
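The batch-streaming pattern mentioned above can be sketched as an async generator that pages through a table instead of materializing every row. `fetchPage` here is a stand-in for a real paginated query, not ThreatWeaver's actual code:

```typescript
// Sketch of heap-bounded streaming: fetch one page at a time and yield rows,
// so at most one batch is resident in memory.
async function* streamRows<T>(
  fetchPage: (offset: number, limit: number) => Promise<T[]>,
  batchSize = 1000,
): AsyncGenerator<T> {
  for (let offset = 0; ; offset += batchSize) {
    const page = await fetchPage(offset, batchSize);
    for (const row of page) yield row;
    if (page.length < batchSize) return; // short page means we reached the end
  }
}
```

An unbounded `SELECT * FROM vulnerabilities` holds every row at once; iterating `streamRows` with `for await` keeps heap usage flat regardless of table size.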
## Frontend Performance
### React Query Cache Configuration
ThreatWeaver's frontend uses TanStack Query (React Query) for server-state management. The global default configuration is:
```typescript
// frontend/src/main.tsx
const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      staleTime: 5 * 60 * 1000, // 5 minutes
    },
  },
})
```
The `staleTime` of 5 minutes means cached data is used without a refetch for up to 5 minutes after it was last fetched. This is the most impactful setting for perceived frontend performance.
Per-hook overrides: Some hooks override the global default for data that changes more frequently:

- AI Security data: 30-second `staleTime` (changes frequently during active sessions)
- AppSec assessment data: 5-minute `staleTime` (scans are long-running, so there is no benefit to frequent refetching)
Tuning recommendations:
| Scenario | Recommendation |
|---|---|
| Users complain dashboard data is stale | Reduce the global `staleTime` to 2–3 minutes |
| High API traffic from dashboard polling | Increase the global `staleTime` to 10–15 minutes |
| Real-time scan monitoring needed | Use `refetchInterval: 10000` (10 s) on scan-specific queries |
To change the global default, edit `frontend/src/main.tsx` and redeploy.
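For the real-time monitoring case, a per-query override layered on the global default might look like the options object below. The query key, endpoint, and `scanStatusQueryOptions` name are hypothetical; the `staleTime`/`refetchInterval` values follow the table above:

```typescript
// Sketch of per-query options for real-time scan monitoring, as passed to
// TanStack Query's useQuery. Key and values are illustrative.
function scanStatusQueryOptions(scanId: string) {
  return {
    queryKey: ["scan-status", scanId],
    staleTime: 0,            // always considered stale: refetch on mount/focus
    refetchInterval: 10_000, // poll every 10 s while the component is mounted
  };
}
```

Keeping the aggressive settings on scan-status queries only (rather than lowering the global `staleTime`) avoids extra traffic from every other dashboard widget.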
### Pagination and Filtering
All large data lists in ThreatWeaver (assets, vulnerabilities, findings) are server-side paginated. The UI never loads the full dataset into memory.
Performance guidelines:
- Always apply filters before exporting; exporting a filtered set is significantly faster than exporting all records and filtering locally
- The default page size is 50 records. Increasing the page size (up to 200) reduces the number of API calls but increases per-request data transfer
- The saved-filters feature allows users to bookmark commonly used filter combinations; encourage teams to use it rather than manually resetting filters on every visit
If pagination appears slow, check:
- Whether a sort column has an index (sorting on unindexed columns requires a full table scan)
- Whether the filter combination is selective; broad filters (e.g., `severity IS NOT NULL`) scan nearly the full table
### Large Dataset Exports
Exports of large datasets (>10,000 records) should use the async export endpoint (`/api/export`), not the inline export buttons in the UI.
The async export flow:
1. The user submits an export job via the Export panel or API
2. The backend processes the export asynchronously (it does not hold an HTTP connection open)
3. When complete, a download link is delivered via the notification system
4. The download is available for 24 hours
Attempting to export very large datasets synchronously (via the direct export buttons) can result in HTTP timeouts if the export takes longer than 60–120 seconds (the timeout of the CDN/proxy in front of the backend).
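Driving the async flow from a script can be sketched as submit-then-poll. The job-status response shape and the per-job status path are assumptions (only `/api/export` comes from this guide):

```typescript
// Sketch of the async export flow: submit a job, then poll until complete.
// FetchLike lets a test (or real fetch) be injected.
type FetchLike = (url: string, init?: { method?: string }) => Promise<{ json(): Promise<any> }>;

async function runExport(http: FetchLike, pollMs = 5000, maxPolls = 120): Promise<string> {
  const job = await (await http("/api/export", { method: "POST" })).json();
  for (let i = 0; i < maxPolls; i++) {
    const status = await (await http(`/api/export/${job.id}`)).json();
    if (status.state === "complete") return status.downloadUrl; // link also arrives via notifications
    if (status.state === "failed") throw new Error(status.error);
    await new Promise((r) => setTimeout(r, pollMs));
  }
  throw new Error("export timed out");
}
```

Because no single HTTP request outlives the 60–120 second proxy timeout, this pattern works for arbitrarily large exports.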
## AppSec Scanner Performance
### Scan Concurrency
The AppSec scanner runs multiple agent jobs in parallel per assessment. Two concurrency settings control throughput:
Assessment-level concurrency (`assessmentQueue.service.ts`):

- Default: 3 parallel agent jobs per assessment
- Controls how many scanner agents (e.g., `sqliTester`, `xssTester`, `idorFinder`) run simultaneously
- Raising this value speeds up scans but increases load on both the target application and the ThreatWeaver backend

Crawler concurrency (`crawlerEngine.service.ts`):

- Default: 5 parallel requests during endpoint discovery/crawling
- Controls how many HTTP requests the crawler makes in parallel to the target
These defaults are intentionally conservative to avoid overwhelming scan targets. They can be adjusted per-assessment in the assessment configuration (Advanced Settings).
Guidelines:
| Target environment | Recommended agent concurrency | Recommended crawler concurrency |
|---|---|---|
| Production API (rate-limited) | 1–2 | 2–3 |
| Staging/test environment (no rate limits) | 3–5 | 5–10 |
| Local development target | 5–10 | 10–15 |
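Bounded parallelism of the kind the assessment queue applies can be sketched as a worker-pool runner: at most `limit` jobs run at once, and the next starts as each finishes. This is an illustrative pattern, not ThreatWeaver's actual queue implementation:

```typescript
// Sketch of bounded parallelism: run tasks with at most `limit` in flight.
async function runWithLimit<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;            // claim the next task index (single-threaded JS)
      results[i] = await tasks[i]();
    }
  }
  // Spawn `limit` workers; each pulls tasks until the list is exhausted.
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}
```

With `limit = 3` this mirrors the default of 3 parallel agent jobs; raising it trades target/backend load for wall-clock time, as the table above suggests.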
### Limiting Scan Scope
Scan duration is directly proportional to the number of endpoints tested. The most effective performance optimization is reducing scope:
- Specify `priorityEndpoints` in the assessment's `testDataHints` to focus agents on high-value endpoints (authenticated, parameterized, business-critical)
- Exclude static assets: images, CSS, and JavaScript files do not need to be tested for injection vulnerabilities
- Use endpoint filtering: if the target has hundreds of endpoints, filter by path prefix (e.g., `/api/v1/` for API-only testing)
- Disable agents not relevant to the target: for example, if the target has no file upload endpoints, disabling the file upload agent saves time
A well-scoped scan of a REST API with 30–50 endpoints typically completes in 10–20 minutes. An unscoped scan of a full web application with 300+ endpoints can take several hours.
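A back-of-envelope estimate consistent with those figures: scan time grows with endpoint count and shrinks with agent concurrency. The per-endpoint cost below is an assumption for illustration only; real durations vary widely with agent mix and target latency.

```typescript
// Illustrative estimate: total endpoint-seconds of testing divided across
// parallel agents. secondsPerEndpoint is an assumed aggregate cost, not a
// measured ThreatWeaver constant.
function estimateScanMinutes(
  endpoints: number,
  agentConcurrency = 3,
  secondsPerEndpoint = 60,
): number {
  return Math.round((endpoints * secondsPerEndpoint) / agentConcurrency / 60);
}

console.log(estimateScanMinutes(45)); // 15 — inside the 10–20 minute range for 30–50 endpoints
```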
### AI Validation Overhead
ThreatWeaver's AppSec scanner uses Claude AI (via Anthropic API) to validate potential findings before surfacing them as confirmed vulnerabilities. This validation step:
- Reduces false positives significantly (typically from ~40% FP rate to under 15%)
- Adds latency: each AI validation call takes 2–8 seconds depending on the finding's complexity
When to reduce AI validation:
- Speed-priority scans: Disable AI validation in the assessment Advanced Settings for a first-pass scan, then enable for the final verification scan
- High-confidence finding types: SQL injection with extractable data and blind XXE with OOB callbacks rarely need AI re-confirmation
- Large scan volumes: if you run 10+ scans per day, AI API costs and latency compound. Consider AI validation only for `HIGH` and `CRITICAL` severity findings
When to keep AI validation:
- Final scan before a report or customer delivery
- IDOR and BOLA testing: these require context-aware judgment that rule-based checks cannot provide reliably
- Any scan being used for compliance evidence
To disable AI validation per-assessment, navigate to the assessment configuration and uncheck Enable AI Validation in Advanced Settings.
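The severity-gating suggestion above can be expressed as a small predicate. The `Severity` labels and `shouldAiValidate` function are illustrative, not ThreatWeaver's actual finding model:

```typescript
// Sketch of gating AI validation by severity, as suggested for large scan
// volumes: validate only the findings where a false positive is most costly.
type Severity = "INFO" | "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

function shouldAiValidate(severity: Severity, aiEnabled: boolean): boolean {
  if (!aiEnabled) return false; // assessment-level toggle wins
  return severity === "HIGH" || severity === "CRITICAL";
}
```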
## Database Maintenance
### Archiving Old Scan Data
AppSec scan data (assessments, findings, raw HTTP logs) accumulates over time and is the primary driver of database growth. ThreatWeaver provides an archiving tool in Admin → Archives.
Recommended archive schedule:
- Archive assessments older than 90 days (completed scans)
- Retain the last 3 assessments per target unconditionally (for trend comparison)
- Archive raw HTTP request/response logs for assessments older than 30 days (these are large)
Archives are stored in a compressed format and can be exported or permanently deleted. They do not count against the active scan data for dashboard calculations.
### Vacuuming PostgreSQL Tables
PostgreSQL requires periodic VACUUM to reclaim space from deleted/updated rows and update query planner statistics. Most cloud-hosted PostgreSQL providers (Supabase, RDS, etc.) handle this automatically via autovacuum. For self-hosted PostgreSQL, verify autovacuum is enabled:
```sql
-- Check autovacuum status
SHOW autovacuum;

-- Check tables that need manual vacuuming (dead tuples > 10% of live)
SELECT relname, n_live_tup, n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
       last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_pct DESC;

-- Manually vacuum a high-churn table (does not lock the table)
VACUUM ANALYZE tenant_acme.vulnerabilities;

-- Full vacuum + reindex (locks the table; run during a maintenance window)
VACUUM FULL ANALYZE tenant_acme.vulnerabilities;
```
High-churn tables in ThreatWeaver deployments with frequent syncs:

- `vulnerabilities`: updated on every sync
- `assets`: updated on every sync
- `pentest_findings`: updated throughout the scan lifecycle
- `security_audit_logs`: appended on every security event
For most deployments, autovacuum handles this automatically. Manual intervention is only needed if you see database size growing unexpectedly or query performance degrading after a large sync.
### Snapshot Integrity Checks
ThreatWeaver periodically takes internal snapshots of dashboard state for trend calculations. If you observe incorrect trend data (e.g., "new vulnerabilities this week" showing 0 when new vulnerabilities exist):
1. Navigate to Admin → Sync → Reconciliation
2. Run Force Reconciliation; this triggers a fresh snapshot comparison
3. If the issue persists, check the `sync_jobs` table for failed or stuck jobs:
```sql
SELECT id, status, started_at, completed_at, error_message
FROM sync_jobs
ORDER BY started_at DESC
LIMIT 20;
```
Stuck jobs (`status = 'running'` with `started_at` older than 2 hours) can be reset by updating their status:
```sql
UPDATE sync_jobs
SET status = 'failed', error_message = 'Manually reset - stuck job'
WHERE status = 'running' AND started_at < NOW() - INTERVAL '2 hours';
```
## Monitoring
Track the following metrics to detect performance problems early:
### Key Metrics to Watch
| Metric | Healthy range | Alert threshold |
|---|---|---|
| API response time (p95) | under 500ms | over 2000ms |
| Dashboard widget load time | under 2s | over 5s |
| Tenable sync duration | under 5 min for under 10K assets | over 15 min |
| DB connection pool utilization | under 70% of `DB_POOL_MAX` | over 90% |
| DB active queries | under 5 concurrent | over 20 concurrent |
| Node.js heap used | under 75% of `--max-old-space-size` | over 90% |
| Redis cache hit rate | over 80% | under 50% |
| Sync failure rate | 0% | Any failures |
| AppSec scan queue depth | under 10 pending | over 50 pending |
### Where to Access Metrics
- Backend health and memory: `GET /health` returns Node.js process stats, database connection status, and Redis status
- Sync status and history: Admin → Sync → Sync History
- Database query stats: PostgreSQL `pg_stat_statements` (if enabled) and `pg_stat_activity`
- Render (production): the Render dashboard's Service → Metrics panel shows CPU, memory, and request throughput
- Security events: Admin → Security → Audit Log
### Setting Up Alerts
For production deployments, configure external uptime monitoring:
- Uptime check: `GET /health` should return `200 OK` within 5 seconds
- Sync lag check: `GET /api/sync/status` (authenticated); verify `lastSyncCompletedAt` is within the expected sync frequency
- Database connectivity: included in the `/health` response; a failed DB connection returns a degraded health status
## Capacity Planning
Use these rough guidelines to estimate infrastructure requirements:
### Assets and Vulnerabilities
| Asset count | Vulnerability count | Recommended DB storage | Recommended Node.js RAM |
|---|---|---|---|
| < 5,000 | < 100,000 | 10–20 GB | 512 MB – 1 GB |
| 5,000 – 25,000 | 100,000 – 500,000 | 20–100 GB | 1–2 GB |
| 25,000 – 100,000 | 500,000 – 2,000,000 | 100–400 GB | 2–4 GB |
| > 100,000 | > 2,000,000 | 400 GB+ | 4–8 GB |
### Scans per Day
| Scans/day (AppSec) | Recommended instance size | Notes |
|---|---|---|
| < 5 | 1 vCPU, 1 GB RAM | Starter; adequate for dev/testing |
| 5–25 | 2 vCPU, 2 GB RAM | Small team |
| 25–100 | 4 vCPU, 4–8 GB RAM | Mid-size security team |
| > 100 | 8 vCPU, 16 GB RAM; consider horizontal scaling | Enterprise / automated pipeline |
### Tenants (SaaS Mode)
| Tenant count | Recommended DB | Notes |
|---|---|---|
| < 10 | Supabase Pro (8 GB) | Default for new deployments |
| 10–50 | Dedicated PostgreSQL, 16 GB+ | Consider a connection pooler (PgBouncer) |
| 50–200 | Dedicated PostgreSQL, 32 GB+; PgBouncer required | Monitor schema count and `DB_POOL_MAX` |
| > 200 | Evaluate database sharding or read replicas | Consult BluCypher for architecture guidance |
Each tenant schema adds ~134 tables to the PostgreSQL catalog. At 200+ tenants, catalog queries (e.g., `information_schema.tables`) can become slow. This does not affect tenant data queries, but does affect internal tooling.
### Concurrent Users
| Concurrent users | Expected requests/min | Recommended `RATE_LIMIT_MAX_REQUESTS` | Recommended `DB_POOL_MAX` |
|---|---|---|---|
| < 10 | < 150 | 100 (default) | 10 (default) |
| 10–50 | 150–750 | 300–500 | 15–25 |
| 50–200 | 750–3,000 | 1,000+ | 25–50 |
| > 200 | > 3,000 | Horizontal scaling | Multiple instances |
For deployments expecting > 200 concurrent users, horizontal scaling (multiple backend instances behind a load balancer) is recommended over simply raising limits on a single instance.
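When planning horizontal scaling, remember that every backend instance holds its own pool, so the database sees up to instances × `DB_POOL_MAX` connections. The helpers below are illustrative; the 80% headroom figure is a common safety margin, not a ThreatWeaver requirement:

```typescript
// Illustrative capacity arithmetic for multi-instance deployments.
function totalDbConnections(instances: number, poolMaxPerInstance: number): number {
  return instances * poolMaxPerInstance;
}

function fitsWithinPostgres(instances: number, poolMax: number, maxConnections: number): boolean {
  // Leave ~20% of max_connections free for superuser sessions and tooling.
  return totalDbConnections(instances, poolMax) <= Math.floor(maxConnections * 0.8);
}

console.log(totalDbConnections(3, 10));        // 30 — three instances at the default pool size
console.log(fitsWithinPostgres(3, 10, 100));   // true — 30 fits within an 80-connection budget
```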