Version: Local · In Progress

Performance Tuning

This guide covers performance tuning for ThreatWeaver deployments experiencing slow dashboards, sync timeouts, scan lag, or high database load. It addresses backend configuration, frontend caching, AppSec scanner throughput, database maintenance, and capacity planning.


When to Start Tuning

Start investigating when you observe any of the following:

| Symptom | Likely area |
| --- | --- |
| Dashboard KPIs take more than 3–5 seconds to load | Database query performance or missing indexes |
| Tenable sync jobs time out or fail with 504 | Sync engine configuration or DB connection pool exhaustion |
| AppSec scans take much longer than expected | Scanner concurrency or AI validation overhead |
| High memory usage (>1.5 GB for the Node.js process) | Large vulnerability dataset, missing pagination |
| Database CPU spikes during dashboard load | Aggregation queries, missing indexes on filter columns |
| Redis connection errors or cache misses | Redis connectivity or TTL misconfiguration |
| Rate limit errors (429) from dashboard polling | Global rate limiter needs tuning for high-traffic deployments |

Before tuning, baseline the problem:

  1. Check backend logs for slow query warnings
  2. Check PostgreSQL pg_stat_activity for long-running queries
  3. Check Node.js memory via process.memoryUsage() (exposed at /health)
  4. Check Redis connectivity via redis-cli ping
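Step 3 can be automated in a watchdog or probe script. A minimal TypeScript sketch, using a 1.5 GB warning threshold to match the symptom table above (checkHeap is an illustrative helper, not part of ThreatWeaver):

```typescript
// Flag a Node.js process whose heap usage exceeds a warning threshold.
// process.memoryUsage() is the same data the /health endpoint exposes.

interface HeapCheck {
  heapUsedMb: number;
  warning: boolean;
}

function checkHeap(heapUsedBytes: number, warnMb = 1536): HeapCheck {
  const heapUsedMb = Math.round(heapUsedBytes / (1024 * 1024));
  return { heapUsedMb, warning: heapUsedMb > warnMb };
}

// Live usage: checkHeap(process.memoryUsage().heapUsed)
console.log(checkHeap(2_000_000_000)); // ~1907 MB -> warning: true
```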

Backend Performance

Database Connection Pool

ThreatWeaver uses a bounded connection pool managed by the pg driver under TypeORM. The default is conservative (10 connections) to avoid overwhelming Supabase's Transaction Pooler.

Configuration (environment variables):

| Variable | Default | Description |
| --- | --- | --- |
| DB_POOL_MAX | 10 | Maximum concurrent database connections per backend instance |
| DB_SSL_REJECT_UNAUTHORIZED | true | Set to false only if using a self-signed certificate in a trusted internal network |

Guidelines:

  • For self-hosted PostgreSQL (not Supabase), you can raise DB_POOL_MAX to 25–50 depending on PostgreSQL's max_connections setting
  • For Supabase Transaction Pooler (PgBouncer), keep DB_POOL_MAX at 10–15 per backend instance. The pooler has its own connection limit (typically 20–60 on Supabase Pro)
  • If deploying multiple backend instances (horizontal scaling), each instance has its own pool. With 3 instances at DB_POOL_MAX=10, the database receives up to 30 concurrent connections total
  • Connection timeout is set to 5 seconds (connectionTimeoutMillis: 5000), which is appropriate for cloud database handshakes. If you see frequent timeout errors, increase to 10000
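Putting the defaults and guidelines together, pool settings can be derived from the environment. A TypeScript sketch: DB_POOL_MAX and DB_SSL_REJECT_UNAUTHORIZED are the documented variables, while DB_CONNECT_TIMEOUT_MS and buildPoolConfig itself are illustrative assumptions, not ThreatWeaver's actual code:

```typescript
// Derive a pg/TypeORM-style pool configuration from environment variables.
// Defaults mirror the documented values: 10 connections, 5 s handshake timeout.

interface PoolConfig {
  max: number;
  connectionTimeoutMillis: number;
  ssl: { rejectUnauthorized: boolean };
}

function buildPoolConfig(env: Record<string, string | undefined>): PoolConfig {
  const max = Number(env.DB_POOL_MAX ?? "10");
  // Raise to 10000 if cloud handshakes frequently exceed the 5 s default.
  const connectionTimeoutMillis = Number(env.DB_CONNECT_TIMEOUT_MS ?? "5000");
  const rejectUnauthorized = (env.DB_SSL_REJECT_UNAUTHORIZED ?? "true") !== "false";
  return { max, connectionTimeoutMillis, ssl: { rejectUnauthorized } };
}

console.log(buildPoolConfig({ DB_POOL_MAX: "25" }));
```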

Diagnosing pool exhaustion:

Pool exhaustion manifests as requests hanging for several seconds before completing or timing out with a "connection acquisition timeout" error. Check:

-- Check active connections
SELECT count(*), state FROM pg_stat_activity GROUP BY state;

-- Check long-running queries (running > 10 seconds)
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '10 seconds';

Query Optimization and Index Recommendations

Dashboard KPI aggregation queries run across large tables (vulnerabilities, assets). These queries are filter-heavy: adding indexes on commonly filtered columns dramatically reduces query time.

Recommended indexes (run in PostgreSQL on your tenant schema):

-- Vulnerabilities: most common filter patterns
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_severity ON vulnerabilities(severity);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_state ON vulnerabilities(state);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_plugin_id ON vulnerabilities(plugin_id);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_asset_uuid ON vulnerabilities(asset_uuid);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_first_found ON vulnerabilities(first_found);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_last_found ON vulnerabilities(last_found);

-- Assets: common filter columns
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_asset_fqdn ON assets(fqdn);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_asset_ipv4 ON assets(ipv4 text_ops);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_asset_last_seen ON assets(last_seen);

-- Composite index for dashboard severity breakdown by date range
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_severity_last_found
ON vulnerabilities(severity, last_found DESC);

Tip: CREATE INDEX CONCURRENTLY builds the index without locking the table. Avoid CREATE INDEX without CONCURRENTLY on tables with active reads/writes.

The aggregation service (aggregation.service.ts) is the core of all KPI calculations: it is ~2,936 lines and runs complex multi-table queries. If you observe slow dashboard loads:

  1. Set the SLOW_QUERY_LOG=true environment variable to log queries exceeding 1 second
  2. Use EXPLAIN ANALYZE on the slow query in PostgreSQL to identify sequential scans
  3. Add the appropriate index before retrying

Caching with Redis

ThreatWeaver includes an optional Redis caching layer. When Redis is unavailable, the application falls back to direct database queries; Redis is not required for operation.

When Redis helps most:

  • Dashboard KPI widgets that many users load simultaneously
  • Tenable sync status checks that run on every dashboard page load
  • Tenant configuration lookups (permissions, feature flags)

Configuration:

| Variable | Default | Description |
| --- | --- | --- |
| REDIS_URL | redis://localhost:6379 | Redis connection URL |
| REDIS_ENABLED | true | Set to false to disable Redis entirely |
| CACHE_TTL | 3600 (1 hour) | Default cache TTL in seconds |
| API_CACHE_TTL | 21600 (6 hours) | TTL for API response cache (Tenable data, etc.) |
| TENANT_CACHE_TTL_SECONDS | 300 (5 minutes) | TTL for tenant configuration cache |

TTL recommendations by use case:

| Data | Recommended TTL | Why |
| --- | --- | --- |
| Dashboard KPI totals | 5–15 minutes | Data changes during Tenable syncs, not continuously |
| Tenable sync status | 30–60 seconds | Users poll this frequently |
| Tenant config / feature flags | 5 minutes (300s) | Changes are rare but should propagate within 5 minutes |
| Asset inventory | 15–30 minutes | Updated only during sync windows |
| Vulnerability trends | 1 hour | Historical data; changes infrequently |

Disabling Redis locally: Redis is not used in local development by default. The cache layer degrades gracefully: if Redis is unreachable, all cache reads return null and all writes are silently no-ops.
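The degrade-gracefully behavior can be sketched as a cache-aside wrapper that treats any Redis error as a miss. The CacheClient interface and getOrLoad helper below are illustrative assumptions, not ThreatWeaver's actual cache module:

```typescript
// Cache-aside read with graceful degradation: any Redis error is treated
// as a cache miss, so the caller always falls through to the database.

interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

async function getOrLoad<T>(
  cache: CacheClient,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  try {
    const hit = await cache.get(key);
    if (hit !== null) return JSON.parse(hit) as T;
  } catch {
    // Redis unreachable: fall through to the database read below.
  }
  const value = await load();
  try {
    await cache.set(key, JSON.stringify(value), ttlSeconds);
  } catch {
    // Write failures are silently ignored, matching the documented behavior.
  }
  return value;
}
```

With this shape, a Redis outage only costs the cache hit rate; request handlers never see the error.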

Sync Engine Tuning

The Tenable sync engine pulls vulnerability and asset data from Tenable.io (or Tenable.sc). Sync frequency and behavior are configurable.

Sync frequency trade-off:

| Sync interval | Freshness | DB load |
| --- | --- | --- |
| Every 15 minutes | Near-realtime | High; heavy queries on every sync |
| Every 1 hour | Good for most use cases | Moderate |
| Every 6 hours | Stale for fast-moving environments | Low |
| Manual only | User-controlled | Minimal |

Sync frequency is configured in Admin → Settings → Sync Configuration. For environments with large asset counts (>50,000 assets), consider:

  • Scheduling syncs during off-peak hours (evenings/weekends)
  • Using incremental sync mode (syncs only assets/vulns changed since last sync) rather than full sync
  • Increasing DB_POOL_MAX before large syncs if connection timeout errors appear in sync logs

Monitoring sync performance: Sync job duration and record counts are logged. Access them via Admin → Sync → Sync History. If a sync job consistently takes longer than 10 minutes, consider splitting by asset group or reducing sync frequency.

Rate Limiting Configuration

The default global rate limit (100 requests/minute per IP) is appropriate for moderate-traffic deployments. High-traffic deployments with many users polling dashboards simultaneously should raise this limit.

Configuration:

| Variable | Default | Description |
| --- | --- | --- |
| RATE_LIMIT_WINDOW_MS | 60000 (1 minute) | Rate limit window in milliseconds |
| RATE_LIMIT_MAX_REQUESTS | 100 | Maximum requests per IP per window |

Guidelines for high-traffic deployments:

  • Each dashboard page loads approximately 10–15 API requests
  • A team of 20 simultaneous users could generate 200–300 requests per minute
  • For 20+ concurrent users, raise RATE_LIMIT_MAX_REQUESTS to 300–500

Note: The global rate limiter already exempts dashboard routes (/api/dashboard/*), AppSec routes, and health check endpoints from counting against the limit. Auth endpoints have their own separate stricter limiters (10 per 15 minutes) that should not be relaxed.
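The sizing arithmetic from the guidelines can be written down directly. A sketch; the 1.5× headroom factor and the one-page-load-per-user-per-minute figure are illustrative assumptions:

```typescript
// Estimate peak per-minute request volume and a rate limit with headroom.
// Defaults follow the guidelines: ~12 requests per dashboard load, and
// roughly one page load per user per minute at peak.

function recommendedRateLimit(
  concurrentUsers: number,
  requestsPerLoad = 12,
  loadsPerUserPerMinute = 1,
  headroom = 1.5,
): number {
  const peakRpm = concurrentUsers * requestsPerLoad * loadsPerUserPerMinute;
  // Never go below the shipped default of 100 requests/minute.
  return Math.max(100, Math.ceil(peakRpm * headroom));
}

console.log(recommendedRateLimit(20)); // 20 users * 12 req = 240 rpm -> 360
```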

Node.js Heap Settings for Large Datasets

Node.js defaults to a heap limit of roughly 1.5–2 GB on 64-bit systems (the exact figure varies by Node.js version and available memory). Large vulnerability datasets loaded into memory during aggregation can hit this limit.

Symptoms of heap pressure:

  • Process memory continuously climbing
  • Occasional FATAL ERROR: Reached heap limit Allocation failed crashes
  • Garbage collection pauses causing request latency spikes

Fix: increase heap via the NODE_OPTIONS environment variable:

# For deployments with >100,000 vulnerabilities
NODE_OPTIONS="--max-old-space-size=4096" node dist/index.js

# For very large deployments (>500,000 vulnerabilities)
NODE_OPTIONS="--max-old-space-size=8192" node dist/index.js

On Render, set NODE_OPTIONS as an environment variable in the service settings.

Tip: A Node.js process should not need more than 4–6 GB for the application itself. If it does, investigate whether large data arrays are being held in memory. The aggregation service streams data in batches rather than loading entire tables; if you see large heap usage, check for queries that return unbounded result sets.


Frontend Performance

React Query Cache Configuration

ThreatWeaver's frontend uses TanStack Query (React Query) for server-state management. The global default configuration is:

// frontend/src/main.tsx
const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      staleTime: 5 * 60 * 1000, // 5 minutes
    },
  },
})

The staleTime of 5 minutes means cached data is used without a refetch for up to 5 minutes after it was last fetched. This is the most impactful setting for perceived frontend performance.
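The staleness rule itself is simple enough to state as code. A sketch; isStale is an illustrative helper, not TanStack Query's internal implementation:

```typescript
// A query result is "fresh" until staleTime has elapsed since the last
// fetch; fresh data is served from cache without a network request.

function isStale(lastFetchedAtMs: number, staleTimeMs: number, nowMs: number): boolean {
  return nowMs - lastFetchedAtMs >= staleTimeMs;
}

const STALE_TIME = 5 * 60 * 1000; // the global 5-minute default
console.log(isStale(0, STALE_TIME, 4 * 60 * 1000)); // false: served from cache
console.log(isStale(0, STALE_TIME, 6 * 60 * 1000)); // true: refetch on next use
```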

Per-hook overrides: Some hooks override the global default for data that changes more frequently:

  • AI Security data: 30-second staleTime (changes frequently during active sessions)
  • AppSec assessment data: 5-minute staleTime (scans are long-running, no benefit to frequent refetch)

Tuning recommendations:

| Scenario | Recommendation |
| --- | --- |
| Users complain dashboard data is stale | Reduce global staleTime to 2–3 minutes |
| High API traffic from dashboard polling | Increase global staleTime to 10–15 minutes |
| Real-time scan monitoring needed | Use refetchInterval: 10000 (10s) on scan-specific queries |

To change the global default, edit /frontend/src/main.tsx and redeploy.

Pagination and Filtering

All large data lists in ThreatWeaver (assets, vulnerabilities, findings) are server-side paginated. The UI never loads the full dataset into memory.

Performance guidelines:

  • Always apply filters before exporting: exporting a filtered set is significantly faster than exporting all records and filtering locally
  • The default page size is 50 records. Increasing page size (up to 200) reduces the number of API calls but increases per-request data transfer
  • The saved-filters feature allows users to bookmark commonly used filter combinations; encourage teams to use this rather than manually resetting filters on every visit

If pagination appears slow, check:

  1. Whether a sort column has an index (sorting on unindexed columns requires a full table scan)
  2. Whether the filter combination is selective; broad filters (e.g., severity IS NOT NULL) force near-full-table scans
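Server-side pagination inputs are typically clamped before being translated to SQL. A sketch following the 200-record cap above; the helper and parameter names are illustrative:

```typescript
// Clamp client-supplied pagination inputs and translate them to SQL
// OFFSET/LIMIT values. Page size is capped at 200 per the guidelines.

interface PageParams {
  limit: number;
  offset: number;
}

function toPageParams(page: number, pageSize: number, maxPageSize = 200): PageParams {
  const limit = Math.min(Math.max(1, Math.floor(pageSize)), maxPageSize);
  const safePage = Math.max(1, Math.floor(page));
  return { limit, offset: (safePage - 1) * limit };
}

console.log(toPageParams(3, 50)); // { limit: 50, offset: 100 }
console.log(toPageParams(1, 10_000)); // clamped: { limit: 200, offset: 0 }
```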

Large Dataset Exports

Exports of large datasets (>10,000 records) should use the async export endpoint (/api/export), not the inline export buttons in the UI.

The async export flow:

  1. User submits an export job via the Export panel or API
  2. The backend processes the export asynchronously (does not hold an HTTP connection)
  3. When complete, a download link is delivered via the notification system
  4. The download is available for 24 hours

Attempting to export very large datasets synchronously (via the direct export buttons) can result in HTTP timeouts if the export takes longer than 60–120 seconds (the timeout of the CDN/proxy in front of the backend).


AppSec Scanner Performance

Scan Concurrency

The AppSec scanner runs multiple agent jobs in parallel per assessment. Two concurrency settings control throughput:

Assessment-level concurrency (assessmentQueue.service.ts):

  • Default: 3 parallel agent jobs per assessment
  • Controls how many scanner agents (e.g., sqliTester, xssTester, idorFinder) run simultaneously
  • Raising this value speeds up scans but increases load on both the target application and the ThreatWeaver backend

Crawler concurrency (crawlerEngine.service.ts):

  • Default: 5 parallel requests during endpoint discovery/crawling
  • Controls how many HTTP requests the crawler makes in parallel to the target

These defaults are intentionally conservative to avoid overwhelming scan targets. They can be adjusted per-assessment in the assessment configuration (Advanced Settings).

Guidelines:

| Target environment | Recommended agent concurrency | Recommended crawler concurrency |
| --- | --- | --- |
| Production API (rate-limited) | 1–2 | 2–3 |
| Staging/test environment (no rate limits) | 3–5 | 5–10 |
| Local development target | 5–10 | 10–15 |

Limiting Scan Scope

Scan duration is directly proportional to the number of endpoints tested. The most effective performance optimization is reducing scope:

  • Specify priorityEndpoints in the assessment's testDataHints to focus agents on high-value endpoints (authenticated, parameterized, business-critical)
  • Exclude static assets: images, CSS, and JavaScript files do not need to be tested for injection vulnerabilities
  • Use endpoint filtering: if the target has hundreds of endpoints, filter by path prefix (e.g., /api/v1/ for API-only testing)
  • Disable agents not relevant to the target: for example, if the target has no file upload endpoints, disabling the file upload agent saves time

A well-scoped scan of a REST API with 30–50 endpoints typically completes in 10–20 minutes. An unscoped scan of a full web application with 300+ endpoints can take several hours.
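As a back-of-the-envelope model consistent with those numbers, wall-clock scan time scales with endpoint count divided by agent concurrency. The one-minute-per-endpoint constant below is an illustrative assumption, not a measured ThreatWeaver figure:

```typescript
// Rough scan-duration model: endpoints are tested in parallel across agents,
// so wall-clock time ~ endpoints * minutesPerEndpoint / concurrency.

function estimateScanMinutes(
  endpointCount: number,
  agentConcurrency: number,
  minutesPerEndpoint = 1, // assumed average across all enabled agents
): number {
  return Math.ceil((endpointCount * minutesPerEndpoint) / agentConcurrency);
}

// A well-scoped 40-endpoint API at the default agent concurrency of 3:
console.log(estimateScanMinutes(40, 3)); // 14 minutes, inside the 10-20 min range
```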

AI Validation Overhead

ThreatWeaver's AppSec scanner uses Claude AI (via Anthropic API) to validate potential findings before surfacing them as confirmed vulnerabilities. This validation step:

  • Reduces false positives significantly (typically from ~40% FP rate to under 15%)
  • Adds latency: each AI validation call takes 2–8 seconds depending on the finding complexity

When to reduce AI validation:

  • Speed-priority scans: Disable AI validation in the assessment Advanced Settings for a first-pass scan, then enable for the final verification scan
  • High-confidence finding types: SQL injection with extractable data, blind XXE with OOB callbacks; these rarely need AI re-confirmation
  • Large scan volumes: If running 10+ scans per day, AI API costs and latency compound. Consider AI validation only for HIGH and CRITICAL severity findings

When to keep AI validation:

  • Final scan before a report or customer delivery
  • IDOR and BOLA testing β€” these require context-aware judgment that rule-based checks cannot provide reliably
  • Any scan being used for compliance evidence

To disable AI validation per-assessment, navigate to the assessment configuration and uncheck Enable AI Validation in Advanced Settings.
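The high-volume guideline above (AI-validate only HIGH and CRITICAL findings) can be expressed as a simple gate. The Finding shape and shouldAiValidate helper are illustrative assumptions, not the scanner's actual API:

```typescript
// Gate AI validation by severity to cut API cost and latency when running
// many scans per day; in the default mode everything is validated.

type Severity = "INFO" | "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

interface Finding {
  severity: Severity;
  type: string;
}

function shouldAiValidate(finding: Finding, highVolumeMode: boolean): boolean {
  if (!highVolumeMode) return true; // default: validate every finding
  return finding.severity === "HIGH" || finding.severity === "CRITICAL";
}

console.log(shouldAiValidate({ severity: "MEDIUM", type: "xss" }, true)); // false
console.log(shouldAiValidate({ severity: "CRITICAL", type: "sqli" }, true)); // true
```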


Database Maintenance

Archiving Old Scan Data

AppSec scan data (assessments, findings, raw HTTP logs) accumulates over time and is the primary driver of database growth. ThreatWeaver provides an archiving tool in Admin → Archives.

Recommended archive schedule:

  • Archive assessments older than 90 days (completed scans)
  • Retain the last 3 assessments per target unconditionally (for trend comparison)
  • Archive raw HTTP request/response logs for assessments older than 30 days (these are large)
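The schedule combines an age cutoff with a keep-last-3 rule per target. A sketch of that selection logic; the Assessment shape and selectForArchive helper are illustrative, not the actual archiver:

```typescript
// Select completed assessments eligible for archiving: older than the cutoff,
// but always retain the 3 most recent assessments per target for trends.

interface Assessment {
  id: string;
  targetId: string;
  completedAt: number; // epoch ms
}

function selectForArchive(
  assessments: Assessment[],
  nowMs: number,
  maxAgeDays = 90,
  retainPerTarget = 3,
): Assessment[] {
  const cutoff = nowMs - maxAgeDays * 24 * 60 * 60 * 1000;
  const byTarget = new Map<string, Assessment[]>();
  for (const a of assessments) {
    const list = byTarget.get(a.targetId) ?? [];
    list.push(a);
    byTarget.set(a.targetId, list);
  }
  const eligible: Assessment[] = [];
  for (const list of byTarget.values()) {
    list.sort((x, y) => y.completedAt - x.completedAt); // newest first
    for (const a of list.slice(retainPerTarget)) {
      if (a.completedAt < cutoff) eligible.push(a);
    }
  }
  return eligible;
}
```

Note that both rules must hold: an assessment outside the newest three is still retained if it is younger than the age cutoff.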

Archives are stored in a compressed format and can be exported or permanently deleted. They do not count against the active scan data for dashboard calculations.

Vacuuming PostgreSQL Tables

PostgreSQL requires periodic VACUUM to reclaim space from deleted/updated rows and update query planner statistics. Most cloud-hosted PostgreSQL providers (Supabase, RDS, etc.) handle this automatically via autovacuum. For self-hosted PostgreSQL, verify autovacuum is enabled:

-- Check autovacuum status
SHOW autovacuum;

-- Check tables that need manual vacuuming (dead tuples > 10% of live)
SELECT relname, n_live_tup, n_dead_tup,
round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_pct DESC;

-- Manually vacuum a high-churn table (does not lock table)
VACUUM ANALYZE tenant_acme.vulnerabilities;

-- Full vacuum + reindex (locks table; run during maintenance window)
VACUUM FULL ANALYZE tenant_acme.vulnerabilities;

High-churn tables in ThreatWeaver deployments with frequent syncs:

  • vulnerabilities: updated on every sync
  • assets: updated on every sync
  • pentest_findings: updated throughout the scan lifecycle
  • security_audit_logs: appended on every security event

For most deployments, autovacuum handles this automatically. Manual intervention is only needed if you see database size growing unexpectedly or query performance degrading after a large sync.

Snapshot Integrity Checks

ThreatWeaver periodically takes internal snapshots of dashboard state for trend calculations. If you observe incorrect trend data (e.g., "new vulnerabilities this week" showing 0 when new vulnerabilities exist):

  1. Navigate to Admin → Sync → Reconciliation
  2. Run Force Reconciliation; this triggers a fresh snapshot comparison
  3. If the issue persists, check the sync_jobs table for failed or stuck jobs:
SELECT id, status, started_at, completed_at, error_message
FROM sync_jobs
ORDER BY started_at DESC
LIMIT 20;

Stuck jobs (status = 'running' with started_at older than 2 hours) can be reset by updating their status:

UPDATE sync_jobs
SET status = 'failed', error_message = 'Manually reset - stuck job'
WHERE status = 'running' AND started_at < NOW() - INTERVAL '2 hours';

Monitoring

Track the following metrics to detect performance problems early:

Key Metrics to Watch

| Metric | Healthy range | Alert threshold |
| --- | --- | --- |
| API response time (p95) | under 500 ms | over 2000 ms |
| Dashboard widget load time | under 2 s | over 5 s |
| Tenable sync duration | under 5 min for under 10K assets | over 15 min |
| DB connection pool utilization | under 70% of DB_POOL_MAX | over 90% |
| DB active queries | under 5 concurrent | over 20 concurrent |
| Node.js heap used | under 75% of --max-old-space-size | over 90% |
| Redis cache hit rate | over 80% | under 50% |
| Sync failure rate | 0% | Any failures |
| AppSec scan queue depth | under 10 pending | over 50 pending |
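Threshold checks against this table can be encoded uniformly for "higher is worse" metrics. A sketch; evaluateMetric is illustrative (invert the comparisons for metrics like cache hit rate, where lower is worse):

```typescript
// Classify a metric sample against a healthy ceiling and an alert floor.
// Values between the two fall into a warning zone worth watching.

type MetricStatus = "healthy" | "warning" | "alert";

function evaluateMetric(value: number, healthyMax: number, alertMin: number): MetricStatus {
  if (value > alertMin) return "alert";
  if (value <= healthyMax) return "healthy";
  return "warning";
}

// API p95 latency: healthy under 500 ms, alert over 2000 ms.
console.log(evaluateMetric(350, 500, 2000)); // healthy
console.log(evaluateMetric(1200, 500, 2000)); // warning
console.log(evaluateMetric(2500, 500, 2000)); // alert
```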

Where to Access Metrics

  • Backend health and memory: GET /health returns Node.js process stats, database connection status, and Redis status
  • Sync status and history: Admin → Sync → Sync History
  • Database query stats: PostgreSQL pg_stat_statements (if enabled) and pg_stat_activity
  • Render (production): the Render dashboard → Service → Metrics panel shows CPU, memory, and request throughput
  • Security events: Admin → Security → Audit Log

Setting Up Alerts

For production deployments, configure external uptime monitoring:

  1. Uptime check: GET /health should return 200 OK within 5 seconds
  2. Sync lag check: GET /api/sync/status (authenticated); verify that lastSyncCompletedAt is within the expected sync frequency
  3. Database connectivity: included in the /health response; a failed DB connection returns a degraded health status
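A monitoring probe can fold these checks into one verdict. The payload field names below are assumptions for illustration; consult the actual /health response schema:

```typescript
// Interpret an uptime probe against /health: HTTP 200 within 5 s is healthy;
// a degraded payload (failed DB) or a slow/failed response should page someone.
// The ProbeResult shape is an assumed schema, not the documented one.

interface ProbeResult {
  status: number;
  responseTimeMs: number;
  dbConnected: boolean;
}

type ProbeVerdict = "ok" | "degraded" | "down";

function classifyProbe(p: ProbeResult, maxMs = 5000): ProbeVerdict {
  if (p.status !== 200 || p.responseTimeMs > maxMs) return "down";
  return p.dbConnected ? "ok" : "degraded";
}

console.log(classifyProbe({ status: 200, responseTimeMs: 120, dbConnected: true })); // ok
console.log(classifyProbe({ status: 200, responseTimeMs: 120, dbConnected: false })); // degraded
console.log(classifyProbe({ status: 200, responseTimeMs: 9000, dbConnected: true })); // down
```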

Capacity Planning

Use these rough guidelines to estimate infrastructure requirements:

Assets and Vulnerabilities

| Asset count | Vulnerability count | Recommended DB storage | Recommended Node.js RAM |
| --- | --- | --- | --- |
| < 5,000 | < 100,000 | 10–20 GB | 512 MB – 1 GB |
| 5,000 – 25,000 | 100,000 – 500,000 | 20–100 GB | 1–2 GB |
| 25,000 – 100,000 | 500,000 – 2,000,000 | 100–400 GB | 2–4 GB |
| > 100,000 | > 2,000,000 | 400 GB+ | 4–8 GB |

Scans per Day

| Scans/day (AppSec) | Recommended instance size | Notes |
| --- | --- | --- |
| < 5 | 1 vCPU, 1 GB RAM | Starter; adequate for dev/testing |
| 5–25 | 2 vCPU, 2 GB RAM | Small team |
| 25–100 | 4 vCPU, 4–8 GB RAM | Mid-size security team |
| > 100 | 8 vCPU, 16 GB RAM, consider horizontal scaling | Enterprise / automated pipeline |

Tenants (SaaS Mode)

| Tenant count | Recommended DB | Notes |
| --- | --- | --- |
| < 10 | Supabase Pro (8 GB) | Default for new deployments |
| 10–50 | Dedicated PostgreSQL, 16 GB+ | Consider a connection pooler (PgBouncer) |
| 50–200 | Dedicated PostgreSQL, 32 GB+; PgBouncer required | Monitor schema count and DB_POOL_MAX |
| > 200 | Evaluate database sharding or read replicas | Consult BluCypher for architecture guidance |
Note: Each tenant schema adds ~134 tables to the PostgreSQL catalog. At 200+ tenants, catalog queries (e.g., against information_schema.tables) can become slow. This does not affect tenant data queries, but it does affect internal tooling.

Concurrent Users

| Concurrent users | Expected requests/min | Recommended RATE_LIMIT_MAX_REQUESTS | Recommended DB_POOL_MAX |
| --- | --- | --- | --- |
| < 10 | < 150 | 100 (default) | 10 (default) |
| 10–50 | 150–750 | 300–500 | 15–25 |
| 50–200 | 750–3,000 | 1,000+ | 25–50 |
| > 200 | > 3,000 | Horizontal scaling | Multiple instances |

For deployments expecting > 200 concurrent users, horizontal scaling (multiple backend instances behind a load balancer) is recommended over simply raising limits on a single instance.
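The concurrent-user table maps directly to configuration. A lookup sketch using the upper end of each range; adviseCapacity and its exact boundaries are illustrative:

```typescript
// Recommend rate-limit and pool settings from expected concurrent users,
// following the capacity table. Above 200 users, scale horizontally instead
// of raising limits on a single instance.

interface CapacityAdvice {
  rateLimitMaxRequests: number;
  dbPoolMax: number;
  horizontalScaling: boolean;
}

function adviseCapacity(concurrentUsers: number): CapacityAdvice {
  if (concurrentUsers < 10) {
    return { rateLimitMaxRequests: 100, dbPoolMax: 10, horizontalScaling: false };
  }
  if (concurrentUsers <= 50) {
    return { rateLimitMaxRequests: 500, dbPoolMax: 25, horizontalScaling: false };
  }
  if (concurrentUsers <= 200) {
    return { rateLimitMaxRequests: 1000, dbPoolMax: 50, horizontalScaling: false };
  }
  return { rateLimitMaxRequests: 1000, dbPoolMax: 50, horizontalScaling: true };
}

console.log(adviseCapacity(20)); // 500 requests/min, DB_POOL_MAX 25
```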