# Performance Tuning
This guide covers performance tuning for ThreatWeaver deployments experiencing slow dashboards, sync timeouts, scan lag, or high database load. It addresses backend configuration, frontend caching, AppSec scanner throughput, database maintenance, and capacity planning.
## When to Start Tuning
Start investigating when you observe any of the following:
| Symptom | Likely area |
|---|---|
| Dashboard KPIs take more than 3–5 seconds to load | Database query performance or missing indexes |
| Tenable sync jobs time out or fail with 504 | Sync engine configuration or DB connection pool exhaustion |
| AppSec scans take much longer than expected | Scanner concurrency or AI validation overhead |
| High memory usage (>1.5 GB for the Node.js process) | Large vulnerability dataset, missing pagination |
| Database CPU spikes during dashboard load | Aggregation queries, missing indexes on filter columns |
| Redis connection errors or cache misses | Redis connectivity or TTL misconfiguration |
| Rate limit errors (429) from dashboard polling | Global rate limiter needs tuning for high-traffic deployments |
Before tuning, baseline the problem:
- Check backend logs for slow query warnings
- Check PostgreSQL `pg_stat_activity` for long-running queries
- Check Node.js memory via `process.memoryUsage()` (exposed at `/health`)
- Check Redis connectivity via `redis-cli ping`
## Backend Performance
### Database Connection Pool
ThreatWeaver uses a bounded connection pool managed by the pg driver under TypeORM. The default is conservative (10 connections) to avoid overwhelming Supabase's Transaction Pooler.
Configuration (environment variables):
| Variable | Default | Description |
|---|---|---|
| `DB_POOL_MAX` | 10 | Maximum concurrent database connections per backend instance |
| `DB_SSL_REJECT_UNAUTHORIZED` | true | Set to false only if using a self-signed certificate in a trusted internal network |
Guidelines:
- For self-hosted PostgreSQL (not Supabase), you can raise `DB_POOL_MAX` to 25–50 depending on PostgreSQL's `max_connections` setting
- For the Supabase Transaction Pooler (PgBouncer), keep `DB_POOL_MAX` at 10–15 per backend instance; the pooler has its own connection limit (typically 20–60 on Supabase Pro)
- If deploying multiple backend instances (horizontal scaling), each instance has its own pool. With 3 instances at `DB_POOL_MAX=10`, the database receives up to 30 concurrent connections total
- The connection timeout is 5 seconds (`connectionTimeoutMillis: 5000`), which is appropriate for cloud database handshakes. If you see frequent timeout errors, increase it to 10000
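The mapping from these environment variables to pool options can be sketched as follows. This is an illustrative helper, not ThreatWeaver's actual wiring; `buildPoolConfig` and the `PoolEnv` shape are assumptions, but the defaults mirror the documented values.

```typescript
// Illustrative sketch: translating the env vars above into pg Pool options.
interface PoolEnv {
  DB_POOL_MAX?: string;
  DB_SSL_REJECT_UNAUTHORIZED?: string;
}

function buildPoolConfig(env: PoolEnv) {
  return {
    max: parseInt(env.DB_POOL_MAX ?? "10", 10), // bounded pool size per instance
    connectionTimeoutMillis: 5000,              // cloud handshake budget
    ssl: {
      // Only relax certificate checks on a trusted internal network.
      rejectUnauthorized: env.DB_SSL_REJECT_UNAUTHORIZED !== "false",
    },
  };
}

// With no overrides, the documented defaults apply:
const defaults = buildPoolConfig({});
console.log(defaults.max, defaults.ssl.rejectUnauthorized); // 10 true
```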
Diagnosing pool exhaustion:
Pool exhaustion manifests as requests hanging for several seconds before completing or timing out with a "connection acquisition timeout" error. Check:
```sql
-- Check active connections
SELECT count(*), state FROM pg_stat_activity GROUP BY state;

-- Check long-running queries (running > 10 seconds)
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '10 seconds';
```
### Query Optimization and Index Recommendations
Dashboard KPI aggregation queries run across large tables (vulnerabilities, assets). These queries are filter-heavy, so adding indexes on commonly filtered columns dramatically reduces query time.
Recommended indexes (run in PostgreSQL on your tenant schema):
```sql
-- Vulnerabilities: most common filter patterns
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_severity ON vulnerabilities(severity);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_state ON vulnerabilities(state);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_plugin_id ON vulnerabilities(plugin_id);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_asset_uuid ON vulnerabilities(asset_uuid);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_first_found ON vulnerabilities(first_found);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_last_found ON vulnerabilities(last_found);

-- Assets: common filter columns
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_asset_fqdn ON assets(fqdn);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_asset_ipv4 ON assets(ipv4 text_ops);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_asset_last_seen ON assets(last_seen);

-- Composite index for dashboard severity breakdown by date range
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_vuln_severity_last_found
  ON vulnerabilities(severity, last_found DESC);
```
`CREATE INDEX CONCURRENTLY` builds the index without locking the table. Avoid `CREATE INDEX` without `CONCURRENTLY` on tables with active reads/writes.
The aggregation service (`aggregation.service.ts`) is the core of all KPI calculations; it is ~2,936 lines and runs complex multi-table queries. If you observe slow dashboard loads:

- Set the `SLOW_QUERY_LOG=true` environment variable to log queries exceeding 1 second
- Run `EXPLAIN ANALYZE` on the slow query in PostgreSQL to identify sequential scans
- Add the appropriate index before retrying
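For intuition, slow-query instrumentation of this kind can be sketched as a timing wrapper around a query function. This is a hypothetical sketch (the `withSlowQueryLog` name and `QueryFn` shape are assumptions), not ThreatWeaver's actual `SLOW_QUERY_LOG` implementation:

```typescript
// Hypothetical sketch: wrap a query function and log anything exceeding a
// threshold (the documented SLOW_QUERY_LOG behavior uses 1 second).
type QueryFn = (sql: string, params?: unknown[]) => Promise<unknown>;

function withSlowQueryLog(run: QueryFn, thresholdMs = 1000, log = console.warn): QueryFn {
  return async (sql, params) => {
    const start = Date.now();
    try {
      return await run(sql, params);
    } finally {
      const elapsed = Date.now() - start;
      if (elapsed >= thresholdMs) {
        // Truncate the SQL so huge statements don't flood the logs.
        log(`slow query (${elapsed} ms): ${sql.slice(0, 200)}`);
      }
    }
  };
}
```

Any query surfaced this way is a candidate for `EXPLAIN ANALYZE` and one of the indexes above.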
### Caching with Redis
ThreatWeaver includes an optional Redis caching layer. When Redis is unavailable, the application falls back to direct database queries; Redis is not required for operation.
When Redis helps most:
- Dashboard KPI widgets that many users load simultaneously
- Tenable sync status checks that run on every dashboard page load
- Tenant configuration lookups (permissions, feature flags)
Configuration:
| Variable | Default | Description |
|---|---|---|
| `REDIS_URL` | redis://localhost:6379 | Redis connection URL |
| `REDIS_ENABLED` | true | Set to false to disable Redis entirely |
| `CACHE_TTL` | 3600 (1 hour) | Default cache TTL in seconds |
| `API_CACHE_TTL` | 21600 (6 hours) | TTL for the API response cache (Tenable data, etc.) |
| `TENANT_CACHE_TTL_SECONDS` | 300 (5 minutes) | TTL for the tenant configuration cache |
TTL recommendations by use case:
| Data | Recommended TTL | Why |
|---|---|---|
| Dashboard KPI totals | 5–15 minutes | Data changes during Tenable syncs, not continuously |
| Tenable sync status | 30–60 seconds | Users poll this frequently |
| Tenant config / feature flags | 5 minutes (300s) | Changes are rare but should propagate within 5 minutes |
| Asset inventory | 15–30 minutes | Updated only during sync windows |
| Vulnerability trends | 1 hour | Historical data; changes infrequently |
Disabling Redis locally: Redis is not used in local development by default. The cache layer degrades gracefully: if Redis is unreachable, all cache reads return null and all writes are silently no-ops.
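The fail-open behavior described above can be sketched as a wrapper around a cache client. The `CacheClient` interface is illustrative, not ThreatWeaver's actual cache module:

```typescript
// Sketch of fail-open caching: reads return null and writes are no-ops
// whenever Redis is unreachable, so the app falls back to the database.
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

function failOpen(client: CacheClient): CacheClient {
  return {
    async get(key) {
      // A connection error is reported as a cache miss, never an error.
      try { return await client.get(key); } catch { return null; }
    },
    async set(key, value, ttl) {
      // Writes are silently dropped when the cache is down.
      try { await client.set(key, value, ttl); } catch { /* no-op */ }
    },
  };
}
```

Callers then treat `null` as "not cached" and query the database, which is why Redis outages degrade performance but never availability.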
### Sync Engine Tuning
The Tenable sync engine pulls vulnerability and asset data from Tenable.io (or Tenable.sc). Sync frequency and behavior are configurable.
Sync frequency trade-off:
| Sync interval | Freshness | DB load |
|---|---|---|
| Every 15 minutes | Near-realtime | High (heavy queries on every sync) |
| Every 1 hour | Good for most use cases | Moderate |
| Every 6 hours | Stale for fast-moving environments | Low |
| Manual only | User-controlled | Minimal |
Sync frequency is configured in Admin → Settings → Sync Configuration. For environments with large asset counts (>50,000 assets), consider:
- Scheduling syncs during off-peak hours (evenings/weekends)
- Using incremental sync mode (syncs only assets/vulns changed since last sync) rather than full sync
- Increasing `DB_POOL_MAX` before large syncs if connection timeout errors appear in the sync logs
Monitoring sync performance: Sync job duration and record counts are logged. Access them via Admin → Sync → Sync History. If a sync job consistently takes longer than 10 minutes, consider splitting by asset group or reducing sync frequency.
### Rate Limiting Configuration
The default global rate limit (100 requests/minute per IP) is appropriate for moderate-traffic deployments. High-traffic deployments with many users polling dashboards simultaneously should raise this limit.
Configuration:
| Variable | Default | Description |
|---|---|---|
| `RATE_LIMIT_WINDOW_MS` | 60000 (1 minute) | Rate limit window in milliseconds |
| `RATE_LIMIT_MAX_REQUESTS` | 100 | Maximum requests per IP per window |
Guidelines for high-traffic deployments:
- Each dashboard page load issues approximately 10–15 API requests
- A team of 20 simultaneous users can generate 200–300 requests per minute
- For 20+ concurrent users, raise `RATE_LIMIT_MAX_REQUESTS` to 300–500
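The sizing arithmetic above can be captured in a small helper: N simultaneous users at roughly 15 requests per dashboard load need about N × 15 requests of headroom per minute. `suggestedRateLimit` is purely illustrative:

```typescript
// Illustrative sizing helper based on the guidance above: ~10-15 API requests
// per dashboard load, with the documented default (100) kept as a floor.
function suggestedRateLimit(concurrentUsers: number, requestsPerLoad = 15): number {
  const expected = concurrentUsers * requestsPerLoad;
  // Round up to a clean hundred and never go below the default.
  return Math.max(100, Math.ceil(expected / 100) * 100);
}

console.log(suggestedRateLimit(20)); // 300 — matches the 300–500 guidance for 20+ users
```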
Note: The global rate limiter already exempts dashboard routes (`/api/dashboard/*`), AppSec routes, and health check endpoints from counting against the limit. Auth endpoints have their own separate stricter limiters (10 per 15 minutes) that should not be relaxed.
### Node.js Heap Settings for Large Datasets
Node.js defaults to a heap size of ~1.5 GB (on 64-bit systems). Large vulnerability datasets loaded into memory during aggregation can hit this limit.
Symptoms of heap pressure:
- Process memory continuously climbing
- Occasional `FATAL ERROR: Reached heap limit Allocation failed` crashes
- Garbage-collection pauses causing request latency spikes
Fix: increase the heap via the `NODE_OPTIONS` environment variable:
```bash
# For deployments with >100,000 vulnerabilities
NODE_OPTIONS="--max-old-space-size=4096" node dist/index.js

# For very large deployments (>500,000 vulnerabilities)
NODE_OPTIONS="--max-old-space-size=8192" node dist/index.js
```
On Render, set `NODE_OPTIONS` as an environment variable in the service settings.
A Node.js process should not need more than 4–6 GB for the application itself. If it does, investigate whether large data arrays are being held in memory. The aggregation service streams data in batches rather than loading entire tables; if you see large heap usage, check for queries that return unbounded result sets.
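The batch-streaming pattern mentioned above can be sketched as an async generator that pages through a table instead of materializing every row. `fetchPage` here is a stand-in for a real paginated query, not ThreatWeaver's actual code:

```typescript
// Sketch of heap-bounded streaming: fetch one page at a time and yield rows,
// so at most one batch is resident in memory.
async function* streamRows<T>(
  fetchPage: (offset: number, limit: number) => Promise<T[]>,
  batchSize = 1000,
): AsyncGenerator<T> {
  for (let offset = 0; ; offset += batchSize) {
    const page = await fetchPage(offset, batchSize);
    for (const row of page) yield row;
    if (page.length < batchSize) return; // short page means we reached the end
  }
}
```

An unbounded `SELECT * FROM vulnerabilities` holds every row at once; iterating `streamRows` with `for await` keeps heap usage flat regardless of table size.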
## Frontend Performance
### React Query Cache Configuration
ThreatWeaver's frontend uses TanStack Query (React Query) for server-state management. The global default configuration is:
```typescript
// frontend/src/main.tsx
const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      staleTime: 5 * 60 * 1000, // 5 minutes
    },
  },
})
```
The `staleTime` of 5 minutes means cached data is used without a refetch for up to 5 minutes after it was last fetched. This is the most impactful setting for perceived frontend performance.
Per-hook overrides: Some hooks override the global default for data that changes more frequently:

- AI Security data: 30-second `staleTime` (changes frequently during active sessions)
- AppSec assessment data: 5-minute `staleTime` (scans are long-running, so there is no benefit to frequent refetching)
Tuning recommendations:
| Scenario | Recommendation |
|---|---|
| Users complain dashboard data is stale | Reduce the global `staleTime` to 2–3 minutes |
| High API traffic from dashboard polling | Increase the global `staleTime` to 10–15 minutes |
| Real-time scan monitoring needed | Use `refetchInterval: 10000` (10 s) on scan-specific queries |
To change the global default, edit `frontend/src/main.tsx` and redeploy.
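For the real-time monitoring case, a per-query override layered on the global default might look like the options object below. The query key, endpoint, and `scanStatusQueryOptions` name are hypothetical; the `staleTime`/`refetchInterval` values follow the table above:

```typescript
// Sketch of per-query options for real-time scan monitoring, as passed to
// TanStack Query's useQuery. Key and values are illustrative.
function scanStatusQueryOptions(scanId: string) {
  return {
    queryKey: ["scan-status", scanId],
    staleTime: 0,            // always considered stale: refetch on mount/focus
    refetchInterval: 10_000, // poll every 10 s while the component is mounted
  };
}
```

Keeping the aggressive settings on scan-status queries only (rather than lowering the global `staleTime`) avoids extra traffic from every other dashboard widget.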
### Pagination and Filtering
All large data lists in ThreatWeaver (assets, vulnerabilities, findings) are server-side paginated. The UI never loads the full dataset into memory.
Performance guidelines:
- Always apply filters before exporting; exporting a filtered set is significantly faster than exporting all records and filtering locally
- The default page size is 50 records. Increasing the page size (up to 200) reduces the number of API calls but increases per-request data transfer
- The saved-filters feature allows users to bookmark commonly used filter combinations; encourage teams to use it rather than manually resetting filters on every visit
If pagination appears slow, check:
- Whether a sort column has an index (sorting on unindexed columns requires a full table scan)
- Whether the filter combination is selective; broad filters (e.g., `severity IS NOT NULL`) scan nearly the full table
### Large Dataset Exports
Exports of large datasets (>10,000 records) should use the async export endpoint (`/api/export`), not the inline export buttons in the UI.
The async export flow:
1. The user submits an export job via the Export panel or API
2. The backend processes the export asynchronously (it does not hold an HTTP connection open)
3. When complete, a download link is delivered via the notification system
4. The download is available for 24 hours
Attempting to export very large datasets synchronously (via the direct export buttons) can result in HTTP timeouts if the export takes longer than 60–120 seconds (the timeout of the CDN/proxy in front of the backend).
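Driving the async flow from a script can be sketched as submit-then-poll. The job-status response shape and the per-job status path are assumptions (only `/api/export` comes from this guide):

```typescript
// Sketch of the async export flow: submit a job, then poll until complete.
// FetchLike lets a test (or real fetch) be injected.
type FetchLike = (url: string, init?: { method?: string }) => Promise<{ json(): Promise<any> }>;

async function runExport(http: FetchLike, pollMs = 5000, maxPolls = 120): Promise<string> {
  const job = await (await http("/api/export", { method: "POST" })).json();
  for (let i = 0; i < maxPolls; i++) {
    const status = await (await http(`/api/export/${job.id}`)).json();
    if (status.state === "complete") return status.downloadUrl; // link also arrives via notifications
    if (status.state === "failed") throw new Error(status.error);
    await new Promise((r) => setTimeout(r, pollMs));
  }
  throw new Error("export timed out");
}
```

Because no single HTTP request outlives the 60–120 second proxy timeout, this pattern works for arbitrarily large exports.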
## AppSec Scanner Performance
### Scan Concurrency
The AppSec scanner runs multiple agent jobs in parallel per assessment. Two concurrency settings control throughput:
Assessment-level concurrency (`assessmentQueue.service.ts`):

- Default: 3 parallel agent jobs per assessment
- Controls how many scanner agents (e.g., `sqliTester`, `xssTester`, `idorFinder`) run simultaneously
- Raising this value speeds up scans but increases load on both the target application and the ThreatWeaver backend

Crawler concurrency (`crawlerEngine.service.ts`):

- Default: 5 parallel requests during endpoint discovery/crawling
- Controls how many HTTP requests the crawler makes in parallel to the target
These defaults are intentionally conservative to avoid overwhelming scan targets. They can be adjusted per-assessment in the assessment configuration (Advanced Settings).
Guidelines:
| Target environment | Recommended agent concurrency | Recommended crawler concurrency |
|---|---|---|
| Production API (rate-limited) | 1–2 | 2–3 |
| Staging/test environment (no rate limits) | 3–5 | 5–10 |
| Local development target | 5–10 | 10–15 |
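Bounded parallelism of the kind the assessment queue applies can be sketched as a worker-pool runner: at most `limit` jobs run at once, and the next starts as each finishes. This is an illustrative pattern, not ThreatWeaver's actual queue implementation:

```typescript
// Sketch of bounded parallelism: run tasks with at most `limit` in flight.
async function runWithLimit<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;            // claim the next task index (single-threaded JS)
      results[i] = await tasks[i]();
    }
  }
  // Spawn `limit` workers; each pulls tasks until the list is exhausted.
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}
```

With `limit = 3` this mirrors the default of 3 parallel agent jobs; raising it trades target/backend load for wall-clock time, as the table above suggests.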
### Limiting Scan Scope
Scan duration is directly proportional to the number of endpoints tested. The most effective performance optimization is reducing scope:
- Specify `priorityEndpoints` in the assessment's `testDataHints` to focus agents on high-value endpoints (authenticated, parameterized, business-critical)
- Exclude static assets: images, CSS, and JavaScript files do not need to be tested for injection vulnerabilities
- Use endpoint filtering: if the target has hundreds of endpoints, filter by path prefix (e.g., `/api/v1/` for API-only testing)
- Disable agents not relevant to the target: for example, if the target has no file upload endpoints, disabling the file upload agent saves time
A well-scoped scan of a REST API with 30–50 endpoints typically completes in 10–20 minutes. An unscoped scan of a full web application with 300+ endpoints can take several hours.
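A back-of-envelope estimate consistent with those figures: scan time grows with endpoint count and shrinks with agent concurrency. The per-endpoint cost below is an assumption for illustration only; real durations vary widely with agent mix and target latency.

```typescript
// Illustrative estimate: total endpoint-seconds of testing divided across
// parallel agents. secondsPerEndpoint is an assumed aggregate cost, not a
// measured ThreatWeaver constant.
function estimateScanMinutes(
  endpoints: number,
  agentConcurrency = 3,
  secondsPerEndpoint = 60,
): number {
  return Math.round((endpoints * secondsPerEndpoint) / agentConcurrency / 60);
}

console.log(estimateScanMinutes(45)); // 15 — inside the 10–20 minute range for 30–50 endpoints
```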
### AI Validation Overhead
ThreatWeaver's AppSec scanner uses Claude AI (via Anthropic API) to validate potential findings before surfacing them as confirmed vulnerabilities. This validation step:
- Reduces false positives significantly (typically from ~40% FP rate to under 15%)
- Adds latency: each AI validation call takes 2–8 seconds depending on the finding's complexity
When to reduce AI validation:
- Speed-priority scans: Disable AI validation in the assessment Advanced Settings for a first-pass scan, then enable for the final verification scan
- High-confidence finding types: SQL injection with extractable data and blind XXE with OOB callbacks rarely need AI re-confirmation
- Large scan volumes: if you run 10+ scans per day, AI API costs and latency compound. Consider AI validation only for `HIGH` and `CRITICAL` severity findings
When to keep AI validation:
- Final scan before a report or customer delivery
- IDOR and BOLA testing: these require context-aware judgment that rule-based checks cannot provide reliably
- Any scan being used for compliance evidence
To disable AI validation per-assessment, navigate to the assessment configuration and uncheck Enable AI Validation in Advanced Settings.
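The severity-gating suggestion above can be expressed as a small predicate. The `Severity` labels and `shouldAiValidate` function are illustrative, not ThreatWeaver's actual finding model:

```typescript
// Sketch of gating AI validation by severity, as suggested for large scan
// volumes: validate only the findings where a false positive is most costly.
type Severity = "INFO" | "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

function shouldAiValidate(severity: Severity, aiEnabled: boolean): boolean {
  if (!aiEnabled) return false; // assessment-level toggle wins
  return severity === "HIGH" || severity === "CRITICAL";
}
```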
## Database Maintenance
### Archiving Old Scan Data
AppSec scan data (assessments, findings, raw HTTP logs) accumulates over time and is the primary driver of database growth. ThreatWeaver provides an archiving tool in Admin → Archives.
Recommended archive schedule:
- Archive assessments older than 90 days (completed scans)
- Retain the last 3 assessments per target unconditionally (for trend comparison)
- Archive raw HTTP request/response logs for assessments older than 30 days (these are large)
Archives are stored in a compressed format and can be exported or permanently deleted. They do not count against the active scan data for dashboard calculations.
### Vacuuming PostgreSQL Tables
PostgreSQL requires periodic VACUUM to reclaim space from deleted/updated rows and update query planner statistics. Most cloud-hosted PostgreSQL providers (Supabase, RDS, etc.) handle this automatically via autovacuum. For self-hosted PostgreSQL, verify autovacuum is enabled:
```sql
-- Check autovacuum status
SHOW autovacuum;

-- Check tables that need manual vacuuming (dead tuples > 10% of live)
SELECT relname, n_live_tup, n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
       last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_pct DESC;

-- Manually vacuum a high-churn table (does not lock the table)
VACUUM ANALYZE tenant_acme.vulnerabilities;

-- Full vacuum + reindex (locks the table; run during a maintenance window)
VACUUM FULL ANALYZE tenant_acme.vulnerabilities;
```
High-churn tables in ThreatWeaver deployments with frequent syncs:

- `vulnerabilities`: updated on every sync
- `assets`: updated on every sync
- `pentest_findings`: updated throughout the scan lifecycle
- `security_audit_logs`: appended on every security event
For most deployments, autovacuum handles this automatically. Manual intervention is only needed if you see database size growing unexpectedly or query performance degrading after a large sync.
### Snapshot Integrity Checks
ThreatWeaver periodically takes internal snapshots of dashboard state for trend calculations. If you observe incorrect trend data (e.g., "new vulnerabilities this week" showing 0 when new vulnerabilities exist):
1. Navigate to Admin → Sync → Reconciliation
2. Run Force Reconciliation; this triggers a fresh snapshot comparison
3. If the issue persists, check the `sync_jobs` table for failed or stuck jobs:
```sql
SELECT id, status, started_at, completed_at, error_message
FROM sync_jobs
ORDER BY started_at DESC
LIMIT 20;
```
Stuck jobs (`status = 'running'` with `started_at` older than 2 hours) can be reset by updating their status:
```sql
UPDATE sync_jobs
SET status = 'failed', error_message = 'Manually reset - stuck job'
WHERE status = 'running' AND started_at < NOW() - INTERVAL '2 hours';
```
## Monitoring
Track the following metrics to detect performance problems early:
### Key Metrics to Watch
| Metric | Healthy range | Alert threshold |
|---|---|---|
| API response time (p95) | under 500ms | over 2000ms |
| Dashboard widget load time | under 2s | over 5s |
| Tenable sync duration | under 5 min for under 10K assets | over 15 min |
| DB connection pool utilization | under 70% of `DB_POOL_MAX` | over 90% |
| DB active queries | under 5 concurrent | over 20 concurrent |
| Node.js heap used | under 75% of `--max-old-space-size` | over 90% |
| Redis cache hit rate | over 80% | under 50% |
| Sync failure rate | 0% | Any failures |
| AppSec scan queue depth | under 10 pending | over 50 pending |
### Where to Access Metrics
- Backend health and memory: `GET /health` returns Node.js process stats, database connection status, and Redis status
- Sync status and history: Admin → Sync → Sync History
- Database query stats: PostgreSQL `pg_stat_statements` (if enabled) and `pg_stat_activity`
- Render (production): the Render dashboard's Service → Metrics panel shows CPU, memory, and request throughput
- Security events: Admin → Security → Audit Log
### Setting Up Alerts
For production deployments, configure external uptime monitoring:
- Uptime check: `GET /health` should return `200 OK` within 5 seconds
- Sync lag check: `GET /api/sync/status` (authenticated); verify `lastSyncCompletedAt` is within the expected sync frequency
- Database connectivity: included in the `/health` response; a failed DB connection returns a degraded health status
## Capacity Planning
Use these rough guidelines to estimate infrastructure requirements:
### Assets and Vulnerabilities
| Asset count | Vulnerability count | Recommended DB storage | Recommended Node.js RAM |
|---|---|---|---|
| < 5,000 | < 100,000 | 10–20 GB | 512 MB – 1 GB |
| 5,000 – 25,000 | 100,000 – 500,000 | 20–100 GB | 1–2 GB |
| 25,000 – 100,000 | 500,000 – 2,000,000 | 100–400 GB | 2–4 GB |
| > 100,000 | > 2,000,000 | 400 GB+ | 4–8 GB |
### Scans per Day
| Scans/day (AppSec) | Recommended instance size | Notes |
|---|---|---|
| < 5 | 1 vCPU, 1 GB RAM | Starter; adequate for dev/testing |
| 5–25 | 2 vCPU, 2 GB RAM | Small team |
| 25–100 | 4 vCPU, 4–8 GB RAM | Mid-size security team |
| > 100 | 8 vCPU, 16 GB RAM; consider horizontal scaling | Enterprise / automated pipeline |
### Tenants (SaaS Mode)
| Tenant count | Recommended DB | Notes |
|---|---|---|
| < 10 | Supabase Pro (8 GB) | Default for new deployments |
| 10–50 | Dedicated PostgreSQL, 16 GB+ | Consider a connection pooler (PgBouncer) |
| 50–200 | Dedicated PostgreSQL, 32 GB+; PgBouncer required | Monitor schema count and `DB_POOL_MAX` |
| > 200 | Evaluate database sharding or read replicas | Consult BluCypher for architecture guidance |
Each tenant schema adds ~134 tables to the PostgreSQL catalog. At 200+ tenants, catalog queries (e.g., `information_schema.tables`) can become slow. This does not affect tenant data queries, but does affect internal tooling.
### Concurrent Users
| Concurrent users | Expected requests/min | Recommended `RATE_LIMIT_MAX_REQUESTS` | Recommended `DB_POOL_MAX` |
|---|---|---|---|
| < 10 | < 150 | 100 (default) | 10 (default) |
| 10–50 | 150–750 | 300–500 | 15–25 |
| 50–200 | 750–3,000 | 1,000+ | 25–50 |
| > 200 | > 3,000 | Horizontal scaling | Multiple instances |
For deployments expecting > 200 concurrent users, horizontal scaling (multiple backend instances behind a load balancer) is recommended over simply raising limits on a single instance.
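When planning horizontal scaling, remember that every backend instance holds its own pool, so the database sees up to instances × `DB_POOL_MAX` connections. The helpers below are illustrative; the 80% headroom figure is a common safety margin, not a ThreatWeaver requirement:

```typescript
// Illustrative capacity arithmetic for multi-instance deployments.
function totalDbConnections(instances: number, poolMaxPerInstance: number): number {
  return instances * poolMaxPerInstance;
}

function fitsWithinPostgres(instances: number, poolMax: number, maxConnections: number): boolean {
  // Leave ~20% of max_connections free for superuser sessions and tooling.
  return totalDbConnections(instances, poolMax) <= Math.floor(maxConnections * 0.8);
}

console.log(totalDbConnections(3, 10));        // 30 — three instances at the default pool size
console.log(fitsWithinPostgres(3, 10, 100));   // true — 30 fits within an 80-connection budget
```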