Synadia Insights
Troubleshooting
Common problems and how to diagnose them. When you're investigating an issue, check the server logs first. Insights logs every significant event and lifecycle decision at `info` or `warn`. Start the server with `--log-level debug` for finer-grained output.
Scrape Failures
The scraper connects to the target NATS system using the `--sys.*` credentials and publishes collected monitoring data to the indexer over NATS. A failure anywhere in that pipeline surfaces in the logs with the word `scraper` or `scrape`.
Symptoms
- Log line `scraper setup failed, retrying` repeats at growing backoff intervals.
- Web UI shows a stale "Last epoch" timestamp.
Common causes
| Cause | Diagnostic | Fix |
|---|---|---|
| Target NATS unreachable | `connect to scrape target` in logs | Verify `--sys.server`, firewall rules, TLS cert chain. Test with `nats --context=<ctx> server check` from the same host. |
| Invalid system credentials | `authorization violation` or `nkey unknown` in logs | Ensure `--sys.creds` points at a system-account credentials file and that the account has `$SYS` access on the target. |
| Scraper already running | `scrape already in progress` | The previous scrape is still in flight. Either the previous cycle is hung (check target NATS health) or the scrape interval is too aggressive for the system size. |
| License expired | `scraper not reconnecting: license expired` | Refresh the license JWT. See Installation › Configure a License. |
What the retry loop does
When setup fails, the scraper retries with exponential backoff up to a maximum interval. Failures don't crash the process. Insights keeps the web UI, API, and indexer running so queries against historical data still work.
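As an illustration, the retry behavior can be sketched in a few lines of Python. The base delay and cap used by Insights are internal details, so the numbers below are assumptions, not documented defaults:

```python
import time


def retry_with_backoff(setup, base=1.0, cap=300.0, sleep=time.sleep):
    """Retry setup() until it succeeds, doubling the wait up to a cap.

    Illustrative sketch only: the real base delay and maximum interval
    used by the Insights scraper are internal implementation details.
    """
    delay = base
    while True:
        try:
            return setup()
        except Exception as exc:
            print(f"scraper setup failed, retrying in {delay:.0f}s: {exc}")
            sleep(delay)
            # Exponential backoff, capped at the maximum interval.
            delay = min(delay * 2, cap)
```

The important property is the one described above: a failed setup never crashes the process, it only delays the next attempt.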
License Validation Errors
Insights validates the license JWT at startup (skipped in simulator mode).
- `no license token provided`: Neither `--license.token` nor `--license.file` was supplied. Set one of them. For evaluation, use `--simulator.enabled` instead.
- `invalid license: token is malformed`: The string you provided isn't a valid JWT. Copy-paste the token again, with no stray whitespace or line breaks.
- `license does not include an Insights entitlement`: The JWT is signed correctly, but it wasn't issued for Insights. Confirm the license was provisioned by Synadia specifically for Insights (the entitlement is called `insights`).
- `license check failed: invalid license: token has invalid claims: token is expired`: The license is past its expiry. Check the `expires_at` logged at startup, renew by contacting Synadia at https://www.synadia.com/contact, and restart with the new token.
- Clock skew: JWT validation rejects tokens issued in the future. If the host clock is wrong, run an NTP sync and try again.
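To pre-check a token's time claims before restarting the server, a stdlib-only sketch like the following decodes the payload and applies the same expiry and clock-skew logic. It does not verify the signature or the entitlement, and `leeway` is an assumed tolerance, not an Insights setting:

```python
import base64
import json
import time


def check_token_times(token: str, leeway: int = 60) -> None:
    """Reject a JWT whose exp is in the past or whose iat is in the future.

    Sketch only: checks the standard time claims but does NOT verify
    the signature or the Insights entitlement.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    now = time.time()
    if "exp" in claims and now > claims["exp"] + leeway:
        raise ValueError("token is expired")
    if "iat" in claims and claims["iat"] > now + leeway:
        raise ValueError("token issued in the future (check host clock)")
```

If the second check fires on a token the issuer considers valid, suspect the host clock rather than the license.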
Disk and Memory Issues
Insights stores historical data in DuckDB (`<data-dir>/insights.db`) and uses JetStream streams under `<data-dir>/nats/` as the scrape message bus.
Disk growth
Historical data grows roughly linearly over time, at a rate determined by the scrape interval and the size of the monitored system. The retention sweeper deletes epochs older than `--db.retention.duration` on each sweep, but the database file isn't physically compacted: DuckDB reuses freed blocks for new writes, so the file size is a high-water mark, not the current footprint.
- Set `--db.retention.duration` (default `768h`, 32 days; `0` disables retention). For example, `--db.retention.duration 168h` keeps a week of history.
- `--db.retention.interval` controls how often the sweep runs. A larger interval smooths out sweep cost.
- To reclaim disk space right away, take a backup (`insights backup`) and restart Insights pointing at the backup file as the new `--data-dir`.
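Put together, a retention setup that keeps one week of history might look like this. The flag values, and the use of a duration value for `--db.retention.interval`, are illustrative assumptions, not recommended defaults:

```shell
# Keep 7 days of history and sweep for expired epochs periodically.
# Values are illustrative; tune them to your system size.
insights \
  --db.retention.duration 168h \
  --db.retention.interval 1h
```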
Memory pressure
- `--db.memory-limit` caps DuckDB's in-memory buffer (default `4GB`). On a shared host, lower it so the OS OOM killer doesn't target Insights.
- Large ad-hoc queries (especially an unbounded `SELECT * FROM` against high-cardinality tables like `hx.connections`) can blow past the limit. Add a `WHERE epoch > ...` filter to narrow the scan.
- Backups temporarily increase disk use while the snapshot is being prepared. Plan for roughly 2x the current database size during a backup window.
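For example, a bounded exploration query along these lines keeps the scan small. The `epoch` column type and the `hx.connections` layout are assumptions; check the actual schema first:

```sql
-- Explore recent connection data without scanning all of history.
-- Assumes epoch is a timestamp column; adjust to the real schema.
SELECT *
FROM hx.connections
WHERE epoch > now() - INTERVAL 1 HOUR
LIMIT 100;
```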
Missing Data
If the web UI reports fewer epochs than expected, or gaps appear in sparklines:
- Check retention: epochs older than `--db.retention.duration` are intentionally deleted.
- Check for scraper downtime in the logs. Each gap in the timeline corresponds to a stretch where no scrape cycle completed.
- Verify the scraper is still advancing epochs: the "Last epoch" timestamp in the web UI should stay close to the scrape interval. A growing gap means no new epoch has been processed.
- If the scraper recently reconnected, it resumes from the latest available epoch. Historical data during the outage is not backfilled.
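One way to spot gaps directly in SQL is to compare each epoch with its predecessor. This is a sketch: it assumes the `hx.servers` view exposes a timestamp-typed `epoch` column, and the gap you consider suspicious should match your scrape interval:

```sql
-- List distinct epochs alongside the distance to the previous one;
-- rows with a gap much larger than the scrape interval mark downtime.
SELECT epoch,
       epoch - LAG(epoch) OVER (ORDER BY epoch) AS gap
FROM (SELECT DISTINCT epoch FROM hx.servers)
ORDER BY epoch;
```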
Web UI Not Loading
| Symptom | Likely cause | Resolution |
|---|---|---|
| Connection refused | --web.hostname binds to 127.0.0.1 by default | For remote access, set --web.hostname 0.0.0.0 or a specific interface. |
| Port already in use | Another process on --web.port | Change the port or stop the conflicting process. |
| Blank page, console errors | Stale browser cache after an upgrade | Hard-reload (Shift-click the reload button) or clear cache. |
| Self-signed TLS warning | Auto-generated cert on first run | Accept once for a test deployment, or provide --web.tls-cert and --web.tls-key. |
| 401 / redirect loop | Authentication configured but session seed changed | Clear browser cookies for the Insights host. Ensure --web.session-seed is stable across restarts in production. |
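For a remotely reachable, TLS-terminated deployment, the relevant flags from the table combine like this. The paths and port number are examples, not documented defaults:

```shell
# Bind to all interfaces and serve the UI with a real certificate.
# Paths and port are illustrative.
insights \
  --web.hostname 0.0.0.0 \
  --web.port 8443 \
  --web.tls-cert /etc/insights/tls/cert.pem \
  --web.tls-key /etc/insights/tls/key.pem
```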
Slow Queries
If SQL queries via `insights query` or the web UI feel slow:
- Add an `epoch` predicate. Queries that scan all of history are rarely what you want.
- For aggregates, use the pre-joined `hx` views (`hx.servers`, `hx.connections`, `hx.streams`, and so on) rather than the raw `_stats` tables, unless you need performance-sensitive access to the raw data.
- If the indexer is saturated, new queries compete with ingest for CPU and memory. Watch for slow ingest cycles in the logs.
- Increase `--db.memory-limit` if the query plan is spilling to disk (you'll see it in DuckDB's query log).
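As a concrete sketch, an aggregate restricted to the last day might look like this. The column names are assumptions about the `hx.servers` view, not its documented schema:

```sql
-- Aggregate over the pre-joined view, scanning only the last day.
-- Column names are assumed; check the actual hx.servers schema.
SELECT server_name, avg(num_connections) AS avg_conns
FROM hx.servers
WHERE epoch > now() - INTERVAL 1 DAY
GROUP BY server_name;
```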