Synadia Insights
Troubleshooting
Common problems and how to diagnose them. When you're investigating an issue, check the server logs first. Insights logs every significant event and lifecycle decision at `info` or `warn`. Start the server with `--log-level debug` for finer-grained output.
Scrape Failures
The scraper connects to the target NATS system using the `--sys.*` credentials and publishes collected monitoring data to the indexer over NATS. A failure anywhere in that pipeline surfaces in the logs with the word `scraper` or `scrape`.
Symptoms
- Log line `scraper setup failed, retrying` repeats at growing backoff intervals.
- Web UI shows a stale "Last epoch" timestamp.
Common causes
| Cause | Diagnostic | Fix |
|---|---|---|
| Target NATS unreachable | `connect to scrape target` in logs | Verify `--sys.server`, firewall rules, TLS cert chain. Test with `nats --context=<ctx> server check` from the same host. |
| Invalid system credentials | `authorization violation` or `nkey unknown` in logs | Ensure `--sys.creds` points at a system-account credentials file and that the account has `$SYS` access on the target. |
| Scraper already running | `scrape already in progress` | The previous scrape is still in flight. Either the previous cycle is hung (check target NATS health) or the scrape interval is too aggressive for the system size. |
| License expired | `scraper not reconnecting: license expired` | Refresh the license JWT. See Installation › Configure a License. |
What the retry loop does
When setup fails, the scraper retries with exponential backoff up to a maximum interval. Failures don't crash the process. Insights keeps the web UI, API, and indexer running so queries against historical data still work.
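As an illustration, the retry behavior can be sketched in a few lines of Python. The base delay and cap used by Insights are internal details, so the numbers below are assumptions, not documented defaults:

```python
import time


def retry_with_backoff(setup, base=1.0, cap=300.0, sleep=time.sleep):
    """Retry setup() until it succeeds, doubling the wait up to a cap.

    Illustrative sketch only: the real base delay and maximum interval
    used by the Insights scraper are internal implementation details.
    """
    delay = base
    while True:
        try:
            return setup()
        except Exception as exc:
            print(f"scraper setup failed, retrying in {delay:.0f}s: {exc}")
            sleep(delay)
            # Exponential backoff, capped at the maximum interval.
            delay = min(delay * 2, cap)
```

The important property is the one described above: a failed setup never crashes the process, it only delays the next attempt.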
License Validation Errors
Insights validates the license JWT at startup (skipped in simulator mode).
- `no license token provided`: Neither `--license.token` nor `--license.file` was supplied. Set one of them. For evaluation, use `--simulator.enabled` instead.
- `invalid license: token is malformed`: The string you provided isn't a valid JWT. Copy-paste the token again, with no stray whitespace or line breaks.
- `license does not include an Insights entitlement`: The JWT is signed correctly, but it wasn't issued for Insights. Confirm the license was provisioned by Synadia specifically for Insights (the entitlement is called `insights`).
- `license check failed: invalid license: token has invalid claims: token is expired`: The license is past its expiry. Check the `expires_at` logged at startup, renew by contacting Synadia at https://www.synadia.com/contact, and restart with the new token.
- Clock skew: JWT validation rejects tokens issued in the future. If the host clock is wrong, run an NTP sync and try again.
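To pre-check a token's time claims before restarting the server, a stdlib-only sketch like the following decodes the payload and applies the same expiry and clock-skew logic. It does not verify the signature or the entitlement, and `leeway` is an assumed tolerance, not an Insights setting:

```python
import base64
import json
import time


def check_token_times(token: str, leeway: int = 60) -> None:
    """Reject a JWT whose exp is in the past or whose iat is in the future.

    Sketch only: checks the standard time claims but does NOT verify
    the signature or the Insights entitlement.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    now = time.time()
    if "exp" in claims and now > claims["exp"] + leeway:
        raise ValueError("token is expired")
    if "iat" in claims and claims["iat"] > now + leeway:
        raise ValueError("token issued in the future (check host clock)")
```

If the second check fires on a token the issuer considers valid, suspect the host clock rather than the license.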
Disk and Memory Issues
Insights stores historical data in DuckDB (`<data-dir>/insights.db`) and uses JetStream streams under `<data-dir>/nats/` as the scrape message bus.
Disk growth
Historical data grows roughly linearly over time, at a rate determined by the scrape interval and the size of the monitored system. The retention sweeper deletes epochs older than `--db.retention.duration` on each sweep, but the database file isn't physically compacted: DuckDB reuses freed blocks for new writes, so the file size is a high-water mark, not the current footprint.
- Set `--db.retention.duration` (default `768h`, 32 days; `0` disables retention). For example, `--db.retention.duration 168h` keeps a week of history.
- `--db.retention.interval` controls how often the sweep runs. A larger interval smooths out sweep cost.
- To reclaim disk space right away, take a backup (`insights backup`) and restart Insights pointing at the backup file as the new `--data-dir`.
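Put together, a retention setup that keeps one week of history might look like this. The flag values, and the use of a duration value for `--db.retention.interval`, are illustrative assumptions, not recommended defaults:

```shell
# Keep 7 days of history and sweep for expired epochs periodically.
# Values are illustrative; tune them to your system size.
insights \
  --db.retention.duration 168h \
  --db.retention.interval 1h
```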
Memory pressure
- `--db.memory-limit` caps DuckDB's in-memory buffer (default `4GB`). On a shared host, lower it so the OS OOM killer doesn't target Insights.
- Large ad-hoc queries (especially an unbounded `SELECT * FROM` against high-cardinality tables like `hx.connections`) can blow past the limit. Add a `WHERE epoch > ...` filter to narrow the scan.
- Backups temporarily increase disk use while the snapshot is being prepared. Plan for roughly 2x the current database size during a backup window.
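For example, a bounded exploration query along these lines keeps the scan small. The `epoch` column type and the `hx.connections` layout are assumptions; check the actual schema first:

```sql
-- Explore recent connection data without scanning all of history.
-- Assumes epoch is a timestamp column; adjust to the real schema.
SELECT *
FROM hx.connections
WHERE epoch > now() - INTERVAL 1 HOUR
LIMIT 100;
```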
Missing Data
If the web UI reports fewer epochs than expected, or gaps appear in sparklines:
- Check retention: epochs older than `--db.retention.duration` are intentionally deleted.
- Check for scraper downtime in the logs. Each gap in the timeline corresponds to a stretch where no scrape cycle completed.
- Verify the scraper is still advancing epochs: the "Last epoch" timestamp in the web UI should stay close to the scrape interval. A growing gap means no new epoch has been processed.
- If the scraper recently reconnected, it resumes from the latest available epoch. Historical data during the outage is not backfilled.
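One way to spot gaps directly in SQL is to compare each epoch with its predecessor. This is a sketch: it assumes the `hx.servers` view exposes a timestamp-typed `epoch` column, and the gap you consider suspicious should match your scrape interval:

```sql
-- List distinct epochs alongside the distance to the previous one;
-- rows with a gap much larger than the scrape interval mark downtime.
SELECT epoch,
       epoch - LAG(epoch) OVER (ORDER BY epoch) AS gap
FROM (SELECT DISTINCT epoch FROM hx.servers)
ORDER BY epoch;
```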
Web UI Not Loading
| Symptom | Likely cause | Resolution |
|---|---|---|
| Connection refused | --web.hostname binds to 127.0.0.1 by default | For remote access, set --web.hostname 0.0.0.0 or a specific interface. |
| Port already in use | Another process on --web.port | Change the port or stop the conflicting process. |
| Blank page, console errors | Stale browser cache after an upgrade | Hard-reload (Shift-click the reload button) or clear cache. |
| Self-signed TLS warning | Auto-generated cert on first run | Accept once for a test deployment, or provide --web.tls-cert and --web.tls-key. |
| 401 / redirect loop | Authentication configured but session seed changed | Clear browser cookies for the Insights host. Ensure --web.session-seed is stable across restarts in production. |
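For a remotely reachable, TLS-terminated deployment, the relevant flags from the table combine like this. The paths and port number are examples, not documented defaults:

```shell
# Bind to all interfaces and serve the UI with a real certificate.
# Paths and port are illustrative.
insights \
  --web.hostname 0.0.0.0 \
  --web.port 8443 \
  --web.tls-cert /etc/insights/tls/cert.pem \
  --web.tls-key /etc/insights/tls/key.pem
```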
Slow Queries
If SQL queries via `insights query` or the web UI feel slow:
- Add an `epoch` predicate. Queries that scan all of history are rarely what you want.
- For aggregates, use the pre-joined `hx` views (`hx.servers`, `hx.connections`, `hx.streams`, and so on) rather than the raw `_stats` tables, unless you need performance-sensitive access to the raw data.
- If the indexer is saturated, new queries compete with ingest for CPU and memory. Watch for slow ingest cycles in the logs.
- Increase `--db.memory-limit` if the query plan is spilling to disk (you'll see it in DuckDB's query log).
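As a concrete sketch, an aggregate restricted to the last day might look like this. The column names are assumptions about the `hx.servers` view, not its documented schema:

```sql
-- Aggregate over the pre-joined view, scanning only the last day.
-- Column names are assumed; check the actual hx.servers schema.
SELECT server_name, avg(num_connections) AS avg_conns
FROM hx.servers
WHERE epoch > now() - INTERVAL 1 DAY
GROUP BY server_name;
```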