# Collector Troubleshooting
This page covers diagnosing and resolving issues with the LakeSentry collector. For initial setup, see Collector Deployment.
## Quick diagnosis

Start here when something looks wrong:
| Symptom | Likely cause | Jump to |
|---|---|---|
| Region connector stuck on “Pending” | Collector never ran or failed on first run | First run failures |
| Region health shows “Warning” (yellow) | Collector hasn’t pushed data in 2+ hours | Missed runs |
| Region health shows “Error” (red) | No data in 30+ hours, or validation error | Connection errors |
| Job run fails in Databricks | Collector error during extraction | Extraction errors |
| Data appears stale or incomplete | Checkpoint issues or missing table permissions | Checkpoint issues |
| Some tables missing from “Tables received” | Permission not granted for those tables | Missing tables |
## First run failures

If the region connector never moves past Pending status:
- Check the Databricks job — Go to Workflows > Jobs and verify the collector job exists and has run at least once.
- Check the job run status — Click the most recent run. If it failed, read the error output.
- Verify the connection string — The connection string is consumed during `configure`. If configuration failed, generate a new connection string and re-run the `configure` step.
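If the `configure` step keeps rejecting the string, you can sanity-check it from a notebook before re-running. This is a minimal sketch that assumes the connection string is base64-encoded JSON; the field names it checks are illustrative, not a documented schema:

```python
import base64
import json

def inspect_connection_string(conn_str: str) -> None:
    """Rough sanity check for a LakeSentry connection string.

    Assumes the string is base64-encoded JSON; the field names below
    are examples only, not the documented schema.
    """
    try:
        decoded = base64.b64decode(conn_str, validate=True)
    except Exception as exc:
        print(f"Not valid base64: {exc}")
        return
    try:
        payload = json.loads(decoded)
    except json.JSONDecodeError as exc:
        print(f"Decoded, but not valid JSON: {exc}")
        return
    # Example fields only; check whatever your error message says is missing.
    for field in ("api_url", "connector_id", "token"):
        print(f"{field}: {'present' if field in payload else 'MISSING'}")
```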
Common first-run errors:
| Error | Cause | Fix |
|---|---|---|
| Connection string is not valid base64 | Malformed or truncated connection string | Generate a new connection string from the region connector |
| Connection string is missing required fields | Incomplete or expired connection string | Generate a new connection string |
| LakeSentry API is not reachable | Network connectivity issue or invalid API URL | Verify the workspace has outbound HTTPS access to LakeSentry |
| Module not found: lakesentry_collector | Wheel not installed or wrong path | Check the wheel file path and re-install |
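For the module-not-found error specifically, a quick check from a notebook attached to the collector's cluster confirms whether the wheel is actually importable:

```python
import importlib.util

# Returns None if the package is not installed on this cluster.
spec = importlib.util.find_spec("lakesentry_collector")
if spec is None:
    print("lakesentry_collector is not installed on this cluster; "
          "re-install the wheel and restart the cluster.")
else:
    print(f"lakesentry_collector found at {spec.origin}")
```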
## Connection errors

### Network connectivity

The collector needs outbound HTTPS access to api.lakesentry.io (or your tenant’s API endpoint). If your Databricks workspace uses a private network or firewall:
- Add `*.lakesentry.io` to the allowlist for outbound HTTPS (port 443).
- If using a proxy, configure the `HTTPS_PROXY` environment variable in the cluster configuration.
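A quick reachability check from a notebook on the collector's cluster can confirm whether outbound HTTPS works. The URL below is the default endpoint; substitute your tenant's API URL if it differs:

```python
import os
import urllib.error
import urllib.request

# Default endpoint; substitute your tenant's API URL if it differs.
api_url = "https://api.lakesentry.io"

# If the cluster routes through a proxy, urllib honors HTTPS_PROXY automatically.
print("HTTPS_PROXY =", os.environ.get("HTTPS_PROXY", "<not set>"))

try:
    with urllib.request.urlopen(api_url, timeout=10) as resp:
        print(f"Reachable: HTTP {resp.status}")
except urllib.error.HTTPError as exc:
    # An HTTP error response still proves the endpoint is reachable.
    print(f"Reachable: HTTP {exc.code}")
except Exception as exc:
    print(f"Not reachable from this cluster: {exc}")
```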
### Authentication failures

If the collector was working and suddenly fails authentication:
- Connection string was regenerated — Someone generated a new connection string, invalidating the old token. Reconfigure the collector with the new string.
- Region connector was deleted and re-created — The connector ID changed. Reconfigure with the new connection string.
## Extraction errors

Extraction errors occur when the collector fails to read from Databricks system tables.
### Permission denied

Error: `[TABLE_OR_VIEW_NOT_FOUND] Table or view not found: system.compute.clusters`

or

Error: `User does not have permission to SELECT on table system.compute.clusters`

The service principal is missing SELECT access to the specified table. Grant permissions as described in Account & Connector Setup.
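As an illustration, a table-level grant in Unity Catalog SQL looks like the following. The service principal name and table list here are placeholders; follow Account & Connector Setup for the exact grants LakeSentry needs:

```python
# Run from a notebook on a Unity Catalog-enabled cluster, as a user allowed
# to manage grants on the system catalog. Uses the notebook's ambient `spark`.
# "lakesentry-collector-sp" is a placeholder for your service principal.
service_principal = "lakesentry-collector-sp"

for table in ("system.compute.clusters", "system.billing.usage"):
    spark.sql(f"GRANT SELECT ON TABLE {table} TO `{service_principal}`")
```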
### Unity Catalog not enabled

Error: `Unity Catalog is not enabled for this workspace`

System tables require Unity Catalog. Enable it on your Databricks account, or use a workspace that already has it enabled.
### Table not available

Some system tables are only available in certain Databricks pricing tiers or regions. If a table doesn’t exist:
- The collector logs a warning and skips the table.
- LakeSentry marks the corresponding feature as unavailable.
- Other tables are still extracted normally.
This is expected behavior — the collector continues with whatever tables are accessible.
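The skip-and-continue behavior can be pictured roughly like this. It is a simplified sketch, not the collector's actual code, and the `spark` session is the notebook's ambient one:

```python
from pyspark.sql.utils import AnalysisException

# Tables named in this guide; the real collector works from its own table list.
tables = ["system.billing.usage", "system.compute.clusters", "system.query.history"]

for table in tables:
    try:
        record_count = spark.table(table).count()
    except AnalysisException as exc:
        # Missing table or missing SELECT grant: warn and move on.
        print(f"WARN Table not available in workspace, skipping: {table} ({exc})")
        continue
    print(f"INFO Extracted table={table} records={record_count} status=success")
```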
## Missed runs

If the collector misses scheduled runs:
- Check the Databricks job schedule — Verify the job is still enabled and the schedule hasn’t been modified.
- Check cluster availability — If the job cluster fails to start (e.g., cloud provider capacity issues), the run fails before the collector code executes.
- Check for long-running previous runs — If a run takes longer than the schedule interval, Databricks may skip or queue subsequent runs depending on the concurrency setting.
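For reference, the schedule and concurrency live in the job definition. In Databricks Jobs API terms they look roughly like this; the field names are from the Jobs API, and the values are examples only:

```python
# Illustrative Jobs API 2.1 settings for the collector job (values are examples).
job_settings = {
    "schedule": {
        "quartz_cron_expression": "0 0 * * * ?",   # hourly
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",                # "PAUSED" disables scheduled runs
    },
    # With max_concurrent_runs = 1, a run that overlaps the next scheduled start
    # may be skipped or queued, depending on the job's queueing configuration.
    "max_concurrent_runs": 1,
}
```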
## Checkpoint and resumability

The collector uses checkpoint-based incremental extraction. Each table has a watermark that tracks the last successfully extracted position.
### How checkpoints work

| Table type | Watermark column | Strategy |
|---|---|---|
| system.billing.usage | usage_start_time | Incremental — extract rows after the watermark |
| system.query.history | start_time | Incremental |
| system.compute.clusters | create_time | Incremental |
| system.lakeflow.jobs | change_time | Incremental |
| system.compute.node_types | — | Full snapshot (reference table) |
Incremental tables use the watermark to avoid re-extracting data. Full-snapshot tables are extracted completely on each run (they’re small reference tables).
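A simplified sketch of watermark-based incremental extraction, for intuition only (the collector's actual implementation may differ):

```python
from datetime import datetime, timezone
from pyspark.sql import functions as F

def extract_incremental(spark, table, watermark_col, last_watermark):
    """Watermark-based incremental read: only rows newer than the checkpoint.
    (Illustrative sketch, not the collector's actual implementation.)"""
    df = spark.table(table).where(F.col(watermark_col) > F.lit(last_watermark))
    row = df.agg(F.max(watermark_col).alias("wm")).collect()[0]
    new_watermark = row["wm"] or last_watermark   # unchanged if no new rows
    return df, new_watermark

# Example: pick up system.billing.usage rows written since the last checkpoint.
# df, wm = extract_incremental(spark, "system.billing.usage", "usage_start_time",
#                              datetime(2024, 1, 1, tzinfo=timezone.utc))
```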
### Checkpoint recovery

If a run fails partway through extraction:
- Completed tables — Their checkpoints are already updated. The next run skips them.
- In-progress table — The checkpoint was not updated. The next run re-extracts from the last successful watermark.
- Remaining tables — Not yet attempted. The next run extracts them normally.
This means the collector is crash-safe — you can safely stop a run at any point and the next run recovers automatically. There’s no need to manually reset checkpoints in normal operation.
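Conceptually, the per-table loop looks like the sketch below. The important detail is that a table's checkpoint is advanced only after its batch has been pushed successfully, which is what makes a failed run safe to retry (again, illustrative only):

```python
def run_collection(tables, checkpoints, extract, push_batch):
    """Sketch of a crash-safe collection loop.

    tables:      iterable of (table_name, watermark_column) pairs
    checkpoints: dict of table_name -> last watermark
    extract:     callable(table, watermark_col, last) -> (batch, new_watermark)
    push_batch:  callable(table, batch) that raises on failure
    """
    for table, watermark_col in tables:
        last = checkpoints.get(table)
        batch, new_watermark = extract(table, watermark_col, last)
        push_batch(table, batch)              # if this raises, the checkpoint stays put
        checkpoints[table] = new_watermark    # reached only after a successful push
    return checkpoints
```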
### Resetting checkpoints

In rare cases, you may need to re-extract historical data (e.g., after granting access to a previously inaccessible table). To reset checkpoints:
- In LakeSentry, go to the region connector detail page.
- Click Reset Checkpoints for the specific table, or Reset All for a full re-extraction.
- The next collector run starts from the beginning for those tables.
## Missing tables

If the “Tables received” list on the region connector doesn’t include expected tables:
| Missing table category | Check |
|---|---|
| Billing tables | These are account-level. They may only appear for one region if you have multiple regions. |
| MLflow tables | Requires optional permissions. See optional tables. |
| Serving tables | Requires optional permissions. |
| Audit tables | Requires optional permissions. |
| All regional tables | Verify the service principal has workspace-level access in the target region. |
## Log interpretation

Collector logs are available in the Databricks job run output. Key log entries to look for:
| Log level | Message pattern | Meaning |
|---|---|---|
| INFO | Starting collection | Normal — extraction cycle beginning |
| INFO | Extracted (with table, records, status fields) | Normal — successful table extraction |
| INFO | Extraction batch pushed successfully | Normal — data sent successfully |
| WARN | Table not available in workspace, skipping | Permission not granted or table does not exist |
| INFO | No records found | Normal — no data in the extraction window |
| ERROR | LakeSentry API is not reachable | Connection string is invalid or API endpoint is down |
| ERROR | API error response (with status_code) | LakeSentry API rejected the push |
| ERROR | Extraction failed / Table extraction failed | Query error on a specific table |
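If you are scanning a long run output, a small filter for the WARN and ERROR entries above can help. The file path here is just an example; save the run output wherever is convenient:

```python
# Filter a saved copy of the job run output for WARN/ERROR entries.
levels = ("WARN", "ERROR")

with open("collector_run_output.log") as fh:   # example path
    for line in fh:
        if any(level in line for level in levels):
            print(line.rstrip())
```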
## Common issues by symptom

### Data shows in LakeSentry but appears incomplete

- Some workspaces missing — The workspace may be in a region without a region connector. Check the region mapping.
- Recent data missing — The collector may have a stuck checkpoint. Check the extraction checkpoints on the region connector detail page.
- Specific data type missing — Check if the corresponding system table is in the “Tables received” list.
### Collector runs succeed but no data in LakeSentry

- Check the LakeSentry ingestion pipeline — Data flows through extraction → ingestion → transformation. If extraction succeeds but data doesn’t appear in dashboards, the issue may be downstream. Check the Connectors page for ingestion errors.
- Check for deduplication — If the same data was already ingested (e.g., from a previous run with the same extraction ID), it’s deduplicated silently. This is expected.
### Cost data lags behind real-time

Databricks system tables themselves have latency:
| Table | Typical Databricks latency |
|---|---|
| system.billing.usage | 1-4 hours behind real-time |
| system.compute.clusters | Near real-time |
| system.query.history | Minutes to 1 hour |
| system.lakeflow.job_run_timeline | Minutes to 1 hour |
LakeSentry can only show data that Databricks has made available in system tables. If cost data appears delayed, the lag is often at the Databricks level rather than the collector level.
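To see how fresh the billing system table currently is, you can query its latest usage_end_time directly from a notebook (requires SELECT on system.billing.usage; uses the ambient `spark` session):

```python
# Check how far behind the billing system table is right now.
latest = spark.sql(
    "SELECT max(usage_end_time) AS latest FROM system.billing.usage"
).collect()[0]["latest"]
print(f"Most recent usage record ends at: {latest}")
```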
## Getting help

If you’ve worked through this guide and the issue persists:
- Gather the Databricks job run logs from the failed run.
- Note the region connector status and last ingestion time from the LakeSentry Connectors page.
- Contact LakeSentry support with these details.
## Next steps

- Collector Deployment — Setup and configuration reference
- Region Connectors — Managing regional data collection
- Data Freshness & Pipeline Status — Understanding end-to-end data latency