Collector Troubleshooting

This page covers diagnosing and resolving issues with the LakeSentry collector. For initial setup, see Collector Deployment.

Start here when something looks wrong:

| Symptom | Likely cause | Jump to |
| --- | --- | --- |
| Region connector stuck on “Pending” | Collector never ran or failed on first run | First run failures |
| Region health shows “Warning” (yellow) | Collector hasn’t pushed data in 2+ hours | Missed runs |
| Region health shows “Error” (red) | No data in 30+ hours, or validation error | Connection errors |
| Job run fails in Databricks | Collector error during extraction | Extraction errors |
| Data appears stale or incomplete | Checkpoint issues or missing table permissions | Checkpoint issues |
| Some tables missing from “Tables received” | Permission not granted for those tables | Missing tables |

First run failures

If the region connector never moves past Pending status:

  1. Check the Databricks job — Go to Workflows > Jobs and verify the collector job exists and has run at least once.
  2. Check the job run status — Click the most recent run. If it failed, read the error output.
  3. Verify the connection string — The connection string is consumed during the configure step, so it cannot be reused. If configuration failed, generate a new connection string from the region connector and re-run configure.

Common first-run errors:

| Error | Cause | Fix |
| --- | --- | --- |
| Connection string is not valid base64 | Malformed or truncated connection string | Generate a new connection string from the region connector |
| Connection string is missing required fields | Incomplete or expired connection string | Generate a new connection string |
| LakeSentry API is not reachable | Network connectivity issue or invalid API URL | Verify the workspace has outbound HTTPS access to LakeSentry |
| Module not found: lakesentry_collector | Wheel not installed or wrong path | Check the wheel file path and re-install |
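
If you want to sanity-check a connection string before re-running configure, a quick decode catches the two most common problems above (truncation and missing fields). This is a minimal sketch only: the payload format and the field names it checks (api_url, connector_id, token) are assumptions for illustration, not the documented schema.

```python
import base64
import json

def check_connection_string(conn_str: str) -> None:
    """Rough sanity check; the payload format and field names are assumed."""
    try:
        decoded = base64.b64decode(conn_str, validate=True)
    except Exception as exc:
        raise SystemExit(f"Not valid base64 (likely truncated when copied): {exc}")
    try:
        payload = json.loads(decoded)
    except json.JSONDecodeError:
        raise SystemExit("Decodes, but is not JSON; generate a new connection string")
    # Hypothetical field names, for illustration only.
    for field in ("api_url", "connector_id", "token"):
        if field not in payload:
            print(f"Missing expected field: {field}")

check_connection_string("<paste connection string here>")
```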

Connection errors

The collector needs outbound HTTPS access to api.lakesentry.io (or your tenant’s API endpoint). If your Databricks workspace uses a private network or firewall:

  • Add *.lakesentry.io to the allowlist for outbound HTTPS (port 443).
  • If using a proxy, configure the HTTPS_PROXY environment variable in the cluster configuration.
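
To confirm connectivity from the same environment the collector runs in, you can probe the endpoint from a notebook on the job cluster. The sketch below assumes the default api.lakesentry.io endpoint; substitute your tenant’s URL if it differs.

```python
import os
import urllib.error
import urllib.request

# urllib honors HTTPS_PROXY from the environment, matching the cluster configuration.
print("HTTPS_PROXY:", os.environ.get("HTTPS_PROXY", "not set"))

url = "https://api.lakesentry.io"  # replace with your tenant's API endpoint if different
try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        print(f"Reachable: HTTP {resp.status}")
except urllib.error.HTTPError as exc:
    # An HTTP error response still means the endpoint is reachable through the firewall.
    print(f"Reachable: HTTP {exc.code}")
except Exception as exc:
    # DNS, TLS, proxy, or firewall problems land here.
    print(f"Not reachable: {exc}")
```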

If the collector was working and suddenly fails authentication:

  • Connection string was regenerated — Someone generated a new connection string, invalidating the old token. Reconfigure the collector with the new string.
  • Region connector was deleted and re-created — The connector ID changed. Reconfigure with the new connection string.

Extraction errors

Extraction errors occur when the collector fails to read from Databricks system tables.

Error: [TABLE_OR_VIEW_NOT_FOUND] Table or view not found: system.compute.clusters

or

Error: User does not have permission to SELECT on table system.compute.clusters

The service principal is missing SELECT access to the specified table. Grant permissions as described in Account & Connector Setup.
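
The exact grants are covered in Account & Connector Setup; as a rough sketch of the shape of the fix, schema-level grants can be issued from a notebook by an admin with grant rights on the system schemas. The principal name and schema list below are assumptions, not the setup guide’s exact statements.

```python
# Illustration only; follow Account & Connector Setup for the exact statements.
# Assumes an admin able to grant on the system schemas is running this in a
# Databricks notebook, where `spark` is predefined.
service_principal = "lakesentry-collector-sp"  # hypothetical principal name

for schema in ("system.billing", "system.compute", "system.query", "system.lakeflow"):
    spark.sql(f"GRANT USE SCHEMA ON SCHEMA {schema} TO `{service_principal}`")
    spark.sql(f"GRANT SELECT ON SCHEMA {schema} TO `{service_principal}`")
```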

Error: Unity Catalog is not enabled for this workspace

System tables require Unity Catalog. Enable it on your Databricks account, or use a workspace that already has it enabled.

Some system tables are only available in certain Databricks pricing tiers or regions. If a table doesn’t exist:

  • The collector logs a warning and skips the table.
  • LakeSentry marks the corresponding feature as unavailable.
  • Other tables are still extracted normally.

This is expected behavior — the collector continues with whatever tables are accessible.
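
To see which of these tables are actually present in your workspace, you can probe them directly from a notebook. A minimal sketch, assuming the table list below matches what the collector reads:

```python
# Runs in a Databricks notebook where `spark` is predefined (three-part names in
# tableExists require Spark 3.4+, which recent DBR versions include).
tables = [
    "system.billing.usage",
    "system.query.history",
    "system.compute.clusters",
    "system.compute.node_types",
    "system.lakeflow.jobs",
]

for table in tables:
    status = "available" if spark.catalog.tableExists(table) else "missing (collector will skip it)"
    print(f"{table}: {status}")
```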

Missed runs

If the collector misses scheduled runs:

  1. Check the Databricks job schedule — Verify the job is still enabled and the schedule hasn’t been modified.
  2. Check cluster availability — If the job cluster fails to start (e.g., cloud provider capacity issues), the run fails before the collector code executes.
  3. Check for long-running previous runs — If a run takes longer than the schedule interval, Databricks may skip or queue subsequent runs depending on the concurrency setting.
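
One quick way to check the last two points is to list the job’s recent runs with the Databricks Jobs API (2.1) and inspect their states. A sketch, assuming a personal access token and the collector job’s ID:

```python
import json
import urllib.request

host = "https://<your-workspace-url>"   # your Databricks workspace URL
token = "<personal access token>"
job_id = 123                            # the collector job's ID from Workflows > Jobs

req = urllib.request.Request(
    f"{host}/api/2.1/jobs/runs/list?job_id={job_id}&limit=10",
    headers={"Authorization": f"Bearer {token}"},
)
with urllib.request.urlopen(req) as resp:
    runs = json.load(resp).get("runs", [])

for run in runs:
    state = run.get("state", {})
    # life_cycle_state surfaces skipped or queued runs; result_state shows SUCCESS/FAILED.
    print(run.get("start_time"), state.get("life_cycle_state"), state.get("result_state"))
```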

Checkpoint issues

The collector uses checkpoint-based incremental extraction. Each table has a watermark that tracks the last successfully extracted position.

| Table | Watermark column | Strategy |
| --- | --- | --- |
| system.billing.usage | usage_start_time | Incremental — extract rows after the watermark |
| system.query.history | start_time | Incremental |
| system.compute.clusters | create_time | Incremental |
| system.lakeflow.jobs | change_time | Incremental |
| system.compute.node_types | — | Full snapshot (reference table) |

Incremental tables use the watermark to avoid re-extracting data. Full-snapshot tables are extracted completely on each run (they’re small reference tables).

If a run fails partway through extraction:

  • Completed tables — Their checkpoints are already updated. The next run skips them.
  • In-progress table — The checkpoint was not updated. The next run re-extracts from the last successful watermark.
  • Remaining tables — Not yet attempted. The next run extracts them normally.

This means the collector is crash-safe — you can safely stop a run at any point and the next run recovers automatically. There’s no need to manually reset checkpoints in normal operation.
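
The pattern is roughly the sketch below (an illustration of the behavior described above, not the collector’s actual code): read rows newer than the stored watermark, push them, and only then advance the checkpoint, so a failure at any earlier point leaves the old watermark in place for the next run. The `push_batch` helper is hypothetical.

```python
from datetime import datetime

# Illustration of checkpointed incremental extraction; `spark` (Databricks notebook)
# and `push_batch` (send a table's rows to the LakeSentry API) are assumed to exist.
INCREMENTAL_TABLES = {
    "system.billing.usage": "usage_start_time",
    "system.query.history": "start_time",
    "system.compute.clusters": "create_time",
    "system.lakeflow.jobs": "change_time",
}

def run_extraction(checkpoints: dict) -> None:
    for table, watermark_col in INCREMENTAL_TABLES.items():
        watermark = checkpoints.get(table, datetime(1970, 1, 1))
        rows = (spark.table(table)
                     .where(f"{watermark_col} > '{watermark.isoformat()}'")
                     .collect())
        push_batch(table, rows)  # a crash here leaves the checkpoint untouched
        if rows:
            # Advance the watermark only after a successful push; the next run
            # re-extracts nothing that was already delivered.
            checkpoints[table] = max(row[watermark_col] for row in rows)
```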

In rare cases, you may need to re-extract historical data (e.g., after granting access to a previously inaccessible table). To reset checkpoints:

  1. In LakeSentry, go to the region connector detail page.
  2. Click Reset Checkpoints for the specific table, or Reset All for a full re-extraction.
  3. The next collector run starts from the beginning for those tables.

Missing tables

If the “Tables received” list on the region connector doesn’t include expected tables:

| Missing table category | Check |
| --- | --- |
| Billing tables | These are account-level. They may only appear for one region if you have multiple regions. |
| MLflow tables | Requires optional permissions. See optional tables. |
| Serving tables | Requires optional permissions. |
| Audit tables | Requires optional permissions. |
| All regional tables | Verify the service principal has workspace-level access in the target region. |

Collector logs are available in the Databricks job run output. Key log entries to look for:

| Log level | Message pattern | Meaning |
| --- | --- | --- |
| INFO | Starting collection | Normal — extraction cycle beginning |
| INFO | Extracted (with table, records, status fields) | Normal — successful table extraction |
| INFO | Extraction batch pushed successfully | Normal — data sent successfully |
| WARN | Table not available in workspace, skipping | Permission not granted or table does not exist |
| INFO | No records found | Normal — no data in the extraction window |
| ERROR | LakeSentry API is not reachable | Connection string is invalid or API endpoint is down |
| ERROR | API error response (with status_code) | LakeSentry API rejected the push |
| ERROR | Extraction failed / Table extraction failed | Query error on a specific table |
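
If you have exported a run’s output to a file, a quick scan for WARN and ERROR lines is often faster than scrolling the run page. A small sketch (the file path is a placeholder):

```python
from pathlib import Path

# Placeholder path: save the job run output from the Databricks UI first.
for line in Path("collector_run.log").read_text().splitlines():
    if "ERROR" in line or "WARN" in line:
        print(line)
```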

Data shows in LakeSentry but appears incomplete

  • Some workspaces missing — The workspace may be in a region without a region connector. Check the region mapping.
  • Recent data missing — The collector may have a stuck checkpoint. Check the extraction checkpoints on the region connector detail page.
  • Specific data type missing — Check if the corresponding system table is in the “Tables received” list.

Collector runs succeed but no data in LakeSentry

  • Check the LakeSentry ingestion pipeline — Data flows through extraction → ingestion → transformation. If extraction succeeds but data doesn’t appear in dashboards, the issue may be downstream. Check the Connectors page for ingestion errors.
  • Check for deduplication — If the same data was already ingested (e.g., from a previous run with the same extraction ID), it’s deduplicated silently. This is expected.

Databricks system tables themselves have latency:

| Table | Typical Databricks latency |
| --- | --- |
| system.billing.usage | 1-4 hours behind real-time |
| system.compute.clusters | Near real-time |
| system.query.history | Minutes to 1 hour |
| system.lakeflow.job_run_timeline | Minutes to 1 hour |

LakeSentry can only show data that Databricks has made available in system tables. If cost data appears delayed, the lag is often at the Databricks level rather than the collector level.

If you’ve worked through this guide and the issue persists:

  1. Gather the Databricks job run logs from the failed run.
  2. Note the region connector status and last ingestion time from the LakeSentry Connectors page.
  3. Contact LakeSentry support with these details.