Collector Issues

This page provides a user-focused guide to diagnosing collector problems. It complements the admin-oriented Collector Troubleshooting guide, which covers detailed log interpretation, checkpoint mechanics, and error catalogs.

Before diving into collector-specific troubleshooting, confirm the collector is actually the issue:

  1. Go to Settings > Connector and select the region connector.
  2. Check the status indicator:
    • Green (“Synced”) — Collector has successfully pushed data. The issue is likely elsewhere (see Common Issues or Data Freshness).
    • Grey (“Awaiting data”) — No data received yet. The collector may not have run, or it hasn’t completed its first extraction.
    • Red (“Error”) — The connector is in an error or failed state. The collector needs attention.
  3. Check Last Sync — This shows when the collector last successfully pushed data. If it’s recent, the collector is working.

Missing permissions

Permission issues are the most common category of collector problems. They prevent the collector from reading Databricks system tables.

Symptoms:

  • Region connector shows “Tables received: 0” or a lower count than expected
  • Collector job logs show TABLE_OR_VIEW_NOT_FOUND or User does not have permission errors
  • Specific data types are missing from dashboards (e.g., no job data, no warehouse data)

To pinpoint which tables are affected:

  1. In LakeSentry, go to the region connector detail page.
  2. Compare the “Tables received” list against the expected tables list.
  3. Any missing table indicates a permission gap.

Grant the missing permissions by running the appropriate GRANT SELECT statements as described in Account & Connector Setup.
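
As a sketch of what those grants look like when run from a Databricks notebook, the snippet below grants catalog, schema, and table access to a service principal. The principal name and the table list are placeholders; use the exact grants from Account & Connector Setup.

```python
# Sketch only: grant the collector's service principal read access to missing
# system tables. The principal name and table list below are placeholders.
# `spark` is the SparkSession that Databricks notebooks predefine.
principal = "lakesentry-collector"  # placeholder: your service principal's name or application ID

statements = [
    f"GRANT USE CATALOG ON CATALOG system TO `{principal}`",
    f"GRANT USE SCHEMA ON SCHEMA system.billing TO `{principal}`",
    f"GRANT SELECT ON TABLE system.billing.usage TO `{principal}`",
    f"GRANT SELECT ON TABLE system.query.history TO `{principal}`",
]

for stmt in statements:
    spark.sql(stmt)

# Verify what the principal can now see on a schema.
spark.sql(f"SHOW GRANTS `{principal}` ON SCHEMA system.billing").show(truncate=False)
```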

After granting permissions:

  1. Wait for the next scheduled collector run (or trigger a manual run in Databricks).
  2. Check the “Tables received” list again — the newly accessible tables should appear.
Common scenarios:

| Scenario | What to check |
| --- | --- |
| All tables missing | Service principal may not have USE CATALOG on the system catalog |
| Only billing tables present | Service principal has account-level access but is missing workspace-level grants |
| MLflow/Serving tables missing | These require optional permissions that aren’t part of the default setup |
| Tables worked before but stopped | Service principal permissions may have been revoked, or the Unity Catalog metastore was reassigned |

Network connectivity

Symptoms:

  • Collector job fails with connection timeouts or DNS resolution errors
  • Logs show Cannot resolve host or Connection timed out
  • Job succeeds in one workspace but fails in another

The collector needs outbound HTTPS access (port 443) to api.lakesentry.io. If your Databricks workspace uses a private VPC, firewall, or PrivateLink configuration, this traffic may be blocked.

  • Firewall rules — Add *.lakesentry.io to your outbound HTTPS allowlist.
  • Proxy configuration — If the workspace routes traffic through a proxy, set the HTTPS_PROXY environment variable in the Databricks cluster configuration used by the collector job.
  • PrivateLink workspaces — PrivateLink restricts outbound traffic by default. You’ll need to add an outbound rule or configure a NAT gateway for LakeSentry API access.
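
To confirm whether the network path is actually blocked, you can run a minimal connectivity check from a notebook attached to the same cluster configuration the collector job uses. This is a sketch using only the Python standard library; it verifies DNS resolution and outbound HTTPS on port 443, not authentication.

```python
# Minimal outbound-connectivity check; run it on the collector's cluster.
# Any HTTP status proves the network path is open; a DNS failure or timeout
# points at firewall, proxy, or PrivateLink configuration.
import socket
import urllib.error
import urllib.request

host = "api.lakesentry.io"

try:
    print("DNS resolves to:", socket.gethostbyname(host))
    resp = urllib.request.urlopen(f"https://{host}", timeout=10)
    print("HTTPS reachable, status:", resp.status)
except urllib.error.HTTPError as exc:
    print("HTTPS reachable, status:", exc.code)  # an error status still proves connectivity
except Exception as exc:
    print("Connectivity check failed:", type(exc).__name__, exc)
```

urllib honors the HTTPS_PROXY environment variable, so if a proxy is configured on the cluster this check exercises that path as well.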

Collector not running

Symptoms:

  • Region connector status shows “Awaiting data” or “Error”
  • “Last Sync” timestamp is hours or days old
  • No recent job runs visible in Databricks Workflows

To diagnose:

  1. Open your Databricks workspace and go to Workflows > Jobs.
  2. Find the LakeSentry collector job.
  3. Check:
    • Is the job still present? (It may have been accidentally deleted.)
    • Is the schedule enabled? (It may have been paused.)
    • What does the most recent run show? (See the sketch after this list.)
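
If you'd rather check from code than the UI, a sketch like the following lists the job's most recent runs and their outcomes. It assumes the databricks-sdk package is available and that you know the collector's job ID (a placeholder below).

```python
# Sketch: list the collector job's recent runs with the Databricks SDK.
# Assumes default SDK authentication (e.g., running inside a notebook, or
# DATABRICKS_HOST / DATABRICKS_TOKEN set in the environment).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
collector_job_id = 123456789  # placeholder: the LakeSentry collector job's ID

for run in w.jobs.list_runs(job_id=collector_job_id, limit=5):
    print(run.start_time, run.state.life_cycle_state, run.state.result_state, run.run_page_url)
```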

Common causes and fixes:

| Cause | Fix |
| --- | --- |
| Job was deleted | Re-create the job following Collector Deployment |
| Schedule was disabled | Re-enable the schedule in the job configuration |
| Cluster failed to start | Check Databricks cluster events — common causes are cloud capacity limits or configuration changes |
| Previous run still in progress | If runs are overlapping, set maximum concurrent runs to 1 and wait for the current run to complete (see the sketch after this table) |
| Databricks workspace issue | Check the Databricks workspace status page for outages |
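
For the “Previous run still in progress” row, the concurrency limit can be changed in the job's settings in the UI; a scripted equivalent (a sketch using the Databricks SDK, with a placeholder job ID) looks like this.

```python
# Sketch: cap the collector job at one concurrent run so runs cannot overlap.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import JobSettings

w = WorkspaceClient()
collector_job_id = 123456789  # placeholder

# jobs.update applies a partial update; only max_concurrent_runs changes here.
w.jobs.update(job_id=collector_job_id, new_settings=JobSettings(max_concurrent_runs=1))
```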

Partial data extraction

Symptoms:

  • Collector job runs but fails partway through
  • Some tables are extracted while others error
  • Region connector shows partial data

Check the Databricks job run output. The collector logs which tables succeeded and which failed.

| Error message | Cause | Resolution |
| --- | --- | --- |
| Unity Catalog is not enabled | Workspace doesn’t have UC enabled | Enable Unity Catalog, or use a different workspace |
| Table or view not found | Table doesn’t exist in this Databricks tier/region | Expected behavior — the collector skips unavailable tables. See Collector Troubleshooting. |
| Query execution timeout | System table query is slow (usually system.query.history on large accounts) | This is usually transient. The collector retries on the next run. If persistent, contact LakeSentry support. |
| Rate limit exceeded | Too many API calls to Databricks | The collector has built-in rate limiting. If this occurs, it’s usually due to other workloads competing for API quota. Try scheduling the collector during off-peak hours. |
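
To tell a table that genuinely doesn't exist in your tier or region apart from a permission gap, you can probe it directly from a notebook. A sketch is below; note that a notebook query runs as you, not as the collector's service principal, so success here doesn't rule out a missing grant for the principal.

```python
# Sketch: probe a system table to see which failure mode the collector is hitting.
# TABLE_OR_VIEW_NOT_FOUND        -> the table isn't available in this tier/region
# permission / privilege errors  -> a grant is missing for whoever runs the query
table = "system.query.history"  # substitute the table named in the collector logs

try:
    spark.sql(f"SELECT * FROM {table} LIMIT 1").show()
    print(f"{table}: readable")
except Exception as exc:
    print(f"{table}: {type(exc).__name__}: {exc}")
```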

Authentication failures

Symptoms:

  • Collector fails immediately with authentication errors
  • Logs show Invalid collector token or HTTP 401 errors

| Cause | Fix |
| --- | --- |
| Token was regenerated | Generating a new connection string invalidates the previous token. If someone regenerated the connection string, the old token no longer works. Generate a new one and reconfigure the collector. |
| Copied incorrectly | Connection strings are long base64 values. Verify the full string was copied without truncation. Re-generate if unsure. |

Reconfiguring with a new connection string

  1. In LakeSentry, go to the region connector and click Generate Connection String.
  2. Copy the new connection string.
  3. In Databricks, update the collector job’s configuration with the new string.
  4. Run the collector job manually to verify it authenticates successfully. (One way to script steps 3–4 is sketched below.)
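
How the job picks up the connection string depends on how you deployed the collector (see Collector Deployment). Purely as an illustration, if your deployment reads it from a Databricks secret, steps 3–4 could be scripted roughly like this; the secret scope, key, and job ID are placeholders.

```python
# Sketch only: store the new connection string as a Databricks secret, then
# trigger the collector job and wait for the result. Names and IDs are placeholders;
# the secret scope must already exist.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

w.secrets.put_secret(
    scope="lakesentry",                       # placeholder secret scope
    key="connection-string",                  # placeholder secret key
    string_value="<new connection string>",
)

finished = w.jobs.run_now(job_id=123456789).result()   # placeholder job ID; blocks until done
print("Run", finished.run_id, "finished with state:", finished.state.result_state)
```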

Collector run times

Typical collector run times:

| Account size | Expected duration |
| --- | --- |
| Small (< 50 workspaces, < 1M queries/month) | 2–5 minutes |
| Medium (50–200 workspaces) | 5–15 minutes |
| Large (200+ workspaces, 10M+ queries/month) | 15–45 minutes |

If runs are significantly slower than expected:

  • First run — The initial extraction includes all available history and is significantly longer than incremental runs. This is normal.
  • After checkpoint reset — Similar to a first run, re-extracting historical data takes longer.
  • Large query history — system.query.history is typically the largest table. If it’s causing slowness, the extra time is usually proportional to query volume and is expected.

Cost and workload impact

The collector runs as a standard Databricks job:

  • It uses a job cluster that starts and stops with each run — no persistent compute cost.
  • Queries against system tables are lightweight read operations.
  • The collector does not interfere with production workloads.

The compute cost of running the collector is typically negligible (a few cents per run for a small cluster running 5–15 minutes).
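
As a rough, illustrative calculation only (the DBU and VM rates below are assumptions, not quoted prices; substitute your own cloud and Databricks pricing), a 10-minute run on a small job cluster lands in that range.

```python
# Back-of-envelope cost for one collector run. All rates are assumptions for
# illustration; plug in your actual cloud and Databricks pricing.
runtime_hours = 10 / 60        # a typical 10-minute run
dbus_per_hour = 0.75           # assumed: small single-node job cluster
dbu_price = 0.15               # assumed USD per DBU (Jobs Compute tier)
vm_price_per_hour = 0.20       # assumed USD per hour for the underlying VM

cost = runtime_hours * (dbus_per_hour * dbu_price + vm_price_per_hour)
print(f"Estimated cost per run: ${cost:.3f}")   # about $0.05 with these assumptions
```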

Contacting support

If you’ve worked through this guide and the issue persists, gather the following before contacting support:

  1. Region connector status and last sync time from LakeSentry
  2. Databricks job run ID and run output/logs from the failed run (a snippet for locating the run is sketched after this list)
  3. Error messages — exact text, not paraphrased
  4. Recent changes — Did anything change in the Databricks environment (permissions, network, workspace configuration)?
  5. Timeline — When did it last work? When did it break?
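
For item 2, a sketch like the following (assuming the databricks-sdk package; the job ID is a placeholder) finds the most recent failed run and prints the identifiers support will ask for.

```python
# Sketch: locate the collector job's most recent failed run and print the
# details to include in a support request. The job ID is a placeholder.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import RunResultState

w = WorkspaceClient()
collector_job_id = 123456789  # placeholder

for run in w.jobs.list_runs(job_id=collector_job_id, completed_only=True, limit=25):
    if run.state.result_state != RunResultState.SUCCESS:
        print("Run ID:       ", run.run_id)
        print("Result state: ", run.state.result_state)
        print("State message:", run.state.state_message)
        print("Run page URL: ", run.run_page_url)
        break
```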