Collector Issues

This page provides a user-focused guide to diagnosing collector problems. It complements the admin-oriented Collector Troubleshooting guide, which covers detailed log interpretation, checkpoint mechanics, and error catalogs.

Before diving into collector-specific troubleshooting, confirm the collector is actually the issue:

  1. Go to Settings > Connector and select the region connector.
  2. Check the status indicator:
    • Green (“Synced”) — Collector has successfully pushed data. The issue is likely elsewhere (see Common Issues or Data Freshness).
    • Grey (“Awaiting data”) — No data received yet. The collector may not have run, or it hasn’t completed its first extraction.
    • Red (“Error”) — The connector is in an error or failed state. The collector needs attention.
  3. Check Last Sync — This shows when the collector last successfully pushed data. If it’s recent, the collector is working.

Missing permissions

Permission issues are the most common category of collector problems. They prevent the collector from reading Databricks system tables.

Symptoms:

  • Region connector shows “Tables received: 0” or a lower count than expected
  • Collector job logs show TABLE_OR_VIEW_NOT_FOUND or User does not have permission errors
  • Specific data types are missing from dashboards (e.g., no job data, no warehouse data)

To pinpoint which tables are affected:

  1. In LakeSentry, go to the region connector detail page.
  2. Compare the “Tables received” list against the expected tables list.
  3. Any missing table indicates a permission gap.

Grant the missing permissions by running the appropriate GRANT SELECT statements as described in Account & Connector Setup.
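
As a sketch of what those grants look like when run from a Databricks notebook, the snippet below grants catalog, schema, and table access to a service principal. The principal name and the table list are placeholders; use the exact grants from Account & Connector Setup.

```python
# Sketch only: grant the collector's service principal read access to missing
# system tables. The principal name and table list below are placeholders.
# `spark` is the SparkSession that Databricks notebooks predefine.
principal = "lakesentry-collector"  # placeholder: your service principal's name or application ID

statements = [
    f"GRANT USE CATALOG ON CATALOG system TO `{principal}`",
    f"GRANT USE SCHEMA ON SCHEMA system.billing TO `{principal}`",
    f"GRANT SELECT ON TABLE system.billing.usage TO `{principal}`",
    f"GRANT SELECT ON TABLE system.query.history TO `{principal}`",
]

for stmt in statements:
    spark.sql(stmt)

# Verify what the principal can now see on a schema.
spark.sql(f"SHOW GRANTS `{principal}` ON SCHEMA system.billing").show(truncate=False)
```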

After granting permissions:

  1. Wait for the next scheduled collector run (or trigger a manual run in Databricks).
  2. Check the “Tables received” list again — the newly accessible tables should appear.
Common scenarios:

| Scenario | What to check |
| --- | --- |
| All tables missing | Service principal may not have USE CATALOG on the system catalog |
| Only billing tables present | Service principal has account-level access but is missing workspace-level grants |
| MLflow/Serving tables missing | These require optional permissions that aren’t part of the default setup |
| Tables worked before but stopped | Service principal permissions may have been revoked, or the Unity Catalog metastore was reassigned |

Network connectivity

Symptoms:

  • Collector job fails with connection timeouts or DNS resolution errors
  • Logs show Cannot resolve host or Connection timed out
  • Job succeeds in one workspace but fails in another

The collector needs outbound HTTPS access (port 443) to api.lakesentry.io. If your Databricks workspace uses a private VPC, firewall, or PrivateLink configuration, this traffic may be blocked.

  • Firewall rules — Add *.lakesentry.io to your outbound HTTPS allowlist.
  • Proxy configuration — If the workspace routes traffic through a proxy, set the HTTPS_PROXY environment variable in the Databricks cluster configuration used by the collector job.
  • PrivateLink workspaces — PrivateLink restricts outbound traffic by default. You’ll need to add an outbound rule or configure a NAT gateway for LakeSentry API access.
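
To confirm whether the network path is actually blocked, you can run a minimal connectivity check from a notebook attached to the same cluster configuration the collector job uses. This is a sketch using only the Python standard library; it verifies DNS resolution and outbound HTTPS on port 443, not authentication.

```python
# Minimal outbound-connectivity check; run it on the collector's cluster.
# Any HTTP status proves the network path is open; a DNS failure or timeout
# points at firewall, proxy, or PrivateLink configuration.
import socket
import urllib.error
import urllib.request

host = "api.lakesentry.io"

try:
    print("DNS resolves to:", socket.gethostbyname(host))
    resp = urllib.request.urlopen(f"https://{host}", timeout=10)
    print("HTTPS reachable, status:", resp.status)
except urllib.error.HTTPError as exc:
    print("HTTPS reachable, status:", exc.code)  # an error status still proves connectivity
except Exception as exc:
    print("Connectivity check failed:", type(exc).__name__, exc)
```

urllib honors the HTTPS_PROXY environment variable, so if a proxy is configured on the cluster this check exercises that path as well.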

Collector not running

Symptoms:

  • Region connector status shows “Awaiting data” or “Error”
  • “Last Sync” timestamp is hours or days old
  • No recent job runs visible in Databricks Workflows

To diagnose:

  1. Open your Databricks workspace and go to Workflows > Jobs.
  2. Find the LakeSentry collector job.
  3. Check:
    • Is the job still present? (It may have been accidentally deleted.)
    • Is the schedule enabled? (It may have been paused.)
    • What does the most recent run show? (See the sketch after this list.)
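
If you'd rather check from code than the UI, a sketch like the following lists the job's most recent runs and their outcomes. It assumes the databricks-sdk package is available and that you know the collector's job ID (a placeholder below).

```python
# Sketch: list the collector job's recent runs with the Databricks SDK.
# Assumes default SDK authentication (e.g., running inside a notebook, or
# DATABRICKS_HOST / DATABRICKS_TOKEN set in the environment).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
collector_job_id = 123456789  # placeholder: the LakeSentry collector job's ID

for run in w.jobs.list_runs(job_id=collector_job_id, limit=5):
    print(run.start_time, run.state.life_cycle_state, run.state.result_state, run.run_page_url)
```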

Common causes and fixes:

| Cause | Fix |
| --- | --- |
| Job was deleted | Re-create the job following Collector Deployment |
| Schedule was disabled | Re-enable the schedule in the job configuration |
| Cluster failed to start | Check Databricks cluster events — common causes are cloud capacity limits or configuration changes |
| Previous run still in progress | If runs are overlapping, set maximum concurrent runs to 1 and wait for the current run to complete (see the sketch after this table) |
| Databricks workspace issue | Check the Databricks workspace status page for outages |
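
For the “Previous run still in progress” row, the concurrency limit can be changed in the job's settings in the UI; a scripted equivalent (a sketch using the Databricks SDK, with a placeholder job ID) looks like this.

```python
# Sketch: cap the collector job at one concurrent run so runs cannot overlap.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import JobSettings

w = WorkspaceClient()
collector_job_id = 123456789  # placeholder

# jobs.update applies a partial update; only max_concurrent_runs changes here.
w.jobs.update(job_id=collector_job_id, new_settings=JobSettings(max_concurrent_runs=1))
```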

Partial data extraction

Symptoms:

  • Collector job runs but fails partway through
  • Some tables are extracted while others error
  • Region connector shows partial data

Check the Databricks job run output. The collector logs which tables succeeded and which failed.

| Error message | Cause | Resolution |
| --- | --- | --- |
| Unity Catalog is not enabled | Workspace doesn’t have UC enabled | Enable Unity Catalog, or use a different workspace |
| Table or view not found | Table doesn’t exist in this Databricks tier/region | Expected behavior — the collector skips unavailable tables. See Collector Troubleshooting. |
| Query execution timeout | System table query is slow (usually system.query.history on large accounts) | This is usually transient. The collector retries on the next run. If persistent, contact LakeSentry support. |
| Rate limit exceeded | Too many API calls to Databricks | The collector has built-in rate limiting. If this occurs, it’s usually due to other workloads competing for API quota. Try scheduling the collector during off-peak hours. |
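
To tell a table that genuinely doesn't exist in your tier or region apart from a permission gap, you can probe it directly from a notebook. A sketch is below; note that a notebook query runs as you, not as the collector's service principal, so success here doesn't rule out a missing grant for the principal.

```python
# Sketch: probe a system table to see which failure mode the collector is hitting.
# TABLE_OR_VIEW_NOT_FOUND        -> the table isn't available in this tier/region
# permission / privilege errors  -> a grant is missing for whoever runs the query
table = "system.query.history"  # substitute the table named in the collector logs

try:
    spark.sql(f"SELECT * FROM {table} LIMIT 1").show()
    print(f"{table}: readable")
except Exception as exc:
    print(f"{table}: {type(exc).__name__}: {exc}")
```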

Authentication failures

Symptoms:

  • Collector fails immediately with authentication errors
  • Logs show Invalid collector token or HTTP 401 errors

| Cause | Fix |
| --- | --- |
| Token was regenerated | Generating a new connection string invalidates the previous token. If someone regenerated the connection string, the old token no longer works. Generate a new one and reconfigure the collector. |
| Copied incorrectly | Connection strings are long base64 values. Verify the full string was copied without truncation. Re-generate if unsure. |

Reconfiguring with a new connection string

  1. In LakeSentry, go to the region connector and click Generate Connection String.
  2. Copy the new connection string.
  3. In Databricks, update the collector job’s configuration with the new string.
  4. Run the collector job manually to verify it authenticates successfully. (One way to script steps 3–4 is sketched below.)
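
How the job picks up the connection string depends on how you deployed the collector (see Collector Deployment). Purely as an illustration, if your deployment reads it from a Databricks secret, steps 3–4 could be scripted roughly like this; the secret scope, key, and job ID are placeholders.

```python
# Sketch only: store the new connection string as a Databricks secret, then
# trigger the collector job and wait for the result. Names and IDs are placeholders;
# the secret scope must already exist.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

w.secrets.put_secret(
    scope="lakesentry",                       # placeholder secret scope
    key="connection-string",                  # placeholder secret key
    string_value="<new connection string>",
)

finished = w.jobs.run_now(job_id=123456789).result()   # placeholder job ID; blocks until done
print("Run", finished.run_id, "finished with state:", finished.state.result_state)
```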

Collector run times

Typical collector run times:

| Account size | Expected duration |
| --- | --- |
| Small (< 50 workspaces, < 1M queries/month) | 2–5 minutes |
| Medium (50–200 workspaces) | 5–15 minutes |
| Large (200+ workspaces, 10M+ queries/month) | 15–45 minutes |

If runs are significantly slower than expected:

  • First run — The initial extraction includes all available history and is significantly longer than incremental runs. This is normal.
  • After checkpoint reset — Similar to a first run, re-extracting historical data takes longer.
  • Large query history — system.query.history is typically the largest table. If it’s causing slowness, the extra time is usually proportional to query volume and is expected.

Cost and workload impact

The collector runs as a standard Databricks job:

  • It uses a job cluster that starts and stops with each run — no persistent compute cost.
  • Queries against system tables are lightweight read operations.
  • The collector does not interfere with production workloads.

The compute cost of running the collector is typically negligible (a few cents per run for a small cluster running 5–15 minutes).
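
As a rough, illustrative calculation only (the DBU and VM rates below are assumptions, not quoted prices; substitute your own cloud and Databricks pricing), a 10-minute run on a small job cluster lands in that range.

```python
# Back-of-envelope cost for one collector run. All rates are assumptions for
# illustration; plug in your actual cloud and Databricks pricing.
runtime_hours = 10 / 60        # a typical 10-minute run
dbus_per_hour = 0.75           # assumed: small single-node job cluster
dbu_price = 0.15               # assumed USD per DBU (Jobs Compute tier)
vm_price_per_hour = 0.20       # assumed USD per hour for the underlying VM

cost = runtime_hours * (dbus_per_hour * dbu_price + vm_price_per_hour)
print(f"Estimated cost per run: ${cost:.3f}")   # about $0.05 with these assumptions
```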

Contacting support

If you’ve worked through this guide and the issue persists, gather the following before contacting support:

  1. Region connector status and last sync time from LakeSentry
  2. Databricks job run ID and run output/logs from the failed run (a snippet for locating the run is sketched after this list)
  3. Error messages — exact text, not paraphrased
  4. Recent changes — Did anything change in the Databricks environment (permissions, network, workspace configuration)?
  5. Timeline — When did it last work? When did it break?
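
For item 2, a sketch like the following (assuming the databricks-sdk package; the job ID is a placeholder) finds the most recent failed run and prints the identifiers support will ask for.

```python
# Sketch: locate the collector job's most recent failed run and print the
# details to include in a support request. The job ID is a placeholder.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import RunResultState

w = WorkspaceClient()
collector_job_id = 123456789  # placeholder

for run in w.jobs.list_runs(job_id=collector_job_id, completed_only=True, limit=25):
    if run.state.result_state != RunResultState.SUCCESS:
        print("Run ID:       ", run.run_id)
        print("Result state: ", run.state.result_state)
        print("State message:", run.state.state_message)
        print("Run page URL: ", run.run_page_url)
        break
```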