# Collector Issues
This page provides a user-focused guide to diagnosing collector problems. It complements the admin-oriented Collector Troubleshooting guide, which covers detailed log interpretation, checkpoint mechanics, and error catalogs.
## Is the collector the problem?

Before diving into collector-specific troubleshooting, confirm the collector is actually the issue:
- Go to Settings > Connector and select the region connector.
- Check the status indicator:
  - Green (“Synced”) — Collector has successfully pushed data. The issue is likely elsewhere (see Common Issues or Data Freshness).
  - Grey (“Awaiting data”) — No data received yet. The collector may not have run, or it hasn’t completed its first extraction.
  - Red (“Error”) — The connector is in an error or failed state. The collector needs attention.
- Check Last Sync — This shows when the collector last successfully pushed data. If it’s recent, the collector is working.
## Permission problems

Permission issues are the most common category of collector problems. They prevent the collector from reading Databricks system tables.

### Symptoms

- Region connector shows “Tables received: 0” or a lower count than expected
- Collector job logs show `TABLE_OR_VIEW_NOT_FOUND` or `User does not have permission` errors
- Specific data types are missing from dashboards (e.g., no job data, no warehouse data)
### Diagnosis

- In LakeSentry, go to the region connector detail page.
- Compare the “Tables received” list against the expected tables list.
- Any missing table indicates a permission gap.
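You can also run the same comparison from the Databricks side. A minimal sketch, assuming it is run in a notebook as the collector’s service principal (where `spark` is the SparkSession Databricks provides); the table names are illustrative, so substitute the expected-tables list from the connector page:

```python
# Permission probe for a Databricks notebook, run as the collector's service
# principal. `spark` is the SparkSession Databricks provides in notebooks.
# The table names below are illustrative; use the expected-tables list from
# the region connector page.
candidate_tables = [
    "system.billing.usage",
    "system.access.audit",
    "system.query.history",
]

for table in candidate_tables:
    try:
        spark.sql(f"SELECT 1 FROM {table} LIMIT 1").collect()
        print(f"OK       {table}")
    except Exception as exc:  # TABLE_OR_VIEW_NOT_FOUND and permission errors land here
        print(f"MISSING  {table}: {exc}")
```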
### Resolution

Grant the missing permissions by running the appropriate `GRANT SELECT` statements as described in Account & Connector Setup.
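As a rough illustration only (use the exact statements from Account & Connector Setup), grants for a missing schema typically follow this shape; the service principal name here is a placeholder:

```python
# Illustrative only: run the exact GRANT statements documented in
# Account & Connector Setup. `spark` is the notebook SparkSession and the
# service principal name is a placeholder for your collector's principal.
principal = "lakesentry-collector-sp"  # placeholder

for stmt in [
    f"GRANT USE CATALOG ON CATALOG system TO `{principal}`",
    f"GRANT USE SCHEMA ON SCHEMA system.billing TO `{principal}`",
    f"GRANT SELECT ON TABLE system.billing.usage TO `{principal}`",
]:
    spark.sql(stmt)
```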
After granting permissions:
- Wait for the next scheduled collector run (or trigger a manual run in Databricks).
- Check the “Tables received” list again — the newly accessible tables should appear.
### Common permission scenarios

| Scenario | What to check |
|---|---|
| All tables missing | Service principal may not have USE CATALOG on the system catalog |
| Only billing tables present | Service principal has account-level access but is missing workspace-level grants |
| MLflow/Serving tables missing | These require optional permissions that aren’t part of the default setup |
| Tables worked before but stopped | Service principal permissions may have been revoked, or Unity Catalog metastore was reassigned |
## Network and connectivity

### Symptoms

- Collector job fails with connection timeouts or DNS resolution errors
- Logs show `Cannot resolve host` or `Connection timed out`
- Job succeeds in one workspace but fails in another
### Diagnosis

The collector needs outbound HTTPS access (port 443) to `api.lakesentry.io`. If your Databricks workspace uses a private VPC, firewall, or PrivateLink configuration, this traffic may be blocked.
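A quick way to test this from the collector’s network context is a small reachability check on a cluster with the same network configuration. A sketch using only the standard library; note that a raw TCP test can fail even when traffic routed through a proxy would succeed:

```python
# Reachability check for the LakeSentry API endpoint. Run it on a cluster with
# the same network configuration the collector job uses.
import socket

host, port = "api.lakesentry.io", 443

try:
    ip = socket.gethostbyname(host)                     # DNS resolution
    with socket.create_connection((host, port), timeout=10):
        print(f"OK: {host} resolved to {ip}; TCP connection on port {port} succeeded")
except socket.gaierror as exc:
    print(f"DNS resolution failed: {exc}")              # the 'Cannot resolve host' case
except OSError as exc:
    print(f"TCP connection failed: {exc}")              # the 'Connection timed out' case
```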
### Resolution

- Firewall rules — Add `*.lakesentry.io` to your outbound HTTPS allowlist.
- Proxy configuration — If the workspace routes traffic through a proxy, set the `HTTPS_PROXY` environment variable in the Databricks cluster configuration used by the collector job (a quick verification sketch follows this list).
- PrivateLink workspaces — PrivateLink restricts outbound traffic by default. You’ll need to add an outbound rule or configure a NAT gateway for LakeSentry API access.
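For the proxy route, a minimal verification sketch, assuming `requests` is available on the cluster (it ships with Databricks runtimes) and treating any HTTP response from the endpoint as proof of reachability:

```python
# Verify that HTTPS_PROXY is visible on the collector's cluster and that the
# LakeSentry endpoint answers through it. `requests` honours HTTPS_PROXY
# automatically; any HTTP status code here simply proves reachability.
import os
import requests

proxy = os.environ.get("HTTPS_PROXY")
print(f"HTTPS_PROXY = {proxy!r}")  # None means the cluster config did not set it

resp = requests.get("https://api.lakesentry.io", timeout=15)
print(f"Reached api.lakesentry.io ({'via proxy' if proxy else 'direct'}): HTTP {resp.status_code}")
```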
## Collector job not running

### Symptoms

- Region connector status shows “Awaiting data” or “Error”
- “Last Sync” timestamp is hours or days old
- No recent job runs visible in Databricks Workflows
### Diagnosis

- Open your Databricks workspace and go to Workflows > Jobs.
- Find the LakeSentry collector job.
- Check:
  - Is the job still present? (It may have been accidentally deleted.)
  - Is the schedule enabled? (It may have been paused.)
  - What does the most recent run show?
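The same checks can be scripted with the Databricks SDK for Python. The sketch below assumes the collector job’s name contains “lakesentry”, which may not match your deployment:

```python
# List the collector job and its most recent runs with the Databricks SDK for
# Python (pip install databricks-sdk). The name filter "lakesentry" is an
# assumption; adjust it to match how your collector job is named.
from datetime import datetime
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # uses ambient credentials (CLI profile, env vars, or notebook auth)

jobs = [j for j in w.jobs.list() if "lakesentry" in (j.settings.name or "").lower()]
if not jobs:
    print("No matching job found; it may have been deleted (see the table below).")
else:
    job = jobs[0]
    schedule = job.settings.schedule
    print(f"Job '{job.settings.name}', schedule: {schedule.pause_status if schedule else 'none'}")
    for run in w.jobs.list_runs(job_id=job.job_id, limit=5):
        started = datetime.fromtimestamp(run.start_time / 1000)
        print(f"{started:%Y-%m-%d %H:%M}  {run.state.life_cycle_state}  {run.state.result_state}")
```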
### Common causes

| Cause | Fix |
|---|---|
| Job was deleted | Re-create the job following Collector Deployment |
| Schedule was disabled | Re-enable the schedule in the job configuration |
| Cluster failed to start | Check Databricks cluster events — common causes are cloud capacity limits or configuration changes |
| Previous run still in progress | If runs are overlapping, set maximum concurrent runs to 1 and wait for the current run to complete |
| Databricks workspace issue | Check the Databricks workspace status page for outages |
## Extraction failures

### Symptoms

- Collector job runs but fails partway through
- Some tables are extracted while others error
- Region connector shows partial data
### Diagnosis

Check the Databricks job run output. The collector logs which tables succeeded and which failed.
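To pull those details without opening the UI, a sketch using the Databricks SDK for Python (the run ID is a placeholder for the failed run you are inspecting):

```python
# Print the task states and error messages from a failed collector run with
# the Databricks SDK for Python. The run ID is a placeholder; use the failed
# run's ID from Workflows or from jobs.list_runs.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
run = w.jobs.get_run(run_id=123456789)  # placeholder run ID

print(f"Run state: {run.state.life_cycle_state} / {run.state.result_state}")
for task in run.tasks or []:
    print(f"- task {task.task_key}: {task.state.result_state}")
    if task.state.state_message:
        print(f"    {task.state.state_message}")
```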
### Common extraction errors

| Error message | Cause | Resolution |
|---|---|---|
| `Unity Catalog is not enabled` | Workspace doesn’t have UC enabled | Enable Unity Catalog, or use a different workspace |
| `Table or view not found` | Table doesn’t exist in this Databricks tier/region | Expected behavior — the collector skips unavailable tables. See Collector Troubleshooting. |
| `Query execution timeout` | System table query is slow (usually `system.query.history` on large accounts) | This is usually transient. The collector retries on the next run. If persistent, contact LakeSentry support. |
| `Rate limit exceeded` | Too many API calls to Databricks | The collector has built-in rate limiting. If this occurs, it’s usually due to other workloads competing for API quota. Try scheduling the collector during off-peak hours. |
## Connection string issues

### Symptoms

- Collector fails immediately with authentication errors
- Logs show `Invalid collector token` or HTTP 401 errors
### Causes and fixes

| Cause | Fix |
|---|---|
| Token was regenerated | Generating a new connection string invalidates the previous token. If someone regenerated the connection string, the old token no longer works. Generate a new one and reconfigure the collector. |
| Copied incorrectly | Connection strings are long base64 values. Verify the full string was copied without truncation. Re-generate if unsure. |
### Reconfiguring with a new connection string

- In LakeSentry, go to the region connector and click Generate Connection String.
- Copy the new connection string.
- In Databricks, update the collector job’s configuration with the new string.
- Run the collector job manually to verify it authenticates successfully.
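If your deployment stores the connection string in a Databricks secret (a common pattern, but confirm how your collector job was configured), the last two steps can be scripted with the Databricks SDK for Python; the scope, key, and job ID below are placeholders:

```python
# Rotate the connection string and trigger a verification run with the
# Databricks SDK for Python. Assumes the collector job reads the connection
# string from a Databricks secret; scope, key, and job ID are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

new_connection_string = "<paste the newly generated connection string here>"
w.secrets.put_secret(scope="lakesentry", key="connection-string",
                     string_value=new_connection_string)

# Start a manual run, wait for it to finish, and confirm it authenticated.
run = w.jobs.run_now(job_id=123456789).result()  # placeholder job ID
print(f"Verification run finished: {run.state.result_state}")
```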
## Performance concerns

### Collector runs are slow

Typical collector run times:
| Account size | Expected duration |
|---|---|
| Small (< 50 workspaces, < 1M queries/month) | 2–5 minutes |
| Medium (50–200 workspaces) | 5–15 minutes |
| Large (200+ workspaces, 10M+ queries/month) | 15–45 minutes |
If runs are significantly slower than expected:
- First run — The initial extraction includes all available history and takes much longer than incremental runs. This is normal.
- After checkpoint reset — Similar to a first run, re-extracting historical data takes longer.
- Large query history — `system.query.history` is typically the largest table. If it’s causing slowness, this is usually proportional to query volume and expected.
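To gauge whether query volume explains a slow run, you can check how many rows a typical incremental window covers. A sketch for a Databricks notebook, assuming a 24-hour window approximates your collector’s schedule:

```python
# Rough volume check for system.query.history over a 24-hour window.
# `spark` is the notebook SparkSession; the window only approximates an
# incremental extraction window.
row_count = spark.sql("""
    SELECT count(*) AS queries_last_day
    FROM system.query.history
    WHERE start_time >= current_timestamp() - INTERVAL 24 HOURS
""").collect()[0]["queries_last_day"]

print(f"Queries in the last 24 hours: {row_count:,}")
```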
### Collector impact on Databricks

The collector runs as a standard Databricks job:
- It uses a job cluster that starts and stops with each run — no persistent compute cost.
- Queries against system tables are lightweight read operations.
- The collector does not interfere with production workloads.
The compute cost of running the collector is typically negligible (a few cents per run for a small cluster running 5–15 minutes).
## Escalation checklist

If you’ve worked through this guide and the issue persists, gather the following before contacting support:
- Region connector status and last sync time from LakeSentry
- Databricks job run ID and run output/logs from the failed run
- Error messages — exact text, not paraphrased
- Recent changes — Did anything change in the Databricks environment (permissions, network, workspace configuration)?
- Timeline — When did it last work? When did it break?
## Next steps

- Collector Troubleshooting — Detailed admin guide with log interpretation and checkpoint mechanics
- Collector Deployment — Setup and configuration reference
- Common Issues — Broader troubleshooting for non-collector issues
- Data Freshness & Pipeline Status — Understanding end-to-end data latency