Connecting Your Databricks Account
LakeSentry connects to your Databricks account through a service principal with read access to system tables. This page covers how to create the service principal, configure connectors for your Databricks workspaces, and verify data collection.
Connection architecture
LakeSentry uses a per-workspace connector model. You create one connector for each Databricks workspace you want to monitor, and LakeSentry groups them by region. The first connector you create automatically becomes the primary connector, which ingests account-level (global) tables such as billing and workspace metadata in addition to the regional tables.
This structure exists because Databricks system tables are regional. A connector in eastus cannot query system table data for workspaces in westeurope. You need a connector for each region you operate in.
```
Databricks Account (e.g., Acme Corp)
├── Region: East US
│   └── Connector (primary) → reads system tables for East US workspaces
├── Region: West Europe
│   └── Connector → reads system tables for West Europe workspaces
└── Region: West US 2
    └── (not configured yet — no data collected)
```

Prerequisites
- Databricks account admin access (to create service principals)
- Unity Catalog enabled on the account (required for system table access)
- Ability to create or schedule jobs in at least one workspace per region
Step 1: Create a service principal
LakeSentry authenticates using OAuth machine-to-machine (M2M) via a Databricks service principal.
- Go to your Databricks account console.
- Navigate to User Management > Service Principals.
- Click Add Service Principal and give it a descriptive name (e.g., lakesentry-reader).
- Under OAuth, generate an OAuth secret. Copy both the Client ID and Secret — you’ll need them in the next step. (A quick way to validate the credentials is sketched after this list.)
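If you want to sanity-check the credentials before continuing, the Databricks Python SDK (databricks-sdk) can authenticate with OAuth M2M directly. A minimal sketch, assuming any workspace URL in your account (all values shown are placeholders):

```python
# Verify the service principal can authenticate via OAuth M2M.
# Requires: pip install databricks-sdk
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://adb-1234567890123456.7.azuredatabricks.net",  # placeholder workspace URL
    client_id="<oauth-client-id>",    # from this step
    client_secret="<oauth-secret>",   # from this step
)

# On success this prints the service principal's identity.
print(w.current_user.me().user_name)
```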
Step 2: Grant system table permissions
The service principal needs SELECT access to the system tables LakeSentry ingests. Run these SQL statements in a workspace with Unity Catalog enabled:
```sql
-- Grant access to billing tables (account-level)
GRANT USE CATALOG ON CATALOG system TO `lakesentry-reader`;
GRANT USE SCHEMA ON SCHEMA system.billing TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.billing.usage TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.billing.list_prices TO `lakesentry-reader`;

-- Grant access to compute tables (regional)
GRANT USE SCHEMA ON SCHEMA system.compute TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.clusters TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.node_timeline TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.node_types TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.warehouse_events TO `lakesentry-reader`;

-- Grant access to job/pipeline tables (regional)
GRANT USE SCHEMA ON SCHEMA system.lakeflow TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.jobs TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.job_tasks TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.job_run_timeline TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.job_task_run_timeline TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.pipelines TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.pipeline_update_timeline TO `lakesentry-reader`;

-- Grant access to query history (regional)
GRANT USE SCHEMA ON SCHEMA system.query TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.query.history TO `lakesentry-reader`;

-- Grant access to workspace metadata
GRANT USE SCHEMA ON SCHEMA system.access TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.access.workspaces_latest TO `lakesentry-reader`;
```

Optional tables
For full feature coverage, you can also grant access to:
```sql
-- MLflow tracking
GRANT USE SCHEMA ON SCHEMA system.mlflow TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.mlflow.experiments_latest TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.mlflow.runs_latest TO `lakesentry-reader`;

-- Model serving
GRANT USE SCHEMA ON SCHEMA system.serving TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.serving.served_entities TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.serving.endpoint_usage TO `lakesentry-reader`;

-- Audit logs (for audit trail features)
GRANT SELECT ON TABLE system.access.audit TO `lakesentry-reader`;

-- Table lineage
GRANT SELECT ON TABLE system.access.table_lineage TO `lakesentry-reader`;
```

LakeSentry works without these optional tables — the corresponding features (MLflow tracking, model serving costs, audit log, table lineage) will show as unavailable until access is granted.
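To confirm the grants took effect, you can run a quick smoke test as the service principal, for example from a notebook or job it owns. A sketch, assuming a Databricks runtime where spark is predefined (the table list mirrors the required grants above):

```python
# Smoke test: attempt a trivial SELECT on each required system table.
# `spark` is provided by the Databricks runtime; run this as the
# service principal so the check reflects its permissions.
REQUIRED_TABLES = [
    "system.billing.usage",
    "system.billing.list_prices",
    "system.compute.clusters",
    "system.compute.node_timeline",
    "system.compute.node_types",
    "system.compute.warehouse_events",
    "system.lakeflow.jobs",
    "system.lakeflow.job_tasks",
    "system.lakeflow.job_run_timeline",
    "system.lakeflow.job_task_run_timeline",
    "system.lakeflow.pipelines",
    "system.lakeflow.pipeline_update_timeline",
    "system.query.history",
    "system.access.workspaces_latest",
]

for table in REQUIRED_TABLES:
    try:
        spark.sql(f"SELECT 1 FROM {table} LIMIT 1").collect()
        print(f"OK      {table}")
    except Exception as exc:
        print(f"DENIED  {table}: {exc}")
```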
Step 3: Create connectors
LakeSentry supports two approaches for setting up connectors: workspace discovery (recommended) and manual creation.
Option A: Workspace discovery (recommended)
If your service principal has account-level access, LakeSentry can automatically discover all workspaces in your Databricks account and group them by region (a rough SDK equivalent is sketched after these steps):
- In LakeSentry, go to Settings > Connectors.
- Provide your account-level credentials:
  - Account ID — Your Databricks account ID (found in the account console URL or settings).
  - Cloud provider — Azure, AWS, or GCP.
  - OAuth Client ID — From the service principal you created.
  - OAuth Secret — The secret you saved in Step 1.
- LakeSentry lists all running workspaces, grouped by region. Select the workspaces you want to monitor.
- Click Connect. A connector is created for each selected workspace.
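Under the hood, discovery amounts to listing workspaces with account-level credentials and bucketing them by region. A rough equivalent using the Databricks Python SDK; the accounts host and the region field vary by cloud, and the grouping logic here is illustrative:

```python
# List account workspaces and group them by region, roughly what
# workspace discovery does. Requires: pip install databricks-sdk
from collections import defaultdict

from databricks.sdk import AccountClient

a = AccountClient(
    host="https://accounts.cloud.databricks.com",  # AWS; Azure uses accounts.azuredatabricks.net
    account_id="<account-id>",
    client_id="<oauth-client-id>",
    client_secret="<oauth-secret>",
)

by_region: dict[str, list[str]] = defaultdict(list)
for ws in a.workspaces.list():
    region = ws.aws_region or ws.location  # which field is set depends on the cloud
    by_region[region].append(ws.workspace_name)

for region, names in sorted(by_region.items()):
    print(region, names)
```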
Option B: Manual creation
You can also add connectors one at a time:
- In LakeSentry, go to Settings > Connectors.
- Enter a workspace URL (e.g., https://adb-1234567890123456.7.azuredatabricks.net).
- Select the authentication method (OAuth M2M or Personal Access Token) and provide the corresponding credentials.
- Click Test Connection. LakeSentry validates the credentials by listing SQL warehouses in the workspace and probing system table access.
- Once validated, click Save.
The cloud provider and region are automatically detected from the workspace URL. All connectors within a tenant must belong to the same Databricks account.
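For reference, the hostname suffix alone identifies the cloud. The sketch below shows one way such detection could work; it is illustrative, not LakeSentry's exact logic:

```python
# Infer the cloud provider from a Databricks workspace URL's hostname.
from urllib.parse import urlparse

def detect_cloud(workspace_url: str) -> str:
    host = urlparse(workspace_url).hostname or ""
    if host.endswith(".azuredatabricks.net"):
        return "azure"
    if host.endswith(".gcp.databricks.com"):
        return "gcp"
    if host.endswith(".cloud.databricks.com"):
        return "aws"
    raise ValueError(f"Unrecognized Databricks host: {host!r}")

print(detect_cloud("https://adb-1234567890123456.7.azuredatabricks.net"))  # azure
```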
Generate a connection string
Each connector that uses collector mode needs a collector deployed in the corresponding Databricks workspace.
- Click Generate Connection String on the connector.
- Copy the connection string (starts with LAKESENTRY://).
- Store it securely — it contains a one-time token that won’t be shown again.
The connection string encodes everything the collector needs: API URL, connector ID, authentication token, reference catalog/schema, and execution mode. If you need to audit what’s inside, the payload is base64-encoded JSON that you can decode.
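A decode along these lines works, assuming the payload after the scheme is standard base64 (the field names inside are not documented here):

```python
# Decode a LakeSentry connection string payload for auditing.
import base64
import json

def decode_connection_string(conn: str) -> dict:
    payload = conn.removeprefix("LAKESENTRY://")
    payload += "=" * (-len(payload) % 4)  # restore padding if the encoder stripped it
    return json.loads(base64.b64decode(payload))
```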
Step 4: Deploy the collector
See Collector Deployment for the detailed deployment process. In short:
- Upload the collector wheel to your Databricks workspace.
- Run lakesentry-collector configure --connection-string "LAKESENTRY://..." to write the .env configuration file.
- Create a Databricks Job scheduled every 15 minutes (a scripted alternative is sketched after this list).
- Start the schedule.
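If you prefer scripting the job over the UI, a sketch using the Databricks Python SDK follows. The package name, entry point, and cluster ID are placeholders; see Collector Deployment for the actual values:

```python
# Create the collector job with a 15-minute Quartz schedule.
# Requires: pip install databricks-sdk
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import CronSchedule, PythonWheelTask, Task

w = WorkspaceClient()  # reads credentials from the environment

w.jobs.create(
    name="lakesentry-collector",
    schedule=CronSchedule(
        quartz_cron_expression="0 0/15 * * * ?",  # every 15 minutes
        timezone_id="UTC",
    ),
    tasks=[
        Task(
            task_key="collect",
            existing_cluster_id="<cluster-id>",       # placeholder
            python_wheel_task=PythonWheelTask(
                package_name="lakesentry_collector",  # placeholder
                entry_point="run",                    # placeholder
            ),
        )
    ],
)
```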
The collector reads system tables and pushes the data to LakeSentry over HTTPS. It uses checkpoint-based incremental extraction — each run picks up where the last one left off.
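Conceptually, checkpointing boils down to persisting a high-water mark per table and reading only rows newer than it on each run. A simplified illustration of the technique, not LakeSentry's actual collector code (spark comes from the Databricks runtime):

```python
# Checkpoint-based incremental extraction, simplified.
import json
from pathlib import Path

CHECKPOINT_FILE = Path("checkpoints.json")
EPOCH = "1970-01-01 00:00:00"

def load_checkpoints() -> dict:
    if CHECKPOINT_FILE.exists():
        return json.loads(CHECKPOINT_FILE.read_text())
    return {}

def extract_incremental(table: str, ts_column: str) -> list:
    checkpoints = load_checkpoints()
    last = checkpoints.get(table, EPOCH)
    rows = spark.sql(
        f"SELECT * FROM {table} WHERE {ts_column} > '{last}' ORDER BY {ts_column}"
    ).collect()
    if rows:
        # Advance the high-water mark to the newest row we saw.
        checkpoints[table] = str(rows[-1][ts_column])
        CHECKPOINT_FILE.write_text(json.dumps(checkpoints))
    return rows
```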
Verifying the connection
After the collector runs its first cycle, check the Connectors page:
- Connector status: Should show Active once data is received.
- Last ingestion: Shows when data was last received.
- Ingestion lag: How far behind real time the collected data is.
If the status stays Pending after the collector’s first scheduled run, see Collector Troubleshooting for common issues.
Adding more regions
Repeat Steps 3 and 4 for each additional region. LakeSentry automatically aggregates data across all regions into a unified cost view. The Connectors page shows the health status of every connector so you can monitor collection at a glance.
Security notes
- LakeSentry connects via a read-only service principal. Write permissions are only needed if you choose to execute optimization actions (and you opt in to that separately).
- The service principal accesses system tables only — billing, compute, job, and query metadata. It never touches your business data, notebooks, or query results.
- Collector tokens are hashed server-side. LakeSentry stores only the hash, not the plain token.
- All data transfer happens over HTTPS.
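For illustration, hash-only token storage typically looks like the sketch below (a generic pattern, not LakeSentry's code): the server keeps a SHA-256 digest and compares digests in constant time.

```python
# Generic hash-only token storage pattern.
import hashlib
import hmac

def hash_token(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()

def verify_token(presented: str, stored_hash: str) -> bool:
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(hash_token(presented), stored_hash)
```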
Next steps
- Understanding the Dashboard — Tour the Overview page once data starts flowing
- Collector Deployment — Detailed collector setup and configuration
- Collector Troubleshooting — Fixing common connection issues