Account & Connector Setup

Account connectors are the bridge between LakeSentry and your Databricks account. Each connector authenticates via a service principal and provides access to billing data, compute metadata, and workload history through Databricks system tables.

This page covers the setup process from creating credentials to verifying connectivity. To add more regions after the initial setup, see Region Connectors.

Before creating an account connector, ensure you have:

  • Databricks account admin access (to create service principals and grant permissions)
  • Unity Catalog enabled on your Databricks account (required for system table access)
  • At least one workspace per region where you can create or schedule jobs (for the collector)
  • Your Databricks account ID (found in the account console URL or settings page)

LakeSentry authenticates using OAuth machine-to-machine (M2M) via a Databricks service principal. To create one:

  1. Go to your Databricks account console.
  2. Navigate to User Management > Service Principals.
  3. Click Add Service Principal and give it a descriptive name (e.g., lakesentry-reader).
  4. Under OAuth, generate an OAuth secret. Copy both the Client ID and Secret.
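
Before entering the credentials in LakeSentry, you can confirm they work by requesting a token yourself. A minimal sketch in Python; the workspace URL and credentials are placeholders, and the endpoint shown is the standard Databricks workspace OAuth token endpoint:

# Minimal OAuth M2M check: exchange the service principal's client ID and secret
# for a short-lived access token at the workspace token endpoint.
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
CLIENT_ID = "<oauth-client-id>"      # from the service principal
CLIENT_SECRET = "<oauth-secret>"     # generated in the account console

resp = requests.post(
    f"{WORKSPACE_URL}/oidc/v1/token",
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
    timeout=30,
)
resp.raise_for_status()
print("Token acquired; expires in", resp.json().get("expires_in"), "seconds")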

The service principal needs SELECT access to the system tables LakeSentry ingests. Run these SQL statements in a workspace with Unity Catalog enabled:

-- Grant access to billing tables (account-level)
GRANT USE CATALOG ON CATALOG system TO `lakesentry-reader`;
GRANT USE SCHEMA ON SCHEMA system.billing TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.billing.usage TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.billing.list_prices TO `lakesentry-reader`;
-- Grant access to compute tables (regional)
GRANT USE SCHEMA ON SCHEMA system.compute TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.clusters TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.node_timeline TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.node_types TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.compute.warehouse_events TO `lakesentry-reader`;
-- Grant access to job/pipeline tables (regional)
GRANT USE SCHEMA ON SCHEMA system.lakeflow TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.jobs TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.job_tasks TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.job_run_timeline TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.job_task_run_timeline TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.pipelines TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.lakeflow.pipeline_update_timeline TO `lakesentry-reader`;
-- Grant access to query history (regional)
GRANT USE SCHEMA ON SCHEMA system.query TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.query.history TO `lakesentry-reader`;
-- Grant access to workspace metadata
GRANT USE SCHEMA ON SCHEMA system.access TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.access.workspaces_latest TO `lakesentry-reader`;
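
To confirm the grants took effect, you can probe the tables as the service principal before connecting LakeSentry. A sketch using the Databricks SDK's statement execution API; the warehouse ID is a placeholder, and the table list below is a sample you can trim or extend to match what you granted:

# Probe SELECT access on the granted system tables as the service principal.
# Requires: pip install databricks-sdk, plus an existing SQL warehouse ID.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    client_id="<oauth-client-id>",
    client_secret="<oauth-secret>",
)

TABLES = [
    "system.billing.usage",
    "system.billing.list_prices",
    "system.compute.clusters",
    "system.lakeflow.job_run_timeline",
    "system.query.history",
    "system.access.workspaces_latest",
]

for table in TABLES:
    result = w.statement_execution.execute_statement(
        warehouse_id="<warehouse-id>",   # any running SQL warehouse
        statement=f"SELECT 1 FROM {table} LIMIT 1",
        wait_timeout="30s",
    )
    print(table, "->", result.status.state)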

For full feature coverage, grant access to these additional tables:

-- MLflow tracking (for ML pipeline cost tracking)
GRANT USE SCHEMA ON SCHEMA system.mlflow TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.mlflow.experiments_latest TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.mlflow.runs_latest TO `lakesentry-reader`;
-- Model serving (for serving endpoint costs)
GRANT USE SCHEMA ON SCHEMA system.serving TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.serving.served_entities TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.serving.endpoint_usage TO `lakesentry-reader`;
-- Audit logs (for audit trail features)
GRANT SELECT ON TABLE system.access.audit TO `lakesentry-reader`;
-- Table lineage (for lineage-based cost attribution)
GRANT SELECT ON TABLE system.access.table_lineage TO `lakesentry-reader`;
-- Storage metadata (for storage cost tracking)
GRANT USE SCHEMA ON SCHEMA system.storage TO `lakesentry-reader`;
GRANT SELECT ON TABLE system.storage.predictive_optimization_operations_history TO `lakesentry-reader`;

LakeSentry works without these optional tables — the corresponding features (MLflow, Model Serving, Storage, Audit Log) show as unavailable until access is granted.

With the service principal created and the grants in place, create the account connector:

  1. In LakeSentry, go to Settings > Connectors.
  2. Click Add Account Connector.
  3. Fill in the required fields:
     • Workspace URL: The URL of any Databricks workspace in your account (e.g., https://adb-1234567890123456.7.azuredatabricks.net). The cloud provider is auto-detected from the URL.
     • OAuth Client ID: The client ID from the service principal you created.
     • OAuth Secret: The OAuth secret you generated for the service principal.
  4. Click Validate Credentials. LakeSentry validates the credentials by listing SQL warehouses in your workspace.
  5. Once validated, click Connect Workspace. The connector status shows as Active.

The connection test checks:

  • OAuth credentials are valid and not expired
  • The service principal can list SQL warehouses (workspace-level API access)
  • At least one SQL warehouse exists in the workspace
  • The service principal can SELECT from system tables (probed automatically)

If the test fails, verify that the service principal has workspace-level access, at least one SQL warehouse exists, and the OAuth secret hasn’t expired.
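
To reproduce the core of this check yourself, you can list SQL warehouses as the service principal. A sketch with the Databricks SDK (host and credentials are placeholders):

# Reproduce the validation check: authenticate as the service principal and
# list the SQL warehouses it can see in the workspace.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    client_id="<oauth-client-id>",
    client_secret="<oauth-secret>",
)

warehouses = list(w.warehouses.list())
if not warehouses:
    print("No SQL warehouses visible - create one before connecting LakeSentry.")
for wh in warehouses:
    print(wh.name, wh.state)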

After creating the account connector, add a region connector for each Databricks region you operate in.

  1. On the Connectors page, click Add Region.
  2. Select the region (e.g., eastus, westeurope, us-west-2).
  3. Enter a workspace URL from that region (e.g., https://adb-1234567890123456.7.azuredatabricks.net).
  4. Click Save.

For detailed information on multi-region configuration, see Region Connectors.

Each region connector needs a collector deployed in the corresponding Databricks workspace.

  1. Click Generate Connection String on the region connector.
  2. Copy the connection string (starts with LAKESENTRY://).
  3. Store it securely — it contains a one-time token that won’t be shown again.

The connection string encodes the API URL, connector ID, authentication token, and configuration. If you need to audit its contents, the payload is base64-encoded JSON that you can decode.
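
For example, a short Python sketch that decodes a connection string locally, assuming the format described above (a LAKESENTRY:// prefix followed by base64-encoded JSON); the exact fields inside the payload depend on your LakeSentry version:

# Decode a LakeSentry connection string locally for auditing.
# Assumes the format described above: "LAKESENTRY://" + base64-encoded JSON.
import base64
import json

conn = "LAKESENTRY://..."  # paste your connection string here

payload_b64 = conn.removeprefix("LAKESENTRY://")
payload_b64 += "=" * (-len(payload_b64) % 4)  # restore padding if it was stripped
payload = json.loads(base64.b64decode(payload_b64))
print(json.dumps(payload, indent=2))  # review the API URL, connector ID, etc.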

See Collector Deployment for the full deployment process. The short version:

  1. Upload the collector wheel to your Databricks workspace.
  2. Run lakesentry-collector configure --connection-string "LAKESENTRY://..." to set up the environment.
  3. Create a Databricks Job scheduled every 15 minutes.
  4. Start the schedule.

The collector runs for about 5 minutes per cycle, reading system tables and pushing the data to LakeSentry over HTTPS. It uses checkpoint-based incremental extraction — each run picks up where the last one left off.
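
As a reference for step 3 above, here is a hedged sketch of creating the job with a 15-minute schedule via the Databricks SDK. The task definition is illustrative only (the script path and cluster ID are placeholders), so follow Collector Deployment for the authoritative setup:

# Illustrative only: create a Databricks job that runs on a 15-minute schedule.
# The task body is a placeholder - use the collector entry point from
# Collector Deployment, not the path shown here.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # uses your local Databricks authentication

created = w.jobs.create(
    name="lakesentry-collector",
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0/15 * * * ?",  # every 15 minutes
        timezone_id="UTC",
    ),
    tasks=[
        jobs.Task(
            task_key="collect",
            spark_python_task=jobs.SparkPythonTask(
                python_file="dbfs:/FileStore/lakesentry/run_collector.py",  # placeholder
            ),
            existing_cluster_id="<cluster-id>",  # placeholder
        )
    ],
)
print("Created job", created.job_id)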

After the collector completes its first cycle, check the Connectors page:

Indicator          Healthy state
Region health      OK (green)
Last ingestion     Shows a recent timestamp
Tables received    Lists successfully extracted system tables

If the status stays Pending after the first scheduled run, see Collector Troubleshooting for common issues.
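
You can also confirm the cycle on the Databricks side by inspecting the collector job's most recent run. A sketch with the Databricks SDK (the job ID is a placeholder):

# Check the most recent run of the collector job.
from databricks.sdk import WorkspaceClient

JOB_ID = 123456789  # placeholder: the collector job's ID

w = WorkspaceClient()  # uses your local Databricks authentication
runs = list(w.jobs.list_runs(job_id=JOB_ID, limit=1))
if runs:
    state = runs[0].state
    print("Last run:", state.life_cycle_state, state.result_state)
else:
    print("No runs yet - wait for the first scheduled cycle.")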

A few notes on security and data access:

  • LakeSentry connects via a read-only service principal. Write permissions are only needed if you choose to execute optimization actions (and you opt in separately).
  • The service principal accesses system tables only — billing, compute, job, and query metadata. It never touches your business data, notebooks, or query results.
  • Collector tokens are hashed server-side. LakeSentry stores only the hash, not the plain token.
  • All data transfer happens over HTTPS.

To disconnect LakeSentry from your Databricks account:

  1. Stop the collectors — Disable or delete the Databricks jobs running the collector in each region.
  2. Delete region connectors — Remove each region connector from the Connectors page.
  3. Delete the account connector — Remove the account connector.
  4. Revoke the service principal — In the Databricks account console, delete the service principal or rotate its OAuth secret.
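
If you prefer to script the Databricks side of the teardown, a sketch with the Databricks SDK; the job ID and service principal ID are placeholders, and the account client needs account-level admin credentials:

# Script the Databricks-side teardown: remove the collector job in each regional
# workspace, then delete the service principal at the account level.
from databricks.sdk import AccountClient, WorkspaceClient

# 1. Delete the collector job (repeat per regional workspace).
w = WorkspaceClient()  # uses your local Databricks authentication
w.jobs.delete(job_id=123456789)  # placeholder: the collector job's ID

# 2. Delete the service principal at the account level.
a = AccountClient()  # requires account_id and account-level credentials
a.service_principals.delete(id="<service-principal-id>")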