
What is LakeSentry

LakeSentry is a Databricks cost investigation and workload optimization platform. It helps platform teams answer “why did our bill spike?” and safely reduce waste — without risking production stability.

LakeSentry connects to your Databricks account, ingests data from system tables, and transforms it into an understandable cost model. From there, it surfaces insights about waste and anomalies, and can execute optimization actions with safety guardrails.

| Capability | What it means |
| --- | --- |
| Cost visibility | See where money goes — by workspace, team, job, warehouse, or SKU |
| Attribution | Connect costs to owners using rules, tags, and identity mapping |
| Investigation | Drill down from an anomaly to root cause in a few clicks |
| Automation | Terminate idle clusters, resize warehouses — only after earning trust |

Finance asks about a $50K increase. Your team manually queries system tables, cross-references billing exports, and pieces together the story.

With LakeSentry, you get time-range comparison, cost breakdown by dimension, and anomaly detection — answering the question in minutes instead of hours.
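For intuition, the underlying comparison is straightforward SQL over Databricks system tables. Here is a minimal sketch of a week-over-week breakdown by SKU, assuming a Databricks notebook or job where `spark` is predefined; the date ranges are placeholders, and LakeSentry's actual queries are its own:

```python
# Sketch: week-over-week DBU comparison by SKU, straight from system tables.
# Assumes a Databricks notebook/job where `spark` is predefined.
baseline = ("2024-05-06", "2024-05-12")  # placeholder "normal" week
spike = ("2024-05-13", "2024-05-19")     # placeholder week of the increase

df = spark.sql(f"""
    SELECT sku_name,
           SUM(CASE WHEN usage_date BETWEEN '{baseline[0]}' AND '{baseline[1]}'
                    THEN usage_quantity ELSE 0 END) AS baseline_dbus,
           SUM(CASE WHEN usage_date BETWEEN '{spike[0]}' AND '{spike[1]}'
                    THEN usage_quantity ELSE 0 END) AS spike_dbus
    FROM system.billing.usage
    WHERE usage_date BETWEEN '{baseline[0]}' AND '{spike[1]}'
    GROUP BY sku_name
    ORDER BY spike_dbus DESC
""")
df.show(truncate=False)
```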

Shared clusters, cross-team jobs, no clear ownership. Chargeback reports are guesswork.

LakeSentry provides attribution rules with confidence tiers (exact, strong, estimated, unattributed). It’s transparent about what it can and can’t attribute, so your chargeback numbers hold up under scrutiny.
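To make the tiers concrete, here is a hypothetical sketch of tiered attribution logic. The rule order, helper names, and fallbacks are illustrative, not LakeSentry's implementation (though `custom_tags` and `identity_metadata` are real fields on `system.billing.usage`):

```python
# Hypothetical sketch of tiered attribution: each rule maps a usage record
# to an owner, and the tier records how direct the evidence was.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Attribution:
    owner: Optional[str]
    tier: str  # "exact" | "strong" | "estimated" | "unattributed"

def attribute(record: dict) -> Attribution:
    # Exact: an explicit cost-center tag on the resource.
    if tag := record.get("custom_tags", {}).get("cost_center"):
        return Attribution(owner=tag, tier="exact")
    # Strong: the identity that ran the workload maps to a known team.
    if creator := record.get("identity_metadata", {}).get("run_as"):
        return Attribution(owner=f"team-of:{creator}", tier="strong")
    # Estimated: fall back to a configured workspace default, if any.
    if ws_owner := record.get("workspace_default_owner"):
        return Attribution(owner=ws_owner, tier="estimated")
    # Honest default: report unattributed rather than guessing.
    return Attribution(owner=None, tier="unattributed")
```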

“We’re wasting money on idle resources”


Clusters running 24/7 for jobs that run once a day. Warehouses oversized for actual query load.

LakeSentry detects waste and suggests actions with estimated savings. You review what would be saved before approving execution.
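As a back-of-the-envelope illustration of how such an estimate can be built (the 5% threshold and the formula are assumptions, not LakeSentry's model):

```python
# Hypothetical estimate: hours where a cluster was up but nearly unused,
# multiplied by its hourly cost, approximate the savings from terminating it.
IDLE_CPU_THRESHOLD = 0.05  # assumed: <5% average CPU counts as idle

def estimated_monthly_savings(hourly_samples: list[dict], hourly_cost_usd: float) -> float:
    idle_hours = sum(1 for s in hourly_samples if s["avg_cpu_util"] < IDLE_CPU_THRESHOLD)
    return idle_hours * hourly_cost_usd

# e.g. a cluster idle 18 hours/day at $3.50/hour wastes ~$1,900/month
samples = [{"avg_cpu_util": 0.02}] * (18 * 30) + [{"avg_cpu_util": 0.60}] * (6 * 30)
print(f"${estimated_monthly_savings(samples, 3.50):,.0f}")
```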

“I don’t trust automation with production infrastructure”


Previous automation tools caused outages or unexpected behavior.

LakeSentry runs read-only by default. All optimization actions require explicit approval. You can enable autopilot for selected safe actions — with guardrails, rate limits, and a kill switch.
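Those guardrails map naturally onto a policy object. A hypothetical sketch; the field names are illustrative, not LakeSentry's configuration schema:

```python
# Hypothetical autopilot policy: which actions may run unattended,
# how fast, and how to stop everything instantly.
autopilot_policy = {
    "enabled_actions": ["terminate_idle_cluster"],   # opt in per action type
    "require_approval": ["resize_warehouse"],        # everything else stays manual
    "rate_limit": {"max_actions_per_hour": 5},       # cap the blast radius
    "excluded_tags": {"env": "production"},          # never touch tagged resources
    "kill_switch": False,                            # flip True to halt all automation
}
```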

You manage Databricks infrastructure and own the bill. You need forensic investigation tools that help you trace cost back to specific jobs, clusters, and users — not executive summary charts.

You handle chargeback and showback reporting. You need attribution you can trust and rules you can configure, not opaque algorithms you can’t explain to stakeholders.

You run training jobs, experiments, and ML pipelines. You need visibility into compute spend per experiment and serving endpoint so you can optimize within your budget.

LakeSentry follows a three-step flow:

  1. Connect — Add a read-only service principal and connect your Databricks account. Takes minutes, not days.
  2. Collect — LakeSentry ingests system tables on a schedule to build a normalized cost ledger (sketched after this list).
  3. Act safely — Review insights, approve changes — or enable autopilot for selected safe actions with guardrails.
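The Collect step amounts to periodically snapshotting system tables into a normalized ledger. A minimal sketch, assuming a Databricks notebook or job where `spark` is available; the `lakesentry.ledger.daily_usage` table name is hypothetical:

```python
# Minimal sketch of the Collect step: snapshot billable usage into a
# normalized ledger table, run on a schedule by a Databricks job.
# The target table name is hypothetical.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakesentry.ledger.daily_usage AS
    SELECT usage_date,
           workspace_id,
           sku_name,
           SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    GROUP BY usage_date, workspace_id, sku_name
""")
```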

For the detailed setup process, see the Quick Start Guide.

LakeSentry is built around a few key design decisions:

  • Conservative attribution — LakeSentry shows “unattributed” rather than guessing wrong. Confidence tiers (exact, strong, estimated, unattributed) tell you how much to trust each number.
  • Trust-building automation — All actions require manual approval before execution. You opt-in to escalating automation tiers as you build confidence.
  • Financial forensics, not real-time ops — Designed for “why did this happen?” rather than “what’s happening right now?” Time-range selectors, drill-down paths, and historical trends.
  • Low noise, high signal — Significance scoring instead of alert storms. Every insight is worth reading.
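For intuition on significance scoring, one common approach is to measure deviation against historical variability. This plain z-score sketch is an assumption, not LakeSentry's actual method:

```python
import statistics

def significance(history: list[float], today: float) -> float:
    """Deviation of today's cost from its trailing baseline, in standard
    deviations. Higher scores surface; small wobbles stay quiet."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history) or 1e-9  # guard against zero variance
    return abs(today - mean) / stdev

# A $500 day against a flat ~$300 baseline scores far above daily noise.
print(significance([300, 310, 295, 305, 300, 298, 302], 500))
```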

LakeSentry reads from Databricks system tables — billing, compute, jobs, queries, serving, and access metadata. It never accesses your business data, notebooks, or query results; the one exception in the current version is SQL query text, which is read to improve insight quality.

| Data source | What it provides |
| --- | --- |
| `system.billing.*` | Billable usage and list prices |
| `system.compute.*` | Cluster and warehouse configuration and utilization |
| `system.lakeflow.*` | Job and pipeline definitions and run history |
| `system.query.history` | SQL statements on warehouses and serverless |
| `system.serving.*` | Model serving endpoints and usage |
| `system.access.*` | Workspace metadata, lineage, and network events |
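To see how two of these sources combine, the standard pattern joins billable usage to the list price in effect at usage time. A sketch, assuming a context where `spark` is available; verify the join conditions against your own workspace's schema before relying on the numbers:

```python
# Sketch: estimated list cost per workspace per day, joining usage to
# the price that was in effect when the usage occurred.
cost = spark.sql("""
    SELECT u.usage_date,
           u.workspace_id,
           SUM(u.usage_quantity * p.pricing.default) AS list_cost
    FROM system.billing.usage u
    JOIN system.billing.list_prices p
      ON u.sku_name = p.sku_name
     AND u.usage_start_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
    GROUP BY u.usage_date, u.workspace_id
""")
cost.show()
```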