What is LakeSentry
LakeSentry is a Databricks cost investigation and workload optimization platform. It helps platform teams answer “why did our bill spike?” and safely reduce waste — without risking production stability.
What LakeSentry does
LakeSentry connects to your Databricks account, ingests data from system tables, and transforms it into an understandable cost model. From there, it surfaces insights about waste and anomalies, and can execute optimization actions with safety guardrails.
| Capability | What it means |
|---|---|
| Cost visibility | See where money goes — by workspace, team, job, warehouse, or SKU |
| Attribution | Connect costs to owners using rules, tags, and identity mapping |
| Investigation | Drill down from an anomaly to root cause in a few clicks |
| Automation | Terminate idle clusters, resize warehouses — only after earning trust |
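The "cost breakdown by dimension" idea behind the visibility capability can be sketched in a few lines. This is an illustrative sketch, not LakeSentry's implementation; the record fields (`team`, `cost`) are hypothetical:

```python
from collections import defaultdict

def breakdown(records, dimension):
    """Sum cost per value of a chosen dimension (workspace, team, job, ...)."""
    totals = defaultdict(float)
    for r in records:
        # Records missing the dimension roll up into "unattributed".
        totals[r.get(dimension, "unattributed")] += r["cost"]
    # Largest spenders first, the usual cost-visibility ordering.
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))
```

The same grouping runs per workspace, job, warehouse, or SKU by swapping the `dimension` argument.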
Problems LakeSentry solves
”Why did our bill spike?”
Finance asks about a $50K increase. Your team manually queries system tables, cross-references billing exports, and pieces together the story.
With LakeSentry, you get time-range comparison, cost breakdown by dimension, and anomaly detection — answering the question in minutes instead of hours.
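LakeSentry's detection internals aren't specified here, but the time-range comparison can be illustrated with a minimal sketch, assuming a list of daily cost totals and a simple deviation rule (the baseline/window sizes and the three-sigma threshold are illustrative):

```python
from statistics import mean, stdev

def compare_windows(daily_costs, baseline_days=28, window_days=7):
    """Compare the most recent window against the preceding baseline.

    daily_costs: daily dollar totals, oldest first.
    Returns (baseline_avg, window_avg, pct_change, is_anomaly).
    """
    baseline = daily_costs[-(baseline_days + window_days):-window_days]
    window = daily_costs[-window_days:]
    base_avg, win_avg = mean(baseline), mean(window)
    pct_change = (win_avg - base_avg) / base_avg * 100
    # Flag the window when its average sits more than three baseline
    # standard deviations above the baseline mean.
    is_anomaly = win_avg > base_avg + 3 * stdev(baseline)
    return base_avg, win_avg, pct_change, is_anomaly
```

A $50K spike shows up as a large `pct_change` plus an anomaly flag, which is the starting point for the drill-down.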
”Who should pay for this?”
Shared clusters, cross-team jobs, no clear ownership. Chargeback reports are guesswork.
LakeSentry provides attribution rules with confidence tiers (exact, strong, estimated, unattributed). It’s transparent about what it can and can’t attribute, so your chargeback numbers hold up under scrutiny.
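The tiers suggest an ordered rule cascade: strongest evidence first, and an honest "unattributed" when nothing matches. A hypothetical sketch — the field names and mapping tables are illustrative, not LakeSentry's actual rule schema:

```python
def attribute(record, tag_owners, user_teams, cluster_teams):
    """Walk attribution rules from strongest evidence to weakest; prefer
    reporting 'unattributed' over guessing when nothing matches."""
    team = tag_owners.get(record.get("cost_center_tag"))
    if team:
        return team, "exact"       # explicit cost-center tag
    team = user_teams.get(record.get("creator"))
    if team:
        return team, "strong"      # identity mapping for the resource creator
    team = cluster_teams.get(record.get("cluster_id"))
    if team:
        return team, "estimated"   # heuristic: team that usually uses this cluster
    return None, "unattributed"
```

Because each result carries its tier, a chargeback report can state exactly how much of the total rests on exact evidence versus estimates.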
”We’re wasting money on idle resources”
Clusters running 24/7 for jobs that run once a day. Warehouses oversized for actual query load.
LakeSentry detects waste and suggests actions with estimated savings. You review what would be saved before approving execution.
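An estimated-savings figure for an always-on cluster can be approximated from utilization samples. A sketch under stated assumptions — the idle threshold and the DBU-based pricing model are illustrative:

```python
def estimated_idle_savings(cpu_util_by_hour, hourly_dbu, dbu_rate,
                           idle_threshold=0.05):
    """Estimate what terminating a cluster during idle hours would save.

    cpu_util_by_hour: average CPU utilization (0.0-1.0) per hour over a
    representative period. Hours under idle_threshold count as idle.
    """
    idle_hours = sum(1 for u in cpu_util_by_hour if u < idle_threshold)
    return idle_hours * hourly_dbu * dbu_rate
```

The point of surfacing the number before execution is that you judge the trade-off, not the tool.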
”I don’t trust automation with production infrastructure”
Previous automation tools caused outages or unexpected behavior.
LakeSentry runs read-only by default. All optimization actions require explicit approval. You can enable autopilot for selected safe actions — with guardrails, rate limits, and a kill switch.
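The layered guardrails can be pictured as a gate every action must pass. A hypothetical sketch, not LakeSentry's API: kill switch first, then explicit approval, then a rate limit:

```python
import time

class ActionGate:
    """Guardrail sketch: an action runs only if the kill switch is off,
    it was explicitly approved, and a per-hour rate limit holds."""

    def __init__(self, max_actions_per_hour=5):
        self.max_per_hour = max_actions_per_hour
        self.kill_switch = False
        self.executed = []  # timestamps of executed actions

    def allow(self, action, now=None):
        now = time.time() if now is None else now
        if self.kill_switch:
            return False                      # global stop overrides everything
        if not action.get("approved"):
            return False                      # read-only by default
        recent = [t for t in self.executed if now - t < 3600]
        if len(recent) >= self.max_per_hour:
            return False                      # rate limit: fail closed
        self.executed.append(now)
        return True
```

Every check fails closed, so a misconfiguration blocks actions rather than letting them through.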
Who LakeSentry is for
Platform and DataOps engineers
You manage Databricks infrastructure and own the bill. You need forensic investigation tools that help you trace cost back to specific jobs, clusters, and users — not executive summary charts.
FinOps teams
You handle chargeback and showback reporting. You need attribution you can trust and rules you can configure, not opaque algorithms you can’t explain to stakeholders.
Data and ML teams
You run training jobs, experiments, and ML pipelines. You need visibility into compute spend per experiment and serving endpoint so you can optimize within your budget.
How it works
LakeSentry follows a three-step flow:
- Connect — Add a read-only service principal and connect your Databricks account. Takes minutes, not days.
- Collect — LakeSentry ingests system tables on a schedule to build a normalized cost ledger.
- Act safely — Review insights, approve changes — or enable autopilot for selected safe actions with guardrails.
For the detailed setup process, see the Quick Start Guide.
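To make the Collect step concrete: it amounts to scheduled aggregations over system tables. A sketch of the kind of query involved — column names follow the documented Databricks `system.billing.usage` schema, but verify them against your account before relying on them:

```python
def collect_query(start_date: str, end_date: str) -> str:
    """Build a daily usage aggregation over system.billing.usage,
    the shape of query a scheduled Collect run might issue."""
    return (
        "SELECT workspace_id, sku_name, usage_date, "
        "SUM(usage_quantity) AS total_usage "
        "FROM system.billing.usage "
        f"WHERE usage_date BETWEEN '{start_date}' AND '{end_date}' "
        "GROUP BY workspace_id, sku_name, usage_date"
    )
```

The results, joined with list prices, become rows in the normalized cost ledger.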
Core principles
LakeSentry is built around a few key design decisions:
- Conservative attribution — LakeSentry shows “unattributed” rather than guessing wrong. Confidence tiers (exact, strong, estimated, unattributed) tell you how much to trust each number.
- Trust-building automation — All actions require manual approval before execution. You opt-in to escalating automation tiers as you build confidence.
- Financial forensics, not real-time ops — Designed for “why did this happen?” rather than “what’s happening right now?” Time-range selectors, drill-down paths, and historical trends.
- Low noise, high signal — Significance scoring instead of alert storms. Every insight is worth reading.
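One way significance scoring keeps noise down is to gate each insight on both absolute and relative impact, so a tiny line item that doubles and a huge line item that wiggles 1% both stay quiet. A hypothetical sketch; the thresholds are illustrative, not LakeSentry's:

```python
def should_surface(delta, baseline, min_abs=500.0, min_rel=0.2):
    """Surface an insight only when a cost change is both materially
    large (at least min_abs dollars) and proportionally large (at least
    min_rel of the baseline)."""
    if baseline <= 0:
        return abs(delta) >= min_abs   # new spend: judge on magnitude alone
    return abs(delta) >= min_abs and abs(delta) / baseline >= min_rel
```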
What LakeSentry connects to
LakeSentry reads from Databricks system tables — billing, compute, jobs, queries, serving, and access metadata. It never accesses your business data, notebooks, or query results; the one exception is query text, which the current version reads to improve insight quality.
| Data source | What it provides |
|---|---|
| system.billing.* | Billable usage and list prices |
| system.compute.* | Cluster and warehouse configuration and utilization |
| system.lakeflow.* | Job and pipeline definitions and run history |
| system.query.history | SQL statements on warehouses and serverless |
| system.serving.* | Model serving endpoints and usage |
| system.access.* | Workspace metadata, lineage, and network events |
Next steps
- Quick Start Guide — Get from signup to your first cost investigation
- How LakeSentry Works — Understand the data pipeline architecture
- Cost Attribution — Learn how costs are assigned to teams and owners