Waste Detection & Insights

LakeSentry continuously scans your Databricks environment for wasted spend — resources that are running but not doing useful work, infrastructure that’s oversized for its actual load, and configurations that cost more than they need to.

LakeSentry looks for waste across several categories:

Interactive clusters that are in a RUNNING state but haven’t executed any jobs for an extended period. This is the most common type of Databricks waste — clusters left running after a developer finishes their work, or all-purpose clusters with auto-termination disabled.

| Idle duration | Severity |
|---|---|
| 2+ hours | Medium |
| 6+ hours | High |
| 12+ hours | Critical |

Each idle cluster insight includes an estimated waste amount in dollars, calculated from the cluster’s hourly cost multiplied by the idle duration.
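The estimate and the severity mapping above can be sketched in a few lines. This is an illustrative sketch, not LakeSentry's actual code; the function name and return shape are assumptions:

```python
def estimate_idle_waste(hourly_cost_usd: float, idle_hours: float) -> dict:
    """Estimate wasted spend for an idle cluster and map idle time to
    severity, mirroring the thresholds in the table above (illustrative)."""
    if idle_hours >= 12:
        severity = "critical"
    elif idle_hours >= 6:
        severity = "high"
    elif idle_hours >= 2:
        severity = "medium"
    else:
        severity = None  # below the detection threshold; no insight created
    return {
        "estimated_waste_usd": round(hourly_cost_usd * idle_hours, 2),
        "severity": severity,
    }
```

For example, a cluster costing $4/hour that has sat idle for 8 hours yields roughly $32 of estimated waste at high severity.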

Fixed-size clusters with more workers than their actual utilization requires. LakeSentry analyzes CPU and memory utilization over time and recommends a reduced worker count.

The detection uses a median-based approach rather than simple averaging, which handles bursty workloads better. A cluster that spikes to 100% CPU for 5 minutes per hour but sits at 10% the rest of the time doesn’t need to be sized for the peak.

The algorithm:

  1. For each time interval, calculate the minimum workers needed to keep CPU below 85% and memory below 90%
  2. Take the median across all intervals
  3. Compare against the current worker count
| Excess workers | Severity |
|---|---|
| 3+ workers | High |
| 2 workers | Medium |
| 1 worker | Low |
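The three steps above can be sketched as follows. This is a simplified illustration, assuming load redistributes evenly when the cluster is resized; the function and its signature are not LakeSentry's actual API:

```python
import math
from statistics import median

def recommend_workers(samples, current_workers,
                      cpu_limit=0.85, mem_limit=0.90):
    """Median-based right-sizing sketch (illustrative).

    `samples` is a list of (cpu_util, mem_util) tuples in [0, 1], one per
    time interval, measured across the cluster's current workers.
    """
    needed = []
    for cpu, mem in samples:
        # Step 1: minimum workers that keep CPU below 85% and memory
        # below 90% in this interval, assuming even load redistribution.
        n = max(
            math.ceil(current_workers * cpu / cpu_limit),
            math.ceil(current_workers * mem / mem_limit),
            1,
        )
        needed.append(n)
    # Step 2: median across all intervals, so brief spikes don't dominate.
    recommended = int(median(needed))
    # Step 3: compare against the current worker count.
    excess = current_workers - recommended
    severity = ("high" if excess >= 3 else "medium" if excess == 2
                else "low" if excess == 1 else None)
    return recommended, severity
```

An 8-worker cluster that idles near 10–20% utilization except for one brief spike gets a recommendation of 2 workers: the spike interval demands many workers, but the median ignores it.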

Non-production workspaces with significant spend during weekends. This flags environments where dev/staging clusters could be shut down when nobody is working.

Model serving endpoints that haven’t received any inference requests in 90+ days. These endpoints incur cost even without traffic, and may have been left running after a model was retired.

Interactive clusters with auto-termination disabled or set above 120 minutes. Clusters with auto-termination disabled are flagged as critical severity since they will run indefinitely until manually stopped. Long timeout values (above 120 minutes) mean clusters stay running (and billing) long after the last user disconnects.
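A minimal sketch of this hygiene check, assuming the Databricks convention that `autotermination_minutes = 0` means auto-termination is disabled. The medium severity for long timeouts is an assumption; the document only specifies critical for disabled auto-termination:

```python
def auto_termination_insight(autotermination_minutes: int):
    """Classify a cluster's auto-termination setting (illustrative sketch)."""
    if autotermination_minutes == 0:
        # Disabled: the cluster runs indefinitely until manually stopped.
        return "critical"
    if autotermination_minutes > 120:
        # Assumed severity for long timeouts; not specified in the docs.
        return "medium"
    return None  # within policy; no insight created
```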

ON_DEMAND clusters running workloads that could tolerate spot/preemptible instances. Spot instances typically cost 60–90% less than on-demand pricing for fault-tolerant workloads.

Clusters with 1 worker that could run in single-node mode. Single-node clusters avoid the overhead of a separate driver and worker, reducing costs for workloads that don’t need distributed compute.

Clusters running non-current Databricks Runtime versions. Newer runtimes often include performance improvements that can reduce cost for the same workload.

Waste detection runs on a schedule — some detections (idle clusters, zombie models, weekend waste) run hourly, while most hygiene and optimization detections (overprovisioned workers, auto-termination, spot candidates, single-node candidates, outdated runtime) run daily. Each detection algorithm queries the latest ledger and metrics data, evaluates conditions, and creates insights for any resources that meet the criteria.

To reduce noise, cluster-related insights (auto-termination, spot candidates, outdated runtime) are only generated for clusters that:

  • Had activity in the last 30 days, OR
  • Were created in the last 7 days

This prevents LakeSentry from generating insights for long-dormant clusters that nobody cares about.
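The eligibility filter amounts to a simple OR of two recency checks. A sketch, with illustrative names and timestamps assumed to be timezone-aware:

```python
from datetime import datetime, timedelta, timezone

def is_insight_eligible(last_activity, created_at, now=None):
    """Noise filter sketch: a cluster qualifies for hygiene insights only
    if it was active in the last 30 days OR created in the last 7 days.
    `last_activity` may be None for clusters with no recorded activity."""
    now = now or datetime.now(timezone.utc)
    recently_active = (last_activity is not None
                       and now - last_activity <= timedelta(days=30))
    recently_created = now - created_at <= timedelta(days=7)
    return recently_active or recently_created
```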

If LakeSentry already has an active insight for the same resource and issue type, it won’t create a duplicate. Existing insights are updated with fresh evidence (like a new idle duration) rather than replaced.
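The dedup behavior is an upsert keyed on resource and issue type. A sketch with an in-memory dict standing in for the insight store (all names illustrative):

```python
def upsert_insight(active_insights, resource_id, issue_type, evidence):
    """Refresh an existing active insight instead of creating a duplicate.

    `active_insights` maps (resource_id, issue_type) -> insight dict;
    this stands in for whatever store LakeSentry actually uses.
    """
    key = (resource_id, issue_type)
    if key in active_insights:
        # Existing insight: update with fresh evidence, e.g. a new idle duration.
        active_insights[key]["evidence"] = evidence
        active_insights[key]["updated"] = True
    else:
        active_insights[key] = {"evidence": evidence, "updated": False}
    return active_insights[key]
```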

Waste insights use the same severity scale as anomalies:

| Severity | Meaning |
|---|---|
| Critical | Large financial impact or long-running waste (e.g., 12+ hour idle cluster) |
| High | Significant waste worth addressing soon (e.g., 6+ hour idle, 3+ excess workers) |
| Medium | Moderate waste (e.g., 2+ hour idle, 2 excess workers) |
| Low | Minor optimization opportunity |
| Info | Informational finding, no immediate action needed |

Like anomalies, waste insights carry a confidence score based on the amount of data available:

| Data quality | Confidence | Context |
|---|---|---|
| 100+ utilization samples | 95% | Strong data — recommendation is highly reliable |
| 50–99 samples | 90% | Good data — recommendation is reliable |
| 20–49 samples | 75% | Moderate data — recommendation is likely correct |
| Fewer than 20 samples | 60% | Limited data — take recommendation with caution |
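The mapping above is a step function of sample count. A direct transcription (function name illustrative):

```python
def confidence_from_samples(sample_count: int) -> float:
    """Map utilization-sample count to a confidence score,
    following the tiers in the table above."""
    if sample_count >= 100:
        return 0.95
    if sample_count >= 50:
        return 0.90
    if sample_count >= 20:
        return 0.75
    return 0.60
```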

Where possible, LakeSentry calculates estimated savings for each waste insight. These estimates are based on:

  • Current resource cost (from billing data)
  • The nature of the waste (idle time, excess workers, on-demand vs. spot pricing)
  • Historical utilization patterns

Estimated savings appear in the insight detail view and in action plans generated from the insight.
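For a spot-candidate insight, a back-of-the-envelope savings estimate might look like the following. The 60% discount is an assumed conservative value from the 60–90% range quoted earlier; LakeSentry's actual savings model is not described here:

```python
def estimate_spot_savings(monthly_on_demand_cost: float,
                          spot_discount: float = 0.60) -> float:
    """Illustrative monthly savings if a workload moved to spot instances.

    `spot_discount` defaults to the conservative end of the 60-90%
    discount range; this is an assumption, not LakeSentry's model.
    """
    return round(monthly_on_demand_cost * spot_discount, 2)
```

A cluster spending $1,000/month on demand would show roughly $600/month in estimated savings under this conservative assumption.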

Waste insights are automatically resolved when the condition is no longer true. If you terminate an idle cluster, the insight resolves on the next detection cycle. If a cluster’s utilization increases to match its provisioning, the overprovisioned-workers insight resolves.

Snooze an insight if you’re aware of the issue but can’t address it right now. Snoozed insights automatically become active again after the snooze period expires.

Dismiss an insight if it’s not actionable for your situation — maybe the cluster needs to stay running for operational reasons, or the cost is acceptable. You can also set up auto-dismiss rules to automatically dismiss insights matching certain patterns.

Many waste insights have associated action plans that can be executed directly from LakeSentry. For example, an idle cluster insight may offer a “Terminate cluster” action plan.