# Waste Detection & Insights
LakeSentry continuously scans your Databricks environment for wasted spend — resources that are running but not doing useful work, infrastructure that’s oversized for its actual load, and configurations that cost more than they need to.
## Types of waste detected

LakeSentry looks for waste across several categories:
### Idle clusters

Interactive clusters that are in a RUNNING state but haven’t executed any jobs for an extended period. This is the most common type of Databricks waste — clusters left running after a developer finishes their work, or all-purpose clusters with auto-termination disabled.
| Idle duration | Severity |
|---|---|
| 2+ hours | Medium |
| 6+ hours | High |
| 12+ hours | Critical |
Each idle cluster insight includes an estimated waste amount in dollars, calculated from the cluster’s hourly cost multiplied by the idle duration.
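The severity thresholds and the waste formula above can be sketched as follows. This is an illustrative sketch, not LakeSentry's actual code; the function names are assumptions.

```python
from datetime import timedelta
from typing import Optional

def idle_severity(idle: timedelta) -> Optional[str]:
    """Map idle duration to a severity per the table above.

    Returns None when idle time is below the 2-hour floor.
    """
    hours = idle.total_seconds() / 3600
    if hours >= 12:
        return "critical"
    if hours >= 6:
        return "high"
    if hours >= 2:
        return "medium"
    return None

def estimated_waste(hourly_cost_usd: float, idle: timedelta) -> float:
    """Estimated waste = hourly cost multiplied by hours spent idle."""
    return hourly_cost_usd * (idle.total_seconds() / 3600)

# A cluster costing $4.50/hour that sat idle for 7 hours:
print(idle_severity(timedelta(hours=7)))                     # high
print(round(estimated_waste(4.50, timedelta(hours=7)), 2))   # 31.5
```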
### Overprovisioned workers

Fixed-size clusters with more workers than their actual utilization requires. LakeSentry analyzes CPU and memory utilization over time and recommends a reduced worker count.
The detection uses a median-based approach rather than simple averaging, which handles bursty workloads better. A cluster that spikes to 100% CPU for 5 minutes per hour but sits at 10% the rest of the time doesn’t need to be sized for the peak.
The algorithm:
- For each time interval, calculate the minimum workers needed to keep CPU below 85% and memory below 90%
- Take the median across all intervals
- Compare against the current worker count
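The three steps above can be sketched in a few lines. This is a minimal illustration under the stated thresholds (85% CPU, 90% memory); it assumes load spreads evenly across workers when the cluster is resized, and the function names are not LakeSentry internals.

```python
import math
from statistics import median

def workers_needed(cpu_frac: float, mem_frac: float, current_workers: int) -> int:
    """Minimum workers keeping CPU below 85% and memory below 90%,
    assuming utilization scales inversely with worker count."""
    scale = max(cpu_frac / 0.85, mem_frac / 0.90)
    return max(1, math.ceil(scale * current_workers))

def recommended_workers(samples, current_workers: int) -> int:
    """Median per-interval requirement, per the algorithm above.

    samples: list of (cpu_frac, mem_frac) pairs, one per time interval.
    """
    per_interval = [workers_needed(c, m, current_workers) for c, m in samples]
    return int(median(per_interval))

# Bursty workload on an 8-worker cluster: one spike to 100% CPU,
# ~10% utilization the rest of the hour.
samples = [(1.0, 0.4)] + [(0.10, 0.15)] * 11
print(recommended_workers(samples, current_workers=8))  # 2 — sized for the median, not the peak
```

Because the median ignores the single spike interval, the recommendation reflects typical load, which is exactly the bursty-workload behavior described above.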
| Excess workers | Severity |
|---|---|
| 3+ workers | High |
| 2 workers | Medium |
| 1 worker | Low |
### Weekend waste

Non-production workspaces with significant spend during weekends. This flags environments where dev/staging clusters could be shut down when nobody is working.
### Zombie model serving endpoints

Model serving endpoints that haven’t received any inference requests in 90+ days. These endpoints incur cost even without traffic, and may have been left running after a model was retired.
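The zombie check reduces to a single predicate on the endpoint's last request timestamp. A minimal sketch, assuming a `last_request_at` field (illustrative, not LakeSentry's schema); an endpoint that has never received a request is also treated as a zombie:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def is_zombie(last_request_at: Optional[datetime],
              now: Optional[datetime] = None,
              threshold_days: int = 90) -> bool:
    """True if the endpoint has had no inference traffic in 90+ days
    (or has never received a request at all)."""
    now = now or datetime.now(timezone.utc)
    if last_request_at is None:
        return True
    return now - last_request_at >= timedelta(days=threshold_days)
```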
### Long auto-termination settings

Interactive clusters with auto-termination disabled or set above 120 minutes. Clusters with auto-termination disabled are flagged as critical severity, since they will run indefinitely until manually stopped. Long timeout values (above 120 minutes) mean clusters keep running, and accruing charges, long after the last user disconnects.
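A sketch of this check. In the Databricks Clusters API, an `autotermination_minutes` value of 0 means auto-termination is disabled; the 120-minute threshold comes from this section, while the severity assigned to long-but-enabled timeouts is an assumption for illustration:

```python
from typing import Optional

def auto_termination_finding(autotermination_minutes: int) -> Optional[str]:
    """Flag disabled or overly long auto-termination settings."""
    if autotermination_minutes == 0:
        return "critical"   # disabled: the cluster runs until manually stopped
    if autotermination_minutes > 120:
        return "medium"     # assumed severity for long timeouts
    return None             # within recommended bounds
```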
### Spot instance candidates

ON_DEMAND clusters running workloads that could tolerate spot/preemptible instances. Spot instances typically cost 60–90% less than on-demand pricing for fault-tolerant workloads.
### Single-node candidates

Clusters with 1 worker that could run in single-node mode. Single-node clusters avoid the overhead of a separate driver and worker, reducing costs for workloads that don’t need distributed compute.
### Outdated runtime versions

Clusters running non-current Databricks Runtime versions. Newer runtimes often include performance improvements that can reduce cost for the same workload.
## How insights are generated

Waste detection runs on a schedule — some detections (idle clusters, zombie models, weekend waste) run hourly, while most hygiene and optimization detections (overprovisioned workers, auto-termination, spot candidates, single-node candidates, outdated runtime) run daily. Each detection algorithm queries the latest ledger and metrics data, evaluates conditions, and creates insights for any resources that meet the criteria.
### Activity filtering

To reduce noise, cluster-related insights (auto-termination, spot candidates, outdated runtime) are only generated for clusters that:
- Had activity in the last 30 days, OR
- Were created in the last 7 days
This prevents LakeSentry from generating insights for long-dormant clusters that nobody cares about.
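The filter above is a simple disjunction of two recency checks. A sketch, assuming illustrative `last_activity`/`created_at` timestamp fields (not LakeSentry's actual schema):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def passes_activity_filter(last_activity: Optional[datetime],
                           created_at: datetime,
                           now: Optional[datetime] = None) -> bool:
    """A cluster qualifies for hygiene insights if it was active in the
    last 30 days OR created in the last 7 days."""
    now = now or datetime.now(timezone.utc)
    recently_active = (last_activity is not None
                       and now - last_activity <= timedelta(days=30))
    recently_created = now - created_at <= timedelta(days=7)
    return recently_active or recently_created
```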
### Deduplication

If LakeSentry already has an active insight for the same resource and issue type, it won’t create a duplicate. Existing insights are updated with fresh evidence (like a new idle duration) rather than replaced.
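Deduplication amounts to an upsert keyed on (resource, issue type): an existing active insight absorbs the fresh evidence instead of being duplicated. A minimal in-memory sketch with illustrative names:

```python
from dataclasses import dataclass, field

@dataclass
class Insight:
    resource_id: str
    issue_type: str
    evidence: dict = field(default_factory=dict)

class InsightStore:
    """Illustrative store: at most one active insight per (resource, issue type)."""

    def __init__(self) -> None:
        self._active: dict = {}

    def upsert(self, resource_id: str, issue_type: str, evidence: dict) -> Insight:
        key = (resource_id, issue_type)
        existing = self._active.get(key)
        if existing is not None:
            existing.evidence.update(evidence)  # refresh evidence, don't duplicate
            return existing
        created = Insight(resource_id, issue_type, dict(evidence))
        self._active[key] = created
        return created
```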
## Severity levels

Waste insights use the same severity scale as anomalies:
| Severity | Meaning |
|---|---|
| Critical | Large financial impact or long-running waste (e.g., 12+ hour idle cluster) |
| High | Significant waste worth addressing soon (e.g., 6+ hour idle, 3+ excess workers) |
| Medium | Moderate waste (e.g., 2+ hour idle, 2 excess workers) |
| Low | Minor optimization opportunity |
| Info | Informational finding, no immediate action needed |
## Confidence scoring

Like anomalies, waste insights carry a confidence score based on the amount of data available:
| Data quality | Confidence | Context |
|---|---|---|
| 100+ utilization samples | 95% | Strong data — recommendation is highly reliable |
| 50–99 samples | 90% | Good data — recommendation is reliable |
| 20–49 samples | 75% | Moderate data — recommendation is likely correct |
| Fewer than 20 samples | 60% | Limited data — take recommendation with caution |
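The table above is a step function of the sample count; a direct sketch (function name is illustrative):

```python
def confidence_score(sample_count: int) -> float:
    """Map utilization sample count to a confidence score per the table above."""
    if sample_count >= 100:
        return 0.95  # strong data
    if sample_count >= 50:
        return 0.90  # good data
    if sample_count >= 20:
        return 0.75  # moderate data
    return 0.60      # limited data
```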
## Estimated savings

Where possible, LakeSentry calculates estimated savings for each waste insight. These estimates are based on:
- Current resource cost (from billing data)
- The nature of the waste (idle time, excess workers, on-demand vs. spot pricing)
- Historical utilization patterns
Estimated savings appear in the insight detail view and in action plans generated from the insight.
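Two of the simpler estimates can be sketched directly from the figures in this document: idle waste (hourly cost times idle hours) and spot-candidate savings (60–90% of on-demand spend, per the spot section). Function names are illustrative:

```python
def idle_savings(hourly_cost_usd: float, idle_hours: float) -> float:
    """Savings from terminating an idle cluster: cost x hours idle."""
    return hourly_cost_usd * idle_hours

def spot_savings_range(on_demand_spend_usd: float):
    """Potential savings band for a spot candidate, using the
    60-90% discount range quoted above."""
    return (on_demand_spend_usd * 0.60, on_demand_spend_usd * 0.90)
```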
## Managing waste insights

### Resolving

Waste insights are automatically resolved when the condition is no longer true. If you terminate an idle cluster, the insight resolves on the next detection cycle. If a cluster’s utilization increases to match its provisioning, the overprovisioned-workers insight resolves.
### Snoozing

Snooze an insight if you’re aware of the issue but can’t address it right now. Snoozed insights automatically become active again after the snooze period expires.
### Dismissing

Dismiss an insight if it’s not actionable for your situation — maybe the cluster needs to stay running for operational reasons, or the cost is acceptable. You can also set up auto-dismiss rules to automatically dismiss insights matching certain patterns.
### Taking action

Many waste insights have associated action plans that can be executed directly from LakeSentry. For example, an idle cluster insight may offer a “Terminate cluster” action plan.
## Next steps

- Action Plans & Automation — How to act on waste insights
- Insights & Actions — Managing insights in the UI
- Compute — Cluster and warehouse views with utilization data