Action Plans & Automation
LakeSentry doesn’t just tell you what’s wrong — it can help you fix it. When an insight identifies waste or an optimization opportunity, LakeSentry generates an action plan with a specific recommendation, estimated savings, and the ability to execute the change directly.
All automation runs through a safety model designed to build trust incrementally. Nothing executes without your knowledge, and the most impactful actions always require explicit approval.
Safety tiers
Every action plan is assigned to a safety tier that determines how it can be executed:
| Tier | Name | How it executes | Examples |
|---|---|---|---|
| Tier 0 | Autopilot | Automatically when autopilot is enabled | Terminate idle clusters |
| Tier 1 | Approval required | Only after an admin explicitly approves | Cancel runaway job runs, reduce auto-termination timeouts |
| Tier 2 | Manual only | Displayed as recommendation — you execute it yourself | Switch to spot instances, convert to single-node, right-size clusters, upgrade runtime |
Tier 0: Autopilot actions
These are provably safe and reversible. Terminating an idle cluster that nobody is using has no impact on running workloads.
Autopilot is disabled by default. You opt in per action type after reviewing what it would do. Even with autopilot enabled, guardrails such as allowlists, time windows, cooldown periods, and rate limits still restrict what it can execute and how often.
Tier 1: Approval-only actions
These change infrastructure configuration and could affect running workloads if applied carelessly. Canceling a runaway job run and reducing auto-termination timeouts are generally safe operations, but they affect how resources behave going forward.
These actions always require an admin to review the plan, check the estimated impact, and click Approve.
Tier 2: Manual actions
These are recommendations that LakeSentry can’t (or shouldn’t) execute automatically. They might require changes to job code, architecture decisions, or coordination across teams.
LakeSentry surfaces the recommendation with evidence and estimated savings, but you take the action yourself.
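The tier rules above can be summarized as a small decision function. This is a minimal sketch for illustration only: `Tier` and `may_execute` are hypothetical names, not part of LakeSentry, and the sketch assumes a Tier 0 plan is simply not executed while autopilot is disabled.

```python
from enum import Enum


class Tier(Enum):
    AUTOPILOT = 0          # Tier 0: runs automatically when autopilot is enabled
    APPROVAL_REQUIRED = 1  # Tier 1: runs only after explicit admin approval
    MANUAL_ONLY = 2        # Tier 2: never executed by the tool


def may_execute(tier: Tier, autopilot_enabled: bool, admin_approved: bool) -> bool:
    """Return True if the tool itself may execute a plan in this tier."""
    if tier is Tier.AUTOPILOT:
        # Assumption: with autopilot disabled, the plan is not executed automatically.
        return autopilot_enabled
    if tier is Tier.APPROVAL_REQUIRED:
        # Autopilot never substitutes for an explicit approval.
        return admin_approved
    # Tier 2 is shown as a recommendation only.
    return False
```

Guardrails would still apply on top of this check before anything actually runs.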
Action plan lifecycle
Pending → Approved → Executed
        ↘ Rejected  ↘ Failed

- Pending — An analysis worker creates the action plan from an insight. The plan is waiting for review (Tier 1) or autopilot evaluation (Tier 0).
- Approved — An admin approved the plan (Tier 1) or autopilot accepted it (Tier 0).
- Executed — The action was successfully executed. The associated insight is resolved.
- Failed — The Databricks API call failed during execution.
- Rejected — An admin reviewed the plan and decided not to proceed.
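One way to picture the lifecycle is as a small state machine. The sketch below is illustrative, not LakeSentry's implementation: the lowercase state names and the `transition` helper are hypothetical, and it assumes that Rejected is reached from Pending and Failed from Approved, as the state descriptions suggest.

```python
# Allowed next states for each lifecycle state (assumed from the descriptions above).
ALLOWED_TRANSITIONS = {
    "pending": {"approved", "rejected"},
    "approved": {"executed", "failed"},
    "executed": set(),  # terminal
    "rejected": set(),  # terminal
    "failed": set(),    # terminal
}


def transition(current: str, new: str) -> str:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

For example, `transition("pending", "approved")` succeeds, while `transition("executed", "pending")` raises `ValueError`.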
Guardrails
Guardrails prevent automation from doing too much too fast. They apply to both autopilot and approved actions.
Allowlists and denylists
You can configure which resources are eligible for automated actions:
- Allowlist — Only resources on this list can be targeted
- Denylist — Resources on this list are never targeted, regardless of other settings
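The interaction between the two lists can be sketched as follows. This is a hypothetical helper, not LakeSentry's API; it assumes that `None` means no allowlist is configured, and that the denylist wins whenever a resource appears on both lists.

```python
from typing import Optional, Set


def is_eligible(resource_id: str,
                allowlist: Optional[Set[str]],
                denylist: Set[str]) -> bool:
    """Decide whether a resource may be targeted by automated actions."""
    # The denylist always wins, regardless of other settings.
    if resource_id in denylist:
        return False
    # Assumption: None means no allowlist is configured, so nothing is filtered.
    if allowlist is None:
        return True
    # With an allowlist configured, only listed resources can be targeted.
    return resource_id in allowlist
```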
Time windows
Restrict when actions can execute. For example, you might allow autopilot only during business hours, or block execution during a known deployment window.
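A time-window check reduces to comparing the current time against a start and end time. The sketch below is an illustration under stated assumptions: `in_window` is a hypothetical function, the end time is treated as exclusive, and a window whose start is later than its end is taken to wrap past midnight.

```python
from datetime import datetime, time


def in_window(now: datetime, start: time, end: time) -> bool:
    """Return True if `now` falls inside the [start, end) window."""
    current = now.time()
    if start <= end:
        # Same-day window, e.g. 09:00 to 17:00.
        return start <= current < end
    # Window wraps past midnight, e.g. 22:00 to 06:00.
    return current >= start or current < end
```

An allowed window and a blocked window can both be built on the same check: execute only if `in_window` is true for the former and false for the latter.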
Cooldown periods
After an action executes on a resource, a cooldown prevents the same type of action from targeting that resource again for a configured period. This prevents oscillation (e.g., terminating and re-terminating the same cluster in a loop).
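A cooldown can be modeled by remembering the last execution time for each resource and action type. This is a minimal sketch: `CooldownTracker` and the action type string used in the usage note are hypothetical, and the real storage and configuration are not described here.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple


class CooldownTracker:
    """Tracks the last execution time per (resource, action type) pair."""

    def __init__(self, cooldown: timedelta):
        self.cooldown = cooldown
        self._last_run: Dict[Tuple[str, str], datetime] = {}

    def record(self, resource_id: str, action_type: str, when: datetime) -> None:
        """Remember that an action of this type ran on this resource."""
        self._last_run[(resource_id, action_type)] = when

    def is_blocked(self, resource_id: str, action_type: str, now: datetime) -> bool:
        """Return True while the cooldown for this pair has not yet elapsed."""
        last = self._last_run.get((resource_id, action_type))
        return last is not None and now - last < self.cooldown
```

Keying on both the resource and the action type means a cooldown after, say, a hypothetical `"terminate_idle_cluster"` action does not block a different action type on the same cluster.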
Rate limits
Limit how many actions can execute within a time window. This prevents a burst of autopilot actions from making too many changes at once.
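One common way to enforce such a limit is a sliding window over recent execution timestamps. The sketch below assumes that approach; whether LakeSentry uses a sliding or a fixed window is not stated, and `RateLimiter` is a hypothetical name.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque


class RateLimiter:
    """Allows at most `max_actions` executions within any sliding `window`."""

    def __init__(self, max_actions: int, window: timedelta):
        self.max_actions = max_actions
        self.window = window
        self._times: Deque[datetime] = deque()

    def try_acquire(self, now: datetime) -> bool:
        """Record an execution and return True, or return False if over the limit."""
        # Drop timestamps that have aged out of the window.
        while self._times and now - self._times[0] >= self.window:
            self._times.popleft()
        if len(self._times) >= self.max_actions:
            return False
        self._times.append(now)
        return True
```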
Kill switch
An emergency stop that immediately halts all in-progress and pending actions. Use this if you notice unexpected behavior from automated actions. The kill switch is accessible from the Actions page and the global navigation.
What actions are available
Compute actions
| Action | Safety tier | What it does |
|---|---|---|
| Terminate idle cluster | Tier 0 | Terminates a cluster with no recent activity |
| Cancel runaway run | Tier 1 | Cancels a job run exceeding its historical runtime |
| Reduce auto-termination | Tier 1 | Reduces auto-termination timeout to a recommended value (e.g., 120 minutes) |
Recommendations (manual)
| Recommendation | What it suggests |
|---|---|
| Switch to spot instances | Use spot/preemptible instances for fault-tolerant workloads |
| Convert to single-node | Switch a 1-worker cluster to single-node mode |
| Upgrade runtime | Move to a newer Databricks Runtime version |
| Right-size cluster | Reduce fixed worker count based on utilization |
Estimated savings
Each action plan includes an estimated savings figure. This is calculated from:
- Historical cost data — What the resource has been costing
- The proposed change — How the action would reduce cost
- Time horizon — Projected savings over 30 days
For example, an idle cluster burning $5/hour for 12 hours generates an estimated savings of ~$60 for that idle period, plus projected savings if the pattern continues.
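The arithmetic in that example can be written out as below. The idle-period figure follows directly from the numbers above. The 30-day projection is only an assumption: the exact projection formula is not documented here, so the sketch supposes the same idle period recurs once per day. Both function names are hypothetical.

```python
def idle_period_savings(hourly_cost: float, idle_hours: float) -> float:
    """Cost that would have been avoided during one idle period."""
    return hourly_cost * idle_hours


def projected_savings(savings_per_day: float, horizon_days: int = 30) -> float:
    """Assumed projection: the same daily waste repeats over the horizon."""
    return savings_per_day * horizon_days


one_period = idle_period_savings(hourly_cost=5.0, idle_hours=12)  # 5 * 12 = 60.0
monthly = projected_savings(one_period)                           # 60 * 30 = 1800.0
```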
Execution details
When an action executes, LakeSentry:
- Checks preconditions — Verifies the resource still exists and the condition still applies (e.g., cluster is still running, run is still active)
- Calls the Databricks API — Applies the change (terminate cluster, cancel run, etc.)
- Records the result — Logs success or failure with details in the execution record
- Updates plan status — Marks the plan as executed or failed
- Resolves the insight — If successful, the associated insight is marked as resolved
If the Databricks API call fails, the action is marked as failed with the error details. Failed actions have no automatic retry — the next scheduled detection cycle will regenerate the action plan if the condition persists.
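The execution steps above can be sketched as a single function. This is an illustration, not LakeSentry's code: `Plan`, `execute_plan`, and the two callables are hypothetical stand-ins for the real precondition check and Databricks API call, and the sketch assumes a plan whose precondition no longer holds is skipped and left unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Plan:
    plan_id: str
    status: str = "approved"
    error: Optional[str] = None
    insight_resolved: bool = False


def execute_plan(plan: Plan,
                 precondition: Callable[[], bool],
                 api_call: Callable[[], None]) -> Plan:
    """Run one approved plan: check, call, record, update, resolve."""
    # 1. Check preconditions (resource still exists, condition still applies).
    if not precondition():
        # Assumption: a stale plan is skipped and left unchanged.
        return plan
    try:
        # 2. Apply the change (stand-in for the Databricks API call).
        api_call()
    except Exception as exc:
        # 3 and 4. Record the failure and mark the plan failed; no retry here.
        plan.status = "failed"
        plan.error = str(exc)
        return plan
    # 3 to 5. Record success, mark the plan executed, resolve the insight.
    plan.status = "executed"
    plan.insight_resolved = True
    return plan
```

Because there is no retry inside `execute_plan`, a failed plan stays failed; a later detection cycle would have to create a new plan if the condition persists.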
Audit trail
Every action — whether executed by autopilot, by admin approval, or rejected — is recorded in the audit log with:
- Who approved it (or “autopilot” for Tier 0)
- When it was executed
- What Databricks API call was made
- The result (success or failure with details)
- The estimated vs. actual savings (tracked over time)
Next steps
- Insights & Actions — Managing action plans in the UI
- Waste Detection & Insights — Understanding what triggers action plans
- Settings — Configuring guardrails and autopilot preferences