Action Plans & Automation

LakeSentry doesn’t just tell you what’s wrong; it can help you fix it. When an insight identifies waste or an optimization opportunity, LakeSentry generates an action plan with a specific recommendation, estimated savings, and, for Tier 0 and Tier 1 actions, the ability to execute the change directly.

All automation runs through a safety model designed to build trust incrementally. Nothing executes without your knowledge, and the most impactful actions always require explicit approval.

Safety tiers

Every action plan is assigned to a safety tier that determines how it can be executed:

Tier	Name	How it executes	Examples
Tier 0	Autopilot	Automatically when autopilot is enabled	Terminate idle clusters
Tier 1	Approval required	Only after an admin explicitly approves	Cancel runaway job runs, reduce auto-termination timeouts
Tier 2	Manual only	Displayed as recommendation — you execute it yourself	Switch to spot instances, convert to single-node, right-size clusters, upgrade runtime
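
The tier table above can be read as a small decision rule. The following Python sketch is illustrative only: `SafetyTier`, `may_execute`, and its parameters are hypothetical names, not LakeSentry's API, and the behavior of Tier 0 plans while autopilot is disabled is an assumption.

```python
from enum import IntEnum


class SafetyTier(IntEnum):
    """Hypothetical names mirroring the three tiers in the table."""
    AUTOPILOT = 0          # Tier 0
    APPROVAL_REQUIRED = 1  # Tier 1
    MANUAL_ONLY = 2        # Tier 2


def may_execute(tier: SafetyTier, autopilot_enabled: bool, admin_approved: bool) -> bool:
    """Return True if the system itself may execute a plan in this tier."""
    if tier == SafetyTier.AUTOPILOT:
        # Assumption: with autopilot off, a Tier 0 plan is not auto-executed.
        return autopilot_enabled
    if tier == SafetyTier.APPROVAL_REQUIRED:
        # Autopilot never substitutes for an explicit admin approval.
        return admin_approved
    # Tier 2 is a recommendation only; the user applies it by hand.
    return False
```

For instance, `may_execute(SafetyTier.APPROVAL_REQUIRED, True, False)` is `False`: enabling autopilot has no effect on Tier 1 plans.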

Tier 0: Autopilot actions

These actions are safe and reversible. Terminating an idle cluster that nobody is using has no impact on running workloads, and the cluster can be restarted when it is needed again.

Autopilot is disabled by default. You opt in per action type after reviewing what it would do. Even with autopilot enabled, guardrails such as allowlists, time windows, cooldowns, and rate limits restrict which resources it can act on and how often.

Tier 1: Approval-required actions

These change infrastructure configuration and could affect running workloads if applied carelessly. Canceling a runaway job run and reducing an auto-termination timeout are generally safe operations, but they affect how resources behave going forward.

These actions always require an admin to review the plan, check the estimated impact, and click Approve.

Tier 2: Manual-only actions

These are recommendations that LakeSentry can’t (or shouldn’t) execute automatically. They might require changes to job code, architecture decisions, or coordination across teams.

LakeSentry surfaces the recommendation with evidence and estimated savings, but you take the action yourself.

Action plan lifecycle

Pending → Approved → Executed
       ↘ Rejected   ↘ Failed

Pending — An analysis worker creates the action plan from an insight. The plan is waiting for review (Tier 1) or autopilot evaluation (Tier 0).
Approved — An admin approved the plan (Tier 1) or autopilot accepted it (Tier 0).
Executed — The action was successfully executed. The associated insight is resolved.
Failed — The Databricks API call failed during execution.
Rejected — An admin reviewed the plan and decided not to proceed.
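
The lifecycle diagram can be expressed as a small state machine. This is a minimal sketch, assuming that Rejected, Executed, and Failed are terminal states; `PlanStatus`, `TRANSITIONS`, and `transition` are hypothetical names chosen for illustration.

```python
from enum import Enum


class PlanStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"
    FAILED = "failed"


# Allowed transitions, read directly off the lifecycle diagram.
# Assumption: the three end states have no outgoing transitions.
TRANSITIONS = {
    PlanStatus.PENDING: {PlanStatus.APPROVED, PlanStatus.REJECTED},
    PlanStatus.APPROVED: {PlanStatus.EXECUTED, PlanStatus.FAILED},
    PlanStatus.REJECTED: set(),
    PlanStatus.EXECUTED: set(),
    PlanStatus.FAILED: set(),
}


def transition(current: PlanStatus, new: PlanStatus) -> PlanStatus:
    """Return the new status, or raise if the diagram does not allow the move."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

A plan cannot jump from Pending straight to Executed in this model; it must pass through Approved first, whether the approval comes from an admin or from autopilot.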

Guardrails

Guardrails prevent automation from doing too much too fast. They apply to both autopilot and approved actions.

Allowlists and denylists

You can configure which resources are eligible for automated actions:

Allowlist — Only resources on this list can be targeted
Denylist — Resources on this list are never targeted, regardless of other settings
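
One way to combine the two lists is shown below. This is a sketch, not LakeSentry's implementation: `is_eligible` is a hypothetical function, and treating a missing allowlist as "no restriction" is an assumption.

```python
from typing import AbstractSet, Optional


def is_eligible(
    resource_id: str,
    allowlist: Optional[AbstractSet[str]],
    denylist: AbstractSet[str],
) -> bool:
    """Decide whether a resource may be targeted by an automated action."""
    # The denylist always wins, regardless of other settings.
    if resource_id in denylist:
        return False
    # Assumption: no allowlist configured means every resource is allowed.
    if allowlist is None:
        return True
    return resource_id in allowlist
```

Checking the denylist first means a resource that appears on both lists is never targeted.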

Time windows

Restrict when actions can execute. For example, you might allow autopilot only during business hours, or block execution during a known deployment window.
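
A time-window check might look like the following sketch. The names `in_window` and `may_run` are hypothetical, windows are assumed to be daily time-of-day ranges, and the caller is assumed to pass `now` in the timezone the windows were configured in.

```python
from datetime import datetime, time
from typing import List, Tuple

Window = Tuple[time, time]  # (start, end), end exclusive


def in_window(now: datetime, start: time, end: time) -> bool:
    """True if now's time of day falls in [start, end), including windows that cross midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    # Window wraps past midnight, e.g. 22:00 to 02:00.
    return t >= start or t < end


def may_run(now: datetime, allowed: List[Window], blocked: List[Window]) -> bool:
    """Blocked windows take priority; an empty allowed list means no restriction (assumption)."""
    if any(in_window(now, s, e) for s, e in blocked):
        return False
    return not allowed or any(in_window(now, s, e) for s, e in allowed)
```

With `allowed=[(time(9), time(17))]` and `blocked=[(time(11), time(13))]`, an action at noon is held back even though it falls inside business hours.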

Cooldown periods

After an action executes on a resource, a cooldown prevents the same type of action from targeting that resource again for a configured period. This prevents oscillation (e.g., terminating and re-terminating the same cluster in a loop).
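
A cooldown can be tracked per resource and action type, as in this sketch. `CooldownTracker` and its methods are hypothetical names, and keeping the state in memory is a simplification for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple


class CooldownTracker:
    """Remembers when each (resource, action type) pair last executed."""

    def __init__(self, cooldown: timedelta) -> None:
        self.cooldown = cooldown
        self._last: Dict[Tuple[str, str], datetime] = {}

    def record(self, resource_id: str, action_type: str, now: datetime) -> None:
        """Call after an action has executed on a resource."""
        self._last[(resource_id, action_type)] = now

    def is_cooling_down(self, resource_id: str, action_type: str, now: datetime) -> bool:
        """True while the same action type must not target this resource again."""
        last = self._last.get((resource_id, action_type))
        return last is not None and now - last < self.cooldown
```

Because the key includes the action type, a cooldown on terminating a cluster does not block a different kind of action on the same resource.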

Rate limits

Limit how many actions can execute within a time window. This prevents a burst of autopilot actions from making too many changes at once.
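
A sliding-window limiter is one common way to enforce such a limit. The sketch below is an assumption about how this could work, not a description of LakeSentry internals; `RateLimiter` and `try_acquire` are hypothetical names.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque


class RateLimiter:
    """Allows at most max_actions executions within any rolling window."""

    def __init__(self, max_actions: int, window: timedelta) -> None:
        self.max_actions = max_actions
        self.window = window
        self._times: Deque[datetime] = deque()

    def try_acquire(self, now: datetime) -> bool:
        """Return True and record the execution if the limit permits it."""
        # Drop executions that have aged out of the window.
        while self._times and now - self._times[0] >= self.window:
            self._times.popleft()
        if len(self._times) >= self.max_actions:
            return False
        self._times.append(now)
        return True
```

With `RateLimiter(2, timedelta(minutes=10))`, a third action inside the same ten minutes is refused and can be retried once the oldest execution leaves the window.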

Kill switch

An emergency stop that immediately halts all in-progress and pending actions. Use this if you notice unexpected behavior from automated actions. The kill switch is accessible from the Actions page and the global navigation.
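
Conceptually, a kill switch is a shared flag that every executor checks before doing work. The sketch below uses hypothetical names (`KillSwitch`, `run_pending`) and only stops actions that have not started yet; interrupting a call that is already in flight would need additional handling.

```python
import threading
from typing import Callable, List, Tuple


class KillSwitch:
    """Thread-safe emergency stop flag."""

    def __init__(self) -> None:
        self._stopped = threading.Event()

    def engage(self) -> None:
        self._stopped.set()

    def reset(self) -> None:
        self._stopped.clear()

    @property
    def engaged(self) -> bool:
        return self._stopped.is_set()


def run_pending(actions: List[Tuple[str, Callable[[], None]]], switch: KillSwitch) -> List[str]:
    """Run actions in order, stopping as soon as the kill switch is engaged."""
    done = []
    for name, fn in actions:
        if switch.engaged:
            break  # leave the remaining actions unexecuted
        fn()
        done.append(name)
    return done
```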

What actions are available

Compute actions

Action	Safety tier	What it does
Terminate idle cluster	Tier 0	Terminates a cluster with no recent activity
Cancel runaway run	Tier 1	Cancels a job run exceeding its historical runtime
Reduce auto-termination	Tier 1	Reduces auto-termination timeout to a recommended value (e.g., 120 minutes)
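
This page does not say which Databricks endpoints LakeSentry calls. As a hedged illustration, the first two actions map naturally onto two public Databricks REST endpoints: `POST /api/2.0/clusters/delete`, which terminates a cluster without permanently deleting it, and `POST /api/2.1/jobs/runs/cancel`. The sketch below only builds the request and sends nothing; `build_request` and the action-type strings are hypothetical.

```python
from typing import Any, Dict, Tuple, Union


def build_request(action_type: str, target_id: Union[str, int]) -> Tuple[str, str, Dict[str, Any]]:
    """Return (HTTP method, path, JSON body) for a compute action."""
    if action_type == "terminate_idle_cluster":
        # clusters/delete terminates the cluster; its configuration is retained.
        return ("POST", "/api/2.0/clusters/delete", {"cluster_id": str(target_id)})
    if action_type == "cancel_runaway_run":
        # run_id is numeric in the Jobs API.
        return ("POST", "/api/2.1/jobs/runs/cancel", {"run_id": int(target_id)})
    raise ValueError(f"unsupported action type: {action_type}")
```

Reducing the auto-termination timeout is left out of the sketch because editing a cluster requires resubmitting its configuration, which depends on details not covered here.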

Recommendations (manual)

Recommendation	What it suggests
Switch to spot instances	Use spot/preemptible instances for fault-tolerant workloads
Convert to single-node	Switch a 1-worker cluster to single-node mode
Upgrade runtime	Move to a newer Databricks Runtime version
Right-size cluster	Reduce fixed worker count based on utilization

Estimated savings

Each action plan includes an estimated savings figure. This is calculated from:

Historical cost data — What the resource has been costing
The proposed change — How the action would reduce cost
Time horizon — Projected savings over 30 days

For example, an idle cluster costing $5/hour for 12 hours yields estimated savings of about $60 ($5 × 12 hours) for that idle period, plus projected savings if the pattern continues.
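
The arithmetic behind the idle-cluster example can be sketched as follows. The projection assumes the same number of idle hours recurs every day over the 30-day horizon; that recurrence model and the function names are assumptions for illustration, not LakeSentry's actual formula.

```python
def idle_period_savings(hourly_cost: float, idle_hours: float) -> float:
    """Cost avoided for a single observed idle period."""
    return hourly_cost * idle_hours


def projected_savings(hourly_cost: float, idle_hours_per_day: float, horizon_days: int = 30) -> float:
    """Projection over the horizon, assuming the idle pattern repeats daily."""
    return hourly_cost * idle_hours_per_day * horizon_days


print(idle_period_savings(5.0, 12))  # 60.0
print(projected_savings(5.0, 12))    # 1800.0
```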

Execution details

When an action executes, LakeSentry:

Checks preconditions — Verifies the resource still exists and the condition still applies (e.g., cluster is still running, run is still active)
Calls the Databricks API — Applies the change (terminate cluster, cancel run, etc.)
Records the result — Logs success or failure with details in the execution record
Updates plan status — Marks the plan as executed or failed
Resolves the insight — If successful, the associated insight is marked as resolved

If the Databricks API call fails, the action is marked as failed with the error details. Failed actions have no automatic retry — the next scheduled detection cycle will regenerate the action plan if the condition persists.
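
The execution steps can be sketched as a single function. This is a minimal illustration with hypothetical names (`ExecutionRecord`, `execute_plan`); the callables stand in for real checks and API calls, and the "skipped" outcome for a failed precondition is an assumption, since this page does not say how that case is recorded.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExecutionRecord:
    plan_id: str
    status: str       # "executed", "failed", or "skipped" (assumed)
    detail: str = ""


def execute_plan(
    plan_id: str,
    precondition: Callable[[], bool],
    api_call: Callable[[], None],
    resolve_insight: Callable[[], None],
    log: List[ExecutionRecord],
) -> str:
    """Run one plan through the steps and return its final status."""
    # Check preconditions: does the condition still apply?
    if not precondition():
        log.append(ExecutionRecord(plan_id, "skipped", "precondition no longer holds"))
        return "skipped"
    # Call the API; a failure is recorded and not retried.
    try:
        api_call()
    except Exception as exc:
        log.append(ExecutionRecord(plan_id, "failed", str(exc)))
        return "failed"
    # Record success, then resolve the associated insight.
    log.append(ExecutionRecord(plan_id, "executed"))
    resolve_insight()
    return "executed"
```

Note that `resolve_insight` runs only on success, so a failed plan leaves its insight open for the next detection cycle to pick up.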

Audit trail

Every action — whether executed by autopilot, by admin approval, or rejected — is recorded in the audit log with:

Who approved or rejected it (or “autopilot” for Tier 0)
When it was executed
What Databricks API call was made
The result (success or failure with details)
The estimated vs. actual savings (tracked over time)
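
As an illustration of the fields listed above, an audit entry could be modeled like this. The field names, the use of USD, and the JSON serialization are all assumptions; the actual audit log schema is not documented on this page.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AuditEntry:
    actor: str                                  # admin identity, or "autopilot" for Tier 0
    executed_at: Optional[str]                  # ISO 8601 timestamp; None if never executed
    api_call: Optional[str]                     # the Databricks API call that was made
    result: str                                 # "success", or "failure" with details
    estimated_savings_usd: float
    actual_savings_usd: Optional[float] = None  # filled in later as savings are tracked


entry = AuditEntry(
    actor="autopilot",
    executed_at="2024-01-15T12:00:00Z",
    api_call="POST /api/2.0/clusters/delete",
    result="success",
    estimated_savings_usd=60.0,
)
line = json.dumps(asdict(entry), sort_keys=True)
```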

Next steps

Insights & Actions — Managing action plans in the UI
Waste Detection & Insights — Understanding what triggers action plans
Settings — Configuring guardrails and autopilot preferences