Action Plans & Automation

LakeSentry doesn’t just tell you what’s wrong — it can help you fix it. When an insight identifies waste or an optimization opportunity, LakeSentry generates an action plan with a specific recommendation, estimated savings, and the ability to execute the change directly.

All automation runs through a safety model designed to build trust incrementally. Nothing executes without your knowledge, and the most impactful actions always require explicit approval.

Every action plan is assigned to a safety tier that determines how it can be executed:

| Tier | Name | How it executes | Examples |
| --- | --- | --- | --- |
| Tier 0 | Autopilot | Automatically when autopilot is enabled | Terminate idle clusters |
| Tier 1 | Approval required | Only after an admin explicitly approves | Cancel runaway job runs, reduce auto-termination timeouts |
| Tier 2 | Manual only | Displayed as a recommendation; you execute it yourself | Switch to spot instances, convert to single-node, right-size clusters, upgrade runtime |

Tier 0 actions are provably safe and reversible. For example, terminating an idle cluster that nobody is using has no impact on running workloads.

Autopilot is disabled by default. You opt in per action type after reviewing what it would do. Even with autopilot enabled, guardrails prevent excessive execution.

Tier 1 actions change infrastructure configuration and could affect running workloads if applied carelessly. Canceling a runaway job run and reducing auto-termination timeouts are generally safe operations, but they affect how resources behave going forward.

These actions always require an admin to review the plan, check the estimated impact, and click Approve.

Tier 2 items are recommendations that LakeSentry can’t (or shouldn’t) execute automatically. They might require changes to job code, architecture decisions, or coordination across teams.

LakeSentry surfaces the recommendation with evidence and estimated savings, but you take the action yourself.
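The tier rules above can be summarized in a few lines of Python. This is a minimal sketch, not LakeSentry's implementation: the `Tier` enum and `can_execute` function are hypothetical names, and the sketch assumes a Tier 0 plan depends only on the autopilot opt-in.

```python
from enum import Enum


class Tier(Enum):
    AUTOPILOT = 0          # Tier 0: runs automatically when autopilot is enabled
    APPROVAL_REQUIRED = 1  # Tier 1: runs only after explicit admin approval
    MANUAL_ONLY = 2        # Tier 2: shown as a recommendation only


def can_execute(tier: Tier, autopilot_enabled: bool, admin_approved: bool) -> bool:
    """Return True if the platform itself may execute a plan in this tier."""
    if tier is Tier.AUTOPILOT:
        # Assumption: autopilot opt-in alone decides Tier 0 execution.
        return autopilot_enabled
    if tier is Tier.APPROVAL_REQUIRED:
        return admin_approved
    # Tier 2 is never executed automatically; the user applies it.
    return False
```

Note that enabling autopilot has no effect on Tier 1 or Tier 2 plans in this sketch, which mirrors the table: only explicit approval unlocks Tier 1, and nothing unlocks Tier 2.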

Pending → Approved → Executed
Pending → Rejected
Approved → Failed
  1. Pending — An analysis worker creates the action plan from an insight. The plan is waiting for review (Tier 1) or autopilot evaluation (Tier 0).
  2. Approved — An admin approved the plan (Tier 1) or autopilot accepted it (Tier 0).
  3. Executed — The action was successfully executed. The associated insight is resolved.
  4. Failed — The Databricks API call failed during execution.
  5. Rejected — An admin reviewed the plan and decided not to proceed.
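The lifecycle above behaves like a small state machine. The sketch below encodes the transitions implied by the status list; the `ALLOWED` table and `transition` function are illustrative names, not part of LakeSentry.

```python
# Allowed status transitions, as implied by the lifecycle description.
ALLOWED = {
    "pending": {"approved", "rejected"},
    "approved": {"executed", "failed"},
    "executed": set(),  # terminal
    "failed": set(),    # terminal
    "rejected": set(),  # terminal
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

Treating executed, failed, and rejected as terminal states is consistent with the failure handling described later, where a failed plan is not retried and a new plan is generated instead.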

Guardrails prevent automation from doing too much too fast. They apply to both autopilot and approved actions.

You can configure which resources are eligible for automated actions:

  • Allowlist — Only resources on this list can be targeted
  • Denylist — Resources on this list are never targeted, regardless of other settings
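One way to read these two lists together is as a single eligibility check in which the denylist always wins. The function below is a sketch under that reading; `is_eligible` is a hypothetical name, and treating a missing allowlist (`None`) as "all resources allowed" is an assumption, not documented behavior.

```python
from typing import Optional


def is_eligible(resource_id: str,
                allowlist: Optional[set],
                denylist: set) -> bool:
    """Decide whether a resource may be targeted by automated actions."""
    # The denylist takes priority over every other setting.
    if resource_id in denylist:
        return False
    # Assumption: None means no allowlist is configured.
    if allowlist is not None:
        return resource_id in allowlist
    return True
```

For example, a cluster that appears on both lists is still excluded, because the denylist check runs first.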

You can restrict when actions are allowed to execute. For example, you might allow autopilot only during business hours, or block execution during a known deployment window.
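A time restriction of this kind can be sketched as a check against an allowed daily window plus a list of blocked ranges. Everything here is illustrative: the function name, the 09:00 to 17:00 default, and the Monday to Friday default are assumptions chosen for the example.

```python
from datetime import datetime, time

BUSINESS_DAYS = frozenset(range(5))  # Monday=0 .. Friday=4


def in_execution_window(now: datetime,
                        start: time = time(9, 0),
                        end: time = time(17, 0),
                        weekdays: frozenset = BUSINESS_DAYS,
                        blocked=()) -> bool:
    """Return True if an action may execute at `now`."""
    # Blocked ranges (e.g. a deployment window) override the allowed window.
    if any(b_start <= now < b_end for b_start, b_end in blocked):
        return False
    if now.weekday() not in weekdays:
        return False
    return start <= now.time() < end
```

A deployment freeze would be passed as `blocked=[(freeze_start, freeze_end)]`, which stops execution even inside business hours.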

After an action executes on a resource, a cooldown prevents the same type of action from targeting that resource again for a configured period. This prevents oscillation (e.g., terminating and re-terminating the same cluster in a loop).
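The cooldown rule can be modeled by remembering the last execution time per (resource, action type) pair. The `CooldownTracker` class below is a hypothetical sketch of that idea, with an in-memory dictionary standing in for whatever storage the product actually uses.

```python
from datetime import datetime, timedelta


class CooldownTracker:
    """Tracks the last execution time per (resource, action type)."""

    def __init__(self, cooldown: timedelta):
        self.cooldown = cooldown
        self._last = {}  # (resource_id, action_type) -> datetime

    def record(self, resource_id: str, action_type: str, when: datetime) -> None:
        self._last[(resource_id, action_type)] = when

    def is_blocked(self, resource_id: str, action_type: str, now: datetime) -> bool:
        last = self._last.get((resource_id, action_type))
        # Blocked only while the cooldown period has not fully elapsed.
        return last is not None and now - last < self.cooldown
```

Because the key includes the action type, a cooldown on one type of action does not block a different type of action on the same resource, matching the "same type of action" wording above.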

You can limit how many actions execute within a time window. This prevents a burst of autopilot actions from making too many changes at once.
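A limit of "at most N actions per time window" is commonly implemented as a sliding-window counter. The sketch below shows that technique; it is not necessarily how LakeSentry enforces its limit, and the `RateLimiter` class and its parameters are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


class RateLimiter:
    """Sliding-window limiter: at most `max_actions` per `window`."""

    def __init__(self, max_actions: int, window: timedelta):
        self.max_actions = max_actions
        self.window = window
        self._times = deque()  # execution timestamps, oldest first

    def try_acquire(self, now: datetime) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._times and now - self._times[0] >= self.window:
            self._times.popleft()
        if len(self._times) >= self.max_actions:
            return False
        self._times.append(now)
        return True
```

An executor would call `try_acquire` before each action and defer the action when it returns `False`.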

The kill switch is an emergency stop that immediately halts all in-progress and pending actions. Use it if you notice unexpected behavior from automated actions. It is accessible from the Actions page and the global navigation.
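Conceptually, a kill switch is a shared flag that the executor checks before starting each action. The sketch below illustrates only the "stop pending actions" half of that behavior, using a `threading.Event`; the names `kill_switch` and `run_pending` are hypothetical, and interrupting an action that is already in flight is not modeled.

```python
import threading

# Shared flag; setting it stops any further actions from starting.
kill_switch = threading.Event()


def run_pending(actions, execute):
    """Execute actions in order until the kill switch is set."""
    done = []
    for action in actions:
        if kill_switch.is_set():
            break  # leave the remaining actions unexecuted
        execute(action)
        done.append(action)
    return done
```

Checking the flag before every action, rather than once per batch, keeps the delay between pressing the switch and the halt to at most one action.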

| Action | Safety tier | What it does |
| --- | --- | --- |
| Terminate idle cluster | Tier 0 | Terminates a cluster with no recent activity |
| Cancel runaway run | Tier 1 | Cancels a job run exceeding its historical runtime |
| Reduce auto-termination | Tier 1 | Reduces auto-termination timeout to a recommended value (e.g., 120 minutes) |

| Recommendation | What it suggests |
| --- | --- |
| Switch to spot instances | Use spot/preemptible instances for fault-tolerant workloads |
| Convert to single-node | Switch a 1-worker cluster to single-node mode |
| Upgrade runtime | Move to a newer Databricks Runtime version |
| Right-size cluster | Reduce fixed worker count based on utilization |

Each action plan includes an estimated savings figure. This is calculated from:

  • Historical cost data — What the resource has been costing
  • The proposed change — How the action would reduce cost
  • Time horizon — Projected savings over 30 days

For example, a cluster that costs $5/hour and sits idle for 12 hours yields an estimated savings of about $60 for that idle period ($5 × 12), plus projected savings if the pattern continues.
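The arithmetic in this example can be written out as follows. The idle-period figure comes directly from the text; the 30-day projection assumes the same idle pattern repeats every day, which is an illustrative assumption rather than LakeSentry's documented formula. Both function names are hypothetical.

```python
def idle_period_savings(hourly_cost: float, idle_hours: float) -> float:
    """Cost of the observed idle period, i.e. what termination would have saved."""
    return hourly_cost * idle_hours


def projected_savings(hourly_cost: float,
                      idle_hours_per_day: float,
                      horizon_days: int = 30) -> float:
    """Projection over the horizon, ASSUMING the idle pattern repeats daily."""
    return hourly_cost * idle_hours_per_day * horizon_days


print(idle_period_savings(5.0, 12))  # 60.0
```

Under the daily-repeat assumption, `projected_savings(5.0, 12)` evaluates to 1800.0 over 30 days; a real estimate would depend on how often the idle pattern actually recurs.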

When an action executes, LakeSentry:

  1. Checks preconditions — Verifies the resource still exists and the condition still applies (e.g., cluster is still running, run is still active)
  2. Calls the Databricks API — Applies the change (terminate cluster, cancel run, etc.)
  3. Records the result — Logs success or failure with details in the execution record
  4. Updates plan status — Marks the plan as executed or failed
  5. Resolves the insight — If successful, the associated insight is marked as resolved

If the Databricks API call fails, the action is marked as failed with the error details. Failed actions have no automatic retry — the next scheduled detection cycle will regenerate the action plan if the condition persists.
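The execution steps and failure handling above can be sketched as a single function. This is an illustration under stated assumptions: `Plan`, `execute_plan`, and the two callables are hypothetical, the Databricks call is represented by an injected function rather than a real client, and leaving the plan unchanged when a precondition no longer holds is a guess, since the text does not say what happens in that case.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    plan_id: str
    status: str = "approved"
    insight_resolved: bool = False
    log: list = field(default_factory=list)  # stand-in for the execution record


def execute_plan(plan: Plan, precondition_holds, call_api) -> Plan:
    # 1. Check preconditions (resource exists, condition still applies).
    if not precondition_holds():
        # Assumption: status is left unchanged when the condition is gone.
        plan.log.append("skipped: precondition no longer holds")
        return plan
    try:
        # 2. Apply the change through the (injected) API call.
        call_api()
    except Exception as exc:
        # 3 + 4. Record the failure and mark the plan failed; no retry here.
        plan.log.append(f"failed: {exc}")
        plan.status = "failed"
        return plan
    # 3 + 4 + 5. Record success, mark executed, resolve the insight.
    plan.log.append("success")
    plan.status = "executed"
    plan.insight_resolved = True
    return plan
```

There is deliberately no retry loop: as described above, recovery relies on the next detection cycle producing a fresh plan if the condition persists.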

Every action — whether executed by autopilot, by admin approval, or rejected — is recorded in the audit log with:

  • Who approved it (or “autopilot” for Tier 0)
  • When it was executed
  • What Databricks API call was made
  • The result (success or failure with details)
  • The estimated vs. actual savings (tracked over time)
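The fields listed above map naturally onto a small immutable record. The sketch below shows one possible shape; the `AuditRecord` class, its field names, and the JSON serialization are assumptions for illustration and do not describe LakeSentry's actual audit log schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    approved_by: str                  # admin identity, or "autopilot" for Tier 0
    executed_at: Optional[datetime]   # None if the plan was rejected
    api_call: Optional[str]           # description of the Databricks API call
    result: str                       # e.g. "success", "failed: <details>", "rejected"
    estimated_savings: float
    actual_savings: Optional[float] = None  # filled in as it is tracked over time


def to_json(record: AuditRecord) -> str:
    """Serialize a record, rendering the timestamp in ISO 8601."""
    data = asdict(record)
    if data["executed_at"] is not None:
        data["executed_at"] = data["executed_at"].isoformat()
    return json.dumps(data, sort_keys=True)
```

Making the record frozen reflects the usual expectation that audit entries are append-only; updating actual savings later would mean writing a new record rather than mutating an old one.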