
SQL Analysis

The SQL Analysis page lets you investigate Databricks costs at the individual query level. While Cost Explorer shows spend by workspace, team, or compute type, SQL Analysis reveals which specific queries are driving warehouse costs — and highlights performance issues that inflate your bill.

The top of the page shows headline metrics for the selected time range:

| Metric | What it shows |
|---|---|
| Queries | Count of queries executed across all warehouses |
| Error Rate | Percentage of queries that failed |
| Avg Runtime | Mean execution time across all queries |
| Cost | Aggregate query cost (DBU + cloud) |
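If you want to approximate these headline numbers outside LakeSentry, a minimal sketch against Databricks' `system.query.history` system table is shown below. The column names (`total_duration_ms`, `execution_status`) and the `'FAILED'` status value are assumptions based on the published schema and may differ in your workspace; the Cost metric is omitted because query-level cost allocation also needs billing data.

```sql
-- Rough equivalents of the Queries, Error Rate, and Avg Runtime metrics
-- for the last 7 days. Column names and status values are assumptions.
SELECT
  COUNT(*)                                                            AS queries,
  AVG(CASE WHEN execution_status = 'FAILED' THEN 1 ELSE 0 END) * 100  AS error_rate_pct,
  AVG(total_duration_ms) / 1000.0                                     AS avg_runtime_s
FROM system.query.history
WHERE start_time >= current_timestamp() - INTERVAL 7 DAYS;
```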

A trend chart below shows query count over the period, with day/week/month granularity options. Use this to spot unusual spikes in query volume.
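For reference, a query-count trend at day granularity can be sketched the same way, again assuming the `system.query.history` schema:

```sql
-- Daily query counts over the last 30 days, mirroring the trend chart
-- at "day" granularity. Table and column names are assumptions.
SELECT
  date_trunc('DAY', start_time) AS day,
  COUNT(*)                      AS queries
FROM system.query.history
WHERE start_time >= current_timestamp() - INTERVAL 30 DAYS
GROUP BY 1
ORDER BY 1;
```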

Below the overview, a table ranks warehouses by query activity:

| Column | What it shows |
|---|---|
| Warehouse | Name and workspace |
| Queries | Total queries executed |
| Cost | Combined DBU + cloud cost |
| Error % | Percentage of failed queries |
| Avg Runtime | Mean execution time |
| P95 Runtime | 95th percentile execution time |
| Avg Queue | Mean time waiting for capacity |
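A comparable per-warehouse rollup can be sketched from the same system table. The `compute.warehouse_id` struct field and `waiting_at_capacity_duration_ms` are assumed names worth verifying against your workspace, and cost is again left out.

```sql
-- Per-warehouse activity rollup similar to the warehouse table above.
-- compute.warehouse_id and waiting_at_capacity_duration_ms are assumed names.
SELECT
  compute.warehouse_id,
  COUNT(*)                                                            AS queries,
  AVG(CASE WHEN execution_status = 'FAILED' THEN 1 ELSE 0 END) * 100  AS error_pct,
  AVG(total_duration_ms) / 1000.0                                     AS avg_runtime_s,
  percentile_approx(total_duration_ms, 0.95) / 1000.0                 AS p95_runtime_s,
  AVG(waiting_at_capacity_duration_ms) / 1000.0                       AS avg_queue_s
FROM system.query.history
WHERE start_time >= current_timestamp() - INTERVAL 7 DAYS
GROUP BY compute.warehouse_id
ORDER BY queries DESC;
```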

Click any warehouse row to expand it and view query-level cost allocation for that warehouse.

The main query table shows individual queries with columns:

| Column | What it shows |
|---|---|
| SQL | Statement preview and statement type |
| Source | Query source group (Job, Notebook, etc.) |
| Workspace | Workspace and warehouse that ran the query |
| Status | Finished, Failed, Running, Canceled |
| Runtime | Total execution time (sortable) |
| Queue | Time spent waiting for capacity |
| Cost | Allocated cost for this query (sortable) |
| User | Who submitted the query |
| Last Run | When the query was executed (sortable) |

Narrow the query list using these filters:

| Filter | Options |
|---|---|
| Search | Search by statement ID or user name |
| Source group | Job, Notebook, Dashboard, Alert, Genie, SQL_Editor, API |
| Status | Finished, Failed, Running, Canceled |

Workspace and warehouse filtering is available through the global filter bar. Additional drill-down filters (user, date, issue type, query fingerprint) are applied automatically when navigating from the Optimization tab.

Sort by any sortable column. Useful combinations:

  • Cost (descending) — Find the most expensive individual queries
  • Runtime (descending) — Find the slowest queries
  • Last Run (descending) — See the most recent queries first

The source breakdown panel shows where queries originate:

| Source | What it means |
|---|---|
| Job | Queries executed as part of a Databricks job |
| Notebook | Queries from Databricks notebooks |
| Dashboard | Queries triggered by Databricks SQL dashboards |
| Alert | Queries triggered by Databricks SQL alerts |
| Genie | Queries from the Databricks Genie AI assistant |
| SQL_Editor | Queries from the Databricks SQL editor |
| API | Queries from external applications or BI tools |
| Other | Queries that don't match any known source |

Each source shows its query count and total cost. This helps identify whether your warehouse spend is driven by ad-hoc exploration (notebooks, SQL editor) or production workloads (jobs, API).
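A rough version of this breakdown can be derived from the query history's client application field. The `client_application` column name and its values are assumptions, and LakeSentry's own source grouping may classify queries differently.

```sql
-- Query counts by submitting application, a rough stand-in for the
-- source breakdown panel. client_application is an assumed column name.
SELECT
  COALESCE(client_application, 'Other') AS source,
  COUNT(*)                              AS queries
FROM system.query.history
WHERE start_time >= current_timestamp() - INTERVAL 7 DAYS
GROUP BY 1
ORDER BY queries DESC;
```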

Click any query to see its full detail:

The detail view starts with the complete SQL statement text. Long queries are displayed in a monospace code block and can be collapsed or expanded. A copy button lets you copy the full statement to your clipboard.

Summary stat cards at the top show key metrics at a glance:

| Stat card | What it shows |
|---|---|
| Runtime | Time spent executing the query |
| Data Scanned | Total bytes read from storage |
| Spill | Bytes spilled to disk (memory pressure indicator) |
| Cache | Whether the result was served from cache (Hit or Miss) |
| Cost | Allocated cost for this query |

The resource metrics panel provides additional detail:

| Metric | What it shows |
|---|---|
| Rows Read | Total rows scanned |
| Rows Produced | Rows returned to the client |
| Bytes Read | Total bytes scanned from storage |
| Bytes Written | Total bytes written |
| Bytes Spilled | Bytes spilled to disk during execution |
| Queue Time | Time waiting for warehouse capacity |
| Waiting Time | Time spent waiting for warehouse resources (cold start indicator) |
The query's allocated cost is split into two components:

  • DBU cost — Databricks unit charges for this query
  • Cloud cost — Underlying cloud infrastructure cost
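To pull the same raw numbers for a single query outside the UI, a sketch like the following can be used. The `'<statement-id>'` value is a placeholder and the column names are assumptions based on the `system.query.history` schema.

```sql
-- Detail metrics for one query by statement ID.
-- Replace '<statement-id>' with a real ID; column names are assumptions.
SELECT
  read_rows,
  produced_rows,
  read_bytes,
  written_bytes,
  spilled_local_bytes,
  waiting_at_capacity_duration_ms / 1000.0 AS queue_s,
  waiting_for_compute_duration_ms / 1000.0 AS waiting_s
FROM system.query.history
WHERE statement_id = '<statement-id>';
```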

LakeSentry automatically flags queries with performance issues:

A query is flagged as a cold start when the waiting time exceeds 30 seconds. Cold starts happen when a warehouse needs to spin up or scale out to handle the query. Frequent cold starts on the same warehouse may indicate the auto-stop timeout is too aggressive.
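The same 30-second threshold can be applied directly to the query history as a sketch; `waiting_for_compute_duration_ms` is an assumed column name.

```sql
-- Probable cold starts: queries that waited more than 30 seconds for compute.
SELECT
  statement_id,
  compute.warehouse_id,
  waiting_for_compute_duration_ms / 1000.0 AS waited_s
FROM system.query.history
WHERE waiting_for_compute_duration_ms > 30 * 1000
  AND start_time >= current_timestamp() - INTERVAL 7 DAYS
ORDER BY waited_s DESC;
```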

A query is flagged as a “scanzilla” when it reads more than 100 GB of data but produces fewer than 1,000 rows. This pattern indicates a full-scan query that could benefit from:

  • Adding partition filters to narrow the scan
  • Pre-aggregating data in a materialized view
  • Restructuring the query to push filters earlier
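The same heuristic (more than 100 GB read, fewer than 1,000 rows produced) can be expressed against the query history as a sketch; `read_bytes` and `produced_rows` are assumed column names.

```sql
-- Queries matching the "scanzilla" pattern: > 100 GB read, < 1,000 rows out.
SELECT
  statement_id,
  executed_by,
  read_bytes / POW(1024, 3) AS gb_read,
  produced_rows
FROM system.query.history
WHERE read_bytes > 100 * POW(1024, 3)
  AND produced_rows < 1000
  AND start_time >= current_timestamp() - INTERVAL 7 DAYS
ORDER BY gb_read DESC;
```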

Queries that spill data to local disk during execution are flagged with the spill volume. Spilling happens when a query’s intermediate results exceed the available memory and must be written to disk, which significantly slows execution. This can indicate:

  • Queries with large joins or aggregations that need more memory
  • Warehouses that are undersized for the workload
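A sketch for surfacing the heaviest spillers follows; `spilled_local_bytes` is an assumed column name.

```sql
-- Queries that spilled to local disk, largest spill first.
SELECT
  statement_id,
  compute.warehouse_id,
  spilled_local_bytes / POW(1024, 3) AS gb_spilled,
  total_duration_ms / 1000.0         AS runtime_s
FROM system.query.history
WHERE spilled_local_bytes > 0
  AND start_time >= current_timestamp() - INTERVAL 7 DAYS
ORDER BY gb_spilled DESC;
```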
To find the most expensive queries:

  1. Set the time range to the period of interest.
  2. Sort the query list by Cost (descending).
  3. Look at the top 10–20 queries — they often account for a disproportionate share of warehouse spend.
  4. Click into each to check for performance flags (scanzilla, cold start, spill).
To investigate a cost spike:

  1. Open the Cost Explorer and identify the warehouse or time period with the spike.
  2. Navigate to SQL Analysis and filter to that warehouse and time range.
  3. Sort by cost to find the specific queries driving the increase.
  4. Check if the spike is due to new queries, increased frequency, or degraded performance of existing queries.
To reduce warehouse spend through query optimization:

  1. Filter to Finished status to exclude failed queries.
  2. Sort by Cost (descending) to find the most expensive queries.
  3. Switch to the Optimization tab and look for scanzilla patterns — these are the lowest-hanging fruit for optimization.
  4. Check spill analysis for queries that might benefit from warehouse scaling or query restructuring.