# SQL Analysis
The SQL Analysis page lets you investigate Databricks costs at the individual query level. While Cost Explorer shows spend by workspace, team, or compute type, SQL Analysis reveals which specific queries are driving warehouse costs — and highlights performance issues that inflate your bill.
## Overview
The top of the page shows headline metrics for the selected time range:
| Metric | What it shows |
|---|---|
| Queries | Count of queries executed across all warehouses |
| Error Rate | Percentage of queries that failed |
| Avg Runtime | Mean execution time across all queries |
| Cost | Aggregate query cost (DBU + cloud) |
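These headline figures are straightforward aggregates over the queries in the selected range. A minimal sketch of how they could be derived from raw query records; the field names are illustrative, not LakeSentry's actual schema:

```python
from statistics import mean

# Hypothetical query records; field names are illustrative only.
queries = [
    {"status": "Finished", "runtime_s": 42.0, "cost_usd": 0.31},
    {"status": "Failed",   "runtime_s": 3.5,  "cost_usd": 0.02},
    {"status": "Finished", "runtime_s": 120.0, "cost_usd": 1.10},
]

query_count = len(queries)
error_rate = sum(q["status"] == "Failed" for q in queries) / query_count * 100
avg_runtime = mean(q["runtime_s"] for q in queries)
total_cost = sum(q["cost_usd"] for q in queries)  # DBU + cloud, already combined here

print(f"Queries: {query_count}, Error Rate: {error_rate:.1f}%, "
      f"Avg Runtime: {avg_runtime:.1f}s, Cost: ${total_cost:.2f}")
```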
A trend chart below shows query count over the period, with day/week/month granularity options. Use this to spot unusual spikes in query volume.
## Warehouse summary
Below the overview, a table ranks warehouses by query activity:
| Column | What it shows |
|---|---|
| Warehouse | Name and workspace |
| Queries | Total queries executed |
| Cost | Combined DBU + cloud cost |
| Error % | Percentage of failed queries |
| Avg Runtime | Mean execution time |
| P95 Runtime | 95th percentile execution time |
| Avg Queue | Mean time waiting for capacity |
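For reference, a hedged sketch of how a per-warehouse rollup like this can be computed, including a P95 runtime from Python's `statistics.quantiles`; the record fields are hypothetical:

```python
from collections import defaultdict
from statistics import mean, quantiles

# Hypothetical per-query records; field names are illustrative only.
queries = [
    {"warehouse": "bi-prod", "runtime_s": 12.0, "queue_s": 0.4, "cost_usd": 0.08, "failed": False},
    {"warehouse": "bi-prod", "runtime_s": 95.0, "queue_s": 7.1, "cost_usd": 0.92, "failed": True},
    {"warehouse": "adhoc",   "runtime_s": 4.0,  "queue_s": 0.0, "cost_usd": 0.01, "failed": False},
]

by_warehouse = defaultdict(list)
for q in queries:
    by_warehouse[q["warehouse"]].append(q)

for name, qs in by_warehouse.items():
    runtimes = [q["runtime_s"] for q in qs]
    # quantiles(n=20) yields 19 cut points (5th..95th percentile); the last one is P95.
    p95 = quantiles(runtimes, n=20)[-1] if len(runtimes) > 1 else runtimes[0]
    row = {
        "queries": len(qs),
        "cost": sum(q["cost_usd"] for q in qs),
        "error_pct": 100 * sum(q["failed"] for q in qs) / len(qs),
        "avg_runtime": mean(runtimes),
        "p95_runtime": p95,
        "avg_queue": mean(q["queue_s"] for q in qs),
    }
    print(name, row)
```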
Click any warehouse row to expand it and view query-level cost allocation for that warehouse.
## Query list
The main query table shows individual queries with the following columns:
| Column | What it shows |
|---|---|
| SQL | Statement preview and statement type |
| Source | Query source group (Job, Notebook, etc.) |
| Workspace | Workspace and warehouse that ran the query |
| Status | Finished, Failed, Running, Canceled |
| Runtime | Total execution time (sortable) |
| Queue | Time spent waiting for capacity |
| Cost | Allocated cost for this query (sortable) |
| User | Who submitted the query |
| Last Run | When the query was executed (sortable) |
## Filtering
Narrow the query list using these filters:
| Filter | Options |
|---|---|
| Search | Search by statement ID or user name |
| Source group | Job, Notebook, Dashboard, Alert, Genie, SQL_Editor, API |
| Status | Finished, Failed, Running, Canceled |
Workspace and warehouse filtering is available through the global filter bar. Additional drill-down filters (user, date, issue type, query fingerprint) are applied automatically when navigating from the Optimization tab.
## Sorting
Sort by any sortable column. Useful sorts:
- Cost (descending) — Find the most expensive individual queries
- Runtime (descending) — Find the slowest queries
- Last Run (descending) — See the most recent queries first
## Source breakdown
The source breakdown panel shows where queries originate:
| Source | What it means |
|---|---|
| Job | Queries executed as part of a Databricks job |
| Notebook | Queries from Databricks notebooks |
| Dashboard | Queries triggered by Databricks SQL dashboards |
| Alert | Queries triggered by Databricks SQL alerts |
| Genie | Queries from Databricks Genie AI assistant |
| SQL_Editor | Queries from the Databricks SQL editor |
| API | Queries from external applications or BI tools |
| Other | Queries that don’t match any known source |
Each source shows its query count and total cost. This helps identify whether your warehouse spend is driven by ad-hoc exploration (notebooks, SQL editor) or production workloads (jobs, API).
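Conceptually this is a group-by over the source of each query, with unrecognized sources falling into Other. A small sketch under that assumption; the grouping rules shown are illustrative, not LakeSentry's exact logic:

```python
from collections import Counter, defaultdict

# Groups shown in the panel; anything else is bucketed as "Other".
KNOWN_GROUPS = {"Job", "Notebook", "Dashboard", "Alert", "Genie", "SQL_Editor", "API"}

# Hypothetical query records.
queries = [
    {"source": "Job",         "cost_usd": 4.20},
    {"source": "Notebook",    "cost_usd": 0.35},
    {"source": "legacy-tool", "cost_usd": 0.10},  # falls into "Other"
]

counts = Counter()
costs = defaultdict(float)
for q in queries:
    group = q["source"] if q["source"] in KNOWN_GROUPS else "Other"
    counts[group] += 1
    costs[group] += q["cost_usd"]

for group in counts:
    print(f"{group}: {counts[group]} queries, ${costs[group]:.2f}")
```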
## Query detail view
Click any query to see its full detail:
### Statement
The complete SQL statement text. Long queries are displayed in a monospace code block and can be collapsed or expanded. A copy button lets you copy the full statement to your clipboard.
### Performance metrics
Summary stat cards at the top show key metrics at a glance:
| Stat card | What it shows |
|---|---|
| Runtime | Time spent executing the query |
| Data Scanned | Total bytes read from storage |
| Spill | Bytes spilled to disk (memory pressure indicator) |
| Cache | Whether the result was served from cache (Hit or Miss) |
| Cost | Allocated cost for this query |
The resource metrics panel provides additional detail:
| Metric | What it shows |
|---|---|
| Rows Read | Total rows scanned |
| Rows Produced | Rows returned to the client |
| Bytes Read | Total bytes scanned from storage |
| Bytes Written | Total bytes written |
| Bytes Spilled | Bytes spilled to disk during execution |
| Queue Time | Time waiting for warehouse capacity |
| Waiting Time | Time spent waiting for warehouse resources (cold start indicator) |
### Cost breakdown
- DBU cost — Databricks unit charges for this query
- Cloud cost — Underlying cloud infrastructure cost
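The exact attribution method is internal to LakeSentry. One plausible approach, shown here purely as an assumption for intuition, is to split a warehouse's DBU and cloud spend across its queries in proportion to runtime:

```python
# ASSUMPTION: allocation by runtime share, not necessarily how LakeSentry attributes cost.
warehouse_dbu_cost = 48.00    # hypothetical totals for the warehouse over the window
warehouse_cloud_cost = 31.00

query_runtimes_s = {"q1": 600.0, "q2": 1800.0, "q3": 300.0}
total_runtime = sum(query_runtimes_s.values())

for qid, runtime in query_runtimes_s.items():
    share = runtime / total_runtime
    dbu = warehouse_dbu_cost * share
    cloud = warehouse_cloud_cost * share
    print(f"{qid}: DBU ${dbu:.2f} + cloud ${cloud:.2f} = ${dbu + cloud:.2f}")
```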
## Performance flags
LakeSentry automatically flags queries with performance issues:
### Cold start
A query is flagged as a cold start when the waiting time exceeds 30 seconds. Cold starts happen when a warehouse needs to spin up or scale out to handle the query. Frequent cold starts on the same warehouse may indicate the auto-stop timeout is too aggressive.
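A minimal sketch of the flag's logic using the 30-second threshold described above; the record fields are hypothetical:

```python
COLD_START_THRESHOLD_S = 30  # threshold described above

def is_cold_start(query: dict) -> bool:
    """Flag a query whose waiting time exceeds the cold-start threshold."""
    return query["waiting_time_s"] > COLD_START_THRESHOLD_S

# Counting cold starts per warehouse can hint that the auto-stop timeout is
# too aggressive for the query arrival pattern.
queries = [
    {"warehouse": "adhoc", "waiting_time_s": 48.0},
    {"warehouse": "adhoc", "waiting_time_s": 0.2},
    {"warehouse": "adhoc", "waiting_time_s": 63.5},
]
cold_starts = sum(is_cold_start(q) for q in queries)
print(f"adhoc: {cold_starts} cold starts out of {len(queries)} queries")
```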
### Scanzilla
A query is flagged as a “scanzilla” when it reads more than 100 GB of data but produces fewer than 1,000 rows. This pattern indicates a full-scan query that could benefit from:
- Adding partition filters to narrow the scan
- Pre-aggregating data in a materialized view
- Restructuring the query to push filters earlier
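A minimal sketch of the thresholds described above (more than 100 GB read, fewer than 1,000 rows produced); the function and field names are illustrative:

```python
SCAN_BYTES_THRESHOLD = 100 * 1024**3  # 100 GB read from storage
ROWS_PRODUCED_THRESHOLD = 1_000       # result smaller than this

def is_scanzilla(bytes_read: int, rows_produced: int) -> bool:
    """Large scan, tiny result: the pattern described above."""
    return bytes_read > SCAN_BYTES_THRESHOLD and rows_produced < ROWS_PRODUCED_THRESHOLD

# Example: a query that scans 250 GB to return 40 rows gets flagged.
print(is_scanzilla(bytes_read=250 * 1024**3, rows_produced=40))  # True
```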
### Spill analysis
Queries that spill data to local disk during execution are flagged with the spill volume. Spilling happens when a query’s intermediate results exceed the available memory and must be written to disk, which significantly slows execution. This can indicate:
- Queries with large joins or aggregations that need more memory
- Warehouses that are undersized for the workload
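A small illustrative helper for reasoning about spill volume; the spill-to-read ratio is our own rough heuristic for severity, not a LakeSentry metric:

```python
def spill_summary(bytes_spilled: int, bytes_read: int) -> str:
    """Report spill volume and its ratio to data read (illustrative heuristic)."""
    if bytes_spilled == 0:
        return "no spill"
    ratio = bytes_spilled / max(bytes_read, 1)
    return f"spilled {bytes_spilled / 1024**3:.1f} GiB ({ratio:.0%} of bytes read)"

# A large spill relative to data read suggests the warehouse is undersized
# for the join/aggregation, or the query needs restructuring.
print(spill_summary(bytes_spilled=12 * 1024**3, bytes_read=40 * 1024**3))
```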
## Common investigation workflows
### Finding expensive queries
1. Set the time range to the period of interest.
2. Sort the query list by Cost (descending).
3. Look at the top 10–20 queries — they often account for a disproportionate share of warehouse spend.
4. Click into each to check for performance flags (scanzilla, cold start, spill); a rough offline sketch of this workflow follows.
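The sketch below mirrors these steps on exported query records with hypothetical field names; the flag thresholds reuse the ones from the Performance flags section:

```python
# Hypothetical exported query records.
queries = [
    {"id": "q1", "cost_usd": 14.2, "bytes_read": 300 * 1024**3, "rows_produced": 12,
     "waiting_time_s": 2.0, "bytes_spilled": 0},
    {"id": "q2", "cost_usd": 0.4, "bytes_read": 1 * 1024**3, "rows_produced": 5_000,
     "waiting_time_s": 45.0, "bytes_spilled": 2 * 1024**3},
]

# Step 2-3: sort by cost and keep the top 20.
top = sorted(queries, key=lambda q: q["cost_usd"], reverse=True)[:20]

# Step 4: check each for the performance flags described earlier.
for q in top:
    flags = []
    if q["bytes_read"] > 100 * 1024**3 and q["rows_produced"] < 1_000:
        flags.append("scanzilla")
    if q["waiting_time_s"] > 30:
        flags.append("cold start")
    if q["bytes_spilled"] > 0:
        flags.append("spill")
    print(q["id"], f"${q['cost_usd']:.2f}", flags or "no flags")
```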
### Investigating a cost spike
1. Open the Cost Explorer and identify the warehouse or time period with the spike.
2. Navigate to SQL Analysis and filter to that warehouse and time range.
3. Sort by cost to find the specific queries driving the increase.
4. Check if the spike is due to new queries, increased frequency, or degraded performance of existing queries.
### Identifying optimization candidates
1. Filter to Finished status to exclude failed queries.
2. Sort by Cost (descending) to find the most expensive queries.
3. Switch to the Optimization tab and look for scanzilla patterns — these are the lowest-hanging fruit for optimization.
4. Check spill analysis for queries that might benefit from warehouse scaling or query restructuring.
## Next steps
- Compute (Clusters & Warehouses) — Warehouse-level cost and utilization
- Cost Explorer — Multi-dimensional cost analysis
- Metrics & Aggregations — How query metrics are computed