# Storage
The Storage page tracks costs from Databricks’ predictive optimization system — the automated operations that compact, vacuum, analyze, and cluster your Delta tables. These system-initiated operations consume DBUs and add to your bill, often without visibility into what triggered them or how much they cost.
## Overview

The top of the page shows headline metrics:
| Metric | What it shows |
|---|---|
| Total Cost | Aggregate estimated cost of predictive optimization operations for the selected period |
| Operations | Total number of optimization operations performed |
| DBUs Used | Total DBU consumption across all operations |
| Failed | Count of operations that did not complete successfully |
Two bar charts provide a visual breakdown:
- Operations by Type — How many operations of each type were performed
- Cost by Operation Type — How much each operation type cost
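These headline numbers can be sanity-checked against the raw system table. Below is a minimal sketch, assuming a Databricks notebook where `spark` is predefined, the documented columns of the system table, and an illustrative flat DBU rate (LakeSentry's actual rate configuration may differ):

```python
# Recompute the headline metrics from the predictive optimization system table.
# Assumes a Databricks notebook where `spark` is predefined. The DBU rate is an
# illustrative placeholder, not LakeSentry's configured rate.
DBU_RATE_USD = 0.55  # placeholder flat rate per DBU

row = spark.sql("""
    SELECT
        COUNT(*)                                                     AS operations,
        CAST(COALESCE(SUM(usage_quantity), 0) AS DOUBLE)             AS dbus_used,
        COALESCE(SUM(CASE WHEN operation_status = 'FAILED'
                          THEN 1 ELSE 0 END), 0)                     AS failed
    FROM system.storage.predictive_optimization_operations_history
    WHERE start_time >= DATE_SUB(CURRENT_DATE(), 30)   -- selected period: last 30 days
""").first()

print(f"Total Cost : ${row.dbus_used * DBU_RATE_USD:,.2f}")
print(f"Operations : {row.operations}")
print(f"DBUs Used  : {row.dbus_used:,.1f}")
print(f"Failed     : {row.failed}")
```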
## Overview tab

The Overview tab shows a daily operations table with the following columns:
| Column | What it shows |
|---|---|
| Date | Day the operations were performed |
| Workspace | Which Databricks workspace |
| Operation | Operation type (COMPACTION, VACUUM, ANALYZE, CLUSTERING) |
| Count | Number of operations on that day |
| Success | Count of successful operations |
| Failed | Count of failed operations |
| DBUs | Total DBU consumption |
| Est. Cost | Estimated cost in USD |
The table is sorted by estimated cost (descending) by default.
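For reference, an equivalent daily rollup over the raw system table might look like the following. A sketch under the same assumptions as above (notebook-provided `spark`, `SUCCESSFUL`/`FAILED` status values, placeholder DBU rate):

```python
# Daily rollup mirroring the Overview tab columns, sorted by estimated cost.
daily_ops = spark.sql("""
    SELECT
        DATE(start_time)  AS date,
        workspace_id      AS workspace,
        operation_type    AS operation,
        COUNT(*)          AS op_count,
        SUM(CASE WHEN operation_status = 'SUCCESSFUL' THEN 1 ELSE 0 END) AS success,
        SUM(CASE WHEN operation_status = 'FAILED'     THEN 1 ELSE 0 END) AS failed,
        CAST(SUM(usage_quantity) AS DOUBLE)                              AS dbus,
        ROUND(CAST(SUM(usage_quantity) AS DOUBLE) * 0.55, 2)            AS est_cost_usd  -- placeholder rate
    FROM system.storage.predictive_optimization_operations_history
    GROUP BY DATE(start_time), workspace_id, operation_type
    ORDER BY est_cost_usd DESC
""")
daily_ops.show()
```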
## By Table tab

The By Table tab shows optimization activity grouped by individual table:
| Column | What it shows |
|---|---|
| Date | Day the operations were performed |
| Table | Fully qualified table name (catalog.schema.table) |
| Workspace | Which Databricks workspace |
| Total Ops | Total number of operations on this table |
| Compact | Number of compaction operations |
| Vacuum | Number of vacuum operations |
| Analyze | Number of analyze operations |
| Cluster | Number of clustering operations |
| Est. Cost | Estimated cost in USD |
This view helps identify which tables are most expensive to maintain through predictive optimization.
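A per-table version of the same rollup pivots operation types into columns that mirror this tab. Again a sketch under the same assumptions; the `*_ops` aliases are illustrative names:

```python
# Per-table, per-day breakdown mirroring the By Table tab.
by_table = spark.sql("""
    SELECT
        DATE(start_time) AS date,
        CONCAT_WS('.', catalog_name, schema_name, table_name) AS full_table_name,
        workspace_id AS workspace,
        COUNT(*) AS total_ops,
        SUM(CASE WHEN operation_type = 'COMPACTION' THEN 1 ELSE 0 END) AS compact_ops,
        SUM(CASE WHEN operation_type = 'VACUUM'     THEN 1 ELSE 0 END) AS vacuum_ops,
        SUM(CASE WHEN operation_type = 'ANALYZE'    THEN 1 ELSE 0 END) AS analyze_ops,
        SUM(CASE WHEN operation_type = 'CLUSTERING' THEN 1 ELSE 0 END) AS cluster_ops,
        ROUND(CAST(SUM(usage_quantity) AS DOUBLE) * 0.55, 2) AS est_cost_usd  -- placeholder rate
    FROM system.storage.predictive_optimization_operations_history
    GROUP BY 1, 2, 3
    ORDER BY est_cost_usd DESC
""")
by_table.show(truncate=False)
```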
## Predictive optimization operation types

LakeSentry tracks operations performed by Databricks’ predictive optimization system:
| Operation type | What it does |
|---|---|
| COMPACTION | Compacts small files into larger ones for better read performance. Reduces the number of files scanned during queries. |
| VACUUM | Removes old file versions that are no longer needed. Reduces storage cost by cleaning up files left behind by Delta operations. |
| ANALYZE | Collects statistics on table data to improve query planning and keep table statistics from going stale. |
| CLUSTERING | Re-clusters data by frequently filtered columns. Improves partition pruning and reduces bytes scanned. Includes AUTO_CLUSTERING_COLUMN_SELECTION operations. |
Cost is estimated from DBU consumption at a standard DBU rate. The raw data comes from the Databricks `system.storage.predictive_optimization_operations_history` system table.
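In other words, the estimate is a straight multiplication of DBUs by a rate. A trivial illustration; the default rate below is a placeholder, not LakeSentry's configured value:

```python
def estimate_cost_usd(dbus: float, rate_per_dbu: float = 0.55) -> float:
    """Estimated cost of an operation: DBUs consumed times a flat DBU rate.

    The default rate is a placeholder, not LakeSentry's configured value.
    """
    return dbus * rate_per_dbu

# e.g. an operation that consumed 12.4 DBUs:
print(round(estimate_cost_usd(12.4), 2))  # 6.82 at the placeholder rate
```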
## Filtering

The Storage page respects the global workspace filter and time range selector. Operations can also be filtered by operation type via the API.
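As an illustration only, an operation-type filter might be passed as a query parameter. The host, path, and parameter names below are hypothetical placeholders, not LakeSentry's documented API:

```python
import requests

# Hypothetical request: URL, path, and parameter names are illustrative only,
# not LakeSentry's real API surface.
resp = requests.get(
    "https://lakesentry.example.com/api/storage/operations",  # placeholder host/path
    params={
        "workspace_id": "1234567890",    # mirrors the global workspace filter
        "start": "2025-01-01",           # mirrors the time range selector
        "end": "2025-01-31",
        "operation_type": "COMPACTION",  # the additional API-only filter
    },
    timeout=30,
)
resp.raise_for_status()
operations = resp.json()
```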
## Storage cost categories

Databricks storage costs appear in billing data under several usage types:
| Usage type | What it covers |
|---|---|
| STORAGE_SPACE | Cloud storage for Delta tables and volumes |
| NETWORK_BYTE | Network transfer costs for cross-region reads |
These usage types are visible in the Cost Explorer compute types breakdown, not on the Storage page itself. The Storage page focuses specifically on predictive optimization operation costs (which are DBU-based), not on the underlying cloud storage charges.
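To inspect those charges directly, query the billing system table rather than the optimization history. A sketch assuming a notebook-provided `spark` session and the `usage_type` column of `system.billing.usage`:

```python
# Storage-related charges come from the billing table, not the optimization
# history table that backs this page.
storage_usage = spark.sql("""
    SELECT
        usage_date,
        usage_type,
        sku_name,
        SUM(usage_quantity) AS quantity,
        FIRST(usage_unit)   AS unit
    FROM system.billing.usage
    WHERE usage_type IN ('STORAGE_SPACE', 'NETWORK_BYTE')
    GROUP BY usage_date, usage_type, sku_name
    ORDER BY usage_date DESC
""")
storage_usage.show(truncate=False)
```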
## Table cost attribution

Table-level cost attribution (which tables cost the most to write to, which are unused) is available in the Cost Explorer Tables tab, not on this page. That view shows cost attributed through work units that write to each table, along with unused table detection.
## Common investigation workflows

### Identifying expensive optimization operations

- Check the Cost by Operation Type chart to see which operation types drive the most spend (a query sketch follows this list).
- Switch to the By Table tab and sort by Est. Cost (descending).
- Look for tables with disproportionately high optimization costs relative to their value.
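The chart's breakdown can also be reproduced straight from the system table. A sketch under the same assumptions as the earlier queries (notebook-provided `spark`, placeholder DBU rate):

```python
# Reproduce the "Cost by Operation Type" breakdown.
cost_by_type = spark.sql("""
    SELECT
        operation_type,
        ROUND(CAST(SUM(usage_quantity) AS DOUBLE) * 0.55, 2) AS est_cost_usd  -- placeholder rate
    FROM system.storage.predictive_optimization_operations_history
    GROUP BY operation_type
    ORDER BY est_cost_usd DESC
""")
cost_by_type.show()
```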
### Monitoring optimization health

- Check the Failed metric in the page header. Any non-zero value warrants investigation.
- In the Overview tab, look for rows with high failed counts (a query sketch follows this list).
- Failed operations may indicate table configuration issues or resource constraints.
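To see where failures concentrate, group the raw history by table and operation type. A sketch assuming the `FAILED` status value and a notebook-provided `spark` session:

```python
# Tables and operation types with failed predictive optimization runs.
failures = spark.sql("""
    SELECT
        CONCAT_WS('.', catalog_name, schema_name, table_name) AS full_table_name,
        operation_type,
        COUNT(*)        AS failed_ops,
        MAX(start_time) AS last_failure
    FROM system.storage.predictive_optimization_operations_history
    WHERE operation_status = 'FAILED'
    GROUP BY 1, 2
    ORDER BY failed_ops DESC
""")
failures.show(truncate=False)
```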
### Reviewing per-table optimization activity

- Switch to the By Table tab.
- Look for tables with unusually high operation counts — frequent compaction may indicate a write pattern that produces many small files (see the sketch after this list).
- Review CLUSTERING operations on frequently queried tables to verify they are improving query performance (cross-reference with SQL Analysis).
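One way to flag such tables is to count recent compactions per table, as sketched below with an arbitrary threshold (same notebook `spark` assumption as above):

```python
# Tables compacted more than roughly once a day over the last 30 days may have
# upstream writers producing many small files. The threshold is arbitrary.
hot_tables = spark.sql("""
    SELECT
        CONCAT_WS('.', catalog_name, schema_name, table_name) AS full_table_name,
        COUNT(*) AS compactions_30d
    FROM system.storage.predictive_optimization_operations_history
    WHERE operation_type = 'COMPACTION'
      AND start_time >= DATE_SUB(CURRENT_DATE(), 30)
    GROUP BY 1
    HAVING COUNT(*) > 30   -- tune to your write patterns
    ORDER BY compactions_30d DESC
""")
hot_tables.show(truncate=False)
```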
## Next steps

- SQL Analysis — Query-level cost investigation
- Cost Explorer — Explore storage costs alongside compute, including table cost attribution
- Waste Detection & Insights — How unused data and optimization opportunities are identified
- Metrics & Aggregations — How storage metrics are computed