Metrics & Aggregations

LakeSentry pre-computes metrics from the ledger into aggregated tables that power dashboards, reports, and insights. This page explains how the metric system works, what types of metrics exist, and how data accuracy is maintained.

Querying raw ledger data for every dashboard load would be slow. A question like “what’s my daily cost by team for the last 90 days?” requires joining usage line items with attribution rules, organizational hierarchy, and pricing data — potentially millions of rows.

Pre-aggregated metrics compute these joins once and store the results in dedicated tables. Dashboard queries then read from these tables directly, making page loads fast regardless of data volume.
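
As a rough sketch of why this matters (the table and column names here are illustrative, not LakeSentry's actual schema), the 90-day question above becomes a single scan over one pre-aggregated table instead of a multi-way join:

```sql
-- Hypothetical pre-aggregated table; real table and column names may differ.
-- "Daily cost by team for the last 90 days", read straight from the summary.
SELECT
  usage_date,
  team,
  SUM(cost_usd) AS daily_cost_usd
FROM metrics.cost_attribution_summary
WHERE usage_date >= date_sub(current_date(), 90)
GROUP BY usage_date, team
ORDER BY usage_date, team;
```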

LakeSentry computes metrics across several categories:

Cost metrics track spend patterns over time, broken down by attribution dimensions.

| Metric | What it measures | Refresh |
| --- | --- | --- |
| Cost attribution summary | Daily cost by attribution path, team, department, org unit | Daily (60-day lookback) |
| Pipeline spend daily | Daily cost per DLT pipeline | Daily |
| Team attribution daily | Daily cost per team with breakdown | Daily |
| User activity daily | Daily cost and activity per user | Daily |
| Weekend spend weekly | Weekend vs. weekday spend per workspace | Weekly |
| Weekly spend | Week-over-week spend trends with percentage changes | Daily |

The cost attribution summary is the most important cost metric — it’s the source of truth for chargeback reports, cost explorer views, and budget tracking. It aggregates costs along all attribution dimensions: workspace, date, attribution path, org unit, department, team, principal, shared bucket, and project.
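
Because the summary already carries every attribution dimension, a chargeback rollup reduces to a GROUP BY over it. A sketch, assuming illustrative table and column names rather than the actual schema:

```sql
-- Monthly chargeback by department and team for the last three full-ish months.
-- metrics.cost_attribution_summary and its columns are illustrative.
SELECT
  trunc(usage_date, 'MONTH') AS billing_month,
  department,
  team,
  SUM(cost_usd)              AS attributed_cost_usd
FROM metrics.cost_attribution_summary
WHERE usage_date >= trunc(add_months(current_date(), -3), 'MONTH')
GROUP BY trunc(usage_date, 'MONTH'), department, team
ORDER BY billing_month, department, team;
```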

Utilization metrics track how efficiently compute resources are being used.

| Metric | What it measures | Refresh |
| --- | --- | --- |
| Cluster utilization daily | CPU, memory, idle time per cluster per day | Daily (14-day lookback) |
| Warehouse timeline minute | SQL warehouse state changes at minute granularity | Hourly |

Cluster utilization tracks:

  • Average and P95 CPU/memory usage
  • Idle minutes (time with no activity)
  • Whether auto-termination is enabled
  • Recommended worker count (median-based sizing)
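
A sketch of how these per-cluster daily figures might be derived from minute-level utilization samples. The source table, column names, and the idle heuristic are assumptions, not LakeSentry's actual implementation:

```sql
-- Illustrative daily rollup of per-cluster utilization samples.
-- raw.cluster_utilization_samples and its columns are hypothetical.
SELECT
  cluster_id,
  CAST(sample_time AS DATE)                    AS usage_date,
  AVG(cpu_pct)                                 AS avg_cpu_pct,
  percentile_approx(cpu_pct, 0.95)             AS p95_cpu_pct,
  AVG(mem_pct)                                 AS avg_mem_pct,
  percentile_approx(mem_pct, 0.95)             AS p95_mem_pct,
  SUM(CASE WHEN cpu_pct < 1 THEN 1 ELSE 0 END) AS idle_minutes  -- assumes one sample per minute
FROM raw.cluster_utilization_samples
WHERE sample_time >= date_sub(current_date(), 14)  -- matches the 14-day lookback
GROUP BY cluster_id, CAST(sample_time AS DATE);
```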

Performance metrics track job execution, query performance, and serving efficiency.

| Metric | What it measures | Refresh |
| --- | --- | --- |
| Work unit run cost | Per-run cost with percentile flagging | Daily (90-day lookback) |
| Job run diagnostics | Per-run diagnostic flags (duration, cost, failures) | Daily |
| Query daily aggregate | Query count, duration, error rates by warehouse | Daily |
| Query fact | Individual query metrics with cost allocation and percentile ranking | Daily |
| Spill analysis daily | Disk spill events by user per day | Daily |
| Cold start daily | Warehouse cold start latency metrics | Daily |
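
As an illustration of the percentile flagging mentioned for work unit run cost, here is a sketch using window functions. The table and column names are assumptions, not LakeSentry's actual schema:

```sql
-- Flag runs whose cost ranks above the 95th percentile for the same work unit
-- over the 90-day lookback window. Names are illustrative.
WITH ranked AS (
  SELECT
    work_unit_id,
    run_id,
    run_date,
    run_cost_usd,
    PERCENT_RANK() OVER (
      PARTITION BY work_unit_id
      ORDER BY run_cost_usd
    ) AS cost_percentile
  FROM metrics.work_unit_run_cost
  WHERE run_date >= date_sub(current_date(), 90)
)
SELECT
  work_unit_id,
  run_id,
  run_date,
  run_cost_usd,
  cost_percentile >= 0.95 AS is_cost_outlier
FROM ranked;
```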

Data platform metrics track platform health and efficiency.

| Metric | What it measures | Refresh |
| --- | --- | --- |
| Pruning effectiveness daily | File pruning success rate per table | Daily |
| Scanzilla daily | Queries reading excessive data relative to output | Daily |
| Lineage utilization daily | Table reference patterns and access frequency | Daily |
| Token efficiency daily | LLM token usage in model serving | Daily |
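
To make one of these concrete, a "scanzilla"-style check might compare bytes scanned to rows returned per query. The table, columns, and thresholds below are assumptions, not LakeSentry's actual definition:

```sql
-- Illustrative check for queries that scan far more data than they return.
SELECT
  query_id,
  user_name,
  bytes_scanned,
  rows_returned,
  bytes_scanned / NULLIF(rows_returned, 0) AS bytes_per_returned_row
FROM raw.query_history                                      -- hypothetical query history table
WHERE CAST(start_time AS DATE) = date_sub(current_date(), 1)
  AND bytes_scanned > 10 * 1024 * 1024 * 1024               -- scanned more than ~10 GiB
  AND bytes_scanned / NULLIF(rows_returned, 0) > 1000000    -- and more than ~1 MB per returned row
ORDER BY bytes_per_returned_row DESC;
```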

Velocity and serving metrics track growth and change patterns across resources.

| Metric | What it measures | Refresh |
| --- | --- | --- |
| Entity velocity | Growth/decline detection for workspaces, work units (jobs/pipelines), warehouses | Daily (30-day window) |
| Serving endpoint daily | Model serving endpoint cost and traffic | Daily |
| Serving requester daily | Traffic breakdown by requester per serving endpoint | Daily |
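
To make the velocity idea concrete, here is one way a 30-day comparison could be expressed. The table, columns, and half-window split are assumptions rather than LakeSentry's actual entity-velocity logic:

```sql
-- Compare each entity's spend in the most recent 15 days against the 15 days
-- before that, within a 30-day window. All names are illustrative.
SELECT
  entity_type,                                   -- e.g. workspace, work unit, warehouse
  entity_id,
  SUM(CASE WHEN usage_date >= date_sub(current_date(), 15)
           THEN cost_usd ELSE 0 END) AS recent_cost_usd,
  SUM(CASE WHEN usage_date <  date_sub(current_date(), 15)
           THEN cost_usd ELSE 0 END) AS prior_cost_usd,
  SUM(CASE WHEN usage_date >= date_sub(current_date(), 15) THEN cost_usd ELSE 0 END)
    / NULLIF(SUM(CASE WHEN usage_date < date_sub(current_date(), 15) THEN cost_usd ELSE 0 END), 0)
    AS growth_ratio
FROM metrics.entity_daily_cost                   -- hypothetical per-entity daily cost table
WHERE usage_date >= date_sub(current_date(), 30)
GROUP BY entity_type, entity_id;
```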

Metrics are refreshed on different schedules depending on how frequently the underlying data changes and how time-sensitive the metric is:

| Schedule | When it runs | Metric types |
| --- | --- | --- |
| Hourly | Every hour | Warehouse timeline |
| Daily | Once per day | Cost attribution, utilization, query metrics, run diagnostics, entity velocity, serving metrics, weekly spend trends |
| Weekly | Once per week | Weekend spend |

Each metric has a configured lookback window that determines how much historical data is recomputed on each refresh. This handles late-arriving data and corrections:

| Metric | Lookback | Why |
| --- | --- | --- |
| Cost attribution summary | 60 days | Attribution rules may change, requiring recalculation |
| Cluster utilization | 14 days | Handles late-arriving utilization data |
| Work unit run cost | 90 days | Long lookback for accurate percentile computation |
| Entity velocity | 30 days | Growth trends need a 30-day comparison window |

Metrics use different strategies for updating data:

  • Delete and insert by window — Delete all data for the lookback window, then recompute and insert. Used for metrics where the entire window may change (like cost attribution after a rule change).
  • Upsert — Insert new records or update existing ones. Used for metrics where most data is stable and only new data needs adding.
  • Full refresh — Truncate and recompute the entire table. Used rarely, for small lookup tables.
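
For concreteness, the first two strategies look roughly like this in Databricks/Delta-style SQL. The table names, columns, and the 60-day window below are illustrative, not LakeSentry's actual implementation:

```sql
-- 1) Delete and insert by window: wipe the lookback window, then recompute it.
DELETE FROM metrics.cost_attribution_summary
WHERE usage_date >= date_sub(current_date(), 60);

INSERT INTO metrics.cost_attribution_summary
SELECT usage_date, workspace_id, team, SUM(cost_usd) AS cost_usd
FROM ledger.usage_attributed                 -- hypothetical attributed-ledger view
WHERE usage_date >= date_sub(current_date(), 60)
GROUP BY usage_date, workspace_id, team;

-- 2) Upsert: insert new rows, update existing ones at the metric's grain.
MERGE INTO metrics.query_daily_aggregate AS t
USING staging.query_daily_aggregate AS s
  ON  t.warehouse_id = s.warehouse_id
  AND t.usage_date   = s.usage_date
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

Running either statement pair twice over the same window produces the same result, which is what makes the refreshes idempotent.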

Each metric is defined by a YAML specification file that serves as the authoritative definition of:

  • What data sources the metric reads from
  • What columns it produces
  • What grain (unique key) the metric is computed at
  • The refresh schedule and lookback window
  • The refresh strategy

These specifications are the “golden source” — the metric implementation in SQL must match the spec. This approach ensures metrics are documented, testable, and consistent.
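
Because each spec declares a grain, one natural consistency test is to assert that the metric table contains at most one row per key at that grain. A sketch, using illustrative grain columns rather than the real spec:

```sql
-- Grain check: any rows returned indicate duplicate keys at the declared grain.
-- The grain columns (usage_date, workspace_id, team) are illustrative.
SELECT usage_date, workspace_id, team, COUNT(*) AS row_count
FROM metrics.cost_attribution_summary
GROUP BY usage_date, workspace_id, team
HAVING COUNT(*) > 1;
```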

Different parts of LakeSentry read from different metric tables:

| Feature | Primary metric source |
| --- | --- |
| Overview dashboard | Cost attribution summary, entity velocity |
| Cost Explorer | Cost attribution summary, weekly spend |
| Work unit detail | Work unit run cost, job run diagnostics |
| Cluster detail | Cluster utilization daily |
| SQL analysis | Query daily aggregate, spill analysis, scanzilla |
| Budgets | Cost attribution summary (for actual spend tracking) |
| Insights | Multiple metrics depending on insight type |

LakeSentry maintains data accuracy through several mechanisms:

  • Immutable raw layer — Original data from Databricks is never modified. Metrics can always be traced back to source.
  • Idempotent computation — Running a metric refresh twice produces identical results. There are no race conditions or ordering dependencies within a metric.
  • Lookback recomputation — Each refresh recomputes a window of historical data, catching any corrections or late-arriving records.
  • Dependency ordering — Metrics that depend on other metrics are computed in the correct order. Cost attribution runs before the attribution summary metric is refreshed.

If you ever notice a discrepancy, LakeSentry’s pipeline design means the fix is straightforward: recompute the metric from ledger data. No data is lost because the raw layer is append-only.