# Data Freshness & Pipeline Status
LakeSentry’s data flows through several stages before appearing in dashboards. Understanding these stages and their expected latency helps you distinguish between normal pipeline lag and actual issues.
## The data pipeline

Data moves through four stages, each adding latency:
| Stage | What happens | Typical latency |
|---|---|---|
| 1. Databricks system tables | Databricks writes usage events to system tables | 1 minute – 4 hours (varies by table) |
| 2. Collector extraction | The LakeSentry collector reads system tables and pushes data | Depends on schedule (default: once daily at ~8 AM UTC) |
| 3. Ingestion & validation | LakeSentry validates, deduplicates, and stores raw data | 1–5 minutes |
| 4. Processing & aggregation | Data is transformed into metrics, cost rollups, and insights | 5–20 minutes |
End-to-end latency, from a Databricks event occurring to its appearance in LakeSentry dashboards, is typically 20 minutes to 5 hours, depending on the data type and collector schedule.
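The stage latencies above can be tallied to estimate the end-to-end range. The sketch below is illustrative, not LakeSentry code; the collector-wait range assumes an hourly schedule (with the default daily schedule, that stage alone can add up to 24 hours):

```python
# Illustrative sketch: sum per-stage latency ranges from the table above.
# The collector wait assumes an hourly schedule (an assumption, not a default).
STAGE_LATENCY_MINUTES = {
    "databricks_system_tables": (1, 240),  # 1 minute - 4 hours, varies by table
    "collector_wait": (15, 60),            # assumed hourly extraction schedule
    "ingestion_validation": (1, 5),
    "processing_aggregation": (5, 20),
}

def end_to_end_range(stages):
    """Return (best_case, worst_case) total latency in minutes."""
    best = sum(lo for lo, _ in stages.values())
    worst = sum(hi for _, hi in stages.values())
    return best, worst

best, worst = end_to_end_range(STAGE_LATENCY_MINUTES)
print(f"best case ~{best} min, worst case ~{worst / 60:.1f} h")  # ~22 min, ~5.4 h
```

With these assumed numbers the range works out to roughly 20 minutes to 5 hours, matching the typical figure quoted above.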
## Expected freshness by data type

Different data types have different inherent latency at the Databricks level:
| Data type | Databricks system table latency | LakeSentry display latency |
|---|---|---|
| Billing / cost data | 1–4 hours | 1.5–5 hours from the actual usage |
| Cluster events | Near real-time | 20–40 minutes (next collector run + processing) |
| Query history | Minutes to 1 hour | 20–90 minutes |
| Job run history | Minutes to 1 hour | 20–90 minutes |
| Warehouse events | Minutes to 1 hour | 20–90 minutes |
| Storage metadata | Hours (updated periodically) | 1–5 hours |
## Checking pipeline status

### Connector health indicators

Go to Settings > Connector to see the health of each connector:
| Indicator | Meaning | Action needed |
|---|---|---|
| Green (Synced) | Data has been received from this connector | None — operating normally |
| Red (Error) | Connector status is “error” or “failed”, or no data in 30+ hours (triggers an email alert to admins) | Investigate — the connector may be broken or misconfigured. See Collector Issues. |
| Grey (Awaiting data) | Connector is configured but no data has been received yet | Wait for the first extraction to complete, or check the collector job. |
### Region connector detail

Click a region connector to see detailed status:
- Last ingestion — Timestamp of the last successful data push from the collector
- Tables received — List of system tables the collector is successfully extracting
- Extraction checkpoints — Per-table watermarks showing how far the collector has progressed
- Ingestion history — Recent ingestion events with row counts and durations
### Data freshness on dashboards

Dashboard pages display a “Data as of” indicator showing the most recent data point. If this timestamp seems too old:
- Check the connector health (above).
- Consider the expected latency for the data type you’re viewing.
- If the staleness exceeds expected latency, investigate the collector and pipeline.
## Understanding lag

### Normal lag patterns

Some lag patterns are expected and do not indicate a problem:
- Morning cost updates — Yesterday’s billing data often finalizes overnight. Expect cost dashboards to update with the previous day’s complete data in the early morning (UTC).
- Weekend/holiday gaps — If compute usage drops on weekends, there may be less new data to display. The pipeline is still running, but the deltas are smaller.
- Post-deployment lag — After first deploying the collector, the initial extraction takes longer than incremental runs. The first dashboards may take 30–60 minutes to populate.
### Abnormal lag patterns

These patterns suggest an issue that needs investigation:
| Pattern | Likely cause | What to check |
|---|---|---|
| One region is fresh, another is stale | The stale region’s collector isn’t running | Check the collector job in Databricks for that region |
| All regions are stale | Collector infrastructure issue or LakeSentry pipeline delay | Check multiple collector jobs; if all are running, contact support |
| Specific data type is stale | Permission lost for that system table | Check “Tables received” on the region connector |
| Dashboard shows “No data” for recent dates | Collector checkpoint issue or Databricks table retention | Check extraction checkpoints |
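The first two rows of the table amount to a simple triage rule: compare freshness across regions before digging into any one collector. A sketch of that rule (function and messages are illustrative, not a LakeSentry feature):

```python
def diagnose(region_freshness: dict[str, bool]) -> str:
    """Map per-region freshness (True = fresh) to a likely cause, per the table above."""
    stale = sorted(r for r, fresh in region_freshness.items() if not fresh)
    if not stale:
        return "all regions fresh: no action needed"
    if len(stale) == len(region_freshness):
        return "all regions stale: check collector infrastructure; if all jobs run, contact support"
    return f"stale regions {stale}: check each region's collector job in Databricks"

print(diagnose({"us-east-1": True, "eu-west-1": False}))
```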
## What to do when data is stale

### Step 1: Check the collector

- In LakeSentry, open Settings > Connector and note the “Last ingestion” time.
- If last ingestion is recent (within the expected schedule), the collector is fine — skip to Step 3.
- If last ingestion is stale, check the Databricks job:
  - Is the job running? Has it run recently?
  - Did the most recent run succeed or fail?
- See Collector Issues for detailed diagnosis.
### Step 2: Check for Databricks-side delays

Databricks system tables sometimes have their own delays, independent of the collector:
- Check the Databricks System Table Freshness dashboard (if available in your account console).
- Query the system table directly to see if recent data exists:

  ```sql
  SELECT MAX(usage_end_time) FROM system.billing.usage;
  ```

If the max timestamp is hours behind, the delay is at the Databricks level.
### Step 3: Check LakeSentry processing

If the collector is pushing data but dashboards still appear stale:
- Processing backlog — After large imports (first run or checkpoint reset), the processing pipeline may take longer than usual. This resolves on its own.
- Pipeline error — Rare, but if processing fails on specific data, it can cause a backlog. The connector detail page shows ingestion errors if any exist.
### Step 4: Trigger a manual refresh

If the scheduled extraction hasn’t run recently, you can trigger a manual extraction from LakeSentry:
- Go to Settings > Connector in LakeSentry.
- In the Data Sync panel, click the trigger button to start an immediate extraction.
- Wait for the extraction to complete (progress is visible in the panel), then check your dashboards.
## Optimizing data freshness

### Collector schedule tuning

The default extraction schedule is once daily at ~8 AM UTC. You can adjust this per connector in Settings > Connector:
| Schedule | Trade-off |
|---|---|
| Every hour | Most frequent data updates, higher compute cost |
| Every 4 hours | Good balance of freshness and cost |
| Daily at 8 AM UTC (default) | Lower cost, suitable for daily reporting and non-urgent monitoring |
| Paused | No automatic extraction — useful when temporarily disabling a connector |
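The trade-off comes down to two numbers per schedule: extraction runs per day (compute cost) and the worst-case wait before new data is picked up (freshness). A quick sketch of the arithmetic behind the table:

```python
# Rough numbers behind the schedule options above: each extra run per day
# costs compute; the interval bounds how long new data can sit unextracted.
SCHEDULE_INTERVAL_HOURS = {"every hour": 1, "every 4 hours": 4, "daily (default)": 24}

def schedule_tradeoffs(intervals: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map schedule name -> (runs per day, worst-case wait in hours)."""
    return {name: (24 // hours, hours) for name, hours in intervals.items()}

for name, (runs, wait) in schedule_tradeoffs(SCHEDULE_INTERVAL_HOURS).items():
    print(f"{name:16s} {runs:2d} runs/day, worst-case wait {wait} h")
```

Remember that the collector schedule is only one stage; Databricks system-table latency still applies on top of whatever interval you choose.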
### Multiple regions

Each region has its own collector and schedule. High-priority regions (for example, those running production workloads) can extract more frequently, while development regions run less often.
## Pipeline metrics

LakeSentry tracks internal pipeline metrics that can help diagnose freshness issues:
| Metric | What it shows |
|---|---|
| Extraction duration | How long the collector took to extract data |
| Rows extracted | Number of rows pulled in the last extraction |
| Ingestion duration | How long it took to validate and store raw data |
| Processing duration | How long metric computation and aggregation took |
| End-to-end latency | Time from extraction to data appearing in dashboards |
These metrics are visible on the region connector detail page under the “Performance” tab.
## Next steps

- Collector Issues — When the collector itself needs troubleshooting
- Common Issues — Broader troubleshooting for dashboard and access issues
- How LakeSentry Works — Understanding the full data pipeline architecture