Cost Attribution & Confidence Tiers

Cost attribution is how LakeSentry connects Databricks spend to the teams and people responsible for it. Rather than forcing every dollar into a bucket, LakeSentry uses confidence tiers to tell you how much to trust each attribution — so your chargeback numbers hold up under scrutiny.

LakeSentry uses a dual-axis model for cost allocation:

  • Vertical axis (Accountability) — Who is financially responsible? This maps to your organizational hierarchy: org units, departments, and teams.
  • Horizontal axis (Context) — Why was the cost incurred? This uses optional categories like projects or shared infrastructure buckets.

Every usage line item gets evaluated against attribution rules in priority order. The first matching rule wins. If no rules match, a waterfall fallback determines attribution based on user identity and resource ownership.
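
The first-match-wins evaluation can be pictured with a short sketch. Everything below is illustrative; the Rule fields and the attribute function are hypothetical stand-ins, not LakeSentry's actual API:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    priority: int                      # lower number = higher precedence
    matches: Callable[[dict], bool]    # predicate over one billing record
    team: str

def attribute(record: dict, rules: list[Rule]) -> Optional[str]:
    """Return the team of the first matching rule, or None if none match."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(record):
            return rule.team
    return None  # caller falls through to the waterfall fallback

# An exact rule for one known cluster, plus a broader pattern rule.
rules = [
    Rule(10, lambda r: r.get("resource_id") == "0123-456789-abcdef", "ml"),
    Rule(100, lambda r: str(r.get("resource_name", "")).startswith("prod-"), "platform"),
]
print(attribute({"resource_name": "prod-etl-01"}, rules))  # -> platform
```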

Each cost allocation carries a confidence tier that tells you how reliably the attribution was determined:

| Tier | What it means | How it’s determined |
| --- | --- | --- |
| Exact | Direct identifier links cost to a specific workload | job_run_id or similar identifier in the billing metadata directly maps to a known work unit |
| Strong | Explicit linkage through query or session metadata | Query source metadata links to a job, plus clear compute mapping |
| Estimated | Time-overlap correlation with limited candidates | Multiple possible attributions; allocated based on time overlap and compute usage proportions |
| Unattributed | No reliable linkage found | Could not determine who or what caused this cost |

Attribution rules are the primary mechanism for mapping costs to owners. You create rules that match billing records and assign them to teams. Rules are evaluated in priority order — lower priority number means higher precedence.

| Type | Use case | Example |
| --- | --- | --- |
| Exact | Known high-cost resources | “Cluster 0123-456789-abcdef belongs to the ML team” |
| Pattern | Categories of resources | “All clusters matching prod-* belong to Platform” |
| Proportional | Platform overhead | “Distribute NETWORKING and DATABASE costs across teams by their compute spend” |

Exact rules match a specific resource by type and ID. Use them for resources whose owner you already know, such as a dedicated training cluster or a specific production job.

Exact rules always require a workspace (since resource IDs are workspace-scoped).
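
For illustration, an exact rule might carry fields like the ones below. This is a hypothetical shape, not LakeSentry's actual schema; note the required workspace alongside the resource type and ID:

```python
# Hypothetical exact-rule shape, for illustration only.
exact_rule = {
    "type": "exact",
    "priority": 10,
    "workspace_id": "1234567890123456",   # required: resource IDs are workspace-scoped
    "resource_type": "cluster",
    "resource_id": "0123-456789-abcdef",
    "assign": {"team": "ml-platform"},    # the team that receives 100% of the cost
}
```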

Pattern rules match resources by criteria. All conditions use AND logic: every specified condition must match, and conditions you don’t specify are treated as “match anything.”

Available match criteria:

| Criterion | What it matches |
| --- | --- |
| Resource type | cluster, warehouse, job, pipeline, endpoint, app |
| Resource pattern | Regex against resource name or ID (e.g., ^prod-.*) |
| Principal domain | Email domain suffix of the user (e.g., @analytics.company.com) |
| Tags | Databricks custom tags — key-value pairs that must all match |

Pattern rules can be global (apply across all workspaces) by leaving the workspace unset. This is useful for organization-wide tag mappings.
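
The AND matching described above can be pictured with a short sketch. The field names and helper function are illustrative, not LakeSentry's matching code:

```python
import re

def pattern_matches(rule: dict, record: dict) -> bool:
    """AND logic: every specified criterion must match; unspecified ones match anything."""
    if "resource_type" in rule and record.get("resource_type") != rule["resource_type"]:
        return False
    if "resource_pattern" in rule and not re.search(rule["resource_pattern"],
                                                    record.get("resource_name", "")):
        return False
    if "principal_domain" in rule and not record.get("principal", "").endswith(rule["principal_domain"]):
        return False
    for key, value in rule.get("tags", {}).items():     # all tag pairs must match
        if record.get("tags", {}).get(key) != value:
            return False
    return True

rule = {"resource_type": "cluster", "resource_pattern": r"^prod-.*", "tags": {"team": "analytics"}}
record = {"resource_type": "cluster", "resource_name": "prod-etl-01",
          "tags": {"team": "analytics", "env": "prod"}}
print(pattern_matches(rule, record))  # True
```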

Proportional rules distribute platform overhead costs across teams based on their compute spend. Use these for costs that don’t belong to any single team, such as networking, database (Delta storage), and predictive optimization.

The distribution is proportional: if Team A accounts for 60% of compute spend and Team B accounts for 40%, a proportional rule splits the overhead 60/40.
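
As a worked example of that split, with made-up spend figures:

```python
compute_spend = {"team_a": 600.0, "team_b": 400.0}   # 60% / 40% of compute spend
overhead = 250.0                                      # e.g. a NETWORKING line item

total = sum(compute_spend.values())
allocation = {team: overhead * spend / total for team, spend in compute_spend.items()}
print(allocation)  # {'team_a': 150.0, 'team_b': 100.0}
```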

When a rule matches, it uses one of these attribution modes:

Direct mode assigns 100% of the cost to a single team, optionally with a category. Direct mode rules can also mark a resource as shared infrastructure: you can assign a shared bucket label (like shared:platform:analytics) to group related shared costs together for reporting.

Split mode distributes cost across multiple teams by percentage. Percentages must sum to 100%. Each allocation can optionally include a category.

You can have up to 20 splits per rule. The UI provides a “Distribute evenly” helper to auto-balance percentages.
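
A small sketch of how the percentage check and the “Distribute evenly” helper could behave. The function names are hypothetical; LakeSentry performs this in the UI:

```python
def validate_splits(splits: list[dict]) -> None:
    if len(splits) > 20:
        raise ValueError("at most 20 splits per rule")
    total = sum(s["percent"] for s in splits)
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"split percentages must sum to 100, got {total}")

def distribute_evenly(teams: list[str]) -> list[dict]:
    share = round(100.0 / len(teams), 2)
    splits = [{"team": t, "percent": share} for t in teams]
    splits[-1]["percent"] = round(100.0 - share * (len(teams) - 1), 2)  # absorb rounding
    return splits

splits = distribute_evenly(["data-eng", "analytics", "ml"])
validate_splits(splits)   # passes: 33.33 + 33.33 + 33.34 = 100
```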

Proportional mode is used with proportional rules to distribute platform overhead costs across teams. The distribution is based on each team’s compute spend as a proportion of total compute cost.

When a billing record arrives, LakeSentry evaluates it through this sequence:

  1. Session-based attribution — For shared compute (SQL Serverless warehouses and ALL_PURPOSE clusters), if session allocations exist, split costs among the actual users proportionally, based on query duration or command count. If a session allocation applies, use it and stop.
  2. Proportional rules — For overhead categories (networking, database, predictive optimization), match proportional rules by SKU pattern. If a rule matches, distribute the cost to teams by compute spend and stop.
  3. Exact and pattern rules (priority order) — All non-proportional rules are evaluated together in priority order (lower priority number = higher precedence). Exact rules match by resource type + resource ID; pattern rules match by tags, resource pattern, or principal domain. The first match wins and evaluation stops.
  4. Waterfall fallback — If nothing matched, fall through a priority chain (sketched after this list):
    1. Is the resource marked as shared? → attribute to the owner’s team as shared
    2. Does the user have a team mapping? → attribute via user
    3. Does the resource owner have a team mapping? → attribute via owner
    4. Does the user exist but have no team? → attribute to user (no team)
    5. None of the above → unattributed (workspace-level)
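
The fallback chain in step 4 could be sketched roughly like this. The function and field names are hypothetical, for illustration only:

```python
from typing import Optional

def waterfall(record: dict, resource: dict,
              user_team: Optional[str], owner_team: Optional[str]) -> dict:
    """Apply the fallback chain when no attribution rule matched."""
    if resource.get("is_shared") and owner_team:
        return {"team": owner_team, "shared": True}                  # 1. shared resource -> owner's team
    if user_team:
        return {"team": user_team, "via": "user"}                    # 2. user's team mapping
    if owner_team:
        return {"team": owner_team, "via": "owner"}                  # 3. resource owner's team mapping
    if record.get("principal"):
        return {"user": record["principal"], "team": None}           # 4. known user, no team
    return {"unattributed": True, "workspace": record.get("workspace_id")}  # 5. workspace-level

print(waterfall({"principal": "eve@company.com"}, {"is_shared": False}, None, "data-eng"))
# -> {'team': 'data-eng', 'via': 'owner'}
```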

Some Databricks resources are shared by multiple users within the same billing period. SQL Serverless warehouses serve queries from many users, and ALL_PURPOSE clusters run commands from different notebooks.

For these cases, LakeSentry splits costs proportionally among actual users:

| Resource type | Attribution metric | How it works |
| --- | --- | --- |
| SQL Serverless warehouses | Query duration | Users running longer queries get a larger share |
| ALL_PURPOSE clusters | Command count | Users running more commands get a larger share |

Sessions are detected using a 2-hour gap rule — a gap of more than 2 hours between consecutive billing records starts a new session. Within each session, user activity is summed and proportionally allocated.
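
Roughly, the sessionization and the proportional split might look like the sketch below (illustrative structures, not LakeSentry's internal code):

```python
from datetime import timedelta

GAP = timedelta(hours=2)   # the 2-hour gap rule

def sessions(records: list[dict]) -> list[list[dict]]:
    """Group time-ordered billing records; a gap of more than 2 hours starts a new session."""
    records = sorted(records, key=lambda r: r["ts"])
    out, current = [], []
    for rec in records:
        if current and rec["ts"] - current[-1]["ts"] > GAP:
            out.append(current)
            current = []
        current.append(rec)
    if current:
        out.append(current)
    return out

def split_session_cost(session_cost: float, activity: dict) -> dict:
    """activity maps user -> query duration (warehouses) or command count (clusters)."""
    total = sum(activity.values())
    return {user: session_cost * amount / total for user, amount in activity.items()}

print(split_session_cost(120.0, {"alice": 90, "bob": 30}))  # {'alice': 90.0, 'bob': 30.0}
```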

A recommended convention for assigning rule priorities:

| Priority range | Recommended use |
| --- | --- |
| 1–50 | Critical exact matches for known high-cost resources |
| 51–100 | Specific pattern rules |
| 101–200 | General pattern rules |
| 201+ | Catch-all and fallback rules |

When multiple rules share the same priority, workspace-specific rules take precedence over global rules.
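
One way to express that tie-break, assuming each rule carries a priority and an optional workspace (illustrative only):

```python
rules = [
    {"priority": 100, "workspace_id": None, "team": "platform"},                  # global
    {"priority": 100, "workspace_id": "1234567890123456", "team": "analytics"},   # workspace-scoped
]
# Sort by priority first; within a tie, False sorts before True, so
# workspace-scoped rules (workspace_id is not None) come first.
rules.sort(key=lambda r: (r["priority"], r["workspace_id"] is None))
print(rules[0]["team"])  # analytics
```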

Use the Simulate tab on the Attribution page to test a rule against historical data before activating it. The simulation shows how many resources would match, the record count, and the total cost affected over the period you specify.

Rules can have optional start and end dates. Use these for:

  • Temporary overrides — “Attribute all ML training to R&D during Q4”
  • Migrations — Old rule valid until Dec 31, new rule starts Jan 1
  • Retroactive corrections — Set the start date in the past to fix historical attribution

The Tags tab on the Attribution page provides a shortcut for the most common pattern rule scenario: mapping a Databricks tag key+value to a team. Select a tag key, see all values and their costs, then assign a team from a dropdown. LakeSentry creates a global pattern rule for you.
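
The rule it creates would look roughly like the pattern rule below: global (no workspace) with a single tag condition. The field names and values are illustrative, not LakeSentry's actual schema:

```python
tag_mapping_rule = {
    "type": "pattern",
    "priority": 150,
    "workspace_id": None,                    # unset, so the rule applies to all workspaces
    "tags": {"cost_center": "analytics"},    # the tag key + value you picked
    "assign": {"team": "analytics"},         # the team chosen from the dropdown
}
```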

If a significant portion of your costs is unattributed, here are ways to improve coverage:

  1. Check the Unallocated Costs page — See which resources and cost categories aren’t matched by any rule.
  2. Create exact rules for your top unattributed resources — a few rules can cover a large portion of spend.
  3. Use tag-based mapping — If your Databricks resources have consistent tags (cost_center, team, env), map those tags to teams.
  4. Set up identity mappings — Map Databricks principals (user emails, service principals) to teams in the Organizational Hierarchy.
  5. Add pattern rules for naming conventions — If your clusters follow patterns like prod-analytics-*, create pattern rules to match them.