Cost Attribution & Confidence Tiers
Cost attribution is how LakeSentry connects Databricks spend to the teams and people responsible for it. Rather than forcing every dollar into a bucket, LakeSentry uses confidence tiers to tell you how much to trust each attribution — so your chargeback numbers hold up under scrutiny.
The attribution model
LakeSentry uses a dual-axis model for cost allocation:
- Vertical axis (Accountability) — Who is financially responsible? This maps to your organizational hierarchy: org units, departments, and teams.
- Horizontal axis (Context) — Why was the cost incurred? This uses optional categories like projects or shared infrastructure buckets.
Every usage line item gets evaluated against attribution rules in priority order. The first matching rule wins. If no rules match, a waterfall fallback determines attribution based on user identity and resource ownership.
Confidence tiers
Each cost allocation carries a confidence tier that tells you how reliably the attribution was determined:
| Tier | What it means | How it’s determined |
|---|---|---|
| Exact | Direct identifier links cost to a specific workload | job_run_id or similar identifier in the billing metadata directly maps to a known work unit |
| Strong | Explicit linkage through query or session metadata | Query source metadata links to a job, plus clear compute mapping |
| Estimated | Time-overlap correlation with limited candidates | Multiple possible attributions; allocated based on time overlap and compute usage proportions |
| Unattributed | No reliable linkage found | Could not determine who or what caused this cost |
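To make the tiers concrete, here is a minimal sketch of the classification order, checked from strongest to weakest linkage. The field names (job_run_id, query_source_job_id) and the logic itself are illustrative assumptions, not LakeSentry's internal implementation:

```python
from enum import Enum

class ConfidenceTier(Enum):
    EXACT = "exact"
    STRONG = "strong"
    ESTIMATED = "estimated"
    UNATTRIBUTED = "unattributed"

def classify(record: dict, candidates: list) -> ConfidenceTier:
    """Illustrative tiering: strongest linkage wins; field names are hypothetical."""
    if record.get("job_run_id"):            # direct identifier in billing metadata
        return ConfidenceTier.EXACT
    if record.get("query_source_job_id"):   # query/session metadata links to a job
        return ConfidenceTier.STRONG
    if candidates:                          # time-overlap candidates exist
        return ConfidenceTier.ESTIMATED
    return ConfidenceTier.UNATTRIBUTED      # no reliable linkage found
```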
Attribution rules
Attribution rules are the primary mechanism for mapping costs to owners. You create rules that match billing records and assign them to teams. Rules are evaluated in priority order — lower priority number means higher precedence.
Rule types
| Type | Use case | Example |
|---|---|---|
| Exact | Known high-cost resources | “Cluster 0123-456789-abcdef belongs to the ML team” |
| Pattern | Categories of resources | “All clusters matching prod-* belong to Platform” |
| Proportional | Platform overhead | “Distribute NETWORKING and DATABASE costs across teams by their compute spend” |
Exact rules
Match a specific resource by type and ID. Use these for resources you know the owner of — a dedicated training cluster, a specific production job.
Exact rules always require a workspace (since resource IDs are workspace-scoped).
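As a sketch, an exact rule reduces to a three-field comparison plus the required workspace. The field names below are hypothetical, not LakeSentry's schema:

```python
from dataclasses import dataclass

@dataclass
class ExactRule:
    workspace_id: str    # always required: resource IDs are workspace-scoped
    resource_type: str   # e.g. "cluster"
    resource_id: str     # e.g. "0123-456789-abcdef"
    team: str

def exact_match(rule: ExactRule, record: dict) -> bool:
    """A record matches only on workspace + type + ID, all exact."""
    return (record["workspace_id"] == rule.workspace_id
            and record["resource_type"] == rule.resource_type
            and record["resource_id"] == rule.resource_id)
```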
Pattern rules
Match resources by criteria. All conditions use AND logic — every specified condition must match. Conditions you don’t specify are treated as “match anything.”
Available match criteria:
| Criterion | What it matches |
|---|---|
| Resource type | cluster, warehouse, job, pipeline, endpoint, app |
| Resource pattern | Regex against resource name or ID (e.g., ^prod-.*) |
| Principal domain | Email domain suffix of the user (e.g., @analytics.company.com) |
| Tags | Databricks custom tags — key-value pairs that must all match |
Pattern rules can be global (apply across all workspaces) by leaving the workspace unset. This is useful for organization-wide tag mappings.
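The AND semantics described above can be sketched as a predicate where unset conditions simply pass. Again, the field names are assumptions for illustration:

```python
import re

def pattern_match(rule: dict, record: dict) -> bool:
    """Every condition set on the rule must match; unset conditions match anything."""
    if rule.get("workspace_id") and record["workspace_id"] != rule["workspace_id"]:
        return False   # leaving workspace_id unset makes the rule global
    if rule.get("resource_type") and record["resource_type"] != rule["resource_type"]:
        return False
    if rule.get("resource_pattern") and not re.search(
            rule["resource_pattern"], record["resource_name"]):
        return False   # regex against resource name or ID
    if rule.get("principal_domain") and not record["principal"].endswith(
            rule["principal_domain"]):
        return False   # email domain suffix of the user
    if any(record.get("tags", {}).get(k) != v
           for k, v in rule.get("tags", {}).items()):
        return False   # all tag key-value pairs must match
    return True
```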
Proportional rules
Distribute platform overhead costs across teams based on their compute spend. Use these for costs that don’t belong to any single team — networking, database (Delta storage), predictive optimization.
The distribution is proportional: if Team A accounts for 60% of compute spend and Team B accounts for 40%, a proportional rule splits the overhead 60/40.
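The arithmetic is a straight pro-rata share. A minimal sketch, with made-up spend figures:

```python
def proportional_shares(overhead: float, compute_spend: dict[str, float]) -> dict[str, float]:
    """Split an overhead cost by each team's share of total compute spend."""
    total = sum(compute_spend.values())
    return {team: overhead * spend / total for team, spend in compute_spend.items()}

# $1,000 of networking overhead against a 60/40 compute split:
print(proportional_shares(1000.0, {"team_a": 6000.0, "team_b": 4000.0}))
# {'team_a': 600.0, 'team_b': 400.0}
```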
Attribution modes
When a rule matches, it uses one of these attribution modes:
Direct mode
Assigns 100% of the cost to a single team, optionally with a category. Direct mode rules can also mark a resource as shared infrastructure — you can assign a shared bucket label (like shared:platform:analytics) to group related shared costs together for reporting.
Split mode
Distributes cost across multiple teams by percentage. Percentages must sum to 100%. Each allocation can optionally include a category.
You can have up to 20 splits per rule. The UI provides a “Distribute evenly” helper to auto-balance percentages.
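A sketch of the two constraints and the even-distribution behavior, assuming percentages are stored as plain numbers:

```python
def validate_splits(splits: dict[str, float]) -> None:
    """Enforce the two split-mode constraints described above."""
    if len(splits) > 20:
        raise ValueError("at most 20 splits per rule")
    if abs(sum(splits.values()) - 100.0) > 1e-6:
        raise ValueError("split percentages must sum to 100")

def distribute_evenly(teams: list[str]) -> dict[str, float]:
    """Auto-balance percentages, like the 'Distribute evenly' helper."""
    return {team: 100.0 / len(teams) for team in teams}

validate_splits(distribute_evenly(["data_eng", "ml", "analytics"]))
```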
Proportional mode
Used with proportional rules to distribute platform overhead costs across teams. The distribution is based on each team’s compute spend as a proportion of total compute cost.
Evaluation flow
When a billing record arrives, LakeSentry evaluates it through this sequence (sketched in code after the list):
- Session-based attribution — For shared compute (SQL Serverless warehouses and ALL_PURPOSE clusters), if session allocations exist, split costs among the actual users proportionally by query duration or command count, then stop.
- Proportional rules — For overhead categories (networking, database, predictive optimization), match proportional rules by SKU pattern. On a match, distribute the cost to teams by compute spend and stop.
- Exact and pattern rules (priority order) — All non-proportional rules are evaluated together in priority order (lower priority number = higher precedence). Exact rules match by resource type + resource ID; pattern rules match by tags, resource pattern, or principal domain. First match wins and evaluation stops.
- Waterfall fallback — If nothing matched, fall through a priority chain:
  - Is the resource marked as shared? → attribute to the owner’s team as shared
  - Does the user have a team mapping? → attribute via user
  - Does the resource owner have a team mapping? → attribute via owner
  - Does the user exist but have no team? → attribute to user (no team)
  - None of the above → unattributed (workspace-level)
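Putting the four steps together, the flow can be sketched as a single function that reports which branch handled a record. Everything here (field names, rule shapes, the matches predicate) is an illustrative assumption, not LakeSentry's actual code:

```python
import re

def attribution_path(record: dict, has_session_allocs: bool,
                     proportional_rules: list, rules: list,
                     user_teams: dict) -> str:
    """Walk the evaluation sequence and return the branch taken."""
    # 1. Session-based attribution for shared compute
    if record.get("is_shared_compute") and has_session_allocs:
        return "session-split"
    # 2. Proportional rules matched by SKU pattern
    for rule in proportional_rules:
        if re.search(rule["sku_pattern"], record["sku"]):
            return "proportional"
    # 3. Exact and pattern rules: lower priority number wins, first match stops
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["matches"](record):
            return f"rule:{rule['name']}"
    # 4. Waterfall fallback
    if record.get("is_shared_resource"):
        return "owner-team-as-shared"
    if record.get("principal") in user_teams:
        return "via-user"
    if record.get("owner") in user_teams:
        return "via-owner"
    if record.get("principal"):
        return "user-without-team"
    return "unattributed"
```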
Session-based attribution
Some Databricks resources are shared by multiple users within the same billing period. SQL Serverless warehouses serve queries from many users, and ALL_PURPOSE clusters run commands from different notebooks.
For these cases, LakeSentry splits costs proportionally among actual users:
| Resource type | Attribution metric | How it works |
|---|---|---|
| SQL Serverless warehouses | Query duration | Users running longer queries get a larger share |
| ALL_PURPOSE clusters | Command count | Users running more commands get a larger share |
Sessions are detected using a 2-hour gap rule — a gap of more than 2 hours between consecutive billing records starts a new session. Within each session, user activity is summed and proportionally allocated.
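A sketch of both halves: the 2-hour gap sessionization and the proportional split within a session. Record fields and activity units are assumptions for illustration:

```python
from datetime import timedelta

GAP = timedelta(hours=2)

def sessionize(records: list) -> list:
    """Group records (sorted by timestamp "ts") into sessions; a gap of
    more than 2 hours between consecutive records starts a new session."""
    sessions, current = [], []
    for rec in records:
        if current and rec["ts"] - current[-1]["ts"] > GAP:
            sessions.append(current)
            current = []
        current.append(rec)
    if current:
        sessions.append(current)
    return sessions

def allocate_session(cost: float, activity: dict[str, float]) -> dict[str, float]:
    """Split a session's cost by per-user activity (query seconds or command count)."""
    total = sum(activity.values())
    return {user: cost * amt / total for user, amt in activity.items()}

print(allocate_session(90.0, {"alice": 600, "bob": 300}))
# {'alice': 60.0, 'bob': 30.0}
```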
Working with attribution rules
Section titled “Working with attribution rules”Priority guidelines
| Priority range | Recommended use |
|---|---|
| 1–50 | Critical exact matches for known high-cost resources |
| 50–100 | Specific pattern rules |
| 100–200 | General pattern rules |
| 200+ | Catch-all and fallback rules |
When multiple rules share the same priority, workspace-specific rules take precedence over global rules.
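That tie-break can be expressed as a compound sort key. A small sketch with hypothetical rule dictionaries:

```python
def rule_order(rule: dict):
    """Lower priority number first; on ties, workspace-specific rules
    (workspace_id set) sort ahead of global ones (workspace_id unset)."""
    return (rule["priority"], rule.get("workspace_id") is None)

rules = [
    {"name": "global-tags", "priority": 100, "workspace_id": None},
    {"name": "ws-prod",     "priority": 100, "workspace_id": "ws-123"},
]
print([r["name"] for r in sorted(rules, key=rule_order)])
# ['ws-prod', 'global-tags']
```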
Testing rules before saving
Use the Simulate tab on the Attribution page to test a rule against historical data before activating it. The simulation shows how many resources would match, the record count, and the total cost affected over the period you specify.
Date bounds
Rules can have optional start and end dates. Use these for:
- Temporary overrides — “Attribute all ML training to R&D during Q4”
- Migrations — Old rule valid until Dec 31, new rule starts Jan 1
- Retroactive corrections — Set the start date in the past to fix historical attribution
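A sketch of how an optional date window would gate a rule, using the migration scenario above (field names and dates are illustrative):

```python
from datetime import date

def rule_active(rule: dict, usage_date: date) -> bool:
    """A rule applies only within its optional [start_date, end_date] window;
    an unset bound is open-ended."""
    start, end = rule.get("start_date"), rule.get("end_date")
    return (start is None or usage_date >= start) and \
           (end is None or usage_date <= end)

# Migration: old rule valid until Dec 31, new rule starts Jan 1
old = {"end_date": date(2024, 12, 31)}
new = {"start_date": date(2025, 1, 1)}
assert rule_active(old, date(2024, 12, 31)) and not rule_active(new, date(2024, 12, 31))
assert rule_active(new, date(2025, 1, 1)) and not rule_active(old, date(2025, 1, 1))
```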
Tag-based mapping
The Tags tab on the Attribution page provides a shortcut for the most common pattern rule scenario: mapping a Databricks tag key+value to a team. Select a tag key, see all values and their costs, then assign a team from a dropdown. LakeSentry creates a global pattern rule for you.
Improving attribution coverage
If a significant portion of your costs is unattributed, here are ways to improve coverage:
- Check the Unallocated Costs page — See which resources and cost categories aren’t matched by any rule.
- Create exact rules for your top unattributed resources — a few rules can cover a large portion of spend.
- Use tag-based mapping — If your Databricks resources have consistent tags (cost_center, team, env), map those tags to teams.
- Set up identity mappings — Map Databricks principals (user emails, service principals) to teams in the Organizational Hierarchy.
- Add pattern rules for naming conventions — If your clusters follow patterns like prod-analytics-*, create pattern rules to match them.
Next steps
- Attribution Rules — Creating and managing rules in the UI
- Tag Governance — Tag compliance and categorization policies
- Organizational Hierarchy & Budgets — Setting up the team structure that attribution targets