MLflow

The MLflow page tracks your machine learning experiment activity in Databricks. It aggregates MLflow experiments and runs with their metrics, giving ML teams visibility into experiment activity, success rates, run durations, and user engagement.

The Experiments tab shows MLflow experiments with their aggregated run statistics:

  • Experiment: Experiment name and ID
  • Runs: Total number of runs, with a badge showing how many are currently running
  • Success: Number of successful runs, with a badge showing failed runs if any
  • Users: Number of distinct users who ran experiments
  • Avg Duration: Average run duration across all runs
  • Last Run: When the most recent run completed

The experiment list can be sorted in several ways:

  • Runs (descending) — Find the most active experiments
  • Experiment (ascending) — Browse experiments alphabetically
  • Success (descending) — Find the most successful experiments
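
These per-experiment statistics can also be recomputed outside the UI. The sketch below uses the MLflow client directly; it assumes the tracking URI is already configured (for Databricks, typically mlflow.set_tracking_uri("databricks")) and that 1,000 runs per experiment is enough, so treat it as an illustration rather than LakeSentry's actual aggregation logic.

```python
from mlflow.tracking import MlflowClient

# Minimal sketch: recompute the Experiments tab statistics from the MLflow API.
# Assumes the tracking URI is already configured; adjust max_results or paginate
# if your experiments have more than 1,000 runs.
client = MlflowClient()

for experiment in client.search_experiments():
    runs = client.search_runs([experiment.experiment_id], max_results=1000)
    running = [r for r in runs if r.info.status == "RUNNING"]
    finished = [r for r in runs if r.info.status == "FINISHED"]
    failed = [r for r in runs if r.info.status == "FAILED"]
    users = {r.info.user_id for r in runs}
    durations_s = [
        (r.info.end_time - r.info.start_time) / 1000.0  # MLflow timestamps are in ms
        for r in runs
        if r.info.start_time and r.info.end_time
    ]
    avg_duration_s = sum(durations_s) / len(durations_s) if durations_s else 0.0
    print(
        f"{experiment.name} ({experiment.experiment_id}): "
        f"runs={len(runs)} running={len(running)} success={len(finished)} "
        f"failed={len(failed)} users={len(users)} avg_duration={avg_duration_s:.1f}s"
    )
```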

The page header shows aggregate statistics across all experiments:

  • Experiments: Total number of MLflow experiments
  • Total Runs: Total number of runs across all experiments
  • Successful: Number of runs that completed successfully
  • Failed: Number of runs that failed

A bar chart shows the top 10 experiments by run count. Use it to quickly identify which experiments have the most activity.
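
The header totals and the bar-chart data can be approximated in the same way; this rough sketch assumes the pandas output of mlflow.search_runs and a reachable tracking server.

```python
import mlflow

# Rough sketch of the header totals and the top-10 bar chart data.
# Assumes a configured tracking URI and mlflow's default pandas output.
print("Experiments:", len(mlflow.search_experiments()))

runs = mlflow.search_runs(search_all_experiments=True)
print("Total runs:", len(runs))
print("Successful:", int((runs["status"] == "FINISHED").sum()))
print("Failed:", int((runs["status"] == "FAILED").sum()))

# Top 10 experiments by run count (the same data as the bar chart).
print(runs.groupby("experiment_id").size().nlargest(10))
```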

The Runs tab shows individual MLflow runs across all experiments:

  • Run: Run name and ID
  • Status: FINISHED, FAILED, or RUNNING
  • User: Who initiated the run
  • Duration: How long the run took
  • Metrics: Number of distinct metrics logged in the run
  • Started: When the run started
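
A minimal sketch of pulling the same per-run fields with MlflowClient is shown below; the experiment ID is a placeholder, and run_name requires a reasonably recent MLflow version.

```python
from mlflow.tracking import MlflowClient

# Sketch: list the fields shown in the Runs tab for one experiment.
# "123" is a placeholder experiment ID; replace it with a real one.
client = MlflowClient()

for run in client.search_runs(["123"], order_by=["attributes.start_time DESC"]):
    duration_s = (
        (run.info.end_time - run.info.start_time) / 1000.0
        if run.info.start_time and run.info.end_time
        else None  # still running, or never finished
    )
    print(
        run.info.run_name,
        run.info.status,        # FINISHED, FAILED, or RUNNING
        run.info.user_id,
        duration_s,
        len(run.data.metrics),  # number of distinct metrics logged
        run.info.start_time,    # epoch milliseconds
    )
```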

The Daily Activity tab shows per-day aggregated MLflow activity for a selected time range:

  • Date: The date of recorded activity
  • Experiment: Experiment name or ID
  • Runs: Number of completed runs, with failed count if any
  • Users: Number of distinct users active that day
  • Avg Duration: Average run duration for that day

The following filter is available:

  • Time range: Analysis period (applies to the Daily Activity tab)
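
The per-day rollup can be approximated with a pandas groupby over search_runs output. This is a sketch under the assumption that runs carry the mlflow.user tag (drop that column otherwise), not a description of LakeSentry's implementation.

```python
import mlflow

# Sketch of a per-day, per-experiment aggregation similar to the Daily Activity tab.
# Assumes the pandas output of search_runs; "tags.mlflow.user" exists only when
# runs carry that tag.
runs = mlflow.search_runs(search_all_experiments=True)
runs["date"] = runs["start_time"].dt.date
runs["duration_s"] = (runs["end_time"] - runs["start_time"]).dt.total_seconds()

daily = runs.groupby(["date", "experiment_id"]).agg(
    run_count=("run_id", "count"),
    failed=("status", lambda s: int((s == "FAILED").sum())),
    users=("tags.mlflow.user", "nunique"),
    avg_duration_s=("duration_s", "mean"),
)
print(daily.sort_index())
```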

LakeSentry maps MLflow data into the work unit model. Each MLflow experiment becomes a work unit of type mlflow_experiment, and each MLflow run becomes a work unit run. This enables MLflow experiments and runs to participate in the same attribution and tracking framework as jobs and pipelines.
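
One way to picture that mapping is sketched below. The WorkUnit and WorkUnitRun classes and their fields are hypothetical stand-ins for illustration, not LakeSentry's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional

from mlflow.tracking import MlflowClient

# Illustrative mapping only: WorkUnit / WorkUnitRun and their fields are
# hypothetical, not LakeSentry's real data model.
@dataclass
class WorkUnit:
    unit_type: str  # always "mlflow_experiment" here
    unit_id: str
    name: str

@dataclass
class WorkUnitRun:
    unit_id: str
    run_id: str
    status: str
    start_time_ms: int
    end_time_ms: Optional[int]

client = MlflowClient()
work_units: List[WorkUnit] = []
work_unit_runs: List[WorkUnitRun] = []

for experiment in client.search_experiments():
    work_units.append(
        WorkUnit("mlflow_experiment", experiment.experiment_id, experiment.name)
    )
    for run in client.search_runs([experiment.experiment_id]):
        work_unit_runs.append(
            WorkUnitRun(
                unit_id=experiment.experiment_id,
                run_id=run.info.run_id,
                status=run.info.status,
                start_time_ms=run.info.start_time,
                end_time_ms=run.info.end_time,
            )
        )
```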

To find the most active experiments and spot problem areas:

  1. Sort the experiment list by Runs (descending).
  2. Look at the top experiments and their success/failure ratios.
  3. Experiments with high failure rates may benefit from investigation.
  4. Check average duration to spot experiments with unusually long runs.

To track experiment activity over time:

  1. Switch to the Daily Activity tab and set the time range to the last 30 or 90 days.
  2. Look for experiments with increasing daily run counts.
  3. Use the Users column to understand whether activity comes from one user or a team.
  4. Cross-reference with Compute to see whether the underlying clusters are efficiently utilized.

To investigate failed runs:

  1. Check the page header stats for the overall Failed count.
  2. Sort the experiment list by Success to see failure ratios.
  3. Switch to the Runs tab to find specific failed runs and their users (a programmatic equivalent is sketched below).
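
The failed runs can also be pulled directly from the MLflow API. This sketch assumes the pandas output of mlflow.search_runs and that runs carry the mlflow.user tag; adjust the selected columns if yours do not.

```python
import mlflow

# Sketch: list failed runs across all experiments, most recent first.
failed = mlflow.search_runs(
    search_all_experiments=True,
    filter_string="attributes.status = 'FAILED'",
    order_by=["attributes.start_time DESC"],
)
print(failed[["experiment_id", "run_id", "tags.mlflow.user", "start_time"]])
```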