The real challenges in machine learning go beyond just building an ML model. In contrast to conventional software systems, the performance of ML systems can degrade quickly over time as real-world data shifts, requiring close monitoring and frequent retraining. MLOps (short for Machine Learning Operations) bridges the gap between data science and IT operations, ensuring that models are not only built but also deployed, monitored, and maintained at scale. In a landscape where experimentation is easy but reliable production is hard, MLOps is the foundation that turns prototypes into business value.
Practicing MLOps means that you advocate for collaboration, automation, and continuous improvement at all steps of building and operating an ML system. The key to successful MLOps is people, processes, and tooling. For more information on how to implement a holistic MLOps approach, you can refer to this whitepaper.
We’ll dig into best practices and how Weights & Biases can help with your MLOps practices, but first, let’s ground ourselves and start by looking at the differences between DevOps and MLOps.
MLOps is to ML systems what DevOps is to traditional software systems. Both are based on the same concepts of collaboration, automation, and continuous improvement. Yet the differences between common software and machine learning systems lead to different considerations. Mainly, DevOps and MLOps differ in team composition, development process, and infrastructure management.
DevOps is a set of practices, paradigms, and developer tools for developing and operating large-scale software systems. It aims to enable a flexible and efficient software development and delivery process, which helps developers reliably build and operate software systems at scale, ultimately leading to better business outcomes.
This is done by setting systems in place to improve collaboration among different teams, using automation to accelerate processes, and applying continuous improvement concepts.
Two core concepts for continuous improvement in DevOps you need to know are:
- Continuous Integration (CI): frequently merging code changes into a shared repository, with automated builds and tests validating every change.
- Continuous Delivery/Deployment (CD): automatically releasing validated changes to staging or production environments in a safe, repeatable way.
In traditional software development, CI/CD is used to automate testing, building, and deploying software; the same practices can be adapted to ML projects.
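As a concrete illustration, a CI job for an ML project can gate merges on model quality the same way it gates them on unit tests. The sketch below is a minimal, hypothetical example: the `train_model` helper and the 0.8 accuracy threshold are assumptions for illustration, not a prescribed setup.

```python
# test_model.py -- a minimal, hypothetical CI check for an ML model.
# Run with `pytest`; the helper and threshold are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_model(X, y):
    """Stand-in for your real training entry point."""
    return LogisticRegression(max_iter=1000).fit(X, y)


def test_model_meets_accuracy_threshold():
    X, y = make_classification(n_samples=1000, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = train_model(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    # Fail the CI build if the candidate model regresses below the bar.
    assert acc >= 0.8
```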
While a machine learning system is a software system, and similar practices apply in MLOps, there are significant differences between ML systems and software systems. As shown in the figure below, in traditional software systems, developers define rules and patterns that determine how the system processes and responds to specific inputs. In contrast, ML systems learn the patterns and relationships between inputs and outputs through training.
One difference between DevOps and MLOps lies in their team composition. While some roles are present in both, some are unique to MLOps teams.
Typical roles in both DevOps and MLOps teams are:
MLOps teams include a few additional roles to accommodate the unique nature of ML projects:
For an in-depth explanation of the roles in an ML project, you can learn more in Lecture 13 of The Full Stack’s “Deep Learning Course.”
This section covers some of the considerations and challenges unique to MLOps by discussing which aspects of developing an ML system are different from developing software systems and their implications:
The section “The MLOps Pipeline” below discusses in detail how an MLOps pipeline addresses these unique challenges.
DevOps and MLOps also differ in infrastructure management. The main difference lies in data storage, specifically in artifact version control. Version control in both DevOps and MLOps enables traceability, reproducibility, rollbacks, debugging, and collaboration.
The core artifacts in a software system are source code, libraries, object files, and executables, which are versioned in a code versioning system and artifact storage. However, in an ML system, you have additional artifacts of datasets and models, which require separate versioning systems (see Intro to MLOps: Data and Model Versioning).

Recently, both the European Union and the US government have started looking into meaningfully regulating machine learning, and it is a safe bet that these laws are the beginning of increased scrutiny, not the end. The regulations themselves vary in meaningful ways, but there is some overlap:
Interestingly, both laws also encourage innovation, both within government and industry. They’re an attempt to regulate possible downstream ill effects but not to stymie research or blunt the cutting edge of ML.
An experiment tracking solution that logs data and model lineage is not only table stakes for serious machine learning pipelines—it’s vital for compliance.
Compliance regimes—including the regulations above—require reporting on which data informs which models, how models were trained, and, to the extent possible, model interpretability. Spreadsheets and half measures will not satisfy those requirements.
Instead, you’re advised to track as much of the model training pipeline as possible. While this has additional benefits (think debugging, stopping poor performers early, maximizing GPU usage, etc.), being able to trace your model lineage, performance, and outputs is frequently what regulators will ask for. Weights & Biases is made to do precisely this.
After the business use case is defined and its success criteria are established, the MLOps pipeline involves the following stages, as illustrated in the image below:
This section briefly discusses each of these stages. They can be completed manually or by an automatic pipeline, as discussed in the section “The Three Levels of MLOps.”
In contrast to traditional software systems, ML models learn patterns from data. Thus, data is at the core of any machine learning project, and the MLOps pipeline starts here. The data engineering team usually conducts this stage. Before the data can be used for model training in the next stage, it needs to go through a few essential steps:
The first step is data collection. First, you must determine what data you need and identify whether relevant data exists and where it lives. Then you can select and collect the relevant data for your ML task from various data sources and integrate it into one cohesive format.
Next, you conduct an exploratory data analysis (EDA) to verify data quality and build an understanding of the data.
To enable transparency, this early-stage analysis step must be documented and reproducible. For this, you can use W&B Tables or W&B Reports.
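For instance, an EDA summary can be logged as a W&B Table so the analysis stays documented and reproducible alongside the run. In this minimal sketch, the project name, column names, and values are illustrative placeholders.

```python
# Log an EDA snapshot as a W&B Table (project, columns, and values are illustrative).
import wandb

run = wandb.init(project="mlops-demo", job_type="eda")

table = wandb.Table(columns=["feature", "missing_pct", "mean", "std"])
table.add_data("age", 0.02, 41.3, 12.7)
table.add_data("income", 0.11, 58_300.0, 21_450.0)

run.log({"eda_summary": table})
run.finish()
```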

Finally, in the data preparation step, the data is processed for the next stage of the MLOps pipeline. The data preparation can include:
The output of this step is the data splits in the prepared format. As this step modifies the source data, it is recommended to apply data version control, the practice of storing, tracking, and managing changes in a dataset, using a tool such as W&B Artifacts. This ensures the data is centralized, versioned, and easily accessible to different teams. Data versioning also serves as a reliable record of your data that shows data sources, data usage information, and data lineage, which enables better collaboration.
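As a minimal sketch of data versioning with W&B Artifacts, the snippet below logs the prepared splits as a dataset artifact; the file names and project name are illustrative.

```python
# Version the prepared data splits with a W&B Artifact
# (file names and project are illustrative).
import wandb

run = wandb.init(project="mlops-demo", job_type="data-preparation")

dataset = wandb.Artifact("prepared-dataset", type="dataset")
dataset.add_file("train.csv")
dataset.add_file("val.csv")
dataset.add_file("test.csv")

# Each log creates a new version (v0, v1, ...) with full lineage.
run.log_artifact(dataset)
run.finish()
```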

Once the data is ready, you can start developing and training your machine learning model. This stage of the MLOps pipeline requires the most ML knowledge and is usually done by the data science team. Model development is an iterative process: candidate models go through multiple rounds of training, evaluation, and validation, in which their feasibility and estimated business impact are assessed, before one is promoted to production.
As ML models learn patterns from data through a training process, model training is an integral part of model development. Usually, it is beneficial to build a simple baseline model and then iteratively improve it, e.g., by tweaking the model architecture or applying hyperparameter tuning (see Intro to MLOps: Hyperparameter Tuning). To confirm functionality and accuracy, you can use techniques such as cross-validation.
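For hyperparameter tuning, a W&B Sweep can automate the search. The sketch below assumes a random search over an illustrative space; the `train()` body is a stub standing in for your real training code.

```python
# A minimal W&B Sweep for hyperparameter tuning
# (the search space and train() body are illustrative).
import wandb

sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-1},
        "batch_size": {"values": [32, 64, 128]},  # used by real training code
    },
}

def train():
    run = wandb.init()
    lr = run.config.learning_rate
    # ... train your model here ...
    run.log({"val_loss": 1.0 / lr})  # stand-in for a real validation loss
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="mlops-demo")
wandb.agent(sweep_id, function=train, count=10)
```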
As experimentation is an iterative process spanning multiple stages, it is important to establish rapid iteration and collaboration with stakeholders and other teams. To collaborate, replicate an experiment, and reproduce the same results effortlessly, the training process should track datasets, metrics, hyperparameters, and algorithms through experiment tracking (see Intro to MLOps: Machine Learning Experiment Tracking). W&B Experiments can facilitate these tasks.
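A minimal experiment-tracking sketch with W&B looks like the following; the hyperparameters and the placeholder loss are illustrative.

```python
# Track a training run with W&B Experiments
# (hyperparameters and metric values are illustrative).
import wandb

run = wandb.init(
    project="mlops-demo",
    config={"learning_rate": 0.01, "epochs": 5, "architecture": "resnet18"},
)

for epoch in range(run.config.epochs):
    # ... training step here ...
    run.log({"epoch": epoch, "train_loss": 1.0 / (epoch + 1)})  # placeholder metric

run.finish()
```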
Next, the model goes through the model evaluation step. The purpose of model evaluation is to ensure model quality. The model quality is evaluated on a holdout test set, an additional test set that is disjoint from the training and validation sets. The output of this step is a set of logged metrics to assess the quality of the model.
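As a sketch, the holdout metrics can be written to the run summary so they are recorded with the rest of the experiment; the predictions below are placeholders for your model's outputs on the test set.

```python
# Evaluate on a held-out test set and log the metrics
# (labels and predictions are placeholders for your real pipeline).
import wandb
from sklearn.metrics import accuracy_score, f1_score

run = wandb.init(project="mlops-demo", job_type="evaluation")

# y_test / y_pred would come from your holdout set and trained model.
y_test = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

run.summary["test_accuracy"] = accuracy_score(y_test, y_pred)
run.summary["test_f1"] = f1_score(y_test, y_pred)
run.finish()
```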
The final step of the model development stage is the model validation step. The purpose of the model validation is to determine whether the model improves the current business metrics and should proceed to the deployment stage. While this step can be done manually, it is recommended to use CI/CD.
In addition to running unit and integration tests for each release, conducting A/B tests to verify that your new model is better than the current model can be useful. To enable A/B testing and observability of your deployment workflows, it is necessary to version and track your model deployment candidates, which is an artifact of the MLOps pipeline.
This model artifact can be logged to a model registry, such as the W&B registry, to record a link between the model, the input data it was trained on, and the code used to generate it. In fact, the registry allows you to track both model and data lineage across your project. You can learn more about it here.
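A minimal sketch of this step is shown below. The artifact name, model file, and registry path are illustrative, and the exact target path for `link_artifact` depends on how your W&B registry is configured.

```python
# Log a trained model and link it to the W&B registry
# (file name and registry path are illustrative).
import wandb

run = wandb.init(project="mlops-demo", job_type="training")

model_artifact = wandb.Artifact("candidate-model", type="model")
model_artifact.add_file("model.pt")
run.log_artifact(model_artifact)

# Linking records the model as a deployment candidate with full lineage.
run.link_artifact(model_artifact, "model-registry/fraud-detector")
run.finish()
```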
After a model is developed and its performance is confirmed for production, the next step in the MLOps pipeline is deploying the trained model to the production environment to serve predictions and realize business value.
This is when the MLOps pipeline moves to the “Ops” stage. This stage is usually conducted by the MLOps team. To enable a smooth and streamlined transition between the ML and Ops teams, the model artifact is usually stored in a model registry, such as the W&B model registry. A model registry is a central repository for models and is commonly used for model version control, which is the practice of storing, tracking, and managing the changes to an ML model.
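On the deployment side, the registered model can be pulled down by its artifact name and alias. In this sketch, the artifact path and alias are illustrative.

```python
# Retrieve a registered model version for deployment
# (the artifact path and alias are illustrative).
import wandb

run = wandb.init(project="mlops-demo", job_type="deployment")

artifact = run.use_artifact("candidate-model:latest", type="model")
model_dir = artifact.download()  # local path containing the model files
print(f"Model files downloaded to {model_dir}")
run.finish()
```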

Deploying a model to production can be done using common deployment strategies (e.g., Blue/Green Deployment, Shadow Deployment, Canary Deployment). Furthermore, deployment can be done manually or automated with a CI/CD pipeline.
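As an illustration of one such strategy, the sketch below routes a small fraction of traffic to a new model in a canary deployment; both predict functions are hypothetical stubs.

```python
# A minimal canary-deployment sketch: send a small fraction of
# traffic to the new model (predict functions are illustrative stubs).
import random

CANARY_FRACTION = 0.05  # 5% of requests go to the new model

def predict_current(features):
    return "prediction-from-current-model"  # stub

def predict_candidate(features):
    return "prediction-from-candidate-model"  # stub

def route(features):
    if random.random() < CANARY_FRACTION:
        return predict_candidate(features)  # canary traffic
    return predict_current(features)
```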
Once the machine learning model has been deployed to production, it needs to be monitored to ensure that the ML model performs as expected and to detect performance decay early on.
While you could monitor the model's performance solely through the business metrics associated with your ML project, it is also helpful to monitor the inputs to the model. By monitoring the input data distribution, you can detect and react to data drift more quickly.
Ideally, you should have automatic and systematic monitoring coupled with an alerting system when monitoring models in production. This can help you automatically trigger model retraining when necessary.
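One way to sketch this is a scheduled job that compares the production input distribution against a training-time reference and fires a W&B alert when they diverge; the feature samples and the 0.05 threshold below are illustrative assumptions.

```python
# Monitor input drift with a two-sample KS test and raise a W&B alert
# (feature samples and threshold are illustrative).
import wandb
from scipy.stats import ks_2samp

run = wandb.init(project="mlops-demo", job_type="monitoring")

reference = [0.1, 0.2, 0.15, 0.3, 0.25]   # training-time feature sample
production = [0.8, 0.9, 0.85, 0.7, 0.95]  # recent production feature sample

statistic, p_value = ks_2samp(reference, production)
run.log({"ks_statistic": statistic, "ks_p_value": p_value})

if p_value < 0.05:  # distributions likely differ: possible data drift
    wandb.alert(
        title="Data drift detected",
        text=f"KS p-value {p_value:.4f} below threshold; consider retraining.",
    )
run.finish()
```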
The level of automation of these steps defines the maturity of the ML process, which reflects the velocity of training new models given new data or new implementations. The following sections describe the three levels of MLOps according to Google, starting from a completely manual level (Level 0), through a partly automated level (Level 1), up to automating both the ML and CI/CD pipelines (Level 2).
Note that the following describes a simplified version of the three levels of MLOps. For more details, please refer to the original source.
Usually, organizations just starting with ML projects don’t have a lot of automation and follow a manual workflow. However, as your organization becomes more experienced with ML projects, you may want to gradually implement a more automated MLOps pipeline to drive business value.
In level 0 MLOps, every step and every transition in the MLOps pipeline – from data engineering to model deployment – is manual. There is no CI, nor is there any CD. The absence of automation usually results in a lack of active production monitoring.
The following diagram shows the workflow of a level 0 MLOps pipeline:

Level 0 MLOps is common in businesses just starting to use machine learning. This manual process might be sufficient when you only have a few models that rarely change (e.g., only a couple of times per year). However, in practice, ML models often break when they are deployed in the real world and need to be constantly monitored and updated frequently. Thus, active production monitoring and automating the ML pipeline can help address these challenges as described in the following level of MLOps.
In level 1 MLOps, active performance monitoring is introduced, and the MLOps pipeline is automated to continuously retrain the model on specific triggers and to achieve CD of the model. For this, you need to introduce automated data and model validation steps as well as pipeline triggers.
The following diagram shows the workflow of a level 1 MLOps pipeline:

Level 1 MLOps differs from level 0 in the following aspects:
The triggers that automatically execute the pipeline to retrain the ML model can be one or more of the following:
Level 1 MLOps is common in businesses that retrain and deploy new models on fresh data regularly, but not frequently, and manage only a few pipelines. However, if the ML model is at the core of your business, your ML team needs to constantly work on new ML ideas and improvements. To enable your ML team to rapidly build, test, and deploy new implementations of the ML components, you need the robust CI/CD setup described in the following level of MLOps.
In level 2 MLOps, the entire MLOps pipeline is an automated CI/CD system, enabling data scientists to improve ML models by rapidly iterating on new ideas rather than retraining only on specific triggers. In addition to the CD of the ML model achieved in level 1, CI/CD is introduced for the pipeline itself.
The following diagram shows the workflow of a level 2 MLOps pipeline.

The main difference between level 2 MLOps and level 1 MLOps is the introduction of CI/CD to the MLOps pipeline, so the ML and Ops aspects are no longer disconnected. To accomplish this, the development of the ML pipeline must first be automated in parts (e.g., the data and model analysis steps usually remain manual). Then you can implement a CI/CD pipeline that is executed whenever new code for the ML pipeline is pushed to the source code repository.
Level 2 MLOps is common in businesses with ML at their business’s core. This system allows data scientists and engineers to operate in a collaborative setting, rapidly explore new ideas, and automatically build, test, and deploy new pipeline implementations. A level 2 MLOps strategy allows frequent (daily, hourly) retraining of the model with fresh data and deploying updates to thousands of servers simultaneously.
There is no one-size-fits-all MLOps strategy. To develop the best MLOps strategy for your business case, you will need to establish the following:
Having a clear picture of these points will help you develop a holistic approach to MLOps that drives business value, reduces risks, and increases the success rate for ML projects. As a first step to developing your MLOps strategy, you can start by conducting a maturity assessment based on 40 questions.
Whether your organization is just starting to experiment with machine learning projects or already building and operating ML applications at scale, having a strong MLOps strategy is a critical factor for success.
By applying the DevOps principles of collaboration, automation, and continuous improvement to your ML projects, you can achieve faster time-to-market for your ML applications, ultimately driving more business value at optimized costs and resource utilization. With a well-defined MLOps strategy, companies can bridge the gap between development and deployment, setting the stage for innovation and sustained success in the ever-evolving landscape of AI technologies.
As machine learning becomes a core component of your business, embracing MLOps becomes more than a strategic choice. It’s a necessity for organizations aiming to harness the full potential of their ML initiatives.