Lack of Foresight in the ML Development Process


Enable solutions to silent problems with proactive organizational planning

Photo by Scott Graham on Unsplash

You’ve collected and prepared your data, engineered great features, trained, evaluated, and deployed your model. You and your organization are happy: it’s showing great results in practice and is successfully advancing your organization’s business objectives. You’re finally done!

Well… not exactly.

What could go wrong?

All is well for a couple of weeks, maybe months, but eventually, someone realizes that your model is not performing quite as well as you thought. This phenomenon is called model drift. Model drift is largely caused by data drift and concept drift.

  • Data drift is when the distribution of the model’s predictors (independent variables) changes. For example, in an email spam prediction model, suppose we use the rate of outbound emails as a feature. If, after we train our model, the email service implements a cap on the rate of outbound emails, the distribution of that independent variable has fundamentally changed.
  • Concept drift is when the model’s predicted target (dependent variable) changes. Continuing the previous example, this could be caused by a shift in how users interpret “spam”. A little-known publication could become more popular and reliable over time, so emails from that domain should no longer be classified as spam, even though that classification may once have been correct.

Both types of drift lead to a degradation in model performance over time. If not monitored and corrected, deployed models quickly become inaccurate and unreliable.
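As a concrete illustration, data drift in a single numeric feature can be flagged by comparing its training-time distribution against recent production values, for instance with a two-sample Kolmogorov–Smirnov statistic. The following is a minimal, stdlib-only Python sketch; the function name and any alert threshold you pair it with are illustrative assumptions, not a specific tool’s API:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest
    vertical gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    gap = 0.0
    for x in points:
        # Fraction of each sample at or below x (empirical CDF value).
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

In practice you would reach for a tested implementation such as `scipy.stats.ks_2samp` and calibrate the alert threshold to your own data rather than hard-coding one.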

Typical Solution

This is good information to know, but surely this is a solved problem. It’s an easy fix: our monitoring tools will detect the model drift, and then we just collect more data, retrain, and redeploy… right?

Problems

Under the same conditions as the initial development phase, this is a valid assumption. But over time, especially multiple months, the following questions can arise:

  • Who worked on the data collection, feature engineering, model creation, evaluation, and deployment? Are they even at the organization anymore?
  • Where does the data live? How do we know what version of the data the model was trained on?
  • Where do the models live? Is “model_1_best_weights” or “updated_model_1_v2” the model deployed in production?
  • Where’s the code for data processing and model development? Does the code even exist anymore? Why does reading the code make me want to cry?

These questions may seem drastic, and in fact they should be. But the inspiration for this article was the answers: they left months ago, the data is lost, the model has vanished, and the code is unreadable. Good luck presenting this to your client.

Why do these problems arise?

I’ve been lucky to work in many organizations and have seen various stages of the ML development process. I’ve seen some very problematic situations and some decent ones, but I have never seen this process done extremely well. Why is this?

It would be easy to blame the engineers, data scientists, and development team. But in reality, in most situations, these problems are much more ingrained into the organization and culture.

The problem of difficult-to-correct model degradation arises from an organizational lack of foresight. Fundamental long-term problems proliferate in short-sighted organizations.

What are the arguments against proactive action?

I’ve noticed the following arguments in favor of the development practices that tend to create these problems, especially prevalent in smaller, newer start-ups.

This problem is a non-issue. The process of developing a functional, deployable model is much less important than the model itself.

For one-off systems or analyses, I’d agree with this point. Ad-hoc systems don’t need to be perfect; they just need to work long enough to reach a conclusion. However, many incorrectly view the ML development process as ad-hoc, which leads to this viewpoint. On the contrary, the process should be quite comparable to the fundamental practices of traditional software engineering.

We need to iterate quickly to push a product out the door.

While this may be true, sub-par development practices can actually increase the time to ship. “With bad code quality, it is easy for errors and questionable edge cases to go unnoticed. This leads later down the road to time-consuming bug fixes and, at worst, production failures. High-quality code allows you to fail early and fail fast.” [1] Counterintuitively, slowing down in the process will allow the organization to speed up in results.

This is simply a proof-of-concept, there’s no need to consider maintainability.

The approach of many “fast-paced” organizations is to start with a high-speed, low-quality proof-of-concept. This produces quick but short-sighted results that do not transfer well to an MVP (minimum viable product). This approach can be efficient for organizations still unsure about their data needs, but in organizations that aim to be data-driven, we already know that these projects are a necessary part of the core business.

A combination of these arguments will often lead to our described problem.

What can we do about it?

Hopefully, by this point, you realize that this is a significant problem that can occur silently in organizations. I’ll propose a set of guidelines on the preemptive actions an organization can take to prevent this problem before it even occurs.

1. Monitoring

The bare-minimum step is to simply monitor the performance of the models. While this doesn’t enable us to fix the problem, it does allow its initial detection. If we don’t know a problem exists, how will we know to correct it?

The goal of monitoring is “to make sure that the model generates reasonable performance metrics when applied to the confidence test set.” [2] Additionally, the confidence set should regularly be updated to account for the distribution shifts described above.
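As a sketch of this bare-minimum step, the check below scores a deployed model against a confidence set and flags degradation. The `predict` callable, the `(features, label)` structure of the confidence set, and the 90% accuracy threshold are all illustrative assumptions, not prescriptions:

```python
def check_model_health(predict, confidence_set, threshold=0.90):
    """Score a deployed model on a held-out confidence set.

    Returns (accuracy, healthy); `healthy` is False when accuracy
    falls below the threshold.
    """
    correct = sum(1 for features, label in confidence_set
                  if predict(features) == label)
    accuracy = correct / len(confidence_set)
    healthy = accuracy >= threshold
    if not healthy:
        # Wire this into your real alerting (Slack, PagerDuty, email, ...).
        print(f"ALERT: accuracy {accuracy:.1%} fell below {threshold:.1%}")
    return accuracy, healthy
```

Scheduling this check periodically, and refreshing the confidence set on its own cadence, gives you the detection capability the rest of this article assumes.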

2. Importance of the Iterative Process

An organization should stress the iterative nature of the ML development process and give teams ample time to account for it. The maintenance cycle should not be underestimated.

“Most production models must be regularly updated. The rate depends on several factors:
• how often it makes errors and how critical they are,
• how “fresh” the model should be, so as to be useful,
• how fast new training data becomes available,
• how much time it takes to retrain a model,
• how costly it is to deploy the model, and
• how much a model update contributes to the product and the achievement of user goals.” [2]
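The factors above can be made concrete as a naive retraining trigger. The sketch below is purely illustrative: the function name and threshold defaults are assumptions, and real retraining policies weigh these factors with far more nuance:

```python
from datetime import datetime, timedelta

def should_retrain(error_rate, last_trained, new_samples,
                   max_error=0.05, max_age=timedelta(days=30),
                   min_new_samples=10_000):
    """Naive retraining trigger combining error rate, model age,
    and new-data availability. Defaults are placeholders."""
    too_inaccurate = error_rate > max_error
    too_stale = datetime.now() - last_trained > max_age
    enough_new_data = new_samples >= min_new_samples
    return too_inaccurate or too_stale or enough_new_data
```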

3. Data Versioning

Many data versioning tools market themselves as “git for data”. The primary purpose of any data versioning tool is to sync different versions of code and data (training data, testing data, models, etc.). When a model needs to be updated, we can obtain a perfect copy of the state of development at the last update. After the model update, if our monitoring tool indicates a decrease in performance, we can quickly and easily revert to a previous deployment. I’m a proponent of DVC, but plenty of alternative solutions exist.
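Tools like DVC handle this properly; purely to illustrate the underlying idea, the sketch below derives a lightweight dataset version id by content-hashing a data file, which lets you record exactly which data a model was trained on. This is a hypothetical helper, not a DVC API, and not a substitute for a real versioning tool:

```python
import hashlib

def dataset_fingerprint(path, chunk_size=1 << 20):
    """Content hash of a data file, usable as a lightweight
    dataset version id to store alongside a trained model."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large datasets don't need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]
```

Two byte-identical files always map to the same id, so logging the fingerprint with each training run answers “what version of the data was this model trained on?” even without a dedicated tool.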

4. Experiment Tracking

Experiment tracking tools allow for the tracking and visualization of all experiment-related data (hyperparameters, model configurations, results, etc.) across multiple runs. Tools like Weights & Biases, MLflow, and Neptune, among many others, are all great options. This will allow for a separation between different model versions.
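To illustrate what these tools do at their core (without standing in for any of them), the sketch below appends each run’s hyperparameters and results to a JSON-lines log. The function name and file format are hypothetical, not the API of Weights & Biases, MLflow, or Neptune:

```python
import json
import time
import uuid

def log_run(path, params, metrics):
    """Append one experiment run (hyperparameters + results)
    as a JSON line, returning its generated run id."""
    record = {
        "run_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["run_id"]
```

Because every run gets its own id, parameters, and metrics, you can later answer exactly which configuration produced which model version, which is the separation the section above describes.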

5. Documentation

A developer’s least favorite pastime. This is well reflected in the convoluted mess of sporadic comments in Jupyter Notebooks and unfinished READMEs in many projects. Nevertheless, for the sake of future engineers’ sanity, choices about model architecture, replication steps, conclusions, and any other relevant information not covered in the previous sections should be well documented.

Conclusion

We’ve seen how a flawed development process can lead to difficult-to-correct model degradation. The root of this issue is not the absence of monitoring tools but the short-sighted organizational behaviors that allow these long-term problems to take hold. I’ve proposed a set of organizational guidelines to address the issues described above.

I hope that you can now avoid the pain that caused me to write this article.

This post is just the beginning! If you learned something, stay tuned for future articles.

Sources

[1] E. Berge, How to Write High-Quality Python as a Data Scientist (2022), Towards Data Science

[2] A. Burkov, Machine Learning Engineering (2020), Québec, Canada: True Positive Inc.


Lack of Foresight in the ML Development Process was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.


