dev-resources.site

How to deal with data changing and machine learning models doing worse after training

Published at
1/31/2022
Categories
drift
machinelearning
production
Author
jesperdramsch

Machine Learning in Production 101

I just finished some writing for the UN (ITU) about machine learning models in production.

Wondering how to deal with changing data and models that perform worse once deployed?

This is for you.

📌 We're talking about Drift

Our training data is static. Contact with the real world is non-stationary.

This drift can happen in three ways:

  • The input data changes
  • The labels for the data change
  • The relationship between input and label changes

βš—οΈ Input data changes!

One way to monitor this is to compare the distribution of the new data against the training data.

We can use these tests:

  • Continuous: Kolmogorov-Smirnov test
  • Categorical: Chi-squared test
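As a minimal sketch of both checks, the snippet below uses only the Python standard library, so it does not assume SciPy is installed. The function names `ks_drift` and `chi2_drift`, the 0.05 significance level, and the decision rules are illustrative assumptions: the KS check compares its statistic against the asymptotic critical value instead of computing a p-value, and the chi-squared check approximates its critical value with the Wilson-Hilferty formula. If SciPy is available, `scipy.stats.ks_2samp` and `scipy.stats.chisquare` compute p-values directly.

```python
import math
import random
from bisect import bisect_right
from collections import Counter
from statistics import NormalDist


def ks_statistic(train, new):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(train), sorted(new)
    n, m = len(a), len(b)
    # The gap between two step functions can only change at observed values,
    # so evaluating both ECDFs at every observed value finds the maximum.
    return max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)


def ks_drift(train, new, alpha=0.05):
    """Flag drift when D exceeds the asymptotic critical value c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(train), len(new)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return ks_statistic(train, new) > c_alpha * math.sqrt((n + m) / (n * m))


def chi2_drift(train_cats, new_cats, alpha=0.05):
    """Chi-squared goodness-of-fit: do the new category counts match the training proportions?"""
    train_counts, new_counts = Counter(train_cats), Counter(new_cats)
    n_train, n_new = len(train_cats), len(new_cats)
    categories = set(train_counts) | set(new_counts)
    stat = 0.0
    for cat in categories:
        expected = n_new * train_counts[cat] / n_train
        if expected == 0:
            return True  # a category never seen in training counts as drift on its own
        stat += (new_counts[cat] - expected) ** 2 / expected
    dof = len(categories) - 1
    if dof == 0:
        return False  # a single category cannot change its proportion
    # Wilson-Hilferty approximation of the chi-squared critical value.
    z = NormalDist().inv_cdf(1 - alpha)
    critical = dof * (1 - 2 / (9 * dof) + z * math.sqrt(2 / (9 * dof))) ** 3
    return stat > critical


# Usage on synthetic data: one shifted continuous feature, one skewed categorical feature.
random.seed(0)
train = [random.gauss(0, 1) for _ in range(1000)]
shifted = [random.gauss(0.5, 1) for _ in range(1000)]
print(ks_drift(train, shifted))
print(chi2_drift(["a"] * 500 + ["b"] * 500, ["a"] * 180 + ["b"] * 20))  # → True
```

Treating a category that never appeared in training as drift by itself is a design choice of this sketch; it also avoids dividing by a zero expected count.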

Solution: Retrain the models regularly.
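One way to make "regularly" concrete is to retrain on a fixed schedule and also whenever a drift check fires. The sketch below assumes a weekly schedule and a hypothetical `should_retrain` helper fed with per-feature drift flags (for example, the results of a KS or chi-squared check); neither detail comes from the original text.

```python
from datetime import datetime, timedelta


def should_retrain(last_trained, now, drift_flags, max_age=timedelta(days=7)):
    """Decide whether to retrain: on a fixed schedule, or earlier if any feature drifted.

    drift_flags maps a feature name to True if its drift check fired.
    Returns (retrain?, sorted list of drifted features) so the reason can be logged.
    """
    drifted = sorted(name for name, flag in drift_flags.items() if flag)
    too_old = now - last_trained >= max_age
    return too_old or bool(drifted), drifted


# A drifted feature triggers retraining before the weekly schedule would.
print(should_retrain(datetime(2022, 1, 1), datetime(2022, 1, 3), {"age": False, "country": True}))
# → (True, ['country'])
```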

🧩 Target / Label changes

These can be natural changes similar to the input changes.

In that case, you can use the same approach.

But sometimes our categories change because we make new discoveries or management decisions.

Solution: Handle these updates in automated pipelines.
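In an automated pipeline, a category change can be caught by comparing the labels in each new batch against the label set the model was trained on. Below is a minimal sketch; the helpers `check_label_schema` and `update_label_map` are hypothetical. Detecting a new category only records it: the model still has to be retrained before it can predict that class.

```python
def check_label_schema(known_labels, incoming_labels):
    """Compare the label categories in a new batch with those the model was trained on."""
    known, incoming = set(known_labels), set(incoming_labels)
    return {
        "new": sorted(incoming - known),      # categories the model cannot predict yet
        "missing": sorted(known - incoming),  # not seen in this batch: maybe retired, maybe just rare
    }


def update_label_map(label_map, new_labels):
    """Return a copy of label_map with new categories appended; existing ids stay stable."""
    updated = dict(label_map)
    next_id = max(updated.values(), default=-1) + 1
    for label in new_labels:
        if label not in updated:
            updated[label] = next_id
            next_id += 1
    return updated


label_map = {"cat": 0, "dog": 1}
report = check_label_schema(label_map, ["cat", "fox", "fox"])
print(report)  # → {'new': ['fox'], 'missing': ['dog']}
print(update_label_map(label_map, report["new"]))  # → {'cat': 0, 'dog': 1, 'fox': 2}
```

Keeping existing ids stable means stored predictions and metrics for old categories remain comparable after the update.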

🔀 Concept shift

This one sucks.

ML models learn the relationship between input and label (ideally).

When that relationship changes, our entire historical dataset becomes obsolete.

This is essentially what happened in early 2020.

Solution: Collect new data, and set up automated alerts so you notice the shift early.
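Because concept shift can leave the input distribution looking unchanged, an alert usually has to watch outcomes rather than inputs. A minimal sketch, assuming ground-truth labels eventually arrive for production predictions: a hypothetical `ErrorRateMonitor` compares the rolling error rate with the error measured at validation time. The window size and tolerance are placeholder values, not recommendations.

```python
from collections import deque


class ErrorRateMonitor:
    """Alert when the rolling production error rate rises well above the validation baseline."""

    def __init__(self, baseline_error, window=500, tolerance=0.05):
        self.baseline_error = baseline_error  # error rate measured on held-out data at training time
        self.tolerance = tolerance            # how much worse we accept before alerting
        self.errors = deque(maxlen=window)    # 1 = wrong prediction, 0 = correct, most recent last

    def update(self, prediction, label):
        """Record one labelled outcome; return True if an alert should fire."""
        self.errors.append(int(prediction != label))
        if len(self.errors) < self.errors.maxlen:
            return False  # wait for a full window before judging
        rate = sum(self.errors) / len(self.errors)
        return rate > self.baseline_error + self.tolerance


# The model is right on the first 100 cases, then the relationship changes and it is always wrong.
monitor = ErrorRateMonitor(baseline_error=0.10, window=100, tolerance=0.05)
before = [monitor.update(1, 1) for _ in range(100)]
after = [monitor.update(1, 0) for _ in range(100)]
print(any(before), after.index(True))  # → False 15
```

Labels in production often arrive with a delay, so this kind of alert lags behind the actual shift; the window size trades detection speed against false alarms.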

📖 More info?

I wrote an ebook about machine learning validation.

I give it away to my newsletter subscribers.

I have just made the biggest update to the ebook, including production models and machine learning drift.

Subscribe to receive weekly insights from Late to the Party on machine learning, data science, and Python.

Conclusion

In machine learning, we hope the training data represents real-world data, but it doesn't always.

  • Set up MLOps automation
  • Retrain for input data changes
  • Handle label changes
  • Hope it's not concept drift, where the relationship between input and label changes