NannyML: Shaping the Future of Model Supervision
Machine learning models can excel at extracting insights from data, leading to better recommendations, better churn prediction, and better business decisions. But how do you know your model performs well post-deployment when dealing with new real-world data?
As models are deployed everywhere, data scientists—and more so decision-makers—need to know when they can trust a model prediction and when it has gone off track. And that’s when they need a “nanny” to take care of their models! NannyML detects silent model failures that happen post-deployment.
Founded by Hakim Elakhrass, Wojtek Kuberski, and Wiljan Cools, NannyML raised a €1M pre-seed round in October 2020, led by Lunar Ventures and Volta Ventures, with the participation of prominent angels, including Stijn Christiaens (Co-founder of Collibra), Jonathan Cornelissen (Co-founder of DataCamp), Lieven Danneels (CEO of Televic), and others, to provide supervision for machine learning models.
Learn more about the future of model supervision from our interview with the CEO, Hakim Elakhrass:
Why Did You Start NannyML?
My co-founders and I initially met at a hackathon in 2015—we simply stayed friends, not knowing that we would later found a startup together. Then, in 2018, my co-founder Wojtek and I started a machine learning consultancy, primarily helping with deploying machine learning models in the real world.
Our consultancy grew quickly, with projects across various industries, and Wiljan joined us in the summer of 2019. We focused on models in production, where “production” could mean many different things: from a command-line tool to a desktop machine running the model, delivered to the client.
Most of the value of machine learning is realized when models are actually used, but that is also where most of the effort goes: maintaining models in production. And how do you know you can trust a model? We couldn’t find any satisfactory solution to this problem, so we decided to tackle it ourselves and build a product more scalable than consulting alone.
How Does Model Supervision Work?
Model performance changes for two reasons: either there is a bug (a coding issue or a data quality problem), or the underlying system that generates the data changes. NannyML focuses on the second case: the distribution that generates the data shifts. Here, the questions are: How does the data change? And is it a material change?
For example, if a model detects customer churn, you want to know when customer behavior changes and prevent your churn from going through the roof. But customer behavior is volatile and extremely noisy, so we knew from the beginning that just looking at data variability wouldn’t cut it.
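To make the first question concrete, here is a minimal sketch of univariate drift detection, comparing the distribution of a single feature between a reference period and production data with a two-sample Kolmogorov–Smirnov test. This is an illustration on synthetic data, not NannyML’s actual implementation:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Reference: a customer-behavior feature as seen at training time.
# Both samples below are synthetic, for illustration only.
reference = rng.normal(loc=30.0, scale=10.0, size=5_000)

# Production: customer behavior has shifted (higher mean, more spread).
production = rng.normal(loc=38.0, scale=14.0, size=5_000)

# Two-sample KS test: could both samples come from the same distribution?
statistic, p_value = ks_2samp(reference, production)

if p_value < 0.01:
    print(f"Drift detected: KS statistic={statistic:.3f}, p-value={p_value:.1e}")
else:
    print(f"No significant drift: KS statistic={statistic:.3f}, p-value={p_value:.1e}")
```

The catch, as described above, is that noisy features shift all the time: a statistical test like this fires constantly on volatile customer data, whether or not the change actually hurts the model. That is why pure variability checks are not enough.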
Also, you can only benchmark a model against historical data. To calculate the actual model performance, you need the ground truth about what happened: Did the customer actually churn? That truth only becomes available later, so we need to estimate model performance before it arrives.
Estimating model performance without ground truth is a hard research problem, and we had to do that research ourselves. We found a way to estimate model performance directly and robustly, and we define a material change as a change in the input data that correlates with a change in the estimated model performance.
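NannyML’s actual method is more involved, but the core intuition behind estimating performance without labels can be sketched in a few lines: if a binary classifier’s predicted probabilities are well calibrated, each probability tells you how likely that individual prediction is to be correct, so expected accuracy can be computed before any ground truth arrives. This is a simplified illustration, not the library’s API:

```python
import numpy as np

def estimated_accuracy(probas: np.ndarray, threshold: float = 0.5) -> float:
    """Estimate a binary classifier's accuracy without ground-truth labels.

    Assumes `probas` are *calibrated* probabilities of the positive class,
    i.e. a prediction of 0.8 turns out to be correct about 80% of the time.
    """
    predicted_positive = probas >= threshold
    # Probability that each individual prediction is correct:
    # p for predicted positives, 1 - p for predicted negatives.
    p_correct = np.where(predicted_positive, probas, 1.0 - probas)
    return float(p_correct.mean())

# Confident predictions imply high expected accuracy ...
print(estimated_accuracy(np.array([0.95, 0.05, 0.90, 0.10])))  # 0.925
# ... while predictions near the threshold imply near-chance performance.
print(estimated_accuracy(np.array([0.55, 0.45, 0.60, 0.40])))  # 0.575
```

Comparing such an estimate between a reference window and live production windows is one way to judge whether a change in the input data is material: the drift matters when the estimated performance moves with it.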
As of now, we focus on tabular data, but in the future, we may target NLP as well. The stability of machine learning models ranges from very stable, e.g., computer vision (a dog will always look like a dog), to very unstable, e.g., tabular data capturing noisy consumer behavior. NLP sits somewhere in between: the process that generates language can be volatile.
How Did You Evaluate Your Startup Idea?
From our experience in consultancy, we know that machine learning will be omnipresent and that model monitoring will play an essential role. However, monitoring is just one piece of a bigger picture: post-deployment data science.
We didn’t rigorously evaluate the market size; we had a gut feeling it would be large enough. However, we derived rough estimates from what companies in other complex industries spend on maintenance, e.g., aircraft maintenance, and then took a geometric average of those figures across industries.
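For the curious, a geometric average of such ratios looks like this; the industries and numbers below are entirely hypothetical, chosen only to show the calculation:

```python
import numpy as np

# Hypothetical maintenance-spend ratios (maintenance cost as a share of
# total cost of ownership) in a few complex industries. Illustrative only.
ratios = {"aviation": 0.10, "manufacturing": 0.05, "energy": 0.15}

# The geometric mean dampens the effect of outliers when averaging
# ratios across very different industries.
geo_mean = float(np.exp(np.mean(np.log(list(ratios.values())))))
print(f"Average maintenance share: {geo_mean:.1%}")  # ~9.1%
```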
Our vision is to go from machine learning to automated systems in general and link their performance to business impact. We believe all enterprises will one day operate like a quant hedge fund: evaluating tons of data to drive business decisions.
Companies make decisions based on complex models, but most don’t have a risk department. With NannyML, we give decision-makers superpowers: the ability to manage model risk the way financial institutions have long done.