Tracebloc – Shaping the Future of Sourcing Machine Learning Models

Big data, cloud computing, and large-scale models have pushed the performance of machine learning. Models are now deployed everywhere, automating a plethora of tedious tasks and amazing humans with their capabilities, and not only by playing Go or generating poems, images, or entire podcasts.

We interact with machine learning daily in a variety of ways: from unlocking our phones, same-day delivery from Amazon, Instagram's recommendation system, and social media content moderation to weather forecasting and emergency braking systems in cars.

Yet many enterprises still have difficulty employing machine learning for their use cases, as they often lack the staffing and expertise. They try to counter this by partnering with the right vendors or scouting relevant machine learning models to solve their challenges. Over 50% of all use cases across industries are already implemented with the help of vendors. Sourcing the right vendors and assessing their models is crucial to addressing these use cases successfully, but it is challenging because AI is a black-box technology.

This is where tracebloc comes in: a platform where enterprises can showcase their machine learning challenges without sharing their data, and where data scientists worldwide can submit models to be trained and benchmarked against these challenges. Founded by Lukas Wuttke in 2020, tracebloc raised a seed round from Advanced Blockchain AG the same year.

Learn more about the future of sourcing machine learning models from our interview with CEO Lukas Wuttke.

Why did you start Tracebloc?

When I was working as a consultant for several years, I saw that enterprises were generally sourcing tech efficiently – except when it came to machine learning. It was simply very challenging for procurement and project leaders to benchmark vendors of machine learning models, due to the complexity of the technology and the pace at which the market is moving. Enterprises could not assess whether a vendor was the 3rd best or the 69th best at solving their challenge – and maybe only the top two vendors could solve it.

Sourcing machine learning models is currently inefficient and time-consuming, and budgets are used poorly, making it very hard for project leaders to create value for their organizations. At that time, I often asked myself: What would it take for enterprises to utilize the innovative force of the AI community, i.e. researchers, consultants, startups, or even students? And I realized two things. First, procurement will play a major role in driving innovation, as it is the gateway that connects enterprise data science teams with the AI ecosystem. Second, a tool is needed that lets enterprises automatically and repeatedly identify the best vendors and machine learning models for their variety of use cases, which means models and use case data need to be connected at the sourcing stage already.

Following this insight, I started building tracebloc, a platform for solving this sourcing problem. It connects enterprises posing machine learning challenges with the data scientist community, i.e. researchers, consultants, students, startups, or even other enterprises developing models.

How does it work?

Tracebloc is a mix of a federated learning platform and a Kaggle-style competition platform. Every data scientist knows competitions where they develop models for a given dataset and benchmark them against each other. Yet the tricky part is sharing the datasets, which you often can’t do in an enterprise for legal and compliance reasons.

Tracebloc allows showcasing machine learning challenges without sharing the data. Data scientists can submit their models on our platform, where they get trained in a federated manner on the respective infrastructure of the enterprise(s) posing the challenge, and they receive feedback on how their model performs compared to others. 
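To make "trained in a federated manner" more concrete, here is a minimal sketch of federated averaging (FedAvg), one common aggregation scheme in federated learning. The interview does not say which algorithm tracebloc uses, so this is an illustration of the general idea only; the function name, the site names, and the numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine per-site model weights into one global weight vector.

    Each site trains locally and reports only its weights and its number
    of training samples; the raw data never leaves the site.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must report weight vectors of equal length")
        # Sites with more local data contribute proportionally more.
        for i, w in enumerate(weights):
            averaged[i] += w * n_samples / total
    return averaged


# Hypothetical example: two enterprise sites with different data volumes.
site_a = [0.0, 2.0]   # weights after local training at site A (100 samples)
site_b = [4.0, 6.0]   # weights after local training at site B (300 samples)
global_weights = federated_average([site_a, site_b], [100, 300])
print(global_weights)
```

In a real system, this aggregation step would repeat over many rounds, with the updated global weights sent back to each site for further local training.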

Enterprises learn what the market currently offers, identify the best teams to work on follow-up opportunities, and deploy their budget most effectively. And data science teams get access to various opportunities to work with large enterprise clients and deploy their technology.

In addition to benchmarking a model’s performance, we also capture the computational effort required to train the model, i.e. how many FLOPs (floating point operations) were used and the associated carbon footprint in grams CO2e (carbon dioxide equivalent). Thereby we are building a ‘GitHub for sustainable AI models,’ where everyone can look up a model’s sustainability metrics and have a good starting point for their next ML project.
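As a rough illustration of how an operation count relates to a carbon figure, the sketch below converts total training FLOPs into energy and then into grams of CO2e. This is a back-of-the-envelope estimate, not tracebloc's measurement method, which the interview does not describe. The function name, the hardware efficiency (`flops_per_joule`), and the grid carbon intensity are all assumed values chosen for the example.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def estimate_training_footprint(total_flops: float,
                                flops_per_joule: float,
                                grid_gco2e_per_kwh: float) -> dict:
    """Estimate energy use and emissions for a training run.

    total_flops:        floating point operations performed during training
    flops_per_joule:    assumed effective hardware efficiency
    grid_gco2e_per_kwh: assumed carbon intensity of the local electricity grid
    """
    if total_flops < 0 or flops_per_joule <= 0 or grid_gco2e_per_kwh < 0:
        raise ValueError("inputs must be non-negative, and efficiency must be positive")
    energy_kwh = total_flops / flops_per_joule / JOULES_PER_KWH
    return {
        "energy_kwh": energy_kwh,
        "co2e_grams": energy_kwh * grid_gco2e_per_kwh,
    }


# Hypothetical run: 3.6e18 FLOPs on hardware delivering 1e11 FLOPs per joule,
# powered by a grid emitting 400 g CO2e per kWh.
footprint = estimate_training_footprint(3.6e18, 1e11, 400.0)
print(footprint)
```

The same operation count can produce very different footprints depending on hardware utilization and where the training runs, which is why both the FLOPs and the CO2e figure are worth reporting.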

We provide the infrastructure for the whole process, from data scientists uploading their models, weights, and training plans to our platform to orchestrating the federated learning for hundreds of concurrent training cycles. Our tool enables collaboration between data scientists from different organizations and offers convenient monitoring for each training run. We also provide an automated exploratory data analysis for each dataset, which helps data scientists design their models alongside the feedback they receive from training runs. And we protect the original datasets, e.g. against model inversion attacks, where a malicious user attempts to recover the original data from a trained model.

We are currently in beta and have successfully tested our infrastructure for computer vision use cases with datasets containing up to 500K images distributed over different geographies. As a next step, we’ll tackle natural language processing and time-series data, which are more difficult to approach as they may require substantial data pre-processing.

How did you evaluate your startup idea?

We started by building machine learning models for enterprises and getting first results. But how could we know that these were actually good? We looked for a way to connect with the machine learning community and see where our models stood compared to others on the market. Machine learning is also a global effort, yet moving datasets across geographies is hard – how could we leverage the brightest minds across the world to develop the best models for our clients? That’s how we started focusing on building tracebloc as a platform for sourcing models instead of building individual models.

Advice for fellow deep tech founders: It’s always important to first start with the user experience. Tech founders often come from a strong tech perspective, but in the end, a customer is not paying for the technology but for solving a problem. They don’t care whether you use machine learning or anything else to solve it – they care about solving it. 

Also, keep in mind how long and complex selling to enterprises can be, with lots of different stakeholders having lots of different problems. Talk to clients a lot and map out all the different stakeholders – my background in consulting helped me a lot in this regard.

Who should contact you? 

We’re always happy to talk to fellow machine learning enthusiasts and to potential clients we could help source better machine learning models – feel free to contact us through our website. Sign up for our platform – it’s free, and you could help us build a ‘GitHub for sustainable AI’ and a community around it.

Further Reading

DGAP-News: Advanced Blockchain AG invests in Tracebloc GmbH, a company using machine learning – Press release on Tracebloc’s initial funding round by Advanced Blockchain. 

AI’s massive appetite for computation power – Twitter thread by tracebloc on what is currently going on in the AI ecosystem. 

Compute Trends Across Three Eras of Machine Learning – arXiv paper on the evolution of compute with the advent of deep learning.

The Computational Limits of Deep Learning – arXiv paper on how progress across a wide variety of machine learning applications strongly relies on increased computing power.

The Imperative for Sustainable AI Systems – An article by Abhishek Gupta on The Gradient.