TamedAI: Shaping the Future of AI for Automatic Information Extraction
Whether it’s processing invoices, insurance documents, or other paperwork, AI can automate much of the tedious work that humans previously had to do.
Custom natural language processing (NLP) pipelines and computer vision models could already extract information from documents in the past, but with the advent of large language models, querying documents has become as simple as asking a colleague.
Learn more about the future of AI for automatic information extraction in our interview with TamedAI co-founder and CTO Ole Meyer:
Why Did You Start TamedAI?
My co-founder Nils and I met as machine learning researchers at university, even before it was such a hot topic. Today, with ChatGPT among the most widely used products, people know what machine learning is capable of. But back then, it wasn’t that obvious.
The transformer paper from Google changed everything. Machine learning models became far more capable, and previously insurmountable problems became manageable. The pace of development in machine learning has been astonishing; it’s clearly one of the fastest-moving research fields.
While researchers were super hyped about all these developments, few people outside of research took notice. Researchers were mostly concerned with what’s cool and shiny, but industry needed machine learning to be useful right away so it could see the value. Observing this mismatch, we founded TamedAI to tame the fancy things we did in research and make them productive and useful for real-world applications.
How Does Automatic Information Extraction Work?
We had already worked with language models as part of our university research when, in 2019, a big German media company asked us to develop a German version of OpenAI’s language model GPT-2. We attempted what seemed nearly impossible back then: language transfer, turning the English GPT-2 into a German version. That was research! But it enabled the media company to launch its first German-language applications as early as 2020.
We then started building our AI to process entire documents, which contain not just language but also a layout that structures the flow of information, as well as visual information such as charts. We had to adapt our language models to this flow of information, which let us process documents on an entirely different level than previous approaches based solely on NLP. Our goal isn’t AGI; it’s to process documents in the best possible way.
Under the hood of our software is a language model that we trained end-to-end to recognize information in documents. We tried many different approaches but eventually found that language models process documents much like a human does: humans, too, have just one system for processing information, the brain, and no separate OCR system. We built a single transformer-based model that can process language as well as other modalities such as images, so it can translate between languages and between modalities, for example from image to text.
One of the main challenges in machine learning is getting high-quality training data. While we have some strategic collaborations there, the data belongs to our customers, and we don’t use it for training. In many domains, e.g., the public sector or insurance, it would not even be legally allowed. Instead, we curate high-quality data and use it to train smaller models, an approach that has repeatedly been shown to beat larger models on specific tasks. We don’t optimize for more data; we optimize the quality of the training data.
For example, if we want to recognize all the article numbers in a document, there is currently no normalized training data set. Raw documents are noisy and may write the same article number in different ways, e.g., with or without a comma. So we first normalized the data into a standard format, and that alone brought big improvements. Data quality is just super important.
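To make the normalization step concrete, here is a minimal Python sketch. It is not TamedAI’s pipeline: the separator set, the canonical format (uppercase letters and digits only), and the function names `normalize_article_number` and `normalize_labels` are hypothetical choices made for illustration.

```python
import re
from typing import Optional

# Assumed separator characters that vary between documents (comma, dot,
# whitespace, slash, hyphen). A real pipeline would define these per use case.
_SEPARATORS = re.compile(r"[,.\s/-]+")
# Assumed canonical format: uppercase letters and digits only.
_CANONICAL = re.compile(r"[A-Z0-9]+")


def normalize_article_number(raw: str) -> Optional[str]:
    """Return the canonical form of an article number, or None if invalid."""
    # Remove all separator characters and unify letter case.
    candidate = _SEPARATORS.sub("", raw.strip()).upper()
    # Reject empty results or anything that still contains unexpected characters.
    if not candidate or not _CANONICAL.fullmatch(candidate):
        return None
    return candidate


def normalize_labels(labels: list[str]) -> list[str]:
    """Normalize training labels, dropping any that cannot be normalized."""
    normalized = (normalize_article_number(label) for label in labels)
    return [label for label in normalized if label is not None]


if __name__ == "__main__":
    variants = ["123,456", "123456", "123.456", " 123 456 ", "ab-123/456", "??"]
    print(normalize_labels(variants))
```

Running the script prints `['123456', '123456', '123456', '123456', 'AB123456']`: five different spellings collapse to two canonical labels, and the unparseable `??` is dropped. Dropping invalid labels keeps the sketch short; in practice, flagging them for review may be the better choice.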
Another important consideration is the efficiency of our models. Our typical customer processes about 1,000 documents per month, and many customers choose us because we can run on-premises rather than in the cloud. So we need to keep our models small and efficient.
Generally, prompt engineering is poorly suited to more complex use cases. You have to formulate your requirements in natural language, which is imprecise input when you need high precision. We need something more precise. Fine-tuning our models allows us to reach a very low error rate that you couldn’t reach with prompt engineering alone.
With Flavours, we have developed, and brought to our customers, a technology that lets us make customer-specific adjustments without fine-tuning the actual model. The customizations are data-based but very efficient. Unlike with prompt engineering, we don’t search for the right instruction; we derive it automatically from a few examples, which lets us capture very complex or implicit domain knowledge that is otherwise hard to put into words.
It also makes our models available in different flavors for different customers: when a customer comes to us with a real use case and some data, say 50 documents, we can fine-tune our models on just those few documents in a short time. Customizing a model currently takes less than a week, and we’re aiming for less than a day in the future. Most often, our customers have a use case and want the most precise, adaptable, and cost-effective model on the market to address it.
How Did You Evaluate Your Startup Idea?
Our idea and product evolved over time: we started with custom machine learning pipelines and gradually incorporated large language models. Our large language models can’t write love poems, but they’re quite useful for extracting information.
Our main focus is processing any document in the best possible way. Of course, customers often ask us to address their specific problems, but we have a very broad customer base, spanning hospitals, insurance companies, and the chemical industry, so we focus on building a general product for document processing rather than a bespoke service.
The quality of our extraction makes the difference. The market is not new, but no one is happy with their current solution. When we benchmark our solution against the established state of the art, which was developed decades ago, we beat it by more than 80%.
What Advice Would You Give Fellow Deep Tech Founders?
When you’re founding a company in a super hyped area, evaluating the go-to-market can be quite complicated: you need to figure out whether an AI system is really needed and creates value, or whether everyone is just hyped about AI and wants to give it a try.
One of our early pitfalls was focusing too much on big companies, with many hierarchy levels and busy C-level executives. C-level executives can often tell you a lot about the vision but don’t know what’s happening at the levels below. It can be quite difficult to find the person who is relevant to you: the one who has real influence and also feels the pain you’re solving.
Starting with insurance companies turned out to be tricky. While SMEs take about a week to decide, big customers may take three to four months or even more than a year. If you can close them as a customer, that’s great, but getting there is really hard. And you need a plan for managing your cash flow so you don’t die along the way.