deepset: Shaping the Future of Natural Language Interfaces
Whether it’s customer service, online teaching, or product sales – computers increasingly help to facilitate human communication. However, this requires them to understand the quirks and intricacies of natural language – language as it has naturally evolved through human history.
Language models can analyze the frequencies and relationships between words, but a single model doesn’t make a product. That’s why you need to build entire natural language processing (NLP) pipelines – breaking a text into chunks, processing its constituent parts, and finally deriving outputs, e.g. its sentiment, to be used in the final product feature.
Founded by Milos Rusic, Malte Pietsch, and Timo Möller in 2018, deepset provides the tools to develop natural language interfaces quickly and efficiently. It has created the open-source framework Haystack, which helps build search systems, and in April 2022 it raised a $14M Series A from Google Ventures, Harpoon Ventures, System.One, Lunar Ventures, and Acequia Capital.
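To make the idea of a pipeline concrete, here is a minimal sketch of the three stages just described: chunking, per-chunk processing, and deriving a final output. It is a toy illustration only. The word lists and the function names (`split_sentences`, `score_sentence`, `sentiment_pipeline`) are hypothetical, and a real pipeline would use a trained language model rather than a hand-written lexicon.

```python
import re

# Hypothetical toy lexicon; a real pipeline would use a trained model instead.
POSITIVE = {"great", "good", "love", "helpful"}
NEGATIVE = {"bad", "slow", "broken", "hate"}


def split_sentences(text):
    """Stage 1: break the text into sentence-sized chunks."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence):
    """Stage 2: process one chunk (positive word count minus negative word count)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_pipeline(text):
    """Stage 3: combine the chunk scores into one label for the product feature."""
    total = sum(score_sentence(s) for s in split_sentences(text))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


label = sentiment_pipeline(
    "The support was great. The app is slow, but I love the search."
)
```

Each stage is a separate function, so any one of them could be swapped for a stronger component without touching the others.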
Learn more about the future of natural language interfaces from our interview with the CEO Milos Rusic:
Why did you start deepset?
Malte and I met during our studies, and our passion for maths and stochastics attracted us to machine learning. Then after our studies, we worked at other startups and got a decent amount of experience. However, over time it became increasingly clear to us that natural language awareness for computers will become crucial for all software products – from chatbots to generative networks to virtual sales assistants – and that better tools for natural language processing will unlock this potential.
So we had this belief about the future, but we also read a lot of publications to base our startup on solid research – the state of the art in machine learning was making big leaps forward. In particular, we saw how transfer learning – moving machine learning models between different use cases – was successful in other domains, and our hunch was that it would also make its way into NLP.
From there, our startup followed quite an organic and mission-driven path. We founded deepset in 2018, first building NLP systems for enterprises and then translating our learnings into the open-source framework Haystack and now also deepset Cloud. We didn’t have a definitive answer on how NLP would become part of every software product, but we knew it would happen. And then Google released its language model BERT and the NLP hype started. Within three weeks of the original release, we had, for example, trained our own version of BERT for a customer in aerospace.
However, a single language model is just a component, not an actual product. By developing the Haystack framework, we help developers get all the components together to build NLP pipelines – at first specifically for semantic search and retrieval, but later on also beyond search. And we moved from services to building our own product, deepset Cloud, to enable enterprises to build full-stack NLP applications – putting a semantic processing layer between their data and their enterprise applications.
How does it work?
At its core, NLP is just a “translator”: You insert human language and obtain numbers, i.e. something a computer can work with. The computer then can compute new things based on these numbers and their representation as vectors, such as the answer to a question. Transformers are a very effective way to build such number representations – and that’s our core technology.
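As a rough illustration of the "translator" idea, the sketch below turns text into vectors and then computes on those vectors to pick the passage closest to a question. It deliberately swaps in plain word counts for Transformer embeddings so that it runs with the standard library alone. The function names and example passages are assumptions made for this illustration, not part of Haystack.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'translator': map text to a sparse vector of word counts.

    A Transformer would produce a dense vector here; word counts are a stand-in.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(question, passages):
    """Return the passage whose vector is most similar to the question's vector."""
    q = embed(question)
    return max(passages, key=lambda p: cosine(q, embed(p)))


# Hypothetical example passages.
passages = [
    "Haystack is an open-source framework for building search systems.",
    "BERT is a language model released by Google.",
]
answer = best_match("Who released BERT?", passages)
```

Word counts only match identical words, which is exactly the limitation that learned vector representations are meant to overcome.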
Now, what to do with it? How does this answer my question, e.g. in a semantic search engine? That’s when you need an entire NLP pipeline, and it’s why we built Haystack – allowing others to build applications on top of it. Yet, when you build an NLP application, there are quite a few technical challenges, like: How do you know the output is actually good? How do you make it better? And that’s why we developed deepset Cloud – our first product, to manage, track, adjust, and fine-tune your NLP pipelines even after you have deployed them in an application.
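One common way to approach "How do you know the output is actually good?" is to compare a pipeline's answers against a small labeled set. The sketch below computes exact match and token-level F1, two metrics widely used for question answering. It is a simplified illustration, not deepset Cloud's or Haystack's implementation. The helper names and the tiny labeled set are hypothetical, and normalization is reduced to lowercasing and whitespace splitting.

```python
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace (deliberately minimal normalization)."""
    return text.lower().split()


def exact_match(prediction, gold):
    """True if prediction and gold answer are identical after normalization."""
    return normalize(prediction) == normalize(gold)


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall against the gold answer."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, labels):
    """Average both metrics over a labeled evaluation set."""
    n = len(labels)
    pairs = list(zip(predictions, labels))
    return {
        "exact_match": sum(exact_match(p, g) for p, g in pairs) / n,
        "f1": sum(token_f1(p, g) for p, g in pairs) / n,
    }


# Hypothetical gold answers and pipeline outputs.
labels = ["Google", "in 2018"]
predictions = ["google", "2018"]
scores = evaluate(predictions, labels)
```

Tracking such scores before and after a change to the pipeline gives a concrete signal of whether an adjustment made it better or worse.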
Still, building an open-source framework was crucial, and in many cases, there is not even a way around it. Everything below an application needs the transparency of an open-source project for people to understand and trust what is happening on the lower levels. At the same time, open-source projects attract communities, which is extremely powerful if your technology and product should make it across the chasm.
Your community helps you improve your framework, they may even become leads, but even more importantly, they build on top, write about it, and tell others – spreading awareness and building trust in the technology as more and more people use it. This is what gets technology out of the cradle. If your framework is self-service, this can really become a flywheel: more people use it, so more people talk about it, and even more people use it.
How did you evaluate your startup idea?
Generally, there are two kinds of startups: The ones operating in a super competitive, mature market – like a red ocean where the sharks eat the little fish. And the ones like deepset that operate in a wide blue ocean, where there’s no established market structure, very few fish, and lots of room to swim and evolve. It was pretty clear to us early on that this had the potential for a venture-scale startup – there was simply not much out there.
We believed that NLP would be part of every product, we saw signs that the technology was maturing, and we had first customers paying for it – so we knew we were on to something. And that’s the moment when you really want to double down and scale this. I love the stories of how startups shaped an entire era – here we had the perfect timing and opportunity to shape the natural language interfaces of the future.
Some advice for fellow deep tech founders: First, you should really have the willingness to go very deep into the space you’re building in. It’s okay to spend some time getting familiar with the space and learning about the technology, but your aspiration needs to be to develop solid expertise. Don’t just go opportunity hunting or try to fill some market gap if you really want to build a deep tech startup.
Second, don’t rely on others to figure out how to use your technology – think hard about how your technology could be applied. Eat your own dog food. This doesn’t mean you need to verticalize – just demonstrate what the tech could be good for! That’s why we created lots of tutorials on using Haystack to build applications around language models.
To get started, just build something with the technology you’re excited about. Maybe for someone who doesn’t know how to use the tech but could be a first customer. Then understand what the gap between your service and an actual product is – and this gap is your opportunity.
Last but not least, and this is true not only for startups but also generally in life: You should really care about which people you work with – be very aligned on where you want to go, have the same skin in the game, and have the same conviction about the opportunity. It helps a lot when all forces are aligned.
Who should contact you?
deepset raises $14M to help companies build NLP apps – Crunchbase article on deepset’s Series A raise with Google Ventures.
deepset: Moving Open Source NLP Forward – Post by Google Ventures about their investment in deepset.
How to Evaluate a Question Answering System – Blog post by deepset outlining how to evaluate extractive QA pipelines in Haystack.