Lakera: Shaping the Future of Securing AI Models
When ChatGPT was released, the quest was on—not just to get the best cake recipe or vacation tips but also to coax the underlying large language model into revealing its secrets.
Enthusiasts and experts alike were eager to discover the system prompt engineers had used to instruct ChatGPT or to unearth the internal code name “Sydney” used during the development of Bing AI.
These types of vulnerability became known as prompt injections, where savvy users crafted specific prompts to trick the AI model. Over time, they became way more sophisticated than simply telling the bots on X to ignore all previous instructions and write a poem about visiting a beach. And unlike traditional software systems, AI models are non-deterministic, making them significantly harder to secure against such attacks.
Lakera offers the leading real-time security platform for generative AI applications. Like a protective layer, the platform shields AI models against threats like data loss or toxic outputs without compromising user experience. Founded by David Haber, Matthias Kraft, and Mateo Rojas-Carulla in 2021, the company recently announced a $20M Series A led by Atomico and joined by Citi Ventures, Dropbox Ventures, and existing investors including redalpine, Fly Ventures, and Inovia Capital.
Learn more about the future of securing AI models from our interview with the co-founder and CEO, David Haber:
Why Did You Start Lakera?
Because this is the biggest thing I could be working on.
I started coding when I learned to write at the age of five, making my way from writing simple to more complex programs. Eventually, I studied computer science and learned about Prolog, a programming language used to define rules and express logic, and old-school AI, which used such rule-based systems to make decisions.
That’s how I got into machine learning. It was like automating all the Prolog programming by learning from real-world data—instead of hard-coding the rules, training a machine learning model turned out to be a much more efficient and apt way to detect patterns and make decisions based on them.
After university, I helped build four different startups, all working with AI. They focused on safety-critical AI applications, health care, and autonomous navigation for aviation. What I observed during that time was that everyone talked about machine learning models. But what was not thought about enough was that eventually, you had to deploy not just a model but a full-fledged system that is safe and secure. Safety and cybersecurity need to be naturally baked into everything you develop.
I realized that the only way to make machine learning models work in production was to make them safe. Back then, this insight was ahead of the curve, but fast-forward to today, with ChatGPT and plenty of other AI services being used daily, and we don’t have to explain this anymore.
However, just as we can’t expect traditional software engineers to be cybersecurity experts, the same goes for AI engineers. That’s why we founded Lakera AI to equip AI engineers with the cybersecurity they need and move from thinking about models to developing holistic machine learning systems. AI has enormous potential to address global challenges, but we’ll only realize its potential if we manage to make it safe and secure.
What is AI Security vs AI Safety?
While safety is about ensuring that AI systems operate without harming others, security focuses on protecting AI systems from malicious attacks and ensuring the confidentiality, integrity, and availability of the system and its data. Our focus is on securing AI models from malicious attacks, e.g., to reveal business secrets.
For example, when a large language model is deployed in an application and does well, ultimately, hundreds of millions of people will interact with it, and the model will be bombarded with different cybersecurity attacks. This is not just a theoretical risk but something we observe practically within a matter of seconds. We put a protective layer around large language models to secure them against attacks.
Traditional cybersecurity has been dealing with securing software for decades. The challenge with AI models now is that they evolve and get better over time—like living things. You can’t install an old-school firewall and be done. Security has to evolve as models evolve.
One example is prompt injection: What people don’t realize is that AI models are like Turing machines and can execute arbitrary code if given the corresponding instructions. Prompt injection manipulates the input given to an AI model to alter its behavior or output, e.g., to produce toxic outputs, reveal business secrets, or execute arbitrary code.
How Do You Make AI Models Secure?
Attacks on AI models are by now so complex and sophisticated that there’s no way to write a fixed set of rules to prevent them. The English language is simply too loose, and users can input a wide range of things. That’s why we train AI models to defend AI models.
At our core, we turn prompts into statistical structures and compare them against each other to find suspicious patterns, such as prompt injections. We monitor attacks constantly and use open-source and publicly available data to understand what attacks can look like.
Also, we built an educational game called Gandalf, which helped us a lot in building a vast and continuously growing database of AI threats. The goal for the user is to trick the model Gandalf into revealing a secret password, and the difficulty level increases with each level as we add more guards. Apart from adapting the system prompt, we leave the language model unaltered while adding checks for both the input and output to the model.
The game went viral, attracting over 250,000 users by now who tried their prompt injection skills to trick Gandalf; in addition, we have built an internal red team to come up with prompt injections, which, taken together, have generated a data set of over 40M prompt injections and allowed us to train our AI models on billions of data points to detect prompt injections.
It’s a Flywheel. The more users play the game and the more we work with customers, the better our data becomes, and thus, our AI models become better. Customers can trust us that as soon as we find a new prompt injection, our models will be able to defend against it.
How Do You Benchmark Your Models?
The way we’re building AI models is not that different from the healthcare or aerospace sector, where you build against certain safety levels, and you need to demonstrate that your system won’t fail in a million hours for certification. That’s a tough problem, and getting AI models to 99% accuracy is not good enough.
We leverage our datasets of previous attacks to constantly evaluate and benchmark our AI models. We also rigorously benchmark and test on public and proprietary datasets. It’s like having a static baseline of known attacks and a moving target to catch up with all the new ones that are discovered.
How Will Your Product Evolve in the Future?
Currently, we focus on question-and-answer use cases, and we’re glad to work with some of the coolest startups and grown-up companies as our customers. An example use case for them could be a chatbot that summarizes internal business documents and makes them accessible to employees—but it needs to take into account the different roles and access rights of its users.
We’re generally vertical-agnostic: from a technical perspective, it’s always taking care of an AI model, but the context changes. We look at this through the lens of different use cases: what would be different attack vectors for that particular use case, who interacts with the models, or what’s the level of integration of the chat with the application? That way, we make sure that our customers have 360-degree protection.
You can get started using Lakera AI within three minutes! Simply sign up on our platform and experience what kind of protection we can offer for your use case.
Next, we’ll expand to other use cases, e.g., where an AI model is acting more autonomously, e.g., as a trading bot or as part of a medical device. We’re moving to a world where human-computer interactions will be based on language, and working with multimodal, general-purpose AI models becomes the norm. Our ambition is to become the go-to choice not only for language models but also to secure any multimodal models.
What Advice Would You Give Fellow Deep Tech Founders?
As a startup, you always want to move fast—time is never in your favor. As soon as you take on venture capital, the clock is ticking. Surround yourself with people who can move fast. This includes not just people who work at your company but everyone around your company, be it users, customers, or suppliers. Be quite selective with whom you work and choose folks who move as fast as you.