Axelera AI: Shaping the Future of AI Inference at the Edge
From recognizing faces in real time to guiding self-driving cars and sorting packages, machine learning underpins a whole new wave of automation. But it needs to process massive amounts of data quickly, something computers were traditionally not designed for.
In the past, computers had to run all kinds of complex programs, but in a serial manner. In contrast, machine learning requires only very simple operations, the multiplication and accumulation of numbers, executed in a highly parallelized manner. This mismatch has led to the development of new chips for machine learning applications.
Axelera AI is building such an AI chip. By leveraging in-memory computing, it’s able to perform the operations needed for machine learning much faster and more efficiently than a traditional computer. Founded in 2021 by Fabrizio Del Maffeo and Evangelos Eleftheriou, Axelera AI has just closed an oversubscribed Series A round with CDP Venture Capital, Verve Ventures, and Fractionelera – the latter a consortium of investors created specifically to finance Axelera AI – bringing the total amount raised to USD 50 million. Existing investors include Bitfury, imec and imec.xpand, the Belgian sovereign fund SFPIM, as well as Innovation Industries.
Learn more about the future of AI inference at the edge from our interview with the co-founder and CEO, Fabrizio Del Maffeo:
Why Did You Start Axelera AI?
I had never planned to become an entrepreneur. Growing up in the north of Italy, with my parents working normal jobs, you would usually get a degree and stay with a big corporation until retirement. But I always felt the need to build things. At the start of my career in big corporations, I always worked in divisions that were like starting a business from scratch. So naturally, people eventually asked me why I wasn’t building my own company.
At first, I thought I would never do this. I didn’t really care for small businesses or doing it for the money. Either it would become a large, impactful, unicorn-style business, or it wouldn’t be exciting. And I hadn’t found the right opportunity.
That changed when I was managing a subsidiary of Asus Group, UP Bridge The Gap, and together with Intel, we were developing the world’s first x86 professional maker computing board. That’s when I started looking more deeply into AI chips, not only for drones but also for computer vision for industrial applications.
It was the early days for edge AI, but we launched UP AI Edge, the first embedded artificial intelligence platform for edge computing, and even did a Kickstarter campaign. The feedback from many early customers who had bought a sample chip was that there was a need and a market opportunity, but the technology wasn’t there; our chip just wasn’t good enough.
That’s when I got in touch with Bitfury and decided to join them to create Bitfury AI to deliver chips and solutions for Edge AI and more. With their initial support, I put a plan together, read publications, and reached out to experts at IBM Research in Zurich, imec in Belgium, and ETH Zurich, where I got in touch with Luca Benini, the chair professor in electrical engineering. Through all these connections, I gathered a virtual team, and after one and a half years, we were able to raise a first funding round, form a company, and go all in on the idea of creating ‘Nvidia for edge AI computing.’
How Does In-Memory Computing Work?
Computers today are designed to be computationally flexible, relying on central processing units (CPUs) that can run all kinds of calculations, from opening a browser tab to training a neural network. However, all data is processed serially. In the past, this was sensible since input came mainly from the keyboard, and a computer had to perform a wide variety of tasks.
With the advent of machine learning, everything changed. Neural networks turned out to be the best way to process – and thus make use of – vast amounts of data. Fundamentally, neural networks rely on very simple operations, so-called matrix-vector multiplications, which boil down to the multiplication and accumulation of numbers. But they need to perform these in a highly parallelized way.
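To make that concrete, here is a minimal NumPy sketch of what a single fully connected layer computes. It is purely illustrative; the layer sizes and names are invented and this is not Axelera's implementation. Every output element is just a long sum of products, and on a serial processor each of those multiply-accumulates has to be executed one after the other.

```python
# Illustrative sketch: what one fully connected neural-network layer boils
# down to. Sizes and names are made up for the example.
import numpy as np

def dense_layer(x, W, b):
    """Matrix-vector multiplication plus accumulation, followed by a ReLU.

    Each output element is a multiply-accumulate chain:
    y[i] = sum_j W[i, j] * x[j] + b[i]
    """
    y = W @ x + b            # millions of independent multiply-accumulates
    return np.maximum(y, 0)  # simple non-linearity (ReLU)

# Even a modest layer needs 1024 * 4096 = ~4.2 million multiply-accumulates.
x = np.random.randn(4096).astype(np.float32)
W = np.random.randn(1024, 4096).astype(np.float32)
b = np.zeros(1024, dtype=np.float32)
y = dense_layer(x, W, b)
```

Because every one of those multiply-accumulates is independent of the others, the operation is a natural fit for hardware that executes them in parallel.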
This discrepancy led to the development of custom chips tailored for machine learning. And that’s why we developed a chip that leverages in-memory computing to accelerate neural network inference.
Using static random-access memory (SRAM), we can store the matrices in memory cells and perform the matrix-vector multiplications in place. This way, we avoid pulling the weights from the chip’s memory each time, which would produce a lot of overhead – avoiding the so-called von Neumann bottleneck. These memory cells work in a highly parallelized way: while a typical CPU might do about 16 operations per cycle, we do about 260,000. This shows how highly optimized our architecture already is.
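As a rough back-of-the-envelope illustration (my own arithmetic based on the figures quoted here, not an Axelera benchmark), applying those per-cycle numbers to the example layer from the sketch above works out as follows:

```python
# Back-of-the-envelope comparison; illustrative numbers only.
macs = 1024 * 4096                  # multiply-accumulates in the example layer

serial_ops_per_cycle = 16           # typical CPU-style execution
in_memory_ops_per_cycle = 260_000   # figure quoted in the interview

print(macs / serial_ops_per_cycle)     # ~262,000 cycles
print(macs / in_memory_ops_per_cycle)  # ~16 cycles
```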
We implement the activation function of a neural network and other control logic using RISC-V processors and reach a core efficiency of 14.7 TOPs/W and an overall energy efficiency of about 11 TOPs/W (tera operations per second per watt) while keeping high int-8 precision.
Staying fully digital allows us to reach this high precision. While there are analog ways to implement a neural network, like memristors, these typically only reach 4-bit precision, maybe 6-bit. Reaching 8-bit with these analog processors is nearly impossible. Unfortunately, analog computing still suffers from problems with noise, drift, and temperature, and it’s difficult to scale to the latest manufacturing nodes (7nm / 5nm / 3nm). The non-deterministic way calculations are performed in the analog domain is also an issue: if you feed in the same data a million times, you expect the same output a million times.
Our approach is to stay with digital chips, where we have lots of IP and experience. Also, it’s not just about building the chip but about delivering an overall great customer experience. Customers don’t want to spend their time optimizing their machine learning models; they simply want to get results efficiently and without headaches.
That’s why we also have a software team working on optimizing the neural networks themselves. On one hand, this is about picking the right neural network architecture, and on the other hand, it’s about making already trained, large neural networks more efficient, e.g., through pruning, quantization, or knowledge distillation. Most neural networks today are still designed for CPUs, but with our tools, it will become easy for everyone to develop networks for the most cutting-edge AI chips.
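As one concrete example of this kind of optimization, here is a minimal sketch of symmetric post-training int-8 weight quantization in NumPy. It is purely illustrative and not Axelera's toolchain; the function names are invented for the example.

```python
# Minimal sketch of symmetric post-training int-8 weight quantization.
# Illustrative only; not Axelera's toolchain. Names are invented.
import numpy as np

def quantize_int8(W):
    """Map float32 weights to int8 values plus a per-tensor scale factor."""
    scale = np.abs(W).max() / 127.0   # largest-magnitude weight maps to +/-127
    W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return W_q, scale

def dequantize(W_q, scale):
    """Recover an approximation of the original float32 weights."""
    return W_q.astype(np.float32) * scale

W = np.random.randn(1024, 4096).astype(np.float32)
W_q, scale = quantize_int8(W)
print(np.abs(W - dequantize(W_q, scale)).max())  # small reconstruction error
```

The int-8 weights take a quarter of the memory of the float32 originals, and the multiply-accumulates can then run on integer hardware, which is what makes the int-8 precision mentioned above so attractive at the edge.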
How Did You Evaluate Your Startup Idea?
Our goal is to build a platform that democratizes access to AI, so we have to design an architecture that lets customers use it in the simplest possible way. Edge AI is a fragmented market with many use cases.
I have been building products for twenty years, and based on that experience, I knew the right approach would be to have 50 first customers, not one. In a market like this, it’s wrong to commit to a single customer, as no one customer represents the market. You need to capture a distribution, and for that you need to talk to hundreds of customers.
That’s why we didn’t do pilot projects with individual customers: that would have meant focusing too much on the needs of a single customer. Instead, we were very transparent about our chip’s capabilities and price, and as a result more than 400 prospects have reached out to us.
If your goal is to really democratize access, you need to communicate prices openly to the market; you won’t get to negotiate with every single customer. Be open, make it accessible in terms of price, support, and the overall customer experience – and learn from the market feedback.
What Advice Would You Give Fellow Deep Tech Founders?
It seems to be always the same advice, but engineers especially need to hear it over and over again: technology never wins over customer experience. Even if you think you have great technology, think even harder about: What problem does it solve? What is the job to be done? You can learn this only by observing your customers, not by asking them.
For example, many people buy a car to commute to work. Do they buy a car because they want a car or because they want to move from A to B? If you think about it, the problem you’re solving is commuting, but there could be many more ways to solve this problem even more efficiently. Designing a car is just one way to solve the problem.
Focus on the problem from a customer perspective, and talk to as many customers as possible. Never rely on only one customer, especially if it’s a large corporation. Large corporations have plenty of time, people, and money to consume yours.
On a final note, if you want to be an entrepreneur, you need to take action. The difference between people who fail at entrepreneurship and those who do not is that the latter take action. Acting means picking up the phone, daring to call people, and accepting that you will be rejected. One of my investors turned me down twice before putting the largest amount into our funding round. You need to be prepared to push back.
Many people are scared of being rejected, so they don’t take action. The percentage of ‘No’ is going to be overwhelming. But you need to be strong, never give up, and never stop taking action as an entrepreneur. If you’re scared of losing face, don’t do it.