Gavin Crooks: How Normal Computing Solves Linear Algebra Problems With Thermodynamics
As Moore’s law runs out of steam and machine learning heats up, the race is on to find new ways to compute—and thermodynamic computing is a hot new contender.
Unlike traditional microprocessors, which rely on binary logic and electrical circuits to process information, thermodynamic computing performs computations by exploiting the natural tendency of physical systems to minimize energy and settle into equilibrium states.
We had the pleasure of speaking with Gavin Crooks, research scientist at Normal Computing, about how they built a prototype thermodynamic computer to solve linear algebra problems faster and more efficiently and pave the way to probabilistic machine learning beyond transformer models:
Why Did You Join Normal Computing?
One of the most fundamental questions is how much energy computing requires. In reality, we have seen the amounts of energy and money people spend on compute steadily go up, and there seems to be no upper bound on how much compute people want.
That is the story of the past 50 years and Moore’s law, leading to an exponential decrease in the cost and energy consumption of computation. Yet, those improvements are petering out. As we’re lowering the energy scale of computing, i.e., reaching a lower energy dissipation per unit operation, we’re approaching the thermodynamic regime. It’s where fluctuations dominate, and controllably switching the state of transistors becomes increasingly difficult, to the point that you’ll have errors.
People want low-energy, fast, and reliable computations, but it’s a trade-off: you can typically pick two and have to give up the third.
When you make processors very energy-efficient, no matter the underlying technology, you’ll need to take into account thermodynamics in the basic physics of your processor. And researching thermodynamics, particularly non-equilibrium systems, has interested me for a long time.
Mainstream thermodynamics is about equilibrium systems and how we reach equilibrium, and it has been very well-established. However, in equilibrium, not many things happen. Out-of-equilibrium systems show much more interesting dynamics, which is what thermodynamic research of the past 20 years has been all about.
A startup is the best place to figure out how to make energy-efficient, thermodynamic processors work. I got in touch with the founders of Normal Computing early on, before they had even founded the company. I saw them execute excellently from there, raising money, hiring great talent, and setting out on an exciting mission, so I decided to join them.
How Does Thermodynamic Computing Work?
We’re building the world’s first thermodynamic computer, which leverages thermodynamic principles to compute much more efficiently, especially for AI applications. It’s a new computing paradigm, like quantum computing, but we may be able to show its potential and commercial viability much sooner.
Current microprocessors are deterministic, like clockwork, and operate with extreme reliability. At the other extreme are small physical systems on a low-energy scale, where noise and fluctuations dominate. As we move to smaller length scales and better energy efficiency of a processor’s constituents, the fundamental physics matters more. We need to take fluctuations into account, but we can also harness them for computing.
Most computing paradigms view noise and dissipation as a hindrance that causes errors, whereas thermodynamic computing views them as a resource. So we lean into the physics, rather than fight against the physics.
A digital computer today provides many levels of abstraction between the high-level application software, the operating system, the CPU, the switching of transistors, and the movement of electrons. This has introduced inefficiencies, and our goal is to remove some of these abstraction levels to bring computing closer to the fundamental physics level.
For example, a neural network is a system with continuous variables that can be trained with backpropagation. It’s inherently analog, i.e., it depends on the absolute value of the activation potential. Today’s processors simulate this analog process digitally, using zeros and ones. That introduces extra complexity and overhead and makes it less efficient. We take a step back and look at what analog systems and fundamental physics we can use to implement such computations more natively and efficiently.
A physical system settling to thermal equilibrium is a natural process that occurs without external interventions and hence can be a highly efficient way to perform a computation, i.e., without the need for external energy input.
How Do You Make Thermodynamic Linear Algebra Work?
Let’s first think: what’s the simplest thing you could implement on a thermodynamic computer? From a physics point of view, the simplest thermodynamic computer is a harmonic oscillator whose fluctuations are distributed in thermal equilibrium according to a normal distribution, also known as a Gaussian.
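To make this concrete, here is the standard textbook result for a single harmonic oscillator in thermal equilibrium. The symbols (k for the stiffness, x for the displacement, T for the temperature, k_B for Boltzmann's constant) are notation introduced here for illustration and do not come from the interview.

```latex
U(x) = \tfrac{1}{2} k x^2, \qquad
p(x) \propto \exp\!\left(-\frac{U(x)}{k_B T}\right)
     = \exp\!\left(-\frac{k x^2}{2 k_B T}\right)
\;\Longrightarrow\;
x \sim \mathcal{N}\!\left(0,\; \frac{k_B T}{k}\right)
```

The fluctuations are Gaussian with variance k_B T / k: a stiffer oscillator fluctuates less, and a warmer one fluctuates more. For a capacitor, the stored energy C V^2 / 2 plays the same role, which gives a voltage variance of k_B T / C.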
Normal distributions often occur in nature, which is explained by the central limit theorem. Thus, finding a way to implement them and draw samples efficiently is important for many applications. Today, people implement normal distributions as a function on digital processors and compute samples by crunching zeros and ones. This is inefficient, especially if your normal distribution has not just one but multiple dimensions (think of modeling a portfolio with many assets instead of just one).
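For comparison, one common way to draw multi-dimensional Gaussian samples on a digital processor is to factor the covariance matrix with a Cholesky decomposition and then transform independent standard normal draws. The sketch below uses only the Python standard library. The function names and the three-asset numbers are hypothetical and chosen purely for illustration.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (cov must be symmetric positive-definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_mvn(mean, L, rng):
    """Draw one sample x = mean + L z, where z has independent standard normal entries."""
    n = len(mean)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


# Hypothetical 3-asset portfolio: illustrative numbers only.
mean = [0.01, 0.02, 0.015]
cov = [
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
]

L = cholesky(cov)
rng = random.Random(42)  # fixed seed so the run is reproducible
samples = [sample_mvn(mean, L, rng) for _ in range(1000)]
```

The factorization costs on the order of n^3 operations for n dimensions, and each sample costs on the order of n^2, which is why high-dimensional Gaussian sampling becomes expensive on digital hardware.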
We have built a prototype device that implements high-dimensional normal distributions by connecting LC circuits. These circuits consist of an inductor (L) connected to a capacitor (C), and they’re a fundamental building block in electronics used, e.g., for filtering or tuning signals.
When we operate them in a thermodynamic, low-energy regime, they behave like coupled harmonic oscillators, and their fluctuations follow a high-dimensional normal distribution: the more oscillators, the higher the dimensionality. Our prototype device has eight dimensions, and by tuning the capacitances of the oscillators, we can tune the mean and the width of the normal distribution.
It turned out that our device can not only draw samples from high-dimensional normal distributions and serve as a Gaussian random number generator, but also do a surprising amount of computation, including linear algebra operations and matrix inversion.
For matrix inversion, the input matrix determines the capacitances of the LC circuits. We then observe the dynamics of the oscillating LC circuits and how they’re correlated with each other. These correlations are described by the covariance matrix, which happens to be the inverse of the input matrix.
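The idea can be imitated in software with a small digital simulation. The sketch below is not a model of the prototype hardware and does not describe how the matrix is mapped to capacitances. It assumes a simplified stand-in: overdamped Langevin dynamics in the quadratic potential U(x) = x^T A x / 2 at unit temperature, integrated with the Euler-Maruyama scheme. For a symmetric positive-definite matrix A, the equilibrium distribution of that system is Gaussian with covariance A^-1, so averaging x x^T over a long trajectory estimates the inverse. The function name and all parameter values are hypothetical.

```python
import math
import random


def estimate_inverse_by_sampling(A, n_steps=200_000, dt=0.02, burn_in=2_000, seed=0):
    """Estimate A^-1 as the equilibrium covariance of dx = -A x dt + sqrt(2) dW.

    Assumes A is symmetric positive-definite and dt is small compared with
    1 / (largest eigenvalue of A), otherwise the integration is unstable.
    """
    rng = random.Random(seed)
    n = len(A)
    x = [0.0] * n
    noise_scale = math.sqrt(2.0 * dt)
    second_moment = [[0.0] * n for _ in range(n)]
    kept = 0

    for step in range(n_steps):
        # Restoring force from the quadratic potential: -grad U = -A x.
        drift = [-sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Euler-Maruyama update: deterministic relaxation plus thermal noise.
        x = [x[i] + drift[i] * dt + noise_scale * rng.gauss(0.0, 1.0) for i in range(n)]
        if step >= burn_in:
            # Accumulate x x^T; the mean of x is zero in equilibrium.
            for i in range(n):
                for j in range(n):
                    second_moment[i][j] += x[i] * x[j]
            kept += 1

    return [[second_moment[i][j] / kept for j in range(n)] for i in range(n)]


A = [[2.0, 1.0], [1.0, 2.0]]
cov = estimate_inverse_by_sampling(A)
for row in cov:
    print([round(v, 2) for v in row])
```

The exact inverse of this example matrix is [[2/3, -1/3], [-1/3, 2/3]]. The estimate should land close to it but not on it, because it carries both sampling noise and a small bias from the finite time step. A digital simulation like this is slow. The point of the analog device is that the physical system does the relaxing and fluctuating by itself.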
We have shown, theoretically, that this approach to thermodynamic matrix inversion has a better complexity scaling than established matrix inversion algorithms. It could make a difference, especially for large matrix inversions. However, implementing those in practice and proving that we can achieve speed-ups and efficiency gains is still on our agenda.
What’s on Your Road Map for the Future?
We’ve started by doing the simplest thing possible and implementing normal distributions, and it’s great to see how much we could already do with it. Matrix inversion turned out to be a natural operation for the hardware we built, and it could already be useful for modern-day machine learning and transformers. Matrix inversion and Gaussian sampling are subroutines of probabilistic machine learning algorithms, so in the long run, we’ll use our in-house expertise to do much more in probabilistic machine learning.
If you read The Hardware Lottery, a great paper by Sara Hooker, you’ll learn that hardware and software co-evolved, and that research ideas prevail because they’re suited to the available software and hardware and not necessarily because they’re inherently superior. GPUs were created for gaming and computer graphics, but they prevailed because they turned out to be a fantastic fit for scientific computations like molecular dynamics and, in recent times, for linear algebra and backpropagation for neural networks.
Bigger GPUs can handle bigger matrix multiplications, so the hardware and software co-evolved. However, we landed in a local optimum, where GPUs are pretty good at handling machine learning tasks, and all the infrastructure is already there. But they’re definitely not the best possible compute hardware, and different hardware could be much better suited to run AI much more efficiently.
We just published a paper demonstrating natural gradient descent using our thermodynamic hardware. It provides a proof of principle, which could also improve neural network training down the road.
We’re going to specialize our hardware even more toward training special kinds of neural networks like Bayesian Neural Networks (BNNs) and Energy-Based Models (EBMs), miniaturize it to the chip level, and manufacture it at scale like microprocessors today. Our thermodynamic processors will allow people to train large AI models faster and to improve them by unlocking new types of algorithms with potentially better features, such as uncertainty awareness, all at a much lower power consumption than running current machine learning models on GPUs.
What Advice Would You Give Deep Tech Founders?
I’ve worked at a couple of startups, one right out of graduate school on the e-commerce side, then one in quantum computing, and now I’m with Normal Computing. I’d say you have to embrace the uncertainty: what you do this month might be totally different from what you do next month. But that’s part of the fun and the challenge of working in startups.