LeanXcale: Shaping the Future of SQL Databases for Data Batch Processing
Storing data for later processing has become the foundation of many modern businesses, and also their limitation. As processors became amazingly fast, databases struggled to keep up, and storing and accessing data became the main bottleneck for many everyday processes, such as making a bank transfer. If you want real-time payments, you need faster databases!
Following more than a decade of research on scaling databases, Ricardo Jimenez-Peris founded LeanXcale in 2015. The company follows a new paradigm, NewSQL, to make databases more scalable and thus faster, overcoming the limitations of traditional SQL databases. In 2018, it raised a €2M seed round from Bullnet Capital.
Learn more about the future of SQL databases for data batch processing from our interview with the founder and CEO, Ricardo Jimenez-Peris:
Why Did You Start LeanXcale?
As a former computer science professor and researcher on distributed databases, I’ve dedicated my life to making databases run faster and smoother. As part of my research, I looked into scaling databases. Back then, the main way to do so was a technique called full replication, which involved keeping a full copy of the database on every computer. However, the scalability it delivered was very low (logarithmic, in technical terms).
It seemed that there was a limit on what scalable databases could achieve. But after talking to a friend who is an expert on storage and distributed systems, I realized that just because no one had gone beyond that limit didn’t mean the limit was real. So, a few days later, I decided to restart my research on database scalability from scratch, with no a priori assumptions, and figure out how to design the most scalable database from first principles.
After nine months, like a pregnancy, I had a solution for the scalability problem and decided to pursue this line of research further, applying for research funding, which was eventually granted. As part of this research project, we developed an entirely new database. When my friend saw what we had developed, he proposed building an MVP and bringing this innovation to the market. He had co-founded a startup before to bring his Ph.D. research work to the market, which impressed me, and I started to evaluate the possibility of following suit.
Since I felt unprepared for the startup world, I took a course on entrepreneurship at my university, the Technical University of Madrid (UPM). Then, to validate the idea of creating a startup, I went on a trip to California to present the new database to people at companies like HP, Salesforce, and Twitter, and saw that they were really interested. That’s when I decided to quit my job as a professor and found the next global database company.
How Does Your Scalable SQL Database Work?
When databases were invented decades ago, they were optimized for running on a single computer with a single processor. But a single processor can only do so much; that’s why we connect many of them in computing clusters today. However, people didn’t know how to scale a full database across many processors.
Different kinds of databases have appeared in the last two decades, trying to solve problems that traditional SQL databases did not solve, or did not solve well. For instance, data lakes managed to scale information storage, while NoSQL databases were proposed to process semi-structured data. However, scaling the full functionality of SQL databases remained unsolved.
SQL databases do basically three things: store data, query data, and guarantee data coherence despite failures and concurrent updates. The last one is called transactional management. People could scale data storage (e.g., data lakes) and query processing (e.g., data warehouses) but did not know how to scale transactional management. Our core invention is a process that can scale transactional management across multiple computers.
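To make “transactional management” concrete, here is a minimal single-node sketch using Python’s built-in SQLite module: two balance updates either both commit or neither does, even if something fails mid-transfer. LeanXcale’s invention is making this guarantee scale across many machines; the sketch below (with invented table and account names) only shows the guarantee itself, not their distributed mechanism.

```python
import sqlite3

# Single-node illustration of transactional management: a transfer
# touches two rows, and either both updates persist or neither does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` atomically; roll back on any failure."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE id = ? AND balance >= ?", (amount, src, amount))
            if cur.rowcount == 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst))
    except ValueError:
        pass  # transfer rejected; no partial update persists

transfer(conn, "alice", "bob", 30)   # succeeds: both rows change
transfer(conn, "alice", "bob", 500)  # rejected: neither row changes

print(dict(conn.execute("SELECT id, balance FROM accounts")))
# {'alice': 70, 'bob': 80}
```

The `with conn:` block is what a transaction manager provides: atomic commit or rollback despite failures and concurrent updates. Doing the same when the two account rows live on different machines is the hard part that LeanXcale addresses.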
Along the way, we made several other innovations, such as improving the efficiency of ingesting data by almost two orders of magnitude with respect to traditional SQL databases. This not only made databases more scalable but also sped up data ingestion into the database, which used to be another important bottleneck.
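The ingestion bottleneck is easy to feel even on a single machine. As a generic illustration only (this is ordinary write batching with SQLite, not LeanXcale’s proprietary technique), compare committing every row individually against loading the whole batch in one transaction:

```python
import sqlite3
import time

# Generic illustration of why ingestion strategy matters: committing
# once per row pays transaction overhead for every insert, while
# batching amortizes it across the whole load.
rows = [(i, f"event-{i}") for i in range(5_000)]

def per_row_commit(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
    for r in rows:
        conn.execute("INSERT INTO events VALUES (?, ?)", r)
        conn.commit()  # one transaction per row
    return conn

def batched(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()      # one transaction for the whole batch
    return conn

for loader in (per_row_commit, batched):
    start = time.perf_counter()
    conn = loader(rows)
    count, = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    print(f"{loader.__name__}: {count} rows in "
          f"{time.perf_counter() - start:.3f}s")
```

Both loaders end up with the same data; the difference is purely in how the write overhead is paid, which is the kind of cost an ingestion-optimized engine attacks.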
Also, companies often need to calculate KPIs (key performance indicators) over their stored data. We found an innovative way to calculate these KPIs in real time as the data is ingested. For example, think of a bank that wants to show its customers a dashboard with real-time analytics. The queries to compute the KPIs displayed on the dashboard used to be computationally very expensive.
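The idea of maintaining KPIs at ingest time, rather than recomputing them for every dashboard query, can be sketched as follows. This is a hypothetical in-memory illustration (the class and field names are invented for the example), not LeanXcale’s engine:

```python
from collections import defaultdict

class KpiIngester:
    """Maintain per-customer KPIs incrementally as rows arrive,
    so a dashboard read is a constant-time lookup instead of a
    full scan over the stored data."""

    def __init__(self):
        self.count = defaultdict(int)    # transfers per customer
        self.total = defaultdict(float)  # volume per customer

    def ingest(self, customer, amount):
        # Update the running aggregates at write time.
        self.count[customer] += 1
        self.total[customer] += amount

    def kpis(self, customer):
        n = self.count[customer]
        return {"transfers": n,
                "volume": self.total[customer],
                "avg": self.total[customer] / n if n else 0.0}

ing = KpiIngester()
for amount in (120.0, 80.0, 40.0):
    ing.ingest("acme", amount)
print(ing.kpis("acme"))
# {'transfers': 3, 'volume': 240.0, 'avg': 80.0}
```

The trade-off is classic: a little extra work on every write buys a cheap, always-fresh read, which is what makes a real-time dashboard feasible.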
Another example is clearinghouses. When you make a bank transfer, it’s not simply one bank sending money to another. In between, there’s a clearinghouse checking both banks’ balances, which are the KPIs in this case. In our YouTube demo, you can see how LeanXcale helps clearinghouses check transactions faster.
In summary, we made three key innovations that enable us to accelerate data processing by one to two orders of magnitude and perform real-time processing for use cases in which batch processes are being used.
How Did You Evaluate Your Startup Idea?
Databases are a hot technology with huge potential for growth, since storing data is the foundation of data processing. But they are also difficult to sell. Companies won’t entrust their data to a startup they don’t know and that might disappear; they are especially concerned about their operational data.
So for our initial go-to-market, our main challenge was to find a use case requiring an operational database that didn’t actually involve operational data. It seemed impossible. Yet operational databases are needed for most batch processes, and batch processes represent 80% of database use. Companies have hundreds to thousands of these processes, each with a time window in which it must complete. For instance, they are executed overnight and have to finish before business starts the next day. But many of them take significantly longer, sometimes using more than half the window, which leaves no time to rerun them if anything fails. In some cases, the processes consume most of the allotted time and would soon be unable to complete within the window at all. In other cases, companies would like to do real-time processing but cannot with current database technology and, therefore, still run nightly batch processes.
With our technology, we were able to speed up such batch processes by 20-70X, removing the pains caused by long processing times. For instance, for Dun & Bradstreet Spain (Informa), running on the same hardware as the market-leading database, we accelerated their process 72-fold, from 27 hours to 22 minutes.
What Advice Would You Give Fellow Deep Tech Founders?
As a researcher, it may take you a while to internalize this, but the most important thing to understand is that you’re not just solving a tech problem. To create a company, you need product-market fit, and thus you need to solve a market problem. The earlier you understand where people are willing to pay money for the solution to a problem, the better, because then you know what you have to build. It may sound trivial, but it’s easy to miss.