What happens when the biggest databases break down

Some of the biggest relational database products in the world, which often carry multi-million dollar licensing and integration price tags, can do a lot. But sometimes it’s still not enough. A typical relational database has multiple tables, perhaps with information on customers and products, and you try to relate things between them. But when the time factor is added in, the traditional database model becomes very limited. That’s where techie.com’s “Editor’s Choice Award” winner for the 2013 Lightning Bolt Awards, TempoDB, brings a new solution to the table with a whole new category of database.

The Chicago-based startup has a disruptive take on the database market, one that addresses a very big problem: how to store and analyze high-resolution, time-based data. This type of data typically comes from sensors, perhaps hundreds of them, delivering data on a per-second or even a per-millisecond basis. Analyzing that time-based data can deliver new insights into industries like oil and gas, alternative energy, and manufacturing, but the problem has always been that even the most robust existing databases would break down under the sheer weight and complexity of the time-series problem.

The Internet of Things
The story starts with the “Internet of things”. Today’s connectivity goes far beyond networking computers – it’s about networking devices, consumer items, even refrigerators, coffee pots, and smartphones. But beyond the consumer focus, the Internet of things is part of a new Industrial Revolution that connects everything together to allow for a new and more efficient type of connectivity and collaboration – collaboration not just between people, but also between things.

The Internet of things has two levels. The first is simply the enhanced interconnection that has existed since the first college student figured out how to program the soda machine down the hall, so that it would dispense soda via an Internet command. What happens though, once the “things” in the Internet of things are all connected? That’s the second, and perhaps most important part of the equation. That heightened level of interconnection has the potential to generate enormous amounts of data. The resulting Big Data can lead to new levels of understanding that were never before possible.

Andrew Cronk, TempoDB’s CEO, has a vision that brings together the Internet of Things and Big Data. “The use case is, there are connected things already, but there are not many of them connected. Maybe there is one sensor, and they’re measuring once an hour or once a day, and they filter back daily to the central system, and storing that data.” That’s what Cronk calls the “data historian” market. “But once we start moving towards the Internet of Things, on the far end, you have all these new connected devices. Things that never existed in the world before – things like Fitbit and the Nest thermostat.” And somewhere in between the mundane Internet of things, like connecting your home appliances to your smartphone, and the new frontier, there’s a middle ground. “In the middle, you have the idea that there is already all sorts of equipment out in the world, but we just don’t know if it’s all working well. So there is an opportunity to connect them, sensorize them, and improve the way they operate.” A canonical example would be jet engines or gas turbines. What Cronk calls “condition monitoring” of such equipment is a huge opportunity if the sensor data can be analyzed using a time-series database. “The idea is that if we can measure it, we can improve it.”

What has really enabled the Internet of things is the ability to manufacture cheap sensors, and to connect them via wireless technology. “In the past, you might have had 10 wired sensors running back someplace, and you ‘sneakernet’ it back to your office once a month. Now there are thousands of sensors distributing every second. That’s what really excites me about the Internet of things,” says Cronk.

Time-Series Database-as-a-Service
Like most great ideas, TempoDB grew out of a progression. The Internet of things allowed for a greater level of connectivity between devices through sensors. Those sensors then generated larger amounts of data, which could then be analyzed for deeper understanding. Finally, that large amount of time-series data required a purpose-built database that went beyond traditional RDBMS capabilities, and TempoDB was born.

The problem many companies have faced is the sheer volume of data that sensors generate over time, which makes storage and analysis impractical without a database purpose-built for time-series data. Cronk describes it as an index problem. “Normally you might have a table that has your customers and products, and you’re trying to relate things between them. But with time series, you have little bits of information, and you have billions of them. And it breaks the normal indexing model.”
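To make the index problem a little more concrete, here is a minimal Python sketch of one common way to organize time-series data: keep each sensor's points in time order and find a time range by binary search, instead of maintaining a general-purpose index over billions of tiny rows. This is only an illustration and is not a description of how TempoDB works internally; the class name `SeriesStore`, its methods, and the assumption that points arrive in time order are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


class SeriesStore:
    """Toy in-memory store (hypothetical): one time-ordered series per sensor."""

    def __init__(self):
        self._times = defaultdict(list)   # sensor -> sorted timestamps
        self._values = defaultdict(list)  # sensor -> values, same order

    def append(self, sensor, ts, value):
        """Add one point; assumes points for a sensor arrive in time order."""
        times = self._times[sensor]
        if times and ts < times[-1]:
            raise ValueError("points must arrive in time order")
        times.append(ts)
        self._values[sensor].append(value)

    def query(self, sensor, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        times = self._times.get(sensor, [])
        values = self._values.get(sensor, [])
        lo = bisect_left(times, start)  # first point at or after start
        hi = bisect_left(times, end)    # first point at or after end
        return list(zip(times[lo:hi], values[lo:hi]))


store = SeriesStore()
for second in range(10):
    store.append("wellhead-1", second, second * 2)

print(store.query("wellhead-1", 3, 6))  # [(3, 6), (4, 8), (5, 10)]
```

Because the timestamps are already sorted, locating a range costs two binary searches no matter how many points the series holds, which is the kind of fast access to full-resolution data that the sensor use case calls for.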

The Startup Story
TempoDB launched as part of the TechStars Cloud accelerator in January 2012. The company currently resides in Catapult Chicago, a co-working space in downtown Chicago and a collaborative start-up community where early-stage companies are rapidly becoming the next generation of disruptive, emerging tech companies in and around the city.

Cronk’s own background is in computer engineering. He worked for a large company, but quickly realized he didn’t fit into the large corporate culture and wanted to spend time trying new things. That, of course, doesn’t always work well in a big company. “This was around the time when there was a Business Week cover in 2006 that had Kevin Rose on it, about how this kid made $60 million in 18 months, and it was about Digg.” That story was all it took to inspire Cronk. “I thought, wow, why am I doing this low-level embedded programming? What’s this web programming thing all about? So I jumped off and made my own web company.” That was his first startup, a crowdfunding company, which he later sold. “That was a good experience for my first company, and bootstrapping the whole thing. My biggest win was to get a Wikipedia page up for crowdfunding, because people kept deleting it, saying it was irrelevant and not a real thing. So I feel vindicated now seeing things like Kickstarter and Indiegogo doing very well and revolutionizing that.”

He got the idea for TempoDB while he was working with a geothermal energy company. “Geothermal is alternative energy and renewable, but you have to prove that it’s actually working. You bury this thing in the ground, and tell the customer, check your bills in a year to see if you’re saving any money. That’s not going to work.” Cronk was brought into the company to figure that part out. “I said, let’s put sensors everywhere. Let’s connect them online, and measure data every second.” They did exactly that, but ran into a problem. “We broke every database we tried to serve the data in. The requirement was full resolution, and to provide fast random access. So we broke MySQL, and all the normal suspects you would expect.”

Talking with others at the company, Cronk realized that other companies probably had the same problem, and that as the Internet of things brought more devices and more data to measure, the problem would become huge. And so the idea of creating a time-series database was hatched.

With that insight, Cronk applied to TechStars in San Antonio, where the person running the program immediately recognized the problem from his own company’s experience. “We applied, and the guy who was running the program told us that his company had this same problem in a big way. They were trying to measure everything about their servers, their network, everything. This was the exact same problem, just in a different domain. We got in. We packed up our lives into my hatchback, and we drove down to San Antonio from Chicago.”

The Big Data Value
Time-series data is possibly the biggest of the Big Data issues, and it has practical applications. Once those billions of time-series data points are collected, the possibilities are endless. An example is in oil and gas. “They are drilling holes in the ground, and they want to measure everything about that operation, so there are a bunch of sensors at the wellhead, and they want to store everything about what’s happening.” That process can then be characterized so as to better determine where and how to drill subsequent holes, based on what happened in drilling the previous ones. “For the first time, they can store all this information at a high resolution. In the past, they might have said, we can choose ten sensors, and we’ll store it at a one-minute resolution. Now, they can use 100 sensors, and a one-millisecond resolution. That high resolution capture is very valuable.”
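A quick back-of-the-envelope calculation shows how large the jump in that example is. The short Python sketch below takes only the sensor counts and sampling intervals from the quote; it assumes every sensor reports continuously for a full 24-hour day, and the function name `points_per_day` is purely illustrative.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def points_per_day(sensors, interval_ms):
    """Data points collected per day, assuming continuous sampling."""
    return sensors * MS_PER_DAY // interval_ms


old = points_per_day(sensors=10, interval_ms=60_000)  # ten sensors, one-minute resolution
new = points_per_day(sensors=100, interval_ms=1)      # 100 sensors, one-millisecond resolution

print(old)         # 14400
print(new)         # 8640000000
print(new // old)  # 600000
```

Under those assumptions, the high-resolution setup produces about 8.6 billion points a day, roughly 600,000 times the volume of the older approach, which is consistent with Cronk’s description of “billions” of small data points.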

The technology has practical applications in renewable and alternative energy as well. “If we can prove that geothermal and alternatives like wind and solar are actually working via data, it will lead to more of them being used. That’s one of my personal missions,” says Cronk.
