A healthy mix of time series processing technology and clever human ingenuity helped one U.K.-based energy consulting firm build an electricity consumption monitoring system that can handle truckloads of real-time data.
Working under a European Union (EU)-funded initiative designed to gauge the effects of energy monitoring on consumers, London-based consulting firm Hildebrand set out to build a website that energy customers could visit to see how much energy they were using at any given time.
The site, which is now being rolled out and tested in various European cities, collects huge amounts of data gleaned from low-cost monitoring devices installed on meters in selected households. It then breaks down the energy usage for each household by several metrics, including time of day, day of the week, and even individual electrical appliance. The so-called Smart Metering tool also calls on data analytics capabilities and colorful dashboards so users can compare and contrast current and past information and figure out where energy is being wasted. Hildebrand says the goal of the project is to help consumers do a better job of making environmentally friendly and financially responsible energy decisions.
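The kind of breakdown described above, by time of day and day of the week, can be illustrated with a minimal Python sketch. The sample readings, their units, and all variable names are hypothetical; nothing here reflects Hildebrand's actual schema or code.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical sample readings: (ISO timestamp, kilowatt-hours used in that interval).
readings = [
    ("2010-05-03T07:30:00", 0.4),
    ("2010-05-03T18:00:00", 1.1),
    ("2010-05-04T07:45:00", 0.5),
    ("2010-05-04T18:15:00", 0.9),
]

by_hour = defaultdict(float)     # hour of day (0-23) -> total kWh
by_weekday = defaultdict(float)  # weekday name -> total kWh

for stamp, kwh in readings:
    moment = datetime.fromisoformat(stamp)
    by_hour[moment.hour] += kwh
    by_weekday[WEEKDAYS[moment.weekday()]] += kwh

for hour in sorted(by_hour):
    print(f"{hour:02d}:00  {by_hour[hour]:.1f} kWh")
for day, kwh in by_weekday.items():
    print(f"{day}  {kwh:.1f} kWh")
```

A per-appliance breakdown would need an extra appliance label on each reading, which this sketch leaves out.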
The EU required a Web-based tool that could process meter readings, conduct analytical processing and potentially support traffic from 3 million households -- and building it wasn’t easy. According to Clive Eisen, CTO at Hildebrand, just reading the meters in real time meant that the system would have to handle 50,000 data inserts per second. He added that it would take a new time series processing database management system and a good amount of customization to make everything work.
“There were a whole bunch of questions that I didn’t immediately have an answer to, although, from my background, I knew you couldn’t do that with any of the traditional relational databases on the market without spending the sort of budget that would render it an impossible prospect,” Eisen said. “You’re talking about vast amounts of money to have a box or a collection of boxes big enough to record 50,000 inserts a second. That’s a lot of data.”
When the project began about two-and-a-half years ago, Eisen had several problems to figure out, including how best to deal with 50,000 inserts per second, how to analyze the data once it came in, and how to make the project financially feasible for the power generation firms that might eventually want to use it.
IBM and time series processing beat Oracle, open source
Eisen initially spoke with IBM about working with Hildebrand to bring the project to fruition because he had a long working history with Informix, which IBM acquired in 2001. But he also took some time to look at the alternatives.
He was immediately interested in using the IBM Informix database management system (DBMS) with the optional TimeSeries DataBlade module, which, according to IBM, extends the database with support for managing time-sensitive data. A time series is any dataset that is accessed sequentially and analyzed in chronological order.
“I then looked around at what the alternatives were and, frankly, there is no equivalent offering from Oracle,” Eisen said. “They have cartridges with plug-ins, but they don’t have anything specifically for time series [processing], and they certainly don’t claim anything in terms of performance.”
Other technologies that Hildebrand considered included the open source Hypertable DBMS and Hadoop, an open source, Java-based system for handling data-intensive applications. Both Hypertable and Hadoop use column-oriented database tables, which are known for being faster than relational databases when it comes to handling large amounts of data.
Eisen was less than impressed with the “usability” of Hadoop and Hypertable, and he was concerned that finding technical support for the offerings would be difficult.
“You know [Hypertable or Hadoop are] fine if you want to have a play or you’re building a small website,” he said. “But ultimately, if you want to build a business that’s going to be working 24 hours a day, seven days a week, you’ve got to be able to pick up the phone and shout at somebody when it doesn’t work.”
In the end, Hildebrand decided to go with the Informix-TimeSeries combination, and the team immediately went to work at IBM testing labs to build and test the Smart Metering system.
Hildebrand was focused on building a system that could potentially be rolled out to 3 million homes. But Eisen knew it was unlikely that all or even most of those households would simultaneously log on to the Smart Metering site and start doing analytics. With that in mind, Hildebrand conducted its testing under the conservative assumption that 2% of households would be using Smart Metering for analytical purposes at any given time.
That hypothetical 2% of households could conceivably use the system to compare things like total Microsoft Xbox usage in May of this year vs. May of last year, Eisen said. Or, in another example, a person could use the Smart Metering site to find out the current tariff on energy and then calculate how much he would have to pay if he switched energy providers, something that U.K. households are entitled to do every 42 days.
During the first major test of the system, Hildebrand -- using a quad-core, dual-processor Intel-based server -- simulated 3 million homes sending readings once per minute and was able to capture almost all of them. Then Hildebrand moved to a slightly larger server and met its goal of handling 50,000 inserts per second while delivering analytic responses to 2% of the simulated homes in one to three seconds.
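The load figures quoted here follow from simple arithmetic, which a short Python sketch can confirm. The inputs are the numbers given in this article; the variable names are illustrative, and the daily total is a derived figure rather than one Hildebrand reported.

```python
# Back-of-the-envelope check of the load figures quoted in the article.
HOUSEHOLDS = 3_000_000        # target number of homes
READINGS_PER_MINUTE = 1       # each home reports once per minute
ANALYTICS_SHARE = 0.02        # share of homes assumed to run analytics at once

inserts_per_second = HOUSEHOLDS * READINGS_PER_MINUTE // 60
concurrent_analytics_homes = round(HOUSEHOLDS * ANALYTICS_SHARE)
inserts_per_day = inserts_per_second * 60 * 60 * 24   # derived, not from the article

print(f"{inserts_per_second:,} inserts per second")
print(f"{concurrent_analytics_homes:,} homes running analytics at once")
print(f"{inserts_per_day:,} inserts per day")
```

Three million homes reporting once a minute works out to the 50,000 inserts per second Eisen cites, and the 2% assumption corresponds to 60,000 households querying at the same time.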
The new monitors will give researchers the chance to study energy usage data and find out whether giving consumers better tools to monitor energy consumption will lead to a smaller carbon footprint.
Customization needed for better time series processing
Working inside the IBM development labs to simulate a live environment that could handle the load, Eisen and his team found that considerable customization was necessary.
“Even with the performance that [the TimeSeries] gets,” Eisen said, “if you’re putting in truly random data -- and by random data, I mean data coming in from 3 million houses, so you don’t know what order it’s coming in or how you’re going to file it -- it still doesn’t cope.”
As a result, Hildebrand developed its own relational, in-memory front-end database that basically acts as a filter for TimeSeries. Eisen said the front-end database aggregates and sorts incoming data before it gets into the time series processing database. It also improves overall performance by providing a place to store some information in memory. He said the customized front-end database is used primarily to drive any information that needs to appear on the Smart Metering sites instantaneously.
“You want to get [some] data into the application [quickly] because the user might be looking at the website at that moment, and they just turned their kettle on, and they would kind of like to see the dial go up just to know that it’s working,” he said.
Hildebrand considered using IBM’s solidDB, an in-memory relational database management system used for similar tasks, but solidDB doesn’t support TimeSeries.
“The data goes into a [customized] in-memory set of structures, then gets into the website and gets aggregated,” Eisen explained. “Certain averages are calculated out of it, and then it gets slashed into the TimeSeries periodically.”
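The flow Eisen describes (buffer readings in memory as they arrive, serve the latest value to the website, then sort, average and write batches out periodically) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Hildebrand's implementation: the `ReadingBuffer` class, its method names and the `sink` callback that stands in for the time series database are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


class ReadingBuffer:
    """Hypothetical in-memory front end that batches readings for a time series store."""

    def __init__(self, sink):
        # sink is a stand-in for the time series database: a callable that
        # receives (meter_id, time-ordered readings, average watts).
        self.sink = sink
        self.pending = defaultdict(list)   # meter_id -> [(timestamp, watts), ...]

    def add(self, meter_id, timestamp, watts):
        """Accept a reading in whatever order it arrives."""
        self.pending[meter_id].append((timestamp, watts))

    def latest(self, meter_id):
        """Return the newest buffered reading for a meter, e.g. for a live dial."""
        readings = self.pending.get(meter_id)
        return max(readings) if readings else None

    def flush(self):
        """Sort each meter's readings by time, average them, hand them off, clear."""
        for meter_id in sorted(self.pending):
            readings = sorted(self.pending[meter_id])
            average = mean(watts for _, watts in readings)
            self.sink(meter_id, readings, average)
        self.pending.clear()


# Usage: readings arrive out of order and are written out in order.
stored = []
buffer = ReadingBuffer(lambda meter, readings, avg: stored.append((meter, readings, avg)))
buffer.add("house-2", 120, 300.0)
buffer.add("house-1", 60, 2000.0)
buffer.add("house-2", 60, 100.0)
live_value = buffer.latest("house-2")
buffer.flush()
print(live_value)
print(stored)
```

In a real deployment the flush would run on a timer and write to the database in bulk. The point of the sketch is only that sorting and aggregating in memory turns random arrivals into ordered, per-meter batches.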