GPU database serves up analysis of tweets, other data feeds

As a student, Todd Mostak took on large-scale tweet analysis of historic events in the Middle East. Today, he leads startup MapD, which offers a database built on graphics processing units.

In 2012, Todd Mostak was working on his thesis at Harvard University, running computer analysis of reactions to the Arab Spring uprisings that had begun two years earlier across the Middle East. After running into difficulties, such as handling large volumes of social media data and getting processing time on Harvard servers, Mostak began to consider the graphics processing unit as a vehicle for Twitter data visualization.

That led him on a path to developing a GPU database. GPUs were relatively easy to obtain, having become widely available on add-in cards for computer gaming, and they offered extraordinary memory bandwidth in comparison to general-purpose CPUs.

The work Mostak was doing required creating computer visualizations of Twitter data. The visualizations would depict the ebbs and flows, eddies and currents of sentiment in the troubled region and allow users to drill down to the level of individual tweets. He saw the bandwidth-rich GPUs as a good fit and capable of handling much more than just data from Twitter.

Eventually, Mostak set out to build a company around the idea of a specialized database management system tailored to run on GPUs. In 2014, he and his colleagues estimated the system could run analysis on over 1 billion rows of tweet data in tens of milliseconds. The vision recently took shape as a product when his company, MapD, released its namesake GPU database and analytics platform at this spring's Strata + Hadoop World 2016 conference.

Low-level tuning

Mostak's team has fine-tuned the MapD platform by caching active data in GPU memory, compiling queries on the fly using the Low-Level Virtual Machine (LLVM) framework and creating a system that can support vectorized queries when possible.
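As a rough illustration of on-the-fly query compilation, the Python sketch below generates a specialized function for each distinct filter and caches it for reuse. This is an analogy only: it uses Python's built-in `compile` and `exec` in place of LLVM, and the names `compile_filter`, `ALLOWED_COLUMNS` and `kernel` are hypothetical, not part of MapD's product.

```python
from functools import lru_cache

# Hypothetical column whitelist; nothing here reflects MapD's actual schema.
ALLOWED_COLUMNS = {"lang", "retweets"}
ALLOWED_OPS = {"==", ">", "<"}


@lru_cache(maxsize=128)
def compile_filter(column, op, literal):
    """Generate and compile a counting kernel once per distinct predicate.

    The predicate is baked into the generated source, so the kernel does
    not re-interpret the query for every row. Repeated queries hit the cache.
    """
    if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
        raise ValueError("unsupported predicate")
    source = (
        "def kernel(values):\n"
        f"    return sum(1 for v in values if v {op} {literal!r})\n"
    )
    namespace = {}
    exec(compile(source, "<generated-query>", "exec"), namespace)
    return namespace["kernel"]


# Roughly: SELECT COUNT(*) FROM tweets WHERE retweets > 10
retweets = [0, 12, 7, 150, 3, 42]
kernel = compile_filter("retweets", ">", 10)
matches = kernel(retweets)
print(matches)
```

Calling `compile_filter` again with the same arguments returns the cached function rather than regenerating it, which is the point of compiling once and reusing the result.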

MapD's product is a columnar database specifically tailored to run SQL queries in parallel across GPU cores. The goal is to deliver immediate visual insights into complex data sets, according to Mostak, who is MapD's CEO. He said the GPUs serve both to analyze the data and to render it for users to view.
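To make the columnar, parallel idea concrete, here is a minimal Python sketch of a query split into chunks that are scanned independently and then combined. It is a CPU-side analogy under stated assumptions, not MapD's implementation: the `tweets` table, `scan_chunk` and `run_query` are hypothetical names, and Python threads stand in for GPU cores without delivering real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical columnar table: each column is stored as its own list,
# so a query touches only the columns it needs.
tweets = {
    "lang": ["ar", "en", "ar", "fr", "ar", "en", "ar", "en"],
    "retweets": [5, 2, 9, 1, 4, 7, 6, 3],
}


def scan_chunk(start, stop):
    """Partial COUNT and SUM(retweets) over rows [start, stop) where lang = 'ar'."""
    langs = tweets["lang"]
    retweets = tweets["retweets"]
    count = 0
    total = 0
    for i in range(start, stop):
        if langs[i] == "ar":
            count += 1
            total += retweets[i]
    return count, total


def run_query(num_workers=4):
    """Roughly: SELECT COUNT(*), AVG(retweets) FROM tweets WHERE lang = 'ar'."""
    n = len(tweets["lang"])
    step = -(-n // num_workers)  # ceiling division
    bounds = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda b: scan_chunk(*b), bounds))
    # Combine the per-chunk partial aggregates into the final answer.
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, (total / count if count else None)


count, average = run_query()
print(count, average)
```

Because each chunk produces only a small partial result, the final combine step stays cheap no matter how many workers scan the data.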

The work on the SQL columnar database that underlies the system began at MIT, where Mostak had gone to join the Computer Science and Artificial Intelligence Laboratory, working with noted database engineer Michael Stonebraker.

"I realized computer science might be a better fit for my interests," Mostak said.

Data in, insight out

An early adopter described the MapD package as a combination of visualization and processing power especially suited for GPUs. Abdul Subhan, a principal architect at Verizon Communications Inc., suggested MapD could be useful in "any use case where you have tremendous amounts of data, but need an answer fast." He estimated that the product can perform a 3.2 billion-row data set query in milliseconds.

Subhan's present use cases range from network operations to tracking the status of software updates on devices, although he contemplates future uses in ad campaign tracking, as well.

"The database is fast, because it is using the true power of the GPUs, so the data is available almost immediately to the processors," he said.

He indicated that MapD's SQL interface had advantages compared to Hadoop-based products, as the latter require very specific programming skills and knowledge of programming languages. By comparison, MapD's front end supports typical data load styles that should be familiar to working database administrators and system administrators. Subhan evaluated the product with an eye toward cost per unit of power and space consumed versus query speed. Overall, "it's a small footprint," he said, suggesting that configuring GPUs in 2U servers can significantly reduce hosting requirements.

Analyst group Gartner has given good grades to MapD, as well, including the company in its list of "Cool Vendors in DBMS, 2016." In the report, Gartner analyst Nick Heudecker said users looking for systems with situational awareness in the face of quickly arriving data should consider this GPU database. At the same time, he noted challenges that MapD faces as it reaches into organizations unfamiliar with GPUs.
