Business Information

June 2017, Vol. 5, No. 3

Big data queries eyed to head off Hadoop performance problems

Data isn't the only thing that needs to be governed in big data systems. The queries run by data scientists and other users also have to be watched to make sure they don't bog down processing in Hadoop and Spark clusters.

Hadoop performance problems became an issue at BT Group PLC after use of its data lake environment started rising rapidly in early 2016 as production applications began proliferating. "We had a bow wave of demand from users," said Jason Perkins, head of business insight and analytics architecture at the London-based company. Eventually, the communications and TV services provider had to "close the doors" to new users for a few months while it added more compute nodes to the Hadoop system, Perkins said.

Properly balancing the "very mixed workload" of big data processing jobs remains a challenge, he added. And it could become a greater challenge -- BT plans to expand the number of applications in the cluster from about 100 as of April to 500 by year's end.

A fix for what ails Hadoop queries

Carl Steinbach ...
