
Data wrangling a key to meeting civilization-scale challenges

Agriculture, social media, IBM's Watson system and other global topics were discussed at EmTech MIT 2014, with big data wrangling often at the fore.

Failure to take a systematic view of threats to future agricultural productivity has led to a "whack-a-mole" approach to one of the world's most pressing challenges, Molly Jahn told attendees at last week's EmTech MIT 2014 conference in Cambridge, Mass. As Jahn's presentation and others at the event showed, data integration and analytics acumen can be part of that systematic view.

According to Jahn, a University of Wisconsin agronomy professor and former acting deputy undersecretary of research at the U.S. Department of Agriculture, looming pressures on sustainable food crops could be addressed with more comprehensive data wrangling strategies: that is, strategies for herding and caring for data of all types.

Jahn said agriculture and climate scientists could benefit from multidisciplinary, real-time systems that visualize land and water use and population migrations, much as commercial systems support visualization of global financial market data or electrical grid operations. More data on food supply chains is becoming available to help make that possible, she added.

"We are harvesting new kinds of agriculture information with the advent of cheap micro-satellite operations," Jahn said.

Still, Jahn has noted that the massive data sets now being collected on agriculture present challenges of their own. More and more data is being gathered on soil moisture, weather and crop conditions, but new storage techniques, analytical methods and search algorithms are required to make use of it.

Detecting disease through data

Successfully sorting through social media feeds is intrinsic to the work of Rumi Chunara, a Harvard Medical School researcher who spoke to the EmTech audience about mining new data sources to track disease outbreaks. Her work examines whether Web activity can help identify outbreaks of contagious disease, still one of the biggest problems the planet faces, more quickly and precisely.

Doctors seeing patients in their offices can gradually sense that a flu epidemic may be under way, Chunara said, but sometimes that realization comes late in the disease cycle. Through Twitter feeds and Facebook posts, the public can contribute information that gives a better view of public health, she said.

The growth of such unstructured social media data represents the biggest change to occur in the field of bioinformatics in recent years, Chunara said in an interview after her presentation. But analyzing such data remains something of an art, as social noise can obscure the true signal.

When Chunara and fellow researchers looked at Twitter messages during a 2010 cholera outbreak in Haiti, they discovered that early in the outbreak, tweets correlated well with Haitian ministry reports. But as time went on, the correlation weakened as the Twitter community moved on to other issues.

"Later on [in the cycle of an epidemic], you have to go back to other types of data," she said -- for example, official government data familiar to health researchers.

Chunara also emphasized that social media analytics models need to be continually reviewed, pointing to Google Flu Trends as an example of an epidemic tracking application whose models did not keep up with changes in users' behavior. "Social data is not straightforward," she said. "The major lesson is that it is not going to replace our regular ways of using data."

Watson, heal thyself

A growing glut of data can itself cause problems in some sectors, according to Mike Rhodin, senior vice president of the IBM Watson Group. At EmTech, he outlined IBM's recent experiences applying the cognitive computing capabilities of Watson, the system that defeated two Jeopardy! champions in a 2011 challenge match, to problems that the data glut tends to obscure.

Rhodin took part in an interview led by Jason Pontin, editor-in-chief and publisher of MIT Technology Review magazine, which put on the event. Rhodin discussed uses of Watson in insurance, cancer research, consumer products and other areas, joking that "there is not a lot of commercial application to playing Jeopardy!"

IBM's Watson czar was plainspoken about the obstacles Watson faces as it goes about its mission. Handling context, such as sustaining a continuous dialog with a user, remains a challenge for Watson, just as it has been for artificial intelligence systems throughout the history of AI.

"The challenge of deep Q&A was the first challenge that we took on with Watson," Rhodin said. "What we discovered really was it was just the first building block." Since then, the company has also added reasoning and summarization engines to the system to try to expand its potential usefulness.

He also said that IBM is tuning Watson to successfully "carry context from question to question," likening the process of adding to the Watson core to the build-out of operating systems in the early days of computing, when file systems, databases and other modules came to join the basic software core.

Jack Vaughan is SearchDataManagement's news and site editor. Email him at jvaughan@techtarget.com, and follow us on Twitter: @sDataManagement.
