Big data became something of a household term last year, but it did so in a swirl of controversy. Data snooping by the National Security Agency and data breaches at Target stores were just two of the prominent news events that took some of the wind out of big data's burgeoning sails. That wasn't necessarily bad, though, as the storm of hype around big data technologies was threatening to burst those sails at the seams.
Data processing has a long history at this point, but the recent controversies suggest that it may be entering a new era -- one in which information ethics will become a top-of-mind topic. While most of our coverage here is about the nuts and bolts of data management, we're also hearing from people on the IT and application front lines who are thinking about the ethics of data, perhaps more deeply than in years gone by.
Big data applications have something to do with that; so do the open data initiatives being launched by many government entities. But like almost any political, philosophical or ethical issue, the challenge doesn't present itself in stark black and white. Data professionals have to find a balance between making data open, protecting individuals' privacy and -- in businesses, at least -- using data to make money.
Longhorn data gets freer range
Privacy and access to data are dual concerns for Stephanie Bond Huie, vice chancellor of strategic initiatives at the University of Texas in Austin. She helped lead an effort to open up data for measuring the UT system's performance across its various academic and health services institutions, but at the same time she had to be mindful of protecting sensitive data that isn't meant to be seen by public eyes.
Huie told me that she and her colleagues have transformed their approach to delivering data analytics. Where once-a-year "report books" previously held sway, highly interactive and up-to-date data analysis dashboards became the rule. To meet the challenge, her group implemented a data warehouse fronted by SAS Institute Inc.'s analytics software. By using SAS Visual Analytics, Huie and her team are enabling citizens to see information such as out-of-state versus in-state enrollment for different academic departments.
Last month, the university also announced the launch of a website that provides salary and student-loan debt statistics on students one year and five years after graduation -- certainly information of economic interest as the cost of education inexorably rises. But data about the students themselves must be masked.
"The issue for us has been making sure we are protecting the individual student data," Huie said. "It was important that we had secure procedures in place, to make sure people can't hack into the site." For example, the production servers that push data to the website don't hold any student data.
The drive to open up the data on the university's performance wasn't without controversy. The effort came in part at the behest of forces for political reform backed by Republican Gov. Rick Perry. The proponents pushed hard to bring the data to light, including politically sensitive metrics like the workloads of professors. UT's chancellor, Francisco Cigarroa, has embraced measured initiatives of data openness while at the same time asserting university autonomy.
Information ethics on the back burner?
Huie's efforts are clearly conscious of the real people behind the data points. Will that type of thinking become pervasive? That's what I was wondering when I met up with David Wells, an independent consultant at Seattle-based Infocentric and an instructor for The Data Warehousing Institute, at a TDWI conference late last year.
As so often has been the case of late, the NSA and its digital "hoovering" were in the news. Funny -- the surveillance agency is being scrutinized for activities that sometimes aren't very different from the practices of top Web companies that collect massive amounts of information as part of their big data business strategies. That was the backdrop as I asked Wells if there really is such a thing as a data profession, with accepted norms of data ethics.
"It's not a profession yet," he said, adding that he doesn't think there's enough interest in ethical considerations to qualify it as one. "There are a handful of articles about the ethics of data, but there should be many. There should be books, and classes. What we have to do is to get people to understand that it matters."
Clearly, the issue is close to Wells' heart. In an article published last year in TDWI's Business Intelligence Journal, Wells and his late co-author, James B. Thomann, outlined the dual ethical and legal issues brewing in business intelligence. The scope for BI, IT and business managers, Wells and Thomann wrote, spans how they guide the conduct of employees and how organizations collect data and use the intelligence derived from it.
Unfortunately, as Wells points out, it is crises that most often drive people's attention. The "NSA fiasco," he remarked, may be what is needed to bring the issue of balancing data access and privacy to a head.
Coming months will tell how much attention stays focused on information ethics issues. Data and analytics are playing an increasingly pivotal role in the rush to innovation, as well as in the drive to succeed in commerce. Sometimes, however, it's necessary to pause and consider the social implications of data collection, access and usage -- if the term "data professional" is to ring true.