Insights based on user-generated data have been a cornerstone of many organizations' business strategies for some time. But with more data being collected -- and more insights being pulled from that data -- than ever before, managing that information responsibly becomes increasingly difficult.
Indeed, the majority of organizations are behind the curve when it comes to data management ethics, according to James Cotton, international director of the Data Management Centre of Excellence at Information Builders, based in New York. And technology can only get organizations so far in ensuring data management ethics, Cotton said. An organization-wide cultural shift is needed, as well.
In this Q&A, Cotton discusses the biggest challenges around data management ethics and details steps organizations -- and data professionals specifically -- can take to ensure they're gathering, managing and analyzing data responsibly. The first step, he said, is simple: Understand where the data is coming from, how it's being used and who has access to it.
Editor's note: The following transcript has been edited for clarity and length.
How can organizations ensure ethical data management? What steps can they take?
James Cotton: I would say the steps an organization can take are, first of all, to look closely at what data they have, how the data is being used and what decisions or services they ultimately would like that data to be part of.
Ethical use of data doesn't sit with a single individual within an organization. It has to be mandated from the top. As you might expect, you often see that companies hoard a lot of data -- they record a lot of data, and they're not quite sure why they're recording it. All they know is that they might not have a use case for it today, but they might two, three, four or five years down the line.
That generally tends to create problems. In order to reduce risk, just physically not having access to something you don't need is a very simple first step. The second step, of course, is understanding what data is actually useful. I think you'd be surprised that the majority of companies have absolutely no idea where their data comes from, how it's being used within their organization or where it actually ends up.
What is the role of data professionals in data management ethics?
Cotton: If you'd classify yourself as either a data professional or a data scientist, [data management ethics] is definitely something you should be aware of. You should know what data you have at your fingertips, where that information came from and how it was transformed on its journey to your desk, where you do your analysis.
For the most part, I would say that the majority of advanced scientific analysis of data is pretty harmless. However, very few people have a real understanding of where the data that analysis is based on came from. Actually knowing where it came from is something that most organizations are very ill-equipped to do.
Just to give you an example: The fact that someone has red hair is not something we consider personally identifiable information. But if you also record the fact that that person lives in a village in the north of Finland and the village only has 20 people, then potentially that data has become personally identifiable. So, it's not just the information; it's the context within which it is being used.
And, by their very nature, data scientists tend to sit at the end of that chain. They might be part of preparing data sets, but they're really focused on getting results out of data. Everything that happened to the data before that point generally isn't their focus.
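Cotton's red-hair example can be made concrete with a small sketch. The Python below counts how many records share each combination of attributes and returns the size of the smallest group; a group of one means the combination singles out a person. The records, the field names and the `smallest_group` helper are all hypothetical, invented for illustration, and this is only one simple way to look at the problem, not a method Cotton describes.

```python
from collections import Counter

# Hypothetical records; field names and values are made up for illustration.
records = (
    [{"hair": "red", "place": "city"}] * 3
    + [{"hair": "brown", "place": "city"}] * 4
    + [{"hair": "red", "place": "village"}] * 1
    + [{"hair": "brown", "place": "village"}] * 2
)


def smallest_group(rows, fields):
    """Return the size of the smallest group of rows that share
    the same values for every field in `fields`."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return min(counts.values())


# Hair color alone: every value is shared by several people.
hair_only = smallest_group(records, ["hair"])

# Hair color plus place: one combination now points at a single person.
hair_and_place = smallest_group(records, ["hair", "place"])
```

In this made-up data, hair color on its own never narrows the set below four people, while hair color combined with place isolates one record. That is the sense in which context, not just the attribute, makes data identifiable.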
What are some of the biggest data management ethics issues that can arise in an organization?
Cotton: I could have one data set that has lots of attributes that do not uniquely identify an individual, but when I combine it with a different data set that has attributes that also don't uniquely identify people, the combination might well be able to.
So, what we tend to see is that organizations are starting to quantify this data and score it based on relevancy, accuracy and a few other metrics in order to know which combinations they can safely make and which they can't. Even if I make combinations that aren't safe at an individual level, as long as I only report on them at a summarized or aggregated level, I'm probably still within the law when it comes to what I am allowed to do with data.
The majority of organizations are trying to attack this problem based on the current laws, rules and regulations that are out there. Europe might be a little further down the line than some of the other countries. But those rules and those regulations are years behind where the technology is at the moment and probably always will be.
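The idea of combining data sets but reporting only at an aggregated level can be sketched as follows. This Python example joins two hypothetical data sets on a made-up `pid` key, counts rows per attribute combination and drops any group smaller than a threshold. The data, the `join_on_key` and `aggregate_report` helpers and the threshold of 3 are assumptions for illustration only; an appropriate threshold depends on the data and the applicable rules, and nothing here is legal guidance.

```python
from collections import Counter

# Hypothetical data sets keyed by a made-up person ID ("pid").
demographics = {
    1: {"age_band": "30-39"},
    2: {"age_band": "30-39"},
    3: {"age_band": "30-39"},
    4: {"age_band": "40-49"},
}
locations = {
    1: {"postcode": "N1"},
    2: {"postcode": "N1"},
    3: {"postcode": "N1"},
    4: {"postcode": "N1"},
}


def join_on_key(left, right):
    """Merge the records of two data sets for every key present in both."""
    shared = sorted(left.keys() & right.keys())
    return [{**left[k], **right[k]} for k in shared]


def aggregate_report(rows, fields, min_count):
    """Count rows per combination of `fields`, keeping only groups
    with at least `min_count` members."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {key: n for key, n in counts.items() if n >= min_count}


combined = join_on_key(demographics, locations)
# The threshold of 3 is an arbitrary illustrative choice.
report = aggregate_report(combined, ["age_band", "postcode"], min_count=3)
```

Here the single person in the 40-49 band at that postcode is left out of the report, while the group of three is reported as a count. The row-level join still exists internally, which is why access to the combined data matters as much as what is published.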
How does organizational culture need to change to ensure ethical data management?
Cotton: Organizational change is at the heart of ethical data management. Though technology can play a large part and is a great enabler of the foundations required for data management, it remains mainly a human exercise in which every part of the organization needs to know that it's a shared responsibility. As I said, a great starting point is often knowing what data an organization has, where the data came from, who has access to it, how it moves through the company and for what purpose, and how it's being transformed.
Limiting access to data so people only use what they need to get their job done can help avoid many data-related issues further down the line. Though this might sound simple, it can be a challenge for many companies due to the dispersed nature of the data we have and the way we work with it. When designing new processes or adapting existing ones, it pays off to design for maximum transparency from day one. This is a challenge worthy of an organization's best minds.
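As a rough illustration of limiting access to what a job requires, the sketch below filters a record down to the fields a given role is allowed to see. The roles, the field names, the `ALLOWED_FIELDS` mapping and the `view_for` helper are hypothetical; real systems usually enforce this in the database or access-control layer rather than in application code.

```python
# Hypothetical mapping from role to the fields that role needs.
ALLOWED_FIELDS = {
    "analyst": {"age_band", "region"},
    "support": {"name", "email"},
}


def view_for(role, record):
    """Return only the fields of `record` that `role` is allowed to see.
    Unknown roles get an empty view."""
    allowed = ALLOWED_FIELDS.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}


# Made-up record for illustration.
record = {
    "name": "Example Person",
    "email": "person@example.com",
    "age_band": "30-39",
    "region": "north",
}

analyst_view = view_for("analyst", record)
unknown_view = view_for("contractor", record)
```

Defaulting unknown roles to an empty view is a deliberate choice in this sketch: access has to be granted explicitly rather than removed after the fact.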