CAMBRIDGE, Mass. -- Different forces are at work in data management today, as a push for data innovation is tempered by an equal push for data security and compliance. Chief data officers are tasked with finding equilibrium.
These and related topics were on the docket recently when we caught up with Joe Caserta at the 2016 MIT Chief Data Officer & Information Quality Symposium here this month. Among the overriding forces that concern Caserta, the founder and president of New York-based consultancy Caserta Concepts, are real-time data processing, software development and big data analytics. Our conversation at the event turned to these matters and, naturally, to the status of the chief data officer job.
Do you see challenges to the emerging chief data officer job, which seeks to balance compliance mandates with analytics innovations?
Joe Caserta: Well, there is a shift underway that I am seeing -- organizations are becoming more analytics-centric. We are seeing it as a corporate initiative, rather than just some departmental initiative. If you speak with chief data officers and the like, their work is often very much about governance, security and compliance problems. But none of that is really new. People had been struggling with it, and they are still struggling with it.
What's really new is the analytics portion of the job. It's a challenge for any individual as it requires a different part of the brain than other data management tasks do. And I'd say when analytics is owned by the chief data officer, it's not necessarily successful. The CDO needs either a peer or a subordinate that owns the analytics side.
Sometimes it's described in terms of offense and defense -- that a CDO is tasked to attack on the data innovation side, but to defend on the compliance and governance side.
Caserta: There is some truth to that. We try to guide people on how to strategically advance their business through analytics, and what we usually find is that most IT departments are on the defense. They are putting out fires and keeping the lights on. If that is the environment, it is very tough to be progressive. As they say, it is hard to practice fire prevention when you are busy putting out fires.
Still, the CDO's job is becoming more popular, because you need someone removed from the headaches of running an IT shop if you want to think strategically about your data.
If you can make the investment in a separate role or a separate division that thinks purely about data, and not about applications, development, support and infrastructure, I think you have a better chance of success.
There is another trend that has gained influence, and that is DevOps. How did we get here?
Caserta: We forget, but, before big data and analytics became the mainstays, shops would take all of their data out of transactional systems, build a data warehouse, do some data cleansing and run some reports and, maybe, if you were really, really good, that could become the golden copy of your data, which you could send back to your applications. That's what we called the closed loop. It was data warehouse nirvana.
But the IT and application development groups would have their release cycles, and the data warehouse group would have its release cycles. Never the two would meet, and they didn't really care about each other.
Now, the big data platform has really become the back end of some of the applications, especially for analytics like recommendation engines and applications that measure customers' propensity to buy. Now the two worlds are starting to come together. DevOps is becoming mandatory. It's not optional anymore.
A challenge, though, is having multiple levels of data quality, right?
Caserta: That is the data quality tightrope you have to walk along. One of the things that slowed progress in data analytics was the need to have data perfection before the data could be used. That was really stifling 'Corporate America.'
That needs to change. One takeaway I'd like people to have is that regulatory requirements for reporting your numbers to Wall Street are not the same as the governance requirements for doing data discovery and data exploration.
We have to start thinking about the use case, and how to do just enough governance. It's significant. We've worked with people looking at doing natural language processing on doctors' notes in order to try to predict suicide attempts, and to do suicide prevention. They couldn't do it because the data governance officer in the company said that is not an allowed use for the data.
We need to start changing the way we think about governance, and to think about data discovery as more of an allowed use.
For fun, I have been thinking of starting a committee to establish a definition of real time. Yesterday I heard you say there is no such thing.
Caserta: Sure. Real time depends on your perception and your experience. Take a look at real time at Facebook. When you post something onto somebody's page on Facebook, they don't get it the same second. It's going to show up in a few minutes. And that's good enough.
The most important distinction, however, is between real time and batch. If you are doing batch, it executes a process, it runs and then it finishes. You can run that continuously, and even if it's perceived as real time, it's still batch.
What should not be obscured is one thing that is definitely happening. And that is: The line between the transaction platform and the analytics platform is getting really blurred. Most of the time today, calculating predictive analytics is still done in batch, it's true. But being able to serve up those [analytics] scores, depending on what page your customer is seeing, what product or category they are viewing, or what their peers are doing, that requires real-time processing. That's what's new. The technologies are changing to make the analytical and operational platforms more tightly coupled.