Proprietary data management software vendors like Informatica Corp., DataFlux and others will increasingly add "big data" processing capabilities to their portfolios in the near future, according to Forrester Research Inc. -- and that means now is a good time for organizations to figure out if they stand to benefit from the technology, the Cambridge, Mass.-based IT analyst firm says.
SearchDataManagement.com wanted to learn more about the upcoming influx of big-data processing technologies and got on the phone with Brian Hopkins, a principal analyst with Forrester and one of the authors of a new research paper called Big Opportunities In Big Data. Hopkins explained why big data is actually a misnomer and talked about the definition and potential benefits of big-data processing. He also had some advice for organizations that want to determine whether big-data processing tools are right for them. Here are some excerpts from that conversation:
How will big-data processing technology change the IT industry in the near future?
Brian Hopkins: I think what technology consumers are going to see over the next 12 to 18 months is a whole rush of vendors incorporating big data capabilities into their technologies [and some] have already actually been doing it. They're approaching the point where they're going to start releasing versions of products that have that capability and so [consumers of technology are] going to be asking themselves, "Is this important to me?" I think that it's really important to come up with an initial view of what [big-data processing] means, [why] it's important and what [organizations] should do about it right now while the technology is still in the very early stages.
Why do you think the phrase big data is a misnomer?
Hopkins: It's a bit of a misnomer because people [equate big data to] big volume. [But] it's really about volume, velocity, variability and variety. [Velocity obviously refers to] how quickly the data comes at you and so that incorporates into the scope of big data the notion of capturing a stream of data. Then high variability and high volume are also issues. [For example, there] may be a variety of formats [as opposed to] just one relationally structured data set. You could have data from a Web log, unstructured content from the Internet, content files that are tagged with metadata and hierarchical file systems. [The concept of big data addresses] how you deal with this variety of formats and how you draw meaning from them.
How does variability come into play in big-data processing?
Hopkins: When I say variability, I mean variance in meaning, in lexicon. The best example of that would be the variability problem that the [supercomputer] Watson at IBM was trying to take on. [Watson] would get an answer and would have to dissect that answer into its meaning and then use some really sophisticated parallel processing technology to try to figure out what the right question was within that three-second response time.
Why will big-data processing technology be attractive to some organizations?
Hopkins: The thing I keep coming back to is that [big-data processing] is about making decisions earlier in the process. Firms that can consume a lot of data make some decisions about what they should do faster than firms that can't. The typical decision-making process in most firms that have a data warehouse and [business intelligence] infrastructure [goes like this]: Let's go capture some data. Let's integrate that data together and put it in a warehouse. Let's put some analytics on top of that. [Then] after we've done the analytics, let's make some decisions and then go execute those decisions. The problem with that approach is integration is expensive. It takes time, it takes resources and by the time you get to the decision-making [part] it may be too late. [Big-data processing technology] says, "Hey, let's enable you to try to change the model up. Let's enable you to capture some data and iteratively discover what's in that data and maybe make some early decisions."
What recommendations do you have for organizations as they assess the need for big-data processing technology?
Hopkins: The first thing we're recommending is that you as enterprise architects look at what big data is, draw some conclusions yourself about what it means [and] then have a dialogue with your business about the capabilities that it brings, because the biggest challenge with big data is not the technology. [We've] been doing high performance computing and massive parallel processing for years so it's not that new, but what you really need to do is have a conversation with your business about the way it changes the processes you go through to make decisions based on information.
What else should organizations interested in big-data processing think about?
Hopkins: We're encouraging clients not to forget that [big-data processing technology is] a tool in a larger information management set of tools. Capturing large or rapidly streaming, high-variable, high-volume [data] introduces the same issues that you have with other kinds of information: Who owns it? How do you assure quality? What are the security implications of it? Do you need to keep this data for compliance reasons? [There] are all these issues that don't go away that are very similar to other information management issues. So we're encouraging our clients to really think about big data as just another important component of their larger information management approach.