Applications, networks, servers, handheld devices, laptops and pretty much every other machine in the IT infrastructure generate massive amounts of data or “events” that can be automatically logged into files for future reference.
Many organizations see these log data files as useless and either delete them, let them pile up in a server somewhere or fail to save them in the first place. But Kord Campbell, a former Splunk executive and the co-founder and CEO of San Francisco-based startup Loggly Inc., thinks that is sure to change.
Campbell says more companies -- and especially cloud-based software providers -- are becoming increasingly interested in
SearchDataManagement.com recently got on the phone with Campbell, who calls his new Software as a Service (SaaS) company a search engine for machine-generated data, to learn more about log file management. Campbell talked about the history and origins of log files and explained why he considers them “the original big data.” Here are some excerpts from that conversation:
Where did the phrase ‘log file’ come from?
Kord Campbell: Back in the day the Portuguese were famous for exploration and a lot of that centered on trying to find new routes to gain quicker access to spice—because spice was the “big data” of the day if you will, a commodity that everybody wanted to get their hands on. But whenever [the Portuguese explorers] would go out and sail around, they had this problem: They didn’t really know where the hell they were. And that’s kind of important when you’re trying to find something.
That’s true. But what does this have to do with log files?
Campbell: One of the things [the explorers did to combat this problem] was take a log off a tree and put it on the deck of a ship. They would then chip off part of this log and throw it in the water and [keep track] of how long it would take for the chip to float past the length of the ship. This helped them determine how fast they were going, and knowing how fast you are going is very valuable in knowing where you are in the world. They started keeping a record of that and it became the log book. [Later, when the] computer age came about, some guys wanted to capture some stuff that happens across time on the machine, and somebody said, “We’ll call it a log file.”
What does a typical log data file look like today?
Campbell: A log data file is literally a bunch of lines in a file. It’s a text file that you can read. Each line is typically prefixed with a date stamp, followed by whatever text you want to put in the log line. Sometimes we refer to it as ad hoc data or unstructured data, but in reality a lot of times log files are highly structured. They have a very specific structure because what generates the log files themselves is software. There’s a wide variety of use cases of log files and an awful lot of different types of logs coming out of different machines.
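To make that description concrete, here is a minimal Python sketch of reading one such line: a date stamp at the front, then free text. The timestamp layout ("YYYY-MM-DD HH:MM:SS") and the names `LogEntry` and `parse_line` are assumptions chosen for illustration; Campbell does not specify a format, and real log formats vary widely from one piece of software to the next.

```python
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed layout for this sketch: "YYYY-MM-DD HH:MM:SS <free text>".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
TIMESTAMP_LENGTH = 19  # number of characters in a stamp of that layout


class LogEntry(NamedTuple):
    timestamp: datetime
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its leading date stamp and the remaining text.

    Returns None when the line does not start with a stamp in the assumed layout.
    """
    line = line.rstrip("\n")
    try:
        timestamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
    except ValueError:
        return None
    return LogEntry(timestamp, line[TIMESTAMP_LENGTH:].strip())
```

Under these assumptions, a line such as `2011-07-14 09:30:01 ERROR disk full` would yield a timestamp plus the message text, while a line with no leading date stamp would be skipped.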
Is it fair to say that many companies today aren’t doing much in terms of log data management and analysis?
Campbell: Yes. Large numbers of companies don’t actually log but they will start logging because they are going to be required to do that to improve the health of their business. But large numbers of companies, especially SaaS, Platform as a Service and Infrastructure as a Service companies, and the users of those services are all doing logging today.
Why do you refer to event log files as the original big data?
Campbell: Log file data is immense, often phenomenal. Did you know that [BlackBerry maker Research In Motion] generates about 38 terabytes (TBs) of log file data a day? And [online gaming company Zynga Inc.] generates about 10 TBs a day. They’re probably going to be at about 100 TBs a day within a few years because they’re announcing an initial public offering and will want to understand how people use their applications better. They need to crank up the logging to do that. Today they use that big time data -- and it is all time series data -- for operational management [and determining] the health of their applications. They also do alerting and monitoring with it so that if things break, they’re able to alert someone.
Could you give me another example of a use case for log data management?
Campbell: I always have to ask people what they do with logs because there are hundreds, and probably millions, of use cases for log files. One guy I talked to was actually taking logs out of wind turbines scattered all over Oklahoma, Texas, Colorado and Kansas. I’m talking about thousands of wind turbines spinning out there with servers in them recording the health of the windmill, how fast it was going and which direction it was pointing. All of the ad hoc, unstructured data and sometimes structured data coming out of these machines needed to be put in one location. And that’s the kind of problem that we solve. We put it all in one spot and make it searchable so you don’t have to go to a zillion different servers looking up what you need to look up.
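The idea of putting log lines from many machines in one spot and searching them can be illustrated with a small Python sketch. It is not a description of how Loggly works; the function names `collect` and `search` are hypothetical, and the sketch assumes every line starts with a "YYYY-MM-DD HH:MM:SS" date stamp so that plain text sorting puts lines in time order.

```python
from typing import Dict, Iterable, List, Tuple

# One collected record: (name of the machine it came from, the raw log line).
Record = Tuple[str, str]


def collect(sources: Dict[str, Iterable[str]]) -> List[Record]:
    """Merge log lines from many machines into a single time-ordered list."""
    combined: List[Record] = []
    for name, lines in sources.items():
        for line in lines:
            combined.append((name, line.rstrip("\n")))
    # Assumption: lines begin with "YYYY-MM-DD HH:MM:SS", which sorts
    # chronologically when compared as plain text.
    combined.sort(key=lambda record: record[1])
    return combined


def search(records: List[Record], term: str) -> List[Record]:
    """Return every record whose line contains the term, ignoring case."""
    needle = term.lower()
    return [record for record in records if needle in record[1].lower()]
```

With each turbine's lines passed in under a made-up machine name, a single `search(records, "fault")` call would scan everything at once instead of requiring a visit to each server. A production system would index the data rather than scan it line by line.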