The owners of the popular travel site Expedia.com are using on-premises software from Splunk Inc. to help make sense of vast amounts of machine-generated data, according to a company official who spoke last week at the O’Reilly Strata Conference in Santa Clara, Calif.
While he didn’t list names, Eddie Satterly, the senior director of architecture and engineering at Expedia Inc., said his team evaluated three on-premises packages before ultimately turning to Splunk for machine data management tools. Satterly said the decision came down primarily to Splunk’s business-user-friendly interface and its ability to scale quickly on commodity hardware.
The savings associated with cloud computing and other hosted offerings have been well documented in recent years, but Satterly said he opted not to consider hosted software for this particular job. Loggly Inc. is one example of a vendor that provides hosted software for managing machine data.
“At our core, we’re really a technology company,” Satterly said, “and there is so much [intellectual property] and business intelligence data that would have had to be put into the solution that we chose not to even look at hosted.”
Machine data -- which has been called the original “big data” -- includes all of the information generated by all of the machines that run enterprises, according to Sanjay Mehta, the vice president of product marketing at Splunk, who joined Satterly onstage at the conference.
“It’s the log files. It’s the histories. It’s the web server logs. It’s the information that is thrown off from network switches, applications, network devices, security devices and so on,” Mehta said. “It essentially holds a trace of all of the activity and behavior of your customers, transactions, machines [and many] other things.”
Splunk software collects machine data in real time from “virtually any source,” indexes the data and makes it available for searching, browsing, analysis and visualization. Mehta said organizations like Expedia typically collect and analyze machine-generated data to identify security threats or fraudulent activity; to identify patterns in consumer behavior that they can capitalize on; to monitor the progress of new products or services; and to generally provide higher levels of operational intelligence.
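To make the collect-index-search idea concrete, here is a minimal sketch in Python of an inverted index over log lines. It is an illustration only and does not represent how Splunk works internally; the `LogIndex` class, the `tokenize` helper and the sample log lines are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(line):
    """Lowercase a log line and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", line.lower())


class LogIndex:
    """Toy inverted index: maps each term to the IDs of events containing it."""

    def __init__(self):
        self.events = []                  # raw log lines, in arrival order
        self.postings = defaultdict(set)  # term -> set of event IDs

    def add(self, line):
        """Store one log line and index every term in it."""
        event_id = len(self.events)
        self.events.append(line)
        for term in tokenize(line):
            self.postings[term].add(event_id)
        return event_id

    def search(self, query):
        """Return the events that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.events[i] for i in sorted(ids)]


# Hypothetical sample data, made up for this sketch.
index = LogIndex()
index.add("2012-03-01 10:00:01 web01 GET /hotel/123.jpg 404")
index.add("2012-03-01 10:00:02 web02 GET /search 200")
index.add("2012-03-01 10:00:03 web01 GET /hotel/456.jpg 200")

results = index.search("web01 404")
```

In this sketch, the query matches only the first line, the missing hotel image served by `web01`. A production system would also have to handle time ranges, field extraction and data volumes far beyond what fits in memory.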
Expedia, which boasts roughly 4,000 technology workers, is currently using Splunk to collect and index about six terabytes of machine data per day. Satterly said that data emanates from about 27,000 servers, network switches, appliances and other devices.
“A year ago, there were a little over 20 tools that we used to manage machine data,” he said. “Some were in-house developed, some were external tools and some were open source -- but all were replaced in a three-month time frame with Splunk.”
From machine data to competitive advantage
Expedia says that one of the benefits of managing machine data has been an improved overall experience for users of its family of web sites.
Potential customers who log on to an Expedia site are less likely to spend money on vacations or business trips if -- for example -- a picture of a specific hotel doesn’t show up or site performance is degraded in some way. Expedia’s efforts to monitor server, application and a host of other logs have allowed it to detect such problems much more quickly than in the past.
“Now we detect those [problems] early on,” Satterly said. “We’ve instrumented all of the code to make sure that when this happens we’re immediately alerted.”
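The kind of early alerting Satterly describes can be sketched as a sliding-window error counter. The Python below is a hypothetical illustration, not Expedia’s instrumentation or a Splunk feature; the `ErrorRateMonitor` class, its thresholds and the sample events are assumptions made for the example, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class ErrorRateMonitor:
    """Fire an alert when too many errors occur within a time window."""

    def __init__(self, window_seconds=60, max_errors=3):
        self.window_seconds = window_seconds
        self.max_errors = max_errors
        self.error_times = deque()  # timestamps of recent errors, oldest first

    def record(self, timestamp, is_error):
        """Record one event; return True when an alert should fire."""
        if is_error:
            self.error_times.append(timestamp)
        # Drop errors that fall outside the window ending at `timestamp`.
        cutoff = timestamp - self.window_seconds
        while self.error_times and self.error_times[0] <= cutoff:
            self.error_times.popleft()
        return len(self.error_times) >= self.max_errors


# Hypothetical events: (seconds since start, whether the request failed).
monitor = ErrorRateMonitor(window_seconds=60, max_errors=3)
events = [(0, True), (10, False), (20, True), (30, True), (100, False)]
alerts = [monitor.record(t, failed) for t, failed in events]
```

Here the alert fires on the fourth event, when three errors have landed inside one 60-second window, and clears once those errors age out.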
Understand the user base before evaluating vendors
Organizations interested in managing machine-generated data need to keep their entire user base in mind when shopping around. It’s a lesson that Expedia learned well about two years ago when it was using a different product for monitoring log data.
Without naming the product, Satterly said the experience made the organization realize that communicating with business users is paramount when evaluating vendors.
“The tool was great for [developers] and operational guys, but the business people wouldn’t even log in to the interface,” Satterly said. “They weren’t interested. It didn’t do what they wanted.”
Expedia to continue expanding big data environment
Expedia is now in the process of integrating Splunk with its big data environment. The company currently runs the open source Apache Hadoop Distributed File System to store and analyze clickstream data and several other types of information.
The company also runs Cassandra, the popular NoSQL database, to collect “next-level detail from our applications,” which includes details about search and application usage patterns. Expedia will use Splunk and its interface as a gateway to its larger big data environment.
“We’re using Splunk to do basic SQL queries into the Cassandra data store [so that we can see] everything from the event log on the Windows server all the way to the application,” Satterly said.