New York-based affiliate marketing and lead generation company LinkShare Corp. is no stranger to dealing with large data volumes.
The company -- which decided to move its data warehouse environment from IBM DB2 to two separate Oracle Exadata database machines about two years ago -- currently manages more than 17 terabytes of raw data and about 10 billion rows, according to Michael Brandt, LinkShare’s director of business intelligence.
LinkShare matches online advertisers with publishers who run Internet-based banner ads and other marketing messages. LinkShare also provides both advertisers and publishers with a management portal that includes tools for running queries and business intelligence reports. For example, an advertiser might run a report and find out that a newly placed banner ad is performing poorly. The advertiser can then quickly swap the advertisement out for a more effective message.
SearchDataManagement.com recently got on the phone with Brandt to find out how the company is using the Oracle Exadata database machines. Brandt explained why the company decided to move off DB2 and talked about different options on the table for sustaining future growth. Here are some excerpts from that conversation:
Why did LinkShare decide to move its data warehousing environment from IBM DB2 to Oracle Exadata?
Michael Brandt: We are just exponentially growing and we really had a need for a database solution that is scalable for our ever-growing needs. Our legacy DB2 architecture, which was a shared-nothing [massively] parallel processing architecture, really wasn’t scalable. We needed something like an Exadata, or some sort of an appliance, that would allow us to easily add more nodes as needed for space as well as performance, or CPU and bandwidth.
Does the service that LinkShare provides fall under the heading of “big data” analytics?
Brandt: You know, 17 terabytes is big, but it’s not absolutely huge in this market. Probably 80% of the people in data warehousing would look at it and say, “Wow, that’s a big system and that’s certainly a lot of queries and a lot of customers and a lot of stuff going on.” But there are certainly much bigger Oracle customers out there than us.
With all the growth LinkShare is experiencing, have you considered moving to an Apache Hadoop environment at some point in the future?
Brandt: Yes, absolutely. I see that Microsoft recently made an announcement that they are getting rid of their existing warehouse and they’re going with Hadoop. That is surprising, especially since Hadoop is built off Linux and it is open source. I’m looking at it a little bit to see what it does have to offer for us. Obviously, we made a major investment with Exadata, but you know, life expectancy of these things is three to five years and we’re already two years into Exadata.
Could you envision a situation where Hadoop would one day serve as a complement to the Oracle Exadata implementation?
Brandt: It possibly can, yes. We have a traditional warehouse and we still get good numbers from it, but we use it in multiple ways. We want to get quick, fast reports to our customers, which can be done through aggregation and roll-ups and stuff like that, and we also have our ad hoc team doing deep dive analysis against the data. But we’re all using the same warehouse architecture and we know we can probably improve the system by creating -- not data marts -- but more aggregated types of solutions.
Does LinkShare provide reports to advertisers and publishers in real time?
Brandt: I can’t say it’s truly real time. We just call it “mini-batch” which means that we run our clicks every five minutes. But since we control our clicks -- we have our own click service throughout the United States -- we can quickly bring that data in.
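As a rough illustration of the "mini-batch" idea Brandt describes, the sketch below groups click timestamps into five-minute windows and counts the clicks in each one. It is a minimal sketch, not LinkShare's actual pipeline: the function names, the in-memory list of clicks and the sample timestamps are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

# Five-minute window, matching the interval mentioned in the interview.
BATCH_SECONDS = 5 * 60


def batch_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute batch."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    return midnight + timedelta(seconds=elapsed - elapsed % BATCH_SECONDS)


def count_clicks_per_batch(click_times):
    """Count how many clicks fall into each five-minute batch."""
    return Counter(batch_start(ts) for ts in click_times)


# Hypothetical click timestamps, invented for illustration only.
clicks = [
    datetime(2012, 1, 1, 9, 0, 30),
    datetime(2012, 1, 1, 9, 4, 59),
    datetime(2012, 1, 1, 9, 5, 0),
]
counts = count_clicks_per_batch(clicks)
```

In this sketch a report reads only completed batches, so the figures it shows can trail live activity by up to one five-minute window. That lag is the gap between "mini-batch" and truly real-time reporting.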
LinkShare has gone live with two separate Exadata appliances. Could you explain the benefits of this approach and how the company divides up workloads between the two data warehouse appliances?
Brandt: With two data warehouses, we now have the luxury of being able to bring one down for maintenance, and maybe have it down for six hours or so, and fail all the traffic over to the other server. Of course we’re going to get a little degradation, because now we’re throwing an additional 30,000 to 40,000 requests to one side. Our advertiser requests typically remain in one of our [data warehouse appliances] and our publishers remain on the other. However, they look at each other’s data, obviously. Merchants are looking at publisher sales data and vice versa. We were going to [make the two appliances work as a grid] but the problem was latency. You certainly don’t want a customer coming in and running a report and seeing that their clicks and sales [were] this much, and then running it again 10 minutes later and [coming up with a lower number] because that query went to the other data warehouse.