This article originally appeared on the BeyeNETWORK.
In ancient Greece, there was a concept called hubris. Hubris is defined as: “Exaggerated pride or self-confidence often resulting in fatal retribution.”
In a recent article, I wrote about how to preserve the “hidden” value of data in a telecom service provider’s data warehouse by loading essentially raw and unaltered call detail records (CDRs). My theory was that the associated “pushback” from the extract, transform and load (ETL) team and the cost of additional disk-drive space were a good trade for the future value and flexibility of having access to “raw” detail records.
Apparently the ETL gods were watching and provided me with a couple of non-fatal “retribution” reminders:
- Bulk loading of “raw” detail records is not the best idea in the world.
- ETL teams should not be the “a miracle occurs here” cloud that makes business intelligence work.
Let’s consider this month’s article my offering to the ETL gods for my hubris.
No Free Lunches
Murphy’s Law asserts that: "Anything that can go wrong will go wrong."
So, it seemed perfectly natural that after I wrote my article, I would be asked to pitch in on an effort that ETL-ed raw records into a business intelligence environment. The decision to load the raw records was not so much strategic (i.e., We’ll need the flexibility in the future.) as it was tactical (i.e., We need these records loaded. We’ll decide which fields we need later.). It was a classic quick action plan: Can we start coding before we have the requirements? Being a relatively helpful person, I agreed to help the team get the raw records loaded and started configuring the ETL tool to load the records.
After “coding” the first 75 fields of the first raw record type (I’m not going to tell you how many fields I got to configure…let’s just say the ETL gods should have been satisfied…), I decided that I had been shown the error, or at least the data-type casting error message, of my ways. All of those fields had significance, but not all of them had significance to the effort that the team was working toward.
Also, loading all of those extra fields had an impact on the processing of the raw records and the loading of data into the business intelligence environment. In other words, there were other opportunity costs associated, not just the ETL configuration effort. The database performance (loading the data on the front end and the reporting performance on the back end) was another area where raw records have hidden costs to go with their hidden value.
A greater level of data analysis would have shown the right fields for our effort or, at the very least, the wrong fields for future efforts. That validation would have gone a long way toward justifying the efforts taken by the ETL team.
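As a minimal sketch of what that up-front field analysis buys you, the Python below loads only the fields an effort has agreed it needs from a delimited raw record and casts each one explicitly. The field names, the comma-delimited layout and the sample values are all hypothetical; real CDR formats vary by switch vendor and are usually far wider.

```python
import csv
import io

# Hypothetical subset of CDR fields the current effort needs, with their casts.
REQUIRED_FIELDS = {
    "calling_number": str,
    "called_number": str,
    "start_time": str,
    "duration_seconds": int,
}


def project_record(raw_row):
    """Keep only the required fields from one raw record and cast each value."""
    return {name: cast(raw_row[name]) for name, cast in REQUIRED_FIELDS.items()}


def load_required_fields(raw_text):
    """Parse delimited raw records and return only the projected rows."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [project_record(row) for row in reader]


# Made-up sample: six fields per raw record, only four of which are needed.
SAMPLE = (
    "calling_number,called_number,start_time,duration_seconds,switch_id,trunk_group\n"
    "3035550100,7205550199,2006-03-01T10:15:00,125,SW01,TG7\n"
    "3035550101,7205550198,2006-03-01T10:16:30,42,SW02,TG3\n"
)

rows = load_required_fields(SAMPLE)
```

Every field left out of `REQUIRED_FIELDS` is one less cast to configure and one less column to load and index, which is the trade-off described above.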
The upside of this story is that the effort was a pilot for a future endeavor, and discovering the cost of “coding/loading before we have the requirements” was part of the lessons learned in the effort. And, lessons learned are always applied in the future, right?
Keep It Simple Sir
Last month, I attended the TDWI conference in Las Vegas and participated in a session where my group performed a data modeling exercise. In this exercise, we were given a fairly vague situation from which we were tasked to create several levels of a data model. For our experience level and the amount of information, I would say that our results were pretty good. However, during the course of our exercise, I heard a couple of comments from the room that roughly translated into: “It’s okay. The ETL team can fix it for us.”
This was acceptable based on:
- The amount of information given to us for the exercise.
- The amount of time that we had to complete our task.
- The fact that we were data modeling and not ETL-ing.
However, in a real-world implementation, it would have been unfair to assume that the ETL team could magically fix any and all problems with our data models. It seemed a lot like the classic “a miracle occurs here” cartoon, and we should have applied more thought to simplifying our data models in accordance with Occam’s Razor. Occam’s Razor states:
“One should not make more assumptions than needed.”
Occam’s Razor basically boils down to keeping things simple – not meaning simple in terms of “stupid,” but rather “simple” in terms of not being overly complex. Our “classroom” data models were good based on our assumptions (and there were a lot of assumptions…), but they were a little too “cute” for our own good at times. Again, this was an exercise designed to test our imaginations and not our server processing power, so it was perfectly fine to have “out of the box” thinking. On the other hand, had this been the real world, our ETL team would have stopped talking to us after they figured out how much effort was involved on their part to implement the transition between our operational and reporting data models.
The lesson from my recent experiences is that there is a certain amount of “toss it over the wall” thinking when it comes to ETL teams in the business intelligence world. In organizations with smaller data load requirements, this can be “acceptable” (i.e., not system killing).
However, in the world of the telecom service providers, the amount of customer and transaction data can be overwhelming. The business intelligence and data warehouse organizations that serve these telecom service providers need to be mindful that there is a maximum limit to the amount of data that can be loaded into a data warehouse in a given time frame even when the data formats are perfect – let alone when you fully expect ETL “miracles” to occur.
The more effort that goes into the data analysis and data modeling processes, the closer one can expect to get to that maximum limit.
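The load-window limit can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the four-hour window, the 10,000 records per second of clean-data throughput and the per-record cleanup time are invented numbers, and it assumes cleanup time simply adds to the base time spent on each record.

```python
def effective_rate(base_records_per_second, fix_seconds_per_record):
    """Throughput once per-record cleanup time is added to the base time per record."""
    seconds_per_record = 1.0 / base_records_per_second + fix_seconds_per_record
    return 1.0 / seconds_per_record


def max_records_in_window(window_hours, records_per_second):
    """Upper bound on the records that fit in a load window at a steady rate."""
    return round(window_hours * 3600 * records_per_second)


# Hypothetical figures, chosen only to show the shape of the trade-off.
WINDOW_HOURS = 4
CLEAN_RATE = 10_000.0            # records/second when the data formats are perfect
FIX_SECONDS_PER_RECORD = 0.0001  # extra time per record spent on ETL "miracles"

clean_capacity = max_records_in_window(WINDOW_HOURS, CLEAN_RATE)
miracle_rate = effective_rate(CLEAN_RATE, FIX_SECONDS_PER_RECORD)
miracle_capacity = max_records_in_window(WINDOW_HOURS, miracle_rate)
```

Under these assumptions, a cleanup cost equal to the base processing time per record halves the number of records that fit in the same window, which is why pushing fixes onto the ETL team moves you away from the maximum limit rather than toward it.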
Earlier in this article, I mentioned my experiences at the TDWI conference. It’s not my policy to promote one particular vendor/book/technology/conference over any others; however, I really enjoy attending the TDWI conferences. They represent a great opportunity to compare notes with others in the business intelligence field and solidify your general business intelligence knowledge base from “masters” in the industry. I highly suggest TDWI conferences to anyone looking to broaden their network and/or knowledge.
In that vein of sharing knowledge, I would like to know whether the readers of this channel would be interested in a blog on telecom business intelligence. Blogs are a great chance for a group to share information and views on topics. However, they are a group effort, not just the effort of one person. Send your thoughts on a telecom channel blog to me at John.Myers@BlueBuffaloGroup.com.
The "New" AT&T
Finally, it was recently announced/leaked that AT&T has agreed to purchase BellSouth. In the great business known as telecom in the United States, things that go around, come around. For the Cingular Wireless guys who have been talking “smack” about AT&T Wireless (i.e., network quality, brand, etc.) since the merger, do you think that their hubris will be on display when they carry the business cards of the “new” AT&T Wireless in 12-18 months?
John has more than 10 years of information technology and consulting experience in positions including business intelligence subject-matter expert, technical architect and systems integrator. Over the past eight years, he has gained a wealth of business and information technology consulting experience in the telecommunications industry. John specializes in business intelligence/data warehousing and systems integration solutions. John may be contacted by email at John.Myers@BlueBuffaloGroup.com.