My plan is to focus on issues that are critical to business leveraging data for a competitive advantage in the data warehouse/business intelligence environment. It occurred to me that in order to do that, I need to assume that there are some people out there who are moving from an organization that did not have a data warehouse effort to one that does (perhaps moving from a mid-size company to a larger one.) For that reason, I thought it would be useful to have a short primer on data warehousing to refer back to. To that end, I have posted three "chapters" on the subject.
I have to admit, the first time I heard the term "data warehousing," I pictured little Oompa Loompas inside our computers, shuttling our data around on little forklifts. It's not like that.
You're probably a little more savvy than I was and have a better grasp of the concept of data warehousing. But in the interest of being thorough, I'm going to pretend you're as clueless as I was.
If your company is like most, you have a number of operational systems that may or many not "talk" to each other. Maybe you have a sales system, a call-routing system, a customer service tracking system, and a financial system, to name a few. There are two key points to note about these systems. First, they are "transactional" in nature. That is to say, the data is optimized in such a way that if Bill Clinton calls in to inquire about his order of "First Husbands for Dummies," your system will very quickly retrieve the information about that one particular transaction.
What's different about a data warehouse is that it's optimized for analysis rather than transactions. Initially, the analyst (or manager) wants to know how many books have been sold, by which authors, to which stores, etc. They are looking for many transactions that share one or more characteristics so it's kind of like looking at a leaf (a transaction) through a microscope v. looking at a landscape through a camera lens. You need different tools for these very different tasks.
The big problems you face when trying to gather large amounts of data from a transactional system are: 1) performance of your query, which is likely to take a very long time since that is not what the transactional system is designed to do; 2) pulling data from a transactional system requires some degree of technical expertise, which puts you at the mercy of the IT department or your analyst who may not be available at the very moment you need your answers. And 3) performance of the transactional system, since running a sizable query in the midst of the work day is sure to have a deleterious effect on your sales order system, causing a one minute customer service call to take two minutes, your 50- person customer service team to grow to 100 people, and your profits to go down the proverbial toilet. The other alternative is to hire vampires to run your queries at night, but that solution comes with its own inherent problems. Not the least of which is that when your VP is breathing down your neck and wants his numbers "now!" tomorrow is rarely good enough.
(to be continued)
Thursday, April 3, 2008
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment