Data Warehouses and Heterogeneous Marts
As long as we’re on the subject of relational databases, one of the primary uses of a relational database is the data warehouse. There is still a big need for the data warehouse concept in any modern environment. Sharing the data, the platform, a model, the methods and the tools across different data sets and subject areas brings many benefits.
Sharing the data, as long as concurrency is not a technical issue, is nothing but beneficial.
Sharing a platform means organizations with smaller budgets can still get their data from a robust platform and don’t have to go through their own acquisition and provisioning, things that they may be only dubiously equipped to do.
Sharing the model hopefully means the organization has given some thought to the one way every item should be represented. For a user, it means working with data that is certified at the enterprise level and that meets a quality standard.
Sharing methods means you don’t have each group building analytic databases in their own way. It means the Kimball/Inmon/hybrid conversation, as well as the 20-30 knock-on conversations, have occurred and the organization is geared up to support a way forward such that each decision that needs to be made is not the beginning of a new adventure.
Some tools are better than others for analytic data access, yet many tools overlap in terms of capabilities. This is especially true for those capabilities that are actually used in an organization. While multiple tools are necessary, multiple overlapping tools are not. Getting good at a core set, and involved in the culture of that core set, is a key to success.
So, there are clearly efficiency benefits from a data warehouse. It is the pursuit of effectiveness that is pulling data off the shared data warehouse and putting new workloads on different platforms.
The costs of an in-memory platform are generally prohibitive today for the multi-use data warehouse. Therefore, we do not see in-memory used that way much.
Making data warehouse data columnar in orientation would generally help a data warehouse more than it would hurt. However, people don’t generally like any downsides with their upgrades, and a data warehouse community is not just multiple people. It is a set of very disparate user groups, so a change that hurts even one group’s workload is a hard sell.
Likewise, you’re not going to make the data warehouse consist of streaming data, or store it on a graph database or on a big data platform optimized for unstructured data. Those platforms generally serve application-specific, newer types of managed data.
You may have the same persuasion challenge with getting the data warehouse, or its tools, into the cloud, although occasionally we see those challenges overcome. The data warehouse can be the “lowest common denominator” approach to storing data, which is not bad at all for the mid-specification analytic workload.
I expect data warehouses to see evolutionary change, but new applications and those who want specific features may just source their data from the data warehouse (and, in the process, get “clean” data) or source the data from the original source. Many of these “marts” are being built today. Hence, we continue to see a tremendous expansion of platform features in databases as marts go searching for their best-fit platform.
This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don’t necessarily represent IBM’s positions, strategies or opinions.