Relational Databases Advance
Big data, its emerging use cases and the underlying technologies, including Hadoop, are making a solid pitch for the 2014 information management budget dollar, as they should. It would, however, be easy to believe big data was the exclusive area of investment for platform vendors over the past 5-10 years, and that relational database technologies have been neglected, with vendors all but conceding the next wave of data platforms to the schema-less, scale-out, eventually consistent world. Both views would be incorrect.
While distributed file systems have addressed the relational model's limitations at the high end of data size, relational databases continue to dominate both production and development data systems. It's about best fit: both models are moving forward, and both have a large upside given the overall interest in what they contain, which is data.
There were a few years when clients considered Hadoop or NoSQL at the lower end of where those platforms were useful and offered a cost advantage. That lower end has been pushed up in the past couple of years: with the capabilities added to the relational database ecosystem, it is harder to justify the jump for a workload that could go either way.
Most of the time, analytic workloads belong in analytic databases to meet their demands. Transactional databases, however, have added several features that make them analytic as well, including in-database analytics, scale-out, columnar orientation and heavy use of memory.
Columnar orientation optimizes queries that access a small percentage of the overall bytes in a table, and I've found that quite a large percentage of queries in an analytic environment meet this condition. However, you have to choose how to group columns at table definition time; choosing nothing is equivalent to row orientation. Usually you can group columns together or isolate individual columns in storage.
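The payoff is easy to see in a toy sketch. The data and table below are hypothetical and no vendor's engine is implied; the point is only that a row layout must touch every full row to read one field, while a column layout reads just the bytes of the column the query asks for.

```python
# Hypothetical 3-column table, stored two ways.
rows = [(i, i * 2, i * 3) for i in range(1000)]   # row orientation
columns = {                                        # columnar orientation
    "a": [r[0] for r in rows],
    "b": [r[1] for r in rows],
    "c": [r[2] for r in rows],
}

def sum_b_row_store(table):
    # Row store: every full row is touched just to extract one field.
    return sum(r[1] for r in table)

def sum_b_column_store(table):
    # Column store: only column "b" is read; "a" and "c" stay untouched.
    return sum(table["b"])

# Same answer either way; the difference is how many bytes were scanned.
assert sum_b_row_store(rows) == sum_b_column_store(columns)
```

A query summing one column of a wide table scans a small fraction of the bytes under the columnar layout, which is exactly the query shape described above.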
The analytic workload, and I'm throwing the data warehouse in there, is almost always going to be optimized by columnar. However, some queries will regress, and that fear is keeping many from making the step to columnar except in new, isolated workloads where performance expectations are not yet set across the board.
In-memory databases are about performance advantages and about where the data is stored (and how it is persisted). Some HDD/SSD-based DBMS have added in-memory as an option, while others skip discriminating among the data altogether and put it all in memory.
A huge performance increase is found even though optimizers are not yet mature enough to fully exploit this resource. Because few queries, if any (the budget aside), are made worse by in-memory, many existing applications are being ported to it in addition to in-memory being selected for new workloads. When the cost of processing is considered instead of the cost of storage, many analytic workloads turn out to belong in-memory.
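As a minimal, vendor-neutral sketch of the idea, SQLite's `:memory:` mode keeps the entire database in RAM with nothing persisted to disk; the table and figures below are made up for illustration.

```python
import sqlite3

# ":memory:" creates a database that lives entirely in RAM; it vanishes
# when the connection closes, so persistence is the application's problem.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# An analytic-style aggregate runs entirely against memory-resident data.
total_by_region = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
))
print(total_by_region)  # {'east': 150.0, 'west': 250.0}
```

Production in-memory DBMS add durability mechanisms (logs, snapshots) on top of this basic arrangement, which is the "how it is persisted" question raised above.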
While you may have your analytic workload in a database once highly optimized for transactions, without turning on these analytic features, it’s still a transactional database. You may find there’s more gas in the tank of the relational database once you exploit the analytic features.
This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I've been compensated to contribute to this program, but the opinions expressed in this post are my own and don't necessarily represent IBM's positions, strategies or opinions.