Storage Alternatives and In-Memory Systems
Organizations of all sizes that still do only simple reporting with their data are leaving significant value on the table. Typically, these organizations rely as heavily as possible on the lowest-cost hard disk drive (HDD) storage. Analytic workloads and big data volumes will weigh such systems down, and the strategy remains effective in only a shrinking minority of cases.
The burning question for any DBMS is how to allocate and position data across HDD, flash-based solid-state drives (SSD), and main memory, and what strategies to use for dynamically moving data among those tiers.
Deciding among HDD, SSD, and memory comes down to cost versus performance: dollars per GB for memory versus pennies per GB for disk, in exchange for vastly different read performance. Other factors to consider include durability, start-up time, write performance, CPU utilization, noise, heat, and encryption. The rough sketch below illustrates the core trade-off.
Figure 1. From the Stanford study "The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM".
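To make the trade-off concrete, here is a back-of-the-envelope sketch. Every price and latency in it is an illustrative assumption, not a quote from the figure or from any vendor:

```python
# Back-of-the-envelope comparison of storage tiers. Every price and
# latency below is an illustrative assumption, not a vendor quote.
TIERS = {
    #        ($ per GB, typical random-read latency in seconds)
    "HDD":  (0.03, 10e-3),    # ~10 ms to seek and rotate
    "SSD":  (0.10, 100e-6),   # ~100 us flash read
    "DRAM": (5.00, 100e-9),   # ~100 ns memory access
}

for tier, (price_per_gb, latency_s) in TIERS.items():
    print(f"{tier:4}: ${price_per_gb * 1000:>8,.2f} per TB, "
          f"~{latency_s * 1e6:>9,.1f} us per random read")
```

The point is the shape of the numbers, not their exact values: capacity cost and access latency move in opposite directions by orders of magnitude.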
Companies that lack systems able to support an analytic workload (high data volume, complex data models, varied traversal patterns, multi-step operations, occasional interim results) often don't pursue analytics as a company at all, and leave value on the table. Frequently this is because workload and query demands grow only modestly, so neither IT nor the application area nor the user community sees a need to burden the organization with the complexity of a new platform. Companies should stay alert to gradual change of this kind lest they suffer its eventual undesirable consequences.
The future of processing is in-memory, whether for operational systems, analytic systems, or the new breed of operational systems that require analytic data.
Storing an entire operational or analytic database in RAM as the primary persistence layer is now practical. With multi-core CPUs the standard, processors can work through growing data volumes in parallel, and main memory is no longer a scarce resource. In-memory systems recognize this and exploit main memory fully: buffer caches and intermediate layers are eliminated because the entire physical database sits on the motherboard, in memory, all the time; disk input/output (I/O) is eliminated; and the approach has been shown to scale nearly linearly.
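As a minimal sketch of why the multi-core point matters (the data size, the partitioning scheme, and the partial_sum helper are illustrative assumptions, not any vendor's engine), a parallel aggregation over data already resident in memory can fan out across cores with no I/O on the hot path:

```python
# Minimal sketch: parallel aggregation over data already resident in RAM.
# The data size and partitioning are illustrative assumptions.
from concurrent.futures import ProcessPoolExecutor
import os

def partial_sum(chunk):
    """Aggregate one in-memory partition; no disk I/O on this path."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))        # the "table", fully in RAM
    workers = os.cpu_count() or 4
    step = len(data) // workers
    chunks = [data[i:i + step] for i in range(0, len(data), step)]

    with ProcessPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```

Real in-memory engines do this with shared memory and vectorized operators rather than copied chunks, but the shape of the computation, one partition per core with results merged at the end, is the same.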
In-memory data (not technically "storage") is directly addressable by the processor over the memory bus. There are two types of solid-state memory (a timing sketch follows the list):
- Processor cache(s) – very fast, volatile
- Dynamic RAM (DRAM) – fast (nanosecond access times), volatile
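One way to feel the cache/DRAM gap is to walk the same in-memory array sequentially (cache-friendly) and then in random order (cache-hostile). This is a rough timing sketch with assumed sizes; in CPython the interpreter overhead mutes the effect, but the random walk still measurably loses:

```python
# Rough timing sketch: cache-friendly sequential access vs. cache-hostile
# random access over the same in-memory array. Sizes are assumptions.
import random
import time

N = 5_000_000
data = list(range(N))
order_seq = list(range(N))
order_rand = order_seq[:]
random.shuffle(order_rand)

def walk(order):
    total = 0
    for i in order:          # each index dereferences into the array
        total += data[i]
    return total

for label, order in (("sequential", order_seq), ("random", order_rand)):
    start = time.perf_counter()
    walk(order)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
```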
Fault tolerance is necessary because memory can be flushed, losing its data, unless a persistent second copy exists. Vendors use one of these strategies to make an in-memory DBMS fault tolerant (the sketch after the list illustrates the first two):
- Disk-based persistence
- Periodic snapshots
- Multiple copies in memory
- Using non-volatile RAM
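A minimal sketch of the first two strategies, disk-based persistence via a write-ahead log plus periodic snapshots, might look like the following. The file names, JSON record format, and single-map design are all illustrative assumptions:

```python
# Minimal sketch: an in-memory store made durable with a write-ahead log
# and periodic snapshots. File names and JSON format are assumptions.
import json
import os

LOG_PATH = "store.log"
SNAPSHOT_PATH = "store.snapshot"

class InMemoryStore:
    def __init__(self):
        self.data = {}           # primary copy lives entirely in RAM
        self._recover()
        self.log = open(LOG_PATH, "a")

    def put(self, key, value):
        # Durability first: append to the log and force it to disk
        # before applying the change in memory.
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def snapshot(self):
        # Periodic snapshot bounds how much log must be replayed.
        with open(SNAPSHOT_PATH, "w") as f:
            json.dump(self.data, f)

    def _recover(self):
        # Rebuild RAM state: load the last snapshot, then replay the log.
        if os.path.exists(SNAPSHOT_PATH):
            with open(SNAPSHOT_PATH) as f:
                self.data = json.load(f)
        if os.path.exists(LOG_PATH):
            with open(LOG_PATH) as f:
                for line in f:
                    entry = json.loads(line)
                    self.data[entry["k"]] = entry["v"]
```

A production engine would compact the log after each snapshot and group-commit the fsync calls; replaying the whole log here stays correct only because simple puts are idempotent.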
You also need a DBMS engineered for in-memory data. Simply placing a traditional, disk-oriented database in RAM has been shown to dramatically underperform a true in-memory database system, especially for writes.
Memory is becoming the "new disk." Measured by cost of business (cost per megabyte retrieved per unit of time), no other form of data storage compares. The ability to achieve orders-of-magnitude improvement in transaction speed or value-added quality is a requirement for systems scaling into the future. HDD may find its rightful place in archive, backup, and NoSQL storage.
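To make "cost per megabyte retrieved per unit of time" concrete, here is a bit of illustrative arithmetic normalizing device price by random-read throughput. Every figure is a placeholder assumption, not a benchmark:

```python
# Illustrative arithmetic: dollars spent per MB/s of random-read
# throughput. All prices and throughputs are placeholder assumptions.
devices = {
    #              ($ per device, random-read MB/s per device)
    "4 TB HDD":   (100,      0.8),   # ~200 IOPS x 4 KB reads
    "1 TB SSD":   (100,    400.0),
    "64 GB DRAM": (300, 10_000.0),
}

for name, (price, mb_per_s) in devices.items():
    print(f"{name}: ${price / mb_per_s:,.4f} per MB/s of random reads")
```

On these assumed numbers, memory is cheaper per unit of retrieval throughput even while it is far more expensive per unit of capacity, which is exactly the "new disk" argument.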
This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don’t necessarily represent IBM’s positions, strategies or opinions.