From Early Database Models to Today
Computerized data was first stored in files, called “flat” because they lacked internal structure, though the mainframe computer could at least read through them.
The earliest structure for the data was a hierarchical model. One of the most valued functions of any data storage system is keeping track of relationships in the data. As its name implies, the hierarchical model supported tracking those relationships. By arranging data into “trees” of information, users could navigate from a parent record to all the child records.
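To make the parent-to-child navigation concrete, here is a minimal sketch in Python. It is not how any particular hierarchical DBMS was implemented; the `Record` class, the `descendants` helper and the customer/order/line item data are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One node in the tree (hypothetical; real systems stored these on disk)."""
    name: str
    children: list["Record"] = field(default_factory=list)

    def add_child(self, child: "Record") -> "Record":
        self.children.append(child)
        return child


def descendants(record):
    """Yield every record below `record`, walking the tree depth-first."""
    for child in record.children:
        yield child
        yield from descendants(child)


# A customer owns orders, and each order owns line items.
customer = Record("customer: Acme")
order = customer.add_child(Record("order: 1001"))
order.add_child(Record("line item: widget"))
order.add_child(Record("line item: gasket"))
customer.add_child(Record("order: 1002"))

names = [r.name for r in descendants(customer)]
```

Notice that nothing in a `Record` points back to its parent. Getting from a line item to its customer means starting at the top and searching downward, which is the limitation that later models set out to remove.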
The network model followed, letting a record have more than one parent and allowing navigation back up the tree, which added more capabilities. Nevertheless, these models, implemented on early mainframes, saw limited application, mostly in large applications at large companies that could afford the machines and dedicate the staff. They did, however, pave the way for the 800-pound gorilla of information management over the last 25 years: the relational model.
The relational model made relationships more dynamic and data much more accessible. Based on papers published over 40 years ago by E. F. Codd of IBM, the relational model introduced the “relation”, which is implemented in the form of a table.
The table, and specifically referential integrity between tables (for example, ensuring that a sale can only be allocated to a customer whose ID exists in the customer table), has served as the basis of most sales, accounting, controlling, asset management, materials management, production planning and reporting systems ever since.
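The customer-and-sale rule can be tried out with Python's built-in `sqlite3` module. This is a small sketch, and the table and column names (`customer`, `sale`, `customer_id`, `amount`) are assumptions chosen for the example. Note that SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE sale (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      REAL NOT NULL
    )
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
# Customer 1 exists, so this sale is accepted.
conn.execute("INSERT INTO sale (customer_id, amount) VALUES (1, 250.0)")

# Customer 99 does not exist, so the DBMS refuses the sale.
try:
    conn.execute("INSERT INTO sale (customer_id, amount) VALUES (99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

sale_count = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
```

The point is that the check lives in the database rather than in each application, so every system writing sales gets the same guarantee.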
The table, accessed through SQL, is stored in data pages. Where flat files simply divided the data into files, data pages go much further by imposing a repeating, fixed-size structure on the file. That repeating unit is the data page, and its size is set by an administrator.
The data page is the heart of how the relational model is implemented. The page is filled with records, each containing a value for every column of the table in a fixed order, which is usually the order in which the columns were defined. Importantly, the page also contains pointers to where each record begins, allowing the database management system (DBMS) to navigate easily to any record.
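A toy version of such a page might look like the following. It is a sketch under simplifying assumptions, not the layout of any real DBMS: the 4096-byte `PAGE_SIZE`, the three-column record format and the `DataPage` class are hypothetical, and the record pointers are kept in a Python list rather than inside the page bytes as a real system would do.

```python
import struct

PAGE_SIZE = 4096  # hypothetical size; in a real DBMS the administrator chooses it

# Fixed column order: id (4-byte int), name (20 bytes), balance (8-byte float).
RECORD = struct.Struct("<i20sd")


class DataPage:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.slots = []  # pointers: byte offset where each record begins
        self.free = 0    # next unused byte in the page

    def insert(self, cust_id, name, balance):
        """Append a record and return its slot number."""
        if self.free + RECORD.size > PAGE_SIZE:
            raise ValueError("page full")
        RECORD.pack_into(self.data, self.free, cust_id, name.encode(), balance)
        self.slots.append(self.free)
        self.free += RECORD.size
        return len(self.slots) - 1

    def read(self, slot):
        """Jump straight to a record through its pointer, with no scanning."""
        offset = self.slots[slot]
        cust_id, raw_name, balance = RECORD.unpack_from(self.data, offset)
        return cust_id, raw_name.rstrip(b"\x00").decode(), balance


page = DataPage()
first = page.insert(1, "Acme", 250.0)
second = page.insert(2, "Globex", 75.5)
record = page.read(second)
```

Because every record follows the same column layout and the page knows where each one starts, reading record number two is a single lookup plus an unpack, however many records sit before it.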
An index, also a physical structure, is often the first stop for the DBMS when a query requests a subset of the table's records. The index contains a subset of each record (often a single column), a navigational structure, and references to where “the rest of the record” can be found in the data pages, using the aforementioned pointers.
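The idea can be sketched in a few lines of Python. Real indexes are typically B-trees; a sorted list searched with `bisect` stands in for the navigational structure here, and the `pages` data, the `lookup` function and the (page number, slot) reference format are assumptions made for the example.

```python
import bisect

# Hypothetical data pages: page number -> records in (id, name, city) order.
pages = {
    0: [(17, "Acme", "Austin"), (4, "Globex", "Boston")],
    1: [(42, "Initech", "Dallas"), (8, "Umbrella", "Chicago")],
}

# Index on the id column only: sorted (key, (page_no, slot)) entries.
index = sorted(
    (record[0], (page_no, slot))
    for page_no, records in pages.items()
    for slot, record in enumerate(records)
)
keys = [key for key, _ in index]


def lookup(cust_id):
    """Find the key in the index, then follow its reference to the full record."""
    pos = bisect.bisect_left(keys, cust_id)
    if pos == len(keys) or keys[pos] != cust_id:
        return None  # key is not in the index
    page_no, slot = index[pos][1]
    return pages[page_no][slot]


found = lookup(42)
missing = lookup(5)
```

The index holds only the `id` values and a reference for each, so it stays small and sorted. The name and city are fetched from the data page only after the search has found the right entry.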
Together, the data pages and the indexes allowed the sales, accounting and other systems to flourish. This solid foundation has been the catalyst that extended the use of information management systems and dramatically pushed the envelope of what counts as core business and what information a business needs to survive.
At the same time, the NoSQL and Hadoop movements have revitalized some of the vestiges of the hierarchical and network eras. These systems use flat file structures without the notion of data pages. Graph databases, with their focus on relationships, are the clearest example. What’s old is new again.
This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don’t necessarily represent IBM’s positions, strategies or opinions.