For this blog, I thought I would address some of the main questions I am answering these days for companies who are investigating NoSQL solutions, like those from Cloudant, an IBM Company.
Many will say that NoSQL is a Valley phenomenon and that an SMB has no need to leave the SQL comfort zone. Not so fast. NoSQL deployments are mostly for net new applications in the business, exactly the applications that have big data needs, and big data does not always equate to big company.
Cheaply storing a large amount of data is one aspect of NoSQL that defines its best workload, but there are many others. Achieving high availability cheaply is another important consideration. Finally, a need for data model flexibility is another good indicator that NoSQL is the right fit for the workload.
Now on to the questions.
Are your clients adopting NoSQL? If so, why?
Absolutely. Applications requiring interactive processing have undergone rapid change. Consequently, so have their storage requirements. Megatrends like big data, data science and cloud computing are also driving the adoption of NoSQL. It’s an increasingly viable alternative to relational databases for the scale of operations that many companies, large and small, work at today. Clusters of commodity servers and a schema-free data model are also often best for the type of data being used today.
Are all NoSQL databases eventually consistent?
None is permanently inconsistent. Some NoSQL databases provide client applications with a guarantee of “eventual consistency” only. Others are eventually consistent or not depending on how they are configured.
Just as we have broken barriers in the past (e.g., the four-minute mile), the CAP theorem, which states that you can only have two out of three of consistency, availability and partition-tolerance, will also need to fall by the wayside. New solutions will be adopting ACID transactions in addition to the shared-nothing, fault-tolerant aspects of current NoSQL solutions.
How do I know if a NoSQL database can support many concurrent writes?
This is a tough question. I’ve found vendor claims to be very difficult to achieve in real practice. Ultimately, you have to do a proof-of-concept to be sure the solution supports the concurrent writes you will need.
You can also put a queue in front of the database to manage the contention and drive the queued operations to a single thread for execution.
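As a minimal sketch of that queue-in-front pattern, the Python below funnels writes from many threads through one worker thread. The names SingleWriter, write_fn and demo are hypothetical, and an in-memory dict stands in for the database, so treat it as an illustration of the idea rather than a production design.

```python
import queue
import threading


class SingleWriter:
    """Funnels writes from many producer threads through one worker thread."""

    def __init__(self, write_fn):
        self._write_fn = write_fn  # hypothetical stand-in for your database write call
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, key, value):
        # Producers never touch the database; they only enqueue the operation.
        self._queue.put((key, value))

    def _run(self):
        # Only this thread executes writes, so writes never contend with each other.
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop after draining earlier items
                return
            self._write_fn(*item)

    def close(self):
        self._queue.put(None)
        self._worker.join()


def demo():
    store = {}  # assumption: an in-memory dict plays the role of the database

    def write(key, value):
        # A read-modify-write that would race if run from many threads at once.
        store[key] = store.get(key, 0) + value

    writer = SingleWriter(write)

    def producer():
        for _ in range(100):
            writer.submit("counter", 1)

    threads = [threading.Thread(target=producer) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    writer.close()
    return store
```

Calling demo() runs eight producers that each submit 100 increments, and the single worker applies them one at a time. The trade-off is that write throughput is capped by that one thread, which is another thing a proof-of-concept should measure.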
How do I know if a NoSQL database has a single point of failure?
None of the NoSQL solutions we’ve investigated for this aspect has a single point of failure. They are developed to run on commodity nodes, which do fail but are cheap to replace and there is a straightforward process for doing so. The distributed architectures ensure there is no single point of failure and there is built-in redundancy for function and data.
When a node fails, redundant copies of its data prevent data loss, and the cluster returns to full redundancy quickly. This continuous availability holds for on-premises, cloud and multi-data center setups.
How does asynchronous replication affect consistency?
Asynchronous replication is typically primary-to-secondary replication. A single server is chosen as the primary, which can accept writes. It relays its state changes to secondary servers, which apply the same changes. The client gets a response from the primary node without waiting for the write to be replicated. Writes eventually arrive on all secondary nodes.
It is possible with asynchronous replication to read stale data. If you read from a secondary node then write to the primary, you could overwrite writes that have not yet been replicated. It’s a tight window of opportunity, but it could happen.
Synchronous replication models may temporarily block node activity while ensuring that a write operation is applied completely to all nodes.
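The stale-read window in asynchronous replication can be illustrated with a toy simulation. The Primary and Secondary classes and the replicate function below are hypothetical and model no particular product; replication is triggered by hand here so that the timing is visible, whereas a real system would do it in the background.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # changes acknowledged but not yet shipped to the secondary

    def write(self, key, value):
        # The write is acknowledged immediately, before any replication happens.
        self.data[key] = value
        self.log.append((key, value))


class Secondary:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(primary, secondary):
    """Ship pending changes in order (done manually in this sketch)."""
    for key, value in primary.log:
        secondary.data[key] = value
    primary.log.clear()


def demo():
    primary, secondary = Primary(), Secondary()
    primary.write("stock", 10)
    replicate(primary, secondary)

    # Client A sells three units; the change has not been replicated yet.
    primary.write("stock", 7)

    # Client B reads from the secondary and sees the old value.
    stale = secondary.read("stock")

    # Client B sells one unit based on the stale read and writes to the primary,
    # overwriting client A's update.
    primary.write("stock", stale - 1)

    replicate(primary, secondary)
    return stale, primary.data["stock"], secondary.read("stock")
```

In this run client B reads 10 while the primary already holds 7, then writes 9. Both nodes end up agreeing on 9, so they are consistent with each other, but client A's sale has been lost; the correct value would have been 6.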
Why would I want to rebalance the data instead of letting it happen automatically?
In the early days of NoSQL, I lost some data after adding new nodes, so this process must be tested until you are comfortable (as I had to become again) with the inevitable task of adding nodes to the cluster.
You may want to take control of the timing of the rebalance. Some prefer to do this during off-peak hours. The rebalance process is automatic insofar as you simply click a button, but adding a node does not by itself force a rebalance, because rebalancing can have a performance impact.
This post was brought to you by IBM for Midsize Business and opinions are my own. To read more on this topic, visit IBM’s Midsize Insider. Dedicated to providing businesses with expertise, solutions and tools that are specific to small and midsized companies, the Midsize Business program provides businesses with the materials and knowledge they need to become engines of a smarter planet.