Big Data and Analytics Summit 2014: Continuously Curate Information
Inhi Cho, the VP/GM of Big Data, Integration, & Governance at IBM, spoke on the topic of “Continuously Curate Information – Realize the Full Value of Data” at the IBM Big Data and Analytics Summit last month.
Cho said the “hyper-changing world” necessitates a new way of thinking about business. “The competitors you have to get ahead of are ones you may not be thinking about.” This holds true across many industries. Once a client, consumer or business knows about a particular experience, they will expect it in other areas. Your competition is actually the “last best experience” the customer had.
Cho mentioned that it’s instinct, rehearsal, practice and training that drive the many actions we take throughout the day. IBM’s portfolio expansion is based on how to capture confidence and context quickly.
IBM is releasing Big SQL 3.0, which brings ANSI SQL to Hadoop. Big SQL is IBM’s entrant in the SQL-on-Hadoop competition. The difference between SQL-on-Hadoop and the Hadoop connectors you may have heard about is that SQL-on-Hadoop does its processing on HDFS or HBase instead of moving the data to a relational database first.
Big SQL takes advantage of the parallelism in Hadoop (HDFS) and helps address the skills gap with Hadoop, which is often a barrier for SMBs or less mature large companies that have not yet fully invested in a Hadoop skillset.
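To make the skills-gap point concrete, here is a minimal sketch of the kind of standard ANSI SQL an analyst would reuse. This is my own illustration, not IBM code: SQLite stands in for Big SQL, and the table and values are invented. The query itself is what carries over; with Big SQL it would run in parallel against data sitting in HDFS or HBase rather than a local database.

```python
import sqlite3

# Illustration only: SQLite stands in for Big SQL. The point is that
# Big SQL lets teams reuse familiar ANSI SQL against data in Hadoop
# instead of writing MapReduce jobs -- the query is what's portable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 250.0), (1, 90.0), (2, 1200.0)],  # invented sample data
)

# A standard ANSI SQL aggregation -- in Big SQL this same statement would
# be parallelized across the Hadoop cluster's data nodes.
rows = conn.execute(
    "SELECT customer_id, SUM(amount) AS total "
    "FROM transactions GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 340.0), (2, 1200.0)]
```

For an SMB, the appeal is that existing SQL skills transfer directly, so the Hadoop learning curve shrinks to operating the cluster rather than retraining every analyst.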
Another new emphasis is around incremental data. Historically, most decision-making was done in batch. With batch, we needed a sufficient amount of data to make predictions or form rules. However, with incremental data, and the right infrastructure, a single data point can change all prior understanding of that data – what historically only a large batch would yield. All data builds context. This, Cho said, is “knowing the entity”. This is similar to a concept I have been architecting for clients – an infrastructure that brings historical summarized data to bear on real-time data, while allowing the real-time data to also blend into the historical data for the next interaction.
For example, a small bank needs to analyze whether to allow a large withdrawal by a customer. The customer’s unique characteristics, built up over many historical transactions, could be brought into the analysis in summary fashion, perhaps in the form of a score or a withdrawal limit. After the transaction is completed, it goes immediately into the customer analytics and the score is updated for the next withdrawal. For example, a large withdrawal now may immediately limit another one 10 minutes later.
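The feedback loop in that bank scenario can be sketched in a few lines. This is my own illustration, not IBM code, and the class and limit values are invented for the example: each completed withdrawal is folded straight back into the customer’s summary, so the very next decision uses the updated context rather than waiting for a batch run.

```python
# Minimal sketch (hypothetical names and limits) of blending each new
# transaction into a customer's historical summary, so the next decision
# reflects it immediately -- the "incremental data" idea from the talk.
class WithdrawalProfile:
    def __init__(self, base_limit=1000.0):
        self.base_limit = base_limit     # limit derived from history
        self.recent_total = 0.0          # running total of recent withdrawals

    def allowed_limit(self):
        # Each withdrawal already recorded shrinks what's available now.
        return max(self.base_limit - self.recent_total, 0.0)

    def record(self, amount):
        # Fold the new data point into the summary for the next interaction.
        self.recent_total += amount

profile = WithdrawalProfile(base_limit=1000.0)
print(profile.allowed_limit())  # 1000.0
profile.record(800.0)           # a large withdrawal now...
print(profile.allowed_limit())  # 200.0 -- limits another one 10 minutes later
```

A real system would decay or reset the running total over time and compute the base limit from the full transaction history; the point here is only the shape of the loop: summarize, decide, update, repeat.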
Within an organization, each worker should be able to understand quickly what’s available, what can be used and how to access it for business gain. After this “shopping” for the data, movement needs to happen in real time. It’s not just traditional staging. It’s “information virtualization,” according to Cho.
This curation approach will vary based on the nature of the data, what action is desired and the timeliness of the action.
Cho then talked about the concept of the data lake, which she had recently discussed with leaders at a large financial services company. Cho said IBM thinks about it as multiple data stores, not just Hadoop, and they prefer “data reservoir” because reservoirs are managed and controlled whereas lakes are not.
By continuously curating information, IBM wants clients to be able to leverage all data, anywhere, unconstrained yet governed, through technologies like SQL-on-Hadoop – a proposition very relevant to SMB organizations looking to take advantage of big data.
This post was brought to you by IBM for Midsize Business and opinions are my own. To read more on this topic, visit IBM’s Midsize Insider. Dedicated to providing businesses with expertise, solutions and tools that are specific to small and midsized companies, the Midsize Business program provides businesses with the materials and knowledge they need to become engines of a smarter planet.