An Environment After Hadoop
I recently spoke to a company about their use of Hadoop. This post describes what their environment looks like after adopting Hadoop. All of us, even small and midsize businesses (SMBs), can learn from how successfully they responded to their situation, and we can rethink the limitations we may have placed on Hadoop.
The relational data warehouse remains important. It is simply no longer expected to punch above its weight by handling processing at the volume and complexity that continual Retention Processing for all customers requires. Because the data warehouse is so widely shared, it was also impractical to optimize it for high-performance applications.
The Retention Processing Hadoop cluster sources most of its data from the data warehouse. It follows that this data was originally structured, and it is still treated as structured in Hadoop rather than as unstructured or semi-structured data. This flies in the face of the common wisdom that Hadoop is best suited only to inbound data that is unstructured or semi-structured.
The data scientists build their models in R on Hadoop, and programmers develop applications using Pig. Predictive model development has gone from months to weeks as a result of the move to Hadoop, which translates to millions of dollars per year in productivity gains. The ability to iterate quickly and do A/B modeling in a Hadoop environment is also making the models “uncannily accurate”.
In addition to the tens of millions of dollars in business gains, the team credits Hadoop with a multi-million-dollar annual savings in total cost of ownership. They now spend one-tenth as much to get far more processing power in Hadoop. They have also gone several consecutive months with 100% data integration completion, something inconceivable in the legacy environment.
Interestingly, the team also calls the Hadoop environment a “simpler environment”. Few people would choose that word for a Hadoop environment compared with a legacy relational database environment, but it has been their experience.
The Hadoop team provided the top three keys to their success:
- Don’t underestimate the organizational change management required to achieve the necessary internal support.
- Find the right first use case – nothing overly complex to begin with; their first cluster ran Apache Hadoop on hardware that was about to be thrown out.
- Get a partner unless Hadoop is your core competency as a company – the Hadoop world is changing rapidly.
Retention Processing continues to be enhanced, both from the backlog of requirements that existed a year ago and from new requirements that emerge every day as the program runs. It remains to be seen when development will slow down, but neither the development activity nor its returns have slowed since the move to Hadoop.
The team readily admits Hadoop is still not fully mature with respect to nonfunctional requirements. Security and governance in particular take more work to get right than they did in the legacy environment. Fortunately, vendors are making strides to harden Hadoop and make it enterprise-ready when it comes to security and governance.
They have benefited dramatically from gaining a thorough understanding of how each customer will act and react over time. They can better determine which products and services the company needs to deliver to its customers. The data collection process continues to accelerate, and user adoption of Hadoop is on its way to becoming widespread. Now, with a trusted source of information that performs at the speed of the business, they have gained new insights into Retention Processing and other key business functions. This ability to recognize opportunities is critical to the company’s strategic vision and continued success.
Everyone at the company is beginning to understand what its internal champions have done with Hadoop to ensure customer retention and the future of the company.
Disclaimer: This post was brought to you by IBM for Midsize Business and opinions are my own. To read more on this topic, visit IBM’s Midsize Insider. Dedicated to providing businesses with expertise, solutions and tools that are specific to small and midsized companies, the Midsize Business program provides businesses with the materials and knowledge they need to become engines of a smarter planet.