In the last five years or so we have seen the emergence of a whole new way to analyze data. The latest Gartner report indicates that 85% of the data generated in an enterprise today is of new data types, those that we aren't used to storing in traditional data warehouse systems. We have traversed almost half of the hype cycle: we have seen the peak of inflated expectations and the trough of disillusionment, and we now seem to be climbing the slope of enlightenment. It will take about two more years before we reach the plateau of productivity.
The next generation of Big Data will focus on yet another set of challenges: near real-time analytics, data quality, and packaged analytical solutions (both predictive and descriptive) for business processes like CRM, Sales and Distribution, Marketing, etc.
1. Near Real-Time Analytics
Data is growing at an exponential rate, and the ability to analyze it quickly is more important than ever. Microsoft StreamInsight has been available for quite some time. Almost every big data vendor is coming out with offerings, such as in-memory processing, to process data faster. The Apache Hadoop project also released Hadoop 2.0 with YARN, which lets engines other than batch MapReduce share a cluster and so opens the door to near real-time processing. Another big data technology gaining traction is Apache Spark, which its developers claim can run some in-memory workloads up to 100 times faster than Hadoop MapReduce.
2. Data Quality
Data quality has never been as important as it is now, with data growing at an exponential rate. The speed at which decisions are made has already reached a point where the human brain can’t keep up. This means that data is cleansed and processed, and decisions are made, according to predefined rules and without any human intervention. In such environments, a single stream of bad data can act as a virus and result in incorrect decisions or heavy financial loss. A good example is the world of algorithmic trading, where trades are placed every few milliseconds by algorithms that analyze stock market trends, rather than by a human.
Data quality has become a key part of service level agreements (SLAs) in evolving digital enterprises. Poor-quality data can result in the data provider/supplier being blacklisted or in severe financial penalties. B2B environments are the early adopters, as they rely heavily on the quality of data to ensure smooth business operations. Some enterprises are moving toward deploying real-time alerts for data quality issues. The alerts can be routed to a designated person based on the issue and can also recommend how to fix it.
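To make the idea of rule-driven checks with routed alerts concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Rule` and `Alert` classes, the field names `symbol` and `price`, the team names, and the suggested fixes are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Rule:
    """A hypothetical data quality rule: a check plus routing details."""
    name: str
    check: Callable[[dict], bool]   # returns True when the record passes
    owner: str                      # who gets alerted on failure
    suggestion: str                 # recommended fix included in the alert


@dataclass(frozen=True)
class Alert:
    rule: str
    owner: str
    suggestion: str
    record: dict


# Illustrative rules for a stream of price ticks (field names are assumptions).
RULES = [
    Rule("price_positive",
         lambda r: isinstance(r.get("price"), (int, float)) and r["price"] > 0,
         "market-data-team",
         "Drop the tick and request a resend from the provider."),
    Rule("symbol_present",
         lambda r: bool(r.get("symbol")),
         "reference-data-team",
         "Map the record to a symbol or quarantine it."),
]


def validate(records: Iterable[dict], rules=RULES) -> Iterator[Alert]:
    """Yield one alert per failed rule, as each record arrives."""
    for record in records:
        for rule in rules:
            if not rule.check(record):
                yield Alert(rule.name, rule.owner, rule.suggestion, record)


ticks = [
    {"symbol": "ABC", "price": 10.5},
    {"symbol": "", "price": 11.0},
    {"symbol": "XYZ", "price": -1},
]
alerts = list(validate(ticks))
for alert in alerts:
    print(f"[{alert.owner}] {alert.rule}: {alert.suggestion}")
```

Because `validate` is a generator, it could in principle sit on a live feed and raise each alert as soon as the offending record arrives; a real deployment would replace the `print` call with whatever notification channel the enterprise uses.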
Machine learning is another technique being used to improve data quality. It has made it easier to conduct pattern analysis that identifies new data quality issues. Machine learning systems can be deployed in a closed-loop environment, where the data quality rules are refined as new quality issues are identified via pattern analysis and other techniques.
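The closed-loop idea can be sketched with a very simple stand-in for a learning system: a rule whose acceptable range is learned from the data it has already accepted. This is an illustration under stated assumptions, not a description of how any vendor does it. The class name `AdaptiveRangeRule`, the three-standard-deviation threshold, and the warm-up length are all hypothetical choices, and a running mean and variance stand in for a real machine learning model.

```python
import math


class AdaptiveRangeRule:
    """Flags values far from the running mean; the bounds refine as data arrives."""

    def __init__(self, k: float = 3.0, warmup: int = 5):
        self.k = k              # how many standard deviations count as suspect
        self.warmup = warmup    # observations needed before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0           # sum of squared deviations (Welford's algorithm)

    def _std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_suspect(self, x: float) -> bool:
        if self.n < self.warmup:
            return False
        return abs(x - self.mean) > self.k * self._std()

    def learn(self, x: float) -> None:
        """Fold an accepted value back into the rule (the closed loop)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def process(self, x: float) -> bool:
        """Return True if x is flagged; only accepted values update the rule."""
        suspect = self.is_suspect(x)
        if not suspect:
            self.learn(x)
        return suspect


rule = AdaptiveRangeRule()
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 55.0, 10.3]
flags = [rule.process(x) for x in readings]
print(flags)
```

Only accepted values feed back into the statistics, so a burst of bad readings does not widen the acceptable range. The trade-off is that a genuine shift in the data would keep being flagged until someone reviews it, which is one reason such loops usually keep a human or a retraining step in the picture.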
3. Big Data Applications
Big data has created so much excitement that everyone wants to use it, but the technical challenges prevent greater adoption. Applications help overcome this challenge by making it easy for everyone to benefit from big data. Over the next few years we will see thousands of specialized applications launched for various industry verticals to solve big data challenges. In the near future, even a small business will be able to benefit from analyzing big data without requiring special infrastructure or hiring data scientists.
These applications will correlate customer data from multiple channels to build a better understanding of customers, and hence will make more money for businesses by targeting the right customers with the right products. And, hopefully, some of them will make our lives better through personalized services for health care, diet and food, entertainment, etc.
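As a rough illustration of what correlating customer data across channels means in practice, the sketch below groups events from three channels under a shared customer ID and then picks out customers worth targeting. The channel names, the `customer_id` field, the sample events, and the targeting criterion are all assumptions invented for this example.

```python
from collections import defaultdict

# Hypothetical events from three channels, keyed by a shared customer_id.
web = [{"customer_id": 1, "viewed": "running shoes"},
       {"customer_id": 2, "viewed": "tent"}]
store = [{"customer_id": 1, "bought": "socks"}]
email = [{"customer_id": 2, "clicked": "camping sale"}]


def build_profiles(**channels):
    """Group every channel's events under the customer they belong to."""
    profiles = defaultdict(lambda: defaultdict(list))
    for channel, events in channels.items():
        for event in events:
            profiles[event["customer_id"]][channel].append(event)
    return profiles


profiles = build_profiles(web=web, store=store, email=email)

# Customers active on the web and by email but with no store purchase yet.
targets = sorted(cid for cid, p in profiles.items()
                 if "web" in p and "email" in p and "store" not in p)
print(targets)
```

In a real application the hard part is usually establishing that shared ID in the first place, since the same customer often appears under different identifiers in each channel.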