As described in a previous post, an Analytical Enterprise is one that leverages data to its advantage. This includes the hard-to-tame “Big Data”, which is too awkward to be managed by traditional means.
To support this mining of Big Data, a slew of new technologies has arisen. The most talked about is Hadoop. But what exactly is Hadoop? How should enterprises leverage it? And how should they not?
A Schema-less World
Relational databases are at the heart of every Line of Business application today, and most web ones too. For good reason: they have tightly defined schemas, provide data consistency and support a well-defined query framework.
That said, the technology can struggle with the variety and size of Big Data. The resources and effort required to store this type of information in a relational model can be significant. At the same time, databases are organised around anticipated queries, whereas businesses really need the flexibility to ask any question they want.
How Does Hadoop Help?
Historically, business-critical data was well organised and the needs were predictable. Today's world is all about uncertainty. A fresh approach to data management is required, and Hadoop helps fulfil this need.
Hadoop at its simplest is a combination of a distributed file system and a framework for Map/Reduce algorithms. Map/Reduce provides a means to perform analysis without moving the data (too cumbersome), and without requiring a predefined schema (not possible). The file system stores large amounts of unstructured data in a cost-effective, fault-tolerant and scalable way.
Simply put, you can store vast amounts of raw data cheaply, and analyse it in place without first imposing a schema.
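To make the Map/Reduce idea concrete, here is a minimal local sketch in Python (a simulation of the three phases, not Hadoop itself): the mapper emits key/value pairs, the framework shuffles them by key, and the reducer aggregates each group.

```python
from itertools import groupby
from operator import itemgetter

# Map phase: emit a (word, 1) pair for every word in every input line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: group the emitted pairs by key, as the framework
# does between the map and reduce stages.
def shuffle(pairs):
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

# Reduce phase: aggregate all the values collected for each key.
def reduce_phase(grouped):
    return {key: sum(v for _, v in values) for key, values in grouped}

lines = ["big data big problems", "big wins"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 3, 'data': 1, 'problems': 1, 'wins': 1}
```

On a real cluster the map and reduce functions run in parallel across many machines, each working on the block of data stored locally; that is what "analysis without moving the data" means in practice.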
What does this mean for Financial Services?
Examples of industry Big Data include raw tick data from financial exchanges, server logs from e-banking websites, and the output of operational and trading applications. In these cases, data is generated at high velocity, often in a raw yet-to-be-structured format.
Big Data tools can therefore be applied to solve some critical industry problems. Such scenarios require the merging of varied data sources, and the ability to ask complex and detailed questions efficiently. An integrated data platform is therefore essential, and alongside traditional relational databases, Massively Parallel Processing (MPP) appliances, analysis engines, and BI and visualisation tools, Hadoop has an important role to play.
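As a small illustration of working with raw tick data, here is a sketch of a reduce-style aggregation in Python. The tick format ("timestamp,symbol,price,size") is a hypothetical example of the kind of schema-less record an exchange feed might produce.

```python
from collections import defaultdict

# Hypothetical raw tick format: "timestamp,symbol,price,size" — one trade
# per line, as it might arrive from an exchange feed before any schema
# has been imposed on it.
raw_ticks = [
    "09:30:00.001,MSFT,31.20,500",
    "09:30:00.004,AAPL,585.10,200",
    "09:30:00.007,MSFT,31.21,300",
]

# A reduce-style aggregation: total traded volume per symbol.
def volume_by_symbol(ticks):
    totals = defaultdict(int)
    for tick in ticks:
        _, symbol, _, size = tick.split(",")
        totals[symbol] += int(size)
    return dict(totals)

print(volume_by_symbol(raw_ticks))  # {'MSFT': 800, 'AAPL': 200}
```

Distributed across a cluster, the same per-symbol aggregation scales to the billions of ticks a busy exchange generates in a day.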
Where does Microsoft fit in?
As part of the complete platform, Microsoft has optimised a distribution of the open source Hadoop project to shine on Windows Azure and Windows Server. In addition, Hadoop has been made enterprise-ready by integrating it with Microsoft's systems management, database and BI platforms, as well as by providing simplified tools to write and execute Map/Reduce jobs.
How Should Enterprises use Hadoop?
Some firms use Hadoop to test new forms of analysis on data before deciding whether to store it in their data warehouse. Others with heavy and frequent BI needs may leverage Hadoop for ETL (Extract, Transform, Load) workloads, processing complex raw data before ingestion into a relational database or directly into a BI cube. Those whose business needs require lots of ad-hoc analysis may use Excel as their front end to directly query data stored in Hadoop.
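The ETL usage above can be sketched as a single transform step in Python: parse a raw, semi-structured record into a clean row ready for loading into a relational store. The log line format here is a hypothetical Apache-style entry from an e-banking web server, not a real system's output.

```python
import re

# Hypothetical e-banking web log line — the kind of semi-structured record
# an ETL-style Map/Reduce job might clean before loading it into a
# relational database.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+)'
)

def transform(line):
    """Transform step of the ETL job: parse one raw line into a structured
    row (a dict of named fields), or return None for malformed input."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # discard records that do not parse
    row = m.groupdict()
    row["status"] = int(row["status"])
    return row

line = '10.0.0.7 - - [21/May/2012:10:15:32 +0000] "GET /accounts/balance HTTP/1.1" 200'
print(transform(line)["path"])  # /accounts/balance
```

Running such a transform as the map step of a Hadoop job lets the cleaning work scale out across the cluster, with only the structured rows shipped onward to the database or BI cube.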
Beware: Hadoop is not a hammer that turns every problem into a nail. Queries are batch-driven, not real-time. It is not optimised for specific scenarios. There is no transactional model.
By combining Hadoop with other data technologies, firms can build a complete Analytical Enterprise. Leveraging internal and external Big Data to its fullest, they can gain a clear and accurate view of their world, make better decisions and leave the competition behind as a result.