Big Data Date Night was interesting. It was easy for me to attend since my office is at the Microsoft Silicon Valley Campus where the event was held. It was worth the time and worth the 100-foot walk.
The keynote was given by Bruno Aziza of SiSense. It was a very high-level talk, but I think that's appropriate for an intro to a broad audience. He shared an interesting online data salary demo where you can learn how much you can make as a data scientist. (I guess it's much more than if you declare your title to be data analyst.)
One interesting thing I noted from that tool is that the team size doesn't vary much with company size. 4-5 and 6-10 seem to be popular Big Data team sizes across the board:
(Source: SiSense Maestro Demo, http://bit.ly/datasalary)
The "lightning" presentations (10 minutes each) that followed the keynote were pretty good, too. John De Goes of Precog made a good case for something called "Smart Data," which is what happens when machine-generated data hooks up with machine learning. The main thing that I got out of that talk is that today's Big Data is not just a bigger version of small data. It used to be (say, 30 years ago) that most data was human-generated, structured, "entity" data. But the majority of data today is machine-generated and is often more like "event data." The machine-generated data may describe human activities, such as social network updates, but the volume is only possible because software is watching all of that human activity. He also cited figures showing BI as an industry with 8.3% annual growth, whereas machine learning, one of my perennial interests, is growing at 167%. Granted, true machine learning is a much smaller market, but that growth rate is encouraging.
The PayPal lightning presentation by Moisés Nascimento did not disclose much detail about how PayPal is using Big Data, and so for me it was a little disappointing. I have to admit that in my notes I only wrote "uses Hadoop."
Michael Hollenbeck of Predixion Software gave an interesting talk on how Big Data techniques can help in the health care industry. He was very apologetic about mentioning Obamacare, but noted that one of its side effects is good for the Big Data industry: more data is now available to analyze. He gave an example of how Big Data analysis can be used to plan "data driven interventions", i.e., recommending preventative procedures or tests for those who meet the criteria for being at the highest risk.
Kamal Hathi of Microsoft gave a presentation that focused on how important it is for data analysis to be fun. Showing data analysis of tweets about movies using Excel, he made the point that a lot of insights only come when you have something interactive and dynamic to play with, so batch analysis can't be the be-all and end-all of BI. This is certainly in line with Microsoft's theme of "self-service BI." I definitely agree that it's critical to make things easy.
Finally, the panel:
Ken Rudin - Head of Analytics - Facebook
Chris Pouliot - Director, Algorithms & Analytics - Netflix
Fedor Dzegilenko - Director of Analytics - SurveyMonkey
Isaac Buahnick - Business Analyst - Wix
Ofer Mendelevitch - Director, Data Sciences - Hortonworks
Moderator: Dave Feinleib, Forbes Contributor, Founder of Big Data Landscape
It was an interesting panel discussion and well moderated by Dave Feinleib. Probably the funniest part for me was seeing Ofer and Ken going at each other on how useful Hadoop is. Ken mentioned that Facebook started out trying to do almost everything with Hadoop, but that just doesn't work, and he seemed almost angry that that approach seems to be an industry trend. Indeed, what Facebook does now is move a lot of that analysis back to traditional relational databases once the questions that need to be asked are well understood, simply because it's so much faster. Ken and Ofer also disagreed on whether Hive queries will ever be real-time. (Ofer used the term "right-time", which I guess means slow, but not too slow for the purpose.) Ken mentioned that Hadoop is a technology that makes "everything possible, but nothing easy" and multiple times cited the need for tolerating pain (namely, the pain of Hadoop) as a key skill required for the data scientist. Isaac of Wix was kind enough to fly in from Israel to participate in the panel. I found it interesting to hear from a smaller company on how they use Big Data to improve customer experience, but many of his comments sounded a little too much like an advertisement for Wix to be palatable in the context of a panel discussion.
I really liked Ken's description of Facebook's "data camp," where new employees learn how data is tracked and how to ask interesting questions of the huge amount of social networking data Facebook possesses.
All of the panel experts scoffed at the idea that there should be fear of storing data in the cloud, although Ofer mentioned that enterprises are more conservative about that than startups, since for startups the cloud is often more cost-effective to get things going. Chris shared that Netflix moved to AWS after experiencing some outages when they had a single on-premise data center, and credited the Netflix CEO for having the courage to try the cloud rather than just building a redundant data center.
I hope there are more events like this and I'd also like to see other industries represented.