Many of the resources here come from two sites; the external references come mainly from infoq.com. InfoQ holds QCon conferences in Beijing, New York, London, and San Francisco that attract presenters and attendees from the hottest IT companies. You can quickly browse the QCon New York 2013 videos here, or search for QCon on www.infoq.com for more resources.

How other companies run their services

How Netflix runs its service

Carl Quinn presents the build and deployment architecture used by Netflix in order to provide content out of Amazon AWS.

Ariel Tseitlin discusses Netflix' suite of tools, collectively called the Simian Army, used to improve resiliency and maintain the cloud environment. The tools simulate failure in order to see how the system reacts to it.
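The failure-injection idea behind the Simian Army can be sketched in a few lines. This is an illustration only, not Netflix's code: the function name, instance names, and kill probability below are all invented, and the real Chaos Monkey terminates actual AWS instances rather than filtering a list.

```python
import random

def chaos_monkey(instances, kill_probability=0.2, seed=None):
    """Randomly 'terminate' instances to exercise resiliency.

    A minimal sketch of the chaos-monkey idea: termination is
    simulated by splitting the fleet into survivors and killed.
    """
    rng = random.Random(seed)
    survivors, killed = [], []
    for instance in instances:
        if rng.random() < kill_probability:
            killed.append(instance)   # simulate terminating this instance
        else:
            survivors.append(instance)
    return survivors, killed

survivors, killed = chaos_monkey(["web-1", "web-2", "web-3", "web-4"], seed=42)
```

The point of running this continuously in production, rather than in a staging drill, is that it forces every service to tolerate instance loss as a routine event instead of an emergency.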

Xavier Amatriain discusses the machine learning algorithms and architecture behind Netflix' recommender systems, offline experiments and online A/B testing.
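The online A/B testing half of that story rests on stable, deterministic user bucketing. The hash-based scheme below is a common pattern offered for illustration, not Netflix's actual allocation logic; the function and parameter names are invented here.

```python
import hashlib

def ab_bucket(user_id, experiment, treatment_fraction=0.5):
    """Deterministically assign a user to 'treatment' or 'control'.

    Hashing user and experiment together keeps each user's assignment
    stable within an experiment while staying independent across
    experiments (so experiment A's treatment group is not correlated
    with experiment B's).
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10000 / 10000  # uniform in [0, 1)
    return "treatment" if bucket < treatment_fraction else "control"
```

Determinism matters because the same user must see the same variant on every visit without any assignment table to look up.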

Dianne Marsh presents the open source tools used by Netflix to keep the continuous delivery wheels spinning.

Nov 18, 2013: Jeff Magnusson takes a deep dive into key services of Netflix's “data platform as a service” architecture, including RESTful services that provide ...

Nov 29, 2013: Jeremy Edberg discusses how Netflix designs their systems and deployment processes to help the service survive both catastrophic events like ...

Oct 22, 2013: Joe Sondow presents how Netflix uses Asgard to deploy code updates and manage resources in the Amazon cloud.

 

How LinkedIn runs its service

Jay Kreps discusses the evolution of LinkedIn's architecture and lessons learned scaling from a monolithic application to a distributed set of services, from one database to distributed data stores.

Sid Anand presents the architecture set in place at LinkedIn and the data infrastructure running Java and Scala apps on top of Oracle, Voldemort, DataBus and Kafka.

Chris Riccomini discusses Samza's feature set, how Samza integrates with YARN and Kafka, how it's used at LinkedIn, and what's next on the roadmap.

 

How Facebook ships code

More than one billion users log in to Facebook at least once a month to connect and share content with each other. Among other activities, these users upload over 2.5 billion content items every day. In this article we describe the development and deployment of the software that supports all this activity, focusing on the site's primary codebase for the Web front-end.

Chuck Rossi unveils some of the tools and processes used by Facebook for pushing new updates every day.

Nick Schrock presents how Facebook’s code evolved over time, explaining some new constructs (fbobjects, Preparables, Ents) introduced to address the complexities of a large social graph.

Ashish Thusoo presents the data scalability issues at Facebook and the data architecture evolution from EDW to Hadoop to Puma.

Serkan Piantino discusses news feeds at Facebook: the basics, infrastructure used, how feed data is stored, and Centrifuge – a storage solution.

 

How Twitter monitors its services

Nathan Marz shares lessons learned building Storm, an open-source, distributed, real-time computation system.

Nathan Marz introduces Twitter Storm, outlining its architecture and use cases, and takes a look at future features to be made available.
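Storm topologies wire spouts (tuple sources) into bolts (processing steps). Storm itself targets the JVM; the Python sketch below only mirrors that spout-to-bolt shape using word count, the canonical Storm example, and every name in it is invented for illustration.

```python
def spout(sentences):
    """Emit tuples (here, words) into the stream, like a Storm spout."""
    for sentence in sentences:
        for word in sentence.split():
            yield word

def count_bolt(words):
    """Consume tuples and keep a running count per word, like a bolt."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_bolt(spout(["the quick fox", "the lazy dog"]))
```

In real Storm the spout and bolt run as separate, parallel tasks with tuples flowing over the network, which is what makes the model distributed rather than a local pipeline like this one.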

Arun Kejariwal of Twitter talked at Velocity Conf London last month about the forecasting algorithms used at Twitter to proactively predict system resource needs, as well as business metrics such as the number of users or tweets. Given the dynamic nature of their data stream, they found that a refined ARIMA model works well once the data is cleansed, including removal of outliers.
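The cleanse-then-forecast flow Kejariwal describes can be shown with a toy model. Everything below is a simplification and an assumption on my part: median-based outlier replacement stands in for Twitter's refined cleansing, and a one-parameter autoregressive fit stands in for ARIMA.

```python
def cleanse(series, z_max=3.0):
    """Replace outliers with the series median, using the median
    absolute deviation (MAD) as a robust scale estimate."""
    s = sorted(series)
    median = s[len(s) // 2]
    mad = sorted(abs(x - median) for x in series)[len(series) // 2] or 1.0
    return [median if abs(x - median) / mad > z_max else x for x in series]

def ar1_forecast(series, steps=1):
    """Fit y[t] = a * y[t-1] by least squares and roll it forward.

    A toy autoregressive forecaster, far simpler than ARIMA, but
    enough to show why cleansing must happen first: one spike would
    otherwise dominate the fitted coefficient.
    """
    xs, ys = series[:-1], series[1:]
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    out, last = [], series[-1]
    for _ in range(steps):
        last = a * last
        out.append(last)
    return out

history = cleanse([10, 11, 12, 500, 13, 14, 15])  # 500 is an outlier
forecast = ar1_forecast(history, steps=2)
```

Without the cleanse step, the 500 spike would inflate the fitted slope and the forecast would overshoot badly; with it, the forecast continues the gentle upward trend.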

Jeremy Cloud discusses SOA at Twitter, approaches taken for maintaining high levels of concurrency, and briefly touches on some functional design patterns used to manage code complexity.

Innovation at Google

John Penix describes the test automation system and the supporting build system infrastructure used by Google.

Patrick Copeland presents the first three principles of the eXtreme innovation approach based on the Pretotyping Manifesto: Innovators Beat Ideas, Pretotypes Beat Productypes, and Data Beats Opinion.

A retrospective on Google's first Scrum implementation. Jeff Sutherland visited Google to do an analysis of the first Google implementation of Scrum on one of their largest distributed projects. Their strategy for inserting Scrum step by step into the Google engineering teams showed great insight and provides helpful lessons learned for all Agile teams.

Others

* Scaling Reddit from 1 Million to 1 Billion–Pitfalls and Lessons

Jeremy Edberg shares some of the lessons learned scaling Reddit, advising on pitfalls to avoid.

Mike Krieger discusses Instagram's best and worst infrastructure decisions, building and deploying scalable and extensible services.

Talks by topic

Hadoop

Eli Collins overviews how to build new applications with Hadoop and how to integrate Hadoop with existing applications, providing an update on the state of Hadoop ecosystem, frameworks and APIs.

Bill Yetman and Jeremy Pollack discuss using several Agile techniques (start simple, get going, iterate) and the “measure everything” principle to create the architecture behind the Family History website.

Oleg Zhurakousky discusses architectural tradeoffs and alternative implementations of real-time high speed data ingest into Hadoop.

Michael Kopp explains how to run performance code at scale with Hadoop and how to analyze and optimize Hadoop jobs.

Fault Tolerance

John Allspaw discusses fault tolerance, anomaly detection and anticipation patterns helpful to create highly available and resilient systems.
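A minimal version of the anomaly-detection pattern Allspaw covers is a threshold on distance from a trailing baseline. The detector below is a sketch under that assumption, with invented names; production systems layer in seasonality handling, dynamic baselines, and the anticipation patterns the talk describes.

```python
from collections import deque

def detect_anomalies(stream, window=5, threshold=3.0):
    """Flag points that sit far from the trailing moving average.

    For each value, compare its distance from the mean of the last
    `window` points against `threshold` standard deviations; the
    anomalous value itself is kept out of the check but then enters
    the window, so the baseline adapts over time.
    """
    recent = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(stream):
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((x - mean) ** 2 for x in recent) / window
            std = var ** 0.5 or 1.0  # guard against a flat window
            if abs(value - mean) / std > threshold:
                anomalies.append((i, value))
        recent.append(value)
    return anomalies

spikes = detect_anomalies([10, 10, 11, 9, 10, 10, 50, 10, 11, 10])
```

Note the trade-off in letting the spike enter the window: the detector recovers on its own, but a sustained shift will be absorbed into the baseline after a few points rather than flagged repeatedly.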

Continuous Integration and Continuous Delivery

DevOps

Alerting, Monitoring, Availability, and Metrics

References