Large-scale machine learning at twitter

Full Paper: Search Google Scholar
 
The success of data-driven solutions to difficult problems, along with the dropping costs of storing and processing massive amounts of data, has led to growing interest in large-scale machine learning. This paper presents a case study of Twitter's integration of machine learning tools into its existing Hadoop-based, Pig-centric analytics platform. We begin with an overview of this platform, which handles "traditional" data warehousing and business intelligence tasks for the organization. The core of this work lies in recent Pig extensions to provide predictive analytics capabilities that incorporate machine learning, focused specifically on supervised classification. In particular, we have identified stochastic gradient descent techniques for online learning and ensemble methods as being highly amenable to scaling out to large amounts of data.
 
Suggested Reading
v

Suggest a relevant paper:
Title *
Authors Pub year

 
 

Discussion


Post anonymously: (You can change the anonymity of a comment at any time.)

You must be logged in to comment. Log in


No comments yet.