Overviews of Analytics Data Science at Companies
Databricks is a spin-off from the Apache Spark development team at UC Berkeley's AMPLab. Apache Spark is a leading solution for large-scale distributed in-memory computing.
Data Science at Databricks comprises two main teams:
- MLlib team, consisting of hardcore PhDs and software engineers in charge of developing the Machine Learning module for Apache Spark; and
- (Product) Data Science team, which focuses on mining and understanding patterns in how users use the Databricks server clusters to conduct their own analytics work; this team also acts as the first tester of, and feedback provider for, the product developed by the MLlib team, and does more traditional Product Analytics work than predictive Machine Learning.
Facebook's Data Science setup involves two main groups:
- R&D Machine Learning: this group consists of hardcore PhDs and academic researchers involved in long-term Machine Learning and Data Mining R&D projects, e.g. Deep Learning for Computer Vision; it also builds sophisticated predictive models used in the operation of Facebook, one example of which is an accurate model that estimates a user's age from the user's interactions with content and other users, even if the user does not declare an age or lies about it;
- Product Analytics / Data Science: embedded into product management groups, this team is the more day-to-day operational side of Data Science at Facebook; it builds data analytics pipelines to measure various metrics and produce intuitive visualizations of them to help monitor the health of Facebook's various product lines, e.g. Messenger, News Feed, Video Sharing, etc.; this group does not do much predictive Machine Learning, instead "sub-contracting" that work to the PhD team to a large extent.
GitHub's Data Science team is currently compact and focuses on mining and understanding usage patterns to find new growth opportunities.
Uber's Data Science involves many different specializations, with new areas being created frequently to tackle emerging business and operational challenges. Some key teams are:
- Economics / Econometrics / Marketplace Optimization / Dynamic Pricing: this team has strong Econometrics and Discrete Optimization flavors and focuses on the following challenges:
  - setting the right price levels at different localities and different times in order to achieve an optimal market outcome, e.g. maximizing the conversion of gross Demand and gross Supply to market-clearing effective Demand and effective Supply;
  - optimal dispatching / routing of Drivers;
  - forecasting market demand at various localities and at different times of the day;
- Anomaly Detection / Intelligent Decision Systems: this team detects anomalies in real-time Uber market data and Uber IT platform data, to drive intelligent and timely responses to such anomalies;
- Product Analytics: this team measures various metrics / KPIs from the Rider app and the Driver app to improve user engagement and provide inputs to the Product Management team;
- Machine Learning Platform: this team is currently small, because Uber's business problem is quite clear-cut and lends itself more naturally to a regression / econometrics approach; however, given the fast growth in Uber's scale and complexity, it would not be surprising to see a big ML team emerge at Uber.
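
To make the Marketplace Optimization team's pricing objective above more concrete, here is a minimal toy sketch in Python. It assumes hypothetical linear demand and supply curves: the functions `gross_demand` and `gross_supply`, and every number in them, are invented for illustration and are not Uber's actual models. The sketch simply scans a grid of candidate prices and picks the one at which the most trips can actually be matched, i.e. where gross Demand and gross Supply convert best into effective Demand and Supply.

```python
# Toy illustration only: the linear curves and all constants below are
# hypothetical assumptions, not Uber's actual pricing model.

def gross_demand(price: float) -> float:
    """Hypothetical number of ride requests; falls as the price rises."""
    return max(0.0, 100.0 - 4.0 * price)


def gross_supply(price: float) -> float:
    """Hypothetical number of available drivers; rises with the price."""
    return max(0.0, 6.0 * price)


def effective_trades(price: float) -> float:
    """Completed trips are capped by the scarcer side of the market."""
    return min(gross_demand(price), gross_supply(price))


def best_price(candidates):
    """Return the candidate price that maximizes completed trips."""
    return max(candidates, key=effective_trades)


# Candidate prices from 0.0 to 25.0 in steps of 0.5.
candidate_prices = [step * 0.5 for step in range(51)]
price = best_price(candidate_prices)
print(price, effective_trades(price))
```

A grid search is used here only because it is easy to read; a real system would presumably estimate the curves per locality and time slot from data and use a proper optimizer.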