Interview with: Search & Data Mining Engineer Andrea Burbank and Data Scientist Mohammad Shahangian
Updated on: 18 Dec 2013
What Pinterest's data scientists look for and how they help their peers
While free online courses are helping many people learn about data science, card-carrying data scientists are still somewhat scarce at many companies.
Not at Pinterest. The social bookmarking and fashion-sharing phenomenon has 13 people working primarily on data science, engineering, and analysis. And it has room for more. The company wants to make better decisions and design better products with the help of all the data it collects.
There's a lot of mystique surrounding data scientists (the sexiest job of the 21st century, according to the Harvard Business Review). But what do they actually do, what do they care about, and how do they interact with other employees?
To get answers, you could look at a job description for a data scientist at Pinterest. According to one published this month, the right person for the job will "uncover business and product opportunities by efficient and actionable analysis; work with teams across all functions (growth, recommendations, spam, partnerships) to provide data expertise; build predictive models for pinner behavior and interests" and so on.
Or you could just go directly to the data scientists and ask them questions, which is what we did.
Andrea Burbank, officially a search and data mining engineer, and Mohammad Shahangian, a data science and infrastructure engineer, responded to questions that we sent them over email. They identified critical metrics, pointed to product decisions informed by data, and explained how they help everyone else at the company analyze data.
Here's an edited transcript of the interview.
VentureBeat: Talk about specific data science projects you've worked on and how the analysis led to changes to the product.
Andrea Burbank: The range of projects here is very broad: everything from how mobile usage affects user engagement to what behavior of new users is likely to lead to long-term usage of Pinterest, and how the website redesign influenced user behavior. For each of these, we used the results almost immediately: in designing improved user orientation flows for new users, adding features to the redesigned website to encourage behaviors that had fallen off, and providing easy links to the mobile apps in email and from the website.
VentureBeat: How do you define what's good or successful? Which metrics do you track?
Mohammad Shahangian: The reality is that our definition of "good" has to evolve with what our users are telling us through their behavior. Typically, we look for leading indicators that lead to users coming back to Pinterest.
Burbank: The key metrics for us are around user engagement. Are users returning to the site frequently? Do they perform key actions, like clicking on pins to take action on them or saving the pins to their boards, when they visit? Are the new users signing up today becoming long-term engaged users?
VentureBeat: Are data science efforts at Pinterest now any different from where they were a couple of years ago?
Shahangian: Yes! Pinterest went through a lot of growth before we had a data team. When I joined less than two years ago, I spent my first week just trying to answer the question, "How many users do we have?" Once we got the infrastructure in place to handle billions of objects, the analysis we did evolved into a critical input for key decisions across the team.
Burbank: We've added infrastructure to answer more questions more quickly so we can rapidly prototype new ideas. We run A/B experiments around most features we launch and look at our metrics daily to see how things are going.
VentureBeat: Do you let those who aren't data scientists analyze data on their own? And if so, what tools do you use?
Shahangian: Yes. One of the goals of the data team is to make big data feel small. We want everyone at the company to be able to perform their own analysis without having to think about scale. Tools like [Amazon Web Services'] Redshift [data warehousing service] and Qubole [for doing Hadoop jobs in the cloud] have been huge wins because they've enabled non-engineering roles such as community specialists and product managers to answer important questions on their own.
VentureBeat: How do you scale analysis as the company grows?
Shahangian: We believe that anyone can perform analysis with some guidance, and that's really the best way to scale analysis. With each incoming request, we share the queries we used to perform each new analysis and build new tools that make that analysis easy to reproduce.
Burbank: We have a huge amount of data and a lot more questions than we have time to answer. We try to balance building tools that help everyone at the company answer their questions quickly and doing deeper analysis that may be less widely used but which can really help inform a particular product decision. We also scale by hiring: we're always on the lookout for folks who are adept with using data.
VentureBeat: How does your work impact user experience?
Burbank: We collaborate with all the teams here to design experiments around almost every user-facing feature. For example, we've looked at which sections of our recommendations email are clicked most frequently and used that to redesign the emails and to prioritize the company's work on personalized recommendations. We use models predicting user engagement to design new user orientation flows. We look at where traffic is coming from to decide which websites to reach out to about adding the Pin It button. It's not an exaggeration to say that pretty much any part of the product you look at has had significant data science behind it.