How Netflix Uses AI, Data Science, and Machine Learning — From A Product Perspective

Netflix’s machine learning algorithms are driven by business needs.

The presence of AI in today’s society is becoming more and more ubiquitous, particularly as large companies like Netflix, Amazon, Facebook, Spotify, and many more continually deploy AI-related solutions that interact directly (often behind the scenes) with consumers every day.

When properly applied to business problems, these AI-related solutions can scale and improve over time, creating significant impact for both the business and the user. But what does it mean to “properly apply” an AI solution? Does that mean there is a wrong way? From a product perspective, the short answer is yes, and we’ll get to why later in this article as we dig deeper.

Overview: First, we will outline 5 use cases of data science or machine learning at Netflix. We’ll then discuss some business needs vs technical considerations a Product Manager would look at. Then we will dive a little deeper into what is perhaps the most interesting of these 5 use cases as we identify what business problem it seeks to solve.

5 Use Cases of AI/Data/Machine Learning at Netflix

  1. Personalization of Movie Recommendations: Users who watch A are likely to watch B. This is perhaps the most well-known feature of Netflix. Netflix uses the watching history of other users with similar tastes to recommend what you may be most interested in watching next, so that you stay engaged and continue your monthly subscription.
  2. Auto-Generation and Personalization of Thumbnails / Artwork: Using thousands of video frames from an existing movie or show as a starting point for thumbnail generation, Netflix annotates these images and then ranks each one to identify which thumbnails have the highest likelihood of resulting in your click. These calculations are based on what others who are similar to you have clicked on. One finding could be that users who like certain actors or movie genres are more likely to click thumbnails with certain actors or image attributes.
  3. Location Scouting for Movie Production (Pre-Production): Using data to help decide where and when best to shoot a movie, given constraints of scheduling (actor/crew availability), budget (venue, flight/hotel costs), and production scene requirements (day vs night shoot, likelihood of weather event risks in a location). Notice this is more of a data science optimization problem than a machine learning model that makes predictions based on past data.
  4. Movie Editing (Post-Production): Using historical data on when quality control checks have failed in the past (for example, when subtitles were out of sync with sound or movement) to predict when a manual check is most beneficial in what could otherwise be a very time-intensive and laborious process.
  5. Streaming Quality: Using past viewing data to predict bandwidth usage, which helps Netflix decide when to cache content on regional servers for faster load times during expected peak demand.
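
The "users who watch A are likely to watch B" idea in the first use case can be illustrated with a toy co-occurrence count. This is only a minimal sketch under assumed data: the watch histories, the title names, and the `recommend` helper are all hypothetical, and nothing here is meant to describe Netflix's actual recommendation system.

```python
from collections import Counter
from itertools import permutations

# Hypothetical watch histories: user -> set of titles watched.
histories = {
    "ann": {"A", "B", "C"},
    "bob": {"A", "B"},
    "cat": {"A", "D"},
    "dan": {"B", "C"},
}

# Count how often each ordered pair of titles appears in the same history.
co_watch = Counter()
for titles in histories.values():
    for first, second in permutations(sorted(titles), 2):
        co_watch[(first, second)] += 1


def recommend(title, watched, top_n=2):
    """Rank unwatched titles by how often they were co-watched with `title`."""
    scores = {
        other: count
        for (seen, other), count in co_watch.items()
        if seen == title and other not in watched
    }
    # Sort by count (descending), then by title name for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


print(recommend("A", watched={"A"}))
```

Here "B" ranks first for someone who watched "A" because two of the three histories containing "A" also contain "B". A real system would need to handle popularity bias, recency, and far larger data, which this sketch ignores.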

These 5 applications of data science and machine learning at Netflix alone have scaled to shape the user experience for millions of subscribers. Adoption of these AI-related solutions is only going to get stronger over time.

But before these use cases were as commonplace as they are today and used by users like you and me, someone or some group within Netflix properly connected these AI solutions with a business need. Without this business link, these use cases would simply be pie-in-the-sky ideas sitting at the bottom of a backlog like so many other great ideas. Only through proper positioning and connection with Netflix’s core business problem did these ideas become the reality that they are today.

Netflix uses machine learning to generate many variations of high-probability click-thru image thumbnails that it relentlessly and continuously A/B tests throughout its user base — for each user and each movie — all to increase the probability that you will click and watch.

What is the Business Need/Problem?

Notice that each of the use cases identified above is associated with a specific business need, goal, or hypothesis.

This is critical for any product manager: avoid the temptation of the tech enthusiast who marvels at the details of the data science or ML for intellectual reasons without clearly identifying the problem or business need, potentially using up valuable technical resources with no business impact.

At the end of the day, product managers need to properly connect a business problem to a data or machine learning solution. We want to avoid having a solution in search of a problem. Otherwise the project will lose momentum within the company: engineers won’t be clear on what their North Star is, stakeholders across the organization won’t buy in and allocate the resources needed to make the project a success, and so on.

Make sure there is a problem to which an AI solution can be directly connected

Machine learning (ML) is a potential AI solution, but we need to first define the problem before prescribing that solution.

What’s the business result we are trying to achieve with ML? This core business need is what drives the parameters of the ML models used, what data is collected and processed, and so on. We don’t do ML to provide personalization just because it’s interesting tech. We need to link it to a business problem. Data scientists are specialists in uncovering insights from the data, but it is the product manager’s role to properly link those insights to a business need or problem and weigh them against competing priorities.

For example, a tech enthusiast might say:

Wouldn’t it be cool if you could analyze / debate an episode using voice with Netflix — and Netflix, with data input from thousands of other users’ reactions to that episode, could respond intelligently to your comments in a back and forth 2-way dialogue?

Yes, that would be a pretty awesome use case leveraging natural language processing (NLP) to understand your post-episode commentary in context. In addition to NLP, this use case uses text-to-speech voice personalities as well as sentiment analysis of how thousands of others felt about what happened in that episode, or how they feel about a certain character. Indeed, this is a beautiful merging of multiple cutting-edge technologies in one use case.

If a pilot MVP version of this showed that users who engaged with this new feature stayed longer, came back more often, or helped drive more word of mouth about Netflix, then it could warrant further resources. The initial decision to build that MVP would be a strategic decision made by stakeholders, not necessarily one prioritized by metrics. That will depend on company strategy.

But as beautiful as that user scenario is, what problem does it solve?

How does it relate to Netflix’s main problem of keeping users subscribed every month? If it’s related, what evidence (qualitative or quantitative) do we have to support that relationship?

And if this is a legitimate solution to that problem, is there a simpler version of this solution that could address the problem equally well but be less technically complex? For example, instead of voice input and voice output, how might limiting the feature to text input and text output affect the level of effort and the impact on user engagement?

What if a conversational AI interface without the voice part (just text) achieved 80% of the intended user engagement but only required 40% of the development effort? Would it be worth considering such an alternative route?

What business impact would such a solution have in comparison to the level of effort? How does this ratio compare with that of other competing tasks in the backlog?
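
The comparison above comes down to simple arithmetic: impact divided by effort. The sketch below normalizes the full voice version to 1.0 impact and 1.0 effort and uses the 80% / 40% figures from the text-only hypothetical. The `impact_per_effort` helper and the option names are illustrative assumptions, not an established prioritization framework.

```python
def impact_per_effort(impact, effort):
    """Return expected impact per unit of development effort."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return impact / effort


# (impact, effort), both relative to the full voice-in / voice-out version.
options = {
    "voice + text (full)": (1.0, 1.0),
    "text only": (0.8, 0.4),  # 80% of the engagement for 40% of the effort
}

# Rank the options from best to worst impact-per-effort ratio.
ranked = sorted(
    options,
    key=lambda name: impact_per_effort(*options[name]),
    reverse=True,
)
for name in ranked:
    print(name, round(impact_per_effort(*options[name]), 2))
```

Under these assumed numbers the text-only version delivers twice the impact per unit of effort, which is the kind of ratio a PM can then compare against other items in the backlog.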

These are all product-focused questions that a PM should be asking in order to align technology solutions with business needs. Because ultimately, it’s the business need that drives the parameters of an ML model, not the other way around.

So let’s look once again at movie recommendations and those personalized thumbnails — what’s the problem or business goal?

Because You Watched…You’ll Love… — What Problem Does Movie Recommendation Help Solve?

Movie Recommendations: Identifying the Problem

Here the problem is that Netflix has a huge collection of content (over 100 million different products, according to Netflix) that is constantly changing and can be overwhelming for a user to consume. Users don’t want to be frustrated in finding content relevant to their interests. So then, what is the best way to allow each user to consume that data in a way that ultimately maximizes subscription loyalty?

Product Goals include:

  • Increase / maintain viewership in terms of # of minutes consumed
  • Increase the # of titles explored and the frequency of logging back in
  • Exceed whichever minimum threshold the company determines is a success metric
  • Increase overall monthly subscription loyalty / decrease subscriber cancellations

Netflix Personalized Thumbnails At Work: 2 Different Users Seeing 2 Different Images for the same Godfather movie: 1 showing a dramatic closeup of a face, the other showing a happy smiling couple.

Personalized Image Thumbnail / Artwork: Identifying the Problem

This use case is a subset of Movie Recommendations. Given that movie recommendations are provided to the user, we now have yet another business / user problem.

Problem: How (and when) do we best present that movie recommendation to the user in a way that maximizes viewership and monthly subscriber loyalty?

Well, one way to provide that recommendation is through an image thumbnail — but what kind of thumbnail do we provide? And how confident are we that tweaking an image thumbnail will affect viewership or subscriber loyalty in a positive way?

And how important is that thumbnail? Do we have data for that?

Gathering Data to Support That Hypothesis

Well, you can be assured that some product-focused individual at Netflix — at a time prior to 2014 — was asking these exact same questions internally. And that individual or group worked together (probably with UX and related stakeholders) to put together user studies or data elsewhere, to prove that there was indeed a strong link between an image thumbnail and viewership.

That was their hypothesis: that adjusting the artistic content of an image thumbnail could have a strong link to viewership.

Well, turns out, back in 2014, Netflix conducted studies showing just how important that thumbnail is:

Nick Nelson, Netflix’s global manager of creative services, explained that the company conducted research in early 2014 that found artwork was “not only the biggest influencer” for a user’s decision about what to watch, it also constituted over 82 percent of their focus while browsing Netflix.

“We also saw that users spent an average of 1.8 seconds considering each title they were presented with while on Netflix,” Nelson wrote. “We were surprised by how much impact an image had on a member finding great content, and how little time we had to capture their interest.”

A small, compelling thumbnail could mean the difference between getting you to spend the entire weekend watching Netflix’s latest Originals hit or losing interest and bouncing over to a competing service like Hulu or similar OTT streaming services like ESPN / Disney / HBO Go.

So based on these studies, the hypothesis above was well supported.

OK, Thumbnails Are Important. But What Exactly Do We Tweak?

And how does an unstructured data set like a bunch of image thumbnails get fed into a digital/mathematical machine learning model? We’ll answer this second question further below.

First, given how important the thumbnail was to a user’s decision to watch something, how can Netflix generate better thumbnails for each user to increase the chance that a user will watch a video?

Using the movie’s original art as the only thumbnail used for every single person most likely won’t yield the highest click rates. The business is likely leaving clicks (and viewer stream time) on the table!

What if Netflix custom created a different thumbnail for each user that is optimized to increase click rates?

What are things within an image thumbnail that are within Netflix’s control that they can tweak to increase those click rates?

Same Riverdale Movie, but two different artistic image thumbnails, based on user’s past preference for romance (sweet smiles) or thriller (serious, dramatic looks) movie genres.

Which actor(s)/character(s) should be on that thumbnail, if any? How many? Which auto-generated frame or poster variation would be most enticing for a particular user to click on? What lighting works best? Filters?

What data do we have on other users’ past clicking behavior that we can draw associations from to help inform this thumbnail decision at scale?

Product Goals include:

  • Increase click-thru-rates (CTR) of movie recommendations, signifying engagement
  • Test the hypothesis that higher engagement rates will lead to higher subscriber satisfaction and loyalty

So this is a really interesting problem with the image thumbnail that can have a huge impact on the likelihood that someone will click on a video and watch.
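
To make the CTR goal concrete, here is a minimal sketch of how two thumbnail variants for the same title might be compared. The click and impression counts are made up, and the pooled two-proportion z-test is a standard textbook technique chosen for illustration. The article does not say which statistical method Netflix uses.

```python
import math


def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions


def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z statistic for the difference between two CTRs (pooled standard error)."""
    p_a, p_b = ctr(clicks_a, n_a), ctr(clicks_b, n_b)
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# Made-up counts for thumbnail variants A and B of the same title.
z = two_proportion_z(clicks_a=400, n_a=10_000, clicks_b=480, n_b=10_000)
print(f"CTR A={ctr(400, 10_000):.3f}  CTR B={ctr(480, 10_000):.3f}  z={z:.2f}")
```

A |z| above roughly 1.96 corresponds to the conventional 5% significance level for a two-sided test, so with these assumed counts variant B's higher CTR would be unlikely to be noise alone.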

If the goal is to maximize that probability of watching by tweaking the thumbnail — what are some product decisions to consider?

Product Considerations In Personalized Image Thumbnails

We won’t dive into each of the use cases above, but let’s dive a little further into the second one: Artwork / Thumbnail Personalization.

This is a data-driven personalization feature that sits on top of the movie recommendation engine.

Product Considerations

Algorithms are great, but they do have limitations. A product manager should always think ahead about possible edge cases in which the algorithm may fail to produce the best results.

  1. Each movie should ideally have a personalized thumbnail that maximizes clicks. Since Netflix has data on the clicking behavior of other people with similar interests, it is a reasonable hypothesis that if other people with similar interests and watch history had a high click-thru rate on a certain thumbnail, then this thumbnail will likely perform well on a new person who hasn’t yet been recommended this movie / thumbnail.
  2. The personalized thumbnail should take into consideration other movies that are being recommended at the same time, and what those image recommendations are. Let’s say Netflix is recommending 2 different Spiderman movies to a user side by side, and they both have Spiderman facing the camera with his mask off. One is Tobey Maguire and the other is Andrew Garfield. Wouldn’t it be weird for the user to see both portraits of Maguire and Garfield as Spiderman with their masks off, side by side? Something to account for if that ever were to occur.
    One image thumbnail could work well in isolation, but that may not be good enough when a page of a dozen thumbnails shows up. If they are all optimized to look the same way, then as a group, each one may seem less compelling. So looking at each thumbnail together with what else is being presented will be important.
  3. Data is great, but watch out for algorithms that do their job too well, resulting in unintended consequences / false positives!
    In statistics, this is loosely analogous to a Type I error (a false positive): suggesting an image thumbnail that shouldn’t be suggested.

Case in point: look at the example below of Like Father, a movie starring Kristen Bell. Netflix’s algorithm (arguably) made misleading thumbnail recommendations featuring supporting Black actors and actresses who don’t really represent what the movie is about, but whose thumbnails did draw a higher click rate among certain audiences.

Black users are seeing the thumbnail on the right, despite it not being representative of what the movie is about.

So be aware that an overly optimized / personalized experience could create a monotonous user experience that in some cases can be misleading to the user. We want to provide a healthy mix of the familiar with the unexpected, but also accurately portray content to the user so they aren’t misled.

Here’s another example:

Based on a high predicted click-thru-rate (CTR), Netflix ended up presenting thumbnails that matched a user’s ethnicity, even when that (usually supporting) actor or actress had very little screen time in the movie.

A black user’s recommendation shows thumbnails reflecting her ethnicity — even when that thumbnail is not necessarily representative of the movie in general.

While this is a data-supported initiative, users can readily sense the disingenuousness, and a thumbnail that does not accurately represent the movie is misleading (a Type I, false positive error).

Of course, this algorithm will likely be fine-tuned over time, but the lesson here is: don’t overdo it when capitalizing on data. Apply some common sense to balance it out.

We don’t want to mislead users or make them feel they are being treated differently because of their race, for example.

  4. Lastly, the algorithm should take into consideration what thumbnail images the user previously saw in association with this movie and aim to provide a consistent, non-confusing user experience.
    We want to avoid the user seeing a different thumbnail each time that movie appears. Not only would this confuse the user, but it would also make it difficult for a Product Manager to attribute a click: which image resulted in a higher click-thru-rate (CTR) when it keeps changing? PMs need to be able to properly attribute each new result to a specific change, so maintaining consistent data attribution is important.
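
One common way to keep a thumbnail stable for a given user and title is deterministic bucketing: hash the (user, title) pair and use the hash to pick a variant. The sketch below illustrates that general technique only. The `assign_variant` helper, the IDs, and the variant names are hypothetical, and the article does not describe how Netflix actually enforces consistency.

```python
import hashlib


def assign_variant(user_id, title_id, variants):
    """Deterministically pick one thumbnail variant per (user, title) pair."""
    key = f"{user_id}:{title_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # The same key always produces the same digest, hence the same index.
    return variants[int(digest, 16) % len(variants)]


variants = ["closeup_face", "smiling_couple", "ensemble"]
first = assign_variant("user-42", "title-7", variants)
again = assign_variant("user-42", "title-7", variants)
print(first == again)  # True: same inputs always map to the same variant
```

Because the assignment depends only on the IDs and the variant list, a click can always be traced back to the one image that user was shown. Note that changing the variant list can reshuffle assignments, so a real system would need to version its experiments.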

So those are some things a product manager would consider when thinking through edge cases and what extreme data-driven optimization can result in. Speaking of data, what specifically does Netflix work off of?

What Data Do We Have?

There are 2 parts to this:

  1. What data does Netflix use to create these personalized thumbnails / artwork?
  2. What data does Netflix use to target these custom-created thumbnails to the appropriate individual?

For the first question, consider that

  • A 1 hour episode of Stranger Things has >86,000 static video frames
  • These video frames can each individually be assigned certain attributes that are later used to filter down to the best thumbnail candidates through a set of tools and algorithms called Aesthetic Visual Analysis (AVA). This is designed to find the best custom thumbnail image out of every static frame of the video
  • Netflix Annotation: Netflix creates metadata for each frame, including, for example, brightness (.67), # of faces (3), skin tones (.2), probability of nudity (.03), level of motion blur (4), symmetry (.4)
  • Netflix Image Ranking: Netflix uses the metadata from above to pick out the specific images that are highest quality (good lighting, no motion blur, probably a face shot of major characters from a decent angle, no unauthorized branded content, etc.) and most clickable
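
The annotate-then-rank flow described above can be sketched as a filter followed by a sort. Everything in this example is an assumption made for illustration: the field names, the thresholds, and the scoring weights are invented, and the real AVA tooling is certainly more sophisticated. The frame count assumes 24 frames per second, which is consistent with the ">86,000" figure.

```python
FRAMES_PER_HOUR = 24 * 60 * 60  # assumes 24 frames per second

# Hypothetical per-frame metadata, modeled on the attributes listed above.
frames = [
    {"index": 0, "brightness": 0.67, "faces": 3, "nudity_prob": 0.03, "motion_blur": 4, "symmetry": 0.4},
    {"index": 1, "brightness": 0.80, "faces": 1, "nudity_prob": 0.01, "motion_blur": 1, "symmetry": 0.6},
    {"index": 2, "brightness": 0.90, "faces": 0, "nudity_prob": 0.00, "motion_blur": 0, "symmetry": 0.9},
    {"index": 3, "brightness": 0.70, "faces": 2, "nudity_prob": 0.40, "motion_blur": 2, "symmetry": 0.5},
]


def is_candidate(frame):
    """Hard filters: drop frames that are unsafe, too blurry, or lack a face."""
    return (
        frame["nudity_prob"] < 0.05
        and frame["motion_blur"] <= 4
        and 1 <= frame["faces"] <= 3
    )


def score(frame):
    """Illustrative quality score; the weights are made up."""
    return (
        0.5 * frame["brightness"]
        + 0.3 * frame["symmetry"]
        - 0.05 * frame["motion_blur"]
    )


ranked = sorted((f for f in frames if is_candidate(f)), key=score, reverse=True)
print(FRAMES_PER_HOUR, [f["index"] for f in ranked])
```

Frame 2 is dropped for having no faces and frame 3 for its high nudity probability, leaving frames 1 and 0 in that order. The point is the shape of the pipeline: cheap hard filters first, then a score to order the survivors.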

For the second question of what data Netflix uses to identify who to target these custom-generated thumbnails towards, consider that Netflix tracks:

  • # of movies watched, # of minutes of each show watched
  • % of completion for every video/series
  • # of upvotes, which movies were favorited, etc
  • % of overall watch content that is attributable to any specific show (and therefore the level of affinity that user has to a specific show or related cast members)
  • any seasonal or weekly trends related to a user’s level of engagement, etc.
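
As a small worked example of the "% of overall watch content attributable to a specific show" signal listed above, the sketch below turns minutes watched into per-show shares. The minutes are invented and the `affinity_shares` helper is a hypothetical name. It only shows the arithmetic, not how Netflix actually models affinity.

```python
# Hypothetical minutes watched per show for one user.
minutes_by_show = {"Stranger Things": 480, "Riverdale": 240, "Like Father": 80}


def affinity_shares(minutes):
    """Return each show's share of the user's total watch time."""
    total = sum(minutes.values())
    if total == 0:
        return {show: 0.0 for show in minutes}
    return {show: watched / total for show, watched in minutes.items()}


shares = affinity_shares(minutes_by_show)
for show, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{show}: {share:.0%}")
```

With these assumed numbers, Stranger Things accounts for 60% of this user's watch time, which could be read as a strong affinity for that show or its cast.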

Interestingly, in mid-2018 Netflix stopped accepting user reviews as a data point, which it had previously solicited only on its website. Why? Because this “feature” can actually reduce viewership, as negative reviews discourage users from trying out a video. This is yet another example of how a business need supersedes a popular user need!

So Netflix has a TON of data on each of its customers — from videos watched to images clicked. What do they do with all that data?

How Netflix Uses Data to Construct A Universe of User Profile Interests

Well, they use it to put together a 360-degree profile of each user and mathematically index every user according to hundreds, possibly thousands, of different attributes.

They do this in order to try to group people with similar interests together so they can use data from one user to help predict likely behavior of other similar users.

How does this grouping of similar user profiles work and how does a product manager make sense of the data?

Having gone through the complex math and algorithms associated with matrices, vectors, and n-dimensional feature analysis, I found the easiest way to understand how this works is through a 3D-spatial representation of 10+ dimensions.

Here’s a screenshot I took when using Google’s TensorBoard on the MNIST database of handwritten digits. It’s a fancy plot called a t-SNE plot: effectively a 3D representation of a lot more dimensions than just 3. In this case, each point is one handwritten digit image, colored by which of the 10 digit classes (0 through 9) it belongs to, plotted on a 3D sphere-like coordinate system.

A t-SNE plot of the 10 digit classes in a 3D view using Google’s TensorBoard. Looks complex at first, but is actually quite simple.

Each hand-written digit’s position in this spatial representation can be described by a vector — a coordinate-like series of numbers across however many feature dimensions.

Likewise, with Netflix users, each user profile’s position in the above chart could be described by numerical values each representing an individual dimension of that user’s interest — including movie genre, favorite actors/actresses, movie topic, etc.

Reimagining Netflix Users in Mathematical Relation To Each Other

Let’s pretend in the digits diagram above that:

  • “6” = romantic comedy
  • “4” = thriller

If a user is labeled a “6” by Netflix, then he/she will be placed in the general vicinity of where all the other turquoise 6’s are in the above spatial representation (near the bottom).

Likewise, if a user is labeled a “4” by Netflix, then he/she will be placed in the general vicinity of where all the other magenta 4’s are in the above spatial representation (near the top).

Let’s pretend each number represents a movie genre. A user who likes Romantic Comedies (6) could mathematically be closer to someone who likes Parody (5) than someone who likes a Thriller (4).

Notice how the turquoise “6” region (romantic comedy) somewhat overlaps with the grey “5” region. This could be analogous to how users who like romantic comedies could also like parody or satire movies because they both involve laughing.

Likewise, since the magenta “4” region (thriller) is somewhat close to the pink “9” region — this pink 9 region could represent those who like action movies — mathematically closer to the thriller “4” region than the romantic comedy “6” region.

Does that make sense? So when spatially represented, the distance between two user profiles represents how similar / different their tastes are. Of course, this can get infinitely more complex when someone who likes romantic comedies also likes thrillers — but the purpose of this analogy is to show the general idea of mathematical / spatial relationships between different categories.

Interest groups that are related to each other would appear closer together and could be good predictors of what a user will like, given that the user likes something else nearby.
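
The idea that distance between profiles reflects similarity of taste can be sketched with a few toy vectors. The four dimensions, the user names, and the values below are invented for illustration. Real profiles would have far more dimensions, and Euclidean distance is just one reasonable choice of metric.

```python
import math

# Hypothetical taste vectors: (romantic comedy, parody, thriller, action).
users = {
    "romcom_fan": (0.9, 0.6, 0.1, 0.1),
    "parody_fan": (0.6, 0.9, 0.1, 0.2),
    "thriller_fan": (0.1, 0.1, 0.9, 0.7),
}


def distance(a, b):
    """Euclidean distance between two taste vectors."""
    return math.dist(a, b)


def nearest(name):
    """Return the other user whose vector is closest to `name`'s."""
    others = (u for u in users if u != name)
    return min(others, key=lambda u: distance(users[name], users[u]))


print(nearest("romcom_fan"))
```

With these assumed values the romantic comedy fan sits much closer to the parody fan than to the thriller fan, mirroring the overlapping "6" and "5" regions in the analogy above.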

This is how Netflix, or really any company leveraging ML models, creates relationships within seemingly unstructured data by turning that data into numbers. These numbers by themselves don’t make much sense, but together in relation to each other, they begin to make sense.

For the same Good Will Hunting movie below, one user identified as a comedy fan would be shown a thumbnail of Robin Williams (a comedian), whereas another user identified as a romantic comedy fan would be shown a kissing thumbnail featuring Matt Damon and Minnie Driver. While not perfect, Netflix’s algorithms suggest that this level of personalization based on user profile characteristics increases the probability of a click.

So let’s summarize. A bunch of Netflix image thumbnails is a bunch of unstructured data.

But once Netflix annotates each thumbnail and assigns metadata to each one to describe what’s in that thumbnail, we have a numeric representation of that unstructured data.

Plot that numeric representation in the form of vectors across a 3D sphere like we did above, and now Netflix can start forming relationships between data points.

Netflix then finds data points that are relatively near each other and uses them to help predict future click thru behavior. If predictions turn out bad or good, they adjust the mathematical positioning of these characteristics accordingly until the model becomes better and better over time.

So that’s how Netflix turns unstructured data into mathematical representations. It uses the relational distance between data points as a basis for making and improving upon image thumbnail recommendations.
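
To illustrate "using nearby data points to predict click behavior", here is a minimal nearest-neighbor sketch. It swaps in plain k-nearest-neighbor averaging as a stand-in technique. The two taste dimensions, the click history, and the `predict_click_rate` helper are all assumptions, and the sketch does not cover the step where a model adjusts its positions as predictions turn out right or wrong.

```python
import math

# Hypothetical history: (user taste vector, clicked the close-up thumbnail? 1/0).
# Vector dimensions are assumed to be (comedy affinity, romance affinity).
history = [
    ((0.9, 0.1), 1),
    ((0.8, 0.2), 1),
    ((0.7, 0.3), 0),
    ((0.2, 0.9), 0),
    ((0.1, 0.8), 0),
]


def predict_click_rate(user, k=3):
    """Average the click outcomes of the k nearest users in taste space."""
    neighbors = sorted(history, key=lambda row: math.dist(user, row[0]))[:k]
    return sum(clicked for _, clicked in neighbors) / k


print(round(predict_click_rate((0.85, 0.15)), 2))
print(round(predict_click_rate((0.15, 0.85)), 2))
```

Under this toy data, a new comedy-leaning user gets a higher predicted click rate for the close-up thumbnail than a romance-leaning user, simply because the people nearest to them clicked it more often.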

What Did Netflix Learn From All This Data?

Now that we know how Netflix turns images into numbers in a machine learning model, what are some insights Netflix has found from all the data processing and A/B tests they have conducted for so many years?

Well, besides learning which of millions of individual thumbnails converted users into loyal subscribers over time, here are a few additional things Netflix has learned about what works in thumbnails:

  • Show close-ups of emotionally expressive faces
  • Show villains instead of heroes
  • Don’t show more than three characters

In Conclusion: Netflix Deployed AI (mostly) in the Right Way. Let’s Learn From Their Approach.

Netflix has done a phenomenal job of applying AI, data science, and machine learning the “right way” — using a product-based approach that focuses on business need first, then AI solution next, rather than the other way around.

When applied properly, AI can do wonders.

We’ve seen how effective AI solutions can be in personalizing the experience for the benefit of both Netflix in terms of subscriptions and users in terms of overall satisfaction.

We’ve also seen the limitations of algorithms that “overdo it” and discussed specific examples in which the Netflix algorithm presented misleading thumbnails to people of color because the algorithm optimized for clicks, effectively “tricking” users into clicking on bait. This happened even when that thumbnail did not accurately represent the video.

No algorithm will be perfect in accounting for all the nuances of a human experience. In fact, algorithms designed to exploit metrics will do just that — so it is the role of the product manager to work with design or other team members to find ways to address these deficiencies in algorithms.

Going forward, the integration of AI in society as well as in the corporate enterprise space will continue to become more and more prevalent.

Technologists may have a tendency to prescribe existing AI solutions, but really the most effective way to adopt AI is the way Netflix did — from a business driven perspective first.

Dig deep and you will see that Netflix generated supporting data before making the strategic move forward.

As the world of AI, data science, and machine learning continues to grow, we product managers can all take a lesson or two out of the Netflix playbook when it comes to properly deploying AI solutions.

YouTube video showcasing Netflix’s thumbnail generation algorithm.
