How Netflix Uses Throttling to Prevent 4 Big Streaming Problems
Netflix reveals their unconventional trick to keep viewers happy
It would be really difficult to find someone who has never heard of Netflix before.
With around 240 million paid subscribers, Netflix has to be the world's most popular streaming service. And it’s well deserved.
Wherever you are in the world, no matter the time or device, you can press play on any piece of Netflix content and it will work.
Does that mean that Netflix never has issues? Nope, things go wrong quite often. But they guarantee you'll always be able to watch your favorite show.
Here's how they can do that.
Estimated reading time: 4 minutes 13 seconds
What Goes Wrong?
Just like with many other services, there are lots of things that could affect a Netflix user's streaming experience.
Network Blip: A user's network connection temporarily goes down or has another issue.
Under-Scaled Services: Cloud servers have not scaled up or do not have enough resources (CPU, RAM, disk) to handle the traffic.
Retry Storms: A backend service goes down, so client requests fail and clients retry again and again, causing requests to pile up.
Bad Deployments: Features or updates that introduce bugs.
This is not an exhaustive list, but remember that the main purpose of Netflix is to provide great content to its users. If any of these issues prevent a user from watching that content, then Netflix is not fulfilling its purpose.
Considering most issues affect Netflix's backend services, the solution must 'shield' content playback from any potential problems.
Sidenote: API Gateway
Netflix has many backend services, as well as many clients that all communicate with them.
Imagine all the connection lines between them; it would look a lot like spaghetti.
An API Gateway is a server that sits between all those clients and the backend services. It's like a traffic controller routing requests to the right service. This results in cleaner, less confusing connections.
It can also check that the client has the authority to make requests to certain services, and it can monitor requests (more on that later).
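To make that more concrete, here's a minimal sketch in Java (the language Netflix's gateway is written in) of a gateway that checks authorization and routes a request to the right backend service. All the class and method names here are made up for illustration; the real gateway does far more than this.

```java
import java.util.Map;

// A minimal sketch (not Netflix's actual gateway code) of what an API gateway does:
// check the client's authority, then route the request to the right backend service.
public class GatewaySketch {

    interface BackendService { String handle(String path); }

    record Request(String pathPrefix, String path, boolean hasValidToken) {}

    static class ApiGateway {
        // Maps a request path prefix to the backend service that handles it.
        private final Map<String, BackendService> routes;

        ApiGateway(Map<String, BackendService> routes) { this.routes = routes; }

        String handle(Request request) {
            if (!request.hasValidToken()) return "401 Unauthorized"; // authority check
            BackendService target = routes.get(request.pathPrefix()); // routing
            if (target == null) return "404 Not Found";
            return target.handle(request.path());
        }
    }

    public static void main(String[] args) {
        Map<String, BackendService> routes = Map.of(
            "playback",  path -> "stream manifest for " + path,
            "favorites", path -> "favorited " + path
        );
        ApiGateway gateway = new ApiGateway(routes);
        System.out.println(gateway.handle(new Request("playback", "/play/stranger-things", true)));
        System.out.println(gateway.handle(new Request("favorites", "/favorite/dark", false)));
    }
}
```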
The Shield
If Netflix had a problem and no users were online, it could be resolved quickly without anyone noticing.
But if there's a problem, like not being able to favorite a show, and users keep trying to use that feature, they make the problem worse. Their attempts send more requests to the backend, putting more strain on its resources.
It wouldn't make sense to block this feature because Netflix doesn’t want to scare its users.
But what they could do is ‘throttle’ those requests using the API Gateway.
Sidenote: Throttling
If you show up at a popular restaurant without booking ahead, you may be asked to come back later when a table is available.
Restaurants can only provide a certain number of seats at a time, or they would get overcrowded. This is how throttling works.
A service can usually handle only a certain number of requests at a time. A request threshold can be set, say 5 requests per minute.
If 6 requests are made in a minute, the 6th request is either held for a specified amount of time before being processed (rate limiting) or rejected.
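Here's a rough Java sketch of that "5 requests per minute" example, using a simple fixed one-minute window. Real gateways tend to use fancier algorithms (sliding windows, token buckets), but the idea is the same: count requests and refuse anything over the limit.

```java
// A toy fixed-window throttle: allow up to `limit` requests per window,
// refuse the rest. This is an illustration, not how Netflix implements it.
public class SimpleThrottle {
    private final int limit;
    private final long windowMillis;
    private long windowStart = System.currentTimeMillis();
    private int count = 0;

    SimpleThrottle(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean allow() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now;  // a new window has started, reset the counter
            count = 0;
        }
        count++;
        return count <= limit;  // the 6th request within the window is refused
    }

    public static void main(String[] args) {
        SimpleThrottle throttle = new SimpleThrottle(5, 60_000); // 5 requests per minute
        for (int i = 1; i <= 6; i++) {
            System.out.println("request " + i + ": " + (throttle.allow() ? "processed" : "throttled"));
        }
    }
}
```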
How It Worked
Netflix's API Gateway was configured to track CPU load, error rates, and a bunch of other things for all the backend services.
So it knew how many errors each service had and how many requests were being sent to it.
So if a service was getting a lot of requests and had lots of errors, this was a good indicator that any further requests would need to be throttled.
Sidenote: Collecting Request Metrics
Whenever a request is sent from a client to the API Gateway, it starts collecting metrics like response time, status code, request size, and response size.
This happens before the request is directed to the appropriate service.
When the service sends back a response, it goes through the gateway, which finishes collecting metrics before sending it to the client.
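Here's a rough sketch of what that wrapping could look like, with made-up types: the gateway starts a timer before forwarding the request and records the metrics once the response comes back through it.

```java
// A minimal sketch of a gateway collecting per-request metrics
// (response time, status code, request/response size) around the backend call.
// Hypothetical types; the real gateway records far more than this.
public class MetricsSketch {

    record Request(String service, int sizeBytes) {}
    record Response(int statusCode, int sizeBytes) {}

    interface Backend { Response call(Request request); }

    static Response handleWithMetrics(Backend backend, Request request) {
        long start = System.nanoTime();             // start collecting before routing
        Response response = backend.call(request);  // forward to the backend service
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

        // Finish collecting once the response comes back through the gateway.
        System.out.printf("service=%s status=%d reqBytes=%d respBytes=%d latencyMs=%d%n",
                request.service(), response.statusCode(),
                request.sizeBytes(), response.sizeBytes(), elapsedMillis);
        return response;
    }

    public static void main(String[] args) {
        Backend favoritesService = req -> new Response(200, 48);
        handleWithMetrics(favoritesService, new Request("favorites", 120));
    }
}
```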
Of course, some services, if throttled, would have more of an impact on the ability to watch content than others. So the team prioritized requests based on:
Functionality: What will be affected if this request is throttled? If it's important to the user, then it's less likely to be throttled.
Point of origin: Is this request from a user interaction or something else, like a cron job? User interactions are less likely to be throttled.
Fallback available: If a request gets throttled, does it have a reasonable fallback? For example, if a trailer doesn’t play on hover, will the user see an image? If there's a good fallback, then it's more likely to be throttled.
Throughput: If the backend service tends to receive a lot of requests, like logs, then these requests are more likely to be throttled.
Based on these criteria, each request was given a score between 0 and 100 before being routed, with 0 being high priority (less likely to be throttled) and 100 being low priority (more likely to be throttled).
The team implemented a threshold number, for example 40, and if a request's score was above that number, it would be throttled.
This threshold was determined by the health of all the backend services, which, again, was monitored by the API Gateway. The worse the health, the lower the threshold, and vice versa.
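To tie the scoring and the threshold together, here's a hypothetical Java sketch. The weights and the health-to-threshold mapping are invented for illustration; all we know from the article is that scores run from 0 (high priority) to 100 (low priority) and that the threshold drops as backend health gets worse.

```java
// A toy version of priority-based throttling. Weights and thresholds are
// made up; only the shape of the idea matches what the article describes.
public class PrioritySketch {

    record RequestInfo(boolean userFacing, boolean fromUserInteraction,
                       boolean hasFallback, boolean highThroughput) {}

    // Higher score = lower priority = more likely to be throttled.
    static int score(RequestInfo r) {
        int score = 50;                            // start in the middle
        if (r.userFacing()) score -= 25;           // important functionality: keep it
        if (r.fromUserInteraction()) score -= 15;  // user-initiated: keep it
        if (r.hasFallback()) score += 20;          // decent fallback exists: safer to shed
        if (r.highThroughput()) score += 20;       // chatty traffic (e.g. logs): shed first
        return Math.max(0, Math.min(100, score));
    }

    // Worse backend health -> lower threshold -> more requests get throttled.
    static int threshold(double backendHealth /* 0.0 = unhealthy, 1.0 = healthy */) {
        return (int) Math.round(100 * backendHealth);
    }

    static boolean shouldThrottle(RequestInfo r, double backendHealth) {
        return score(r) > threshold(backendHealth);
    }

    public static void main(String[] args) {
        RequestInfo playback = new RequestInfo(true, true, false, false);
        RequestInfo logs = new RequestInfo(false, false, true, true);
        double health = 0.4;  // backend is struggling, so the threshold drops to 40
        System.out.println("playback throttled? " + shouldThrottle(playback, health)); // false
        System.out.println("logs throttled?     " + shouldThrottle(logs, health));     // true
    }
}
```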
There are no hard numbers in the original article on how many resources or how much time this technique saved the company (which is a shame).
But the gif below is a recording of what a potential user would experience if the backend system was recovering from an issue.
As you can see, they were able to play their favorite show without interruption, oblivious to what was going on in the background.
Let's Call It
I could go on, but I think this is a good place to stop.
The team must have put a huge amount of effort into getting this across the line. I mean, the API gateway is written in Java, so bravo to them.
If you want more information about this there's plenty of it out there.
I recommend reading the original article, watching this video, and reading this article as well.
But if you don't have time to do all that and are enjoying these simplified summaries, you know what to do.
PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (LinkedIn, Twitter, YouTube, Instagram). It only takes 10 seconds. Making this one took 19 hours.