How Tinder Secures Its 500+ Microservices
Tinder's highly customised solution that fixed their microservice security chaos
Tinder is an online dating platform famous for its swiping mechanism. Swipe right to like and swipe left to dislike.
It launched in 2012 and has grown to be one of the most popular dating platforms with around 75 million active users.
This growth has led to the development of many different tech-related services over the years. Some of these are external, so they’re open to the public and attackers.
These external services used different third-party solutions for security and routing requests. This, of course, made them very difficult to maintain.
So, the team at Tinder decided to build their own solution. A solution that would match their custom infrastructure.
Here's how they did it.
Estimated reading time: 4 minutes 48 seconds
Why Tinder Has External Services
Tinder has many external services. Some of these include:
The Authentication service for login and session management
Recommendations to suggest matches
Messaging for communication between matched users
Geolocation for location-based matching
Media to handle all image and video uploads
And many more.
These are external because they're exposed to the public internet. Anyone can inspect the API requests and see the data being returned.
This means bad actors can see this information too. That isn't great for any app, but it's a massive problem for a dating app like Tinder.
Someone could track a user's exact location via the geolocation service, access phone numbers, images, or other personal information, hijack accounts, and so on.
Not to mention, if a hacker got access to an external service, they could also get access to internal services.
So it's important the team at Tinder keep these external services secure.
The best way to do this is to use an API gateway, which enforces authorization and other security checks.
The problem Tinder had was that they were using many different third-party API gateways. These used different tech stacks and were difficult to manage.
They needed a single API gateway solution that:
Would enforce consistent session management across different services
Could be a framework teams could take and use to scale their application independently
Could be customized using a configuration file instead of writing code
Would integrate with their Envoy service mesh
Sidenote: Service Mesh
A service mesh is a piece of software that manages communication between services.
Service meshes are usually added to a Kubernetes cluster and provide features like:
mTLS: Mutual TLS, which makes sure services verify each other's identity before they can talk
Retries: Automatically retry failed requests between services
Observability: Collect metrics and traces from a service
Traffic splitting: Control what percentage of traffic goes to each version of a service if there are many versions
These features and more work without making any changes to the application code.
A service mesh works by adding two things: a sidecar proxy to each pod, which intercepts all the application's network calls, and a control plane that adds and manages the proxies.
So, configuration is done via the control plane, which pushes updates to the proxies. But application traffic never passes through the control plane; it flows directly between the proxies.
Popular service mesh proxies include Envoy, Linkerd Proxy, and Consul Connect Proxy. Popular control planes include Istio, Linkerd, and Consul.
In fact, Envoy is the most popular service mesh proxy with first class support for gRPC.
It's important to note the difference between a service and a pod in Kubernetes.
A pod runs the application code. A service is a virtual layer that can be applied to a single or many pods.
It provides a stable DNS name and IP address. This is used to make sure that internal or external apps can access a pod or pods, even if the pod gets replaced or restarted.
Although you don't need Services to use a service mesh, the stable names they give pods complement one nicely.
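As an example, here is a minimal Kubernetes Service that gives a set of pods a stable name. The service name, labels, and ports are illustrative, not taken from Tinder's setup:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: recommendations
spec:
  selector:
    app: recommendations   # targets any pod carrying this label
  ports:
    - port: 80             # stable port exposed by the Service
      targetPort: 8080     # port the pod's container listens on
```

Other workloads in the cluster can now reach these pods via the Service's DNS name (e.g. `recommendations.default.svc.cluster.local` in the default namespace), even as individual pods are replaced or restarted.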
The team at Tinder looked into existing solutions like Amazon API Gateway, Kong, Apigee, and Tyk.io, but none met all their requirements.
So they built their own solution called TAG, Tinder API Gateway.
How TAG Works
Before TAG can receive traffic, it needs to be configured with a list of routes.
A developer would first create a route as a configuration file. This would be a YAML file that could include information like an API endpoint or route, as well as its service and filters.
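Such a route file might look something like this. The schema below is a sketch, since TAG's exact configuration format isn't published in full, so all field names are illustrative:

```yaml
# Hypothetical TAG route configuration; field names are illustrative.
routes:
  - id: recommendations
    path: /v1/recs
    service: recommendations-service
    filters:
      pre:
        - http-to-grpc          # convert the request before it hits the service
      post:
        - add-location-header   # enrich the response on the way out
```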
Filters are bits of logic that can be applied to requests as they come in, and responses as they go out. There are three types of filters.
Pre-filters: applied to requests before they reach the service (e.g., converting HTTP to gRPC).
Post-filters: applied to responses after they leave the service (e.g., adding headers like location).
Global filters: applied to all requests and responses (e.g., request and response scanning).
These three types of filters have predefined logic inside of TAG, but it's also possible to add custom filters.
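The three filter types can be sketched as plain functions wrapped around a service call. This is a minimal illustration with dict-based requests and responses, not TAG's actual implementation; the filter behaviors are stand-ins for the examples above:

```python
def global_filter(message):
    # Applied to every request AND response, e.g. scanning/logging.
    message.setdefault("scanned", True)
    return message

def pre_filter(request):
    # Applied to requests before they reach the service,
    # e.g. pretend HTTP -> gRPC conversion.
    request["protocol"] = "grpc"
    return request

def post_filter(response):
    # Applied to responses after they leave the service,
    # e.g. adding a location header.
    response.setdefault("headers", {})["x-location"] = "us-east-1"
    return response

def handle(request, service):
    # Order: global -> pre -> service -> global -> post.
    request = global_filter(request)
    request = pre_filter(request)
    response = service(request)
    response = global_filter(response)
    return post_filter(response)

# Demo with a trivial echo service.
echo_service = lambda req: {"body": req["path"]}
resp = handle({"path": "/v1/recs"}, echo_service)
```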
Sidenote: Request and Response Scanning
This is one of the ways TAG can prevent attacks on the system, and it does this very cleverly.
When a request or response is sent to TAG, an async event is sent to an event streaming platform. This event contains details like the type of request and the endpoint being accessed.
This is async, so it doesn't block other processes.
The data is securely streamed to other applications using Amazon MSK (Managed Streaming for Apache Kafka).
These applications can check for unusual patterns, bots, or other attacks.
If TAG detects any issues, it can trigger some global filters, such as rate limiting, or just block the request.
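The non-blocking part of this design can be sketched with an in-memory queue standing in for Amazon MSK. The gateway emits an event and returns immediately, while a separate consumer checks for suspicious patterns; the event fields and the `/admin` heuristic here are purely illustrative:

```python
import queue
import threading

events = queue.Queue()
flagged = []

def scanner():
    # Downstream consumer (would be a separate app reading from Kafka):
    # checks events for unusual patterns.
    while True:
        event = events.get()
        if event is None:  # shutdown sentinel, demo only
            break
        if event["endpoint"].startswith("/admin"):
            flagged.append(event)
        events.task_done()

worker = threading.Thread(target=scanner, daemon=True)
worker.start()

def handle_request(method, endpoint):
    # The gateway publishes the event and returns immediately;
    # it never waits on the scanner.
    events.put({"method": method, "endpoint": endpoint})
    return {"status": 200}

handle_request("GET", "/v1/recs")
handle_request("POST", "/admin/users")
events.join()     # demo only: wait for the scanner to drain the queue
events.put(None)  # stop the worker
```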
When TAG gets an update to the route configuration or starts up, it does a few things.
It creates internal objects that represent each route. Associates the correct filters with each route object. Then, sets up rules for how each request should be matched to a service.
After that, TAG is ready to receive traffic. Here's a step-by-step, high-level overview of how it does this.
A client sends an HTTP request to the backend
The request hits TAG, which applies global filters
A route in TAG is matched to the request
TAG uses Envoy to discover which service should handle the request
TAG applies pre-filters, then forwards the request to the service
Once the service has responded, TAG applies post-filters
Then the response is sent back to the client
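The steps above can be sketched end to end. Everything here is a stand-in: the route table mimics the parsed configuration, the `SERVICES` dict mimics Envoy's service discovery, and the filters are reduced to single lines:

```python
# Routes built from configuration at startup (step 0).
ROUTES = [
    {"path": "/v1/recs", "service": "recommendations"},
    {"path": "/v1/messages", "service": "messaging"},
]

# Stand-in for Envoy's service discovery.
SERVICES = {
    "recommendations": lambda req: {"status": 200, "body": "matches"},
    "messaging": lambda req: {"status": 200, "body": "chats"},
}

def match_route(path):
    # Step 3: match the request to a configured route.
    for route in ROUTES:
        if path.startswith(route["path"]):
            return route
    return None

def gateway(path):
    request = {"path": path, "scanned": True}  # step 2: global filters
    route = match_route(path)                  # step 3: route matching
    if route is None:
        return {"status": 404}
    service = SERVICES[route["service"]]       # step 4: service discovery
    request["auth_checked"] = True             # step 5: pre-filters
    response = service(request)                # step 5: forward to service
    response["x-gateway"] = "tag"              # step 6: post-filters
    return response                            # step 7: back to the client
```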
Teams at Tinder use TAG as a framework for building their own API gateways just by writing configuration files.
TAG is also used by other companies like Hinge, OkCupid, and PlentyOfFish.
Wrapping Things Up
If I'm being honest, I'm amazed to see how much effort went into building TAG for an app like Tinder. It's easy not to think much of something like a dating platform. How complicated can it be?
But doing the research for this article was a great insight into how problems at scale, no matter the app, can be really complicated to solve.
Check out the original article if you want more details about how Tinder's API gateway works.
And as usual, if you want the next article sent to your inbox as soon as it's released, go ahead and subscribe.
PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (LinkedIn, Twitter, YouTube, Instagram). It only takes 10 seconds. Making this one took 20 hours.