Tenant pinning in a multi-tenant cloud service


So you’ve got your multi-tenant cloud service in place and you already have multiple tenants using your service.

Up until now, everything seemed to be working pretty well. But you encountered this weird phenomenon. Every so often, latencies go up across the board and a lot of your active tenants are impacted during that time.

After some investigation, you find out that the root cause is one of your beloved, well-paying, huge tenants going crazy with its usage and consuming most of the machine resources on the instances that show the high response times.

The impact doesn’t have to be as bad as a drop in your success rate. For that, you probably already have some auto-scaling in place and maybe even a throttling mechanism to handle the excessive load.

One option you might consider is an approach called “tenant pinning” which is some form of partial “tenant isolation”.


When developing a multi-tenant cloud service, each instance of your service serves more than one tenant. Therefore, the resources of each instance are usually also shared between multiple tenants to serve everybody’s requests.

Most of the time, different tenants will have different usage patterns: different peak hours due to geographic location, different volumes of traffic, or even different types of usage (some types of usage consume more machine resources than others).

Tenant pinning basically means you isolate your tenants into different groups, where each group is assigned to a different set of service instances (and their dependencies).


One way you could decide to group your tenants into multiple sets of services is by deploying your service on multiple Scale Units.

In short, each scale unit is a replication of the entire cloud system (a different set of service instances, DBs, and other service dependencies you might have).

The idea is that each group of tenants will be assigned to a different scale unit which will serve only that group of tenants. That way, each tenant can only affect other tenants in the group it is assigned to, instead of impacting all tenants using your service. You could also group your tenants in a smart way so that the impact is minimized even more. For example, group together tenants that have different peak usage times, or isolate very big tenants in their own dedicated scale unit.

That doesn’t mean you have to pay much more for your cloud infrastructure. Having multiple scale units means you can make each scale unit “smaller” (fewer instances, or fewer machine resources in each instance) than the original cloud system, because the overall traffic of the entire service is now split between multiple scale units.

As a side note, you will also want to make sure that each scale unit has service instances deployed in multiple data centers for high availability.

“Wait a second… how can I route the requests of a specific group of tenants to their designated scale unit?”

Great question!

Here are two possible approaches to consider:

  1. Authentication and Gateway Routing: To use Gateway Routing, you will need a component in your cloud system that is not inside any specific scale unit, and whose sole purpose is to route traffic to different endpoints in different scale units. If your service requires your users to authenticate, you probably don’t know which tenant the current user belongs to until authentication completes. It doesn’t matter which authentication protocol you are using: as long as you don’t know the tenant a user belongs to, you cannot route that user’s requests to the tenant’s assigned scale unit. Therefore, all of the workloads related to authentication (e.g. token or session cookie validation) will need to be offloaded to the gateway routing component or, better yet, to a new authentication service that is also not part of any specific scale unit. After authentication is complete, and you know the tenant the current user belongs to, you can route the request to the tenant’s designated scale unit.
  2. DNS Routing: If you have a DNS zone for your service and it makes sense for each tenant to use a different hostname to access your service (e.g. different subdomains like {tenantName}.MyApp.com), this approach can work well. Basically, as part of the registration or onboarding process for your service, a new DNS record is created in the DNS zone for that specific tenant. The DNS record maps the tenant’s hostname to its designated scale unit. One major drawback of this approach is that you can’t guarantee that one tenant won’t access your service using another tenant’s hostname. To achieve strict tenant pinning here, the authentication process of your service needs to validate that the user indeed belongs to the tenant whose hostname was used to access the service.
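
The hostname check described for DNS routing could look roughly like the Python sketch below. It is only an illustration under a few assumptions: the base domain is `myapp.com`, each tenant uses exactly one subdomain label, tenant names are case-insensitive, and the hostname has already been stripped of any port. The function names `tenant_from_hostname` and `hostname_matches_user` are hypothetical, and the tenant of the authenticated user is assumed to come from whatever authentication mechanism you already use.

```python
from typing import Optional


def tenant_from_hostname(hostname: str, base_domain: str = "myapp.com") -> Optional[str]:
    """Extract the tenant name from a host such as '{tenantName}.myapp.com'."""
    host = hostname.lower().rstrip(".")
    suffix = "." + base_domain
    if not host.endswith(suffix):
        return None
    label = host[: -len(suffix)]
    # Accept exactly one subdomain label, e.g. "contoso" in "contoso.myapp.com".
    if not label or "." in label:
        return None
    return label


def hostname_matches_user(hostname: str, authenticated_tenant: str) -> bool:
    """True only if the authenticated user's tenant owns the hostname used."""
    requested = tenant_from_hostname(hostname)
    return requested is not None and requested == authenticated_tenant.lower()
```

A request for which `hostname_matches_user` returns `False` would be rejected, so that a user from one tenant cannot land on another tenant’s scale unit simply by using its hostname.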

So which strategy should you use so your code can automatically distribute tenants into similarly sized groups?

One easy way is by using the unique identifier of the tenant.

During the sign-up or registration process of a tenant to your service, some unique identifier value was probably assigned to identify it across the system.
What you basically want is a function that maps the identifier of a tenant to a scale unit. You can do this by hashing the tenant identifier first, and then taking the result modulo the number of scale units you have.

SomeHashingFunction(tenantId) mod scaleUnitCount
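
As a minimal Python sketch of this formula, assuming tenant identifiers are strings and using SHA-256 as a stand-in for SomeHashingFunction (the name `scale_unit_for` is hypothetical). A stable hash is used rather than Python’s built-in `hash()`, which is randomized per process for strings by default and so would not give the same answer on every instance.

```python
import hashlib


def scale_unit_for(tenant_id: str, scale_unit_count: int) -> int:
    """Map a tenant identifier to a scale unit index in [0, scale_unit_count)."""
    if scale_unit_count <= 0:
        raise ValueError("scale_unit_count must be positive")
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    hashed = int.from_bytes(digest[:8], "big")
    return hashed % scale_unit_count
```

The same tenant identifier always yields the same index for a given `scale_unit_count`, which is what lets every routing instance agree without sharing state.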

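Keep in mind that the result of your function will change once you add more scale units to your cloud system: with a plain modulo, many existing tenants will map to a different scale unit than before. So make sure the “scale unit count” value you are using is configurable and easily changeable, and plan for moving the affected tenants.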
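
To get a feel for how much the mapping shifts when the scale unit count changes, the Python sketch below counts how many tenants end up on a different scale unit. It assumes a SHA-256 based hash-and-modulo mapping and made-up tenant identifiers; `scale_unit_for` and `moved_fraction` are hypothetical names. Going from 4 to 5 scale units, a tenant keeps its scale unit only when its hash modulo 20 is below 4, so with a well-distributed hash roughly four out of five tenants are expected to move.

```python
import hashlib
from typing import Sequence


def scale_unit_for(tenant_id: str, scale_unit_count: int) -> int:
    """Hash-and-modulo mapping from tenant identifier to scale unit index."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % scale_unit_count


def moved_fraction(tenant_ids: Sequence[str], old_count: int, new_count: int) -> float:
    """Fraction of tenants whose scale unit changes when the count changes."""
    if not tenant_ids:
        return 0.0
    moved = sum(
        1
        for tenant_id in tenant_ids
        if scale_unit_for(tenant_id, old_count) != scale_unit_for(tenant_id, new_count)
    )
    return moved / len(tenant_ids)


# Hypothetical tenant identifiers, for illustration only.
tenants = [f"tenant-{i}" for i in range(1000)]
print(moved_fraction(tenants, old_count=4, new_count=5))
```

If that much movement is a concern, one option is to pin existing tenants to their current scale units before changing the count, so that only new tenants use the new mapping.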

Another thing you might also need is to have the ability to pin specific tenants to specific scale units.
For example, if you have a very large and active tenant with a lot of traffic, and you don’t want it to affect other customers in the tenant group it was assigned to by default, it might make sense to “pin” that tenant to a different, dedicated scale unit.
Therefore, you’ll probably need to add some “whitelisting” functionality. It will work like this:
Your gateway routing component will first check the whitelist to see if this tenant was pinned to any specific scale unit and if not, use the distribution function discussed above.
Make sure your distribution function excludes the dedicated scale units to make them truly “dedicated”.
Depending on how you implement the whitelisting functionality, it can also give you a mitigation option if you need to dynamically pin tenants to specific scale units.
For example, if one of your tenants suddenly has a big spike of traffic, you can quickly isolate it from everyone else, so that everyone continues to get service without any requests being throttled.
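
The lookup order described above (whitelist first, distribution function second) can be sketched in Python as follows. This is only an illustration: the scale unit names, the `PINNED` and `SHARED` values, and the function `resolve_scale_unit` are all hypothetical, and SHA-256 stands in for whatever hashing function you choose. Dedicated scale units are kept out of the list the distribution function draws from, so only pinned tenants can reach them.

```python
import hashlib
from typing import Mapping, Sequence


def resolve_scale_unit(
    tenant_id: str,
    pinned: Mapping[str, str],
    shared_units: Sequence[str],
) -> str:
    """Return the scale unit name for a tenant.

    pinned: explicit tenant -> scale unit overrides (the "whitelist").
    shared_units: scale units used by the default distribution;
        dedicated units are deliberately left out of this list.
    """
    # 1. An explicit pin always wins.
    if tenant_id in pinned:
        return pinned[tenant_id]
    # 2. Otherwise fall back to hash-and-modulo over the shared units only.
    if not shared_units:
        raise ValueError("no shared scale units configured")
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shared_units)
    return shared_units[index]


# Hypothetical configuration, for illustration only.
PINNED = {"big-tenant": "su-dedicated-1"}
SHARED = ["su-1", "su-2", "su-3"]

print(resolve_scale_unit("big-tenant", PINNED, SHARED))    # su-dedicated-1
print(resolve_scale_unit("small-tenant", PINNED, SHARED))  # one of the shared units
```

If the pinned mapping lives in configuration that can be updated without a redeployment, adding an entry to it is one way to get the dynamic mitigation option mentioned above.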

Tenant pinning is not an architecture decision you have to make early in the development process of a service – but you should definitely consider it once your service matures, in order to further improve your SLA.

