Test in Production with Forking

Wouldn’t it be amazing if you could test your code before it is deployed to production, with real production traffic?

Meet forking.

Forking is the practice of copying a request that was sent to your production environment and replaying it in another environment. The processing of both requests and their responses is later analyzed for unintended differences.

As you can see, one precondition to forking is having another environment to fork the requests to. Most services will want to fork their requests to a Canary or Test environment. Any new changes and builds you plan to deploy to production should first be deployed to the test environment that serves the forked requests.

Once you have a test environment in place, and you continuously deploy new builds to that environment before deploying them to production, you will need to develop the actual forking mechanism and a diff comparison mechanism.


Forking Mechanism

Two main strategies for forking mechanisms are:

  • Front (or head) forking
  • Tail forking

Front Forking

This strategy is only relevant for “Read” (GET) requests, which do not modify state or data.

With front forking, a proxy or gateway sits in front of your cloud service and, according to adjustable and programmable rules, copies some of the requests and replays them to your test environment:

[Figure: Front forking]

In order to later analyze the diffs of the results, the proxy or gateway component must assign a unique identifier to each incoming request and send it along with the request to both the Prod environment and the Test environment.
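The proxy behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the header name `X-Fork-Request-Id`, the sampling rate, the path-prefix rule and the function names are all hypothetical, and a real proxy would forward the requests over the network instead of returning a list.

```python
import random
import uuid

# Hypothetical header name used to correlate the Prod and Test copies.
REQUEST_ID_HEADER = "X-Fork-Request-Id"


def should_fork(method, path, sample_rate, prefixes=("/",), rng=random.random):
    """Assumed rule set: fork only GET requests under allowed path prefixes,
    sampled at sample_rate (0.0 = never, 1.0 = always)."""
    if method.upper() != "GET":
        return False
    if not any(path.startswith(p) for p in prefixes):
        return False
    return rng() < sample_rate


def route(method, path, headers, sample_rate=0.1, prefixes=("/",), rng=random.random):
    """Return the (environment, headers) pairs the proxy would forward to."""
    tagged = dict(headers)
    tagged[REQUEST_ID_HEADER] = str(uuid.uuid4())  # same id for both copies
    targets = [("prod", tagged)]
    if should_fork(method, path, sample_rate, prefixes, rng):
        targets.append(("test", dict(tagged)))
    return targets
```

Because both copies carry the same identifier, the later diff analysis can pair each production request with its forked twin.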

Analyzing Diffs:

You have two options on how to analyze diffs when using Front Forking:

  1. Telemetry – Correlate and compare the event logs of the original request in the Prod environment and the forked request in the Test environment. As this analysis should not degrade the performance of your production environment, you will probably want to run it as a separate cloud service that you can scale independently of your customer-facing services. You should decide which fields in the logs count as a diff and which do not (e.g. a difference in request processing time or latency should not count as a diff).
  2. Runtime (Not Recommended) – After the response from the Production environment is returned to the end user, the proxy or gateway component can wait for the response from the Test environment to arrive as well, and then compare the two according to key fields in the responses. The additional value of this approach is that you can compare fields that contain sensitive information you might not be able to log. E.g. if the address of the user is considered PII and you cannot log it, you can still compare in memory the addresses returned by both requests. If logging the hashes of the PII fields and comparing them is sufficient for your use case, there is no reason to use this diff analysis strategy. There are multiple reasons why this approach is not recommended; in short, a key component in your hot path ends up doing work that should be offloaded, which risks degrading the quality of service.
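As a rough illustration of the telemetry option, the sketch below pairs Prod and Test log events by request identifier and reports only the fields that count as a diff. The event shape, the field names (`request_id`, `latency_ms`, `address_hash`) and the ignored-field list are assumptions made for the example. The PII helper uses a keyed hash with a placeholder key, which is one possible way to log a comparable value without logging the address itself.

```python
import hashlib
import hmac

# Assumed noisy fields that should never count as a diff.
IGNORED_FIELDS = frozenset({"latency_ms", "timestamp", "host"})
# Placeholder key; both environments must use the same one for hashes to match.
HASH_KEY = b"example-key-shared-by-both-environments"


def hash_pii(value, key=HASH_KEY):
    """Return a keyed hash so a PII value can be compared without logging it."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def diff_events(prod_event, test_event, ignored=IGNORED_FIELDS):
    """Map each differing field to its (prod value, test value) pair."""
    keys = (set(prod_event) | set(test_event)) - set(ignored)
    return {
        k: (prod_event.get(k), test_event.get(k))
        for k in sorted(keys)
        if prod_event.get(k) != test_event.get(k)
    }


def correlate(prod_log, test_log, id_field="request_id"):
    """Pair events by request id and keep only the pairs that differ."""
    test_by_id = {event[id_field]: event for event in test_log}
    found = {}
    for event in prod_log:
        twin = test_by_id.get(event[id_field])
        if twin is not None:
            diff = diff_events(event, twin)
            if diff:
                found[event[id_field]] = diff
    return found
```

Running this in a separate analysis service keeps the comparison out of the production hot path.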

Tail Forking

With Tail Forking, the user request is processed by the production environment and the response is sent back to the user. Then, according to adjustable and programmable rules, the original request may be sent to the Test environment for forking.
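The ordering described above (respond first, fork afterwards) could be sketched like this. It is a minimal, single-process illustration: the in-memory queue, the `handle_in_prod` and `drain_forks` names, and the callables passed in are hypothetical, and in a real service the queue would be drained by a background worker or a separate process.

```python
import queue

# In-memory hand-off between the request handler and the forking worker.
fork_queue = queue.Queue()


def handle_in_prod(request, process, should_fork):
    """Process the request in Prod, then enqueue it for forking if rules allow."""
    response = process(request)
    if should_fork(request):
        # The user-facing path only pays for a cheap enqueue.
        fork_queue.put((request, response))
    return response


def drain_forks(send_to_test):
    """Replay every queued request against Test; return the collected triples."""
    results = []
    while True:
        try:
            request, prod_response = fork_queue.get_nowait()
        except queue.Empty:
            return results
        results.append((request, prod_response, send_to_test(request)))
```

The user never waits on the Test environment, since the replay happens only after the production response has been produced.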

Variation 1: Simple Tail Forking

[Figure: Tail forking]

This is the simplest form of Tail Forking. Its only added value is that it doesn’t require a proxy or gateway in front of your cloud service. With this variation, you cannot fork requests that are not “Read” (GET) requests.

Analyzing Diffs:

  1. Telemetry – See “Analyzing Diffs” under Front Forking.
  2. Runtime (Not Recommended) – An instance in the Prod environment can compare the response sent to the user with the response received from the Test environment. This approach is not recommended because it degrades the QoS of the production environment.


Variation 2: Runtime Comparison Tail Forking 

[Figure: Tail forking, Variation 2]

In this variation of Tail Forking, a “processing snapshot” is captured while the request is processed in production. The “processing snapshot” aggregates information from the various processing steps of the current request. It includes things like results from dependencies (e.g. the response from the DB when the service tried to add a new object to it), the steps that were taken to process the request and their results, the response that was sent back to the user, and any other information needed to later compare the two environments.

After the production environment has finished processing the user’s request and sending the response, the original request and its processing snapshot are sent to a special endpoint in the Test environment. To support forking of write, update and delete requests, the Test environment needs to be set up to work with mocks instead of the original Production dependencies. Every mock involved in processing the forked request reads the “processing snapshot” and replays the response of the dependency it is mocking. For example, if the production environment fetched some data from service X while processing the request, that data is kept in the “processing snapshot”. Once the original request is forked to the Test environment, the mock that stands in for the class that fetches data from service X reads service X’s response from the snapshot and replays it.
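To make the record-and-replay idea concrete, here is a minimal sketch. The class names, the snapshot layout (a dict keyed by dependency name) and the `get_greeting` handler are all invented for illustration; a real snapshot would also capture processing steps and the final response.

```python
class RecordingDependency:
    """Prod side: call the real dependency and record each result in the snapshot."""

    def __init__(self, name, call, snapshot):
        self.name = name
        self.call = call
        self.snapshot = snapshot

    def __call__(self, *args):
        result = self.call(*args)
        self.snapshot.setdefault(self.name, []).append(
            {"args": list(args), "result": result}
        )
        return result


class ReplayingMock:
    """Test side: replay recorded results in order, never touching the real dependency."""

    def __init__(self, name, snapshot):
        self.name = name
        self.recorded = list(snapshot.get(name, []))
        self.position = 0

    def __call__(self, *args):
        if self.position >= len(self.recorded):
            raise LookupError(f"no recorded call left for {self.name}")
        entry = self.recorded[self.position]
        self.position += 1
        return entry["result"]


def get_greeting(user_id, fetch_user):
    """Hypothetical request handler that depends on 'service X'."""
    return "Hello, " + fetch_user(user_id)["name"]
```

In production the handler would be given a `RecordingDependency` wrapping the real client; in the Test environment the same handler would be given a `ReplayingMock` built from the snapshot that arrived with the forked request. A mock that runs out of recorded calls is itself a signal that the new build talks to its dependencies differently.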

Analyzing Diffs:

  1. Telemetry – See “Analyzing Diffs” under Front Forking.
  2. Runtime (Recommended) – Once the Test environment has finished processing the forked request and recorded its own “processing snapshot”, the original request, the final responses from both environments and both processing snapshots are all analyzed for potential diffs at runtime in the Test environment. The advantage of this approach is that you don’t have to store in your log store all the request and response information you are diffing, because the diff analysis is done at runtime.
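A runtime comparison inside the Test environment might look something like the sketch below. The flat dict shapes for responses and snapshots and the `compare_runs` name are assumptions made for the example; real snapshots are likely to be nested and to need field-specific comparison rules.

```python
def _diff_mapping(kind, prod, test):
    """List (kind, key, prod value, test value) for every key that differs."""
    return [
        (kind, key, prod.get(key), test.get(key))
        for key in sorted(set(prod) | set(test))
        if prod.get(key) != test.get(key)
    ]


def compare_runs(prod_response, test_response, prod_snapshot, test_snapshot):
    """Compare final responses and processing snapshots of the two environments."""
    return (
        _diff_mapping("response", prod_response, test_response)
        + _diff_mapping("snapshot", prod_snapshot, test_snapshot)
    )
```

Only the resulting list of diffs needs to be persisted, not the full request and response payloads.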


Forking Tools:

Once a specific type of forking difference (diff) discovered by the forking analysis has passed a certain threshold of occurrences, you will want to mark this kind of diff for investigation. To track different types of diffs, you can distinguish each diff by the values of certain fields such as “Prod HTTP response status code”, “Test HTTP response status code”, “Prod result description”, “Test result description”, etc.
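One way to group diffs into types and apply a threshold is sketched below. The field names in `SIGNATURE_FIELDS` are shortened stand-ins for the fields mentioned above, and treating "passed the threshold" as "occurred at least `threshold` times" is an assumption.

```python
from collections import Counter

# Assumed fields that together identify a "type" of diff.
SIGNATURE_FIELDS = ("prod_status", "test_status", "prod_result", "test_result")


def signature(diff):
    """Reduce one observed diff to the tuple that identifies its type."""
    return tuple(diff.get(field) for field in SIGNATURE_FIELDS)


def diffs_to_investigate(diffs, threshold):
    """Return each diff type seen at least `threshold` times, with its count."""
    counts = Counter(signature(d) for d in diffs)
    return {sig: n for sig, n in counts.items() if n >= threshold}
```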

To track the current state of any substantial forking diffs that were marked for investigation, you will want to build a tool that the developers on the team can use. Such a tool helps you understand whether the build version in the test environment is “production ready” or whether there are too many substantial unexplained diffs, in which case you will probably want to delay the production deployment until the diffs have been investigated. For each forking diff displayed in the tool, it is also a good idea to show some examples of requests and their matching forked requests (plus their responses and traces) that exhibited the diff. That way it will be a lot easier to understand the source of the diff and to evaluate whether it is intentional.

Sometimes, especially in larger projects, the team won’t be able to look into all of the forking diffs that were found for a specific build version. Therefore, it is a good idea to store the diffs that were found for each build version that was deployed to the test environment. That way, by looking at which diffs were added in a specific build version, you can correlate the new diffs with the changes that were added to source control between the two build versions (and, for example, send an automated email to anyone who checked in code between the two build versions).
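The per-build bookkeeping can be as simple as a set difference. In this sketch the build identifiers, the string form of a diff type and the commit records are made up for illustration; fetching the commits between two builds from source control is left out.

```python
def new_diffs(diffs_by_build, previous_build, current_build):
    """Diff types present in the current build but absent from the previous one."""
    current = set(diffs_by_build.get(current_build, ()))
    previous = set(diffs_by_build.get(previous_build, ()))
    return current - previous


def authors_to_notify(commits_between_builds, introduced):
    """Everyone who checked in between the builds, if any new diff appeared."""
    if not introduced:
        return []
    return sorted({commit["author"] for commit in commits_between_builds})
```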

Obviously, some diffs will be intentional, so you need to have a way to “suppress” or “silence” diffs so that they can be ignored by your “diff tracking” system.
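A suppression list could be modelled as a set of partial matches, as in this sketch. The rule format (a dict of field values that must all match) and the field names are assumptions made for the example.

```python
def is_suppressed(diff, rules):
    """True if any non-empty rule matches every field it names in the diff."""
    return any(
        rule and all(diff.get(field) == value for field, value in rule.items())
        for rule in rules
    )


def active_diffs(diffs, rules):
    """Keep only the diffs that no suppression rule silences."""
    return [d for d in diffs if not is_suppressed(d, rules)]
```

Empty rules are ignored on purpose, so that a malformed entry cannot silence every diff by accident.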

Although forking can sometimes be costly to maintain and build correctly, in the long run, and especially for larger projects, the additional value of almost real testing in production and quality assurance you get for your production candidates can be priceless.


Thanks to Ravi Sharma and Sriram Dhanasekaran for sharing their knowledge on the topic with me and making this post possible.


