GraphQL Demystified (The n+1 problem)

Philip Starritt
5 min read · Dec 9, 2020
https://graphql.org/

GraphQL is rapidly revolutionizing the way developers build and ship APIs, and with it has come a paradigm shift in the way clients consume data.

Clients have finally been given the power to ask for exactly the fields they need and nothing more. After all, this feels natural and resembles real life. Wouldn’t it feel strange if you ordered an espresso, only to find the barista serving you an espresso, a latte, and a cappuccino?

And it doesn’t stop there.

A single endpoint, one HTTP response code, predictable responses, advanced tooling, typed schema, and a rich query language. We once even overheard a passing manager mutter:

“Easy. Let’s ask GraphQL for it, it’s super smart and fast.”

Quite frankly, it's amazing that one can develop such expectations and presumptions about an API.

We utilize the flexibility of GraphQL to power our structured products’ life-cycle management. Each structured product can have many fields, metadata, and related performance metrics, so it’s critical that clients can specify exactly what they need.

For those who have only sat on the upper deck, GraphQL may appear to have some mystical superpowers. Those who have visited the engine room may tell another tale.

What lies beneath

The Engine Room

“With great power comes great responsibility” — Peter Parker Principle

Our job as engineers is to provide customers with a seamless experience. A large portion of this responsibility lies with our GraphQL Java API which provides the backbone queries that service multiple products. This includes the retrieval of near-real-time stock market prices & delivery of user notifications.

Like all technologies, GraphQL comes with its challenges and complexities. Performance, security, caching, monitoring, instrumentation, and multi-threading to name a few. As you can imagine, we obtained a wide variety of friction burns, scars, and many lessons learned.

Users expect everything to happen instantly. And that’s OK, after all, it’s 2020. To meet such demands, it’s fundamental that our GraphQL server is as efficient as possible. A huge part of that is batching our downstream REST and gRPC requests.

The n+1 Problem

GraphQL fields resolve independently: each field triggers the execution of a matching resolver method. What if a resolver performs an expensive operation, such as a network request?

Hello, n+1 problem.

The n+1 problem
Naive resolver implementation

In the above example, the server sends a single request to the stock datastore, then one request to the price service for each stock. With 20 stocks, that is 20 price requests, for a total of 21. The server should be able to retrieve everything in two requests: one to fetch the stocks and another to fetch all the necessary prices.
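The effect can be sketched with a plain-Java request counter. This is a simplified illustration, not real resolver code: the stock and price methods below are hypothetical stand-ins for network calls.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NPlusOneDemo {
    static final AtomicInteger requestCount = new AtomicInteger();

    // Pretend network call: fetch the list of stocks (one request).
    static List<String> fetchStocks() {
        requestCount.incrementAndGet();
        return IntStream.range(0, 20).mapToObj(i -> "STOCK-" + i).collect(Collectors.toList());
    }

    // Pretend network call: fetch the price of ONE stock.
    static double fetchPrice(String stockId) {
        requestCount.incrementAndGet();
        return 100.0;
    }

    // Pretend network call: fetch the prices of MANY stocks at once.
    static List<Double> fetchPrices(List<String> stockIds) {
        requestCount.incrementAndGet();
        return stockIds.stream().map(id -> 100.0).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Naive: one price request per stock -> 1 + n requests.
        requestCount.set(0);
        fetchStocks().forEach(NPlusOneDemo::fetchPrice);
        System.out.println("naive: " + requestCount.get() + " requests");

        // Batched: one stocks request + one prices request -> 2 requests.
        requestCount.set(0);
        fetchPrices(fetchStocks());
        System.out.println("batched: " + requestCount.get() + " requests");
    }
}
```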

Without correcting this behavior, one can expect serious consequences across the distributed system: extra network traffic, degraded performance, and increased latency.

The Solution

DataLoader

“DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching.”

Like GraphQL, the dataloader concept originated at Facebook. Fortunately, a JavaScript implementation was open-sourced, followed by a pure Java 8 port.

Batching is a dataloader’s primary feature. A dataloader will load a key and return a promise. The keys are grouped and supplied to a batch function. Promises are correlated and completed with the batch function’s result.

It is essential that all backing APIs support batching.

Attempt 1: BatchDataLoader

Creating the DataLoader

The GraphQL server defines and registers the required dataloaders in the DataLoaderRegistry. The DataLoaderRegistry provides access to dataloader references during resolver execution.

DataLoader correlating based on an ordered list
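The ordered correlation can be shown with a minimal, pure-Java sketch of what a BatchDataLoader does under the hood. This is a simplified illustration, not the java-dataloader implementation: promises are completed purely by position, the i-th future with the i-th element of the batch result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

public class OrderedBatchLoader<K, V> {
    private final Function<List<K>, List<V>> batchFunction;
    private final List<K> keys = new ArrayList<>();
    private final List<CompletableFuture<V>> futures = new ArrayList<>();

    public OrderedBatchLoader(Function<List<K>, List<V>> batchFunction) {
        this.batchFunction = batchFunction;
    }

    // load() queues the key and hands back a promise immediately.
    public CompletableFuture<V> load(K key) {
        keys.add(key);
        CompletableFuture<V> future = new CompletableFuture<>();
        futures.add(future);
        return future;
    }

    // dispatch() runs ONE batch call and completes each promise by position.
    public void dispatch() {
        List<V> values = batchFunction.apply(keys);
        for (int i = 0; i < futures.size(); i++) {
            futures.get(i).complete(values.get(i)); // assumes same order and size!
        }
    }

    public static void main(String[] args) {
        OrderedBatchLoader<String, String> loader = new OrderedBatchLoader<>(
                ks -> ks.stream().map(k -> k + "-price").collect(Collectors.toList()));
        CompletableFuture<String> a = loader.load("AAPL");
        CompletableFuture<String> b = loader.load("MSFT");
        loader.dispatch();
        System.out.println(a.join() + " " + b.join());
    }
}
```

Note the comment in dispatch(): the positional assumption is exactly the weakness discussed in the Problems section below.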

DataLoader Scope

The scope of a dataloader is very important. In our case, we always create a new DataLoaderRegistry per GraphQL query. This ensures that the dataloader cache and batch function will be specific to the user requesting it. If data can be safely shared among queries, then one can consider sharing a dataloader instance among web requests/queries.
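With graphql-java and java-dataloader, per-query scoping can be wired roughly as follows. This is a sketch: the dataloader name "stockPrices", priceBatchLoader, and the surrounding variables are hypothetical.

```java
// Build a FRESH registry per GraphQL query, so the dataloader cache is
// scoped to a single user's request.
DataLoaderRegistry registry = new DataLoaderRegistry();
registry.register("stockPrices", DataLoader.newDataLoader(priceBatchLoader));

ExecutionInput input = ExecutionInput.newExecutionInput()
        .query(query)
        .dataLoaderRegistry(registry) // this registry lives for this execution only
        .build();
graphQL.execute(input);
```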

Loading keys to the DataLoader

The DataFetchingEnvironment has access to the DataLoaderRegistry’s dataloaders. Load each stock ID into the appropriate dataloader. The dataloader will dispatch upon loading the last stock ID, or when its max batch size has been reached. If the max batch size is reached before all keys have been loaded, multiple batch requests will be made.
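A resolver that loads a key rather than calling the price service directly might look roughly like this. This is a sketch: the Stock type and the "stockPrices" registration name are hypothetical.

```java
// The resolver does NOT call the price service itself -- it queues the key
// into the dataloader and returns the promise.
DataFetcher<CompletableFuture<BigDecimal>> priceFetcher = environment -> {
    Stock stock = environment.getSource();               // the parent stock object
    DataLoader<String, BigDecimal> prices =
            environment.getDataLoader("stockPrices");    // name used at registration
    return prices.load(stock.getId());                   // queued, batched later
};
```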

Dataloader Max Batch Size

The dataloader max batch size should equal the downstream service’s recommended max batch size, ideally an optimal figure provided by the downstream batch service team. The max batch size can be set during dataloader construction, and each dataloader can have a different max batch size.

Problems

This dataloader implementation requires a 1:1 ordered mapping of batch-loaded keys to the values returned. This approach is error-prone and presents many risks. If the batch service returns an unordered or partial list, the correlation between CompletableFuture and batch entry may be incorrect. This leads to a data error that fails silently.

Attempt 2: MappedBatchDataLoader

The MappedBatchDataLoader correlates the CompletableFutures based on a returned Map<Key, Value>. If a loaded key is not present in the map, the correlating CompletableFuture is completed with null. In the majority of use-cases, this approach is safer than correlating on list position.

For example:

  • If you query a database with 10 keys, the result set may contain fewer than 10 results, and they may be out of order.
  • If you query an external service, can you guarantee that its developers will never introduce a bug and return data out of order?
DataLoader correlating based on Map<K, V>
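A minimal pure-Java sketch of map-based correlation, including the missing-key case. The tickers and prices are made up; the point is that futures are completed by key lookup, so ordering no longer matters and absent keys complete with null.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class MappedBatchLoaderDemo {
    public static void main(String[] args) {
        List<String> keys = List.of("AAPL", "MSFT", "TSLA");
        Map<String, CompletableFuture<Double>> futures = new LinkedHashMap<>();
        keys.forEach(k -> futures.put(k, new CompletableFuture<>()));

        // The batch service returns a partial, unordered map -- no TSLA entry.
        Map<String, Double> result = new HashMap<>();
        result.put("MSFT", 310.0);
        result.put("AAPL", 120.0);

        // Correlate by KEY: safe against reordering; missing keys become null.
        futures.forEach((key, future) -> future.complete(result.get(key)));

        System.out.println("AAPL -> " + futures.get("AAPL").join());
        System.out.println("MSFT -> " + futures.get("MSFT").join());
        System.out.println("TSLA -> " + futures.get("TSLA").join());
    }
}
```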

Passing an additional object to the DataLoader

An additional object can be supplied to the dataloader load method. This is known as the context object. This value is commonly used to perform a finishing function on each batch response entry.

This approach provides a way to pass additional information to the dataloader whilst retaining optimal key-lookup performance.

Passing a context object

The key contexts are available inside the BatchLoaderEnvironment.
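The mechanism can be sketched in plain Java, mirroring what java-dataloader exposes via BatchLoaderEnvironment.getKeyContexts(): each key is loaded together with an extra object (here a hypothetical currency), which a finishing step uses when completing the promise.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class KeyContextDemo {
    private final Map<String, CompletableFuture<String>> futures = new LinkedHashMap<>();
    private final Map<String, String> keyContexts = new LinkedHashMap<>();

    // load() takes the key AND a per-key context object.
    CompletableFuture<String> load(String key, String context) {
        keyContexts.put(key, context);
        return futures.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // The batch function applies a finishing step using each key's context.
    void dispatch(Map<String, Double> batchResult) {
        futures.forEach((key, future) -> {
            String currency = keyContexts.get(key); // the per-key context
            future.complete(batchResult.get(key) + " " + currency);
        });
    }

    public static void main(String[] args) {
        KeyContextDemo loader = new KeyContextDemo();
        CompletableFuture<String> aapl = loader.load("AAPL", "USD");
        CompletableFuture<String> nesn = loader.load("NESN", "CHF");
        loader.dispatch(Map.of("AAPL", 120.0, "NESN", 95.5));
        System.out.println(aapl.join());
        System.out.println(nesn.join());
    }
}
```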

Thread Context World

ThreadLocal variables often store user information. Key use-cases include MDC, correlation IDs, and Spring Security’s SecurityContext.

Dataloader batch functions execute on a different thread, so the loading thread’s ThreadLocal variables must be propagated to the dataloader’s thread. They should then be removed upon dataloader completion, so that they do not leak into other requests on the pooled thread.

Thread Local Propagation
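A minimal sketch of the propagation pattern with a plain ThreadLocal and executor. The USER variable is a hypothetical stand-in for MDC or a security context: capture on the loading thread, install on the batch thread, remove in a finally block.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalPropagation {
    static final ThreadLocal<String> USER = new ThreadLocal<>();

    // Wrap a task so the caller's ThreadLocal value travels with it.
    static Runnable propagating(Runnable task) {
        String captured = USER.get();   // captured on the loading thread
        return () -> {
            USER.set(captured);         // installed on the dataloader thread
            try {
                task.run();
            } finally {
                USER.remove();          // avoid leaking into pooled threads
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        USER.set("alice");
        CompletableFuture<Void> done = CompletableFuture.runAsync(
                propagating(() -> System.out.println("batch runs as: " + USER.get())), pool);
        done.join();
        pool.shutdown();
    }
}
```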

When using Spring Security, the SecurityContext can be propagated using Spring’s DelegatingSecurityContextExecutorService.

Security Context Thread Propagation
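The wiring might look roughly like this. This is a sketch: priceClient and its getPrices batch call are hypothetical, but DelegatingSecurityContextExecutorService is the real Spring Security wrapper.

```java
// Wrap the executor that runs the batch functions so the caller's
// SecurityContext travels with each submitted task.
ExecutorService pool = Executors.newFixedThreadPool(4);
ExecutorService securityAware = new DelegatingSecurityContextExecutorService(pool);

BatchLoader<String, BigDecimal> priceBatchLoader = keys ->
        CompletableFuture.supplyAsync(() -> priceClient.getPrices(keys), securityAware);
```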

Conclusion

The default GraphQL field resolver implementation may lead to the dreaded n+1 problem. At scale, the n+1 problem will escalate rapidly, causing severe performance problems.

The dataloader pattern offers a simple solution to enable optimal network requests within a GraphQL query.

Happy coding :)

Philip
