GraphQL vs REST

In this post, I want to talk a little about GraphQL, the newest (sort of, not) and hottest tool that supposedly helps you build better HTTP APIs, and how it fits into the bigger scheme of things.

I will talk about a question that technical decision makers have to answer more and more often when choosing the technologies and architectures for greenfield projects. I will take a strategic point of view and try to point out the advantages and drawbacks of both solutions, and what we need to take into consideration when we pick GraphQL over REST, REST over GraphQL, or both (or neither?).

I am going to say this straight out of the gate: both are good, one is not better than the other. Each one has its drawbacks and its advantages, and specific use cases.

What are they?

Firstly, to settle on a common vocabulary, I will talk about what they are, because out there in the industry, there are various misconceptions about what GraphQL and REST mean.

Neither is a technology. They are sets of guidelines/specifications that help us structure our HTTP APIs in a way that makes sense and helps our clients consume these APIs. By adhering to such a guideline, we enable our clients to use specialized client libraries that work well with our servers, speeding up their development time, minimizing maintenance and allowing them to use battle-tested open source clients so they won't have to reinvent the wheel.

REST

REST stands for REpresentational State Transfer and comes with a set of guidelines for structuring and designing your API so that you get a predictable, extensible and functional API.

In the case of RESTful APIs, it is very important to use the HTTP specification to its fullest:

  • everything is a resource that can be Created, Retrieved, Updated, Deleted (CRUD).
  • well designed URLs, that serve only one type of resource
  • /plural-name-of-the-resource/ -> dealing with sets of the said resource
  • /plural-name-of-the-resource/specific-id/ -> dealing with a single resource instance
  • HTTP verbs (GET, POST, PUT, PATCH, DELETE) used for specifying the action we want to take (CRUD).
  • Status codes so we inform the client about the state of the request (successful or not, the reason for success, giving them a hint for the error reason).

These three things (URLs, verbs and status codes) are the pillars of building a RESTful API. Things such as the way we serialize the data, whether it is JSON or XML, don't matter. Ideally, clients specify the format of the data they would like to receive in an Accept request header, and they are able to get the data in various formats.
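To make the Accept header idea concrete, here is a minimal sketch of picking a serializer based on what the client asks for. The `render` and `to_xml` helpers are made up for illustration, and the parsing deliberately ignores q-values, which a real server would need to handle.

```python
import json
from xml.etree import ElementTree


def to_xml(resource, root_tag="vm"):
    """Serialize a flat dict as <vm><key>value</key>...</vm>."""
    root = ElementTree.Element(root_tag)
    for key, value in resource.items():
        ElementTree.SubElement(root, key).text = str(value)
    return ElementTree.tostring(root, encoding="unicode")


# Media types this hypothetical server knows how to produce.
SERIALIZERS = {"application/json": json.dumps, "application/xml": to_xml}


def render(resource, accept_header="*/*"):
    """Return (content_type, body) for the first supported media type."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()  # drop parameters such as q=0.8
        if media_type in SERIALIZERS:
            return media_type, SERIALIZERS[media_type](resource)
    # Fall back to JSON; a stricter server could answer 406 (Not Acceptable).
    return "application/json", json.dumps(resource)
```

The same resource can then be served in either format, depending only on the request header.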

For example, let's say we are developing a web app that controls virtual machine deployment in a datacenter. Exposing functionality about virtual machines would look like this:

  • GET /vms/ -> returning a list of the available VMs. REST doesn't offer a guideline on how pagination or filtering should work, so it is up to the person designing and implementing the API. A plain 200 (OK) is returned on success.
  • POST /vms/ -> creating a new VM with the details passed in the request body. Status codes such as 201 (Created) may be returned, but a plain 200 (OK) would be fine too. Errors such as 400 (Bad Request) may be returned if the user-posted data contains invalid information (e.g. the VM name contains forbidden characters).
  • GET /vms/2000/ -> reading information about the VM with the identifier 2000. Errors such as 404 (resource not found) may be returned if the specified ID doesn't exist in our database.
  • POST /vms/2000/ or PATCH /vms/2000/ -> partially updating a resource with the data sent in the request body.
  • PUT /vms/2000/ -> replacing the resource entirely, would be equivalent to POST /vms/ but with a specific user provided ID. Usually this endpoint isn't used because IDs are automatically generated by the system.
  • DELETE /vms/2000/ -> deleting a resource entirely.
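The endpoints above can be sketched as a tiny dispatcher that maps a verb and a URL to an action and a status code. This is only an illustration with an in-memory dict instead of a database. The `handle` function, the name validation rule and the stored data are all assumptions, and PUT is left out for brevity.

```python
import re
from http import HTTPStatus

# Hypothetical in-memory store standing in for a real database.
VMS = {2000: {"id": 2000, "name": "server-apache-1"}}
NEXT_ID = [2001]  # next auto-generated ID, kept in a list so handle() can bump it

FORBIDDEN_NAME_CHARS = re.compile(r"[^a-z0-9-]")  # assumed validation rule


def handle(method, path, body=None):
    """Return (status_code, payload) for a request against the /vms/ resource."""
    collection = re.fullmatch(r"/vms/", path)
    item = re.fullmatch(r"/vms/(\d+)/", path)
    if not collection and not item:
        return HTTPStatus.NOT_FOUND, {"error": "unknown URL"}

    if collection and method == "GET":
        return HTTPStatus.OK, list(VMS.values())
    if collection and method == "POST":
        name = (body or {}).get("name", "")
        if not name or FORBIDDEN_NAME_CHARS.search(name):
            return HTTPStatus.BAD_REQUEST, {"error": "invalid VM name"}
        vm = {"id": NEXT_ID[0], "name": name}
        VMS[vm["id"]] = vm
        NEXT_ID[0] += 1
        return HTTPStatus.CREATED, vm
    if item:
        vm_id = int(item.group(1))
        if vm_id not in VMS:
            return HTTPStatus.NOT_FOUND, {"error": "no such VM"}
        if method == "GET":
            return HTTPStatus.OK, VMS[vm_id]
        if method == "PATCH":
            VMS[vm_id].update(body or {})  # partial update
            return HTTPStatus.OK, VMS[vm_id]
        if method == "DELETE":
            del VMS[vm_id]
            return HTTPStatus.NO_CONTENT, None
    return HTTPStatus.METHOD_NOT_ALLOWED, {"error": "verb not supported here"}
```

Even in this toy form, every verb and URL combination needs its own branch, its own validation and its own set of possible status codes.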

There are some status codes that can be returned by all endpoints, such as

  • 401 (Unauthorized, which in practice means unauthenticated - the client didn't include any authentication details in their request, so we can't allow them to perform the desired action)
  • 403 (Forbidden - the user included the authentication details, we know who they are, but they are not authorized to perform the specific action - eg. when trying to delete another user's virtual machine).
  • 500 (Server error) - shouldn't EVER happen, because it means that something crashed in our application.

Drawback 1: a lot of endpoints to manage

Having to create this set of URLs for each resource creates a lot of clutter and makes the API harder to develop and maintain. Each endpoint has to be developed and documented individually (the URL, the accepted query parameters, the accepted payload if any, and then all the possible response types, from successful results to all the possible errors that can occur).

Having a RESTful API that manages a few resources shouldn't be too much to handle for a small team. But in the wild, with the complexity of today's web apps, the number of resources is pretty big.

Drawback 2: nested resources

Usually, resources have connections between each other. A virtual machine has some storage blocks and some network interfaces attached, a cluster has some physical nodes, these nodes have some virtual machines, virtual machines are owned by users or teams, teams have some users, users have some permission policies attached, etc. That's what real-life applications look like. It is very rare that you have resources that are entirely independent.

When resources are linked, we can say they are nested from a data retrieval perspective. For example, when we query a virtual machine, we don't necessarily want to get ALL the data attached to it, because we don't really know what data the client needs. Let's take the example of a VM with its storage blocks and network interfaces.

Sometimes, clients will need all the information to display it, sometimes they don't. So we don't really know what to return. We are left with three choices:

The first one is to include all the information for a nested resource in the parent resource

GET /vms/100/

{
    "id": 100,
    "name": "server-apache-2",
    "specs": {"ram": "1Gi", "cpu": 1},
    "storage_blocks": [
      {"id": 1004, "name": "apache-main-disk", "capacity": "100Gi", "used": 0.67},
      {"id": 2003, "name": "apache-secondary-disk", "capacity": "25Gi", "used": 0.22},
      {"id": 4503, "name": "apache-third-disk", "capacity": "10Gi", "used": 0.11}
    ],
    "network_interfaces": [
      {"id": 11754, "name": "vpc-1", "attached_to_vpc": 15324},
      {"id": 55432, "name": "vpc-2", "attached_to_vpc": 65353},
      {"id": 24341, "name": "vpc-3", "attached_to_vpc": 98743}
    ]
    ...
}

But then, we run into another problem: nested resources might have other nested resources as well (eg. network interfaces will be linked to a VPC. Should we include that data as well?). If we just include all the nested resources whenever we can, we will end up dumping the whole database with each request. That's not really a good long-term strategy.

So we have another choice: include references to the linked resources, and make the client request the data of the linked resources by their ID, from their own endpoints. The example from above becomes

GET /vms/100/

{
    "id": 100,
    "name": "server-apache-2",
    "specs": {"ram": "1Gi", "cpu": 1},
    "storage_blocks": [ 1004, 2003, 4503 ], // or with hyperlinks: [ "/storage_blocks/1004/", "/storage_blocks/2003/", "/storage_blocks/4503/"]
    "network_interfaces": [ 11754, 55432, 24341 ]
    ...
}

This way, we only include the IDs (or the hyperlinks) of the nested resources, so that the user can retrieve them individually if they are really interested to see more detailed information about each nested resource.

This approach results in a lot of requests to get the complete data. To get the same data as in the first approach, 7 requests need to be made: one for the VM, 3 for the network interfaces and 3 for the storage blocks. This is very similar to the infamous N+1 queries problem, but in the HTTP API space.
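To see where the 7 comes from, here is a small simulation of a client following the references. Nothing goes over the network: the `get` function is a stand-in that just records which URLs would be requested, and the `/network_interfaces/` URL is an assumed counterpart of the `/storage_blocks/` one.

```python
# Fake "server": maps a URL path to the JSON document it would return.
RESPONSES = {
    "/vms/100/": {
        "id": 100,
        "name": "server-apache-2",
        "storage_blocks": [1004, 2003, 4503],
        "network_interfaces": [11754, 55432, 24341],
    },
}
for block_id in (1004, 2003, 4503):
    RESPONSES[f"/storage_blocks/{block_id}/"] = {"id": block_id}
for nic_id in (11754, 55432, 24341):
    RESPONSES[f"/network_interfaces/{nic_id}/"] = {"id": nic_id}

request_log = []  # every simulated HTTP request ends up here


def get(path):
    """Pretend to issue GET `path`; record the call instead of using the network."""
    request_log.append(path)
    return dict(RESPONSES[path])  # copy, so callers can modify the result


def fetch_vm_with_details(vm_id):
    """Rebuild the fully nested VM document from ID references."""
    vm = get(f"/vms/{vm_id}/")  # 1 request
    vm["storage_blocks"] = [
        get(f"/storage_blocks/{i}/") for i in vm["storage_blocks"]  # +3 requests
    ]
    vm["network_interfaces"] = [
        get(f"/network_interfaces/{i}/") for i in vm["network_interfaces"]  # +3 requests
    ]
    return vm


vm = fetch_vm_with_details(100)
print(len(request_log))  # 7
```

The number of requests grows with the number of linked resources, and one more level of nesting multiplies it again.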

The third approach is to expose the nested resources as sub-collections under the main resource, so that each set can be retrieved at once.

The GET /vms/100/ will not include the storage_blocks and network_interfaces keys, but instead we will have available two extra endpoints that will return the list of the resources nested in the VM:

  • GET /vms/100/storage_blocks/
  • GET /vms/100/network_interfaces/

This way, we can also model the CRUD operations on nested resources (e.g. attaching a new storage block is done with POST /vms/100/storage_blocks/, removing a storage block is done with DELETE /vms/100/storage_blocks/4433/), problems we didn't even think about when dealing with the first two approaches.
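As a rough sketch of how such nested URLs could be interpreted on the server, the snippet below maps a verb and a path to an action. The `route` function and the action names ("list", "attach", "retrieve", "detach") are made up for this example.

```python
import re

# /vms/<vm_id>/<collection>/ with an optional trailing <item_id>/
NESTED_ROUTE = re.compile(
    r"/vms/(?P<vm_id>\d+)/(?P<collection>storage_blocks|network_interfaces)/"
    r"(?:(?P<item_id>\d+)/)?"
)

# (HTTP verb, does the URL name a single item?) -> action
ACTIONS = {
    ("GET", False): "list",
    ("POST", False): "attach",
    ("GET", True): "retrieve",
    ("DELETE", True): "detach",
}


def route(method, path):
    """Return (action, vm_id, collection, item_id), or None if nothing matches."""
    match = NESTED_ROUTE.fullmatch(path)
    if match is None:
        return None
    has_item = match.group("item_id") is not None
    action = ACTIONS.get((method, has_item))
    if action is None:
        return None
    item_id = int(match.group("item_id")) if has_item else None
    return action, int(match.group("vm_id")), match.group("collection"), item_id
```

For example, `route("DELETE", "/vms/100/storage_blocks/4433/")` gives `("detach", 100, "storage_blocks", 4433)`.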

So, how do we deal with nested resources? It depends (of course) on the access patterns: we have to investigate how often resources need to be retrieved together. If a lot of clients need the nested resources every time they query the main resource, it would make sense to include the full nested resource data in the main resource query. There's no perfect solution, and each of the three approaches works for specific data access patterns and data types. It's up to you or the application architect to figure that out.

Advantage 1: familiarity

Enough negativity, REST has some good parts too. The biggest advantage by far is familiarity: every developer can figure out a RESTful API without significant effort. Endpoints are separate, and if you need to get resources, you issue a GET request. The code that interacts with such an API is self-documenting to some degree (hmm, I wonder what requests.get("https://api.example.com/pending_payments?last=10") does... does it retrieve the last 10 pending payments? It's entirely possible...).

There are a lot of tools out there that can automatically document RESTful APIs, and when developers have to integrate with the API of a 3rd party service, they expect to find a RESTful API ready to be used. They don't have to learn new tools to get their job done: raw HTTP requests and JSON parsing are enough.

GraphQL

Enough talk about RESTful, it's time to talk about its younger sibling: GraphQL.

What is it anyway? GraphQL is a specification for designing web APIs. It was made by Facebook, and its main focus is to solve the biggest pain point of RESTful APIs: nested resources.

GraphQL has these characteristics:

  • a single endpoint we have to know and deal with: typically /graphql.
  • a custom query language that we have to learn and use, with all its quirks, such as fragments, interfaces, polymorphism (yeah, it's OOP time!) and annotations (called directives in GraphQL).
  • fancy features such as real-time updates via subscriptions (usually on top of websockets).

Advantage 1: nested resources

We typically need to retrieve several resources at once, based on their relationships, based on what our intention is, etc. The server can't predict what the client needs: each client has very different requirements, and it's impossible to design specific endpoints that return just the right data for each client. It would require a tremendous development effort which surely isn't worth it.

So, in GraphQL, to get just the data we need, we would be able to do a query similar to

query MyQuery {
    vm(id: 100) {
      id
      name
      specs {
        ram
        cpu
      }
      networkInterfaces {
        id
        name
      }
      storageBlocks {
        id
        name
        capacity
        used
      }
    }
}

or to get just the storage blocks capacities

query MyQuery {
    vm(id: 100) {
      id
      storageBlocks {
        capacity
      }
    }
}

and any other combination of fields and nested entities.

What a GraphQL endpoint is: an endpoint that serves data under a specific, static, strongly typed schema, so that the client can fetch that schema (which is also self-documenting) and then craft their queries based on their specific needs. Then the server parses the query and retrieves just the data that was requested.

What it did, in fact, was to shift the responsibility of determining what data to return from the server to the client. The server just says "here is what data I have available, with these fields, parameters and all these connections", then each client, based on the given schema, uses the query language to tell the server exactly what data it needs.
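The core idea, the client describing the shape of the data it wants, can be illustrated with a toy projection function. This is not how a real GraphQL server works (there is no parsing, no schema and no resolvers here). The selection is simply written as a nested dict, and `select` is a made-up helper.

```python
# The full VM document the server has available (same data as the REST example).
VM = {
    "id": 100,
    "name": "server-apache-2",
    "specs": {"ram": "1Gi", "cpu": 1},
    "storageBlocks": [
        {"id": 1004, "name": "apache-main-disk", "capacity": "100Gi", "used": 0.67},
        {"id": 2003, "name": "apache-secondary-disk", "capacity": "25Gi", "used": 0.22},
        {"id": 4503, "name": "apache-third-disk", "capacity": "10Gi", "used": 0.11},
    ],
}


def select(data, selection):
    """Keep only the requested fields.

    `selection` maps a field name to None (plain field) or to another
    selection dict (nested object or list of objects).
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        result[field] = value if sub_selection is None else select(value, sub_selection)
    return result


# Equivalent of: vm(id: 100) { id storageBlocks { capacity } }
capacities = select(VM, {"id": None, "storageBlocks": {"capacity": None}})
print(capacities)
# {'id': 100, 'storageBlocks': [{'capacity': '100Gi'}, {'capacity': '25Gi'}, {'capacity': '10Gi'}]}
```

The server code stays the same no matter which combination of fields a client asks for; only the selection changes.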

Advantage 2: Avoiding under-fetching and over-fetching

Another advantage of GraphQL is that the client gets only the data they need. When using REST, if the client needs only one specific field from a specific resource, there isn't a way to get just that. They have to retrieve the full resource and ignore the rest of the unwanted data. There are some weird implementations that can work around that (eg. adding a query parameter ?fields=name,capacity.ram to each request) but that can be very awkward to implement and maintain, especially when you deal with nested resources.

With GraphQL you only get exactly what you requested, thus avoiding under-fetching (having to do multiple requests to get all the data you need) and over-fetching (getting more data than you actually need, because there's no way to easily get only a subset of the fields).

Disadvantage 1: Being the new unknown kid in town

Being such a new specification and having a lot of features, not all developers are familiar with it. Integrating with a GraphQL API is more cumbersome and requires more development effort.

And that's not because of the implementation complexity (on that aspect it's all just an HTTP request after all), but because of the GraphQL language itself: developers need to learn new concepts, learn to use them and craft their queries carefully, which is a whole thing in itself.
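To back up the "just an HTTP request" part: GraphQL servers commonly accept a POST with a JSON body that carries the query under a "query" key. The sketch below only builds such a request with the standard library and doesn't send it, since the URL is a made-up example.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.example.com/graphql"  # hypothetical endpoint

QUERY = """
query MyQuery {
    vm(id: 100) {
      id
      storageBlocks {
        capacity
      }
    }
}
"""


def build_request(query):
    """Wrap a GraphQL query string in an HTTP POST request with a JSON body."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(QUERY)
# urllib.request.urlopen(request) would send it; left out because the URL is fictional.
```

The transport is trivial. The effort goes into writing the query string itself.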

Disadvantage 2: Increased backend complexity

Because of the flexibility GraphQL allows in data fetching (basically allowing any combination of retrieved fields, and with no restrictions on how nested you can go with your queries), the backend has to support all that.

The biggest clash between the backend implementation and data fetching flexibility comes when querying nested resources, and these resources are stored in a relational database. Usually, this kind of relational data is retrieved from the database using joins, to avoid the N+1 queries problem, but when developing a GraphQL server, that's a little harder than usual to do.

This is due to the fact that you can't really know beforehand how the data will be fetched over the lifetime of the application, so you kind of have to prepare for all the cases. In theory you can craft special queries for special cases but that takes time and effort.

For example, you have three resources A, B and C and a relationship A -> B -> C. In the GraphQL query, it would look something like this:

query {
  a {
    b {
      c {
        someFieldOnC
      }
    }
  }
}

The server can resolve that data through a three table join (between the tables for A, B and C), but when the query changes, a new case appears.

query {
  a {
    b {
      someFieldOnB
    }
  }
}

Now we don't need C anymore, so the three table join is not needed anymore. We can get the data we need in a single two table join. We could in theory handle these two cases independently on the server, but that's not a feasible long-term solution. As I was pointing out before, real-life applications are more complex, have a lot of resources that are even more inter-connected, and covering all the possible access patterns is simply not feasible.
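Here is a rough sketch of what handling these two cases by hand could look like, using an in-memory SQLite database. The table layout, the column names and the `build_sql` helper are all assumptions made for the example, and the selection is again written as a nested dict rather than parsed from a real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, some_field_on_b TEXT);
    CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER, some_field_on_c TEXT);
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (10, 1, 'b-value');
    INSERT INTO c VALUES (100, 10, 'c-value');
""")


def build_sql(selection):
    """Build the SELECT for a query shaped like a { b { ... } }.

    Only joins table c when the client actually asked for it.
    """
    b_selection = selection["b"]
    columns = ["a.id"]
    joins = ["JOIN b ON b.a_id = a.id"]
    if "someFieldOnB" in b_selection:
        columns.append("b.some_field_on_b")
    if "c" in b_selection:  # assumes the only field requested on c is someFieldOnC
        joins.append("JOIN c ON c.b_id = b.id")
        columns.append("c.some_field_on_c")
    return f"SELECT {', '.join(columns)} FROM a {' '.join(joins)}"


three_table_sql = build_sql({"b": {"c": {"someFieldOnC": None}}})
two_table_sql = build_sql({"b": {"someFieldOnB": None}})
print(conn.execute(three_table_sql).fetchall())  # [(1, 'c-value')]
print(conn.execute(two_table_sql).fetchall())    # [(1, 'b-value')]
```

It works for two fields on three tables, but every new field or relationship means another branch in `build_sql`, which is exactly the part that doesn't scale.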

There are some solutions for this, such as data loaders, but they again add complexity and introduce more advanced programming patterns in our code (e.g. promises and asynchronous programming). With more advanced patterns, the development costs increase: new people will need more time to digest whatever is happening in your code base, and juniors will get overwhelmed by all this complexity (you can't reasonably expect junior programmers to be comfortable with asynchronous programming).
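For a feel of what a data loader does, here is a stripped-down sketch built on asyncio. It is not the API of any particular library: the `DataLoader` class below is written from scratch for illustration, has no caching or error handling, and `load_storage_blocks` is a stand-in for a single batched database query.

```python
import asyncio


class DataLoader:
    """Collect keys requested in the same event-loop tick and load them in one batch."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # async function: list of keys -> list of values
        self.pending = {}         # key -> Future that will receive the value
        self.scheduled = False    # is a batch already scheduled for this tick?

    def load(self, key):
        """Return an awaitable for `key`; the actual loading happens later, in bulk."""
        loop = asyncio.get_running_loop()
        if key not in self.pending:
            self.pending[key] = loop.create_future()
        if not self.scheduled:
            self.scheduled = True
            loop.call_soon(lambda: loop.create_task(self._dispatch()))
        return self.pending[key]

    async def _dispatch(self):
        # Take ownership of everything queued so far and reset for the next batch.
        batch, self.pending, self.scheduled = self.pending, {}, False
        keys = list(batch)
        values = await self.batch_fn(keys)
        for key, value in zip(keys, values):
            batch[key].set_result(value)


batch_calls = []  # records each batched "query" so we can count them


async def load_storage_blocks(ids):
    batch_calls.append(ids)  # a real version would run one "WHERE id IN (...)" query
    return [{"id": block_id} for block_id in ids]


async def main():
    loader = DataLoader(load_storage_blocks)
    # Three independent lookups, as three separate resolvers might issue them.
    return await asyncio.gather(*(loader.load(i) for i in (1004, 2003, 4503)))


blocks = asyncio.run(main())
print(len(batch_calls))  # 1
```

Three lookups end up in one batched call, which is the whole point, but even this small version already relies on futures, tasks and event-loop scheduling.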

Conclusion

Both REST and GraphQL have their good and bad parts. When deciding on what to use, there is no golden recipe for choosing the right one (hint: there is no right one), and we need to choose one based on the limited knowledge we have at the moment. To reduce the chances of a failed project, we should at least ask the following questions:

  • what are the main resources clients will use 80% of the time and how will they be used? Do I expect them to be accessed together with other nested resources? How often?
  • in the local/current software engineering job market, can I find enough talented backend engineers to be able to scale my system? If not, REST is probably the best bet.
  • will my API be consumed by other clients/external programmers or is it aimed only for internal use?

Asking and answering these questions should give us a better idea of what we can get away with and how we should use our resources. A lot of companies opt for both APIs at the same time: a private GraphQL API for internal use and a more restricted/limited public REST API for external client integrations. That's a valid strategy too.