What is GraphQL?
September 22, 2020

Beyond REST APIs - Here Comes GraphQL

Twenty years ago, the big news was SOAP APIs and how this loosely coupled, contract-based, platform-agnostic architecture was going to be a real game changer. Turns out it was, and I spent part of my career churning out a software product designed to expose mainframe programs and data as SOAP APIs.

Then, with the proliferation of mobile applications, all attention shifted to REST APIs. Similar conceptually but aimed at a different, pervasive, consumer audience. More churning needed.

Today, I see a lot of focus on yet another API technology. This one, however, is not so much a replacement for existing REST architecture but more an extension of it. Specifically, a query language designed to make accessing one or more REST APIs easier and more flexible. It is called GraphQL and although it has been around for a few years, it seems to be getting more attention these days. Let’s churn.

What is GraphQL?

As much as we have embraced and invested in REST APIs, they do have some drawbacks. Let’s look at one simple example just to highlight where this is all going. Suppose an application developer has two REST APIs. One retrieves a rather extensive list of information (discrete items of data) on a specific customer. It might look something like this (assume a customer id of 38):

Request:   https://endpoint01:8080/Customers/38
Response:  {
               "lastName": "Smith",
               "firstName": "Jack",
               "streetAddress": "123 East Main St",
               "city": "Syracuse",
               "dateOfBirth": "1987-09-12",
               ...
           }

A second API retrieves an array of orders for a given customer (also containing many data items):

Request:   https://endpoint02:1840/Customers/38/Orders
Response:  {
               "Orders": [
                   {
                       "orderId": "ORD#01r50",
                       "orderDate": "2020-01-23",
                       "orderTotal": 352.00,
                       "shippingMethod": "USPS",
                       ...
                   },
                   {
                       "orderId": "ODH10lk2",
                       "orderDate": "2020-01-23",
                       ...
                   }
               ]
           }

Now suppose that the developer only needs the customer name and address along with the order ID and order total for orders placed on a specific date. They would need to make two calls and then parse through the results for the information they require. Not so horrible, but what if more than two APIs are involved, the application feature is used often, or the calls must be repeated for many customers? It can become a bit “chatty,” wasting bandwidth and placing more burden on the developer.
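To make the "two calls, then parse" burden concrete, here is a minimal Python sketch. It assumes two hypothetical helper functions, fetch_customer and fetch_orders, that stand in for the two REST calls and simply return canned copies of the responses shown above, so nothing touches a network. Values the sample responses do not show (the second order's total and the whole third order) are made up for illustration.

```python
# Canned data standing in for the two REST responses. Values not shown in the
# article (second order's total, the entire third order) are invented.
CUSTOMER_RESPONSE = {
    "lastName": "Smith",
    "firstName": "Jack",
    "streetAddress": "123 East Main St",
    "city": "Syracuse",
    "dateOfBirth": "1987-09-12",
}
ORDERS_RESPONSE = {
    "Orders": [
        {"orderId": "ORD#01r50", "orderDate": "2020-01-23",
         "orderTotal": 352.00, "shippingMethod": "USPS"},
        {"orderId": "ODH10lk2", "orderDate": "2020-01-23",
         "orderTotal": 18.50, "shippingMethod": "USPS"},
        {"orderId": "HYPO-0001", "orderDate": "2020-02-14",
         "orderTotal": 20.00, "shippingMethod": "USPS"},
    ]
}


def fetch_customer(customer_id):
    """Hypothetical stand-in for GET .../Customers/{id}."""
    return CUSTOMER_RESPONSE


def fetch_orders(customer_id):
    """Hypothetical stand-in for GET .../Customers/{id}/Orders."""
    return ORDERS_RESPONSE


def customer_summary(customer_id, order_date):
    # Call 1: the full customer record comes back, wanted or not.
    customer = fetch_customer(customer_id)
    # Call 2: every order comes back, with every field.
    orders = fetch_orders(customer_id)["Orders"]

    # The client then discards everything it does not need.
    wanted = ("lastName", "firstName", "streetAddress", "city")
    summary = {field: customer[field] for field in wanted}
    summary["Orders"] = [
        {"orderId": o["orderId"], "orderTotal": o["orderTotal"]}
        for o in orders
        if o["orderDate"] == order_date
    ]
    return summary


result = customer_summary(38, "2020-01-23")
```

Every application that needs this combination ends up carrying its own copy of this filtering logic, which is the burden the rest of this post is about.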

This is where GraphQL comes in:

             POST /CustomerOrders/graphqlendpoint? HTTP/1.1

             query {
                 Customers (id: "38") {
                     lastName
                     firstName
                     streetAddress
                     city
                     state
                     Orders (orderDate: "2020-08-21") {
                         orderId
                         orderTotal
                     }
                 }
             }

Using this query, the developer can make just one call to a GraphQL endpoint and retrieve only the data that is needed. Using a set of GraphQL schema definitions, the GraphQL engine can interpret this query, execute the appropriate REST APIs, and marshal the results into a single response. Furthermore, if the functionality of the application expands and later needs some additional information, it can simply be added to the existing query.
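The marshaling step can be pictured with a toy sketch in Python. This is not how any particular GraphQL engine is implemented: a real server parses the query text and validates it against a schema. Here the query above is simply written out by hand as a Python dictionary (SELECTION), and the functions rest_get_customer and rest_get_orders are hypothetical stand-ins for the two REST APIs, returning canned data. The "state" value and both orders are assumptions made for illustration.

```python
def rest_get_customer(customer_id):
    """Hypothetical stand-in for the customer REST API (canned data)."""
    return {
        "lastName": "Smith",
        "firstName": "Jack",
        "streetAddress": "123 East Main St",
        "city": "Syracuse",
        "state": "NY",  # assumed; not shown in the sample response
        "dateOfBirth": "1987-09-12",
    }


def rest_get_orders(customer_id):
    """Hypothetical stand-in for the orders REST API (canned, invented data)."""
    return {
        "Orders": [
            {"orderId": "HYPO-0001", "orderDate": "2020-01-23",
             "orderTotal": 352.00, "shippingMethod": "USPS"},
            {"orderId": "HYPO-0002", "orderDate": "2020-08-21",
             "orderTotal": 75.25, "shippingMethod": "USPS"},
        ]
    }


# The query above, written by hand as plain data instead of parsed text.
SELECTION = {
    "fields": ["lastName", "firstName", "streetAddress", "city", "state"],
    "Orders": {
        "orderDate": "2020-08-21",
        "fields": ["orderId", "orderTotal"],
    },
}


def resolve_customer(customer_id, selection):
    # One incoming request fans out to the underlying REST APIs...
    customer = rest_get_customer(customer_id)
    result = {name: customer.get(name) for name in selection["fields"]}

    orders_selection = selection.get("Orders")
    if orders_selection is not None:
        orders = rest_get_orders(customer_id)["Orders"]
        # ...and only the requested fields of matching orders are returned.
        result["Orders"] = [
            {name: order.get(name) for name in orders_selection["fields"]}
            for order in orders
            if order["orderDate"] == orders_selection["orderDate"]
        ]
    return result


response = resolve_customer("38", SELECTION)
```

The filtering still happens, but it happens once, on the server side, instead of in every client. Asking for another field later means adding one name to the selection.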

You can see that once a developer becomes skilled with GraphQL, this will be a much easier way to interact with one, or multiple, related APIs.

Déjà Vu

If, like me, you are or were a mainframe programmer, you are probably familiar with a rather old file system called VSAM (Virtual Storage Access Method). If not, let me explain. This was an architecture that consisted of flat files. Analogous to my REST example above you might have a Customer File and an Order File. Programmers needing to obtain information on a specific Customer and their Orders would need to read through, using separate reads, each of these files to obtain the data they need. Each read would return an entire record from the file regardless of the specific needs of the application. If you think of each JSON document returned by the REST APIs described above as a single row of data from a flat file, then you see the analogy here.  

Now along comes database architecture and, more pertinent to this topic, Structured Query Language (SQL). SQL allowed developers to execute a single query that could return related information from one or more underlying files. In addition, it would only return the data elements that were required. It quickly became the preferred way of retrieving/updating application data and has remained so for decades.
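For readers who have not written SQL in a while, here is a small sketch of that idea using Python's built-in sqlite3 module and an in-memory database. The table and column names are my own assumptions, loosely modeled on the REST example, and the second order is invented so there is something to filter out.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    last_name TEXT, first_name TEXT,
    street_address TEXT, city TEXT, date_of_birth TEXT
);
CREATE TABLE orders (
    order_id TEXT PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date TEXT, order_total REAL, shipping_method TEXT
);
INSERT INTO customers VALUES
    (38, 'Smith', 'Jack', '123 East Main St', 'Syracuse', '1987-09-12');
INSERT INTO orders VALUES
    ('ORD#01r50', 38, '2020-01-23', 352.00, 'USPS'),
    ('HYPO-0001', 38, '2020-02-14', 20.00, 'USPS');
""")

# One query spans both tables and names only the columns that are needed.
rows = conn.execute(
    """
    SELECT c.last_name, c.first_name, o.order_id, o.order_total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = ? AND o.order_date = ?
    """,
    (38, "2020-01-23"),
).fetchall()
```

Compare this with the VSAM approach: no separate reads, no whole records, and no application code to stitch the two files together.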

This is GraphQL in a nutshell. Its aim is to do for REST APIs what SQL did for flat files.

More about SQL in our blog: SQL Injection

Pandora’s Box?

The challenge facing GraphQL implementations is the need to take on some of the burdens formerly handled by individual applications. Beyond the obvious need to handle the marshaling and orchestration between a single GraphQL query and multiple REST API invocations, a GraphQL endpoint will need to protect itself from the effects of malicious and ill-formed queries.

Drawing on my database analogy, you may have heard of a Cartesian Product. If you have not, suffice it to say that it is the result of a poorly formed SQL query, one that pairs every row of one table with every row of another, and it can bring an entire database subsystem to its knees. Any implementation of GraphQL will need to consider similar issues.
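A tiny sketch with Python's built-in sqlite3 module shows the effect. The tables and row counts are made up; the only point is how the result grows when the join condition is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)"
)
# Three customers and four orders (illustrative data only).
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1), (3, 2), (4, 3)]
)

# Well-formed: each order is matched to its own customer.
joined = conn.execute(
    "SELECT COUNT(*) FROM customers AS c "
    "JOIN orders AS o ON o.customer_id = c.id"
).fetchone()[0]

# Ill-formed: no join condition, so every customer pairs with every order.
cartesian = conn.execute(
    "SELECT COUNT(*) FROM customers, orders"
).fetchone()[0]

print(joined, cartesian)
```

With 3 customers and 4 orders, the second query returns 3 x 4 = 12 rows instead of 4. Scale both tables up to millions of rows and the problem becomes obvious.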

How do you prevent ill-formed GraphQL queries from tying up an entire endpoint? DB2 employed a technique known as a BIND. A BIND generates an “Access Path” to your data that can be analyzed for efficiency. This is possible since the database environment is completely contained and controlled. That will not be the case with APIs as the actual processes that sit behind them are unknown.

So how can GraphQL manage this? The answer is probably evolving but for now it comes down to a combination of techniques such as Throttling and Rate Limits.
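As one illustration of what a rate limit can look like, here is a minimal token bucket sketch in Python. The token bucket is a common general-purpose technique that I am choosing for the example; it is not taken from any specific GraphQL product, and the class name, parameters, and the idea of charging a per-query "cost" are all assumptions. The clock is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now

    def allow(self, now, cost=1.0):
        # Add the tokens earned since the last call, up to the capacity.
        elapsed = now - self.last
        self.tokens = min(
            self.capacity, self.tokens + elapsed * self.refill_per_second
        )
        self.last = now
        # Admit the query only if enough tokens remain to pay its cost.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical per-client bucket: bursts of 2 queries, 1 new token per second.
bucket = TokenBucket(capacity=2, refill_per_second=1.0)
decisions = [
    bucket.allow(now=0.0),  # admitted
    bucket.allow(now=0.0),  # admitted, bucket is now empty
    bucket.allow(now=0.0),  # rejected
    bucket.allow(now=1.0),  # admitted, one token refilled after a second
]
```

The cost argument leaves room for charging an expensive query more than a cheap one, which is one way an endpoint could account for queries that fan out to many underlying APIs.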

Does This Dog Hunt?

I can remember people raising an eyebrow at databases and SQL back at their inception. Any doubts were quickly extinguished. As standards and techniques evolve and mature, I think the advantages that GraphQL brings to the application developer will help to crystallize it into an industry standard, especially since it can technically be extended to other data access architectures. Imagine the ability to write a single query that could organize and deliver data from a wide variety of resources such as REST and SOAP APIs as well as raw data from . . . you pick the data store!

The sky is the limit.

Further Reading

Learn more about Sola, the Akana mainframe API solution.