The Role of APIs in Cloud Architecture

Luddy Harrison, July 24, 2023

This is part 2 in our Introduction series

APIs — Application Programming Interfaces — play several important roles in cloud architecture. APIs are easy to understand. Thinking about them the right way simplifies building in the cloud. In this article we explore APIs, the roles they play, and the cloud resources and structures that are used to implement and use them.

Anatomy of an API

What is an API, in the context of the cloud?

API

An API is a protocol for a question-and-answer exchange between a client and server. The client sends a request to the server, and the server sends a response back to the client in return.

Question/Answer Exchange Between Client and Server

While nearly any bidirectional protocol can in principle be used for APIs, in practice the vast majority of cloud APIs use HTTP. Every HTTP request receives exactly one response. The response indicates success or failure, and in the case of success, carries the data that was requested by the client, or the answer to the question posed by the client, or at a minimum, some acknowledgement that the request was received and processed.

The details of HTTP are not necessary for understanding APIs, but it is helpful to know the basic elements of an HTTP request and response. A (simplified) HTTP request contains:

  • a method, namely a verb like GET, PUT, POST, etc.
  • the domain to which the request is being sent, e.g. query1.finance.yahoo.com
  • the path within that domain that identifies the API being called, and sometimes additional parameters to the API, e.g. /v7/finance/download/AAPL
  • a query string attached to the path, whose key-value pairs provide yet more parameters to the API, e.g., ?period1=1591123216&period2=1622659216&interval=1d&events=history&includeAdjustedClose=true
  • an optional body containing data in a format that is specific to the API

The example here has been chosen from a (now unsupported but still available) Yahoo finance API. Here's what the HTTP request looks like in Postman, a handy tool for testing APIs:

Request to a Yahoo Finance API from Postman

The Postman interface breaks the query string into individual parameters (keys and values) for easier reading. From this, together with the path (see above), we can see that the parameters to this API are:

  • the name of the stock symbol being inquired about (AAPL, the symbol for Apple Inc.)
  • the method GET
  • the beginning of a time period (period1, a Unix timestamp corresponding to June 2020)
  • the end of the time period (period2, one year later, in June 2021)
  • the interval, i.e., the granularity of the price data being requested, one day in this case
  • the kind of data being requested (history, presumably for historical price data)
  • whether or not to include the adjusted closing price, a boolean parameter
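To make this concrete, here is a minimal sketch of the same request using Python's requests library. The URL and parameter values are taken from the example above; since the API is no longer supported, the call may not succeed today. The point is to see how the method, domain, path, and query parameters map onto code.

    # Sketch of the Yahoo finance request from the example above.
    # The API is unsupported, so treat this as illustrative only.
    import requests

    url = "https://query1.finance.yahoo.com/v7/finance/download/AAPL"
    params = {
        "period1": 1591123216,           # start of the period (Unix timestamp)
        "period2": 1622659216,           # end of the period (Unix timestamp)
        "interval": "1d",                # one price point per day
        "events": "history",             # historical price data
        "includeAdjustedClose": "true",  # include the adjusted closing price
    }

    response = requests.get(url, params=params)  # the method is GET
    print(response.status_code)  # 200 indicates success
    print(response.text[:200])   # the beginning of the response body

Note that requests assembles the query string for us from the params dictionary, producing exactly the string shown above.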

This is a fairly typical-looking API.

Notice that one of the parameters, the symbol, comes from the path, and the others from the query string. When there is a distinguished thing referred to by an API call, it is often named using the path, as though the data associated with AAPL were located in a sub-folder named AAPL on the server. This naming convention is one element of what is known as the REST style of API, for representational state transfer. From our point of view, this is a rather minor detail, and one we have no control over when we are using APIs designed by others.

HTTP is a flexible format. In addition to the path and queries, it has headers which carry information back and forth between the client and server. Headers are not ordinarily used for the parameters of an API request or for elements of the response per se. They do, however, carry related information, like how long an API response can be cached before it should be regarded as stale, or authentication credentials.
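As a quick illustration, here is a hypothetical sketch of headers in play; the endpoint and token are placeholders, not a real service:

    # Hypothetical request: an Authorization header carries credentials,
    # and a Cache-Control header comes back on the response.
    import requests

    response = requests.get(
        "https://api.example.com/v1/quotes/AAPL",     # placeholder endpoint
        headers={"Authorization": "Bearer <token>"},  # placeholder credentials
    )
    print(response.headers.get("Cache-Control"))  # e.g. "max-age=60"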

Let's take a look at the response we get from this particular API call, again from postman:

Response from a Yahoo Finance API from Postman

The body of the response is in CSV format, a popular representation for moving data into and out of spreadsheets. The first row is the header row: it names the columns of the data that follows.
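Continuing the sketch from above, parsing a CSV body is a short job with Python's standard library. The column names Date and Close are assumptions based on this API's output; in practice you would use whatever names the header row actually contains.

    # Parse the CSV body of the response from the earlier sketch.
    import csv
    import io

    rows = list(csv.DictReader(io.StringIO(response.text)))
    # Column names here are assumed from the header row of this API.
    print(rows[0]["Date"], rows[0]["Close"])  # first row of price data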

There are a number of headers (20) in the HTTP response, but those aren't important for the purpose of understanding the API. It is typical of APIs that the important information in a response is carried (only) in the body.

The Client: User of an API

Clients send requests to an API and wait for responses; servers receive requests and respond to them. Clients and servers are, as it were, the two faces of an API.

What can we say about the client of an API? Typically, the server of an API is remote, meaning that there is some delay in sending a request to it and receiving a response. On top of that, the API itself entails some amount of work — more or less, depending on the particular API. The defining characteristic of an API client, then, is that it waits for communication delays and server computation. The game of calling APIs is a waiting game.

This might seem like a relatively minor problem. After all, in a modern computer, like our own laptop or telephone, calls to remote services are happening all the time, and the operating system of our device performs the simple trick of doing something else with the CPU while waiting for a remote call to complete: multitasking. In fact, the history of multitasking operating systems is — at the risk of over-simplifying — the history of overlapping I/O (input/output) operations with other useful work. Originally other useful work meant work belonging to another user. These days it more often means something else that we ourselves are doing, like checking for email notifications or counting how many steps we take, and so on.

What about in the cloud? Surely there must be lots of other work going on that can be performed while we are waiting for an API call to terminate. It can't possibly be that the best that can be done is to idly burn up the CPU doing nothing while waiting, right?

Well, as it turns out, things aren't always so simple.

When an API call occurs in the cloud, it means that a server is acting as a client at that moment. Everything that runs in the cloud is, roughly speaking, part of a server, because the whole point and purpose of the cloud is to serve APIs to clients. So this client API call we are waiting on is the action of some server, which needs the service of some API situated remotely, somewhere else in the cloud.

And a server, of course, has many independent activities going on at the same time, because servers respond to a multitude of requests simultaneously. Many clients talk to each server. So, on the face of it, a server should be able to mask the latency of a client API call by working on other requests while it is pending. Here's a picture of the situation:

Overlapping Client Calls in a Server

The server receives incoming requests, and in the course of processing each one, it makes a client API call, which is to say that it, in turn, calls other servers. Once it has finished processing a request, it sends a response back to the client that sent that request. Notice that the time spans labeled waiting overlap with one another and with other useful work. If the server has enough simultaneous requests, it might never sit idle while waiting for a pending client API call.
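Here is a small sketch of this overlapping in Python, using asyncio. The function names are stand-ins: call_remote_api simulates the delay of a client API call, and handle_request stands in for processing one incoming request.

    import asyncio

    async def call_remote_api(request_id: int) -> str:
        await asyncio.sleep(0.5)  # stands in for network delay + server work
        return f"result for request {request_id}"

    async def handle_request(request_id: int) -> str:
        # While this call is pending, the CPU is free to work on
        # other requests.
        return await call_remote_api(request_id)

    async def main() -> None:
        # Ten requests in flight at once: the ten 0.5-second waits
        # overlap, so the batch takes about 0.5 s in total, not 5 s.
        results = await asyncio.gather(*(handle_request(i) for i in range(10)))
        print(results)

    asyncio.run(main())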

That is fine for a traditional server, listening for many incoming requests and processing them concurrently. But we have heard about serverless computing, and from what we've heard, it's often more economical than a traditional server. For one thing, it is an on-demand service: we pay nothing when there are no incoming requests. That by itself is enough to make it attractive when we are first beginning to scale up our cloud back end. Besides, managing servers, with their many threads, network connections, and complicated memory and CPU usage, seems like a lot of complexity.

So, how do client API calls perform in a serverless environment like AWS Lambda?

As it turns out, not nearly so well, from one point of view. A Lambda function runs on a single virtual slice of a CPU, a sliver of computation, memory and networking that is made available to run each Lambda invocation. A Lambda invocation that performs the same computation as one request to the server depicted above will look like this:

Client Call in a Lambda

There is nothing for our virtual sliver of the CPU to do during the waiting periods; it simply sits idle. Of course, there is other work on the actual, physical CPU of which our sliver is a part, but that does us no good, because we are billed for the time we occupy the CPU, regardless of whether we are doing something useful with it at any particular moment.
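A sketch of the situation in code, using the shape of an AWS Lambda Python handler (the URL is a placeholder, not a real service):

    import time
    import requests

    def handler(event, context):
        start = time.monotonic()
        response = requests.get("https://api.example.com/v1/lookup")  # placeholder
        waited = time.monotonic() - start
        # During `waited` seconds our slice of the CPU did nothing useful,
        # yet that time is still part of the billed duration of this
        # invocation.
        print(f"idle for {waited:.3f}s")
        return {"statusCode": 200, "body": response.text}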

Recall our observation from the previous article:

Note

Building in the cloud is fundamentally about balancing opposing considerations.

The interaction between client API calls and server versus serverless execution is a perfect case in point. The tradeoff in cost is complicated. A server incurs cost the whole time it is running, including periods when request traffic is very light. Serverless functions cost nothing when they aren't needed. At the same time, a fully-loaded server may cost much less than the equivalent capacity in serverless functions.
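A back-of-envelope calculation makes the tradeoff concrete. The prices below are hypothetical placeholders, not quoted rates; the point is the crossover: at light traffic the serverless bill is negligible, while at heavy traffic the flat-rate server wins.

    # Hypothetical prices, for illustration only.
    server_per_hour = 0.10             # flat rate for a server, busy or idle
    serverless_per_second = 0.0000167  # per second of billed function time

    request_duration = 0.2  # seconds of billed compute per request

    for requests_per_hour in (100, 10_000, 1_000_000):
        serverless_cost = requests_per_hour * request_duration * serverless_per_second
        print(f"{requests_per_hour:>9} req/h: "
              f"serverless ${serverless_cost:.4f}/h vs server ${server_per_hour:.2f}/h")

With these made-up numbers, serverless costs a fraction of a cent per hour at 100 requests per hour, while at a million requests per hour the flat-rate server is more than thirty times cheaper.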

And of course, many other performance and cost considerations come into the picture. A server that scales up well will generally need to be well-equipped with CPU cores and ample storage. Those things cost money, and their pricing isn't necessarily simple.

The best and safest position to be in is to have choices, to have the freedom to experiment with alternative architectures for the back end, in whole or in part, with a minimum of disruption to its other resources and the business logic of the application. It should be a primary goal of a low-code platform to offer this kind of architectural flexibility.

The Server: Provider of an API

Having explored the behavior of API clients a bit, let's turn to the behavior of API servers.

In some sense, a server is an abstraction for whatever we use to respond to the many requests to our APIs. We have an internet point of presence, a URL to which clients send their requests. This point of presence must, in effect, dispatch each incoming request to the compute resource that will service it, track the request until a response has been prepared for it, and send that response back to the client. The picture is something like this:

An Abstract Server
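Here is a minimal sketch of that structure in Python: a point of presence that accepts connections, dispatches each request to a compute resource (here, just a coroutine), tracks it until a response is ready, and sends the response back. Everything about it is simplified; a real point of presence involves load balancing, timeouts, retries, and so on.

    import asyncio

    async def compute(request: bytes) -> bytes:
        await asyncio.sleep(0.1)  # stands in for real work and client API calls
        return b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"

    async def dispatch(reader: asyncio.StreamReader,
                       writer: asyncio.StreamWriter) -> None:
        request = await reader.read(65536)  # receive an incoming request
        response = await compute(request)   # dispatch it and await the result
        writer.write(response)              # send the response back
        await writer.drain()
        writer.close()

    async def main() -> None:
        server = await asyncio.start_server(dispatch, "127.0.0.1", 8080)
        async with server:
            await server.serve_forever()  # serves until interrupted

    asyncio.run(main())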

The compute resources may belong to a single physical server, with one or many CPUs and cores. Or there might be a pool of such servers, each able to handle many requests simultaneously. The servers might be thought of as instances of a container, a packaged version of the computation to be performed for each request that is easy to instantiate over and over. Or the compute elements might be instances of an ephemeral serverless function that performs the work of computing a single response from a request.

The lines between these alternatives are blurry, and what look at first like radically different ways of organizing the server melt into one another. A single container instance might serve a single request, or many. It might be instantiated as frequently as requests arrive, or it might persist indefinitely, or at least as long as there seems to be demand for it. The decision of how to route incoming requests to compute resources might be made by the network point of presence, or by an agent that represents the compute resources as a cluster. The number of compute elements might be relatively stable, scaling up and down slowly or not at all, or entirely dynamic, changing scale radically in a matter of seconds.

As always, the best position to be in is one of flexibility. Experimentation is the best way to discover the right balance between responsiveness, scaling, cost of operation, recovery from failure, and so on. Ideally, a low-code platform will make it easy to replace one alternative with another, with a minimum of disruption to the application as a whole and the bulk of its business logic.

The API View of a Cloud Back End

A cloud back end can be viewed as a collection of APIs: those that it provides (serves to clients of the back end), and those that it uses (for which it is itself a client). It can be quite illuminating to look at the back end this way, as an alternative to more common ways of looking at it, for example as a collection of cloud resources that relate to one another through events and messages. If we add a client call from each of the compute elements in the above server structure, we have:

The API View of a Server

This provides a nice shorthand view of the flow of data through the server, and of the likely problems and opportunities created by calls to remote cloud services and APIs. The main observation here is:

Note

From one point of view, the particular resources that are involved in calls to cloud services don't matter. To understand the behavior and cost of the compute resources of a server, it is more important to consider the flow of data between the services and delays involved in making API calls to them.

If we are calling two cloud services per request to our server, it is not so important that one is an authentication service and one is a database. The delays involved and the frequency with which we can make the calls are more important for understanding the behavior and cost of the server itself.