Thoughts on API Design

Recently I’ve been working to unify and redefine the APIs for a set of related backend systems, and this post captures what I learned. Here are the key things to keep in mind while designing APIs:

Model the API Request/Response Upfront

The first step in the API design phase is to outline the request and response objects. One can model the API in terms of resource objects and then define the actions a client can take on these resources. The resources may be real or virtual objects that may or may not be tied to a database record. Modeling requests and responses correctly upfront is critical, since restructuring these objects later means breaking changes for the clients consuming the APIs.

When defining model resources, here are a few key things to keep in mind:

Be consistent in naming resources and the operations that can be performed on them; use the naming convention common to your team or organization.
Consider any external dependencies required in constructing a response.
Consider the size of requests and responses; allow clients to request extra information when necessary instead of creating a bloated standard response.
Choose a date-time format and stick to it. ISO 8601 is human readable and widely used. Use UTC and let clients convert to their local timezone.
Consider implementing API versioning upfront. There isn’t a single standard way to version APIs (e.g., URL-based versioning vs. query-parameter-based versioning), so use what makes sense.
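
To make a couple of these points concrete, here is a minimal sketch of a compact response with a UTC ISO 8601 timestamp and an opt-in heavier field. The Order resource, its fields, and the expand parameter are all hypothetical names chosen for illustration, not part of any particular API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


@dataclass
class Order:
    """Hypothetical resource used only for illustration."""
    id: str
    status: str
    created_at: datetime      # expected to be timezone-aware
    line_items: List[str]     # potentially large, so not returned by default


def to_response(order: Order, expand: Iterable[str] = ()) -> Dict[str, Any]:
    """Build a compact response; heavier fields are included only on request."""
    body: Dict[str, Any] = {
        "id": order.id,
        "status": order.status,
        # Always serialize as ISO 8601 in UTC; clients convert to local time.
        "created_at": order.created_at.astimezone(timezone.utc).isoformat(),
    }
    if "line_items" in expand:
        body["line_items"] = list(order.line_items)
    return body


order = Order(
    id="ord_1",
    status="paid",
    created_at=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    line_items=["sku_a", "sku_b"],
)
print(to_response(order))                         # compact default
print(to_response(order, expand=["line_items"]))  # client opted in to more
```

The default response stays small, and adding new expandable fields later does not break existing clients.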

Well Defined SLAs

Another important consideration when designing APIs is the SLAs (Service Level Agreements) in terms of latency and throughput. Think about these upfront when designing new APIs, since they can have a major impact on backend implementations. For example, if you’re building a new API on top of an existing database, consider load testing the exact queries that are expected to run once your service takes production traffic, and leave room for growth. If you expect your resources to change infrequently, you may be able to simply add caching to take load off the primary data stores. In my view, meeting SLAs and modeling resources are closely related, and tradeoffs have to be made depending on the business use cases and the existing data store infrastructure.
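
As a rough illustration of the caching idea, here is a small read-through cache with a time-to-live. It is only a sketch: TTLCache, its loader callback, and the injectable clock are names I made up for this example, and a real deployment would more likely use a shared cache and think harder about invalidation.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process read-through cache; entries expire after ttl_seconds."""

    def __init__(
        self,
        loader: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader          # hypothetical function that hits the primary store
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the behavior can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # fresh hit: no load on the data store
        value = self._loader(key)      # miss or expired: go to the data store
        self._entries[key] = (now + self._ttl, value)
        return value
```

The TTL bounds how stale a response can be, which is exactly the kind of tradeoff that should be agreed on alongside the latency numbers.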

Multi-tenancy

If your API is going to be consumed by multiple clients, some sort of access control becomes crucial. This includes things like authentication, authorization and rate limiting. Authentication is about establishing trust that the client is indeed who they claim to be; authorization is about enforcing which actions the authenticated client can perform. Rate limiting is a mechanism that caps the number of operations a client can perform per unit of time. In most cases a client that is being throttled is expected to lower the rate at which it calls the APIs until the throttling stops. The token bucket algorithm is widely used to implement rate limiting. Note that you usually want to enforce throttling limits both per operation and globally.
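
For reference, a token bucket can be sketched in a few lines. This is a minimal single-process version; the class name, parameters, and injectable clock are my own choices for illustration, and it is not thread-safe as written.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(
        self,
        capacity: float,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity        # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False                   # caller would typically respond with HTTP 429
```

To enforce limits per operation and globally, one option is to keep one bucket per (client, operation) pair plus one bucket per client, and require both to allow the request.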

Idempotent Operations

Idempotency is more than just being able to repeat a (write) request without failure. Generally you’d want to keep operations idempotent, but that does not mean backend state is not altered by an idempotent operation. It simply means the operation can be repeated without any additional effect beyond the first successful call. As an example, if you issue DELETE to a resource, issuing another DELETE would simply return a 404 without altering the server state. Consider a more involved example: an API that increments a counter by 1. If a request fails to register as successful on the client side due to a network error and the client retries, the final value at the backend may be either 1 or 2, depending on whether the first request finished executing on the server. To address such issues, Stripe’s APIs take an interesting approach by supporting an Idempotency-Key header that uniquely identifies a request. The server can recognize a retry carrying the same key and skip processing it again, since it knows it has already handled that request once.
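
Here is a minimal sketch of the request-identifier idea applied to the counter example. CounterService and its in-memory dictionary are hypothetical; this is not how Stripe implements it, and a real service would keep the keys in a durable store and expire them after some time.

```python
from typing import Dict


class CounterService:
    """Increment endpoint made safe to retry via a client-supplied key."""

    def __init__(self) -> None:
        self.value = 0
        # Maps idempotency key -> response returned for the first request.
        self._responses: Dict[str, int] = {}

    def increment(self, idempotency_key: str) -> int:
        if idempotency_key in self._responses:
            # Retry of a request we already processed: replay the stored
            # response instead of incrementing again.
            return self._responses[idempotency_key]
        self.value += 1
        self._responses[idempotency_key] = self.value
        return self.value


service = CounterService()
service.increment("req-123")   # original request
service.increment("req-123")   # client retry after a network error
print(service.value)           # counter was incremented only once
```

The client generates the key once per logical operation and reuses it on every retry, so the outcome is the same whether or not the first attempt reached the server.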

Asynchronous vs Synchronous

Depending on the type of work being done on the backend, sometimes it makes sense to do “just enough” work and return a response to the client. For example, if a client request kicks off a long-running job, you may only want to create a record in your database (or enqueue the request on a durable queue) to register the request, and then have a different set of processes run the job. In most cases you’d want to return a computation ID that the client can use to query the status of the job. Note that this can add a lot of complexity on the backend, but it may be worthwhile (or even necessary) if the operation takes a long time to finish.
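
The flow might look roughly like the sketch below. JobService, its status values, and run_next are names invented for this example; the in-memory dictionary and deque stand in for the database record and the durable queue, and run_next stands in for a separate worker process.

```python
import uuid
from collections import deque
from typing import Any, Callable, Deque, Dict


class JobService:
    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}   # stand-in for a database table
        self._queue: Deque[str] = deque()            # stand-in for a durable queue

    def submit(self, payload: Any) -> str:
        """Do 'just enough' work: record the request and return an ID."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"status": "queued", "payload": payload, "result": None}
        self._queue.append(job_id)
        return job_id

    def status(self, job_id: str) -> Dict[str, Any]:
        """What a status endpoint would return when the client polls."""
        job = self._jobs[job_id]
        return {"id": job_id, "status": job["status"], "result": job["result"]}

    def run_next(self, work: Callable[[Any], Any]) -> None:
        """Worker side: take one queued job and execute it."""
        if not self._queue:
            return
        job = self._jobs[self._queue.popleft()]
        job["status"] = "running"
        try:
            job["result"] = work(job["payload"])
            job["status"] = "succeeded"
        except Exception as exc:       # record the failure instead of losing it
            job["result"] = str(exc)
            job["status"] = "failed"
```

The client calls submit, gets the ID back immediately, and then polls status until the job reaches a terminal state.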

Pagination

Pagination is the usual solution when a response may contain too many objects to safely return in one operation. If designed properly, you do not need to keep any state in your application and can often simply translate the query parameters into a database query. Note that with this approach, if the underlying records change while a client is paginating, the client may see inconsistent results (for example, skipped or repeated records); this should be expected and dealt with on the client side. Ideally you provide next and previous URLs within the API response (see HATEOAS) so that the client does not have to guess how to paginate.
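
A stateless, offset-based version of this can be sketched as follows. The offset and limit parameter names, the response keys, and the /v1/things URL are assumptions made for the example; the list slice stands in for the equivalent database query.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode


def paginate(items: List[Any], base_url: str, offset: int = 0, limit: int = 2) -> Dict[str, Any]:
    """Return one page plus next/previous links, keeping no server-side state."""
    offset = max(offset, 0)
    limit = max(limit, 1)

    def page_url(new_offset: int) -> str:
        return f"{base_url}?{urlencode({'offset': new_offset, 'limit': limit})}"

    page = items[offset:offset + limit]   # in practice: a query with LIMIT/OFFSET
    has_next = offset + limit < len(items)
    return {
        "data": page,
        "next": page_url(offset + limit) if has_next else None,
        "previous": page_url(max(offset - limit, 0)) if offset > 0 else None,
    }


records = ["a", "b", "c", "d", "e"]
print(paginate(records, "/v1/things", offset=2, limit=2))
```

Because the client just follows the next link until it is missing, the server is free to change page sizes or even the pagination scheme later without breaking anyone.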

HTTP Headers

HTTP headers are a powerful way to communicate the operating parameters of a web request between the client and server. They can be used to specify cache-control, perform authentication and authorization, and specify content-type, user-agent, cookies, etc. Note that proxy servers can modify headers, so be careful about how your backend is set up and allow headers to pass through where necessary.