Craig's Blog

API Best Practices

Structuring an API is a contentious topic, and people have vastly different, often contradictory opinions. Take these tips as opinions that have been hard-earned over time. I have supported high-scale and low-scale APIs, internal and external APIs, low-latency and high-availability APIs. One of those APIs has the pleasure of being a tier 1 AWS service. While I do not love all the compromises I made for that API, it is nonetheless a tier 1 API.

Many APIs are designed by a mix of engineers, customers, and product managers. It’s a strange space because the “true” customer of an API is likely the engineers under the customer, but they commonly do not get a voice. Many of these tips come from both my experiences trying to use other people’s APIs and trying to manage the operational excellence (OE) of the APIs that I build.

I hope my lessons and tips help you out. At the very least, I will add a few additional resources for you to leverage. Every minute you spend ensuring that an API specification is rock-solid will save you hours down the line in implementation and maintenance. APIs (both internal and external) are not something to rush.

API Tips

1/ Resource centric. The first thing you need to establish is which resources your APIs will operate on. While it can be tempting to tackle a bunch of “operations,” designing around a resource whose state you mutate is much easier. “AddX,” “RemoveY,” “MutateZ” all seem like reasonable operations, so we could make them all APIs, right? Well, you will get absolutely destroyed when you try to handle idempotency or race conditions. You can kind of fix it by adding idempotency tokens, client tokens, or locks/semaphores, but you have backed yourself into a corner.
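To make the retry problem concrete, here is a minimal sketch contrasting the two styles. The `Cart` resource and both API classes are hypothetical names made up for illustration; they are in-memory stand-ins, not a real service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cart:
    """Hypothetical resource: the client sends the full desired state."""
    cart_id: str
    quantity: int


class OperationStyleApi:
    """'AddX'-style API: every call applies a delta to server-side state."""

    def __init__(self):
        self.quantities = {}

    def add_quantity(self, cart_id, amount):
        # A retried request applies the delta a second time.
        self.quantities[cart_id] = self.quantities.get(cart_id, 0) + amount


class ResourceStyleApi:
    """Resource-style API: every call replaces the resource with the desired state."""

    def __init__(self):
        self._carts = {}

    def put_cart(self, cart):
        # Sending the same desired state twice leaves the store unchanged.
        self._carts[cart.cart_id] = cart
        return cart

    def get_cart(self, cart_id):
        return self._carts[cart_id]
```

If a client retries `add_quantity("c-1", 2)` after a timeout, the cart ends up with 4 items. Retrying `put_cart(Cart("c-1", 2))` still leaves it at 2.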

2/ Consistent latency. Every call to a given API should complete within an order of magnitude of the same amount of time. If sometimes your API returns data in 100ms and sometimes it takes 10s, how will your customer design backoff and retries? It is even better if all of your APIs fall within the same order of magnitude. Then the customer can have a single retry/backoff/connection strategy for your entire API/SDK. Fun fact - if you use DynamoDB as your datastore and only interact with it, you get this for free.
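Here is a rough sketch of what a single client-side retry strategy might look like when latency is predictable. The function name, the defaults, and the choice of exponential backoff with full jitter are my assumptions for illustration; the `sleep` parameter exists only so the delay can be swapped out in tests.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep):
    """Call `operation`, retrying on TimeoutError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

One set of values for `base_delay` and `max_delay` only works across a whole SDK if every API responds in roughly the same time, which is the point of this tip.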

3/ CRUD. Design support for Create, Read, Update, and Delete from day one. If you followed tip #1 for resource-centric design, this tip seems obvious: customers want to create, read, update, and delete their resources. Even if you do not want to release support for all four immediately, design for them from day one.

4/ Never delete. This seems contradictory to CRUD, but it isn’t. You should tombstone data instead of deleting it. When a customer wants to delete, treat that as a soft delete and mark the item as “Deleted,” but don’t actually delete the record from your database immediately. I like to attach a Time to Live (TTL) of 30 or 60 days to items. If a customer doesn’t call and say “whoops” within 30 days, they probably do not want their data back. Fun fact - if you use DynamoDB as your datastore and only interact with it, you can add a TTL field and get this behavior for free.
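A minimal in-memory sketch of the tombstone idea follows. The class, method names, and `expires_at` field are hypothetical. A real datastore would handle expiry itself (for example through a TTL attribute), so `purge_expired` here only stands in for that background process.

```python
import time

RETENTION_SECONDS = 30 * 24 * 60 * 60  # 30 days, matching the TTL suggested above


class SoftDeleteStore:
    def __init__(self, clock=time.time):
        self._items = {}
        self._clock = clock  # injectable so tests can control time

    def put(self, key, value):
        self._items[key] = {"value": value, "expires_at": None}

    def get(self, key):
        item = self._items.get(key)
        # Tombstoned items look deleted to the customer.
        if item is None or item["expires_at"] is not None:
            return None
        return item["value"]

    def delete(self, key):
        item = self._items.get(key)
        if item is not None and item["expires_at"] is None:
            # Soft delete: mark the item and schedule the real removal.
            item["expires_at"] = self._clock() + RETENTION_SECONDS

    def restore(self, key):
        """The 'whoops' path: undo a delete while the tombstone still exists."""
        item = self._items.get(key)
        if item is None:
            return False
        item["expires_at"] = None
        return True

    def purge_expired(self):
        """Hard-delete tombstones whose retention window has passed."""
        now = self._clock()
        expired = [key for key, item in self._items.items()
                   if item["expires_at"] is not None and item["expires_at"] <= now]
        for key in expired:
            del self._items[key]
        return len(expired)
```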

5/ Idempotent. This should go without saying, but make your service idempotent. Consider your customer an “at least once delivery” client, which means there is a good chance the customer sends you the same request twice accidentally. CreateObject should create one object, not two.
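One common way to get this behavior on create is a client-supplied token. The sketch below assumes that approach; `ObjectService`, `create_object`, and `client_token` are illustrative names, and a real service would also need to persist the token mapping and decide how long to remember it.

```python
import uuid


class ObjectService:
    def __init__(self):
        self._objects = {}
        self._ids_by_token = {}

    def create_object(self, client_token, payload):
        # A repeated token means the client is retrying: return the original
        # object instead of creating a second one.
        existing_id = self._ids_by_token.get(client_token)
        if existing_id is not None:
            return existing_id
        object_id = str(uuid.uuid4())
        self._objects[object_id] = payload
        self._ids_by_token[client_token] = object_id
        return object_id

    def count(self):
        return len(self._objects)
```

Calling `create_object("token-1", ...)` twice returns the same ID and leaves exactly one object in the store.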

6/ Explicit is better than implicit. Do not make your customers guess what the side effects are going to be. Do not make your customers pass some arbitrary combination of parameters to get a specific behavior. Customers like explicit behavior. Look at Amazon S3; they built a lot of features to disable public access because, while you can disable public access implicitly through other configuration, it’s really nice to know for a fact that public access is disabled.

7/ Async is better than long sync. Many HTTP clients and gateways time out requests somewhere around 30s, and the limit is often configured closer to 25s. If we assume 5s for maximum round-trip time (RTT) (depends on user type), then that leaves 20s, half of which is 10s. The longest possible synchronous API you should build takes a maximum of 10s, and I personally recommend moving anything over 5s to async. Give the customer a token that they can then poll. Fun fact - if you use Smithy and Smithy-generated SDKs, you get this for free with a Smithy waiter. The Smithy waiter makes async operations look synchronous for those customers that prefer sync operations.
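The token-and-poll pattern can be sketched roughly as follows. Everything here is hypothetical: `JobService`, `start_export`, and the status strings are made up, and `complete` stands in for whatever background worker actually finishes the job. The `wait_for_export` helper plays the role a waiter would play in a generated SDK, but it is not any SDK's actual waiter implementation.

```python
import time
import uuid


class JobService:
    def __init__(self):
        self._jobs = {}

    def start_export(self):
        # Return immediately with a token instead of holding the connection open.
        token = str(uuid.uuid4())
        self._jobs[token] = {"status": "IN_PROGRESS", "result": None}
        return token

    def complete(self, token, result):
        # Stand-in for the background worker finishing the job.
        self._jobs[token] = {"status": "SUCCEEDED", "result": result}

    def get_export(self, token):
        return dict(self._jobs[token])


def wait_for_export(service, token, max_attempts=10, delay=0.5, sleep=time.sleep):
    """Poll until the job leaves IN_PROGRESS, so callers can treat it as sync."""
    for _ in range(max_attempts):
        job = service.get_export(token)
        if job["status"] != "IN_PROGRESS":
            return job
        sleep(delay)
    raise TimeoutError(f"export {token} still in progress after {max_attempts} polls")
```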

8/ Pagination. Anything that can be considered an unbounded array should be a dedicated resource with its own paginated list API. If you try to do something like a get-item that returns an ever-growing list, you will run into problems with consistent latency (#2). Furthermore, customers will likely want to filter that list, so you might as well build that into the API semantics.
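A minimal sketch of a paginated, filterable list call is below. The function and parameter names (`list_items`, `max_results`, `next_token`, `prefix`) are assumptions for illustration. The token here is just an offset to keep the example short; in a real API it should be opaque to the client.

```python
def list_items(items, max_results=2, next_token=None, prefix=""):
    """Return one bounded page of matching items plus a token for the next page."""
    matching = sorted(name for name in items if name.startswith(prefix))
    start = int(next_token) if next_token else 0
    page = matching[start:start + max_results]
    end = start + len(page)
    # No token on the last page tells the client to stop.
    token = str(end) if end < len(matching) else None
    return {"items": page, "next_token": token}


def list_all(items, prefix=""):
    """Client-side loop: keep calling until next_token comes back empty."""
    results, token = [], None
    while True:
        page = list_items(items, next_token=token, prefix=prefix)
        results.extend(page["items"])
        token = page["next_token"]
        if token is None:
            return results
```

Because every response is capped at `max_results`, each call does a bounded amount of work no matter how long the underlying list grows.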

9/ Versioning. It’s essential to have a versioning strategy for your APIs from the beginning. As your API evolves, you’ll need to introduce breaking changes at some point. Having a versioning strategy in place allows you to roll out new versions of your API without breaking existing clients.
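As one possible strategy (an assumption on my part; there are others, such as version headers), here is a sketch that puts the version in the path. The handlers, the `/v1/user` route, and the response shapes are all hypothetical.

```python
def get_user_v1(user):
    # Original response shape: a single combined name field.
    return {"name": f"{user['first_name']} {user['last_name']}"}


def get_user_v2(user):
    # Breaking change: the name is split into two fields.
    return {"first_name": user["first_name"], "last_name": user["last_name"]}


HANDLERS = {"v1": get_user_v1, "v2": get_user_v2}


def handle(path, user):
    """Dispatch '/<version>/user' to the handler registered for that version."""
    version, _, resource = path.strip("/").partition("/")
    handler = HANDLERS.get(version)
    if handler is None or resource != "user":
        return {"status": 404}
    return {"status": 200, "body": handler(user)}
```

Existing clients keep calling `/v1/user` and see the old shape, while new clients opt in to `/v2/user`.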

10/ Documentation. Well-documented APIs are crucial for adoption and ease of use. Your documentation should cover all aspects of your API, including resource models, request/response formats, authentication and authorization mechanisms, error handling, and usage examples. Consider using tools like Swagger or OpenAPI to generate interactive documentation directly from your API specification.
