API Keys

What is an API key?

An API key is a mechanism used to authenticate one piece of software to another for programmatic access. Contrast this with a username and password, which are typically used by a person like you or me to authenticate with a website so that we can navigate around and use the site. An API key is meant to be a way for a programmer or other advanced user to interact with the system automatically.

When a user logs into a website with a username and password, the server typically gives the web browser a cookie that contains the authentication mechanism it uses going forward. This cookie is why you don’t have to supply your password every time you click a link on Facebook. At a technical level, it also behaves pretty similarly to an API key. The cookie is used automatically by the web browser to authenticate each request to the web server. An API key typically works in much the same way, except without the need to exchange a username and password first.

Why should I add an API key to my service?

If you’re considering building out an API for your web application, it’s important to consider how users are going to authenticate to your API. A naïve approach might be to have users send their username and password with every request, such as in HTTP Basic authentication.

But remember how I said that API keys are meant to be used for programmatic access? If you make users authenticate to your API with their username and password, when they reset their password, their software integration would also break as an unwanted side effect.

Additionally, it’s important to use strong hashing mechanisms for passwords to protect your users’ data in case there is a breach of your site. These strong hashing mechanisms tend to be designed to be slow to calculate so that it is harder for people to crack the hashes if they are compromised. That means that if you have to do this hashing on every single request from every single client, your servers will be slower to respond and will use considerably more CPU and/or RAM.

Types of API keys

There are a few types of API keys that you might see being used, such as a simple bearer key, a JWT, and what I’m going to refer to as signature-based keys.

Note: For the purposes of this post, I will be leaving out OAuth and its associated authentication mechanisms. OAuth is becoming a more popular authentication mechanism for APIs, but it is also considerably more complicated.

Bearer API Keys

The bearer API key is the most common type of API key that you’ll find in use. It’s a simple random string that the web application generates and gives you. You then send that string with all of your subsequent requests, and the server can look you up by that string to authenticate you and allow you to take actions on your account/resources. Each time you send a request, the server will look the key up in the database to map it back to your user.
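To make that flow concrete, here is a minimal sketch of the server side. It is not a production implementation: the dictionary stands in for the database table, the function names are made up for illustration, and the use of an "Authorization: Bearer ..." header is an assumption (the key could just as well travel in a custom header). Keys are kept in plaintext here purely for brevity.

```python
import secrets

# Hypothetical in-memory stand-in for the API keys table: key string -> user id.
API_KEYS = {}


def issue_key(user_id):
    """Generate a random key, remember which user it belongs to, and return it."""
    key = secrets.token_urlsafe(32)
    API_KEYS[key] = user_id
    return key


def authenticate(headers):
    """Return the user id for a request's headers, or None if not authenticated."""
    # Assumes the key arrives as "Authorization: Bearer <key>".
    scheme, _, key = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not key:
        return None
    # This lookup happens on every single request.
    return API_KEYS.get(key)


def revoke_key(key):
    """Revoking a key is just deleting the row."""
    API_KEYS.pop(key, None)
```

A client would call issue_key once, then send the returned string with every request until the key is revoked.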

Benefits of Bearer Keys

Super simple to implement
Super simple to understand
Easy to revoke a lost or stolen key

Drawbacks of Bearer Keys

The key has to be sent to the server with every request
The server has to look up the authenticated principal with every request

Generating a Bearer Key

When generating a bearer API key, it’s important to use a good source of random data, and to pick a length that is resistant to attacks. But there are also a couple other useful things to consider.

Use a unique key prefix: This key prefix should be shared across all API keys for your service, and could include something that makes it obvious the key belongs to your service. For example, if I launched a new API for my company Room 641A, I might use a key prefix like: r641a_api_. This gives me a unique prefix that I can use to (1) identify what type of token the secret is and (2) set up secret scanning detection and revocation services with platforms like GitHub and GitLab.
Include an identifier in the key format: This one is really only important, in my opinion, if you’re planning to hash your API keys. If you hash your API keys, as we’ll discuss in the next section, it’s important that users be able to identify API keys in their dashboard. Using a similar example as before, my new API keys might look like: r641a_api_dadedade_<secret>, where dadedade is an identifier that we store in plaintext in the API key table. This is then a value we can display on our web application dashboard, and it’s how we can reference a given key when communicating about a key, without having to share the whole key. Remember - possession of a bearer key gives you complete access to what that key can do.

Putting this all together, I might generate an API key in my Python application like so:

import secrets

API_PREFIX = "r641a_api"
SECRET_LENGTH = 16  # bytes of entropy, used by the secrets library
ID_LENGTH = 4  # bytes of entropy, which should give about 4bn possible ID options. Adjust as necessary.

def generate_key():
    # we use token_hex for the identifier to make sure there are no underscores
    # we use token_urlsafe for the secret to capture more random bytes in fewer characters than token_hex
    return f"{API_PREFIX}_{secrets.token_hex(ID_LENGTH)}_{secrets.token_urlsafe(SECRET_LENGTH)}"

This means that from left to right, I can always parse out my API prefix and my identifier, and then know that what’s left is the secret portion of the key.
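As a rough sketch of that left-to-right parsing, something like the following would work. The parse_key name is hypothetical. The one detail worth noticing is that the secret produced by token_urlsafe can itself contain underscores, so the split has to stop after the identifier.

```python
API_PREFIX = "r641a_api"


def parse_key(key):
    """Split a key into (identifier, secret); return None if it is malformed."""
    marker = API_PREFIX + "_"
    if not key.startswith(marker):
        return None
    # Split only on the first underscore after the prefix: the identifier is
    # hex (no underscores), but the secret may contain "_" and "-".
    identifier, separator, secret = key[len(marker):].partition("_")
    if not separator or not identifier or not secret:
        return None
    return identifier, secret
```

With this, a key like r641a_api_dadedade_abc_def-123 parses to the identifier dadedade and the secret abc_def-123.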

Hashing vs Plaintext

If you’re familiar with setting up users in a database, you’ve probably seen the advice that you have to hash your user passwords to protect them at rest. This is great advice. Unfortunately most of the time, this advice doesn’t get followed when it comes to API keys, and I think it’s worth discussing why.

One of the major risks that led to the importance of storing (strongly) hashed user passwords, rather than plaintext passwords, is that users are responsible for picking their own passwords. Since users pick their own passwords, a password has historically been more likely to (1) have some significance to that person, (2) be reused across services, and (3) not have a lot of entropy (or randomness). So if your website was compromised, an attacker could take the usernames and passwords and just try them on other sites to find other places where that user used the same password. A compromise of your Furby Fan Club website could result in John Doe’s bank account being drained. That’s pretty wild, and that’s one of the major reasons why we encourage strong hashing of user passwords today.

API keys don’t have this same level of risk associated with them, because the API key is inherently limited to your service. Additionally, API keys are typically machine generated and we can guarantee their length to be some high number. So in the event that your service is breached, the attacker gains a bunch of API keys that can be used to access data on your site (data they probably already have if they dumped your API key table), or make modifications to your data (which they probably could have already done if they were in position to dump your API key table).

I’m still a fan of hashing API keys, but I think we can skip the more advanced hashing algorithms in favor of something simple like SHA-256. SHA-256 is a very fast hashing algorithm by today’s standards. As you can see here, my desktop was able to run 1000 SHA-256 hashing iterations in 0.0005 seconds, or roughly half a microsecond per hash.

Python 3.12.0 (main, Nov 12 2023, 12:45:37) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib, timeit; print(timeit.timeit(lambda: hashlib.sha256(b"input_string").hexdigest(), number=1000))
0.0005083880000000818

Storing your API keys hashed helps you remember that they are a security credential and should be treated as such. You don’t just want everyone with access to your Django Admin to see all of your users’ API keys, do you? While SHA-256 isn’t a very strong hashing algorithm these days for the purposes of passwords, remember that we’re mitigating a bunch of that risk by using long machine-generated API keys.
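Here is a minimal sketch of what storing and checking a hashed key could look like. The dictionary is a stand-in for the API key table, and store_key and verify_key are hypothetical names. The plaintext identifier from the key format is what lets the server find the right row without knowing the secret.

```python
import hashlib
import hmac

# Hypothetical stand-in for the API key table:
# plaintext identifier -> SHA-256 hex digest of the full key.
KEY_HASHES = {}


def store_key(identifier, full_key):
    """Keep only the hash; the full key is shown to the user once and discarded."""
    KEY_HASHES[identifier] = hashlib.sha256(full_key.encode()).hexdigest()


def verify_key(identifier, presented_key):
    """Hash the presented key and compare it to the stored digest."""
    stored = KEY_HASHES.get(identifier)
    if stored is None:
        return False
    candidate = hashlib.sha256(presented_key.encode()).hexdigest()
    # Constant-time comparison, so timing does not leak how much matched.
    return hmac.compare_digest(stored, candidate)
```

The dashboard can then list keys by identifier alone, and nobody browsing the table sees a usable credential.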

JWT API Keys

I don’t have a lot of opinions on JWT API keys, however… If you’re considering using JWT API keys for your application, I recommend you read the next section very carefully.

Don’t use JWTs for your API keys.

JWTs can be quite useful for short-lived access credentials, but creating long-lasting credentials as JWTs can open up new problems that you have to build bespoke solutions around.

Benefits of JWT API Keys

API keys can contain context about the user, preventing the need to look the user up on every request
Data contained within the API key is signed and/or encrypted so we can know if it’s been tampered with and reject it
Has a built in expiry mechanism to force key rotation
Can potentially be detected programmatically, depending on what values are visible in the JWT.

Drawbacks of JWT API Keys

The key has to be sent to the server with every request
Revoking a JWT that has been stolen requires building a custom explicit deny list of JWTs that you no longer do business with. So on every request, you still have to look up the security credential against this list, even though you don’t have to look it up in the “API Keys” database table.
If information about the user changes, the JWT needs to be reissued to capture that information, which isn’t very friendly in a typical API key scenario. At this point, should you revoke the old JWT since it’s no longer accurate?
Isn’t likely to integrate well into secret scanning services without heavy modification
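To illustrate the revocation drawback, here is a rough standard-library sketch of an HS256-signed JWT check. The signing key, the in-memory deny list, and the use of the jti claim as the revocation handle are all assumptions made for the example; a real service would more likely use a maintained JWT library and shared storage. The point is the last check: even a perfectly valid token still needs a lookup.

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"hypothetical-server-side-signing-key"  # assumption for this sketch
REVOKED_JTIS = set()  # hypothetical deny list of revoked token ids


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id, jti, ttl_seconds=3600):
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "jti": jti, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(SIGNING_KEY, signing_input.encode("ascii"), hashlib.sha256)
    return f"{signing_input}.{_b64url(signature.digest())}"


def verify_token(token):
    """Return the payload if the token is valid and not revoked, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        signature = _b64url_decode(signature_b64)
    except ValueError:
        return None
    # Always verify as HS256 rather than trusting the "alg" header.
    expected = hmac.new(SIGNING_KEY, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None
    try:
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None
    if payload.get("exp", 0) < time.time():
        return None
    # The per-request lookup that JWTs were supposed to avoid.
    if payload.get("jti") in REVOKED_JTIS:
        return None
    return payload
```

Without the deny list, a stolen token stays usable until its exp passes, which is a long time for anything meant to behave like an API key.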

Signature-based API keys

Signature-based API keys work similarly to public key cryptography in general, or to how you might use an SSH key to push your code to GitHub.

A user generates a public/private keypair on their machine and then provides the public key to your website. You can use ssh-keygen to generate this key, and you could make your application support both RSA and Ed25519, so that you cover the two most common types of keys.

On each subsequent request, the user uses their private key to sign their request (see RFC 9421), and then the server is able to receive the request and validate the signature using the stored public key.
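To give a feel for what actually gets signed, here is a simplified sketch of building a signature base in the style of RFC 9421. It covers only three derived components (method, authority, and path) and the created and keyid parameters, and the function name is made up. It deliberately stops short of the signing step, which needs a cryptography library: the client would sign the UTF-8 bytes of this string with its private key and send the result in the Signature header, alongside a Signature-Input header describing what was covered.

```python
def build_signature_base(method, authority, path, created, key_id):
    """Build a simplified RFC 9421-style signature base string.

    Only "@method", "@authority" and "@path" are covered in this sketch;
    a real client would usually also cover headers such as a content digest.
    """
    covered = ('"@method"', '"@authority"', '"@path"')
    # The signature parameters list what was covered, plus metadata.
    params = f'({" ".join(covered)});created={created};keyid="{key_id}"'
    lines = [
        f'"@method": {method.upper()}',
        f'"@authority": {authority.lower()}',
        f'"@path": {path}',
        f'"@signature-params": {params}',
    ]
    # Lines are joined with a single newline and no trailing newline.
    return "\n".join(lines)
```

The server rebuilds the same string from the request it received and verifies the signature against the stored public key, so changing the method, host, or path in transit invalidates it.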

Benefits of Signature-based API keys

The secret value that authenticates the client to the server never gets sent with a request
Intercepted requests can’t be tampered with because doing so would invalidate the signature
The server never has to store a secret value that grants access to the API resources
You can have hardware backed API keys by generating/storing the private key in an HSM, Yubikey, or other hardware security service.

Drawbacks of Signature-based API keys

It’s harder to implement authentication for both clients and servers
It’s a less common pattern, so developers have less experience with it
The signing operation of a request is likely to take some amount of time longer than just copying the bearer key out of memory into the request
Private keys just look like any other private key. On the one hand, this is good because we know what private keys look like and that they probably shouldn’t be public on your GitHub repo. On the other hand, you’re not likely to be able to get integrated into secret scanning services.

Where Signature-based API keys need work

I think Signature-based API keys should become more of the norm. I’ve been using them for years, since my time at Oracle Cloud, and I have advocated for them in other platforms since. But I think if we really want to see improvement in adoption, we need common HTTP client libraries in all languages to adopt a standard pattern from RFC 9421. The good news is that it is actually an RFC now, as of February 2024, whereas before it was just a draft. But work still needs to be done to make this normalized and easy to use in HTTP clients.

Secret Scanning Platforms

One useful thing to think about when implementing your API keys is how to help your users practice good security habits. After all, a leaked API key that results in a bunch of data from your platform making its way into the hands of attackers looks just as bad on you as it does on the user who leaked the API key.

Thankfully, GitHub and GitLab both provide mechanisms to submit your own API key formats into their secret scanning capabilities. GitHub’s Secret Scanning Partnership Program details how to submit your API key format and shows how you can build a verification endpoint that can either automatically revoke keys, or automatically notify the user of the leaked key.

GitLab also provides an “Automatic response to leaked secrets” capability, which allows you to register your API key format, get notified when a leak happens, and then automatically revoke and/or alert the user of the leaked key.

Additionally, there are many third party services out there that implement secret detection capabilities for their customers. By ensuring that your API key format is uniquely identifiable, you can proactively submit your API key format to their rules to improve security for all of your users. An example of one such secret detection is also available from GitLab via their gitleaks.toml file.
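As an illustration of why a uniquely identifiable format matters, here is a sketch of the kind of pattern a scanner could use for the r641a_api_ keys from earlier. The lengths are assumptions that follow from that example: a 4-byte hex identifier is 8 characters, and a 16-byte token_urlsafe secret is 22 characters. The find_keys helper is hypothetical.

```python
import re

# prefix, 8 hex chars of identifier, then 22 URL-safe base64 chars of secret.
KEY_PATTERN = re.compile(
    r"\br641a_api_[0-9a-f]{8}_[A-Za-z0-9_-]{22}(?![A-Za-z0-9_-])"
)


def find_keys(text):
    """Return every string in text that looks like one of our API keys."""
    return KEY_PATTERN.findall(text)
```

Because the prefix is unique to the service, a match is very unlikely to be a false positive, which is what makes automatic revocation reasonable.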