Application Programming Interfaces¶
You will hear the term “API” in many different contexts when it comes to computer programming. Already in this class, we have discussed Python’s API for the file system.
Often, however, when someone mentions an API, they are referring to a web-based API that is usually accessed over HTTP(S). You might have heard about the kerfuffle when Twitter shut down much of the access to its API, or when Reddit did the same thing a few years earlier. These APIs are servers that provide interfaces (the “I” in “API”) to a platform’s data.
As you probably noticed while reading Walker (2019), it is not exactly uncommon for references to APIs to become out of date.
Luckily, we can still use the API provided by the Digital Public Library of America for our work for this class.
We’ll be working with the Python Requests library, which provides its own easy-to-use API for making HTTP requests. In other words, it’s APIs all the way down.
Getting an access token¶
Generally, APIs will ask that you first obtain a key to use them. Even if APIs offer unlimited requests, it is important for them to require users to supply an API key so that they can track (often anonymized) usage statistics, errors, and so on.
Sometimes, APIs require you to pay, either immediately or after making a certain number of requests. Keys can be used to track usage for payment calculations, too. For an example of this system, see OpenAI’s pricing page.
An API Key for DPLA¶
For this tutorial, we’ll work with the Digital Public Library of America’s (DPLA) API. Take a few minutes to read through their API Basics, then request an API key.
The DPLA documentation instructs you to submit a request using curl, but we don’t have access to curl from this notebook. Instead, let’s make the request using the Python “Requests” library.
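%pip install requests
import requests

# Replace with the address where you want to receive your key.
my_email = "YOUR EMAIL HERE"
# POSTing to this endpoint asks DPLA to email a key to that address.
requests.post(f"https://api.dp.la/v2/api_key/{my_email}")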
After running the above code cell, you should receive an email with your API key. It’s good practice not to share API keys or commit them to version control (e.g., Git).
Instead, create an account-specific secret by following the instructions provided by GitHub.
Let’s call the secret DPLA_API_KEY. (It’s conventional to use all caps for environment variables and secrets.)
Make sure to give your fork of this repository access to the secret, and then restart this codespace. We’ll be here when you get back.
Making your first request¶
As we saw above, making requests using the requests library is pretty straightforward: for a GET request, we can just pass a URL to requests.get().
In order for the request to be successful, though, we’ll need to include the API key in the api_key querystring parameter. And to do that, we’ll need to use the os library in Python.
import os
import requests
DPLA_API_KEY = os.getenv("DPLA_API_KEY")
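If the secret is not visible to the codespace, os.getenv() quietly returns None, and the text "None" would end up in the request URL. Here is a minimal sketch of a guard that fails early instead. The helper name require_env is made up for this tutorial and is not part of any library.

```python
import os


def require_env(name: str, environ=os.environ) -> str:
    """Return the value of an environment variable, or raise a clear error.

    `require_env` is a hypothetical helper. `environ` defaults to the real
    environment but accepts any mapping, which makes the function easy to try out.
    """
    value = environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Check the secret and restart the codespace."
        )
    return value


# Usage in the notebook (raises if the secret is missing):
# DPLA_API_KEY = require_env("DPLA_API_KEY")
```

Failing at this point tends to be easier to debug than a rejected request later on.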
Let’s use the example provided by the DPLA documentation, querying for the term “weasels”.
requests.get(f"https://api.dp.la/v2/items?q=weasels&api_key={DPLA_API_KEY}")
<Response [200]> means that our request was successful, but it doesn’t give us a whole lot of information. This is because we have not read the response body. To do so, let’s assign the response, which is the return value of requests.get(), to a variable and read it as JSON.
response = requests.get(f"https://api.dp.la/v2/items?q=weasels&api_key={DPLA_API_KEY}")
response.json()
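A 200 status is not guaranteed: a mistyped URL or a rejected key should produce an error status instead. The sketch below shows one way to check the status before parsing. It assumes the object passed in behaves like the return value of requests.get(), which has raise_for_status() and json() methods. The function name parse_or_raise is hypothetical.

```python
def parse_or_raise(response):
    """Parse a response body as JSON, but only after checking the status.

    `response` is expected to be a requests.Response. Its raise_for_status()
    method raises requests.HTTPError for 4xx and 5xx status codes and does
    nothing otherwise.
    """
    response.raise_for_status()
    return response.json()


# Usage in the notebook, with `requests` imported and DPLA_API_KEY set:
# response = requests.get(
#     f"https://api.dp.la/v2/items?q=weasels&api_key={DPLA_API_KEY}",
#     timeout=10,  # seconds; avoids waiting forever on a stalled connection
# )
# data = parse_or_raise(response)
```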
Reading responses¶
As you can see above, the response returns a JSON (JavaScript Object Notation) object with a few top-level keys. If you’re thinking, “Hm, this JSON looks an awful lot like a Python dictionary,” you’re absolutely right. While the semantics of Python dictionaries and JSON are different, in this case, the requests library has already coerced the raw JSON to a Python dictionary for us. You can access its values like you would with any Python dict:
parsed_response = response.json()
parsed_response['count']
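To make the JSON-versus-dictionary point concrete, here is a small sketch that runs Python’s built-in json module on a hand-written string. The sample is invented for illustration and is much smaller than a real DPLA response. Apart from count, which appears above, the key names in it are assumptions.

```python
import json

# A made-up JSON document, NOT real DPLA output.
raw = (
    '{"count": 2, "docs": [{"title": "Weasel"}, {"title": "Stoat"}], '
    '"next": null, "ok": true}'
)

# json.loads turns JSON text into Python objects, much as response.json() does.
parsed = json.loads(raw)

print(type(parsed).__name__)          # dict
print(parsed["count"])                # 2
print(parsed["docs"][0]["title"])     # Weasel
print(parsed["next"], parsed["ok"])   # None True
```

Note how JSON’s null and true arrive as Python’s None and True: the two formats look alike, but the values are translated on the way in.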
Constructing queries¶
Naturally, when you’re working with an API, you’ll want to be able to construct your own queries. Above, we hard-coded the value weasels under the querystring parameter q. But you can use Python’s string interpolation to set any value you want. For example:
my_query = "foxes"
my_url = f"https://api.dp.la/v2/items?q={my_query}&api_key={DPLA_API_KEY}"
response = requests.get(my_url)
parsed_response = response.json()
parsed_response
You could even write a function that constructs the request URL and returns the parsed response so that you don’t have to do these things manually over and over again.
def make_dpla_request(query: str):
    url = f"https://api.dp.la/v2/items?q={query}&api_key={DPLA_API_KEY}"
    response = requests.get(url)
    return response.json()
There’s a problem with this code, however. What happens if you try to make a request with a query that contains spaces, such as "red foxes"?
Can you find the appropriate workaround using the documentation? https://
What other features does this API support?
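One possible approach to the spaces question, which may or may not be the one the linked documentation describes, is to let a library encode the query string rather than pasting values into the URL by hand. The sketch below uses urlencode from Python’s standard library. The function name build_items_url and the placeholder key are made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.dp.la/v2/items"


def build_items_url(query: str, api_key: str) -> str:
    """Build an items URL with the query safely encoded (hypothetical helper)."""
    # urlencode escapes spaces, ampersands, and other reserved characters
    # so they cannot be mistaken for URL syntax.
    return f"{BASE_URL}?{urlencode({'q': query, 'api_key': api_key})}"


print(build_items_url("red foxes", "PLACEHOLDER"))
# https://api.dp.la/v2/items?q=red+foxes&api_key=PLACEHOLDER
print(build_items_url("cats & dogs", "PLACEHOLDER"))
# https://api.dp.la/v2/items?q=cats+%26+dogs&api_key=PLACEHOLDER
```

Requests can also do this encoding for you if you pass a dictionary through the params argument, as in requests.get(BASE_URL, params={"q": "red foxes", "api_key": DPLA_API_KEY}).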
RESTful APIs¶
Many APIs, including the DPLA’s, are built on RESTful principles. REST stands for Representational State Transfer. In terms of web APIs, REST means that a given server will respond with a representation of the data that it has available, and that representation will contain additional information for manipulating the data or requesting further data.
Although it is not, strictly speaking, a requirement of REST APIs, many REST implementations use a predictable URL scheme.
For example, you might find a list of “collections” at the /collections endpoint. To request a specific collection, you would append its ID: for Collection 3, that would be /collections/3.
Each collection might contain items, so to get a list of items in Collection 3 you could send a request to /collections/3/items. And then to get a specific item in that collection: you guessed it, /collections/3/items/12.
DPLA does not implement this kind of URL scheme, and instead relies on facets and other search parameters. But it is worth being aware of such schemes if you want to use other APIs in your work and research.
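The nested URL pattern described above can be sketched in a few lines. The host api.example.org is a placeholder rather than a real API, and resource_url is a hypothetical helper.

```python
BASE = "https://api.example.org"  # placeholder host, not a real API


def resource_url(*segments) -> str:
    """Join path segments onto BASE, e.g. ("collections", 3) -> .../collections/3."""
    return "/".join([BASE, *map(str, segments)])


print(resource_url("collections"))
# https://api.example.org/collections
print(resource_url("collections", 3))
# https://api.example.org/collections/3
print(resource_url("collections", 3, "items"))
# https://api.example.org/collections/3/items
print(resource_url("collections", 3, "items", 12))
# https://api.example.org/collections/3/items/12
```

Because each URL extends the one before it, a client can often guess where related resources live once it knows the pattern.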
Readings¶
- Walker (2019)
- Matthes (2023, chs. 15–17)
Homework¶
Design and test an experiment using the data from a publicly available API, such as the Digital Public Library of America or Chronicling America — you can also use another data source, just run it by me first.
In your report, be sure to discuss your research question, hypothesis, methods, results, and conclusion — in other words, walk the reader through the full scientific process.
These experiments need not be large — think of a small, answerable question that you could tackle in the space of 4 hours of work (i.e., the amount of outside work generally expected for each lab).
- Walker. (2019, May 13). Getting Data for Digital Humanities with APIs: A Gentle Introduction – Digital Humanities @ Pratt School of Information. https://studentwork.prattsi.org/dh/2019/05/13/getting-data-for-digital-humanities-with-apis/
- Matthes, E. (2023). Python Crash Course (3rd ed.). No Starch Press.