We have an API
It’s always nice when you find that some valuable database is available online via an API.
An API means you don’t have to write grubby code to screen-scrape a website, and you can get all the data you need.
For example, recently I was messing around with book information and found that the New York Times has a best sellers API you can use to access their bestseller lists going back several years. Cool, right?
Of course, like many online APIs, this one rate-limits you to 5,000 queries per day. So if you want, say, all the bestseller data from the last ten years, you’d need about two days to grab it all (assuming there are ~4 updates to each list per month, and you want all 15 of their bestseller lists).
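The two-day figure is just back-of-the-envelope arithmetic. Here's a quick sketch using the numbers above (15 lists, ~4 updates per list per month, 10 years, a 5,000-query daily cap):

```python
import math

# Rough estimate of how long a full historical crawl would take,
# using the figures from the post.
LISTS = 15
UPDATES_PER_MONTH = 4
YEARS = 10
DAILY_QUERY_LIMIT = 5_000

# One API query per list update.
total_queries = LISTS * UPDATES_PER_MONTH * 12 * YEARS
days_needed = math.ceil(total_queries / DAILY_QUERY_LIMIT)

print(total_queries)  # 7200
print(days_needed)    # 2
```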
So here’s my proposal. Instead of making us run slow crawlers on your APIs to access historical data, just provide a sqlite database we can download. It’s easier for everyone.
The API is still useful, of course, for up-to-the-moment data.
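To make the proposal concrete, here's a sketch of what working with a downloadable dump could look like. The schema and file name are hypothetical (the NYT publishes no such file); the point is that ten years of history becomes one local query instead of a two-day crawl:

```python
import sqlite3

# Hypothetical schema for a downloadable bestsellers dump.
# In real use you'd open the downloaded file, e.g.
# sqlite3.connect("bestsellers.db"); we use :memory: here for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bestsellers (
        list_name TEXT,
        list_date TEXT,
        rank      INTEGER,
        title     TEXT,
        author    TEXT
    )
""")
conn.execute(
    "INSERT INTO bestsellers VALUES (?, ?, ?, ?, ?)",
    ("hardcover-fiction", "2015-06-07", 1, "Example Title", "Example Author"),
)

# All the history for one list, in a single local query:
rows = conn.execute(
    "SELECT title, rank FROM bestsellers "
    "WHERE list_name = ? ORDER BY list_date, rank",
    ("hardcover-fiction",),
).fetchall()
print(rows)
```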
Now, commercial websites like the New York Times might want to use the inconvenience of an online API as a way to limit access to their data or to enforce some terms and conditions. But in practice I think it just means that people have to run their crawlers a little longer. And maybe they implemented an API simply because they thought it was what people wanted.
In the case of government APIs, downloadable data matters even more. All government “open data” websites should provide downloadable data sets. If the data sets are too big, chunk them.
So, keep calling for online APIs. But ask for downloadable datasets too.