

liamkinne | 2 years ago

I once had the unfortunate experience of building an API for a government org where the data changed once a year, or when amendments were made, which happened very infrequently.

The whole data set could have been zipped into a <1MB file, but instead a “solution architect” got their hands on the requirements. We ended up with a slow API, because they wouldn’t let us cache results in case the data had changed just as it was requested, and an overly complex webhook system for notifying subscribers of changes to the data.

A zip file probably was too simple, but not far off what was actually required.

xg15|2 years ago

I think for <1MB of data, with changes once (or twice) a year, the correct API is a static webserver with working ETag/If-Modified-Since support.
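
For illustration, here is a minimal sketch of the server-side decision behind ETag support. In practice nginx or Apache does this for static files; the function names `compute_etag` and `respond`, and the choice of a truncated SHA-256 as the tag, are assumptions made for this example and not anything the thread specifies.

```python
import hashlib
from typing import Dict, Optional, Tuple


def compute_etag(payload: bytes) -> str:
    """Derive a quoted ETag from the content hash (an illustrative choice)."""
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'


def respond(payload: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET carrying an optional If-None-Match header."""
    etag = compute_etag(payload)
    headers = {"ETag": etag}
    if if_none_match is not None:
        for candidate in if_none_match.split(","):
            candidate = candidate.strip()
            # If-None-Match uses weak comparison, so ignore a W/ prefix.
            if candidate.startswith("W/"):
                candidate = candidate[2:]
            if candidate == "*" or candidate == etag:
                # The client already has this version: send headers only.
                return 304, headers, b""
    return 200, headers, payload
```

A client that sends back the ETag it last saw gets a bodiless 304 until the file's bytes change.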

If you want to get really fancy, offer an additional webhook which triggers when the file changes - so clients know when to redownload it and don't have to poll once a day.

...or make a script that sends a predefined e-mail to a mailing list when there is a change.

calpaterson|2 years ago

> working ETag/If-Modified-Since support

I completely agree and csvbase already implements this (so does curl btw), try:

    curl --etag-compare stock-exchanges-etag.txt --etag-save stock-exchanges-etag.txt https://csvbase.com/meripaterson/stock-exchanges.parquet -O

justsomehnguy|2 years ago

> ETag/If-Modified-Since

See above. Also you can just publish the version in DNS with a long enough TTL

deeringc|2 years ago

A zip file on a web server that supports etags, that's polled every time access is required. When nothing has changed since last time, you get an empty HTTP 304 response and if it has changed then you simply download the <1MB Zip file again with the updated etag. What am I missing?
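
The polling flow described above could look roughly like this on the client side. This is a sketch using only Python's standard library; `fetch_if_changed` is a hypothetical helper name, and the caller is assumed to store the returned ETag next to the cached zip.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def fetch_if_changed(url: str, cached_etag: Optional[str]) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, etag); body is None when the server answers 304 Not Modified."""
    request = urllib.request.Request(url)
    if cached_etag:
        # Ask the server to skip the body if our copy is still current.
        request.add_header("If-None-Match", cached_etag)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; here it just means "keep your copy".
        if err.code == 304:
            return None, cached_etag
        raise
```

Each access then costs one small request, and the zip is only transferred again after its ETag changes.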

tryauuum|2 years ago

Probably nothing

My concern was "what if the file is updated while it's mid-download", but Linux would probably keep the old version of the file around until the download finishes (that is, for as long as the webserver process still has the file open). Probably. Better to test it.
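
One way to make that behaviour dependable is to publish updates by writing a temporary file and renaming it over the old one, instead of overwriting in place. On POSIX systems a process that already has the old file open keeps reading the old contents after the rename. The sketch below assumes a POSIX filesystem; `publish_atomically` is a hypothetical helper name.

```python
import os
import tempfile


def publish_atomically(target_path: str, payload: bytes) -> None:
    """Replace target_path with payload so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".publish-")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(payload)
            temp_file.flush()
            os.fsync(temp_file.fileno())
        # mkstemp creates the file owner-only; make it readable for the web server.
        os.chmod(temp_path, 0o644)
        # Atomic on POSIX: the name switches to the new file in one step.
        os.replace(temp_path, target_path)
    except BaseException:
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
```

Overwriting the file in place gives no such guarantee, since a download in progress could then see a mix of old and new bytes.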

ipaddr|2 years ago

If the data changes only once a year or so, that would imply usage of the API is a rare event for a user of the data, so speed isn't a huge concern. Caching would introduce more complexity and the risk of needing to manually revalidate the cache. The solution architect was probably right.

xg15|2 years ago

Why do rare writes imply rare usage? It's possible the file is read often and by different systems even if changes are infrequent.

If the API was used rarely, that would be even more of an argument for a simple implementation and not a complex system involving webhooks.

paulddraper|2 years ago

> Caching would introduce more complexities

Apache/nginx do it just fine...

pests|2 years ago

If you can't cache, you need to read it every time you use the data, not just when it changes.

justsomehnguy|2 years ago

  cat /api/version.txt
  2023.01.01

  ls /api
  version.txt data.zip
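
A client for this layout might look something like the sketch below: read the small `version.txt` first and download `data.zip` only when the version differs from the cached one. The `sync_dataset` name and the `fetch` callable (which takes a file name and returns its bytes) are hypothetical, introduced here so the sketch does not depend on a particular transport.

```python
from pathlib import Path
from typing import Callable


def sync_dataset(cache_dir: Path, fetch: Callable[[str], bytes]) -> bool:
    """Refresh the local copy if the published version changed; return True if downloaded."""
    remote_version = fetch("version.txt").decode().strip()
    version_file = cache_dir / "version.txt"
    data_file = cache_dir / "data.zip"

    local_version = version_file.read_text().strip() if version_file.exists() else None
    if local_version == remote_version and data_file.exists():
        # Nothing new was published: keep using the cached zip.
        return False

    # Write the data first and the version last, so an interrupted run retries next time.
    data_file.write_bytes(fetch("data.zip"))
    version_file.write_text(remote_version + "\n")
    return True
```

If the publisher updates the two files at different moments, a client could briefly pair a new version string with the old zip, so the publisher would probably want to replace `data.zip` before `version.txt`.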