
The Perils of ISBN

160 points| evakhoury | 11 days ago |rygoldstein.com

87 comments

amiga386|11 days ago

This reminds me of MusicBrainz, whose database stores "release groups", e.g. the album Nevermind by Nirvana is one, which can have hundreds of "releases", as different media (tape, CD, LP, promo, ...), different countries, later re-issues, etc. [0]

Sometimes these have different catalogue numbers or barcodes to distinguish them, sometimes they don't but they're still different. I've seen releases where the only difference is the label in the centre of the LP, or the back of the CD case has a two-column tracklisting vs a one-column tracklisting. Music publisher uses the same code and says it's identical and yet it's clearly not.

Then there's the "recordings" on an album, which even if they're never re-recorded can still end up chopped up, bleeped or remastered. They're not the same sound. MusicBrainz likes to track when they are exactly the same recording (e.g. the LP recording of a song appearing on a compilation album verbatim) and when they're not (e.g. radio edits of the LP recording). And if we're going beyond recordings by one artist of "their" song, i.e. cover versions, or just plain standards, those are "works", with composers, lyricists, and can be recorded thousands of times by different artists...

I greatly appreciate the pedantry and flexibility for noting down when creative works are the same versus where they differ, in relational database form.

[0] https://musicbrainz.org/release-group/1b022e01-4da6-387b-865...

SamWhited|10 days ago

They actually have a (very new, still alpha, probably not a ton of data yet) database for books:

https://bookbrainz.org/about

I haven't looked into what their schema is like, but if it's anything like Musicbrainz it will be pretty comprehensive and easy to pull the data you want out of!

ggm|10 days ago

I had a dual CD pressing of Bach (double violin concertos plus some other stuff, Zuckerman/Perlman, Colombia passed through a number of subsequent buyouts and re-releases) which simply would not index correctly from the cd-id track stuff.

I wound up making an account, uploading the info, managing the 29 different reasons a neophyte makes a mistake causing their data not to be accepted, and finally got my CD into the system. This included using a random Chinese person's website from the 90s, who presumably had come to Australia and bought the identical pressing, which appears to be a hyper-local, market-specific variant of the ones which other (European, American) markets got.

I have massive sympathy for the brainz, because as this article on ISBN and my experience shows, people are cavalier about renewing their 'unique identity' info, when they think they don't have to.

makr17|10 days ago

My favorite example of this sort of thing has been In My Tribe by 10000 Maniacs. The UPC/Catalog Number remained the same between the 1987 release and the removal of Peace Train (track 7) in 1989. I have this memory of sifting through the stock at a large used CD store in the mid-90s hoping to find the pre-removal version.

wink|10 days ago

This is kinda topical for me as I just scanned some barcodes off some CDs and my results were: 90-95% detection rate on MusicBrainz, and for the rest it ranged from "yeah, this is clearly the same thing with 10 tracks" to "oh my, there are 7 different regional versions with 10, 11, 12, 13, 13 tracks and I need to pay attention to grab the correct one so the last 3 songs are not wrong" and "this is some 5 EUR sample from an unknown label and really hard to find". Or their docs are not great; I had wished for something like "artist of track 1 = X and artist of track 2 = Y", which probably would have narrowed it down the most.

bombcar|10 days ago

I know that for a book I've published via Kindle Press (the real ones, not digital) that there are at least 3 official revisions, and many many minor ones that as far as I know are only differentiated by the minor typos fixed, and MAYBE one of the numbers buried in the front matter. The ISBN has remained the same.

cedws|10 days ago

I worked on a project recently to organise my music and came across MusicBrainz. I wanted a reliable API to enrich my music with the proper metadata, but unfortunately the majority of my tracks weren’t in their database at all. Maybe the Anna’s Archive Spotify data will help there.

To me it makes the most sense to index music by its fingerprint. Releases, EPs, etc should just be pointers to that.

idoubtit|10 days ago

Wikidata is a FRBR-compatible public database of books. I don't know if it's good enough for the kind of books the author wants, but in recent years the quality of Wikidata has greatly increased for the books that I deal with (about 1000 items).

BTW, they misunderstood their own example of "Hotel Iris" by Yoko Ogawa when they wrote "the same work is duplicated four times." In fact, those four entries in the list point to distinct works.

One of these is a French publication by the publisher Actes Sud. Translations are not the same work as the original. They are derived works.

But it's true this list is a mess. Another entry has 3 editions, one in English and two in Spanish, so it's obviously an error that mixes two distinct works.

ZeroGravitas|10 days ago

In FRBR translations are generally considered the same work.

In Openlibrary specifically they should be combined as one work. The editions can store the language and the translator info.

The current grouping is probably because semi-automatic (and some manual) merging is easier for titles in the same language.

crazygringo|10 days ago

> Translations are not the same work as the original. They are derived works.

Which adds yet another layer. Because you still want them to be considered as part of a larger single entity. If you're performing a search, you want to find the single main entity, and then have different translations listed the same way you have different editions listed.

jiggawatts|11 days ago

My state had a reading competition that listed books by ISBN, which was a real challenge for students to track down. Each library had different editions and even different cover art, so if you “found” the book you might not recognise it on the shelf, etc…

I worked on the library systems and one of my innovations was to use the ISBN mapping database of WorldCat to find books with identical content but different ISBNs to help kids find the books on the list.

Over ten years that one SQL join in the code made the kids read an extra million books they wouldn’t have otherwise.

My biggest “bang for buck” in my career!
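For readers curious what a join like that might look like, here is a minimal sketch. The comment does not show the real query, so everything here is an assumption: the tables `reading_list`, `isbn_work`, and `holdings`, their columns, and the placeholder ISBN strings are all hypothetical. It only assumes that some mapping (such as the WorldCat one mentioned above) assigns each ISBN to a shared work identifier.

```python
import sqlite3


def find_equivalent_holdings(conn):
    """Return (listed_isbn, shelf_isbn, shelf_title) for every held edition
    that maps to the same work as an ISBN on the reading list."""
    return conn.execute(
        """
        SELECT r.isbn, h.isbn, h.title
        FROM reading_list AS r
        JOIN isbn_work AS listed ON listed.isbn = r.isbn
        JOIN isbn_work AS other  ON other.work_id = listed.work_id
        JOIN holdings  AS h      ON h.isbn = other.isbn
        ORDER BY r.isbn, h.isbn
        """
    ).fetchall()


def demo():
    # Hypothetical schema and placeholder data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE reading_list (isbn TEXT);
        CREATE TABLE isbn_work (isbn TEXT, work_id TEXT);
        CREATE TABLE holdings (isbn TEXT, title TEXT);
        INSERT INTO reading_list VALUES ('isbn-A');
        INSERT INTO isbn_work VALUES
            ('isbn-A', 'work-1'), ('isbn-B', 'work-1'), ('isbn-C', 'work-2');
        INSERT INTO holdings VALUES
            ('isbn-B', 'The Last Unicorn (paperback)'),
            ('isbn-C', 'Some other book');
        """
    )
    return find_equivalent_holdings(conn)
```

In this toy data the library holds no copy with the listed ISBN, but the join still surfaces the paperback because both ISBNs map to the same work.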

DiggyJohnson|10 days ago

That is amazing. For odd reasons I had to get real familiar with ISBN as well. What did that sql command look like if you don’t mind me asking?

zvr|10 days ago

Just a comment pointing people to https://www.librarything.com/ which I find so much better than goodreads.

Regarding the taxonomy of WEMI (work, expression, manifestation, and item), all of them are useful since we are talking about books at different levels. From "I have read Don Quixote", which is about the work (translations are the same), to "My Don Quixote has coffee stains", which is about the item.
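The four WEMI levels can be sketched as nested records. This is only an illustration of the taxonomy as described in this comment (translations grouped under one work, which others in the thread dispute); the class and field names, the translator placeholder, and the ISBN placeholders are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """One physical (or digital) copy in somebody's collection."""
    owner: str
    note: str = ""


@dataclass
class Manifestation:
    """A concrete published format; this is the level an ISBN identifies."""
    binding: str
    isbn: Optional[str] = None
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A particular realization of the work, e.g. one translation."""
    language: str
    translator: Optional[str] = None
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract creation: what "I have read Don Quixote" refers to."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def isbns_for(work):
    """Collect every ISBN attached to any manifestation of the work."""
    return [m.isbn for e in work.expressions for m in e.manifestations if m.isbn]


def owns_any_copy(work, owner):
    """True if the owner holds at least one item of any edition of the work."""
    return any(
        i.owner == owner
        for e in work.expressions
        for m in e.manifestations
        for i in m.items
    )


def example_work():
    stained = Item(owner="me", note="coffee stains")
    return Work("Don Quixote", [
        Expression("es", manifestations=[Manifestation("Hardcover", "isbn-es-hc")]),
        Expression("en", translator="(placeholder translator)",
                   manifestations=[Manifestation("Paperback", "isbn-en-pb", [stained])]),
    ])
```

A "have I read this?" check works at the `Work` level, while the coffee stains live on a single `Item`.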

saithir|10 days ago

Sometimes we definitely want 'items' though, so for example I am in a physical bookstore and see a book I might be interested in, so I buy it, to find out later back home that I already have the very same book - and edition - already. It's a scenario that anyone with some amount of books definitely encountered multiple times, I know I did it myself a few times. :)

Ability of an ISBN search of my collection would have helped me in this case - scanning a barcode is easy enough task to accomplish.

And even if I had a different edition, the resulting title from searching for a different edition would be enough to help me figure out that I should not buy a book I already own.

eudamoniac|10 days ago

Genuinely how is this possible? I have nearly a thousand ebooks and I'm certain whether I have or don't have one, because I obtained it deliberately. Are you buying books by the foot or something?

ajohnson1200|10 days ago

I built a personal / hobby site for books a couple years ago that was inspired by pinboard.io, and leaned heavily on ISBNDB (their API), during which I learned a lot about ISBNs and books, at least through the lens of what the ISBN DB API offers:

- searching by title, ie: "The last unicorn" will return books across many years, and many editions, and with lots of different titles, examples:

The Last Unicorn (thorndike Press Large Print Science Fiction Series) THE LAST UNICORN The Last Unicorn (40th Anniversary Edition) The Last Unicorn the Lost Journey The Last Unicorn: The Lost Version The Last Unicorn das Einhorn im Spiegel der Popkultur

and then books that have a similar title but are by completely different authors:

The Last Unicorn: A Search for One of Earth's Rarest Creatures

- there's no way to programmatically link an ISBN or ISBN13 to all of the other variants of that book across years or editions ( "First Edition", "1st U. S. printing", "6th Printing", etc..) or bindings ("Hardcover", "Mass Market Paperback", "Library Binding", "Kindle Edition", "Audio Cassette", etc..) or languages ("en", "English", "zh", etc..)

- I wrote some code that would consume the 1000 items in the ISBNDB API search results, and attempt to reduce the list of search results based on the language, the title, and author(s) using Jaccard similarity, and then sorted by year, and grouped by binding, which mostly worked to be able to see all editions for a book, but it's super messy.
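The filtering step in the last bullet could look roughly like the sketch below. It is not the commenter's code: the record fields (`title`, `author`, `binding`, `year`), the word-level tokenization, the 0.5 threshold, and the sample years and bindings are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercased alphanumeric word set; punctuation and case are ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the word sets of two strings: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def group_editions(results, query_title, query_author, threshold=0.5):
    """Keep results whose title and author are both similar enough to the
    query, then group them by binding and sort each group by year."""
    groups = defaultdict(list)
    for record in results:
        if jaccard(record["title"], query_title) < threshold:
            continue
        if jaccard(record["author"], query_author) < threshold:
            continue
        groups[record["binding"]].append(record)
    for records in groups.values():
        records.sort(key=lambda r: r["year"])
    return dict(groups)


# Illustrative records only; years and bindings are invented for the example.
SAMPLE = [
    {"title": "The Last Unicorn", "author": "Peter S. Beagle",
     "binding": "Paperback", "year": 1991},
    {"title": "THE LAST UNICORN", "author": "Peter S. Beagle",
     "binding": "Paperback", "year": 1968},
    {"title": "The Last Unicorn (40th Anniversary Edition)",
     "author": "Peter S. Beagle", "binding": "Hardcover", "year": 2008},
    {"title": "The Last Unicorn: A Search for One of Earth's Rarest Creatures",
     "author": "A. Different Author", "binding": "Hardcover", "year": 2015},
]
```

Calling `group_editions(SAMPLE, "The Last Unicorn", "Peter S. Beagle")` keeps the three Beagle records and drops the similarly titled book by another author. A threshold this loose will also admit unrelated titles that share most of their words, which is presumably part of why the result is messy.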

Going to have to see if I can use OpenLibrary instead, looks like a great option.

jdranczewski|10 days ago

If anyone in the comments is in a similar predicament to the author and would like a book logging app, I will say that I disagree on their judgement of StoryGraph - I've found it a pretty decent interface, the search function is very good, and the (anti)features mentioned in the footnote are incredibly easy to not use, as the creators seem to understand that many of their users have a very strong preference to avoid AI bloat.

KPGv2|10 days ago

https://hardcover.app is another choice. It's the one I've been using since right after the second Trump inauguration when I decided to "de-oligarch" as much as possible.

millicentricism|11 days ago

This also fails to take into account that ISBNs also contain the publisher ID in them. So identical copies of a book could have different ISBNs depending on which markets they are sold in.

boznz|11 days ago

I'm not sure this is the case, I got my ISBN range through my government national library service, I could be wrong but when you let them know what the book is you are publishing they ask for the Publisher name, though I am guessing as the service is free and it only applies to New Zealand books and publications.

ilamont|10 days ago

They don't contain the publisher name, but ISBNs are usually purchased in blocks of 10 or 100 or 1000 or whatever by a single entity, which is often a single publisher or corporation.

However, within the block publishers can assign ISBNs to different imprints.

rahimnathwani|11 days ago

I'm not sure we always want 'works'. Sometimes different 'expressions' of the same work are different enough that they don't have the same value.

For example, compare the most recent edition of 'Straight and crooked thinking' with the one published in 1930.

vidarh|11 days ago

I don't know that work, but I agree with you in general because of forewords etc. Or even appendices. And translations by different translators.

I "grew up with" a specific translation of Lord of the Rings into Norwegian, for example. There are two. They are very different. But the editions also differ in whether they include the appendices, whose illustrations are used, and more.

RobotToaster|11 days ago

The most obvious example of this is the innumerable[0] versions of the Christian bible.

[0] Before anyone says it, I'm sure some bible nerd has numbered them, it's hyperbole.

crazygringo|10 days ago

I think the point is, you want a single work when searching.

Then click on the item and drill down into editions sorted by year, or whatever.

But when you're doing search, it's terrible UX to be flooding it with tens of editions mixed in with other things with similar titles.

mmooss|10 days ago

> there’s a distinction between the work (the book The Last Unicorn), the expression (a given edition of the book), a manifestation (a given physical format for an expression, such as paperback or hardcover), and an item (an individual object in a collection)

The author misunderstands 'work', as far as I know: A work is "intellectual or artistic content of a distinct creation. It refers to a very abstract idea of a creation e.g. Shakespeare's Romeo and Juliet and not a specific expression."[0]

In contrast, an "expression" is an "intellectual or artistic realization of a work. The realization may take the form of text, sound, image, object, movement, etc., or any combination of such forms."[0]

The Last Unicorn story is the work, "the book The Last Unicorn" is an expression as would be the film version or the computer game, etc.

[0] https://www.ifla.org/references/best-practice-for-national-b... (as of a few years ago)

kxcrossing|10 days ago

I like the bait-and-switch here. “Let’s make my own app” which almost made me tab out, followed by an interesting dive into the perils of uniqueness in ISBN. I would still say overspecifying is better than under!

culi|4 days ago

I like WikiData for tasks like these that require consolidating multiple different identifiers for the same work. E.g.

https://www.wikidata.org/wiki/Q106545884

Gives us the ISBN, Goodreads work id, LibraryThing id, the OpenLibrary id, and the Google Knowledge Graph id all in one query.
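A rough sketch of how such a lookup might be expressed as a SPARQL query against the Wikidata Query Service. The property numbers below are quoted from memory and should be treated as assumptions to verify on Wikidata before use; the function names are invented for this example, and nothing here actually sends a request.

```python
import re
from urllib.parse import urlencode

# Assumed Wikidata property IDs (verify before relying on them).
ID_PROPERTIES = {
    "isbn13": "P212",
    "isbn10": "P957",
    "goodreads_work": "P8383",
    "librarything_work": "P1085",
    "openlibrary": "P648",
    "google_kg": "P2671",
}


def build_query(qid):
    """Build a SPARQL query that fetches each identifier if it is present."""
    if not re.fullmatch(r"Q[0-9]+", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    select = " ".join(f"?{name}" for name in ID_PROPERTIES)
    optional = "\n".join(
        f"  OPTIONAL {{ wd:{qid} wdt:{pid} ?{name} . }}"
        for name, pid in ID_PROPERTIES.items()
    )
    return f"SELECT {select} WHERE {{\n{optional}\n}}"


def query_url(qid):
    """URL for the public query endpoint, asking for JSON results."""
    params = urlencode({"query": build_query(qid), "format": "json"})
    return "https://query.wikidata.org/sparql?" + params
```

Each identifier sits in its own `OPTIONAL` clause so that an item missing one of them still returns the rest.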

bell-cot|11 days ago

The first few para's of https://en.wikipedia.org/wiki/ISBN are a better summary of the issue.

tl;dr; - The ISBN is intended to be a physical Part Number, within the book business. Where "hardcover, or paperback, or trade paperback, or large print, or revised edition, or ..." very much matters.

NoMoreNicksLeft|10 days ago

>Uh-oh. Why do we have so many distinct versions of The Last Unicorn? Well, each distinct format of a work has its own ISBN (so a hardcover, paperback, and eBook all have different ISBNs),

This isn't even the half of it. On some digital books, I'll find a dozen ISBNs in the front matter. Of course there's the hardback, the clothbound (not always the same as the hardback), the alk. paper variant, paperback, trade paperback, epub, pdf, "Adobe digital", and "master digital e-book" (no idea what that even is myself). And that's all just issued together. If they reprint, it won't get a new ISBN, but if the rights convey to another publisher, that one will get a whole 'nother set again. Some popular titles likely have low hundreds of ISBNs, and keep in mind that these have only been a thing since the late 1960s (9 digit ISBNs, technically just SBNs back then). Then with the now dead paperback trade, you could go through a dozen different covers for the most popular books (King, etc) but they'd all use the same ISBN.

Then, and this one bites me the most... if archive.org scans in a hardback with its ISBN, what do I use for the scanned pdf? I've decided that for lack of a better alternative I have to use it, but if the publisher made their own pdf (even just scanning the hardback), then it is supposed to issue a new ISBN to it.

Cataloging my own library, I've had to use a hodgepodge of unique ids. ASINs, ISBNs, Worldcat's OCLC numbers, Open Library's, and a few others besides. And it still comes up short. The number of oddball publishers and pamphlets and so forth that have never been cataloged anywhere is enormous.
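When mixing identifiers like this, it can help to at least fold the ISBN family (9-digit SBN, ISBN-10, ISBN-13) into one canonical form before comparing. The sketch below uses the standard check-digit rules; the function names are made up here, and it does nothing about the harder problems in this thread such as reused or misprinted ISBNs.

```python
import re


def isbn13_check_digit(first12):
    """Check digit for the first 12 digits, using alternating weights 1 and 3."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_is_valid(isbn10):
    """ISBN-10 rule: weights 10 down to 1, sum divisible by 11, 'X' means 10."""
    if len(isbn10) != 10:
        return False
    total = 0
    for i, ch in enumerate(isbn10):
        if ch == "X" and i == 9:
            value = 10
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - i) * value
    return total % 11 == 0


def to_isbn13(raw):
    """Normalize an SBN, ISBN-10, or ISBN-13 string to a bare 13-digit ISBN."""
    s = re.sub(r"[\s-]", "", raw).upper()
    if len(s) == 9:
        s = "0" + s  # a 9-digit SBN becomes an ISBN-10 by prefixing a zero
    if len(s) == 10:
        if not isbn10_is_valid(s):
            raise ValueError(f"bad ISBN-10 check digit: {raw!r}")
        core = "978" + s[:9]
        return core + isbn13_check_digit(core)
    if re.fullmatch(r"[0-9]{13}", s) and isbn13_check_digit(s[:12]) == s[12]:
        return s
    raise ValueError(f"not a valid SBN/ISBN: {raw!r}")
```

For example, `to_isbn13("0-306-40615-2")` and `to_isbn13("978-0-306-40615-7")` both normalize to the same 13-digit string, so the two spellings compare equal in a catalog.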

Finnucane|10 days ago

>if archive.org scans in a hardback with its ISBN, what do I use for the scanned pdf?

The scanned pdf just doesn't have an ISBN. ISBNs are assigned by publishers to products for inventory management. That's it. If archive.org scans a book, it's not a product that needs inventory control.

user205738|10 days ago

Your question has already been answered, but have you considered the option of specifying several ISBNs, a description of the book, a link to the website with this edition, the publisher, and a note with details of the book's format (hardcover, soft cover, etc.)?

Personally, I have never had all these indicators match in any book. It also allows you to find a very specific publication using a semantic search, specifying a combination of tags/publisher/formats.

WorldMaker|10 days ago

> if archive.org scans in a hardback with its ISBN, what do I use for the scanned pdf?

Archive.org would recommend using the OpenLibrary IDs instead of ISBNs. (OpenLibrary is an Archive.org project.)

> The number of oddball publishers and pamphlets and so forth that have never been cataloged anywhere is enormous.

I think it's more the case that number of catalogs is too many. At least with LibraryThing it always seems like somebody has cataloged everything, but we have such a hodgepodge of ID systems and catalog numbers in part because so rarely have all the catalogs been connected or have tried to be connected. It's only a relatively recent library phenomenon that so many small library catalogs can talk to each other on the same protocol, much less coexist in the same broader search tool.

> Cataloging my own library, I've had to use a hodgepodge of unique ids. ASINs, ISBNs, Worldcat's OCLC numbers, Open Library's, and a few others besides.

In part because most of my personal catalog is in LibraryThing, I've been impressed with LibraryThing's Works ID as a generally trustworthy unique ID for a book. LibraryThing benefits from an interesting mix of volunteer and professional librarian work (especially the work of a lot of tiny and interesting niche libraries across the world) in deduping and merging editions together into the same Work ID. StoryGraph and OpenLibrary are also doing interesting things in this space, but LibraryThing has the momentum of time (it's as old as GoodReads and not an Amazon side project) and the benefit of extra (nerdy) labor.

I also like the LibraryThing IDs because they are generally short, opaque (which is a weird feature sometimes), and don't look anything like an ISBN because they aren't intended for that. StoryGraph's IDs are GUIDs, which I will forever find ugly in their normal hyphen-delimited hexadecimal rendering. Open Library's look like ISBNs for reasons that I don't understand, but I do appreciate that you can use the last letter of the ID to distinguish between an edition ID (ends in M, for reasons I don't know) and a work ID (ends in W), and the OL prefix does help them stand out next to other catalogs' IDs.

I built a voting website for my current favorite book club and I thought I could do everything with just the LibraryThing Works ID but then I keep adding other IDs to the "database" (YAML frontmatter) as time goes on. LibraryThing doesn't have a Covers API because most of their edition covers come from Amazon and Amazon is restrictive on that. If I add the OpenLibrary Edition ID, I can use the OpenLibrary Covers API as Archive.org has very nice terms on that today. (Not the OpenLibrary Works ID, because covers are associated at the Edition level, which does make some sense, but the website UI shows a default cover from a random edition so I'm not sure why the API couldn't return that cover from the Works ID, but it is nice to pick and choose Edition covers anyway and I can't complain too much having a working cover image API from someone.) I started adding StoryGraph IDs because members of the club love StoryGraph right now and also because while StoryGraph doesn't have an Official API yet (it is on the Roadmap), I discovered StoryGraph's CWs section was amenable to easy scraping. I figured since an API for it is on the Roadmap a bit of light scraping (with attribution!) was fair. (My club wanted CW information to help decide on book voting. LibraryThing intentionally doesn't track CWs as too hot button and subjective, but StoryGraph has a rather nice "voting" experience for CWs and before I started to scrape StoryGraph's CWs we were already starting to copy and paste them by hand into the Markdown documents. The scraping provides better attribution and a unified display.)
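The edition-versus-work distinction and the cover lookup described above can be sketched in a few lines. The URL pattern follows the Open Library Covers API as I understand it (`/b/olid/<id>-<size>.jpg` with sizes S, M, L); treat it as an assumption to confirm against the current docs. The helper names and the example IDs are invented, and this sketch deliberately rejects any ID that does not end in M or W.

```python
import re

_OLID = re.compile(r"OL[0-9]+([MW])")


def olid_kind(olid):
    """Classify an Open Library ID by its final letter: M = edition, W = work."""
    match = _OLID.fullmatch(olid)
    if match is None:
        raise ValueError(f"not an edition or work OLID: {olid!r}")
    return "edition" if match.group(1) == "M" else "work"


def cover_url(edition_olid, size="M"):
    """Build a cover image URL for an edition ID (covers hang off editions)."""
    if olid_kind(edition_olid) != "edition":
        raise ValueError("covers are looked up by edition ID, not work ID")
    if size not in ("S", "M", "L"):
        raise ValueError("size must be S, M, or L")
    return f"https://covers.openlibrary.org/b/olid/{edition_olid}-{size}.jpg"
```

With a placeholder ID, `cover_url("OL123M", "L")` yields a URL ending in `OL123M-L.jpg`, while passing a work ID such as `"OL456W"` raises an error, mirroring the edition-level association described in the comment.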

cestith|10 days ago

I buy a lot of books for an individual. I have a dedicated library room in my home, and that’s not the only place there are bookcases.

I shop by ISBN often because I want specifically a particular edition in a particular cover. So it’s not just title and author. It’s not even title, author, publisher, edition, and cover honestly. Sometimes there’s an Indian subcontinent English printing of a book that’s laid out differently and on different paper from the US/Canada market version.

One small drawback is sometimes I’ll order a book by ISBN, and the bookseller will locate it by ISBN, and it will be a completely different item on a different topic by a different author. Sometimes if a book is a small printing or is a very old title the publisher will recycle the ISBN.

KPGv2|10 days ago

> why isn't there a letterboxd for books

There is. https://hardcover.app

I used Letterboxd a lot before kids. I used Goodreads until the Trump inauguration when I de-Amazon'd myself as much as possible (Amazon owns Goodreads). I switched to Hardcover, which is a much better interface. There are ways to improve, but overall I prefer it over Goodreads.

ncfausti|10 days ago

What would you like to see improved?

ggm|10 days ago

A salutary lesson in field overloading and structured keys. There must be an aphorism for a "things you cannot do with a key, if you don't know in advance that's how the key works" list.

galkk|10 days ago

Unfortunately, for isbns even if you know how the key works in theory and should be used by standard, reality will break you very soon. It’s quite loose. At least it was 10 years ago when I worked in the area of book catalogs matching, per different online stores.

joemi|10 days ago

A simple search for books is an interesting problem because for some books it makes sense to search by title alone, while for others it doesn't.

Take To Kill A Mockingbird as an example... No matter what (English) edition of the book you read, you're likely reading the exact same content, even the exact same words, as any other English edition. There might be a different preface near the front or different blurbs on the back cover or a different number of words per page, but the actual story is word-for-word the same. A simple title lookup makes sense here in most cases.

Compare that to something like The Iliad, where the English versions are all translations and can vary greatly from translator to translator. While all telling ultimately the same story, a bad translation doesn't begin to compare to an elegantly beautiful translation, so you almost certainly don't want to treat all editions of The Iliad the same.

Translations aren't the only times that you wouldn't want to treat all editions of a title the same. Some books have undergone abridgments, revisions, or corrections, so the content won't be word-for-word the same between editions, but might or might not be close enough that it's worth considering them the same. Some books have heavily annotated editions, so while not changing the underlying content that all the editions are based on, the reading experience is quite different.

I could go on with differences, but I hope it's clear that there _are_ differences between books and movies when it comes to variations/releases. For books, I think the lookup issue is closer to how it is for board games. Board games, like books, have many editions and translations and often get updated/revised between editions. Sometimes the updates change the gameplay significantly, and other times they don't. Boardgamegeek.com is one of the best (if not _the_ best) catalogs of board games that there is, and it has regular discussions/arguments about whether a new edition of a game is different enough that it deserves its own page or if it should just be relegated to be an easy-to-ignore note in the Versions section of the previous version's page. I think a letterboxd-like lookup for books would have similar regularly-occurring debates, and, like with board games, ultimately have to be fairly hand-curated.

galkk|10 days ago

I worked a little bit in the area. (it was 10 years ago in the area of book catalogs matching, per different stores/countries/bestseller lists)

ISBN is an attribute/key, but not a primary key, in database terms :)

ISBNs are messy and in real world you’ll see crazy amount of broken/edge cases that shouldn’t happen by the letter of the standard, but happen all the time in reality.

* For example, isbn can be reused by publisher for completely different book.

* 2nd edition, while very different, may have same isbn.

* Reissue of the same book could have different isbn.

* Textbook of same author for 6th and 7th grade could have same isbn.

* As soon as you’ll get in translations all bets are off.

* I already mentioned textbooks. How about college books, where each year there was a slightly revised edition of the same book?
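In schema terms, the "attribute, not primary key" point might be sketched like this: each catalog record gets its own surrogate ID, and the ISBN is just an indexed, non-unique column. The table layout, function names, and placeholder ISBN are assumptions for illustration, not anything described in the comment.

```python
import sqlite3

SCHEMA = """
CREATE TABLE edition (
    id       INTEGER PRIMARY KEY,  -- surrogate key owned by the catalog
    isbn     TEXT,                 -- nullable and deliberately NOT unique
    title    TEXT NOT NULL,
    language TEXT
);
CREATE INDEX edition_isbn ON edition (isbn);
"""


def open_catalog():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_edition(conn, isbn, title, language):
    """Insert a record and return its surrogate ID."""
    cur = conn.execute(
        "INSERT INTO edition (isbn, title, language) VALUES (?, ?, ?)",
        (isbn, title, language),
    )
    return cur.lastrowid


def lookup_by_isbn(conn, isbn):
    """An ISBN lookup returns a list, because it may match several records."""
    return conn.execute(
        "SELECT id, title, language FROM edition WHERE isbn = ? ORDER BY id",
        (isbn,),
    ).fetchall()
```

Two records sharing one ISBN (say, the same title in two languages) then coexist as separate rows, and callers are forced to handle the ambiguity instead of silently overwriting one with the other.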

If you ask yourself - wtf? You’re not alone.

---

In my youth I heard horror stories about people who suddenly found multiple duplicate GUIDs (UUIDv1) in their databases because cheap Chinese knockoff network cards were using the same MAC addresses. With ISBNs, that could happen to you at any time.

Ekaros|10 days ago

I did some data collection on my cookbooks. Figured out Lidl had used same ISBN for same book. In entirely different languages.

wise_blood|10 days ago

TMDB is the best metadata provider for my home media server, they just have everything.

Two great features are: season names and episode groups. The other day there was a thread about Babylon 5, where seasons have names and the watching order is different from the airing order. Perfect application of both

gerdesj|10 days ago

When you delve into real domain specific knowledge, surprises often surface and it turns out that what you might think is a simple thing is actually rather complicated.

I'm mildly surprised at exactly how successful ISBNs are. I worked in a book wholesaler's warehouse 35 odd years ago and the ISBN was used as the product code by the "system". I'd get a series of picking lists for pallets on good old green "staved" fan fold. I'd whizz around the warehouse with my trolley and pick from paper packets of books. The product lines had the rack and bay, last four from the SBN, quantity, title and full SBN. The packets of books had the rack/bay/last four from SBN printed on a label in large type, with other details in small. I got very good at optimising my course around the warehouse and could pick at a right old rate, whilst listening to my mini cassette player. It's pretty boring work so you might as well game it!

Sometimes an individual book might fall off my trolley and be dumped in the big cardboard "skip" for rejects. For some reason casualties around me generally involved subjects like maths, material sciences, geology, surveying, hydrology. Oh and fractals!

I graduated in civil engineering.

Anyway. Surely all of us here know that really getting to grips with defining what it is that you are cataloguing/indexing/numbering/whatever and why can be quite tricky.

Both Dewey and SBNs catalogue "books" but for very different reasons. Both systems are extremely successful. You might think that in our world of LLMs n that, that books, Dewey and SBNs will go the way of the dodo.

Perhaps, but I doubt it.

Right, bugger all this old school nonsense. I've got a C64 (it rocks a SD card interface and a HDMI out (via SCART - must sort that out)) blinking away on my telly in the sittingroom and some mutant camels need a bloody good kicking.

CodesInChaos|11 days ago

I read that it's much worse than that, and there are ISBNs that were reused for completely different books.

rmunn|10 days ago

I've been cataloguing my books using the ISBN to look them up, and I think I ran into that situation a few times, maybe about 0.2% of all the books I catalogued. (That is, the ISBN search on openlibrary.org returned multiple clearly-different books for the ISBN I searched for). I didn't pay much attention to it so I can't tell you which ISBNs were duplicates, but I've definitely seen it happen.

But there is at least one case where it was on purpose. There's a set of reading primers from the UK called the Biff, Chip and Kipper books. We acquired a whole set of them at a garage sale, and when I went to enter them into my catalogue, I discovered that the publisher had assigned just one ISBN to the whole series. Which quite annoyed me when I discovered it. (I ended up just not cataloguing those books, because I didn't want to type the titles, author, copyright date, etc. in by hand for 50+ tiny books).

NoMoreNicksLeft|10 days ago

I've stumbled across 3 or 4 magazines that printed the wrong ISSN in more than one issue. One from the 80s did so in every single issue of its 20-some-issue run. It must be true that some books have done so as well, but I don't even check that those are correct.

joemi|10 days ago

In my experience this is very very rare. Rare enough that it's practically negligible.