
galkk | 11 days ago

I worked a bit in this area (about 10 years ago, matching book catalogs across different stores, countries, and bestseller lists).

In database terms, an ISBN is an attribute/key, but not a primary key :)
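In practice that means treating the ISBN as one attribute among several and checking whether it really maps to a single work. A minimal sketch, assuming a hypothetical `BookRecord` with only isbn/title/author/language fields and a crude normalization (case-fold, collapse whitespace): it groups records by ISBN and reports any ISBN attached to more than one distinct (title, author, language) combination. The records in the usage part are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BookRecord:
    # Hypothetical record shape; real catalog feeds carry many more fields.
    isbn: str
    title: str
    author: str
    language: str


def _norm(s: str) -> str:
    # Crude normalization: case-fold and collapse runs of whitespace.
    return " ".join(s.casefold().split())


def isbn_collisions(records: list[BookRecord]) -> dict[str, set[tuple[str, str, str]]]:
    """Return ISBNs attached to more than one distinct (title, author, language)."""
    by_isbn: dict[str, set[tuple[str, str, str]]] = defaultdict(set)
    for rec in records:
        by_isbn[rec.isbn].add((_norm(rec.title), _norm(rec.author), _norm(rec.language)))
    return {isbn: works for isbn, works in by_isbn.items() if len(works) > 1}


# Made-up records: one ISBN shared by two languages, one ISBN that only
# differs in formatting noise.
records = [
    BookRecord("9780000000002", "Easy Cooking", "A. Author", "en"),
    BookRecord("9780000000002", "Einfach Kochen", "A. Author", "de"),
    BookRecord("9780000000019", "Other Book", "B. Author", "en"),
    BookRecord("9780000000019", "other  book", "b. author", "EN"),
]
print(sorted(isbn_collisions(records)))  # ['9780000000002']
```

A collision report like this is only a triage signal: it also fires on legitimate cases such as a reissue with a changed title, so the flagged ISBNs still need a human or a smarter matcher.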

ISBNs are messy, and in the real world you’ll see a crazy number of broken/edge cases that shouldn’t happen by the letter of the standard but happen all the time in practice.

* For example, a publisher can reuse an ISBN for a completely different book.

* A 2nd edition, even a very different one, may have the same ISBN.

* A reissue of the same book could have a different ISBN.

* The same author’s textbooks for 6th and 7th grade could have the same ISBN.

* As soon as you get into translations, all bets are off.

* I already mentioned textbooks, but how about college books, where each year there was a slightly revised edition of the same book?

If you’re asking yourself “wtf?”, you’re not alone.
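The one thing you can check mechanically is whether an ISBN is well-formed by the letter of the standard. A minimal sketch, assuming input strings may contain hyphens or spaces: it validates ISBN-10 check digits (weights 10 down to 1, total divisible by 11, `X` = 10) and ISBN-13 check digits (alternating weights 1 and 3, total divisible by 10), and converts ISBN-10 to ISBN-13 by prefixing 978 and recomputing the check digit. A passing checksum only catches typos and garbage; as the cases above show, it says nothing about which book the number actually points to.

```python
from typing import Optional


def _strip(raw: str) -> str:
    # Drop the hyphens/spaces ISBNs are usually printed with; 'x' -> 'X'.
    return raw.replace("-", "").replace(" ", "").upper()


def _isbn13_check_digit(first12: str) -> int:
    # ISBN-13: weights alternate 1, 3, 1, 3, ...; the check digit brings
    # the weighted sum up to a multiple of 10.
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn10(s: str) -> bool:
    # ISBN-10: weights 10..1, weighted sum must be divisible by 11;
    # the last character may be 'X', meaning 10.
    if len(s) != 10 or not s[:9].isdecimal() or not (s[9].isdecimal() or s[9] == "X"):
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def is_valid_isbn13(s: str) -> bool:
    return len(s) == 13 and s.isdecimal() and _isbn13_check_digit(s[:12]) == int(s[12])


def normalize_isbn(raw: str) -> Optional[str]:
    """Return a canonical ISBN-13 string, or None if the checksum fails."""
    s = _strip(raw)
    if is_valid_isbn13(s):
        return s
    if is_valid_isbn10(s):
        core = "978" + s[:9]
        return core + str(_isbn13_check_digit(core))
    return None


print(normalize_isbn("0-306-40615-2"))      # 9780306406157
print(normalize_isbn("978-0-306-40615-7"))  # 9780306406157
print(normalize_isbn("0-306-40615-3"))      # None (bad check digit)
```

Normalizing everything to ISBN-13 first means the ISBN-10 and ISBN-13 forms of the same number compare equal, which removes one (easy) source of mismatches before the hard ones start.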

--

In my youth I heard horror stories about people who suddenly found duplicate GUIDs (UUIDv1) in their databases because cheap Chinese knockoff network cards were shipping with the same MAC addresses. Assume the same can happen to you with ISBNs at any time.
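For context on why cloned MACs hurt: a version-1 UUID is built from a timestamp, a clock sequence, and a 48-bit node ID that is normally the machine’s MAC address, so uniqueness across machines rests on MACs being unique. A small sketch with Python’s standard `uuid` module; the MAC value is made up. It shows that the node field is just the MAC passed in, so two machines with the same MAC are told apart only by timestamp and clock sequence.

```python
import uuid

# Made-up MAC shared by two "cloned" network cards.
CLONED_MAC = 0x00_11_22_33_44_55

a = uuid.uuid1(node=CLONED_MAC, clock_seq=0)
b = uuid.uuid1(node=CLONED_MAC, clock_seq=0)

# The last 48 bits of a v1 UUID are the node ID, i.e. the MAC passed in.
print(a.version, hex(a.node))  # 1 0x1122334455
print(a.node == b.node)        # True: only the timestamp separates a and b

# uuid4 is random and does not depend on hardware identity at all.
print(uuid.uuid4().version)    # 4
```

Within one process Python bumps the timestamp so `a` and `b` still differ; two separate machines with the same MAC and an unlucky clock sequence get no such protection.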

Ekaros | 11 days ago

I did some data collection on my cookbooks. Turns out Lidl had used the same ISBN for the same book. In entirely different languages.

galkk | 11 days ago

You feel my pain :)

Honestly, today I probably wouldn’t even try to code a complex book-matching algorithm. I’d feed all of the books’ metadata, including covers etc., to an LLM, and it would do better than what we had.

Our algorithm had tons of hard-coded special cases, and the results UI had a “needs manual review” button that launched a review workflow for cases where the confidence score was low (not a joke: the business side had a dedicated support team in India for this, because we were matching more than just books).
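A rough sketch of the confidence-threshold idea, not the actual algorithm from that project: a weighted blend of field similarities decides between auto-match, manual review, and no match. Python’s `difflib.SequenceMatcher` stands in for whatever similarity measure a real system would use, and the weights and cutoffs are hypothetical. ISBN equality is deliberately just one signal, for the reasons given earlier in the thread.

```python
from difflib import SequenceMatcher

# Hypothetical cutoffs; a real system would tune these on labeled pairs.
AUTO_MATCH = 0.90
NEEDS_REVIEW = 0.60


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1] from the standard library; crude but dependency-free.
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def match_confidence(x: dict[str, str], y: dict[str, str]) -> float:
    """Weighted blend of field similarities (weights are made up)."""
    # Same ISBN counts for something, but it is not treated as a key.
    isbn_score = 1.0 if x.get("isbn") and x.get("isbn") == y.get("isbn") else 0.0
    return (0.3 * isbn_score
            + 0.4 * similarity(x.get("title", ""), y.get("title", ""))
            + 0.3 * similarity(x.get("author", ""), y.get("author", "")))


def route(x: dict[str, str], y: dict[str, str]) -> str:
    score = match_confidence(x, y)
    if score >= AUTO_MATCH:
        return "match"
    if score >= NEEDS_REVIEW:
        return "needs manual review"  # hand off to the human review workflow
    return "no match"


# Same title and author, different ISBNs: not confident enough to auto-match.
print(route({"isbn": "1", "title": "Foo", "author": "Bar"},
            {"isbn": "2", "title": "Foo", "author": "Bar"}))  # needs manual review
```

The middle band is the point of the design: instead of forcing every pair into match/no-match, the uncertain ones get routed to people, and the cutoffs control how much work lands on them.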