

As someone who has subscribed to Bowker’s Books in Print data for the last four years, I’d take any stats based on their data with a huge grain of salt. Bowker does issue ISBNs (and the BiP data has tens of millions of them), but they do very little validation: the data is largely entered by publishers, often long after the ISBN has been issued, and to varying standards. For example, their attempt to identify overarching “works” (e.g. The Fellowship of the Ring as a literary work vs. its various editions and reprintings) across ISBNs is unusably inaccurate, even for mainstream published titles.

Also, as the article mentions, ISBNs are issued for all sorts of things most people would not consider a “book”: journals (the kind you write in, not the academic kind), coloring books, sales displays, maps, bulk lots of books for schools, box sets, reprints of Wikipedia, calendars, etc. These are not always well distinguished in their data, because it’s seemingly up to the publisher to categorize each item correctly, and some fly-by-night Wikipedia article reseller is just not going to put in accurate data.

Maybe Bowker has data they don’t include in BiP that would make their stats believable…but I kind of doubt it. LoC seems more reliable, but their corpus is (intentionally) much smaller and more focused, and generally the books libraries care about don’t 100% overlap with “all things published that most people would consider a book”, since that’s not their purpose. OpenLibrary is doing good work in this space, but it’s still kinda early and struggles with data quality. It does ultimately depend on how you define a “book”, but for my money I’d say your numbers are low, though you’re spot on that only a very small fraction of those get widely read.
