The British Library puts 1M newspaper pages online for free

324 points| aries1980 | 4 years ago |ianvisits.co.uk

53 comments

open-source-ux|4 years ago

Interesting facts about the British Library:

- It requires every physical book published in the UK to be collected by the Library (since 1662)

- It has 60 million individual newspaper editions

- In 1999, the Library earmarked 60,000 volumes of non-British newspapers for disposal because it was running out of storage space (inviting criticism)

- The newspapers were offered to overseas museums, or put up for auction. But the short notice given to museums meant many were unable to accept them (they also needed time to free up physical space)

- The American writer Nicholson Baker used his own retirement money to purchase "2000 bound volumes of American newspapers - the last remaining copies in the world - including a complete run of the Chicago Tribune from 1888 to 1958 and hundreds of editions of Joseph Pulitzer's ground-breaking colour broadsheet of the 1890s, the New York World." [1]

- The physical copies of the American newspapers were saved and became part of the American Newspaper Repository [2], a non-profit organisation which Baker founded. In 2004, the collection moved to Duke University.

- Baker went on to publish a book about the whole affair in 2001 called Double Fold: Libraries and the Assault on Paper. The Guardian published an interview with him in 2002 (below)

[1] Paper Chase: https://www.theguardian.com/education/2002/mar/22/museums.re...

[2] From an archived copy of the American Newspaper Repository website: "Research libraries everywhere, including the Library of Congress, the New York Public Library, and the Center for Research Libraries, have replaced most of their often richly illustrated sets of late 19th and 20th century newspapers with black and white microfilm."

tokai|4 years ago

> It requires every physical book published in the UK to be collected by the Library

Legal deposit is more the rule than the exception for national libraries. Many national libraries are also saving copies of the nationally relevant web.

tosser0001|4 years ago

This is great. Historical newspapers are one of the largest corpora of information that has yet to be adequately brought on line.

In the U.S. the Library of Congress has digitized a fair number, but at the state and local level it's really hit or miss. Some states such as California and New York have put quite a bit on line, but many others rely on individual towns and historical societies.

Different pay services cover various papers, but there has never been a concerted effort to digitize the staggering amount of microfilm that is out there.

jf|4 years ago

> Historical newspapers are one of the largest corpora of information that has yet to be adequately brought on line.

Not just information, but works of art as well!

A few years ago, I trained an image classifier to help me find Krazy Kat comics in newspaper archives. In the process of doing that, I came across a shocking amount of other comics and artwork. I was honestly surprised to see how many amazing illustrations and comics are just sitting in newspaper archives, waiting to be rediscovered.

reaperducer|4 years ago

I wonder what ever happened to all the newspapers that were fed into services like CompuServe in the 80's.

I dated a newspaper reporter during that era, and all of her stuff went into the online services. But her newspaper's current online archive only goes back to about 2005, even for subscribers.

jfax|4 years ago

I just came back from - actually, physically - visiting the British Library last week to do some research in the newsroom with microfilm and all.

The title of this piqued my interest, but looking at the details, the selection of papers they're adding seems kinda meh. It's mostly the sort of local papers that British Newspaper Archive always had, and it still can't compete with the horrendously proprietary Gale and ProQuest archives, which have national papers (Guardian, Observer, etc) and require physically turning up to the library to use.

I used to have a pay-as-you-go subscription to BNA, instead of the monthly pay option that I figured I wouldn't make the most of, but it quite scandalously "expired".

wombatmobile|4 years ago

Trove does this for Australian newspapers

https://trove.nla.gov.au/

aaron695|4 years ago

And because the newspapers were syndicated they also have the big stories around the world.

Like the Carrington Event & Krakatoa, or Alexander Graham Bell's firewall to stop people stealing electricity, or pirate attacks on junks off Hong Kong.

laacz|4 years ago

Latvia has its own digital library of periodicals. It was digitized in a few projects and made available to the general reader (via periodika.lv) for free, except for the last 60 or so years, which were subject to copyright. However, when the pandemic came, the digital library became available to everyone without any copyright restrictions.

jack_riminton|4 years ago

I wish more institutions 'tested the waters' with these copyright laws.

Put a newspaper up from 100 years ago and see if anyone complains, if they do; take it down. Subtract 10 years every year until someone does complain.
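A rough sketch of that schedule, for anyone who wants to see how it plays out. The function name `probe_copyright_threshold` and the `complains` callback are hypothetical stand-ins (the callback answers "did anyone object to material this old?"); the 100-year start and 10-year step are the figures from the comment above.

```python
from typing import Callable, List, Optional, Tuple


def probe_copyright_threshold(
    complains: Callable[[int], bool],
    start_age: int = 100,
    step: int = 10,
) -> Tuple[List[int], Optional[int]]:
    """Simulate the 'test the waters' schedule.

    Each year, material of the current age (in years) is put online.
    If `complains(age)` is True, that material is taken down and the
    process stops; otherwise the age is lowered by `step` for next year.

    Returns the list of ages that stayed online and the youngest age
    that drew no complaint (None if even the first attempt was contested).
    """
    published: List[int] = []
    age = start_age
    while age > 0:
        if complains(age):
            # Someone objected: take it down and stop lowering the age.
            break
        published.append(age)
        age -= step
    youngest_safe = published[-1] if published else None
    return published, youngest_safe


# Hypothetical scenario: rights holders only object to material
# younger than 70 years.
ages, safe = probe_copyright_threshold(lambda age: age < 70)
print(ages, safe)
```

In this made-up scenario the process stops at the 60-year step, leaving everything 70 years and older online.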

Angostura|4 years ago

I think you underestimate how assiduous copyright lawyers are. The automated systems would spot the post and prompt a complaint within minutes.

blackcat201|4 years ago

Does anyone know of any existing effort to convert these scanned images into a text corpus (a new OCR model would probably need to be developed for these old texts)? I think they would be more usable in text form for search and research purposes.

bnj|4 years ago

Well when Apple releases the next OS it will automatically OCR all images, so one possibility is just downloading them all on an Apple device.

AaronNewcomer|4 years ago

It already is. I do text searches all the time here and have paid for a subscription for a while now.

cube00|4 years ago

> The British Library keeps to a ‘safe date’ when determining when a newspaper can be considered to be entirely out-of-copyright, which is 140 years after the date of publication.

It's depressing that copyright has been extended so far it's now longer than any single individual's possible lifetime.

dalbasal|4 years ago

This needs to change, big time.

There is almost no cash value to an article one day later, yet we completely impoverish the public domain for its sake. Not only are creators of valuable works usually pretty distant from direct ownership anyway, there's no possible way for them to profit directly from this work. The only way a spotify-like deal works is because copyright ownership is conglomerated.

IMO, public (especially free-as-in-beer) access to newspaper archives could be pretty liberally justified on fair use grounds.

Ovah|4 years ago

The National Library of Sweden has a similar website, but for newspapers printed in Sweden. They write "Copyright protection is valid for 115 years on the day. The free material is moved forward by one day every day." I get that the likes of Disney have an interest in extending it. Maybe a middle ground could be the ability to apply for extensions of individual works instead of a universal blanket extension? This long period really hampers my ability to do research on 1700s literature - a lot of such research was done in the early 1900s. But that content, even if it's 100 years old, just isn't indexed anywhere (e.g. Google books) due to copyright even if it is already digitized. https://tidningar.kb.se
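The "moved forward by one day every day" rule amounts to a rolling cutoff date. Here is a minimal sketch of that arithmetic, assuming an issue becomes free on the exact anniversary of its publication (the quoted text does not say whether that day is inclusive). The names `rolling_cutoff` and `is_free` are made up for illustration, and the dates used are arbitrary.

```python
from datetime import date


def rolling_cutoff(today: date, term_years: int) -> date:
    """Latest publication date that is at least `term_years` old on `today`."""
    try:
        return today.replace(year=today.year - term_years)
    except ValueError:
        # `today` is 29 February and the target year is not a leap year;
        # fall back to 28 February (an assumption, not a legal rule).
        return today.replace(year=today.year - term_years, day=28)


def is_free(published: date, today: date, term_years: int = 115) -> bool:
    """True if an issue published on `published` is past the rolling cutoff."""
    return published <= rolling_cutoff(today, term_years)


today = date(2021, 8, 11)  # arbitrary example date
print(rolling_cutoff(today, 115))          # cutoff under the 115-year term
print(rolling_cutoff(today, 140))          # cutoff under a 140-year 'safe date'
print(is_free(date(1906, 8, 12), today))   # one day too young on this date
```

The same function covers the 140-year "safe date" quoted earlier in the thread by passing `term_years=140`; the only difference is how far back the window sits.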

Zenst|4 years ago

Wasn't it Disney and other media companies that pushed copyright duration towards this 140 year value? I vaguely recall that as a driving factor, though I may be wrong and it may be one of those unproven theories or meme'd news of times past.

It would be ironic if it was (a children-focused company like Disney pushing through a change in law that in the end harms children, as it limits their access for their lifetime and puts them into a copyright servitude that shows little thought for the children), as it would sure add a whole new spin to the "think of the children" sound-bite often used to push through some change in law/rules.

Still, somewhat sad. More so as we build buildings today to a standard that, by design, is not that long-lasting.

iechoz6H|4 years ago

It's even worse than that, from the failed registration page:

    View 3 pages FREE when you register to help you get started
    Explore hundreds of national, regional and local titles dating from the 1700s-2000s
    Search, save and organise your favourite topics

5faulker|4 years ago

It's almost as if they're slowly declassifying documents over time.

chris_wot|4 years ago

The irony is that they gave money to the digitisation of newspapers, but this outfit then charged money for it and cut out the general public.

Hopefully this will be like Australia’s Trove.

qxxx|4 years ago

Only 3 pages are free. To be able to view more you need to pay a monthly subscription.

dkd903|4 years ago

[deleted]