top | item 27880905

Deurbanising the Web [pdf]

496 points| ColinWright | 4 years ago |lab6.com | reply

413 comments

[+] monkeynotes|4 years ago|reply

* PDFs are self-contained and offlineable

HTML can easily be offline-able. Base64 your images or use SVG, put your CSS in the HTML page, remove all 2-way data interaction, basically reduce HTML to the same performance as PDF and allow it to be downloaded.

* PDFs are files

HTML is files

* PDFs are decentralised

This should be "PDFs can be decentralised". PDFs aren't inherently any more decentralised than any other kind of file, including HTML.

The store is the thing that becomes decentralised, not the content.

* PDFs are page-oriented

HTML can be page-oriented. Simply build your website with pagination. PDFs can also be abused to have hugely long pages. Bad UX can be encapsulated in any medium.

* PDFs used to be large (bla bla bla Javascript weighs a lot)

Nope, PDFs are still objectively larger than the equivalent HTML. PDFs don't have any dynamic interaction, rip all that out and produce the HTML of yesteryear and your HTML will be tiny in comparison to the PDF.

Edit: I'm sorry, the more I think about this the dumber I feel. The web is useful because it's 2-way. I am excited by the web because I can interact with other people. I come to hacker news to engage with thinkers, not to just read a published article from one single author. I want to read ad-hoc opinions and user submitted content. PDF web, really?

[+] LeifCarrotson|4 years ago|reply

When you find a page - inherently a document-oriented term - like an article, blog post, how-to, or project writeup that's interesting or useful, and you want to make sure it's available to you later, what do you do?

Do you save the HTML, CSS, and Javascript, and hope that it works offline? I used to use the "Save page as..." tool back in the early 2000s, but it's become less and less useful, with too many dysfunctional disappointments.

No, I cut out some junk I don't need with the Printliminator [1] bookmarklet, then I do a *print-to-PDF.* This gives me a file. I can save the file, back it up to my NAS, search for it later, keep it with other files from a project where it was useful, and otherwise hang onto it. This is so common, in fact, that it's gone from being an obscure thing you could do with a Postscript-to-PDF converter or (before the adware/Ask toolbar scandal) by installing the CutePDF virtual printer to being a built-in feature: modern OSes bundle a PDF printer, and print dialogs understand that you want to "Save as PDF". Google Docs and Office 365 editors allow downloading a document as a PDF.

I totally agree that a dynamic, interactive page or a comment section is not compatible with this model of usage. There's a lot of consumption of endless feeds, and a lot of one-time video views that also don't make sense to save as offline files. However, the web for creators, where people write articles that are worth hanging onto, has a definite place for PDFs.

[1]: http://css-tricks.github.io/The-Printliminator/

[+] camgunz|4 years ago|reply

You got nerd sniped by the HTML vs. PDF format thing and missed the entire point of TA:

> Isn’t it a good thing that we enjoy rapid progress? To the extent that we get to enjoy things like YouTube and sandspiel, yes! But to the extent that we want the internet to be a place where we can work and live and think and communicate free of malware, surveillance, dark patterns and the insidious influence of advertising, the answer is, empirically, sadly, no. The web has become ad-corrupted hand-in-hand with growth in technological capability, and the symbiotic relationship between web and browser means they feed on each others’ churn. Ads demand new sources of novelty to put themselves on, so the web expands continually, the specs grow in complexity, the browsers grow in sophistication, the barrier to entry grows ever higher, the vast cost of it all demands more ad revenue to fund it... and thus the perpetual motion machine is complete.

[+] hyperpape|4 years ago|reply

Saying HTML can be offlineable is like saying C can be provably terminating. There's a subset of programs where that's true, but it's not inherent to the form. A PDF is inherently self-contained; standard web technologies are not. When you open the page and it's a PDF, it gives you certain guarantees; when you open it and it's HTML, you have to do further investigation.

[+] tablespoon|4 years ago|reply

>> * PDFs are self-contained and offlineable

> HTML can easily be offline-able. Base64 your images or use SVG, put your CSS in the HTML page, remove all 2-way data interaction, basically reduce HTML to the same performance as PDF and allow it to be downloaded.

You're missing the point. Even a relatively computer-illiterate person can easily save a PDF to their hard drive, and it's significantly more difficult with HTML. At a minimum you're probably going to get an HTML file with a sidecar directory (or, I believe, sometimes a browser-specific archive; it's been a long time since I tried, since it works so poorly), and even that may not have the content you want due to dynamic sites.

[+] playpause|4 years ago|reply

These all seem like technical quibbles that miss the point.

[+] Frost1x|4 years ago|reply

>PDFs don't have any dynamic interaction...

Just a caveat to that statement, you can literally do interactive and dynamic 3D graphics rendering in PDFs: https://helpx.adobe.com/acrobat/using/enable-3d-content-pdf....

You can also embed JS in PDFs: https://helpx.adobe.com/acrobat/using/applying-actions-scrip...

[+] rexreed|4 years ago|reply

Also - how are PDFs exactly "discoverable"? I have petabytes of PDFs and making them easily "discoverable" for any mass use, such as analytics, search, or data analysis is a massive pain. I'd rather have them in a non-PDF format.

[+] noduerme|4 years ago|reply

Honestly, if you're going to put out a manifesto as a PDF, at least take some time "layouting" your design. The one advantage of that format is that you control the aspect ratio. Every font is permissible, everything is absolutely positioned. Using a generator to create it is cringey. Show the art that's possible. Really sell the format.

FWIW I deliver PDFs daily as an art director; not ideal, but they work in most cases. There's certainly nothing rebellious or non-commercial about them.

[+] chowderman|4 years ago|reply

I built a tool for this exact purpose[0] since the HTML specification and modern browsers have a lot of nice features for creating and reading documents compared to PDF (reflow and responsive page scaling, accessibility, easily sharable, a lot of styling options that are easy to use, ability for the user to easily modify the document or change the style, integration with existing web technologies, etc.). In general I would rather read an HTML document than a PDF document since I like to modify the styling in various ways (dark theme extensions in the browser, for example), which may be hard to do with a PDF, but it's more of a personal preference. Some people will prefer that the document adjusts to the screen size of the device (many HTML pages), and others will prefer the exact same or similar rendering regardless of the screen size (PDF).

Either way, kind of a fun idea making a website using just PDFs. Not the most practical choice, but fun nonetheless.

[0] https://github.com/chowderman/hyperfiler

[+] supperburg|4 years ago|reply

This reminds me of the guy who said Dropbox was stupid because he could set up an FTP server. It’s the exact same argument.

People understand PDFs, they are extremely common in the academic and business world as “digital paper” standalone documents. Hypothetically, anything in memory can be made into a file but in this scenario what matters is the practical goal of people actually using these files.

I think it makes sense for the web to be made up of discrete primitives not only so that the web can be browsed in an intuitive and frictionless way but also because it lends itself to being backed up and easily re-hosted.

[+] pajko|4 years ago|reply

This. Also, who else hates the huge double margins? The slow rendering? The unnatural break-up of text? Meaningless headers and footers? And the whole page-based layout? PDF is not meant for the web. Period.

[+] goodpoint|4 years ago|reply

You seem to miss the point of the post:

----

Call to action

Publish in static file formats

Date and hash your work

Stop spying on your users

----

All this cannot be GUARANTEED by HTML/pdf/epub and requires active cooperation from the author. This is bad.
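For reference, the "date and hash your work" item can be done by the author in a few lines. This is only a minimal sketch in Python using the standard library; the function name `date_and_hash` and the output line format are assumptions for illustration, not something the article prescribes.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def date_and_hash(path: Path) -> str:
    """Return one line: UTC timestamp, SHA-256 hex digest, file name."""
    # Hash the exact bytes that will be published.
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    # Record when the hash was taken, in UTC.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{stamp}  {digest}  {path.name}"
```

A reader who later downloads the file can recompute the SHA-256 and compare it with the published line. As the comment says, this still depends on the author choosing to do it.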

[+] Koshkin|4 years ago|reply

All true. Incidentally, I do not see pagination as necessary or in most cases even desirable; rather, I see it as a vestige of the printing technology, while the need for printing has shrunk dramatically over the past 20 years.

[+] marcosdumay|4 years ago|reply

> PDFs don't have any dynamic interaction

Oh, you are set for a world of surprises. Nearly every single one bad, but running our current web over PDFs is well within the specs.

[+] majkinetor|4 years ago|reply

PDF

- does not reflow, major suck

- is binary format, another major suck

So no thx, PDF is outdated tech, while HTML and friends are just abused.

[+] gunapologist99|4 years ago|reply

agreed.

and, ancient HTML can still be easily read by modern browsers, so that's not exactly a special attribute of PDF either.

[+] anigbrowl|4 years ago|reply

HTML can easily be offline-able.

Sure - if the publisher cares. From the user's standpoint, the safe assumption is that they don't. Of course PDF is No Good for many contexts, but for any sort of long-form document that is primarily meant to be read, it's so often better.

Also, if something is available in pdf, I can be moderately sure that someone else took the time to make sure it would be formatted correctly and print out OK.* If it only exists in HTML it's more of a roulette wheel experience.

* Unless some graphic designer thought 'gee this report would look so cool if the cover pages were black or some other highly saturated block of solid color.'

[+] baybal2|4 years ago|reply

HTML used to be a very nice format in the age of XHTML 1.1, very formally specified, and a tie with DOM was assured by the very strictly standardised DOM v3. And ACID3 was giving you pixel-for-pixel repeatability during rendering.

HTML+JS today... now it's effectively a standard in name only, and Chrome is the new IE6. The standard is now "what has worked in the last stable release"

Now go to http://acid3.acidtests.org/ and see how the latest stable Chrome release can't render a decade old CSS testcase.

[+] ChrisMarshallNY|4 years ago|reply

> Simply build your website with pagination.

My experience is that browsers are terrible with CSS pagination support in their display and printing directly.

The only place it seems to actually work is...saving as a PDF...

[+] grishka|4 years ago|reply

PDFs aren't really meant to be read off a screen, they're much better suited for stuff that's meant to be printed out.

And you can have a single self-contained file with a webpage, it's called a "web archive", with .mhtml extension.

[+] Tomte|4 years ago|reply

> Base64 your images […], put your CSS in the HTML page

Is there a tool that does those two things (or at least the first one) and that can be used by non-programmers (command line use is fine, a Python library would not be)?
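A minimal sketch of those two steps (Base64 the images, pull the CSS into the page), in Python with only the standard library. The function name `inline_assets` and the regex-based matching are assumptions for illustration: it only handles simple `<img src="...">` and `<link ... href="....css">` tags that point at local files, and a real tool would use a proper HTML parser.

```python
import base64
import mimetypes
import re
from pathlib import Path


def inline_assets(html: str, base_dir: Path) -> str:
    """Return html with local images and stylesheets embedded in the page."""

    def embed_image(match: re.Match) -> str:
        src = match.group(2)
        path = base_dir / src
        if src.startswith(("http:", "https:", "data:")) or not path.is_file():
            return match.group(0)  # leave remote or missing images untouched
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        data = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{data}{match.group(3)}"

    def embed_css(match: re.Match) -> str:
        path = base_dir / match.group(1)
        if not path.is_file():
            return match.group(0)  # leave remote or missing stylesheets untouched
        return f"<style>\n{path.read_text(encoding='utf-8')}\n</style>"

    # Replace each local image reference with a data: URI.
    html = re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', embed_image, html)
    # Replace each local stylesheet link with an inline <style> block.
    html = re.sub(r'<link\b[^>]*?\bhref="([^"]+\.css)"[^>]*>', embed_css, html)
    return html
```

Wrapping this in a small command-line script that reads one HTML file and writes another would make it usable without touching Python directly. Scripts, fonts, and images referenced from inside the CSS are not handled here.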

[+] 1vuio0pswjnm7|4 years ago|reply

"I come to hacker news to engage with thinkers, not just read a published article from a single author."

And how many websites today are anything like HN, in terms of relative simplicity, e.g., no images^1, 3rd party requests or ads, only a tiny bit of (gratuitous)^2 JS.

1. I do not participate in the voting scheme but I could vote from the command line if I wanted to. I use a text-only browser so the grey, fading text gimmick is irrelevant. I see all comments and treat them according to the thinking not the voting.

2. If we exclude the .ico and a .gif

There seems to be a double-standard, for lack of a better term, where many HN commenters and voters appear to work for companies that make websites with tracking and ads and various gimmicks targeted at "non-thinkers" which are nothing at all like HN. Whatever these commenters and voters see and appreciate in HN they are not working to bring it to the rest of the web. I seriously doubt they comment and vote on HN out of fear of so-called "power users" or a belief that the HN type of simplicity could become more popular and threaten their jobs that depend on surveillance, online ads and a non-thinking audience of "powerless" users. Rather, a more rational explanation might be that they see some value in a website that shows no ads and generally uses no gimmicks; that's something to think about.

"PDF web" may not make sense to many folks who have invested heavily in JS and Big Tech web browsers, but Postscript is arguably more elegant than Javascript. "Thinkers" usually like FORTH.

https://en.m.wikipedia.org/wiki/Display_PostScript

The tracking section mentions the Abe Vigoda status page.

http://www.abevigoda.com/

[+] kemitche|4 years ago|reply

PDFs are also horrible to view on mobile, as the text doesn't reflow.

[+] novok|4 years ago|reply

Sounds a lot like epub.

[+] stjohnswarts|4 years ago|reply

so because someone chooses to publish their website in an open format that they prefer "it's dumb" because they don't agree with you.

[+] petercooper|4 years ago|reply

In a sea of cynicism, I gotta say.. bravo. This genuinely put a smile on my face. It has a lot of problems, sure, but it's a creative use of the Web and would surely work for some use cases. It's certainly no worse than using Flash ever was.

It reminds me a bit of a "newsletter" I'm subscribed to called, ironically, "Not a Newsletter" (http://notanewsletter.com/). You get an email from the author each month and it just points to a Google Doc where he puts the actual content. Why's this good? The content can't set off any spam filters, he can edit the issue after it's "sent" if there are mistakes or broken links..

[+] sneak|4 years ago|reply

The content can be censored arbitrarily by google, and when you click on mobile web with the docs app installed, it logs your logged in google account identity (maybe for work?) with the view when it switches to the app.

Files have none of these problems.

[+] bmn__|4 years ago|reply

It is too early to displace HTML with PDF.

> PDFs used to be inaccessible

My eyes are not very good. I have trouble reading the font in the PDF. I am using Firefox. HTML lets me pick a font that I can read easily. I cannot do that with PDF.

> PDFs used to be unreadable on small screens, but now you can reflow them.

I am using Firefox. I cannot do that.

Realistically, how many years will I have to wait until Firefox catches up?

Over twenty years ago, I learnt Web authoring by examining the source which had a profound effect on my career. That serendipitous opportunity I had with human-readable sources will be lost to the next generation with PDF - they have to learn the technology deliberately.

[+] noduerme|4 years ago|reply

I read this entire document. If you've ever had to write a PDF-to-text parser - and God help you, I have - you will beg for Flash to come back as a web standard.

[edit] Generally though, I'm sympathetic with your point and it's kind of like why zines regained popularity in the 90s (and samizdat in the Soviet Union before that)... controlling your own publishing is a powerful idea. Anyone can do that though, without resorting to obscure formats, unless obfuscation is the point.

[+] taftster|4 years ago|reply

  $> cat file.pdf | strings

Done. /s

[+] dredmorbius|4 years ago|reply

The Poppler library's pdftotext is remarkably effective.

[+] millerm|4 years ago|reply

Yeah, 10 second load time, tiny text on a mobile device. No thanks. Sucks that people went for over-styling every site making everything painful to publish. I’d be happy with 90’s static HTML, and a few images when needed. I seek information, not “an experience”.

[+] trhoad|4 years ago|reply

I just ran your PDF through an accessibility checker and it failed magnificently. For this reason alone, suggesting people make more use of PDFs instead of well-formatted HTML is a total non-starter for me (and should be for everyone).

[+] wccrawford|4 years ago|reply

I find it quite amusing that the author is railing against HTML at least in part because it's practically impossible to build a new web browser at this point, and then moves to PDF instead.

In my time working with PDFs, I've found that generating them in ways that can be read with the most popular PDF readers is cryptic and difficult, and even parsing the ones made from the most popular creators is hard.

I would definitely not pick PDF over HTML in regards to how easy it is to implement a good reader or writer.

And there's plenty of authoring tools for HTML already, so the "ecosystem already exists for PDF" doesn't track either.

Even the complaint about churn makes no sense to me, because there's no need to upgrade your tools constantly. If you're using something that produces good HTML today, it'll produce good HTML in a decade, too.

OTOH, if you have a problem that could be automated, you're a lot more likely to be able to create that tool for HTML than PDF, and it's quite likely that someone else already has for HTML, but not PDF.

[+] cochne|4 years ago|reply

As someone who works with PDFs a lot, please don't. PDFs are awful in every case except those which require a very precise visual layout. From reading the article, I do not see a single case in which PDF is superior to vanilla HTML.

[+] duxup|4 years ago|reply

My kids' school used to send links to Google Docs for their announcements, and I hated it. I pretty much hate any system like that; it's purely extra steps on the web.

In both email, and the browser I'm already in a program that displays text and images and cool stuff. So then I'm just sent a link to someplace else that does the same thing?

So then what? Is it all just "pdf can do that too", but with extra steps...? I can print to PDF in most browsers if I want, but in this case it isn't a choice.

The idea that I might save and store the school emails or that website and somehow manage those files seems kinda self important in a way ... I don't mean that as a personal attack, just that this idea that they imagine me taking the time to do that with their content? When otherwise it could have just been an accessible web page? How many people care to do that?

If I'm visiting a website I'm almost certainly not interested in saving your content / managing it... almost never.

I'm a little lost on the whole 'page-oriented' idea too. That's just a limitation of paper, and it's a pain / disruptive more often than not. Even the 'page oriented' section is broken up by the page and some extra text at the bottom of the page that is irrelevant to the paragraph...

If folks want it, a 'save to pdf' option might be nice to add, or the user can just print to pdf...

[+] MichalSternik|4 years ago|reply

Well, what's wrong with static site (generators)?

I certainly get the argument, but using something like hugo or gatsby or jekyll when you want to avoid the "churn" also seems like a perfectly valid solution.

[+] ussrlongbow|4 years ago|reply

Very surprised to see just a few comments mentioning EPUB, which IMO is much more suitable for a document-centric approach. It's an open standard with a freely available[1] specification, and I have never had any problems with EPUBs on PCs, tablets and phones.

[1] - https://www.w3.org/publishing/epub32/epub-spec.html#sec-intr...

[+] shuntress|4 years ago|reply

Also worth pointing out, EPUBs are (or, at least, can be. I'm not sure how much flexibility is in the specifications) basically just bundled HTML.

[+] DerDangDerDang|4 years ago|reply

There’s a fixed layout version of the ePub standard too, allowing PDF quality if that’s what you’re after.

[+] mark_l_watson|4 years ago|reply

I also enjoyed the sentiment of the article. I used to blog a lot but in the last decade I have preferred more long form writing. Now I use the leanpub.com [1] service so when I write, I get generated PDF/ePub/Kindle formats, and material is readable online as HTML/CSS. For me leanpub is a way to make content free and accessible, but people can pay if they want. The relatively few people who pay for my material have a large effect on what I decide to write about in the future or which writing projects to drop.

I consume the web mostly by following a few very interesting people on social media and following their links. As an author, my goal is to keep producing interesting enough material to be worth people's time reading.

[1] https://leanpub.com/u/markwatson

[+] Ajedi32|4 years ago|reply

This is an awful idea and I love it.

As others have pointed out it's strictly worse than a static HTML site in many, many ways. At the same time though, it's a brilliant criticism of many of the worst aspects of the modern web.

This is art.

[+] aenigma|4 years ago|reply

Great article - so much depth and accuracy to this! I see a lot of discussion about the semantics of pdfs but I think those are missing the overarching theme here.

Feels like this is more about the fact that websites have become increasingly dynamic, unstable, unreliable, inconsistent, etc. - pdfs offer something like a book, static, stable, reliable and consistent.

Think about a book you can turn to a specific page no matter how many times you look at it and the print is the same, the information is the same, you can do the same action over and over again and get the same expected result.

Now imagine opening a book and you could have sworn that the chapter you wanted to reference was 11 but now it's 16 and the images are different, the examples are different, in fact the quote that you wanted to use for reference no longer exists in the book.

There's an insanity to this experience but it's exactly what the web is like - a book that is constantly changing, upended, changed - even disappearing entirely. I could have sworn I had bought that book on discrete mathematics - how could it be gone? Oh, that's right, the server managing the site is powered off - the book no longer even exists.

[+] keiferski|4 years ago|reply

I think if one designed a “crisis-proof” version of the web, it might end up being a network of PDFs. My reasoning being:

- PDFs are universally understood by most people and can be read on phones, desktops, laptops, and eBook readers.

- Once you’ve downloaded a local PDF version of the site, there is no risk that it can be changed or removed by the host.

- File size is predictable ahead of time, which is useful if your connection is limited or slow.

- PDFs are designed for printing (moreso than most sites) which may be useful in situations where electricity is in low supply.

[+] greatgib|4 years ago|reply

What is the summary?

Same as someone else, to read on mobile I have to download and open a PDF, so I just cancelled the download and ignored the link.

[+] pornel|4 years ago|reply

Maybe the author doesn't realize how difficult PDF is to work with. In PDF it's ambiguous whether any two spans of text belong together in the same sentence or paragraph. It can even be unclear where the spaces between words are. PDF also allows "optimizing" font usage in a way that makes text unreadable without OCR-ing the custom font. The messy hacks go on and on:

https://filingdb.com/b/pdf-text-extraction

OTOH it's totally possible to make a self-contained HTML page without using a JS framework of the day. It's going to be way easier to consume than a PDF.
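To make the word-spacing ambiguity concrete, here is a rough sketch in Python of the kind of guesswork a text extractor has to do. Everything in it is an assumption for illustration: the `Span` fields, the `gap_ratio` threshold, and the fixed `font_size` are made up, and real extractors use far more elaborate heuristics.

```python
from dataclasses import dataclass


@dataclass
class Span:
    text: str     # characters drawn by one text-showing operation
    x: float      # left edge of the run
    width: float  # rendered width of the run
    y: float      # baseline position


def join_spans(spans: list[Span], gap_ratio: float = 0.25, font_size: float = 10.0) -> str:
    """Rebuild lines from positioned runs, guessing where spaces belong."""
    # Group runs that share (roughly) the same baseline into one line.
    lines: dict[float, list[Span]] = {}
    for span in spans:
        lines.setdefault(round(span.y, 1), []).append(span)

    out = []
    for y in sorted(lines, reverse=True):  # PDF y grows upward, so top line first
        row = sorted(lines[y], key=lambda s: s.x)
        text = row[0].text
        for prev, cur in zip(row, row[1:]):
            # Nothing in the file says "space here"; infer it from the gap.
            gap = cur.x - (prev.x + prev.width)
            text += (" " if gap > gap_ratio * font_size else "") + cur.text
        out.append(text)
    return "\n".join(out)
```

Pick the threshold too low and kerned letters turn into separate words; pick it too high and words run together. HTML carries the spaces and paragraph boundaries in the markup, so none of this guessing is needed.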

[+] cunthorpe|4 years ago|reply

Please somebody bake an icon into the browser that turns green when websites are lightweight and content-only and make it affect Google rankings.

We don’t need PDF sites, we need incentives for publishing acceptable websites.

Side note: I’d honestly love for the government to step in and outright outlaw some obvious and intentional dark patterns (example: California unsubscribe law)

[+] 101008|4 years ago|reply

I've been doing something similar for 4 years now. I converted my niche website into a monthly magazine, that is released as a PDF (and also uploaded to Issuu).

It has its good sides and bad sides. People will download the PDF every month when there is a new issue, but you don't know if they read it, how much time they spend on it, etc. You won't appear on Google Results as you would do if you posted the articles as HTML, etc.

Based on my experience, I just keep doing it as an experiment and because I enjoy saying I run a digital magazine, but the truth is that there are no real advantages to it.

[+] schipplock|4 years ago|reply

The text is too small to read on my phone. I can zoom in, but then I have to scroll horizontally. I’m afraid this website isn’t targeting me.