Show HN: ReLaXed – High-quality PDFs using web technologies

[+] leephillips|8 years ago|reply

The examples lack hyphenation, which partly explains the too-variable interword spacing. Is this because Chrome still fails to support hyphenation, unlike, for example, Firefox?

There are other subtle defects, which make these PDFs pretty good, but not high quality.

Here is a brief discussion of some of the shortcomings of web typography, and why we still need to use TeX if we want the most beautiful and easiest to read results:

https://lwn.net/Articles/662053/

All that aside, this is impressive and should be useful to many people.

[+] drb91|8 years ago|reply

I would say that TeX's ability to place page breaks optimally is in the same category as hyphenation. I don't believe any web technology can solve this problem at the moment.

Just printing the <p> tag, with its constraints of text layout on all layers (word, line, paragraph, page) already has a lot of details you need to get right to get naturally readable text flow before adding on all the other complexities of html. For instance, if you have a single line creep onto the next page, but you could also just move the entire paragraph to the next page and subtly adjust spacing on the first page, then that is preferable so that each paragraph resides entirely on one page. This is obviously not always possible or desirable, so it turns into a search problem with many variables that can be dynamically altered in the middle of text flowing.
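As a rough illustration of the constraint described above (and not TeX's actual page builder), here is a greedy Python sketch that refuses to leave fewer than `min_lines` lines of a split paragraph on either side of a page break. The function name `paginate`, the line-count input, and the `min_lines` parameter are all assumptions made for this example. A real engine would also stretch or shrink spacing and search over alternatives instead of deciding greedily.

```python
from typing import List, Tuple


def paginate(paragraph_lines: List[int], page_height: int,
             min_lines: int = 2) -> List[List[Tuple[int, int]]]:
    """Greedy pagination sketch (hypothetical helper, not a real API).

    paragraph_lines[i] is the number of text lines in paragraph i.
    Returns pages as lists of (paragraph_index, lines_placed) chunks.
    """
    if page_height < 2 * min_lines:
        raise ValueError("page too short to honour the widow/orphan limit")
    pages: List[List[Tuple[int, int]]] = []
    page: List[Tuple[int, int]] = []
    free = page_height  # lines still available on the current page
    for index, remaining in enumerate(paragraph_lines):
        while remaining > 0:
            take = min(remaining, free)
            left_over = remaining - take
            if 0 < left_over < min_lines:
                # Too few lines would start the next page: carry more over.
                take -= min_lines - left_over
            if take < remaining and take < min_lines:
                # Too few lines would end this page: move the rest over whole.
                take = 0
            if take > 0:
                page.append((index, take))
                free -= take
                remaining -= take
            if remaining > 0 or free == 0:
                # The paragraph continues or the page is full: start a new page.
                pages.append(page)
                page, free = [], page_height
    if page:
        pages.append(page)
    return pages
```

With a 5-line page and paragraphs of 4 and 3 lines, `paginate([4, 3], 5)` moves the second paragraph to page two whole and leaves the last line of page one empty, which is exactly the kind of slack a smarter engine would try to absorb by adjusting spacing.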

My understanding of modern CSS engines is both that a) CSS itself lacks the natural primitives to even express constraints you'd find in TeX, and also b) the concerns necessary to solve page layout to this degree fall into the type of search problem that browsers tend to try to avoid when rendering.

Of course, there's an argument to be made that if people don't realize it's missing, maybe it wasn't terribly valuable to begin with. I'd imagine for most home uses it's not very useful, but the fact that you can typeset decades-old documents at a de facto professional level, for free, OR with heavily modified engines allowing more modern practices, is really quite amazing. I hope the effort that went into formalizing "readable text" doesn't get lost as people move on from TeX--it'd be great to get some of this capacity in a browser with competing implementations; TeX is a lot to learn for most people, and it's also Turing complete, which is IMHO mostly a bad sign for accessibility.

There are also projects which attempt to render HTML to TeX, but they were frankly mostly terrible the last time I looked. I honestly wonder if it's easier these days for JavaScript to attempt to render the DOM to TeX and just leverage the browser as much as possible, but I'm not familiar enough with the DOM to speculate on how well this is likely to work on unaltered pages. My guess is you only get so much for free before you have to specifically consider that output scenario, just like other types of responsive layout.

[+] zulko|8 years ago|reply

Hyphenation can be tuned via CSS but I have never been happy with it:

https://www.w3schools.com/cssref/css3_pr_word-break.asp

From what I remember LaTeX has better algorithms, both in how to distribute words between lines, and in knowing where in a word it is ok to cut.

[+] kaycebasques|8 years ago|reply

When I saw "... using web technologies" I was curious if it uses Puppeteer. package.json confirms that is indeed the case.

https://github.com/GoogleChrome/puppeteer

(I work for Chrome DevTools team, creators of Puppeteer)

[+] mrskitch|8 years ago|reply

I was wondering the same as it’s a common use-case for the project I run (browserless.io). Seems to be a big demand for sane PDF rendering and generation.

Been pretty interesting seeing webtech handle these kinds of problems

[+] nightmunnas|8 years ago|reply

Happy serendipity!

[+] Ecco|8 years ago|reply

How would that compare to, say, an HTML template + wkhtmltopdf?

Also I feel like the biggest gripe with generating (long) PDFs from HTML are things such as page numbering, orphans and widows, semantically correct word-wrapping, page margins, etc...

Chrome does a decent job but is nowhere close to what LaTeX can do.

[+] zulko|8 years ago|reply

I have gathered some comparisons in this wiki page:

https://github.com/RelaxedJS/ReLaXed/wiki/ReLaXed-vs-other-s...

It is open to contributions, so any thoughts are welcome. In a nutshell, all your points are valid. Chrome is one of the best browsers, but still behind LaTeX in some aspects. But which will evolve faster in the future?

[+] jahewson|8 years ago|reply

CSS paged media supports page numbering, widow and orphan control and page margins.

https://developer.mozilla.org/en-US/docs/Web/CSS/Paged_Media

[+] tingletech|8 years ago|reply

> Also I feel like the biggest gripe with generating (long) PDFs from HTML are things such as page numbering, orphans and widows, semantically correct word-wrapping, page margins, etc...

a blog about this issue: http://www.pagedmedia.org

[+] omnimus|8 years ago|reply

Oh, how much I would love to have a good way to generate print-quality PDFs. The real problem is not hyphenation but how lines are composed. If you want even lines in justified (block-set) type, there are probably only Adobe InDesign and LaTeX; everything else uses a "single-line composer". I don't know the algorithm, but LaTeX and InDesign are the only ones which take multiple lines into consideration. LaTeX is sort of okay, but the algorithm in InDesign is still far superior. I suspect that is some Adobe secret sauce. Pity, because you can't run InDesign on a server; you have to have it open and use "ExtendScript", their version of old ECMAScript 3 :(

[+] kccqzy|8 years ago|reply

Adobe's secret sauce is largely implemented in the microtype package in LaTeX world (character protrusion for optical margin alignment and font expansion for more even interword spacing and less hyphenation). Also the technology didn't originate at Adobe; Adobe purchased the technology from URW who developed the hz-program that was the real pioneer for those micro-typographic adjustments.

[+] lobster_johnson|8 years ago|reply

Have you looked at Prince [1]? It's commercial, but highly regarded.

The coolest project I've seen with it is OMA (Rem Koolhaas' architecture firm), which uses it to print internal, very professional-looking booklets automatically generated from data, text and photos stored in Sanity [2]. (The Sanity team also built the system to make the booklets.)

[1] https://www.princexml.com

[2] https://www.sanity.io/docs/introduction/what-the-headless

[+] rayiner|8 years ago|reply

This thesis has a good discussion of the issue. Pages 15-16 discuss Adobe’s secret sauce (though it is secret). https://www.tug.org/TUGboat/tb21-4/tb69thanh.pdf

[+] che371291|8 years ago|reply

Seems kind of neat. But for my purposes I will still use Markdown to PDF using pandoc etc.

What really upsets me... the typography still looks shit compared to LaTeX... MS Word / LibreOffice can do better. Would rather stick with plaintext again.

[+] kevin_thibedeau|8 years ago|reply

FOP is the only TeX alternative that can get close to it on basic typography in a FOSS implementation. I had a toolchain that ran ReST -> Docbook -> XSL-FOP -> PDF but the hard drive it was on bit the dust and I haven't gotten around to recreating it. Still much more pleasant than wrestling with LaTeX's rigid predetermined layouts. The result was nice and didn't have the crusty PDF LaTeX appearance.

[+] thangalin|8 years ago|reply

How does this improve upon Pandoc?

https://i.imgur.com/tMkMjNV.png

In the image, ConTeXt generates PDFs. The EA box represents HTML documentation exported from Enterprise Architect, but could be any structured document that pandoc can parse. The source repository contains various themes for the final PDF.

Using ConTeXt offers several compelling features, such as citations, cross-references, and the ability to produce EPUBs.

[+] vorpalhex|8 years ago|reply

Pandoc ultimately either has to move the HTML through another markup format such as LaTeX or use a plugin that attempts to convert HTML4 to PDF code.

This uses a full browser rendering engine that supports modern html5/css3/js by ultimately running a headless browser.

I suspect pandoc is still a great approach for a lot of cases. Running a headless browser isn't cheap, especially at scale. If your output is a simple book or an invoice, pandoc is probably the way to go. If you want to pdf websites or dump an html file with charts into a pdf, use this.

[+] baby|8 years ago|reply

I'm not sure I understand the image, but to my knowledge you can't just do any html -> pdf with pandoc.

[+] lahcim8|8 years ago|reply

Would you happen to know the origin of this diagram? I like the font and overall style.

[+] deleterofworlds|8 years ago|reply

This is neat, but perhaps switching the final typesetting engine from Chromium's PDF printer to LaTeX (via Pandoc maybe) would make it more useful. You'd get more control over things like page numbering and TOCs, plus good justification/microtypography, which is important to most publishers.

[+] nateroling|8 years ago|reply

Related, why doesn't anyone ever mention [Apache FOP](https://xmlgraphics.apache.org/fop/) for this kind of thing? I've had great success with it.

[+] ghrifter|8 years ago|reply

[deleted]

[+] killercup|8 years ago|reply

Waiting to see an example with footnotes and auto references ;)

[+] sebazzz|8 years ago|reply

That will probably be difficult because Chrome just "prints" a PDF. Therefore headers, footers, footnotes, and page numbering are difficult issues to solve.

[+] zulko|8 years ago|reply

I am thinking about it and there may be a way to do it using Pug mixins (like LaTeX macros).

Also, ReLaXed supports Markdown-it, which in turn has plug-ins for footnotes and citations, for instance. Not sure what you mean by auto-reference, but that should be possible, like in any other HTML page, shouldn't it?

[+] nmca|8 years ago|reply

This looks nice - as a regular latex user, I'd say it (latex) sits roughly between excruciating agony and the actual worst thing in the world.

So the beginnings of an alternative look great!

[+] foobaw|8 years ago|reply

How do we pronounce this? Re-LACKED?

[+] fmntf|8 years ago|reply

Probably. People are abusing the letter X (do they know that it is not pronounced as "ki" anymore in modern Greek?).

[+] felixfbecker|8 years ago|reply

All I want is a system that gets the basics right and is version-controllable in git (plain text source code). LaTeX is just ridiculously complex and inconsistent. Even after years of using it, I have to google how to do most things every time. I would prefer a simple PDF generator that uses Pug/HTML (which I know by heart) any day.

[+] agussell|8 years ago|reply

https://github.com/bramstein/typeset

This is a JavaScript implementation of the line-breaking algorithm used in TeX. It would be a nice addition for obtaining better typographic results with justified text.
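To make the difference concrete, here is a small Python sketch contrasting a first-fit ("single-line") breaker with a whole-paragraph optimiser in the spirit of TeX's approach. It is a simplification under stated assumptions: widths are counted in characters, there is no hyphenation or stretchable glue, and the cost is plain squared slack per line (last line free) in place of TeX's real badness and demerits. The function names are made up for the example and are not taken from the linked library.

```python
from typing import List


def greedy_breaks(words: List[str], width: int) -> List[List[str]]:
    """First-fit: keep adding words to the current line while they fit."""
    lines: List[List[str]] = []
    current: List[str] = []
    used = 0  # characters used on the current line, including spaces
    for word in words:
        needed = len(word) if not current else used + 1 + len(word)
        if current and needed > width:
            lines.append(current)
            current, used = [word], len(word)
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(current)
    return lines


def optimal_breaks(words: List[str], width: int) -> List[List[str]]:
    """Choose breaks minimising total squared slack over the whole paragraph."""
    if any(len(w) > width for w in words):
        raise ValueError("every word must fit on a line in this sketch")
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: where the line starting at i ends
    best[n] = 0.0
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width:
                break
            # Simplified cost: squared unused space; the last line is free.
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines
```

For the words `aaa bb cc ddddd` at width 6, the greedy version fills the first line completely and then strands `cc` alone on a line, while the optimiser accepts a looser first line (`aaa`) so that `bb cc` can share the second, giving more even lines overall.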

[+] Wehrdo|8 years ago|reply

Looks like the perfect solution to my resume. The latest iteration is in HTML/CSS, because it allowed me to easily get the exact layout I wanted (so painful in LaTeX...), but getting a consistent PDF was a challenge.

[+] Klasiaster|8 years ago|reply

I produce all my PDFs with pandoc's markdown and in-line HTML: letters, slides and papers with citations. Depending on whether I need MathJax, I use wkhtmltopdf or Chromium (JS-based hyphens with Hyphenopoly), or just http://weasyprint.org/ if no JS is involved.

This pug language seems to be a good alternative to intermixed markdown+html.

[+] buildbuildbuild|8 years ago|reply

I find Markdown most natural for writing because I do not have to worry about formatting or syntax.

Currently I deliver ~2 PDF reports per week using Ulysses or MacDown for content creation (distraction-free writing), and then typesetting everything into InDesign.

Thank you for creating this tool, I will try it next week.

The ability to render Markdown to Pug as an "Import Markdown" feature would be key for many people to adopt this.

[+] zulko|8 years ago|reply

Inline markdown and external markdown files are both supported. Have a look at the "Book" example. Every chapter is in its own Markdown file. Most of the other examples have parts where I simply switch to markdown.

I am also a big markdown user and I have found that for writing reports all day long markdown clearly wins over Pug, in particular with tools like

https://atom.io/packages/markdown-preview-enhanced

But the day you need to produce a super-nice report with a bit of custom layout, Pug/SCSS is awesome.

[+] deedubaya|8 years ago|reply

Really beautiful stuff!

I'm in the process of launching BreezyPDF.com, which can generate equally wonderful PDFs from the HTML/JS/CSS you're already using.

Here's a demo of turning a complex dashboard into a PDF: https://ruby.demo.breezypdf.com

[+] williamscales|8 years ago|reply

How does this compare with Prince?

[+] zulko|8 years ago|reply

Prince is $3800 software. Prince seems to encourage XML/HTML/CSS for writing documents, and I didn't like this. With ReLaXed I am trying to show that Pug/SCSS makes document writing much more natural.

Where Prince wins is in its support for CSS @page extensions (having pages with different margins etc.); it looks much better suited to professional publishing. There are certainly many more advantages related to typography, but I don't know them.

Link to Prince:

https://www.princexml.com

[+] SigmundA|8 years ago|reply

Looks like it uses headless Chrome for HTML to PDF conversion, so it's not going to support advanced print CSS like Prince.

Real issue is Prince is the only browser that supports full print CSS, none of the major browsers seem to care about better print output anymore.

[+] evilduck|8 years ago|reply

For one, everything here appears to be free and open source. Prince is pretty costly to run as a small endeavor and last I knew their pricing model wasn't very kind to horizontal scaling.

[+] unknown|8 years ago|reply

[deleted]

[+] jakear|8 years ago|reply

Not sure if this is related to the format of the PDF somehow, but my computer completely froze when trying to open the Alice pdf in the GitHub viewer. This is on Safari, Chrome was fine.

[+] jakear|8 years ago|reply

Upon further inspection, the GitHub renderer works fine on much larger PDFs [1], and the native Safari PDF viewer opens these PDFs fine. I suspect there is something the GitHub renderer, your PDF generator, and Safari's JS engine disagree on.

[1]: https://github.com/mynane/PDF/blob/master/Docker%20——%20从入门到...

79 comments