mikeday | 7 years ago

We use Mercury at YesLogic to write Prince, our HTML to PDF formatter! [1]

We chose it because logic/functional languages are great for tree processing, Mercury was designed for large projects, and because in 2002 there really weren't many other options around.

Its syntax and semantics are derived from Prolog, and it borrows a lot from Haskell (types, type classes). In spirit it's reminiscent of OCaml (niche, a little weird), and its support for unique modes gives it some interesting overlap with Rust, although this aspect of the language still needs more compiler support.

All in all, definitely worth checking out.

[1] https://www.princexml.com/

mpweiher|7 years ago

I've seen a bunch of these (HTML -> PDF). I've never seen a succinct answer to: "How is this different/better than taking <random web browser> and hitting "print", which at least on OS X will produce a nice PDF?"

bjz_|7 years ago

Prince is pretty powerful when it comes to print-specific stuff. We care about pagination, making tables look good across page breaks, footnotes, great justification, table of contents, non-sRGB color space handling, crop marks, etc. We also care about great accessibility annotations (often mandatory for government documents). These are things that web browsers are less concerned with - print-to-PDF is more of an afterthought, whereas for us it's our main area of focus.

coldtea|7 years ago

That's not even close to what you get with a good HTML -> PDF export, which can include anything from proper pagination, page margins and TOCs, to orphans handling and other such concerns.

noir_lord|7 years ago

The Symfony project uses PrinceXML to generate its documentation (including The Book) and it's phenomenally good.

https://symfony.com/doc/current/index.html#gsc.tab=0

Select offline, The book, 4.2 and it generates the book on the fly.

lillesvin|7 years ago

I'm not the guy you asked but I've been using PrinceXML to produce PDFs intended for customers of our client (e.g. invoices, terms and conditions, itineraries, etc.). Sure, we could just display the HTML and let either the customer or the sales agent press "Print to PDF" but it's not very user friendly—non-power users may not know that "print to PDF" is even an option—nor is it particularly practical for batch processing.

Full disclosure: If I'd had my way we would have used LaTeX templates to produce the PDFs but the previous developers had already implemented the HTML->PDF flow, so we just replaced the old, defunct service with Prince, which did a surprisingly good job, IMO.

justbaker|7 years ago

> "How is this different/better than taking <random web browser> and hitting "print"

If you have to repeat this process 2000 times, it becomes time consuming. Doing it manually doesn't scale for a single user who needs 2000 PDFs.

fermigier|7 years ago

Prince is cool, I used it 10 years ago or so. No fuss about that.

It's a bit pricey, though (at least, pricier than "free"). So we're using WeasyPrint on more recent projects.

WeasyPrint is open source and written in Python. It's much slower than Prince, but this can be mitigated by caching renderings. I wouldn't bet that it's as standards-compliant or bug-free as Prince, but it's good enough for us.

When / if our customers ask for more speed or pixel-perfect support (with the $$$ to match), we will definitely try Prince again.
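
For anyone wondering what "caching renderings" might look like in practice, here is a minimal sketch, not the setup described above. It assumes the cache can be keyed on a hash of the HTML string alone; `render_cached`, `weasyprint_render` and the cache directory layout are hypothetical names made up for illustration. The only WeasyPrint call used is `HTML(string=...).write_pdf()`, which returns the PDF as bytes when no target is given.

```python
import hashlib
from pathlib import Path
from typing import Callable


def render_cached(html: str, cache_dir: Path, render: Callable[[str], bytes]) -> bytes:
    """Return PDF bytes for `html`, calling `render` only on a cache miss."""
    # Key the cache on the exact HTML content, so any change triggers a re-render.
    key = hashlib.sha256(html.encode("utf-8")).hexdigest()
    cached = cache_dir / f"{key}.pdf"
    if cached.exists():
        return cached.read_bytes()
    pdf = render(html)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(pdf)
    return pdf


def weasyprint_render(html: str) -> bytes:
    """Render with WeasyPrint (hypothetical helper; requires the weasyprint package)."""
    # Imported lazily so the cache helper itself works without WeasyPrint installed.
    from weasyprint import HTML

    return HTML(string=html).write_pdf()
```

Usage would be something like `render_cached(page_html, Path("pdf-cache"), weasyprint_render)`. Note the assumption: if the page pulls in external stylesheets or images, hashing the HTML string alone will not notice when those change, so the key would need to cover them too.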

marmaduke|7 years ago

The Java class caught my eye. Is that a wrapper around a native lib, or do you make RPC calls to something?

HTML to PDF is something I never thought about since Firefox does it (and results usually aren’t great).

mikeday|7 years ago

It's a wrapper around the native process just to simplify passing command-line arguments. (There is also a persistent process mode for speeding up batch processing of many small documents).

The browsers don't specialise in PDF generation, and we do :)
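
To illustrate the "wrapper around the native process" idea, here is a rough sketch in Python rather than Java. It is not the actual Prince wrapper class: `prince_command` and `convert_all` are hypothetical names, and the only thing assumed about the CLI is the basic `prince INPUT -o OUTPUT` form. It starts one process per document and does not touch the persistent process mode mentioned above, since its interface isn't described in this thread.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def prince_command(html_path: str, pdf_path: str, executable: str = "prince") -> List[str]:
    """Build the argument list for one conversion: prince INPUT -o OUTPUT."""
    return [executable, html_path, "-o", pdf_path]


def convert_all(
    html_paths: Sequence[str],
    out_dir: str,
    run: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Convert each HTML file to a PDF in `out_dir`, one process per document."""
    outputs = []
    for html_path in html_paths:
        # Name the output after the input file, e.g. in/a.html -> out_dir/a.pdf.
        pdf_path = str(Path(out_dir) / (Path(html_path).stem + ".pdf"))
        # With subprocess.run, check=True raises CalledProcessError on a non-zero exit.
        run(prince_command(html_path, pdf_path), check=True)
        outputs.append(pdf_path)
    return outputs
```

The `run` parameter is only there so the loop can be exercised without Prince installed; in real use the default `subprocess.run` does the work, and the same loop covers the "2000 PDFs" case raised earlier in the thread.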