
I don't always use LaTeX, but when I do, I compile to HTML (2013)

204 points | pyjamafish | 2 years ago | peterkrautzberger.org

148 comments


DominikPeters|2 years ago

Another LaTeX-to-HTML tool is lwarp (https://github.com/bdtc/lwarp), which starts from the idea that only one program can parse LaTeX: the LaTeX compiler itself. Implementing a new parser is almost futile. So instead, the lwarp package redefines all the macros to output HTML, something like \renewcommand{\textbf}[1]{<b>#1</b>}. This way, compiling LaTeX gives you a PDF whose text is HTML code, so you can extract the plain text from it and you have an HTML file. The advantage is that it can easily deal with custom macros etc., because these are natively resolved by the LaTeX compiler.

I use lwarp to make https://tikz.dev/, an HTML version of the TikZ manual, which is probably one of the most complicated LaTeX documents in existence.

magnio|2 years ago

You are the author of tikz.dev? I have always thought it was made by the tikz author. Mad props to you, the site is very functional and helpful to me. With it, using tikz feels a bit less like a chore.

acidburnNSA|2 years ago

Sphinx and reStructuredText are, IMHO, underrated powerhouses of document building. With extensions, you can hook them up to Zotero (or whatever)-managed bibtex files. You can render to beautiful HTML files, and you get LaTeX PDFs and EPUBs for free. First-class LaTeX math support, plenty of integrations with things like mermaid and graphviz, and the ability to build super-powerful custom directives to do basically anything. And way simpler/easier than pure LaTeX.

Heck you can even integrate a full-on requirements management system in them using sphinx-needs https://sphinx-needs.readthedocs.io/en/latest/
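As a rough illustration of the setup described above, here is a minimal Sphinx `conf.py` sketch (conf.py is plain Python). It assumes the third-party `sphinxcontrib-bibtex` and `sphinxcontrib-mermaid` packages are installed; the project name and `references.bib` are placeholders, and sphinx-needs is left out.

```python
# conf.py: a minimal sketch, assuming sphinxcontrib-bibtex and
# sphinxcontrib-mermaid are installed alongside Sphinx.
project = "My Report"  # placeholder project name

extensions = [
    "sphinx.ext.mathjax",     # LaTeX-style math rendered in HTML output
    "sphinx.ext.graphviz",    # graphviz / digraph directives
    "sphinxcontrib.bibtex",   # :cite: roles backed by .bib files
    "sphinxcontrib.mermaid",  # mermaid directive
]

# sphinxcontrib-bibtex 2.x requires the .bib files to be listed up front;
# "references.bib" stands in for e.g. a Zotero export.
bibtex_bibfiles = ["references.bib"]

html_theme = "alabaster"  # Sphinx's default HTML theme
```

With this in place, `sphinx-build -b html . _build/html`, `sphinx-build -b latex . _build/latex` and `sphinx-build -b epub . _build/epub` build HTML, LaTeX (for PDF) and EPUB output from the same sources.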

ReleaseCandidat|2 years ago

It is too complex compared to Markdown and hasn't got enough features to be comparable to Latex. And I still (almost) use the same Latex templates that I used at university, 25 years ago.

mr_mitm|2 years ago

One of the selling points of PDF is that it is a single self-contained file. I found this lacking in Sphinx and wrote an extension for it to zip and bundle the assets into a single HTML file: https://github.com/AdrianVollmer/Zundler

Also works with HTML documents produced in other ways.
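Zundler zips and bundles the assets; a much simpler technique aimed at the same goal (one self-contained HTML file) is to inline local images as base64 `data:` URIs. The sketch below is not Zundler's implementation: it only handles `<img src="...">` with double quotes, and `inline_images` is a hypothetical name.

```python
import base64
import mimetypes
import re
from pathlib import Path


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> references with base64 data: URIs.

    Simplified: only double-quoted src attributes are handled, and
    remote (http/https) or already-inlined (data:) sources are left alone.
    """

    def replace(match: re.Match) -> str:
        src = match.group(2)
        if src.startswith(("http://", "https://", "data:")):
            return match.group(0)
        path = base_dir / src
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f'{match.group(1)}data:{mime};base64,{payload}"'

    # group(1): everything up to and including src=" ; group(2): the path
    return re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)"', replace, html)
```

Inlining bloats the file by roughly a third per asset (base64 overhead), which is one reason a zip-based bundle can be preferable for larger sites.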

wodenokoto|2 years ago

I write a fair number of reports professionally, and I use Word.

Getting data from my Python analysis into the reports is tedious at best, and updating numbers at the last minute is hair-pullingly frustrating.

But because of the good WYSIWYG, I can cheat on my adjustments when I need a graph to go “just there”, and I can edit my paragraph wording so that I don’t get an almost completely blank page between sections, etc., which is important for a good-looking report, IMHO.

How do you go about that with rst? I’d love to write a template rst file that can be fed from my Excel sheets and Python scripts, but how do I go about final layout adjustments?
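On the data-feeding half of the question only (it does not address final layout tweaks), a minimal sketch: fill a reStructuredText template from Python with `string.Template`, including an rst `list-table`. The report text, figure names and function names are hypothetical.

```python
from string import Template

# Hypothetical report skeleton; $-placeholders are filled from analysis results.
REPORT_TEMPLATE = Template("""\
Quarterly summary
=================

Revenue grew by $growth% to $revenue.

.. list-table:: Key figures
   :header-rows: 1

$rows
""")


def render_rows(figures: dict[str, float]) -> str:
    """Render a dict as list-table rows, starting with a header row."""
    lines = ["   * - Metric", "     - Value"]
    for name, value in figures.items():
        lines.append(f"   * - {name}")
        lines.append(f"     - {value:,.2f}")
    return "\n".join(lines)


def render_report(growth: float, revenue: float, figures: dict[str, float]) -> str:
    """Fill the template; numbers are formatted here, not in the template."""
    return REPORT_TEMPLATE.substitute(
        growth=f"{growth:.1f}",
        revenue=f"{revenue:,.0f}",
        rows=render_rows(figures),
    )
```

To pull figures from a spreadsheet, something like `pandas.read_excel` could produce the dict (an assumption; any loader works), and the rendered string can be written to an `.rst` file that Sphinx or docutils then builds, so last-minute number changes only require a rebuild.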

anta40|2 years ago

I guess latex is still unbeatable for writing complex math expressions. These days, when I don't need that, I'm happy with AsciiDoc.

DrSantow|2 years ago

I agree! I've also been using it for a personal (academic) website. It works like a charm. It's easy to render any equation, and it's fast (because it's not bloated).

fireflash38|2 years ago

Sphinx/rst are a nice middle ground between the simplicity of Markdown and the complexity of LaTeX. I used it to generate a lot of HTML docs for test reports. I did try PDF generation via LaTeX and pdflatex for a bit, but stopped once the PDF grew past several thousand pages.

And it's really tweakable, especially with HTML output, where you can provide your own templates or add your own CSS/scripts, even manual tags.

mgaunard|2 years ago

I forced myself to use it recently and mostly found it to be both limited (you cannot have part of a link in bold or italics) and inconvenient (each line of inline code must be indented).

zilti|2 years ago

I simply settled for Texinfo. It has great features exactly for tech documentation.

riperoni|2 years ago

This article really doesn't get what LaTeX does. Of course it is overkill to have 5 lines of text rendered with LaTeX into a PDF. But the point of LaTeX is exactly to set the typesetting of an output document in stone. PDF is meant to do that and HTML cannot do that. A PDF conserves everything and that is precisely the point to have a set layout for printing or displaying on different devices.

Yes, there should be easy ways to display math on the web. No, this doesn't mean that LaTeX is obsolete.

Besides, what about references, both external and internal? Probably needs more "modern" tooling.

geon|2 years ago

> to have a set layout for printing or displaying on different devices.

That’s a horrible way to go about it. Already in the 90s it was clear that varying display sizes were a problem, and it has gotten orders of magnitude worse since then.

The concept of a single set layout that is suitable for everyone is utterly absurd.

pyjamafish|2 years ago

So, I originally posted this last year. When I posted it, I was using tectonic as my LaTeX compiler, and since it didn't support HTML output yet, I didn't actually try the article's suggestion.

Today, when I saw that I got an invitation to repost this article from the mods, I thought I'd take the time to try it out.

The two commands that the article suggests can be combined into one:

    latexmlpost --dest=mydoc.html --format=html5 <(latexml mydoc.tex)

I did a comparison[1] of pdflatex and latexml using some old assignments, and it looks like compiling to HTML isn't fully there yet: the spacing was off in some places, and manual line breaks didn't work. But, I remain hopeful. If this gets polished, viewing LaTeX documents on phones would be much nicer.

[1]: https://imgur.com/a/yyyXWL8
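For readers who prefer the two steps spelled out, here is a minimal Python sketch that builds and runs them separately, writing the intermediate XML to a real file instead of using the `<(latexml mydoc.tex)` process substitution. It assumes `latexml` and `latexmlpost` are on the PATH and that `--dest` names the output file for both; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def latexml_commands(tex_file: str) -> list[list[str]]:
    """Return the two LaTeXML steps as explicit argv lists.

    Step 1 converts TeX to LaTeXML's intermediate XML; step 2 post-processes
    that XML into HTML5. Output files are named after the input's stem.
    """
    stem = Path(tex_file).stem
    xml_file = f"{stem}.xml"
    return [
        ["latexml", f"--dest={xml_file}", tex_file],
        ["latexmlpost", f"--dest={stem}.html", "--format=html5", xml_file],
    ]


def build(tex_file: str) -> None:
    """Run both steps in order; raises CalledProcessError if either fails."""
    for argv in latexml_commands(tex_file):
        subprocess.run(argv, check=True)
```

Keeping the XML on disk costs a temporary file but makes it easy to rerun only the post-processing step while tweaking output options.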

marknazzaro|2 years ago

There's some good news... arXiv just adopted LaTeXML for in-house HTML conversions of its papers. They allow users to submit bug reports and have collected over 700 so far.

LaTeXML is maintained by a team at NIST, and they are actively responding to the bug reports on github issues.

The LaTeX team headed by Frank Mittelbach is also working to add more structural information to the output of LaTeX, which will make compiling to HTML much easier.

thewakalix|2 years ago

What's the advantage of that subshell redirection over a simple pipe?

PrimeMcFly|2 years ago

> Today, when I saw that I got an invitation to repost this article from the mods

The mods personally invited you to repost a year later?

mbid|2 years ago

For me, the main problem with most tools that render to HTML was that they don't support all math typesetting libraries that latex supports. I used to work with category theory, where it's common to use the tikz-cd library to typeset commutative diagrams. tikz-cd is based on tikz, which is usually not supported for HTML output.

But apart from math typesetting, my latex documents were usually very simple: they just used sections, paragraphs, some theorem environments and references to those, perhaps similar to what the Stacks project uses [3]. Simple latex such as this corresponds relatively directly to HTML (except for the math formulas, of course). But many latex-to-html tools try to implement a full TeX engine, which I believe means that they lower the high-level constructs to something more low-level (or that's at least my understanding). This results in very complicated HTML documents from even simple latex input documents.

So what I needed was a tool that can (1) render all math that pdflatex can render, but that apart from math only needs to (2) support a very limited set of other latex features. In a hacky way, (1) can be accomplished by simply using pdflatex to render each formula of a latex document in isolation to a separate pdf, then converting this pdf to svg, and then including this svg in the output HTML in the appropriate position. And (2) is simply a matter of parsing this limited subset of latex. I've prototyped a tool like that here [1]. An example output can be found here [2].

Of course, SVGs are not exactly great for accessibility. But my understanding is that many blind mathematicians are very good at reading latex source code, so perhaps an SVG with alt text set to the latex source for that image is already pretty good.
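A minimal sketch of the per-formula idea described above, with alt text set to the LaTeX source: it pulls inline `$...$` math out of simple text, wraps each formula in its own `standalone` document for pdflatex, and swaps in an `<img>` tag. This is not the code of the linked tool [1]; the function names and `formula-N.svg` naming are hypothetical, and the compile/convert steps (e.g. `pdflatex` followed by a PDF-to-SVG converter such as `pdf2svg`) are assumed to run separately.

```python
import html
import re

# Inline math only ($...$), skipping escaped \$; display math is out of scope.
INLINE_MATH = re.compile(r"(?<!\\)\$([^$]+)\$")


def standalone_source(formula: str) -> str:
    """Wrap one formula in a minimal document pdflatex can compile alone."""
    return (
        "\\documentclass[preview]{standalone}\n"
        "\\usepackage{amsmath}\n"
        "\\begin{document}\n"
        f"${formula}$\n"
        "\\end{document}\n"
    )


def replace_math_with_images(text: str) -> tuple[str, list[str]]:
    """Swap each inline formula for an <img> pointing at a numbered SVG.

    Returns the rewritten text plus the standalone sources to compile.
    The alt text is the original LaTeX, so screen readers get the source.
    """
    sources: list[str] = []

    def replace(match: re.Match) -> str:
        formula = match.group(1)
        sources.append(standalone_source(formula))
        name = f"formula-{len(sources)}.svg"
        alt = html.escape(f"${formula}$", quote=True)
        return f'<img class="math" src="{name}" alt="{alt}">'

    return INLINE_MATH.sub(replace, text), sources
```

Compiling each formula separately is slow for long documents, but it guarantees that anything pdflatex accepts (tikz-cd included) renders identically to the PDF.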

[1] https://github.com/mbid/latex-to-html

[2] https://www.mbid.me/lcc-model/

[3] https://stacks.math.columbia.edu/

ykonstant|2 years ago

Tangentially, for me the stacks project is the gold standard of mathematical typography on the web. Look at this beauty: https://stacks.math.columbia.edu/tag/074J

Also check the diagrams: https://stacks.math.columbia.edu/tag/001U

If anyone can explain to me, a complete noob regarding html, how they achieve this result with html, css and whichever latex engine they use, I would be grateful. I want to make a personal webpage in this style.

datadeft|2 years ago

Have you seen Typst? I have moved over from LaTeX to Typst, and most if not all of your use cases are covered.

https://typst.app/

bmacho|2 years ago

I feel ambivalent about LaTeX.

I don't like the language, the ecosystem is too big, complicated and breaks, but the end result is hard to do any other way.

This applies to both the equations part and the text reflow part (I think of them as separate things, but they usually go together).

It should be possible to write text in HTML or markdown, and write the equations in latex or asciimath, and turn it into a beautiful/article style pdf, but sadly it is not.

Although CSS (colored and rounded boxes and such) + MathJax-SVG also can look nice.

ants_everywhere|2 years ago

Document formatting seems like one of those problems where 80% or so of the problem space is simple and the remaining 20% is an unfathomable pit of nightmares.

There are so many different ways people could want characters printed on a sheet of virtual paper that the problem is virtually unconstrained in its difficulty.

TeX was a major theoretical advance, and LaTeX is a nice enough UI layer on TeX that has gotten significant traction. But even outside of TeX, it feels like even software like MS Word is impossibly complex and clunky.

You can make something nicer by dramatically simplifying or cutting the feature set. I think that's probably how Google Docs has a pretty simple interface. But I'm not convinced there's a real replacement for the incumbents that simply tries to improve UI without having a deep technical insight about document layout the way Knuth had with TeX.

da_chicken|2 years ago

Every time I encounter LaTeX, I think of something I heard: "You shouldn't need a build environment for a word processor." I can't get away from that sentiment. Almost nobody I've seen using LaTeX has actually been using it for typesetting. Usually they're using a typesetter for word processing.

Sometimes it feels like they're only using LaTeX because they "learned it in college." You ever notice that? So many people in LaTeX threads say they learned it in college, or they've been using the same setting since college, or whatever. People learn LaTeX to make college papers look nice, and then they never need to configure it again? Isn't that strange?

The worst part, though, is that people complain if you call it latex. Which I think says quite a lot about its userbase.

loxdalen|2 years ago

I believe I have used pandoc to convert markdown to PDF. Maybe this is something you could try?

bowsamic|2 years ago

Using REVTeX I honestly have no issues with LaTeX, especially if I just stick to Overleaf

j2kun|2 years ago

The recommendation to use Markdown+MathJax falls short when you want to write longer documents with numbered sections, subsections, and theorem/definition/figure etc. tracking and referencing.

I'm sure with Sphinx and reStructuredText you can get that large-scale document tracking stuff, but with LaTeX it just works for the most part and you don't need to juggle a bunch of different side-projects and extensions. Plus you get things like automatic index generation (for a physical book).
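To make the "it just works" part concrete: LaTeX resolves `\label`/`\ref` by recording numbers on one run (in the `.aux` file) and substituting them on the next, which is why forward references need a second compile. Below is a toy Python sketch of that two-pass idea over a LaTeX-like string, assuming one shared counter for theorems and definitions; it is not how LaTeX itself is implemented, and `resolve_refs` is a hypothetical name.

```python
import re

# \begin{theorem} or \begin{definition}, optionally followed by \label{key}.
ENV = re.compile(r"\\begin\{(?:theorem|definition)\}(?:\s*\\label\{([^}]+)\})?")
REF = re.compile(r"\\ref\{([^}]+)\}")


def resolve_refs(source: str) -> str:
    """Number theorem-like environments, then substitute \\ref{key}.

    Pass 1 walks the whole document and assigns numbers from one shared
    counter (labelled or not), so forward references work.
    Pass 2 replaces each \\ref{key}; unknown keys become "??", as in LaTeX.
    """
    numbers: dict[str, int] = {}
    for count, match in enumerate(ENV.finditer(source), start=1):
        if match.group(1):
            numbers[match.group(1)] = count
    return REF.sub(lambda m: str(numbers.get(m.group(1), "??")), source)
```

Everything a Markdown toolchain needs for this (counters, labels, a second pass) has to come from extensions or filters, which is the juggling the comment describes.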

phiresky|2 years ago

Markdown actually works great for larger documents when you use it with pandoc [1]. That way you get HTML output and PDF output via Latex, without the HTML being a second class citizen.

I wrote my thesis (50 pages) and multiple published papers this way. Maybe it seems janky, but honestly my experience with Latex and its 10 incompatible compilers and thousands of semi-incompatible packages has been much worse.

I also don't understand why (academic) publishing is so PDF focused. It's a horrible format to read on screens (think multi-column PDFs, and scrolling / jumping up and down to find references), and who actually prints stuff anymore?

The thing I love most about Pandoc is that my notes can just slowly turn into a fully fledged document. Take bullet points: the syntax in Latex is far too verbose to make taking notes with it comfortable.

It's also much easier to extend, I wrote a simple tool that automatically converts URLs into full and correctly formatted citations, so I don't even need a citation manager to get the same results:

    The GAN was first introduced in [@gan](https://papers.nips.cc/paper/5423-generative-adversarial-nets).

Turns into https://github.com/phiresky/pandoc-url2cite/blob/master/exam...
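A minimal sketch of the first step such a tool might take, assuming pandoc's `[@key]` citation syntax: rewrite `[@key](url)` links into plain citations and collect a key-to-URL map. This is not pandoc-url2cite's implementation; fetching metadata for each URL and writing the bibliography are left out, and `extract_citations` is a hypothetical name.

```python
import re

# Matches citation links of the form [@key](https://...).
CITE_LINK = re.compile(r"\[@([\w:.-]+)\]\((https?://[^)\s]+)\)")


def extract_citations(markdown: str) -> tuple[str, dict[str, str]]:
    """Turn [@key](url) links into plain [@key] citations.

    Returns the rewritten markdown and a key -> URL map; a full tool would
    then resolve each URL to bibliographic metadata.
    """
    urls: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        key, url = match.groups()
        urls.setdefault(key, url)  # first URL given for a key wins
        return f"[@{key}]"

    return CITE_LINK.sub(replace, markdown), urls
```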

Another great project with similar structure is Manubot [3], though the PDFs there are not generated by LaTeX.

[1]: https://pandoc.org/

[2]: https://github.com/phiresky/pandoc-url2cite

[3]: https://manubot.org/

bigpeopleareold|2 years ago

I searched for a comment that supports the fact that LaTeX shines in certain areas.

My memory of LaTeX has weakened over the years, since I am not writing long texts with lots of figures and such, but I know it's more than this statement in the article lets on: "Something that is more modern than learning a hundred bits of print typesetting that your student will never, ever need?"

What, in the end, is 'modern'? Is it because there is less syntax in Markdown to remember, and the modern is syntax-averse? :D Aren't there editors for these in the first place, to avoid the daily grind of remembering syntax?

bradrn|2 years ago

I honestly don’t see the point of using LaTeX if you’re generating HTML. The great strength of LaTeX, in my view, is the precise control it provides over typography and formatting. As such, it works best with an output format which can faithfully render these documents — such as PDF. For an output format like HTML, which encourages reflowability over faithful rendering, I’d much prefer to use an ‘easier’ document format like Markdown or reStructuredText.

golol|2 years ago

Exactly, there is a triangle of tradeoffs here: prettiness vs. ease vs. responsiveness. You can only have two of them. Pretty and easy is Latex. The reason people call CSS a nightmare is that responsiveness fundamentally makes it much more difficult to make a document pretty. So HTML+CSS gives you pretty + responsive or easy + responsive. That's not the same functionality as a PDF for a fixed scientific document.

seeknotfind|2 years ago

I spent a few weeks last year doing the opposite, HTML to LaTeX, in order to print and nicely typeset top HN articles, so I'd have a nicely printed booklet each morning. I think creating hard copies of web content for offline reading holds a lot of promise, but the internet is a beast.

PrimeMcFly|2 years ago

> so I'd have a nicely printed booklet each morning.

Why? If you're just printing to read on the train or whatever, wouldn't you just discard after reading?

AzuraIsCool|2 years ago

Interesting, I have done exactly that too! I have it sent to my laser printer to print out just before I wake up.

kkfx|2 years ago

I like LaTeX for the quality of its PDF output. I use it for docs that need to be "printed" (not necessarily on paper, but still in a fixed typographical form for potentially long-term archiving), not for anything else. And yes, I DO HATE PDFs because of their design, but PostScript is not very common these days and, while a bit better in certain aspects, is not much better in general; DVI is even worse.

For my notes, for anything that need to be "live" I use org-mode because:

- it's a far more natural markup than anything else

- it's rendered INLINE, with no need to jump between a source form and a rendered one, a thing MD lovers fail to understand

- it's an outlining tool, another thing most other tools fail miserably to understand

- it easily incorporates live code in other languages (org-babel), something no modern REPL-like doc UI such as Jupyter can do

Long story short, I prefer the best tool for the job. HTML might be the lowest-common-denominator tool, making it the worst in essentially all cases. XML, and SGML in general, are good for machine usage, but they are very impractical in current usage: just look at the current crappy state of e-invoicing with XML/XAdES docs + XSL to render them as PDF for humans in the end. They are a good tool in some cases, but again not the best for any specific case.

bovermyer|2 years ago

When I use LaTeX, it's because I want a way to store book manuscripts and their layout as code in version control. I never use any of the math layout. I get the impression that my use case is rather in the minority.

I would use CSS+HTML for layout, but what do I do about automatically generating tables of contents and indexes?

I guess I could write my own tool for that. Hmm.
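For the table-of-contents half (indexes are a harder problem and are not covered), a DIY tool could start as small as this sketch: collect `<h2>`/`<h3>` elements that carry an `id` using Python's standard `html.parser`, then emit a list of links. The class and function names and the `toc-h2`/`toc-h3` CSS hooks are hypothetical.

```python
from __future__ import annotations

from html import escape
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every <h2>/<h3> that carries an id."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[tuple[int, str, str]] = []
        self._current: tuple[int, str] | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        anchor = dict(attrs).get("id")
        if tag in ("h2", "h3") and anchor:
            self._current = (int(tag[1]), anchor)
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)  # includes text inside nested tags like <em>

    def handle_endtag(self, tag):
        if self._current is not None and tag == f"h{self._current[0]}":
            level, anchor = self._current
            self.headings.append((level, anchor, "".join(self._text).strip()))
            self._current = None


def build_toc(html_text: str) -> str:
    """Return a flat <ul> of links; the class encodes the heading level for CSS."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    items = [
        f'  <li class="toc-h{level}"><a href="#{escape(anchor)}">{escape(text)}</a></li>'
        for level, anchor, text in collector.headings
    ]
    return "\n".join(["<ul>", *items, "</ul>"])
```

A flat list with level classes keeps the generator trivial; indentation (and print styling via CSS paged media) is then a stylesheet concern rather than a structural one.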

gglitch|2 years ago

Looks like Pandoc can generate tables of contents for HTML, though I don't see anything about indexes. Roff and friends, and Texinfo, can do both, though with their own tradeoffs.

https://pandoc.org/MANUAL.html

generationP|2 years ago

This is from 2013, so the bet that "nobody will want to read [PDFs] in 5 years" can be considered lost. If anything, PDF has become the lingua franca of the academic web, crowding out even DjVu at the very thing DjVu was made for and PDF was not.

I have not been following the development of mathjax, pandoc, etc. carefully, so I'm wondering: Have the main issues been solved? By these I mean

(1) support for most popular packages,

(2) automatically breaking long outputs into small pages that don't overheat my laptop or crash my browser and yet reference each other properly,

(3) printability (without lines broken in half, senseless overflows and the likes) or cross-compilability with a regular PDF compiler?

I know the ar5iv project is getting closer and closer to (1) and (3), but is that available to regular users?

roel_v|2 years ago

But don't worry, 2024 is going to be the Year Of Math On The Web.

(I've been trying to do 'math on the web' (ish) since 2002, and it's always sucked in some way; and all that time, images/PDF have Just Worked(TM). The emphasis in the OP on how much you'll have to report/chip in/fix is telling...)

bowsamic|2 years ago

The problem with DjVu is that its viewers suck, especially on macOS, which is very popular in modern academia

bloaf|2 years ago

And it is a shame. The current AI explosion is the poorer for it, due to the greater difficulty of extracting the text from PDFs.

adastra22|2 years ago

mathjax has come tremendously far, but not on the problems you mention :(

Retr0id|2 years ago

> don't just produce PDFs that nobody can read on small screens

I was thinking about this recently. If you get pedantic enough* about it, the typesetting quality you can get from a LaTeX+PDF is strictly better than what can be achieved using (sane) HTML.

I wanted to blog in LaTeX, and to solve the screen-size issue I thought I'd pre-bake to a wide range of page geometries, and then serve up an appropriate one to the client using pdf.js.

Fortunately for everyone, I decided against it in the end and continued blogging in markdown+html (with mathml support)

*well beyond what most readers would possibly care about
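The pre-baking idea can be sketched as a small lookup: render the PDF at a handful of page widths, then serve the widest one that still fits the reader's viewport. The widths and function name are hypothetical, and how the viewport width reaches the server (for example a query parameter set by a script) is left open.

```python
from bisect import bisect_right

# Hypothetical page widths (CSS pixels) the PDF was pre-rendered at,
# e.g. one LaTeX run per \geometry{paperwidth=...} setting. Must be sorted.
PREBAKED_WIDTHS = (360, 480, 768, 1024, 1440)


def pick_geometry(viewport_width: int, widths: tuple[int, ...] = PREBAKED_WIDTHS) -> int:
    """Choose the widest pre-baked page that still fits the viewport.

    Falls back to the narrowest page when the viewport is smaller than
    all of them (the reader then has to zoom or scroll sideways).
    """
    index = bisect_right(widths, viewport_width)
    return widths[max(index - 1, 0)]
```

The cost is one full LaTeX build per width, which is part of why the comment's author calls the idea pedantic.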

mattl|2 years ago

I write markdown, use pandoc to make LaTeX and from that a PDF for a printed thing and just supply markdown for non-printed stuff.

davidthewatson|2 years ago

I was surprised recently, when I changed up my HTML and PDF toolstack, not just by how good pandoc was, but by the entire ecosystem that had emerged around it, including pandocomatic and pandoc-resume.

chaxor|2 years ago

Typst is pretty close to markdown for simple things, and scales nicely to hard things. So you don't really need to worry about the markdown-pandoc shuffle anymore.

artagnon|2 years ago

LaTeXML has come a long way. Even arXiv uses LaTeXML internally to offer HTML5 versions as of late 2023. It does have limitations: it doesn't support all packages, and it doesn't produce a high-quality translation in all cases.

If you don't need to convert entire LaTeX documents, MathJax and KaTeX are really good at rendering a subset of LaTeX as MathML/SVG. I run MathJax + an xypic extension for commutative diagrams with server-side rendering on my website, and it works great in practice.

IAmLiterallyAB|2 years ago

Tangentially related: does anyone have experience with AsciiDoc? I've used reStructuredText before, but AsciiDoc is tempting; it looks cleaner.

pbronez|2 years ago

Asciidoc has potential. Last time I dug into it the ecosystem was lacking, but there were glimmers of a reboot. I hope that pulls through because it’s a great format.

Edit: yeah it’s managed through the Eclipse Foundation now. They’re slowly working towards a formal spec, haven’t hit 1.0 yet.

Details here https://gitlab.eclipse.org/eclipse/asciidoc-lang/asciidoc-la...

lkuty|2 years ago

You have also AsciiDoctor ( https://asciidoctor.org/ ) which is alive and well. I am using it for technical CS documentation internally, but only for single page documents. I did not try to deploy their whole multi-document setup called Antora ( https://antora.org/ ).

throwaway290|2 years ago

I had experience with AsciiDoc and personally not a fan. IMO it has weird features like totally illegible compact table syntax (seriously, that stuff is worse than XML) and the spec looks abandoned. But I keep seeing it being used, I guess it appeals to people who want something more flexible than Markdown (and who like Ruby, or they would go with RST)

jiehong|2 years ago

Using it for internal docs, but we don’t generate pdfs so I can’t comment on that part.

I personally find asciidoc easier to write manually.

dwheeler|2 years ago

One solution is to embed alternatives within the PDF itself. LibreOffice can embed the original editable source in ODF format inside a PDF. You could also embed an EPUB. That would give you a single file that could be processed in many useful ways.

bluenose69|2 years ago

Although I use markdown (and similar) for memos, I turn to latex for longer and more complex material.

A lot of this is just because latex has been a standard for publishers in my field since I started (approximately a thousand years ago).

When writing for journals, latex saves a lot of work. Publishers provide latex templates that ensure that articles have a prescribed format and scope of content. Being able to see a good facsimile of the final published form is quite handy for authors. Oh, this paragraph is going on for over a column -- I'll break it up. That sort of thing.

This still applies when writing for longer things, such as textbooks and course notes, but another factor (for me, the larger one) is that latex (more properly, the tex upon which latex sits) is a programming language. Macros can be written to do lots of things that would be a pain if done manually, and once a macro is written, altering an entire text is easy. I did this in a book I wrote a while back, writing macros to colourize text that would be indexed, add margin notes for things I wanted to return to, categorize paragraphs by function, and so on. I could turn all those macros on and off by uncommenting a line. This is really quite helpful in writing something that takes months to years to complete. Frankly, I use this macro approach even in memos written in markdown. Inside almost all of my markdown documents, there are latex commands.

As for reading things on a small screen, which I guess is really the topic here, I must admit that this is something I rarely do within my own field. Sure, I do it if reading one of those 10-km overview articles in Science or Nature. But when it comes to my own field, things are technical and demand long periods of study. I don't try to read this stuff on the bus or in a coffee queue. I need time (hours or days) and I need to be able to take notes.

Another reason I prefer PDF is that it is fixed. My brain puts information into a sort of spatial framework. Somehow, if I look at a paper I first read 40 years ago, I still know what information is on which page, which of the diagrams summarizes the whole thing, and which of the citations is key. This may be a flaw in my brain functioning, but I just don't find these sorts of memories forming when I read content that has a plastic format, with paragraph breaks changing if I adjust my view. But maybe this is just my age talking, I suppose.

setgree|2 years ago

I learned LaTeX in grad school in 2013, starting with LyX. Yesterday, I compiled an Rmarkdown document into an APA6-conformant PDF with just a bit of YAML, with a tex file as an intermediate output.

We're almost there for skipping LaTeX entirely, but in my experience, Google Docs and Overleaf still offer vastly superior collaborating tools. Now if we could just edit {.md; .rmd; .ipynb} files directly on Overleaf, with comments and track changes, we'd be in business...

bowsamic|2 years ago

If I'm using LaTeX, I'm writing scientific articles. I expect scientific articles to be read by people on computers with normal screen sizes or printed off. Therefore there's no reason to bother with anything other than PDF. PDF works great.

analog31|2 years ago

That's certainly one use case. I might be the exception, trying to look up something on my phone, or following a link in a blog or HN post. Stuff in PDFs is hard to read, especially two-column journal articles. I'm often not at my desk, since I might be in a meeting or in the lab.

asimpletune|2 years ago

I love the author’s “if you want to leave a comment email me”. I saw this somewhere else and it motivated me to make an automated system that works like that: https://r3ply.com

notpushkin|2 years ago

Instead of MathJax, maybe also consider KaTeX: https://katex.org/

It's faster than MathJax and also can be pre-rendered on the server (or in your SSG!).

amai|2 years ago

That is old news. MathJax 3 is a lot faster nowadays than it used to be, and it supports more LaTeX commands than KaTeX. In particular, the important \label and \ref are still not supported by KaTeX.

froh|2 years ago

I just moved "up" from GFM Markdown to AsciiDoc, and oh, do I miss LaTeX.

HTML rendering of LaTeX is a godsend, and IMNSHO AsciiDoc is a workaround for not fully having that.

matt3210|2 years ago

At work all reports are html. If you want pdf, cmd-P

clbrmbr|2 years ago

Would be nice if this article included some equations!

opentokix|2 years ago

LyX is the way to LaTeX

whatever1|2 years ago

I don't always use latex, but when I do, I always hate it.