dginev | 2 years ago

Thanks for the kind words, but some corrections:

1. My name is Deyan (hi!)

2. ar5iv was the latest frontend incarnation, but our actual work on converting LaTeX to HTML goes back nearly 20 years behind the scenes.

3. I was an undergraduate student when I was introduced to the project back in 2007. It was started "in spirit" by 3 senior co-conspirators back then: Michael Kohlhase, Bruce Miller and Robert Miner. And I am by no means a solitary actor today, even if I may be the chief online presence of the people involved. Bruce is doing the bulk of the hard work on LaTeXML to this day.

I documented some of the history in an invited talk for CICM 2022, which you can find on YouTube; the slides are at:

https://prodg.org/talks/welcome_to_ar5iv

It's really great that the HTML has now reached "home base" in arXiv, and I hope their team gets a lot more of the positive attention going forward - today's achievement is entirely theirs!

indrora|2 years ago

I remember stumbling upon your work long ago when I was working on a project to have "e-zines" that consumed a series of `article` class files and rendered them out into PDF and HTML as a series package.

I had come across latex2html, Dan Gildea's project, and found myself dissatisfied with how it worked. As I understand it, it's more a "half implementation of lots of packages" rather than what ar5iv seems to be, which is "enough of the core LaTeX engine producing HTML instead of DVI"? I'd love to know more about the nitty-gritty of how the engine does its thing.

I'm curious: How has modern web tech (e.g. WebAssembly, Canvas) helped or gotten in the way of getting good LaTeX rendering in the browser?

dginev|2 years ago

Right, that's LaTeXML - it tries to emulate as much as possible of the TeX typesetting system, while retaining enough control to emit structured markup.

That control also allows us (and generally all contributors of LaTeXML package support) to conveniently maintain the various parallel data structures and metadata needed along the way.

Modern HTML is very often helpful to produce higher quality article renderings. Examples:

1. we recently started using flexbox for subfigures, allowing them to reflow.

2. we have started emitting ARIA accessibility annotations (there is now an "alt" key for \includegraphics)

3. MathML Core allowed us to have native web rendering for math expressions in every browser.
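
To make those three points a bit more concrete, here is a minimal Python sketch of the kind of markup involved. It is not LaTeXML's actual output: the helper name `render_subfigures`, the inline styles, the `role="group"` wrapper and the sample formula are all assumptions made for illustration.

```python
from html import escape

# Hand-written MathML Core fragment for x^2 (illustrative only).
SAMPLE_MATH = "<math><msup><mi>x</mi><mn>2</mn></msup></math>"


def render_subfigures(panels, label):
    """Build an HTML figure group from (src, alt_text, caption) tuples.

    Hypothetical helper: the markup is an assumption, not LaTeXML's output.
    """
    items = []
    for src, alt_text, caption in panels:
        items.append(
            # The flex shorthand lets panels sit side by side and wrap when narrow.
            '<figure style="flex: 1 1 12em; margin: 0.5em;">'
            # The alt text is escaped so quotes cannot break the attribute.
            f'<img src="{escape(src)}" alt="{escape(alt_text)}" '
            'style="max-width: 100%;">'
            f"<figcaption>{escape(caption)}</figcaption>"
            "</figure>"
        )
    return (
        f'<div role="group" aria-label="{escape(label)}" '
        'style="display: flex; flex-wrap: wrap;">'
        + "".join(items)
        + "</div>"
    )


if __name__ == "__main__":
    html = render_subfigures(
        [
            ("loss.png", "Training loss over epochs", "(a) Loss"),
            ("acc.png", "Accuracy over epochs", "(b) Accuracy"),
        ],
        label="Training curves",
    )
    print(html)
    print(SAMPLE_MATH)
```

On a wide screen the two panels share a row; on a narrow one the second panel wraps below the first, which is the reflow behaviour mentioned in point 1.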

As to LaTeX rendering in the browser, there are various other projects out there with partial support that you could look up. For LaTeXML, the WebAssembly route seems most realistic, as we are undergoing a rewrite in Rust. But there are quite a number of pieces to flesh out before we get there.

ngcc_hk|2 years ago

I went through it. May I ask whether there is a "personal" version of this ar5iv converter, or only the few parsers that were mentioned?

Btw, given that we are in the citation-minded academic world, I wonder whether you might have credited Gartner Group with inventing that technology curve. To be honest, there is a variation I like more, which deals with the chasm issue.