top | item 16434078

Show HN: Read ArXiv Papers on Semantic Scholar as Responsive HTML Documents

40 points | undfined | 8 years ago | semanticscholar.org

15 comments

thecodeviking|8 years ago

Hi HN,

I'm an engineer on the Semantic Scholar team that worked on integrating this feature into the site.

Here's a blog post that talks a bit more about what we're doing: https://blog.semanticscholar.org/announcing-a-new-way-to-rea...

I'm around to answer questions / discuss the approach. We're super excited and would love to hear your feedback!

ivansavz|8 years ago

Feature idea 1: perhaps you could make the references section at the bottom render links, at least for references that provide an arXiv identifier.

Feature idea 2: better/shorter URL structure. The current URL https://www.semanticscholar.org/paper/{title}-{authors}/{uui... ends up quite long and unreadable (may be good for SEO though). If you're rendering mostly arXiv papers, you could set up a short URL scheme that mirrors the arXiv URL paths: e.g., if the original URL is https://arxiv.org/abs/XXXX.YYYY, your URL could be https://www.semanticscholar.org/paper/arxiv/XXXX.YYYY

Good stuff!
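To make feature idea 2 concrete, here is a minimal sketch of the mapping in Python. It assumes new-style arXiv identifiers (four digits, a dot, four or five digits, optional version suffix). The `/paper/arxiv/<id>` route is only the suggestion above, not an existing endpoint, and `short_paper_url` and the sample identifier are made up for illustration.

```python
import re

# New-style arXiv abstract URLs, e.g. https://arxiv.org/abs/1234.56789 or .../1234.56789v2
ARXIV_ABS = re.compile(r"^https?://arxiv\.org/abs/(\d{4}\.\d{4,5})(v\d+)?/?$")


def short_paper_url(arxiv_url: str) -> str:
    """Map an arXiv abstract URL to the short path proposed in this comment."""
    match = ARXIV_ABS.match(arxiv_url)
    if match is None:
        raise ValueError(f"not a recognized arXiv abstract URL: {arxiv_url}")
    # Hypothetical route: /paper/arxiv/<id> is a suggestion, not a live endpoint.
    # The version suffix (group 2) is dropped so every version shares one short URL.
    return f"https://www.semanticscholar.org/paper/arxiv/{match.group(1)}"


if __name__ == "__main__":
    # 1234.56789 is an arbitrary placeholder identifier.
    print(short_paper_url("https://arxiv.org/abs/1234.56789"))
```

Dropping the version suffix is a design choice in this sketch; keeping it would work just as well if per-version pages were wanted.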

ivansavz|8 years ago

Very nice! I like how the HTML rendering is responsive and allows you to read on narrow screens.

MathJax seems to handle most things, but not 100%: https://www.semanticscholar.org/paper/Quantum-Broadcast-Chan...

What are you using for the LaTeX --> (HTML+MathJax) conversion?

ktpsns|8 years ago

The main workhorse is https://dlmf.nist.gov/LaTeXML/ as you can find once you reach https://github.com/arxiv-vanity/engrafo.

LaTeXML converts tex to XML by running latex ("only latex can parse latex") and working on the DVI output.

Nevertheless, this is a hard job, so I will look into the engrafo code soon because I want to apply this to a book we have written.

CardenB|8 years ago

In what ways is this different from arxiv-vanity? (Just append "-vanity" to "arxiv" in any arxiv link to get a rich html version)

thecodeviking|8 years ago

It's not that different, at the moment. The only real difference is that we're pre-computing the HTML so it's faster (ArXiv Vanity runs at request time).

We've talked a lot with the ArXiv Vanity team. If all goes well and our users love the feature, we have an opportunity to support (and contribute to) their efforts at improving Engrafo and LaTeXML, and to maintain the front-end facing portion of the system. That way they don't have to worry about hosting / providing a functioning front-end, which we're happy to foot the bill for (and maintain)!
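For anyone curious what "pre-computing" versus "at request time" means in practice, here is a rough Python sketch. It is not our actual pipeline: `convert_to_html` is a hypothetical stand-in for the slow LaTeX-to-HTML step, and the dictionary stands in for whatever storage sits behind the site.

```python
from typing import Callable, Dict, Iterable


def convert_to_html(paper_id: str) -> str:
    """Hypothetical stand-in for the slow LaTeX-to-HTML conversion."""
    return f"<article data-paper='{paper_id}'>...</article>"


def precompute(paper_ids: Iterable[str], convert: Callable[[str], str]) -> Dict[str, str]:
    """Run the conversion once per paper, ahead of any reader request."""
    return {paper_id: convert(paper_id) for paper_id in paper_ids}


def serve_precomputed(store: Dict[str, str], paper_id: str) -> str:
    """Serving is a lookup; no conversion happens while the reader waits."""
    return store[paper_id]


def serve_at_request_time(paper_id: str, convert: Callable[[str], str]) -> str:
    """Serving runs the conversion on every request."""
    return convert(paper_id)


if __name__ == "__main__":
    store = precompute(["paper-a", "paper-b"], convert_to_html)
    print(serve_precomputed(store, "paper-a"))
```

The trade-off is the usual one: pre-computing pays the conversion cost up front for every paper, while request-time rendering only pays for papers someone actually opens.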

david2016|8 years ago

I would recommend that, rather than copying "arXiv-Vanity", your team focus on improving and contributing to "Engrafo", since there are many errors in the HTML conversion. And leave everything at "arXiv-Vanity", since there is no point in having 2 different places doing the exact same thing.

undfined|8 years ago

We plan to do both. We want to give back to Engrafo where it makes sense, but we also have a unique opportunity given our other semantic features along with the paper metadata from our corpus to build upon this experience. This release is an MVP reading experience, but in the future we plan to add a number of things like: user highlighting, direct linking to authors and citations, and collaborative commenting.