Library-managed 'arXiv' spreads scientific advances rapidly and worldwide

[+] CJefferson|9 years ago|reply

Can I just make a general plea?

You should upload your paper to arXiv. When you do, please upload your source (tex, or word I imagine), as well as a PDF.

For the blind, PDF is the worst possible format, and tex and word are the best formats. Don't hide, or lose, the blind-accessible version of your paper.

[+] apathy|9 years ago|reply

When I worked at the Cornell Theory Center in the late 90s, I didn't know about the arXiv, but I sure as hell knew about screen readers. One of the best FORTRAN programmers (numerical analysis/applied math research faculty I think) was blind as a bat.

I learned a lot about not underestimating people there.

It's a strange university but it always makes me happy to think that it (specifically Ginsparg, and to some degree Strogatz and the law scholars) pushed forward what the Web was supposed to be.

Not a place to buy stuff, but a place to learn stuff, without waiting for it to make its way through a hyper politicized review process so that it could be printed in a 17th century fashion and mailed to some corners of the world, eventually perhaps reaching a fraction of the people who could use it. Rather, everyone everywhere with a connection.

Anyone who doesn't deposit preprints (arXiv, bioRxiv, or wherever) or who doesn't agitate for their coauthors to do so is not really in it for the science. It's fine to be competitive -- deposit yours first. Make the fucking discovery instead of parasitically piggybacking on those who do the work.

But that last part, that's hard. Very hard. As long as there is enough money left in academia to encourage lazy shits, those of us who care about scholarship will have to push, hard, to remove the last refuge of these scoundrels.

You're either on the side of justice -- open data, open formats, open scholarship -- or you are tacitly endorsing Elsevier & Springer, who haven't the slightest problem using crap like incremental JavaScript & mangled PDFs to deny access to scholarship even to those who have paid.

David (blocking on his last name) at CTC made that choice for me. He set an example that forced me to admit what was right. I hope others will do the same. It's the right thing to do.

[+] mixedmath|9 years ago|reply

I'm a (PhD student) mathematician and every paper I've uploaded has included the TeX. I looked at about 10 papers in math at random, and 9/10 of them also included the TeX.

But this isn't something I've looked for in the past. So I wonder: in your experience, about what percentage of papers on the arXiv have included the source?

[+] hackuser|9 years ago|reply

> For the blind, PDF is the worst possible format

I'm surprised. Nobody has created an accessibility solution for PDFs after all these years of ubiquity? What's the story?

[+] chime|9 years ago|reply

It would be great if arXiv upload screen said this. Not sure if it does or not.

[+] ASipos|9 years ago|reply

For the record, if you upload the TeX, arXiv autogenerates the PDF.

[+] robin-berjon|9 years ago|reply

I know that Word has decent accessibility built in (because Microsoft actually cares about this) but I'm surprised that you're getting mileage out of TeX which is a very visual format. Do you basically screen-read the source? Or is there a non-visual output for it that works?

[+] agumonkey|9 years ago|reply

PDF readers manage to parse text back somehow effectively, maybe not on formula / formatting heavy PDFs. Anyway good call, accessibility is not only for mainstream websites. I'm sure the blind dude that aced math classes in college would agree.

[+] naftaliharris|9 years ago|reply

This article misses one of the biggest value-adds of arXiv, at least in my field (Statistics): since almost everyone posts to arXiv, you can almost always find a free version of a published and potentially pay-walled paper. In the past, publishing in a peer-reviewed journal would (1) improve the paper through peer review, (2) signal the quality of the paper based on the prestige of the journal, and (3) distribute the paper. With arXiv, publishing now only does (1) and (2).

[+] apathy|9 years ago|reply

Publishing sometimes does 1), and rarely does 2) (as a statistician you surely know that the relationship between impact factor and retraction is nonlinear and rises in strength as you get into CNS, NEJM, and the like).

I review for others because others have done 1) for me. But I'll never review for Elsevier, and lately I've had the luxury of reviewing for the most cited of open journals (by operating bioRxiv, and accepting direct submissions from it, I claim that Genome Research is "close enough").

It makes me very happy that this is possible (my CV has not suffered for only publishing as first author, and whenever possible as senior or co-senior, in fully open journals). I'm pretty sure this wasn't possible for most people a few short years ago. That engenders optimism about the future of scholarship, for me at least.

Hopefully you as well.

[+] slacka|9 years ago|reply

> you can almost always find a free version of a published and potentially pay-walled paper.

On personal research, I've used it for exactly this, but since what I've seen was only preprints, I've often wondered about the final version. It looks like I'm not alone.[1] Do many or any of the arXiv papers get updates with the improvements that come from peer reviews? Is there a need for arXiv for finals or do publishers demand exclusives on finals?

[1] http://mathoverflow.net/questions/41141/should-i-not-cite-an...

[+] greydius|9 years ago|reply

This is an excellent point. It makes one wonder why academic journal publishers even need to exist anymore. The peer reviewers (who don't get paid anyway) could just as easily do the same job and issue a "stamp of approval".

[+] ivanstegic|9 years ago|reply

Ah, the original http://xxx.lanl.gov/ that I knew and loved in the 90's, when people thought we were surfing nudies in the Physics department and not papers on differential geometry. I helped establish and run the za.arxiv.org mirror at WITS University, mostly to learn how to configure RedHat, Apache, rsync and other tools. I'm glad it still exists.

[+] RMarcus|9 years ago|reply

I've worked at Los Alamos for 6 years and I didn't know this existed. Pretty cool.

[+] divbit|9 years ago|reply

ArXiv is incredibly useful for research, but I think people also use it for a sort of "I posted it to arXiv first, therefore I solved it first" kind of thing, which imo can be misleading at times, if not everyone follows that. Also there is the eprint.iacr.org which seems to do the same thing, except for cryptography (or is it cryptology?), so I'm not sure if every important preprint in that topic gets to arXiv.

[+] ajross|9 years ago|reply

> I think people also use it for a sort of "I posted it to arXiv first, therefore I solved it first" kind of thing, which imo can be misleading at times

True, but I don't see that fights over precedence are unique to ArXiv either, or even made worse by it, no? I mean, at least now there is an unambiguous date-stamped public place to cite in this kind of fight. And those fights provide a built-in incentive to put stuff up there, which is good for all of us.

Basically: who cares about spitballs as long as the papers end up on ArXiv? Seems like a cost worth paying to me.

[+] Ar-Curunir|9 years ago|reply

There's also http://eccc.hpi-web.de/reports/menu/ for complexity theory.

[+] pepon|9 years ago|reply

I hope it is replaced with something better soon. You cannot see access statistics concerning the papers you upload, and they provide this absurd reason for not doing it: https://arxiv.org/help/faq/statfaq (it seems they think arXiv users are idiots or something, so they have to take care of us). Also, getting the uploaded LaTeX files to compile without errors is a pain, and they don't let you just upload the PDF (this has pros and cons, but I wish there was the freedom to choose... and I guess that 99.999% of the time people just download the PDF).

[+] beering|9 years ago|reply

After reading your comment, I was inclined to agree with you about the statistics. After reading their FAQ, I was convinced to side with them.

Their point is that the stats are garbage-level useless. And I can imagine people bragging elsewhere that their paper received X,000 hits when in reality it's all spam or bots. It's not arxiv's responsibility to monitor that, but it wouldn't feel good to facilitate that kind of disinformation or invite hit inflation. Especially as scientists, we want to either publish good data or no data, not data that we know to be garbage.

[+] skybrian|9 years ago|reply

Not providing raw download counts seems like a good thing; it's strongly privacy preserving.

On the other hand, perhaps a way for registered users to star papers that they like (similar to how Github lets you star projects) might be a good thing. It serves much the same purpose as a rough measure of popularity, but is entirely voluntary.

[+] javajosh|9 years ago|reply

Requiring error-free latex is almost certainly a reasonable proxy for real curation effort.

[+] CJefferson|9 years ago|reply

There is one HUGE reason for not using PDFs -- PDFs are very blind-inaccessible, whereas tex is perfect.

For that reason alone, arXiv is really helping the blind community in academia.

EDIT: Add missing 'not' :)

[+] tnecniv|9 years ago|reply

Needs a [2012]

[+] starshadowx2|9 years ago|reply

Interesting to learn how to pronounce it correctly. I've always just said arx-iv like it's spelled.

[+] joeyo|9 years ago|reply

It is pronounced like it's spelled; the X is a chi.

[+] ckdarby|9 years ago|reply

Is there any reason why a project like this wouldn't be open sourced?

Follow up question, how does a site like this have a $500k annual budget? I was napkin calculating the costs of running this and couldn't get anywhere close to $500k without having extensive staff salaries.

[+] cooper12|9 years ago|reply

Looking at it from a Cornell point-of-view, the most innocuous reason I can think of is that they want a canonical library of papers that others can mirror rather than researchers having to search each individual university's arXiv. If they let others fork and set up their own servers it could lead to interesting modifications/applications but it would no longer be in their control and might make the preprint locations fragmented. (and the other servers might not have the same moderating standards)

The other more greedy explanation is always money. Of course open source isn't antithetical to profit, but as mentioned before you do lose control and maybe Cornell doesn't want competition. Even if the project was started with the best of intentions, they still need to make it self-sufficient and maybe even profitable so they probably decided it's in their best interest. Of course this is all just me speculating.

[+] frumiousirc|9 years ago|reply

It must be mostly salary and maybe a small fraction bandwidth. Hardware costs must be in the noise. For the salary, don't forget university overhead. 200K alone might be going to support Ginsparg. A software developer + sysadmin could be at a similar rate. Again, with overhead included. Praise be the bureaucracy, and give unto it its tithe.

[+] jessriedel|9 years ago|reply

Here's a recent lengthy FAQ Ginsparg did on the arXiv (ironically behind a journal paywall).

http://onlinelibrary.wiley.com/doi/10.15252/embj.201695531/f...

Here's a discussion on HN of a blog post by me sparked by a conversation with Ginsparg.

https://news.ycombinator.com/item?id=9415985

[+] beezle|9 years ago|reply

Am I the only one who still uses xxx.lanl.gov ?

[+] Steuard|9 years ago|reply

Probably not. :) (Is it a redirect now, or is it an actual mirror?)

My understanding is that they switched to the new domain after people noticed the original was being blocked as porn by a bunch of automatic content filters.

[+] science404|9 years ago|reply

There's something I've always wondered about.. what do you do if you upload your journal submission to arxiv but it's later rejected? That possibility has always been a deterrent to submitting to arxiv for me. Seems to me this discussion assumes arxiv uploads will be accepted to some journal eventually..

[+] deepnotderp|9 years ago|reply

Hooray for arxiv :)

Long live open science!

[+] danjoc|9 years ago|reply

>Eleven years ago Ginsparg joined the Cornell faculty, bringing what is now known as arXiv.org with him. (Pronounce it "archive." The X represents the Greek letter chi.)

Been pronouncing it "ar ziv" until now. :P

[+] rrmm|9 years ago|reply

I still like referring to it as triple-x from when it was http://xxx.lanl.gov

[+] forgotpwtomain|9 years ago|reply

It may be interesting to note that the Ancient Greek 'X' is supposed to be aspirated so the English pronunciation of 'chi' is almost certainly incorrect anyways.

[+] alanh|9 years ago|reply

But, "archive dot org" is an entirely different and also noteworthy organization!

[+] kensai|9 years ago|reply

But I bet not faster than Sci-Hub... har har har. :D

[+] IanCal|9 years ago|reply

Significantly faster than Sci-Hub. Sci-Hub is, afaik, based on published work. This is preprints, so well in advance of that.

[+] baby|9 years ago|reply

I hate arXiv, I can never figure out where is the PDF, if there is a PDF... long live eprint.

[+] lorenzhs|9 years ago|reply

I'm not sure whether you're serious, but on any article's page there is a "Download" section with a link to the PDF (labelled "PDF").

[+] Ar-Curunir|9 years ago|reply

ePrint indeed offers a much nicer interface. However, I wish they didn't discard prior versions when people revise papers.
