jasonpriem's comments

jasonpriem | 1 year ago | on: Ask HN: Who is hiring? (November 2024)

OpenAlex | REMOTE | https://openalex.org | senior frontend eng. & product owner

We're a small nonprofit on a mission inspired by the ancient Library of Alexandria: to create a universal database of research information. We gather, organize, and serve info on 280M papers, 100M authors, billions of citations and more. We’re passionately open: our code is open source, and our data is free and public domain. We value kindness, creativity, and getting stuff done. Our work supports millions of users every day, and we’re growing fast.

We're hiring a senior frontend engineer and product owner. Salary $250k-300k, great perks.

Apply: https://ourresearch.breezy.hr/p/b31ea361225401-senior-fronte...

jasonpriem | 3 years ago | on: OpenAlex: The Promising Alternative to Microsoft Academic Graph

Agreed, the author disambiguation isn't quite as good as Scopus'...they have a bit of a head start on us. But we're improving it quickly.

Thanks for the suggestion about the data dump. A lot of that weight is abstracts, which come in at over 30GB just by themselves. But it's true that the JSON format has some redundancies. For now we think those are worth it, because the denormalized schema is very compatible with the API and easy for beginners to get started with. Plus you only have to download it once (for free! HT to AWS Open Data sponsorship), and after that the updates are very light.

We'll certainly consider offering a smaller, normalized format in the future though, if we get more requests for it.
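To make the denormalized-vs-normalized tradeoff concrete, here is a minimal sketch of splitting nested work records into flat tables. The field names (`id`, `title`, `authors`, `name`) and the sample records are hypothetical, chosen only for illustration; they are not the actual OpenAlex schema.

```python
def normalize(works):
    """Split denormalized work records into three flat tables.

    Assumes each work looks like
    {"id": ..., "title": ..., "authors": [{"id": ..., "name": ...}, ...]}
    (a hypothetical shape, not the real dump format).
    """
    works_table = []      # one row per work
    authors_table = {}    # author id -> name, stored once however often it repeats
    authorships = []      # (work id, author id) link rows
    for work in works:
        works_table.append({"id": work["id"], "title": work["title"]})
        for author in work["authors"]:
            authors_table[author["id"]] = author["name"]
            authorships.append((work["id"], author["id"]))
    return works_table, authors_table, authorships


# Hypothetical sample: the same author is repeated inside both works.
sample = [
    {"id": "W1", "title": "First paper",
     "authors": [{"id": "A1", "name": "Ada"}, {"id": "A2", "name": "Ben"}]},
    {"id": "W2", "title": "Second paper",
     "authors": [{"id": "A1", "name": "Ada"}]},
]
works_table, authors_table, authorships = normalize(sample)
```

The nested form repeats author details in every work, which costs space but lets a beginner read one record and see everything. The normalized form stores each author once but needs a join to answer "who wrote W1?".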

jasonpriem | 3 years ago | on: OpenAlex: The Promising Alternative to Microsoft Academic Graph

To be clear, this project isn't affiliated with SMU; they just did the blog post. It's from a nonprofit called OurResearch (source: I'm a cofounder).

We did have some good conversations with folks at Meta before they closed up shop, but didn't end up using any of their data.

jasonpriem | 3 years ago | on: OpenAlex: The Promising Alternative to Microsoft Academic Graph

We emphasize the API because imho a well-documented, high-throughput API is something really lacking in the ecosystem right now. Dealing with a dataset this size (200M works, 200M authors) is a pain, especially since many end users for this data don't have a lot of technical expertise. Often people have really basic questions (like the ones from the linked post) or they want to build simple applications like monitoring dashboards, recommender systems, and scholarly search engines.

With the API, folks can outsource the heavy data engineering to us for free, and just do the fun parts themselves. We want to make building real-world apps on the global research graph fun and easy, the kind of thing you can do as a hackathon project, instead of with a six-figure grant.

That said, I agree that it's absolutely essential the entire dataset be easy to download and mirror as well. It's called OpenAlex because it's _open_, soup to nuts (the "Alex" part is homage to the ancient Library of Alexandria). All the data is open, the code is open, and our governance is as open as we can make it. [1]

[1] https://ourresearch.org/transparency
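As a rough illustration of the "basic questions" use case for the API described above, here is a minimal sketch that only builds a query URL and makes no network call. The base URL `https://api.openalex.org`, the `/works` endpoint, and the `search` and `per-page` parameters are assumptions not stated in the comment; check the API documentation before relying on them. `build_works_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed base URL; the comment itself does not give one.
BASE_URL = "https://api.openalex.org"


def build_works_query(search, per_page=5):
    """Build a URL for a free-text search over works (hypothetical helper)."""
    params = {"search": search, "per-page": per_page}
    # urlencode escapes the values, e.g. spaces become '+'.
    return f"{BASE_URL}/works?{urlencode(params)}"


url = build_works_query("open access", per_page=10)
print(url)
```

Fetching that URL with any HTTP client should return JSON, which is the part that lets a hackathon project skip the data engineering.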

jasonpriem | 9 years ago | on: Unpaywall: Browser extension to find free copies of academic papers

Hi, one of the devs here! If you want to see a cool example of Unpaywall in action, check out the Nature paper [1] published last month about TRAPPIST-1, the nearby solar system with seven earth-sized planets. Paywalled on Nature, but Unpaywall links you to a free copy on ArXiv. [2]

[1] http://www.nature.com/nature/journal/v542/n7642/full/nature2...

[2] http://blog.impactstory.org/smash-interstellar-paywall/

jasonpriem | 9 years ago | on: 100 Awesome Women in the Open-Source Community You Should Know

I love that they're using actual network data to build this list, not just a "who's who" based on someone's general impressions. There is so much potential in applying this kind of network analysis to the GitHub dataset, and I think we're only seeing the beginning with projects like this one, as well as ones like https://libraries.io/ and http://depsy.org.

One thing I missed in this writeup was more explanation of their methods. For instance, why were they only able to make gender guesses for 2mil out of 7mil users? That's unusually low for name-based gender identification. I'm guessing this is because many GitHub accounts didn't have first names, but would be great to actually see.

I'd also love to see the percentage of women they found out of those 2 million. Otherwise it's "Top 100 out of the ???? women on GitHub." Hopefully this will be addressed in the followup posts they promised. I'll be looking forward to them.

[disclosure: I'm a PI on http://depsy.org, which is funded by the National Science Foundation. And one of the gals on this list is my co-PI]

jasonpriem | 10 years ago | on: The unsung heroes of scientific software

Yes, agreed...we've had lots of requests to add other languages.

And as you say, the growing popularity of GitHub gives us all kinds of cool data even when there's no central package manager for the language. In fact, we're mining imports of every Python and R project on GitHub right now to build out the dependency network beyond the (much much smaller) CRAN and PyPI networks.

The idea with Depsy has been to launch quickly with two languages, so people could see what it looks like, then iterate and add more as we get feedback. So we'll count your comment as +1 for C and C++ :)
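For a sense of what mining imports can look like on the Python side, here is a minimal sketch using the standard-library `ast` module. It is an illustration under simple assumptions, not Depsy's actual pipeline: `imported_packages` and `dependency_edges` are hypothetical names, and the sketch ignores details such as mapping import names to package names.

```python
import ast


def imported_packages(source):
    """Return the set of top-level package names imported by Python source text."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" counts as a dependency on "os".
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0): they point inside the project itself.
            if node.level == 0 and node.module:
                packages.add(node.module.split(".")[0])
    return packages


def dependency_edges(projects):
    """Turn {project_name: source_text} into sorted (project, package) edges."""
    edges = set()
    for name, source in projects.items():
        for package in imported_packages(source):
            edges.add((name, package))
    return sorted(edges)


# Hypothetical one-file project used only to exercise the sketch.
demo = {"demo": "import numpy as np\nfrom scipy.stats import norm\nfrom . import utils\n"}
edges = dependency_edges(demo)
```

Each edge says "this project imports that package", which is the raw material for a dependency network even when no package manager records the relationship.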
