gnewton77 | 1 year ago

Did some similar work with similar visualizations ~2009, on ~5.7M research articles (PDFs, private corpus) from scientific publishers Elsevier, Springer:

Newton, G., A. Callahan & M. Dumontier. 2009. Semantic Journal Mapping for Search Visualization in a Large Scale Article Digital Library. Second Workshop on Very Large Digital Libraries at the European Conference on Digital Libraries (ECDL) 2009. https://lekythos.library.ucy.ac.cy/bitstream/handle/10797/14...

I am the first author.

j_bum|1 year ago

Nice article, thanks for sharing.

I can imagine mining all of these articles was a ton of work. I’d be curious to know how quickly the computation could be done today vs. the 13-hour benchmark from 2009 :)

Nowadays people would be slamming those data through UMAP!

gnewton77|1 year ago

Thanks! Yes, I'm sure using modern hardware and modern techniques would improve both computation time and results.

Loughla|1 year ago

How do you decide who is listed first? And does the ampersand symbolize something that the word "and" doesn't, or is that just citation style?

j_bum|1 year ago

In biomedical research or tangential fields, author order generally follows these guidelines:

First author(s): the individual(s) who organized and conducted the study. Typically there is only a single first author, but nowadays there are often two first authors. This is because the amount of research required to generate “high impact” publications simply can’t be done by a single person. Typically, the first author is a Ph.D. student or lab scientist.

Middle authors: Individuals who provide critical effort, help, feedback, or guidance for the study and publication of the research. Different fields/labs vary in how stringent they are about what is considered “middle authorship worthy”. In many labs, simply being present and helping with the research warrants authorship. In other labs, you need to contribute a lot of energy to the project to be included as an author.

Senior author(s): The principal investigators (PIs) or lead researchers who run the lab that conducted and published the study. The senior authors are typically the ones who acquire funding and oversee all aspects of the published research. PIs have varying degrees of hands-on management.

There is some variation in whether the central research question for a manuscript is developed by the first vs. the senior author, but usually it’s the senior author. Also, the first and senior authors typically write the manuscript and seek edits/feedback from middle authors. In other cases, there can be dedicated writers who write the manuscript and who may or may not get middle authorship. A main takeaway is: the general outline I’ve provided above is not strictly adhered to.

I’ll take some liberty to apply this outline to the article at hand:

First Author: G. Newton (OP). The scientist who most likely conducted all of the data mining and analysis. He likely wrote the article as well.

Middle Author: A. Callahan. It seems like this author was a grad student at the time the article was written. She likely performed essential work for the paper’s publication. This could’ve been: helping with the analysis, data mining, or ideation.

Senior Author: M. Dumontier. A data science professor, now at Maastricht U. He’s a highly cited scientist!

Lastly… if you check out the acknowledgements, you can see three additional names. These people likely helped with setting up compute access, editing, or general ideation.

This is a cool manuscript! Hopefully this overview isn’t TMI and provides some insight into the biomedical/data science publication process.