LifeIsBio's comments

LifeIsBio | 2 days ago | on: Ask HN: How to be alone?

This line stuck out to me as well, but my follow up thought was different.

I’ve had friends who have been on cocktails like these, and one of them once said something like, “I’ve been depressed before, and this is not that. I’m not depressed. I don’t have the emotional capacity to be depressed. This is more like a total emotional blank slate.”

She was basically a robot for a few months. Incapable of really any emotions, including sadness, anxiety, frustration, etc. Suffice it to say, she also didn’t have the emotional drive to push her towards positive things like deciding on how to spend her weekend free time.

Thankfully she’s changed her meds and is feeling overall better (if, admittedly, at the price of some emotional stability).

LifeIsBio | 9 months ago | on: Mermaid: Generation of diagrams like flowcharts or sequence diagrams from text

One of my favorite applications of multimodal LLMs thus far is the ability to:

1. Draw a DAG of whatever pipeline I’m working on with pen and paper.

2. Take a photo of the graph, mistakes and all.

3. Ask ChatGPT to translate the image into mermaid.js

Given how complicated the pipelines are that I’m working with and the sloppiness of the hand drawn image, it’s truly amazing how well this workflow works.
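
For readers unfamiliar with the target format, here is a rough sketch of what the text side of this workflow looks like. It does not involve an LLM at all; it only renders a DAG, given as an edge list, as Mermaid flowchart text. The function name `edges_to_mermaid` and the pipeline step names are hypothetical, chosen for illustration.

```python
def edges_to_mermaid(edges, direction="TD"):
    """Render a list of (source, target) edges as Mermaid flowchart text."""
    lines = [f"flowchart {direction}"]
    for source, target in edges:
        # Each DAG edge becomes one "A --> B" line in the flowchart body.
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical pipeline steps, purely for illustration.
pipeline = [
    ("raw_reads", "align"),
    ("reference", "align"),
    ("align", "call_variants"),
    ("call_variants", "annotate"),
]
print(edges_to_mermaid(pipeline))
```

The printed text can be pasted into any Mermaid renderer, which also makes it easy to check a machine-generated translation against the original drawing.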

LifeIsBio | 1 year ago | on: Many FDA-approved AI medical devices are not trained on real patient data

Yep, I'm in the rare disease space. "impossible" is pretty appropriate.

It's tricky. On the one hand, it's obviously not appropriate to be flippant about patient privacy. On the other, it's clear that advancements in human health are being hindered by our current approach to (dis)allowing researchers access to data.

LifeIsBio | 1 year ago | on: Show HN: R2R V2 – A open source RAG engine with prod features

I want to second this. It seems like document chunking is the most difficult part of the pipeline at this point.

You gave the example of unstructured PDF, but there are challenges with structured docs as well. We’ve run into docs that are hard to chunk because of their deeply nested and repeated structure. For example, there might be a long experimental protocol with multiple steps; at the end of each step, there’s a “Debugging” table for troubleshooting anything that might have gone wrong in that step. The debugging table is a natural chunk, except that once chunked there are a dozen such tables that are semantically similar when decoupled from their original context and position in the tree structure of the document.

This is one example, but there are many other cases where key context for a chunk is nearby in a structured sense, but far away in the flattened document, and therefore completely lost when chunking.
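
One common mitigation, sketched here under assumptions rather than taken from R2R or any specific library, is to carry each chunk's position in the document tree along with its text, for example by prefixing the chunk with its heading path. The `Node` class and `chunk_with_context` function below are hypothetical names, and the protocol contents are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of a structured document (hypothetical schema)."""
    title: str
    text: str = ""
    children: list = field(default_factory=list)


def chunk_with_context(node, path=()):
    """Yield one chunk per node that has text, prefixed with its heading path."""
    here = path + (node.title,)
    if node.text:
        yield " > ".join(here) + "\n" + node.text
    for child in node.children:
        yield from chunk_with_context(child, here)


# Made-up protocol with a repeated "Debugging" table under each step.
protocol = Node("Protocol", children=[
    Node("Step 1", "Mix reagents.", [Node("Debugging", "If cloudy, remake the buffer.")]),
    Node("Step 2", "Incubate overnight.", [Node("Debugging", "If no growth, check the temperature.")]),
])

for chunk in chunk_with_context(protocol):
    print(chunk)
    print("---")
```

With the path attached, the two "Debugging" chunks are no longer interchangeable: each one states which step it belongs to. Whether that prefix is enough for retrieval depends on the embedding model and the documents.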

LifeIsBio | 1 year ago | on: Jim Simons has died

Just to add to the list of things Jim Simons did and funded, he also established the Simons Foundation Autism Research Initiative (SFARI).

"SFARI’s mission is to improve the understanding, diagnosis and treatment of autism spectrum disorders by funding innovative research of the highest quality and relevance."

SFARI in turn funds a lot of foundational neurological and rare disease research, since autism is such a common phenotype.

LifeIsBio | 2 years ago | on: Is Cosine-Similarity of Embeddings Really About Similarity?

The paper kinda leaves you hanging on the "alternatives" front, even though they have a section dedicated to it.

In addition to the _quality_ of any proposed alternative(s), computational speed also has to be a consideration. I've run into multiple situations where you want to measure similarities on the order of millions/billions of times. Especially for realtime applications (like RAG?), speed may even outweigh quality.
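
Part of why cosine similarity is hard to displace on speed: if every vector is normalized to unit length once, up front, each later comparison reduces to a plain dot product. The sketch below shows that equivalence in pure Python with made-up vectors; the helper names are hypothetical, and a real system would use a vectorized library instead of Python loops.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from scratch on every call."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v):
    """Scale v to unit length; done once per vector, not once per comparison."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


# Made-up embeddings, purely for illustration.
corpus = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0], [2.0, 2.0, 1.0]]
query = [1.0, 1.0, 1.0]

unit_corpus = [normalize(v) for v in corpus]  # precomputed once
unit_query = normalize(query)

for raw, unit in zip(corpus, unit_corpus):
    # After normalization, the dot product matches the cosine similarity.
    assert math.isclose(dot(unit_query, unit), cosine(query, raw))
```

Any proposed alternative has to compete with that per-comparison cost, not just with the quality of the resulting ranking.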

LifeIsBio | 2 years ago | on: Everything is a linear model

I read this article when I was in grad school 5 years ago. Absolutely love it and talk about it to this day.

It really makes me frustrated about the ways I was introduced to statistics: brute force memorization of seemingly arbitrary formulas.
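
A small illustration of the general idea (not necessarily the article's own example): the equal-variance two-sample t-test gives the same t statistic as the slope test in a simple linear regression of the outcome on a 0/1 group indicator. The sketch below checks this in pure Python on made-up data; the function names are hypothetical.

```python
import math


def pooled_t(group0, group1):
    """Classic two-sample t statistic, assuming equal variances."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((y - m0) ** 2 for y in group0)
    ss1 = sum((y - m1) ** 2 for y in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


def slope_t(x, y):
    """t statistic for the slope in the least-squares fit y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_var = sse / (n - 2)
    return slope / math.sqrt(residual_var / sxx)


# Made-up measurements for two groups.
group0 = [4.1, 5.0, 4.6, 5.3]
group1 = [5.9, 6.4, 5.2, 6.8]

# Same data as a regression: x is a 0/1 indicator of group membership.
x = [0.0] * len(group0) + [1.0] * len(group1)
y = group0 + group1

print(pooled_t(group0, group1), slope_t(x, y))
assert math.isclose(pooled_t(group0, group1), slope_t(x, y))
```

The slope is the difference in group means, and the residual variance is the pooled variance, so the "memorized" t-test formula falls out of the regression.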

LifeIsBio | 2 years ago | on: Show HN: Lodestar Bio, providing rare disease patients a diagnosis

Hey, HN! Maybe not your typical startup announcement here, but I recently left my job as a bioinformatics engineer to start a company called Lodestar Bio.

We are addressing challenges faced by families of children with rare diseases who are seeking a diagnosis, and our solution is a two-sided marketplace for rare disease genomic insights.

On one side, we will offer children who have a rare disease—and an inconclusive whole genome assay—another chance at a diagnosis. A majority of families who order a whole genome test do not receive their much needed diagnosis and are rarely provided with clear followup options. On the other side of the market, we will use the genomic data we collect to identify orphan drug leads, which we will sell to biopharma clients who are creating personalized medicines.

I'm happy to chat about any questions or comments you have!

LifeIsBio | 2 years ago | on: We're afraid language models aren't modeling ambiguity

The game “20 questions” is probably the hardest I’ve seen chatGPT fail.

What’s interesting about the game is that, at first pass, there’s no ambiguity. All questions need to be answered with “Yes” or “No”. But many questions asked during the game actually have answers of “it depends”.

For example, I was thinking of “peanut butter” and chatGPT asked me “Does it fit in your hand?” as well as “Is it used in the kitchen?”. Given my answers, chatGPT spent the back half of its questions on different kitchen utensils. It never once considered backing up and verifying that there wasn’t some misunderstanding.

I played three games with it, and it made the same mistake each time.

Of course, playing the game via text loses a lot of information relative to playing IRL with your friends. In person, the answerer would pause, hum, and otherwise demonstrate that the question asked was ambiguous given the restrictions of the game.

Regardless, it was clear that chatGPT wasn’t accounting for ambiguity.
