Show HN: Chat with GPT about medical issues, get answers from medical literature
45 points | garrinm | 2 years ago | github.com
Clint enables a user to have an interactive dialogue about medical conditions, symptoms, or simply to ask medical questions. Clint helps connect regular health concerns with complex medical information. It does this by converting colloquial language into medical terms, gathering and understanding information from medical resources, and presenting this information back to the user in an easy-to-understand way.
One of the key features of Clint is that its processing is local. It's served using GitHub Pages and uses the user's OpenAI API key to make requests directly to GPT. All processing, except for that done by the LLM, happens in the user's browser.
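Roughly, the in-browser call is just a fetch to the OpenAI chat completions endpoint with the user's key. A minimal sketch of that shape (the model, prompt, and function name here are illustrative, not the exact code):

    // Minimal sketch of a browser-side call straight to the OpenAI API.
    // The key is supplied by the user and never leaves the page except
    // in the request to OpenAI. askClint and the prompt are illustrative.
    async function askClint(apiKey: string, question: string): Promise<string> {
      const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({
          model: "gpt-3.5-turbo",
          messages: [
            { role: "system", content: "Translate the user's description into medical terms and answer plainly." },
            { role: "user", content: question },
          ],
        }),
      });
      const data = await res.json();
      return data.choices[0].message.content;
    }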
I recently needed to look up detailed medical information and found myself spending a lot of time translating my understanding into the medical domain, then working in the other direction to comprehend the medical terms. That gave me the idea that this could be a task for an LLM.
The result is Clint. It's a proof-of-concept. I currently have no further plans for the tool. If it is useful to you as-is, great! If it is useful only to help share some ideas, that's fine too.
TuringNYC|2 years ago
I tried to follow: https://github.com/clint-llm/clint-cli/blob/main/clint/scrip... but it wasn't clear: are you indexing the medical literature in its entirety, or just abstracts?
We tried to do this with arXiv and others, but getting commercial rights was difficult and we got stuck on that. Could you share which medical literature source you used?
I tried to follow the code and it looks like you embed, so I'm assuming you're using RAG. Is that it, or are you also fine-tuning? I didn't see any fine-tuning code. (We didn't fine-tune due to cost.)
Did you benchmark different embedding chunk sizes, etc.? (We did: we ran a matrix search over chunk sizes, including sliding windows, and found the sweet spot for different types of media is usually a single paragraph.)
Did you manage to get access to a fine-tuned model like MedPALM and benchmark that? (We are still awaiting access.)
garrinm|2 years ago
I put this project together pretty quickly and don't want to pretend there is tremendous depth behind any of the decisions I made :/.
For now the only source is the StatPearls book published on ncbi.nlm.nih.gov (the only place this is mentioned is here: https://github.com/clint-llm/clint-cli/blob/main/README.md#u...). It contains about 11,000 peer-reviewed articles about anatomy and conditions: https://www.ncbi.nlm.nih.gov/books/NBK430685/. The copyright terms are CC BY-NC-ND 4.0. I might add some Wikipedia articles to this in the future.
I chunk the documents by section and embed only the first 2048 tokens of each chunk, so it fits within the OpenAI embedding limits. I'm using OpenAI for embedding as opposed to something like all-minilm-l6-v2 because I don't want to have to ship a model to the clients (transfer times could be large, and supporting this would increase the complexity of the library).
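The chunk-and-embed step looks roughly like this. This is just a sketch of the shape of it, not the actual clint-cli code; the character cut below stands in for the 2048-token truncation:

    // Sketch of section-level chunking plus OpenAI embeddings (assumed
    // shape, not the real pipeline). One embedding per section.
    type Chunk = { section: string; text: string; embedding: number[] };

    async function embedSections(
      apiKey: string,
      sections: { title: string; body: string }[],
    ): Promise<Chunk[]> {
      const chunks: Chunk[] = [];
      for (const s of sections) {
        const input = s.body.slice(0, 8000); // rough stand-in for the 2048-token cut
        const res = await fetch("https://api.openai.com/v1/embeddings", {
          method: "POST",
          headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
          body: JSON.stringify({ model: "text-embedding-ada-002", input }),
        });
        const data = await res.json();
        chunks.push({ section: s.title, text: s.body, embedding: data.data[0].embedding });
      }
      return chunks;
    }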
I didn't experiment with different chunk sizes, and I suspect something smaller would be more beneficial as you point out. But it would also complicate the logic, and most choices I made in this project were to remove complexity and get this done quickly. If I revisit this I might chunk by paragraph on your advice :).
RAG is indeed what is being used, but in a few different ways. The diagnoses are refined using a pretty straightforward RAG prompt: consider these notes ... consider this diagnosis ... can you improve on it, etc.
But in a way the entire program is RAG-based. In most prompts some documents are added to the system message for context. It's not clear that the information in the documents is always used, but based on a bit of experimentation it seems to improve various responses.
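The retrieval side is the usual pattern: score the stored chunks against the query embedding and prepend the best ones to the system message. A rough sketch (function names are illustrative, and the chunk shape matches the embedding sketch above):

    // Score chunks by cosine similarity and build a RAG-style prompt.
    function cosine(a: number[], b: number[]): number {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    function buildRagMessages(
      queryEmbedding: number[],
      chunks: { text: string; embedding: number[] }[],
      question: string,
      k = 3,
    ) {
      const context = chunks
        .map(c => ({ c, score: cosine(queryEmbedding, c.embedding) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, k)
        .map(x => x.c.text)
        .join("\n---\n");
      return [
        { role: "system", content: `Use these reference notes when answering:\n${context}` },
        { role: "user", content: question },
      ];
    }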
I have no plans to fine-tune. I'm not sure how beneficial fine-tuning would be here. The model needs a fair bit of general knowledge to reason about descriptions of symptoms, and fine-tuning could over-specialize it. Hallucinations could come up even with fine-tuning, so you would probably still want a RAG-like prompt to get it to focus on real details.
This is very much a hobby, so I haven't dug deep enough to look into other models. But I'd be _very_ curious to see how GPT 3.5 with RAG compares to vanilla MedPALM. In my experience GPT 3.5 can reason quite well with the right documents in the context.
linsomniac|2 years ago
She did find that some of the follow-up questions seemed to lead it down the wrong path (the follow-ups seemed to carry much more weight than the initial messages), and she got better results by combining the original message and the follow-ups into a single new message submitted to a cleared session.
garrinm|2 years ago
I see what you mean about going down the wrong path. I can explore a couple of modifications to help it have better context of the conversation as a whole.
NBJack|2 years ago
More importantly, the new Clint is able to respond to 20 comments a minute, and engagement on the platform is going up!
/s