Ask HN: Extracting Knowledge Graphs from LLMs
7 points | lagrange77 | 2 years ago
Now I thought about automating this and programmatically 'scanning' an LLM's implicit knowledge (of a specific domain) and compiling it into some kind of knowledge graph, e.g. an entity relationship diagram of physics concepts.
Could be interesting. With the right scanning technique, it may be possible to extract a semantic representation of all of an LLM's 'knowledge', or of the information in a document.
Have any of you already dealt with something like this?
simonmesmith | 2 years ago
For example, if you ask an LLM to create a graph of all proteins related to X disease, and show how they interact, it will oblige. (You can try this yourself easily in the OpenAI playground. Just ask it to send you back a list like X -> Y -> Z or whatever. Or an array of source/target/relation triplets.)
The challenge is that what you get will be very dependent on how you phrase your request. So you’ll never know if you’re getting a “complete” graph or just the most probable graph for the request you made. If you’re an expert in the domain, you’ll know, but if you’re an expert you might not need the graph in the first place.
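The triplet approach described above can be sketched without committing to any particular API: ask the model for `source -> relation -> target` lines, then parse them into edges. A minimal sketch (the model reply below is made up for illustration; only the parsing step is concrete):

```python
def parse_triplets(text):
    """Parse lines like 'TP53 -> regulates -> MDM2' into
    (source, relation, target) tuples; ignore malformed lines."""
    triplets = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("->")]
        if len(parts) == 3 and all(parts):
            triplets.append(tuple(parts))
    return triplets

# Hypothetical model output, for illustration only:
reply = """\
TP53 -> regulates -> MDM2
MDM2 -> degrades -> TP53
(some prose the model added that is not a triplet)
"""
edges = parse_triplets(reply)
# edges == [('TP53', 'regulates', 'MDM2'), ('MDM2', 'degrades', 'TP53')]
```

Robust parsing matters exactly because of the phrasing-dependence mentioned above: the model will sometimes wrap the list in prose, so malformed lines are silently skipped here.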
james-revisoai | 2 years ago
LLMs are very good at dealing with contextual polysemy; the catch is that an embedding of, say, a topic will be far from that topic in its different possible contexts. So a knowledge graph would be possible, but how you would find these possible areas, or why you would constrain it, is another question.
Now if you are just asking about education, you can get it to generate lists of relations and map those into concept maps etc. (quite a few tools do this), but that's pretty superficial, as such...
FWIW, back when LLMs were more prone to hallucination, I worked on a quiz-generating application in 2020/21 doing this: generations whose embeddings fell inside the convex hull of the embeddings of known ground-truth statements were more likely to be truthful and relevant than those outside of it.
In my opinion though you should try to embrace this malleable nature rather than constrain it...
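The convex-hull filter described above can be illustrated in 2D (real embedding spaces are high-dimensional, where you would test hull membership with a small linear program instead; the points below are made up). A sketch using Andrew's monotone chain plus a point-in-convex-polygon test:

```python
def cross(o, a, b):
    """2D cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_hull(p, hull):
    """True if p lies inside or on the hull (CCW vertex list)."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))

# Made-up 2D 'embeddings' of ground-truth statements:
truth = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
hull = convex_hull(truth)
in_hull((0.5, 0.5), hull)  # inside the hull: keep the generation
in_hull((2.0, 2.0), hull)  # outside the hull: treat as suspect
```

The idea transfers directly: embed the candidate generation with the same encoder as the ground-truth statements, and use hull membership (or distance to the hull) as a cheap plausibility signal.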
lagrange77 | 2 years ago
> Now if you are just asking about education, you can get it to generate lists of relations and map those into concept maps etc. (quite a few tools do this), but that's pretty superficial, as such...
Right, this would use the LLM's inherent tendency to produce the most probable output for the prompt, and hence could hide or hallucinate info.
What I am after is a more systematic and reliable approach to 'scrape' the model's knowledge, without relying on its best guess for a broad prompt like 'Compile a concept map of classical mechanics.'
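One more systematic variant (a sketch, not a solution to the completeness problem): seed the scan with a few concepts, ask narrowly about each concept's direct relations, and expand breadth-first, so coverage comes from many small prompts rather than one broad one. `ask_relations` here is a hypothetical stand-in for an actual LLM call:

```python
from collections import deque

def scan_graph(seeds, ask_relations, max_nodes=100):
    """Breadth-first expansion: ask_relations(concept) should return
    (relation, target) pairs; newly seen targets are queued in turn."""
    edges, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(seen) < max_nodes:
        concept = queue.popleft()
        for relation, target in ask_relations(concept):
            edges.append((concept, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return edges

# Hypothetical stand-in for the LLM, for illustration only:
fake_llm = {
    "force": [("equals", "mass times acceleration")],
    "mass times acceleration": [],
}
edges = scan_graph(["force"], lambda c: fake_llm.get(c, []))
# edges == [('force', 'equals', 'mass times acceleration')]
```

This doesn't make the output any less probabilistic per prompt, but it does make the scan reproducible and bounded, and repeated runs can be merged to estimate which edges are stable.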
ash-ishh | 2 years ago
Sample: https://twitter.com/yoheinakajima/status/1706848028014068118
lagrange77 | 2 years ago
Spires: Building structured knowledge bases from unstructured text using LLMs
[https://news.ycombinator.com/item?id=37929351]
birdplanellama | 2 years ago