IanCal|5 months ago
This seems to miss the other side of why all this failed before.
RDF has the same problems as SQL schemas: the information is scattered, and what fields mean still requires documentation.
There, they have a name on a person. What name? Given? Legal? Chosen? Preferred for this use case?
You only have one ID for Apple, eh? Companies are complex to model. Do you mean Apple just as someone would talk about it? Of the legal structure of entities that underpins all major companies, which part is being referred to?
I spent a long time building identifiers for universities and companies (work later taken up for ROR) and it was a nightmare to say what a university even was. What's the name of Cambridge? Legally it's neither "Cambridge University" nor "The University of Cambridge", yet those are the actual names as people use them. The University of Paris went from something like 13 institutes, to maybe one, to then a bunch more. Are companies located at their headquarters? Which headquarters?
Someone will suggest modelling to solve this, but here lies the biggest problem:
The correct modelling depends on the questions you want to answer.
Our modelling had good tradeoffs for mapping academic citation tracking. It had bad tradeoffs for legal ownership. There isn't one modelling that solves both well.
And this is all for the simplest of questions about an organisation: what is it called, and is it one thing or two?
jtwaleson|5 months ago
Indeed, I often get the impression that (young) academics want to model the entire world in RDF. This can't work because the world is deeply ambiguous.
Using it to solve specific problems is good. A company I work with tries to do context engineering / add guardrails to LLMs by modeling the knowledge in organizations, and that seems very promising.
The big question I still have is whether RDF offers any significant benefits for these much more limited scopes. Is it really that much faster, simpler or better to run queries on knowledge graphs rather than on something like SQL?
simonw|5 months ago
I went looking and as far as I can tell "The Chancellor, Masters, and Scholars of the University of Cambridge" is the official name! https://www.cam.ac.uk/about-the-university/how-the-universit...
dwaite|5 months ago
Coincidentally, this is my main point in any conversation about UML I've ever had.
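Whether graph queries actually beat SQL here can be made concrete. Below is a minimal sketch (my own illustration, not from the thread): storing triples in a single entity-attribute-value table means every hop of a graph question becomes another self-join, whereas the equivalent SPARQL stays a single pattern. All table, entity, and property names are invented.

```python
# EAV/triple storage in SQL: each graph hop is another self-join.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("alice",  "worksFor",  "acme"),
    ("acme",   "locatedIn", "berlin"),
    ("bob",    "worksFor",  "globex"),
    ("globex", "locatedIn", "tokyo"),
])

# "In which city does each person work?" -- two hops, so two copies of the
# table in SQL. The equivalent SPARQL is one compact pattern:
#   SELECT ?person ?city WHERE { ?person :worksFor ?org . ?org :locatedIn ?city }
rows = conn.execute("""
    SELECT t1.s AS person, t2.o AS city
    FROM triples t1
    JOIN triples t2 ON t1.o = t2.s
    WHERE t1.p = 'worksFor' AND t2.p = 'locatedIn'
    ORDER BY person
""").fetchall()
print(rows)  # [('alice', 'berlin'), ('bob', 'tokyo')]
```

A three-hop question would need a third copy of the table, and so on; dedicated triple stores exist largely to make that join chain implicit.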
AtlasBarfed|5 months ago
To adapt the saying: an engineer is talking to another engineer about his system, saying he's having issues with names. So he's thinking of using namespaces.
Now he has two problems.
jandrewrogers|5 months ago
As the article itself points out, this has been around for 25 years. It isn't an accident that nobody does things this way; it wasn't an oversight.
I worked on semantic web tech back in the day, and the approach has major weaknesses and limitations that are being glossed over here. The same article touting RDF as the missing ingredient has been written for every tech trend since it was invented. We don't need to re-litigate it for AI.
rglullis|5 months ago
I would be very interested in reading why you think it can't work. I am inclined to agree with the post on a sibling thread that says the main problem with RDF is that it has been captured by academia.
rglullis|5 months ago
I'm completely out of time and energy for any side project at the moment, but if someone wants to steal my idea: please take an LLM and fine-tune it so that it can take any question and turn it into a SPARQL query for Wikidata. Also, make a web crawler that reads a page and turns any new facts it presents into a set of RDF triples or QuickStatements. This would effectively be the "ultimate information organizer" and could potentially turn Wikidata into most people's entry page to the internet.
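As a rough illustration of the question-to-SPARQL step: below, a toy lookup table stands in for the fine-tuned LLM. Q142 and P36 are genuine Wikidata identifiers (France and "capital"); everything else is invented.

```python
# Hypothetical sketch: map a natural-language question to a Wikidata SPARQL
# query. A real system would replace TEMPLATES with an LLM.
TEMPLATES = {
    # normalized question -> (entity QID, property PID)
    "what is the capital of france": ("Q142", "P36"),
}

def question_to_sparql(question: str) -> str:
    qid, pid = TEMPLATES[question.strip().lower().rstrip("?")]
    return (
        "SELECT ?value ?valueLabel WHERE {\n"
        f"  wd:{qid} wdt:{pid} ?value .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )

query = question_to_sparql("What is the capital of France?")
print(query)
# The resulting string can be sent to https://query.wikidata.org/sparql
```

The crawler half of the idea is the hard part; the query half is mostly a translation problem, which is exactly what LLMs are decent at.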
flanked-evergl|5 months ago
RDF is great, but it's somewhat inadvertently been captured by academia.
The tooling is not in a state where you can use it for any commercial or mission-critical application. It is mainly maintained by academics, and their concerns run almost exactly counter to normal engineering concerns.
An engineer would rather have tooling with limited functionality that is well designed and behaves correctly, without bugs.
Academics would rather have tooling with lots of niche features, and they can tolerate poor design, incorrect behavior and bugs. They care more about features, even incorrect ones, because they need to publish something "novel".
The end result is that almost everything you find for RDF is of academia quality, and much of it is abandoned because it was just part of the publication spam being pumped and dumped by academics who must publish or perish.
Anyone who wants to use it commercially really has to start almost from scratch.
jraph|5 months ago
Uh. Do you have a source for this? Correctness is a major need in academia.
philjohn|5 months ago
I worked for a company that went hard into "Semantic Web" tech for libraries (as in, the places with books), using an RDF quad store (OpenLink Virtuoso) for data storage and structuring all data as triples - a better fit for the hierarchical MARC21 format than a relational database.
There are a few libraries (the software kind) out there that follow the W3C spec correctly, Redland being one of them.
crabmusket|5 months ago
I really like RDF in theory, as a lot of its ideas just make sense to me:
- Using URIs to disambiguate IDs and terms
- EAV or subject/verb/object representation for all knowledge
- An "open world" graph where you can munge together facts from different sources
I guess using RDF specifically, instead of just inventing your own graph database with namespaced properties, means you get to use existing RDF tooling and languages like SPARQL, OWL, SHACL, etc.
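The three ideas above can be sketched in a few lines without any RDF tooling, which also shows why "inventing your own graph database" is tempting. Stdlib only; all facts and the example.org namespace are illustrative (FOAF is a real vocabulary).

```python
# Minimal triple store: full URIs disambiguate terms, and an "open world"
# store is just a set of triples, so merging two sources is set union.
FOAF = "http://xmlns.com/foaf/0.1/"   # real, widely used vocabulary
EX = "http://example.org/"            # illustrative namespace

# Two independent sources describing overlapping resources.
source_a = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
}
source_b = {
    (EX + "alice", FOAF + "mbox", "mailto:alice@example.org"),
    (EX + "bob", FOAF + "name", "Bob"),
}

# Open-world merge: nothing conflicts, facts simply accumulate.
graph = source_a | source_b

# Query: everything known about alice, from either source.
about_alice = {(p, o) for (s, p, o) in graph if s == EX + "alice"}
print(len(graph), len(about_alice))  # 4 3
```

The part this toy version gives up is exactly what the RDF stack sells: standard serializations, SPARQL, and schema/ontology languages over the same model.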
Having looked into the RDF ecosystem to see if I could put something together for a side project inspired by https://paradicms.github.io, it really feels like there's a whole shed of tools out there. But the shed is a bit dingy, you can't really tell the purpose of the oddly-shaped tools you can see, nobody has organised and laid things out in a clear arrangement, and, well, everything seems to be written in Java - which shouldn't be a huge issue, but really isn't to my taste.
mdhb|5 months ago
Hopefully version 1.2, which addresses a lot of the shortcomings, should officially become a thing this year.
In the meantime you can take a look at some of the specification docs here: https://w3c.github.io/rdf-concepts/spec/
jiggawatts|5 months ago
The sibling comment by flanked-evergl ("RDF is great but it's somewhat inadvertently captured by academia") becomes manifestly obvious when reading this spec.
It's overburdened by terminology, an exponential explosion of nested definitions, and abstraction to the point of unintelligibility.
It is clear that the authors had implementation(s) of the spec in mind while writing, but they very carefully dance around them and refuse to be nailed down with pedestrian specifics.
I'm reminded of the Wikipedia mathematics articles that define everything in terms of other definitions; if you navigate to those definitions, you eventually end up going in circles back to the article you started at, no wiser.
kidehen2|5 months ago
RDF provides a very natural layer for AI systems. Today, LLM-based AI systems are fundamentally challenged by hallucinations, which makes RDF-based knowledge graphs - constructed using Linked Data principles - a powerful complement. By using hyperlinks to denote edges and nodes, these graphs enable context enrichment through ontology lookups combined with reasoning and inference.
For a detailed post on this synergy, see: https://www.linkedin.com/pulse/large-language-models-llms-po...
Disclaimer: I am the Founder & CEO of OpenLink Software, creators of Virtuoso.
tannhaeuser|5 months ago
> The Big Picture: Knowledge graphs triple LLM accuracy on enterprise data. But here’s what nobody tells you upfront: every knowledge graph converges on the same patterns, the same solutions. This series reveals why RDF isn’t just one option among many — it’s the natural endpoint of knowledge representation. By Post 6, you’ll see real enterprises learning this lesson at great cost — or great savings.
If you really want to continue reading and discussing this kind of drivel, go ahead. RDF, the "natural endpoint of knowledge representation" - right. As someone who worked on commercial RDF projects at the time, after two decades of RDF being pushed by a self-serving W3C and academia until around 2018 or so, let's just say I welcome people having come to their senses and gone back to working with Datalog and Prolog. Even as a target for neurolinguistics and for generation by coding LLMs, SPARQL sucks because of its idiosyncratic, design-by-committee nature compared to the minimalism and elegance of Prolog.
zekrioca|5 months ago
The author mentions RDF a couple dozen times but doesn't define it, so:
The Resource Description Framework (RDF) is a standard model for data interchange on the web, designed to represent interconnected data using a structure of subject-predicate-object triples. It facilitates the merging of data from different sources and supports the evolution of schemas over time without requiring changes to all data consumers.
verisimi|5 months ago
> The Resource Description Framework (RDF) is a method to describe and exchange graph data. It was originally designed as a data model for metadata by the World Wide Web Consortium (W3C).
https://www.wikipedia.org/wiki/Resource_Description_Framewor...
jraph|5 months ago
The Resource Description Framework [1] is basically a way to describe resources with (subject, verb, object) predicates, where the subject is the resource being described and the object is another resource related to the subject in the way the verb defines (the verb is not necessarily a grammatical verb/action; it's often a property name).
There are several formats for representing these predicates (Turtle), database implementations, query languages (SPARQL), and there are ontologies, which are basically schemas defining how to describe resources in some domain.
It's closely related to the Semantic Web [2] vision of the early 2000s.
If you don't know about it, it's worth taking a few minutes to study. It sometimes surfaces and it's nice to understand what's going on, it can give good design ideas, and it's an important piece of computer history.
It's also the quiet basis for many things: OpenGraph [3] metadata tags in HTML documents are basically RDF, for instance. (TIL about RDFa [4], by the way; I had always seen these meta tags as very RDF-like, for good reason indeed.)
[1] https://en.wikipedia.org/wiki/Resource_Description_Framework
[2] https://en.wikipedia.org/wiki/Semantic_Web
[3] https://ogp.me/
[4] https://en.wikipedia.org/wiki/RDFa
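The OpenGraph point can be made concrete: each og: meta tag is effectively a triple whose subject is the page itself. A stdlib-only sketch (the HTML snippet and page URL are invented; the ogp.me namespace is real):

```python
# Turn og: meta tags into (subject, predicate, object) triples,
# with the page URL as the subject.
from html.parser import HTMLParser

OG = "https://ogp.me/ns#"  # the OpenGraph namespace

class OGExtractor(HTMLParser):
    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property", "").startswith("og:"):
            predicate = OG + a["property"][len("og:"):]
            self.triples.append((self.page_url, predicate, a.get("content")))

html = '<html><head><meta property="og:title" content="Example Page"/></head></html>'
p = OGExtractor("https://example.org/page")
p.feed(html)
print(p.triples)
# [('https://example.org/page', 'https://ogp.me/ns#title', 'Example Page')]
```

This is essentially what an RDFa processor does, generalized beyond the og: prefix.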
vixen99|5 months ago
We meet this casual use of acronyms all too often on HN. It only takes a line or two to enable everyone to follow along without recourse to a search expedition.
epolanski|5 months ago
For the interested: Resource Description Framework.
ricksunny|5 months ago
Five times in that article he says some version of "accuracy triples".
What does that even mean? Suppose something 97% accurate became 99.5% accurate - how can we talk of accuracy doubling or tripling in that context? The only way I could see it working is if the accuracy went from, say, 1% to 3%, or 33% to 99%, which are not realistic values in the LLM case.
(And I'm writing as a fan of knowledge graphs.)
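The arithmetic here can be checked directly (my own illustration, not from the thread): accuracy is capped at 100%, so "triples" only parses from a low baseline, whereas error-rate reduction stays meaningful at any baseline.

```python
# "Accuracy triples" only makes sense when 3x the baseline fits under 100%.
def tripled_accuracy_possible(baseline):
    """Can accuracy literally triple from this baseline?"""
    return 3 * baseline <= 1.0

print(tripled_accuracy_possible(0.33))  # True  (33% -> 99% just fits)
print(tripled_accuracy_possible(0.97))  # False (97% -> 291% is impossible)

# The coherent alternative framing: compare error rates. Going from 97% to
# 99.5% accurate barely moves accuracy but cuts errors sixfold (3% -> 0.5%).
error_reduction = (1 - 0.97) / (1 - 0.995)
print(round(error_reduction, 1))  # 6.0
```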