pickpuck | 3 months ago
Just like here, you could get a timeline of key events, a graph of connected entities, and links to original documents.
Newsrooms might already do this internally idk.
This code might work as a foundation. I love that it's RDF.
VikingCoder|3 months ago
Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus
darth_aardvark|3 months ago
throwaway290|3 months ago
jandrewrogers|3 months ago
These general data models start to become useful and interesting at around a trillion edges, give or take an order of magnitude. A mature graph model would be at least a few orders of magnitude larger, even if you aggressively curated what went into it. This is a simple consequence of the cardinality of the different kinds of entities that are included in most useful models.
No system described in open source can get anywhere close to even the base case of a trillion edges. They will suffer serious scaling and performance issues long before they get to that point. It is a famously non-trivial computer science problem and much of the serious R&D was not done in public historically.
This is why you only see toy or narrowly focused graph data models instead of a giant graph of All The Things. It would be cool to have something like this but that entails some hardcore deep tech R&D.
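For a rough sense of that scale, here is a back-of-the-envelope sketch. The 16 bytes per edge is purely an assumption (two 64-bit vertex IDs, with no properties, indexes, or replication), not a figure from any real system:

```python
# Rough storage estimate for the "trillion edges" base case.
EDGES = 10**12        # one trillion edges
BYTES_PER_EDGE = 16   # assumption: two 64-bit vertex IDs, nothing else

raw_bytes = EDGES * BYTES_PER_EDGE
raw_terabytes = raw_bytes / 10**12  # decimal terabytes

print(f"{raw_terabytes:.0f} TB for the bare edge list")
```

Under that assumption the bare edge list alone is about 16 TB, before any of the indexing needed to actually traverse it.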
michelpp|3 months ago
To plug my project: I've wrapped the SuiteSparse GraphBLAS library in a Postgres extension [1] that fluidly blends algebraic graph theory with the relational model. The main flow is to use SQL to structure complex queries for starting points, then use GraphBLAS to flow through the graph to the endpoints, then join back to tables to get the relevant metadata. On cheap Hetzner hardware (64-core AMD EPYC) we've achieved 7 billion edges per second for BFS over the largest graphs in the SuiteSparse collection (~10B edges). With our CUDA support we hope to push that kind of performance into graphs with trillions of edges.
[1] https://github.com/OneSparse/OneSparse
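For readers unfamiliar with the algebraic view of BFS, here is a minimal pure-Python sketch of the idea. It is not OneSparse or GraphBLAS code; `bfs_levels` and the toy edge list are hypothetical names made up for illustration. Each step expands the frontier with what amounts to a Boolean vector-matrix product over the adjacency structure, then masks out vertices already visited:

```python
from collections import defaultdict

def bfs_levels(edges, source):
    """Level-synchronous BFS written as repeated frontier-times-adjacency steps."""
    # Sparse adjacency "matrix": row vertex -> set of column vertices.
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)

    levels = {source: 0}   # vertex -> BFS depth
    frontier = {source}    # sparse Boolean vector of the current level
    depth = 0
    while frontier:
        depth += 1
        # Boolean (OR, AND) vector-matrix product:
        # the union of out-neighbors of every frontier vertex.
        reached = set()
        for node in frontier:
            reached |= adj.get(node, set())
        # Mask: keep only vertices that have not been visited yet.
        frontier = {node for node in reached if node not in levels}
        for node in frontier:
            levels[node] = depth
    return levels

# Toy graph, invented for this example.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(bfs_levels(toy_edges, "a"))
```

In the workflow described above, the starting vertices would come from a SQL query, and the resulting vertex set would be joined back to tables for metadata.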
babelfish|3 months ago
stevage|3 months ago
That is a wild claim. Perhaps for some very specific definition of "useful and interesting"? This dataset is already interesting (hard to say whether it's useful) at a much tinier scale.
mmooss|3 months ago
Could you point us to any public research on this issue? Or the history of the proprietary research? Just the names might help - maybe there are news articles, it's a section in someone's book, etc.
theteapot|3 months ago
Aren't LLMs something like this?
afavour|3 months ago
https://developer.nytimes.com/docs/semantic-api-product/1/ov...
The Guardian has similar:
https://open-platform.theguardian.com/documentation/tag
Either or both could be an interesting starting point for something like that. I tried to find something for the BBC and was surprised they didn’t have anything. I would have figured public media would have been a great resource for this.
pjc50|3 months ago
ggm|3 months ago
That said, some networks with paths shorter than 6 are interesting. Right now, there's a 1:1 direct path from these documents to a bunch of people with an interest in confounding what evidentiary value they have in justice processes. That's more interesting to me than what the documents say right now.
johongo|3 months ago
Centigonal|3 months ago
https://www.gdeltproject.org/
pbronez|3 months ago
scotty79|3 months ago
FanaHOVA|3 months ago
j-pb|3 months ago
axus|3 months ago
PaulHoule|3 months ago
cjohnson318|3 months ago
arthurcolle|3 months ago
UK: https://github.com/gchq/Gaffer
US: https://github.com/NationalSecurityAgency/lemongraph
dboreham|3 months ago
abnercoimbre|3 months ago
fancy_pantser|3 months ago