
Ninjaneered | 5 years ago

For reference, this is from the same developer [1] who created Semantic MediaWiki [2] and led the development of Wikidata [3]. Here's a link to the white paper [4] describing Abstract Wikipedia (and Wikilambda). Considering the success of Wikidata, I'm hopeful this effort succeeds, but it is pretty ambitious.

[1] https://meta.wikimedia.org/wiki/User:Denny

[2] https://en.wikipedia.org/wiki/Semantic_MediaWiki

[3] https://en.wikipedia.org/wiki/Wikidata

[4] https://arxiv.org/abs/2004.04733



O_H_E|5 years ago

Damn. Big kudos to Denny.

And to all the other people doing awesome work but not on the top of HN.

gcbw3|5 years ago

Considering the close relationship between Google and Wikimedia https://en.wikipedia.org/wiki/Google_and_Wikipedia and the considerable money Google gives them, how can one not see this project as "crowdsourcing better training data sets for Google"?

Can the data be licensed as GPL-3 or similar?

9nGQluzmnq3M|5 years ago

As a long-time Wikipedian, this track record is actually worrisome.

Semantic MediaWiki (which I attempted to use at one point) is difficult to work with and far too complicated and abstract for the average wiki editor. (See also Tim Berners-Lee and the failure of the Semantic Web.)

WikiData is a seemingly genius concept -- turn all those boxes of data into a queryable database! -- kneecapped by academic but impractical technology choices (RDF/SPARQL). If they had just dumped the data into a relational database queryable by SQL, it would be far more accessible to developers and data scientists.

mmarx|5 years ago

> WikiData is a seemingly genius concept -- turn all those boxes of data into a queryable database! -- kneecapped by academic but impractical technology choices (RDF/SPARQL). If they had just dumped the data into a relational database queryable by SQL, it would be far more accessible to developers and data scientists.

Note that the internal data format used by Wikidata is _not_ RDF triples [0], and it's also highly non-relational, since every statement can be annotated with a set of property-value pairs; the full data set is available as a JSON dump. The RDF export [1] (there are actually two; I'm referring to the full dump here) maps this to RDF by reifying statements as RDF nodes. If you wanted to end up with something queryable by SQL, you would also need to resort to reification – but then SPARQL is still the better choice of query language, since it lets you easily write path queries, whereas WITH RECURSIVE at the very least makes the equivalent SQL quite clunky.

[0] https://www.mediawiki.org/wiki/Wikibase/DataModel

[1] https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Fo...
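As a sketch of the point about path queries (using a made-up edge table, not Wikidata's actual schema or identifiers): the transitive traversal that SPARQL expresses as a one-line property path needs a recursive CTE in SQL.

```python
# Sketch only. In SPARQL, "all transitive subclasses" is a one-line
# property path, e.g.:
#   SELECT ?c WHERE { ?c wdt:P279* wd:Q7377 . }
# The relational equivalent needs WITH RECURSIVE, shown here with a
# toy subclass table in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subclass_of (child TEXT, parent TEXT);
    INSERT INTO subclass_of VALUES
        ('dog', 'mammal'),
        ('cat', 'mammal'),
        ('poodle', 'dog'),
        ('toy poodle', 'poodle');
""")

rows = conn.execute("""
    WITH RECURSIVE descendants(name) AS (
        SELECT 'mammal'                      -- the starting class
        UNION
        SELECT s.child                       -- follow subclass edges
        FROM subclass_of s
        JOIN descendants d ON s.parent = d.name
    )
    SELECT name FROM descendants;
""").fetchall()

print(sorted(r[0] for r in rows))
# ['cat', 'dog', 'mammal', 'poodle', 'toy poodle']
```

The query works, but every transitive property needs its own CTE boilerplate, which is the clunkiness being described.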

zozbot234|5 years ago

How do you dump general purpose, encyclopedic data into a relational database? What database schema would you use? The whole point of "triples" as a data format is that they're extremely general and extensible.
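One way to see the generality: a single subject–predicate–object table absorbs any new kind of fact without a schema change, whereas a relational design needs a new column or table per property. A minimal sketch in plain Python (the facts are illustrative, not pulled from Wikidata):

```python
# Sketch: triples need no fixed schema. Adding a fact about elevation
# next to facts about occupation is just another row; no ALTER TABLE.
triples = [
    ("Douglas Adams", "instance of", "human"),
    ("Douglas Adams", "occupation", "novelist"),
    ("Mount Everest", "instance of", "mountain"),
    ("Mount Everest", "elevation", 8848),  # a number, not a string
]

def objects(subject, predicate):
    """All objects for a given subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("Mount Everest", "elevation"))  # [8848]
```

A relational schema would have to anticipate every property up front (or fall back to exactly this kind of generic three-column table, at which point you've reinvented triples).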

LukeEF|5 years ago

RDF shouldn't be lumped in with SPARQL