For research, I built an experimental RDF store on top of Parquet and Apache Spark for querying big graphs[1].
It converts the RDF graph into a sort of property table: one row per entity, with one column for each property that appears in the dataset.
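As a rough illustration (plain Python, not the actual PRoST code, and with made-up sample triples), the conversion pivots (subject, predicate, object) triples into one wide row per subject, so a star-shaped query becomes a row scan instead of self-joins over a triple store:

```python
# Hypothetical sample triples, not taken from the PRoST repository.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "age", "34"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "worksAt", "acme"),
]

# One column per predicate occurring anywhere in the graph.
predicates = sorted({p for _, p, _ in triples})

def to_property_table(triples, predicates):
    """Pivot triples into one row per subject; absent properties stay None.
    A columnar format like Parquet stores those NULL runs very compactly."""
    rows = {}
    for s, p, o in triples:
        rows.setdefault(s, {pred: None for pred in predicates})[p] = o
    return rows

table = to_property_table(triples, predicates)

# A star query ("name and age of every subject that knows someone")
# is now a filter over rows, with no join needed:
result = [
    (row["name"], row["age"])
    for row in table.values()
    if row["knows"] is not None
]
print(result)  # [('Alice', '34')]
```

The NULL-heavy wide rows look wasteful in a row store, which is exactly why the columnar layout below matters.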
The trick is to use a columnar format with the proper encoding (in our case Parquet) to cope with the very large number of columns and the many NULL values. With this representation we can eliminate costly joins for most common queries, and also reduce the size of the joins that remain necessary.

[1] PRoST
https://github.com/tf-dbis-uni-freiburg/PRoST