For research, I built an experimental RDF store on top of Parquet and Apache Spark for querying big graphs[1].
It converts the RDF graph into a sort of property table: one row per entity, with one column for each property that appears in the dataset.
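As a rough illustration (plain Python, not the actual PRoST code, and with made-up sample triples), the conversion pivots (subject, predicate, object) triples into one wide row per subject, so a star-shaped query becomes a row scan instead of self-joins over a triple store:

```python
# Hypothetical sample triples, not taken from the PRoST repository.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "age", "34"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "worksAt", "acme"),
]

# One column per predicate occurring anywhere in the graph.
predicates = sorted({p for _, p, _ in triples})

def to_property_table(triples, predicates):
    """Pivot triples into one row per subject; absent properties stay None.
    A columnar format like Parquet stores those NULL runs very compactly."""
    rows = {}
    for s, p, o in triples:
        rows.setdefault(s, {pred: None for pred in predicates})[p] = o
    return rows

table = to_property_table(triples, predicates)

# A star query ("name and age of every subject that knows someone")
# is now a filter over rows, with no join needed:
result = [
    (row["name"], row["age"])
    for row in table.values()
    if row["knows"] is not None
]
print(result)  # [('Alice', '34')]
```

The NULL-heavy wide rows look wasteful in a row store, which is exactly why the columnar layout below matters.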
The trick is to use a columnar format with the proper encoding (in our case Parquet) to cope with the very large number of columns and the many NULL values. With this representation we can eliminate costly joins for most common queries, and also reduce the size of the joins that remain necessary.

[1] PRoST
https://github.com/tf-dbis-uni-freiburg/PRoST