I considered different ways of doing the comparison and decided to go with a simple one on a sample dataset, just to get a feel for it. I did consider doing it all in Python, but that would not be fair to Memgraph. It also depends on the query being run. I could have run a much more complicated query that would give better results, but then again that wouldn’t be fair either.
If I removed the time spent filling the digraph, only the pure algorithm time would be measured. But the main difference between NetworkX and Memgraph is that Memgraph offers persistence, while NetworkX always has to load the graph into memory. What the best way to do a true benchmark would be, and on what kind of dataset, is open to further discussion. I did not go into the details of the graph type here, but there are certainly cases where Memgraph outperforms NetworkX at a much larger scale and on certain graph types. I didn’t claim that we are 5 times faster in every case, just in this particular one.
When I do a proper benchmark in the future, I will make sure to be as fair as possible to both sides, and of course to show more clearly when to use Memgraph and when to use NetworkX, since it all depends on your needs.
Also, thanks for reading it, it means a lot to hear such a comment. I get to learn from it too :)
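To make the load-time versus algorithm-time distinction concrete, here is a minimal sketch of timing the two phases separately. It is not the code from the blog post: the helper names (`build_graph`, `pagerank`, `timed`, `benchmark`) are hypothetical, the edge list is random, and a plain-Python power iteration stands in for whichever library's PageRank is actually being measured.

```python
import random
import time


def build_graph(edges):
    """Load phase: turn an edge list into adjacency lists (successors per node)."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])  # make sure sink nodes are present too
    return graph


def pagerank(graph, damping=0.85, max_iter=100, tol=1e-6):
    """Algorithm phase: stand-in power-iteration PageRank (not a library call)."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-edges) is spread evenly over all nodes.
        dangling = sum(rank[v] for v, out in graph.items() if not out)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in graph}
        for node, out in graph.items():
            if out:
                share = damping * rank[node] / len(out)
                for dst in out:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in graph)
        rank = new_rank
        if delta < tol:
            break
    return rank


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def benchmark(edges):
    """Time the load phase and the algorithm phase separately."""
    graph, load_s = timed(build_graph, edges)
    ranks, algo_s = timed(pagerank, graph)
    return ranks, load_s, algo_s


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sample graph is reproducible
    edges = [(rng.randrange(1000), rng.randrange(1000)) for _ in range(5000)]
    ranks, load_s, algo_s = benchmark(edges)
    print(f"load: {load_s:.4f}s  pagerank: {algo_s:.4f}s  total: {load_s + algo_s:.4f}s")
```

Reporting both numbers side by side lets a reader judge the tools either on algorithm time alone or on end-to-end time including the load, whichever matches their use case.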
zwaps|3 years ago
Indeed, there are some points in your reply that I think would fit better in a NetworkX vs. Memgraph comparison post.
That being said, your blog post is titled "Who ranks better?" and it is mainly about the speed of running PageRank.
NetworkX is a no-frills Python package that is much easier to use and experiment with. Outperforming NetworkX in speed is not really a feat; however, any new network package should certainly do so, and do so with NetworkX on an equal footing. For instance, igraph outperforms NetworkX in PageRank by 20 times, and graph-tool by over 50 times (without load times)!
And I am sure Memgraph can do so as well; it's just that this blog post doesn't seem to conclusively demonstrate that fact.
It would make little sense to me to use NetworkX as a tool to load data from Memgraph. And to be honest, using this triple (quadruple) Python list operation and not even using the NumPy-based performance that NetworkX does offer (little as it may be) just doesn't seem right.
I understand that Memgraph has other advantages, like persistence; however, in that case NetworkX is simply not a good comparison. If that's the focus, why not query a local Neo4j? That's going to be a pretty speedy PageRank as well, and an interesting challenge.
All in all, I am sure Memgraph performs great, and I am looking forward to other comparisons in the future!
katelatte|3 years ago