top | item 39592444

The hunt for the missing data type

690 points| todsacerdoti | 2 years ago |hillelwayne.com | reply

254 comments

[+] graphviz|2 years ago|reply
Graphviz has its own foundational graph library that isn't used by any other project. It has its good and bad points.

Based on that experience, we had our very own second-system-syndrome experience.

We decided our graph library should be modular, type safe, and efficient. (These properties came up in the comments here, too.) This is probably just a variation of "good, fast, cheap - pick any two."

By modular, we mean we want to write collections of graph algorithm libraries that are developed and even compiled independently.

By type safe, we mean we want to detect programming errors during compilation, or link time at the latest. We don't want programs to throw runtime errors like "your node does not have a color attribute".

By efficient, we mean that accessing an attribute of a graph is as cheap as a field in a C struct. (So we don't want to carry around an external hash table, or do a lot of string conversions, for instance.)

You can argue about whether these things are worth the price or even make sense, but that's what we wanted. We had some famous C++ creators in our lab, so we figured we could get help, and we were willing to give C++ another chance.

Gordon Woodhull, who had been an intern and kept working for us, is a brilliant programmer, and wrote an implementation of this kind of graph library in templated C++. It's even published at https://www.dynagraph.org/ via SourceForge. The rest of us were not really sure we could ever understand how the code worked, so we had a code review with said famous C++ inventors. There were a lot of screens of code, and chin-stroking, and the greybeards pronounced "That would probably work." We knew we might have gone over the complexity cliff. (Let's not even talk about compile-time template errors, where one error fills an entire screen with details that only a... C++ inventor could love.) It was our fault, not anyone else's, and Gordon kept plugging away and even made all the dynamic graph layout stuff work, in Microsoft OLE, too. In hindsight it was probably our own Project Xanadu. While we got lost in this, a lot of things like Gephi (Java) and NetworkX and NetworKit (Python) happened.

Also, John Ellson, a very talented software engineer who had written parts of Graphviz, revitalized the main effort.

[+] obi1kenobi|2 years ago|reply
As someone who did a lot of work with graphs, "why don't programming languages have a built-in graph data type?" is a question I’ve been asked a million times.

I'm thrilled I'll be able to point folks to a much more in-depth analysis like the one here, instead of saying some variation of "it's really hard to do it well" and having them just take my word for it.

[+] dataflow|2 years ago|reply
> "why don't programming languages have a built-in graph data type?"

What I find a little funny about that question is that people miss the fact that there isn't even a tree data structure in most languages. Most languages have static arrays, dynamic arrays, linked lists, and... that's it as far as structural types go. Everything else (BSTs, hashtables, etc.) is some semantic abstraction that hides some capabilities of the underlying structure, not a purely structural representation.

[+] jjice|2 years ago|reply
I used to think that, since graphs are such a broad data structure that can be represented in different ways depending on requirements, it just made more sense to implement them at a domain-ish level (the article mentions this in the "There are too many implementation choices" section).

Then I saw Petgraph [0], which was the first time I had really looked at a generic graph library. It's very interesting, but I still implement graphs at the domain level.

[0] https://github.com/petgraph/petgraph

[+] paulddraper|2 years ago|reply
> "it's really hard to do it well"

More importantly, there are a lot of tradeoffs.

Virtually every language offers a hash map. You can roll your own to outperform it in an individual circumstance, but the default works pretty well.

You can't really do that with a graph. Maybe if you offered a bunch of graph types.

---

PS. Bit of trivia: Java's HashMap is a bit different from almost every other language in that it lets you tune the load factor.

[+] somat|2 years ago|reply
This is a super naive take, but I would consider the pointer to be the native graph type. What is wanted is not a graph type but the tooling to traverse graphs.
[+] sayrer|2 years ago|reply
I think it's because you have to think it through when they get too big for one machine. A lot of those so-called "NoSQL" databases are really meant to represent graph databases (think DynamoDB) in an adjacency list. I've had success with Gremlin and JanusGraph, but it's a really messy space. It's not a programming language problem in my opinion, but a distributed systems one.
[+] atoav|2 years ago|reply
Python has one, as mentioned, but as a non-CS person I never encountered any problem in my programming life that so fundamentally called for graphs that I needed one, nor did I have so much fascination with graphs that I wanted to force them onto my problems.

I guess it is just that there are many problems that can be solved incredibly well without graphs and not that many where graphs outshine everything else so clearly that people would use them.

That being said, convince me of the opposite and I am happy.

[+] ylow|2 years ago|reply
I think this is because a graph is not a data-structure nor a data-type. It is really an abstraction.

Fundamentally, all I need to define a graph is a set of vertices v \in V and a function Neighbors(v). And that really is all that is needed for the most foundational set of graph algorithms.

Everything else is case-by-case constraints. Does A->B imply B->A? Is the node set partitionable with certain constraints? Are there colors? Labels?

To make things even more general, I can go up one level and consider the hypergraph, in which case I just have a set of vertices and a set of sets of vertices. This can be represented in a myriad of different ways depending on what you are interested in, and the (non-hyper) graph is simply a special case.

An alternative way to think about it, perhaps from the database perspective, is that it's a query optimization and indexing problem. Depending on what questions you want to ask about the graph, there will be different ways to represent the graph to answer those questions better. Just as there is not one way to represent the abstraction called "Table", there is not one way to do "Graph" either. It really depends on the questions you care about.
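That minimal definition really is enough for the foundational algorithms. A sketch (illustrative names, not any particular library): a breadth-first traversal written only against a Neighbors function, so the "graph" can be a dict, a closure over a matrix, or an implicit rule:

```python
from collections import deque

def bfs_order(start, neighbors):
    """Breadth-first traversal written against nothing but Neighbors(v)."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in neighbors(v):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

# The representation is whatever can answer Neighbors(v):
adj = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(bfs_order(1, adj.__getitem__))  # [1, 2, 3, 4]
```

Everything else (direction, labels, multiplicity) then shows up as extra structure layered on top of that one interface.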

[+] gloftus|2 years ago|reply
Yes, graphs are ubiquitous because they are so abstract. They live on the same level of abstraction as pure numbers. There are useful "numerical" libraries that exist, and by analogy I think you could say there are also useful "graphical" libraries that exist. But we don't really have "number" libraries, and we don't really have "graph" libraries, because those concepts are a bit too abstract to write APIs against.
[+] dataflow|2 years ago|reply
> Fundamentally, all I need to define a graph is a set of vertices v \in V and function Neighbors(v).

Even that is severely overconstrained. It doesn't allow multiple edges to the same neighbor!
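One way to patch that particular gap (a toy sketch, not a full multigraph design): let Neighbors(v) return a multiset rather than a set, so parallel edges survive:

```python
from collections import Counter

# A multigraph's Neighbors(v) can return a multiset (Counter),
# so parallel edges to the same neighbor are preserved.
multi_adj = {
    "a": Counter({"b": 2, "c": 1}),  # two parallel edges a->b
    "b": Counter(),
    "c": Counter(),
}

def out_degree(v):
    # Sum of multiplicities, counting parallel edges separately.
    return sum(multi_adj[v].values())

print(out_degree("a"))  # 3
```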

[+] matheusmoreira|2 years ago|reply
> hypergraph

> I just have a set of vertices

> and a set of sets of vertices

Sounds kind of like a file system to me. Files are the vertices. Directories are the nestable sets of vertices.

[+] rhelz|2 years ago|reply
Ya, the central obstacle is that:

1. for simple and small graph problems, a simple vector-of-vectors adjacency list is easy enough to code up.

2. For complex and huge graph problems, the only way to get performant solutions is to tailor the graph implementation to the specific details of the problem to be solved.

And it's hard to see what kind of language support would help, other than a super-smart compiler that could analyze the code and determine whether an adjacency list, matrix, 3D array, etc. was the best way to implement it. That's the kind of optimization we won't see in compilers for a while.

It's another instance of the phenomenon Stroustrup noticed: we are really good at code sharing of small things like vectors, and of large things like operating systems. It's the middle-sized problems we are bad at.
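Point 1 is easy to believe once you write it down; a vector-of-vectors adjacency list is a handful of lines (an illustrative sketch, not a library):

```python
# A vector-of-vectors adjacency list: vertex i's out-neighbors live in g[i].
def make_graph(n_vertices: int) -> list[list[int]]:
    return [[] for _ in range(n_vertices)]

def add_edge(g: list[list[int]], u: int, v: int) -> None:
    g[u].append(v)  # directed; append to g[v] as well for undirected

g = make_graph(4)
add_edge(g, 0, 1)
add_edge(g, 1, 2)
add_edge(g, 1, 3)
print(g)  # [[1], [2, 3], [], []]
```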

[+] dustingetz|2 years ago|reply
Electric Clojure uses Clojure itself (s-expressions) as a graph authoring syntax, using a macro to reify dataflow through a reactive client/server system (here the use case is full stack user interfaces but the idea generalizes) https://github.com/hyperfiddle/electric (I'm the founder).

IMO, the answer to the question "Where are all the graph types?" is: the graph authoring DSL needs to express scope, control flow and abstraction, which essentially makes it isomorphic to a programming language, freed of its evaluation model. In Python and Typescript, embedding a complete programming language is something that's rather hard to do!

Also see my blog post "Four problems preventing visual flowchart programming from expressing web applications" https://www.dustingetz.com/#/page/four%20problems%20preventi...

[+] hobofan|2 years ago|reply
I think the post mostly answers the question "why are graph _algorithms_ not better supported in programming languages", with a focus that is much more on "big data" graph processing than on graph support in general.

I think if you look at graph support in general you are also looking at wider questions, like "why are OGMs (Object Graph Mappers) not as popular as ORMs" and "why is JSON so prevalent while RDF (or another low-level graph serialization) isn't"?

And I think in the end it comes down to historical reasons (RDF emerged a bit too early, never evolved, and accrued an ecosystem of horrible academic standards and implementations), and to graphs having just a smidge more inherent complexity in implementation and learning curve, which just doesn't scale well across many developers.

------

I also wouldn't put too much weight on the "Graph Querying Language" part of the article. It sadly reads exactly like the marketing copy you would get from Neo4j or SPARQL enthusiasts who haven't tried building a product on top of them.

> The main difference between all GQLs and SQL is that the “joins” (relationships) are first-class entities.

Joins are first-class entities in SQL. They even have their own keyword (hint: it starts with J and ends with OIN) ;)

If you go down to the lower levels of any graph query language and look at its query plans, you'll notice there isn't any meaningful difference from those you'll find for an SQL query. The standardization of GQL[0] as an SQL extension is evidence for that.

> In SPARQL relationships are just edges, making the same query easy.

SPARQL is easy if you need to do exact path traversals. If you try to do anything sophisticated with it (like you would in the backend of a webapp), you'll quickly run into footguns like joins with unbound values, where you accidentally join your whole result set away.

[0]: https://en.wikipedia.org/wiki/Graph_Query_Language

[+] criloz2|2 years ago|reply
Graph drawing tools are also very underwhelming. They work pretty well for small graphs, but once you have something like 500 nodes or more, their output becomes completely incomprehensible, or at least very difficult to look at. They miss the ability to automatically organize those graphs into hierarchical structures and provide a nice interface to explore them; we are used to everything around us having some kind of hierarchy. I think that is the same kind of problem that will need to be solved in order to have a generic graph data type. This kind of thing will also need to be implemented at the compiler level, where generic graph algorithms are adapted to the generated hierarchy of structures; and if you add a theorem prover that can check that certain subgraphs will always have a certain structure, you can statically generate those procedures, while for the other graphs those methods are generated dynamically at runtime.

So whoever solves the problem of generic graph drawing will have the ability, or the insight, to implement this too.

[+] kjqgqkejbfefn|2 years ago|reply
>Graph drawing tools

It's hard

A Graphviz-like generic graph-drawing library, with more options and more control:

https://eclipse.dev/elk/

Experiments by the same team responsible for the development of ELK, at Kiel University

https://github.com/kieler/KLighD

Kieler project wiki

https://rtsys.informatik.uni-kiel.de/confluence/display/KIEL...

Constraint-based graph drawing libraries

https://www.adaptagrams.org/

JS implementation

https://ialab.it.monash.edu/webcola/

Some cool stuff:

HOLA: Human-like Orthogonal Network Layout

https://ialab.it.monash.edu/~dwyer/papers/hola2015.pdf

Confluent Graphs demos: makes edges more readable.

https://www.aviz.fr/~bbach/confluentgraphs/

Stress-Minimizing Orthogonal Layout of Data Flow Diagrams with Ports

https://arxiv.org/pdf/1408.4626.pdf

Improved Optimal and Approximate Power Graph Compression for Clearer Visualisation of Dense Graphs

https://arxiv.org/pdf/1311.6996v1.pdf

[+] samatman|2 years ago|reply
Some algorithms do better at this than others, but "make a good diagram of a graph" is an intelligence-complete problem in the general case. Two people might render structurally-identical graphs in very different ways, to emphasize different aspects of the data. This is in fact a similar problem to the "generic graph algorithm" and "generic graph data structure" problems.

Graphs straddle the line between code and data. For instance, any given program has a call graph, so in a real sense, the "generic graph algorithm" is just computation.

[+] nine_k|2 years ago|reply
Ideal things are often tree-like. Real-world structures are usually DAGs if they are nice and well-behaved.

Making things planar, or almost planar with few crossings and nice clustering of related nodes, is usually hard past a couple dozen nodes :(

[+] hobofan|2 years ago|reply
> we are used that everything around us have some kind of hierarchy

I think the problem is more that we are used to the illusion/delusion that everything is hierarchical. The problem we then encounter with graph drawing is that it has to try to reconcile the fact that things in practice are rarely truly hierarchical, and it's hard to draw those lines of where the hierarchies are with mathematical rigor. And that problem gets worse and worse the fewer properties you are allowed to assume about the underlying graph structure (connectedness, cyclic/acyclic, sparse/dense).

In practice, when you want to build a UI that interacts with graphs, it's often feasible to determine/impose one or two levels of meta-hierarchy with which you can do clustering (which reduces the layout-destroying impact of hairball nodes and improves rendering performance by reducing node count) and then lay out with fCOSE (Cytoscape.js has an implementation of that).

[+] swagasaurus-rex|2 years ago|reply
The illustrations I've seen of neural networks really highlight the sheer incomprehensibility of visualizing large graphs.
[+] fatherzine|2 years ago|reply
What an awesome article. Kudos to the author!

On the core observation that "there are too many implementation choices": that is not quite right. True, the author mentions four, and there are further variations. But in practice, a library can:

1. Implement all suitable graph representations.

2. Implement algorithms tailored to the representation(s) that offer the highest performance.

3. Provide transformations from one representation to another. This is O(#representations) work, trivial to implement and trivial to use. Quite a fair workload for both maintainers and users.

4. Bonus, provide import / export transformations from / to common standard library datatypes and idioms.

Memory and transformations are cheap; 99% of use cases would likely find the overhead of transforming data, in both RAM and CPU, negligible.
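Point 3 really is mechanical; a sketch of such conversions, each O(V + E), with names made up for illustration:

```python
# Conversions between three common graph representations,
# so each algorithm can run on whichever layout suits it.
def adj_to_edges(adj: dict) -> list:
    return [(u, v) for u, nbrs in adj.items() for v in nbrs]

def edges_to_adj(edges, vertices) -> dict:
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
    return adj

def adj_to_matrix(adj: dict, index: dict) -> list:
    n = len(index)
    m = [[0] * n for _ in range(n)]
    for u, nbrs in adj.items():
        for v in nbrs:
            m[index[u]][index[v]] = 1
    return m

adj = {"a": ["b"], "b": ["c"], "c": []}
edges = adj_to_edges(adj)               # [('a', 'b'), ('b', 'c')]
assert edges_to_adj(edges, adj) == adj  # round-trips
```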

Edit: "the harsh truth of working at Google is that in the end you are moving protobufs from one place to another." -- https://news.ycombinator.com/item?id=20132880

[+] josephg|2 years ago|reply
Sounds like the makings of a huge library that I’m not sure I’d even use in my work. I use graphs heavily in my work, and my experience matches the people the author interviewed.

We always end up reimplementing graphs because:

- Performance matters, and no off the shelf graph library I’ve seen can take advantage of many of the regularities in our particular data set. (We have an append-only DAG which we can internally run-length encode because almost all nodes just have an edge pointing to the last added item).

- I haven’t seen any generic graph library which supports the specific queries I need to make on my graphs. The big one is a subgraph diffing function.

- Writing something custom just isn’t much work anyway! Graphs are way simpler to reimplement than btrees. You can have a simple graph implementation in tens of lines. Our highly optimised library - with all the supporting algorithms - is still only a few hundred lines of code.

I think it would be handy to have ways to export the data into some standard format. But eh. I think pulling a library in for our use case would add more problems than it would solve.

[+] jkaptur|2 years ago|reply
I've often wondered about a missing application: "Excel for graphs".

Just like Excel for tabular data, it would support RAM-sized data (enough to require a computer, but not so much that you need a data center), implement lots of algorithms and visualizations "well enough", and require no programming skill to operate.

As the article says, a lot of our real-world problems are graph problems - why are programmers the only ones who should have the tools to solve them?

[+] matusp|2 years ago|reply
Yeah, I feel like the article is too quick with its conclusions. Many other problems can be made arbitrarily complex and difficult with additional requirements. But there are still data structures and standard libraries that provide a good enough experience for most use cases. And if you need something extra spicy, you build a custom solution.

The article claims that graphs are often just too big, and yeah, if you ask people who are actively working on graph algorithms, they might well have that sort of experience. But most programmers and users probably only work with really small graphs.

[+] BlindEyeHalo|2 years ago|reply
> As the article says, a lot of our real-world problems are graph problems - why are programmers the only ones who should have the tools to solve them?

I think programmers and mathematicians are the only ones that model these problems as graphs. I doubt a casual person sees graphs in random real world problems.

And something I learned working in big corporations: everything can be an Excel spreadsheet if you try hard enough.

[+] rdtsc|2 years ago|reply
> There’s a gap between how often software engineers could use graphs and how little our programming ecosystems support them. Where are all the graph types?

They've been there for quite a while :-) https://www.erlang.org/doc/man/digraph.html https://www.erlang.org/doc/man/digraph_utils

And if you want to do some set theoretical stuff you're covered as well: https://www.erlang.org/doc/man/sofs.html

[+] DylanSp|2 years ago|reply
Erlang's briefly mentioned at the end of the article:

> There are two other languages I found with graph types: Erlang and SWI-Prolog. I don’t know either language and cannot tell when they were added; with Erlang, at least, it was before 2008. I reached out to a person on the Erlang core language committee but did not hear back.

[+] ogogmad|2 years ago|reply
How flexible and performant is that in different situations?
[+] rb-2|2 years ago|reply
I wonder if it would be possible to mathematically define (in a theorem proving language like Coq) a bunch of accessor methods as well as a bunch of implementation primitives and then "compile" a custom graph implementation with whatever properties you need for your application. Some accessor methods will be very efficient for some implementations and very inefficient for others, but every method will still be available for every implementation. Profiling your application performance can help adjust the implementation "compiler" settings.

Ironically, this is a graph problem.

[+] kevindamm|2 years ago|reply
This sounds like partial evaluation and the Futamura projection. The research around that shows that your interpreter determines the shape of the compiled output, so a formal proof of its application isn't necessary if the $mix$-equivalent has the appropriate syntax and semantics for graph processing in its design.

I know this has been done for procedural languages and for declarative logical languages, but I'm not aware of something like this specifically for graph processing and its highly specialized code generation. I wouldn't be surprised if Mix has been extended for this already; even if it has, I'm sure there is still value in it.

[+] JonChesterfield|2 years ago|reply
I think this is a worthwhile direction.

For example, I'd like to program against a sequence abstraction. When sort is applied to it, I hope it's a vector. When slice or splice, I hope it's some sort of linked structure. Size is as cheap as empty for the vector but much more expensive for a linked list.

It should be possible to determine a reasonable data representation statically based on the operations and control flow graph, inserting conversions where the optimal choice is different.

The drawback of course is that people write different programs for different data structures. Knowing what things are cheap and what aren't guides the design. There's also a relinquishing of control implied by letting the compiler choose for you that people may dislike.

As an anecdote for the latter, Clojure uses vectors for lambda arguments. I thought that was silly: since it's a Lisp that mostly works in terms of seq abstractions, why not have the compiler choose based on what you do with the sequence? The professional Clojure devs I was talking to really didn't like that idea.

[+] maxiepoo|2 years ago|reply
You can do something like this with OCaml/SML's module system.

And certainly from an abstraction point of view you can do this in any dependently typed language like Idris/Agda/Coq, but these don't have great implementations.

[+] andoando|2 years ago|reply
I've been thinking about something like this: a mathematical definition of a function such that we can search for it. Imagine we had something like "Find a function that fits this signature -> Input arr[numbers] out-> for every x in arr, x2>x1".
[+] montmorency88|2 years ago|reply
I'd highly recommend Erwig's FGL library in Haskell as a really nice example of a generally performant graph data structure that is easy to work with. The API feels like working with lists, because you are essentially consing contexts (node, neighbours) onto a list of contexts that forms your graph. Many standard graph algorithms are then built up from depth- or breadth-first search, and you can compose really succinct programs to manipulate the graph.

Graphs are labelled with any Haskell data structure, and there is a graphviz library complementary to FGL that generates dot files, which can be rendered according to the data carried in the node labels. Often in an application you want to both perform computations on your graph and render a visualization simultaneously, for the end user or for debugging, and in FGL you minimise duplication of application and rendering logic.
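For readers who don't know FGL, its inductive view can be loosely sketched outside Haskell. The names below are hypothetical and the real library is typed and far richer, but the "consing contexts" shape looks roughly like:

```python
# A loose sketch of FGL's inductive view: a graph is built by "consing"
# contexts (in-neighbors, node, label, out-neighbors) onto an existing graph.
Context = tuple  # (predecessors, node_id, label, successors)

def cons(ctx: Context, graph: list) -> list:
    return [ctx] + graph

# Build a two-node graph 1 -"a"-> 2 -"b" one context at a time.
g = cons(([1], 2, "b", []), cons(([], 1, "a", [2]), []))
labels = {node: label for _, node, label, _ in g}
print(labels)  # {2: 'b', 1: 'a'}
```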
[+] gue5t|2 years ago|reply
One interesting perspective is to view the sequence of lists -> trees -> DAGs -> general graphs as a loosening of restrictions:

- list nodes may have one child

- tree nodes may have multiple

- DAG nodes may have multiple parents though restricted by topological ordering

- graph nodes may have multiple parents from anywhere in the collection

Lists and trees can be fully captured by sum and product types, but extending this representation style to DAGs and graphs doesn't work--you either get inefficiency (for DAGs) and then infinite regress (for cyclic graphs) attempting to continue the "syntactic" style of representation, or you need to adopt an "indirect" representation based on identifiers or indices or hash consing.
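A small illustration of that cutoff: the tree type guarantees its own shape syntactically, while a cycle forces the indirect, identifier-based style:

```python
from dataclasses import dataclass, field

# Trees are "syntactic": the type itself guarantees the shape.
@dataclass
class Tree:
    value: int
    children: list["Tree"] = field(default_factory=list)

t = Tree(1, [Tree(2), Tree(3)])
print(len(t.children))  # 2

# Cyclic graphs need an "indirect" representation: nodes named by
# identifiers, edges stored separately, cycles expressed by reference.
nodes = {0: "a", 1: "b", 2: "c"}
edges = {0: [1], 1: [2], 2: [0]}  # a cycle, impossible as a finite Tree value
```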

[+] js8|2 years ago|reply
Another data type that would be quite useful is a table (like in a database). For the same reasons, too many design choices.

Anyway, that being said, I have felt that progress will be made in programming languages if the compiler gets to choose an implementation of a data structure, kinda like when a database chooses an execution plan. So you just use an abstract structure (like sequence, map, set, table, graph) and based on the program profile, the compiler will pick the specific implementation. It will also transform the structure into another isomorphic one as needed. (Some programming languages already do something like this, for example, array of structs to struct of arrays conversion.)

[+] Kamq|2 years ago|reply
> So you just use an abstract structure (like sequence, map, set, table, graph) and based on the program profile, the compiler will pick the specific implementation. It will also transform the structure into another isomorphic one as needed.

I'm so not looking forward to having to debug a sudden change in perf characteristics when one additional usage of some feature tips a heuristic over the line and an implementation gets swapped out between builds.

[+] andsens|2 years ago|reply
Devil's advocate: is this maybe a case of discarding an 80% solution because you can't do the last 20%?

I understand the constraints, but imagine how legible you could make code by replacing some key parts with a graph type that everybody knows. I honestly think that having a type that supports a small subset of possibilities and only has the simplest algorithms implemented would go a long way.

[+] boothby|2 years ago|reply
It isn't just an 80/20 problem. Imagine that you replace every linked list, binary tree, trie, etc., with a generic directed graph datatype. The resulting performance would be abysmal. The resulting notation would be horrid, too.

Here's our nice linked list:

  def last_element(ll):
      last = ll
      while ll is not None:
          last = ll
          ll = ll.next
      return last
And here's an implementation with generic graph notation:

  def last_element(g):
      for v, deg in g.out_degree:
          if deg == 0:
              return v
      return None
There are several problems with this; most importantly, there can be silent failures when g is not a linked list. But it also throws out a useful abstraction where a list is equivalent to a node, so I wrote a horrid implementation that takes O(n) regardless of the position in the list. And then comes all the baggage of representation, because you can't just represent a node with a pointer anymore.

When your data structure better reflects the, well, structure of your data, you can go faster and safer. There's a reason we teach undergrads about these specific datatypes and don't just sweep it all under a rug with "it's a graph!"

[+] bigbillheck|2 years ago|reply
I think it's more like discarding a 20% solution that can't do the last 80%.
[+] skrebbel|2 years ago|reply
I think most languages support representing many kinds of graphs very well, particularly directed graphs without data on the edges: with objects, lists, and object references (or pointers).

A tree is a graph. A typical Java-style object composing other objects, composing other objects again, etc., often with cycles and parent backreferences and whatnot, is a graph. The HTML DOM is a graph.

I recognize that these are often very tree-like, which feels like cheating in the same way that saying "well, a list is also a graph!" is. But given that cycles are common enough that serializers (e.g. JSON.stringify) need to special-case them, I think maybe this is simply not true, and they're really just graphs. Very few tree-like class structures tend to remain pure trees.

The only thing missing from references/pointers to be able to represent what the author is looking for is having data on the edges. I think this is trivially solvable by putting a node halfway along the edge (= adding a level of indirection, an operation so common that we don't even think of it as "adding data to the edges").
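The "node halfway along the edge" trick, sketched with hypothetical types:

```python
from dataclasses import dataclass

@dataclass
class City:
    name: str
    roads: list

# An edge with data is just an intermediate object holding
# the payload plus a reference onward: the indirection node.
@dataclass
class Road:
    distance_km: float
    to: City

berlin = City("Berlin", [])
amsterdam = City("Amsterdam", [Road(650.0, berlin)])
print(amsterdam.roads[0].distance_km)  # 650.0
```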

So I think the answer is that there’s no explicit data structure named “graph” because the basic building block of composition in nearly every language (reference/pointer) is an edge, and the basic building block of data representation (objects/structs/records) is a node. So for most graphs, trying to pour it all into some fancy Graph<V, E> datastructure feels like needless complexity.

[+] e_y_|2 years ago|reply
Back in the C days, it was common to not use generic data structures like lists; instead you'd have a next_item pointer as a field in the struct. For linked lists, this would save you from needing another memory allocation or wrapper struct, and since C doesn't have templates you'd either have to blindly cast the type or use macros or write a type-specific iterator anyways.

Lists eventually became a standard language feature in C++ and other languages, but it's trickier for trees and graphs. Taking the DOM example, you might be searching through child elements (div, span, etc) or nodes (text nodes, comment nodes) and different operations might only work with a specific subset of the "edges". There might be pointers to other objects, like from a DOM node to accessibility tree node. You might even have multiple parent node pointers, such as a pointer that takes you to the nearest shadow root or something.

Since there are multiple ways to traverse the same data structure, generic functions don't work on it. You could create a separate tree/graph for each thing you want to use it for, but that takes additional memory and has to be updated when the original struct changes. Or you could create some kind of adapter that has a get_edges() function, but this might not be very well optimized or might be clunky for many other reasons. So it usually just ends up being simpler rolling your own functions instead of using a library.
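The adapter idea can be sketched like this (hypothetical names): two adapters expose different edge subsets of the same structure, so one generic routine serves both traversals, at the cost of a level of indirection per step:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)

# Two adapters over the *same* structure expose different edge subsets.
def all_children(n):
    return n.children

def element_children(n):  # skip text/comment nodes, say
    return [c for c in n.children if not c.tag.startswith("#")]

def count(node, get_edges):
    # One generic routine works with any adapter.
    return 1 + sum(count(c, get_edges) for c in get_edges(node))

root = Node("div", [Node("#text"), Node("span", [Node("#comment")])])
print(count(root, all_children))      # 4
print(count(root, element_children))  # 2
```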

[+] taeric|2 years ago|reply
A graph, as presented in this article, is a model of something else. That we have more than one way to implement this model is rather natural, all told. The hope that we can have a singular syntax and data structure that represents a graph in code is almost exactly why the author of Java's Linked List posted this gem: https://twitter.com/joshbloch/status/583813919019573248

My favorite point is that having a linked list where the node is first-class in your code is almost precisely the problem: you rarely want/need to work at that level. In a very real sense, objects that hold other objects are already trees of data. Many can back-reference, such that you then have a graph.

And then there is the joy of trying to use matrix operations to work with graphs. You can do some powerful things, but at that point, you almost certainly want the matrix to be the abstraction.

Excited to see someone come up with good things in this idea. I retain very serious doubts that I want a singular model for my data.

[+] superlopuh|2 years ago|reply
This is basically my PhD thesis proposal. I don't think there's any fundamental technological problem here, just that for a graph to be efficient to process you need high-level optimisations that take the mathematical properties of graphs into account. For that you need to either reimplement a compiler inside your framework or be integrated into an existing compiler, and both obviously take a lot of work.

Some comments here mention GraphBLAS, which is the big breakthrough in decoupling the layout of the graph from an efficient implementation of an algorithm, but none mention MLIR-GraphBLAS [0] which is the most promising integration into a compiler that I've seen.

I still think it's early days, I wouldn't throw in the towel quite yet :)

[0]: https://mlir-graphblas.readthedocs.io/en/latest/index.html

[+] platz|2 years ago|reply
How will I have any expectations of run-time behavior if I have to hope that my graph will fuse or fail to fuse at run time?

Reminds me of the issues Haskell programmers face when an innocuous change causes list fusion to fail, tanking performance. To know how to coax the compiler to fuse again, you have to have intimate knowledge of how that fusion process works, which isn't visible in the API; you need knowledge of compiler implementation/behavior.

Programmers do not like this kind of instability.

[+] Ptitselov|2 years ago|reply
If you code in .NET, please try my graph library Arborescence [1]. It's small and not very feature-rich. However, I believe it provides a good separation between abstractions [2], algorithms, and data structures. Regarding the points mentioned in the article:

- you can use edges with or without their own identity,

- you can use implicit graphs unfolded on the fly,

- you can use both adjacency (out-neighbors) and incidence (out-edges + head) interfaces,

- no edge type is imposed by the library, although it does provide a basic tail-head pair structure as a utility.

[1] https://github.com/qbit86/arborescence

[2] https://github.com/qbit86/arborescence/tree/develop/src/Arbo...

[+] kajic|2 years ago|reply
One of the complications described by the author is performance. Personally, I find stdlib graph libraries extremely useful even when their performance is poor, because my dataset is often small enough, and even if performance turns out to be an issue, first spending time on the problem with an underperforming graph library is a very worthwhile exercise before trying to write my own optimized implementation. By analogy, many programming languages are far from being the fastest, but they can nevertheless be very useful.

That said, I’m not surprised performance came up in interviews with experts; they probably have tons of interesting performance-related stories to tell from their extensive work on graphs.