top | item 42470817

disattention | 1 year ago

This is actually a good insight. It turns out that transformers are indeed a form of graph network, precisely because of the attention mechanism; graph attention networks are a very popular GNN architecture. Generally, the issue with using an LLM-style architecture on generic graphs is modeling the sparsity, but this can be handled by using the graph's adjacency matrix to mask the attention matrix. There are a number of papers and articles that address this connection, and plenty of research into mechanisms for sparsifying attention in transformers.
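To make the masking idea concrete, here is a minimal single-head sketch in NumPy (my own illustration, not from any particular paper): attention scores between non-adjacent nodes are set to -inf before the softmax, so each node only aggregates over its graph neighbors and itself.

```python
import numpy as np

def masked_attention(X, A, W_q, W_k, W_v):
    """Single-head attention over node features X (n, d), where node i
    may only attend to its neighbors in adjacency matrix A (n, n),
    plus itself. Non-edges are masked to -inf before the softmax."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # dense (n, n) scores
    mask = A + np.eye(A.shape[0])                  # always allow self-attention
    scores = np.where(mask > 0, scores, -np.inf)   # drop non-edge entries
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # rows sum to 1 over neighbors
    return weights @ V

rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
# Path graph 0-1-2-3: node 0 cannot see node 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
out = masked_attention(X, A, Wq, Wk, Wv)
```

With an all-ones adjacency matrix this reduces to ordinary dense self-attention, which is one way to see the "transformers are GNNs on complete graphs" framing.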

There are also graph tokenizers for using more standard transformers on graphs for doing things like classification, generation, and community detection.
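As a toy illustration of the tokenizer idea (my own sketch, not a specific published scheme): one simple approach is to linearize the edge list into a flat token sequence with special markers, which a standard sequence model can then consume.

```python
def tokenize_graph(edges, node_labels):
    """Illustrative graph tokenizer: serialize a labeled graph's edge
    list into a flat token sequence plus integer ids. Special tokens
    (<graph>, <edge>, <sep>) are assumptions of this sketch."""
    tokens = ["<graph>"]
    for u, v in edges:
        tokens += [node_labels[u], "<edge>", node_labels[v], "<sep>"]
    tokens.append("</graph>")
    # Build a vocabulary in first-seen order and map tokens to ids.
    vocab = {t: i for i, t in enumerate(dict.fromkeys(tokens))}
    return tokens, [vocab[t] for t in tokens]

# Path graph A-B-C serialized as a token sequence
tokens, ids = tokenize_graph([(0, 1), (1, 2)], ["A", "B", "C"])
```

Published tokenizers are considerably more sophisticated (node orderings, positional/structural encodings), but the basic move of turning graph structure into a sequence is the same.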

algo_trader | 1 year ago

Any canonical papers on GNN for code graphs?