
AI / ML / LLM / Transformer Models Timeline

100 points| vemgar | 2 years ago |ai.v-gar.de

17 comments


esafak|2 years ago

This is fine for historians, if any are studying this. What practitioners need is help in picking a model. For a given task, what are the best models, and what are their trade-offs?

If there were a database of model benchmarks, type of task represented by the benchmark, dataset used, and other salient features like number of parameters, etc. life would be easier.

If none exists, and one were motivated to create one, what's the best place to crowdsource a relational database?
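A minimal sketch of the kind of schema being described, using Python's built-in sqlite3. All table and column names here are hypothetical, just to illustrate models, benchmarks (with task type and dataset), and scores as separate relations:

```python
import sqlite3

# Hypothetical schema for a crowdsourced model-benchmark database:
# models, benchmarks (tagged with task type and dataset), and
# per-model results linking the two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE model (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    num_parameters INTEGER,   -- salient feature, e.g. parameter count
    release_year INTEGER
);
CREATE TABLE benchmark (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    task_type TEXT,           -- e.g. 'summarization', 'question answering'
    dataset TEXT
);
CREATE TABLE result (
    model_id INTEGER REFERENCES model(id),
    benchmark_id INTEGER REFERENCES benchmark(id),
    score REAL,
    PRIMARY KEY (model_id, benchmark_id)
);
""")

tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```

With results normalized this way, "best models for a given task, and their trade-offs" becomes a join over result, model, and benchmark grouped by task_type.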

kristopolous|2 years ago

Is there any good resource that is keeping up with all of this news? I want to be informed but it's so overwhelming, and each of these systems requires some fairly intense study to become competent with.

I've thought about organizing this just for myself but the cardinal rule of "if I think of it, it probably already exists and it's better" popped into my mind. Any good links? Anybody doing this already?

esafak|2 years ago

Do you really need to keep up with it? If not, do a survey when you need to know. Other than that, you should hear about important developments by osmosis through various channels.

mysterypie|2 years ago

I keep hearing that there have been significant breakthroughs in the areas of AI / ML / LLM / Transformer Models between 2012 and the present. Can someone summarize what the breakthroughs were, who was principally responsible, and which papers specifically?

This timeline has something like 60 papers and many papers have 8-30 authors. Are the breakthroughs spread out like that? Or are there one or two super important works? Sort of like Einstein's "On the Electrodynamics of Moving Bodies"?

imagainstit|2 years ago

It's been very incremental and spread out.

Another commenter pointed to "Attention is all you Need" as a particular breakthrough. Even that paper mostly merged the then-current trends in attention, normalization, and seq2seq, and doesn't yet hint at all the interesting empirical results that were later found by scaling that architecture up to other problems.

The papers with huge numbers of authors tend to be empirical results with a large element of systems/engineering work; much large-scale LLM work is like this.

nighthawk454|2 years ago

The graph helps clarify that. Look at the ones that are heavily-connected ancestors.

visarga|2 years ago

It's a summary of the most famous models, not exhaustive by any means.

TuringNYC|2 years ago

Love this! What system did you use to produce this? Was it custom-built or are you using a system to take dependencies+dates and plot all this out?

vemgar|2 years ago

This one is custom-built in Python. It generates Graphviz DOT files, which are then rendered to SVG. The rest is a simple static site generator.
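The dependencies+dates-to-DOT step could be sketched like this. This is not the site's actual code, just a minimal illustration with made-up records of the same shape (model name, year, ancestors):

```python
# Hypothetical records: model name -> (year, list of ancestor models).
models = {
    "Transformer": (2017, []),
    "BERT": (2018, ["Transformer"]),
    "GPT-2": (2019, ["Transformer"]),
}

def to_dot(models):
    """Emit a Graphviz DOT digraph with one node per model and
    one edge per dependency, laid out left-to-right by rank."""
    lines = ["digraph timeline {", "  rankdir=LR;"]
    for name, (year, deps) in models.items():
        lines.append(f'  "{name}" [label="{name}\\n{year}"];')
        for dep in deps:
            lines.append(f'  "{dep}" -> "{name}";')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot(models)
# Render with Graphviz: dot -Tsvg timeline.dot -o timeline.svg
```

Graphviz's `dot` layout engine then handles placement, so heavily-connected ancestors naturally cluster on the left.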

MuffinFlavored|2 years ago

Do all of them "hallucinate"?

leobg|2 years ago

My 5 year old hallucinates. Ask her about horses and she’ll gladly tell you that she knows everything about them. Hallucination is just not knowing that you don’t know. It’s an absence of doubt. And doubt itself, what is that if not another learned behavior of cross checking your conclusions? Also, it’s easy to spot in someone else, but hard or impossible in ourselves. To know what you don’t know.