newhouseb | 2 years ago

I think this is likely a consequence of a couple of factors:

1. Fancy token selection within batches (read: beam search) is probably fairly hard to implement at scale without a significant loss in GPU utilization. Normally you can batch up a bunch of parallel generations and just push them all through the LLM at once, because every generated token (for similar prompt sizes, plus some padding perhaps) takes a predictable amount of time. If you stick a parser between every token, and that parser can take a variable amount of time, then your whole batch is slowed by the most complex grammar of the bunch.

2. OpenAI appears to work under the thesis articulated in the Bitter Lesson [i] that more compute (either via fine-tuning or bigger models) is the least foolish way to achieve improved capabilities, hence their approach of function-calling just being... a fine-tuned model.

[i] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
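To make point 1 concrete, here is a toy latency model, not a description of any real serving stack. It assumes each decoding step is one shared forward pass followed by a per-sequence grammar mask, with the masks computed in parallel so the step waits on the slowest one. The function names and all the millisecond figures are made up for illustration.

```python
def batch_step_time(forward_ms, mask_ms_per_seq):
    """Latency of one decoding step for a whole batch, in milliseconds.

    Assumption: the forward pass is shared by the batch, and sampling
    cannot proceed until every sequence has its grammar mask, so the
    step waits on the slowest mask. An empty list means unconstrained
    generation (no parser in the loop).
    """
    return forward_ms + max(mask_ms_per_seq, default=0.0)


def gpu_utilization(forward_ms, mask_ms_per_seq):
    """Fraction of the step during which the GPU is doing the forward pass."""
    return forward_ms / batch_step_time(forward_ms, mask_ms_per_seq)


# Hypothetical numbers: a 20 ms forward pass, two cheap grammars and one
# expensive grammar sharing the same batch.
forward_ms = 20.0
mask_ms = [0.5, 1.0, 30.0]

unconstrained = batch_step_time(forward_ms, [])
constrained = batch_step_time(forward_ms, mask_ms)
utilization = gpu_utilization(forward_ms, mask_ms)

print(unconstrained, constrained, utilization)
```

In this model the two cheap grammars pay the same per-step cost as the expensive one, which is the utilization loss described above. A real system could overlap mask computation with the forward pass or group requests by grammar cost, which this sketch ignores.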

WiSaGaN | 2 years ago

The "Bitter Lesson" indeed sheds light on the future trajectory of technology, emphasizing the supremacy of computation over human-designed methods. However, our current value functions often still need to focus on what we can achieve with the tools and methods available to us today. While it's likely that computational tools will eventually replace the human-guided "outlines" or "guidance" used to shape LLM outputs, there will likely always be a substantial number of human-structured knobs needed to align computation with our immediate needs and goals.

reasonabl_human | 2 years ago

What a fascinating read, thanks for sharing that link.