item 41236707


There has been some work on dynamically reducing the compute required by a network.

See for example: https://arxiv.org/abs/2404.02258

They set a fixed compute budget that is lower than what the full LLM would need, and the model dynamically decides how to allocate that budget to different parts of the network.

So it's not exactly what you propose, since here the compute budget is fixed (that's the point of the paper: to make the network learn how to allocate the resources by itself). But the allocation is dynamic for each part of the network, so it shows that this is possible.
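A minimal sketch of the fixed-budget idea from the linked paper (Mixture-of-Depths), assuming a single layer with a capacity of `capacity` tokens: a router scores every token, only the top-`capacity` tokens go through the expensive block, and the rest skip it via the residual connection. The compute per layer is therefore fixed, while *which* tokens get it is decided dynamically. All names here (`route_top_k`, `mod_layer`, `toy_block`) are hypothetical, the router weights are fixed rather than learned, and this is an illustration, not the paper's code.

```python
from typing import Callable, List

Vector = List[float]


def route_top_k(scores: List[float], capacity: int) -> List[int]:
    """Return the indices of the `capacity` highest-scoring tokens (the fixed budget)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:capacity])


def mod_layer(tokens: List[Vector], router_w: Vector,
              block: Callable[[Vector], Vector], capacity: int) -> List[Vector]:
    """One layer with a fixed compute budget: only `capacity` tokens run `block`.

    Every other token skips the block and passes through unchanged (residual path).
    """
    # Router score per token: dot product with a (hypothetical, fixed) router weight vector.
    scores = [sum(w * x for w, x in zip(router_w, t)) for t in tokens]
    chosen = route_top_k(scores, capacity)
    out = [list(t) for t in tokens]  # residual path: start from a copy of every token
    for i in chosen:
        update = block(tokens[i])
        # Scale the block output by the router score, so in training the router
        # would receive gradients from the tokens it selected.
        out[i] = [x + scores[i] * u for x, u in zip(tokens[i], update)]
    return out


if __name__ == "__main__":
    calls = []

    def toy_block(x: Vector) -> Vector:
        calls.append(1)  # count how many tokens actually used the expensive block
        return [2.0 * v for v in x]

    tokens = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.5, 0.5]]
    out = mod_layer(tokens, router_w=[1.0, 0.0], block=toy_block, capacity=2)
    print(len(calls))  # 2
    print(out)         # [[3.0, 0.0], [0.0, 1.0], [21.0, 0.0], [0.5, 0.5]]
```

With four tokens and `capacity=2`, the block runs exactly twice no matter what the input is; changing the tokens only changes which two are selected.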
