SuchAnonMuchWow | 1 year ago
See for example Mixture-of-Depths: https://arxiv.org/abs/2404.02258
They set a fixed compute budget that is lower than what a vanilla transformer would use, and the model learns to decide dynamically how to spend that budget across different parts of the network (per token, in each layer).
So it's not exactly what you propose, since the total compute budget here is fixed (that's the point of the paper: to make the network learn how to allocate the resources by itself). But the allocation is dynamic for each part of the network, so it shows that this kind of dynamic allocation is possible.
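A minimal sketch of the "fixed budget, dynamic allocation" idea, assuming a Mixture-of-Depths-style top-k router: the number of tokens a block processes is fixed by a capacity fraction, while *which* tokens get processed depends on router scores. The names `mod_style_block`, `router_w`, `block_fn` and `capacity` are hypothetical, and `block_fn` is applied per token as a stand-in for a full attention + MLP block, which is a simplification.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mod_style_block(
    hidden: List[Vector],
    router_w: Vector,
    block_fn: Callable[[Vector], Vector],
    capacity: float,
) -> Tuple[List[Vector], List[int]]:
    """Send only the top-k tokens (by router score) through block_fn; the rest skip it."""
    n = len(hidden)
    # Fixed budget: k depends only on the capacity fraction, not on the input content.
    k = max(1, int(capacity * n))
    # Router score per token (a learned linear projection in a real model).
    scores = [dot(h, router_w) for h in hidden]
    # Dynamic part: which tokens receive the budget depends on their scores.
    chosen = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Unchosen tokens pass through unchanged (residual path only).
    out = [list(h) for h in hidden]
    for i in chosen:
        update = block_fn(hidden[i])
        # Scale the block output by the router score, then add the residual.
        out[i] = [x + scores[i] * u for x, u in zip(hidden[i], update)]
    return out, sorted(chosen)


# Toy example: 4 tokens, budget of 50% => exactly 2 tokens are processed.
hidden = [[1.0, 0.0], [0.0, 3.0], [2.0, 2.0], [0.5, 0.0]]
out, chosen = mod_style_block(hidden, [1.0, 1.0], lambda h: [1.0 for _ in h], 0.5)
# chosen == [1, 2]: tokens 1 and 2 have the highest scores (3.0 and 4.0)
```

The budget (2 of 4 tokens) never changes, but a different input would let the router pick different tokens, which is the sense in which the allocation is dynamic.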