When I read the paper, I thought the idea was that changing Δ lets the model learn different things over different time scales. As you quoted, it's "the main source of improvement".
I don't have an LLM background, just controls, so I might be wrong.
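To make the controls intuition concrete, here is a minimal sketch. It assumes a scalar continuous-time system dx/dt = a*x + b*u discretized with zero-order hold, which is the standard controls discretization. I'm assuming it matches what the paper does, and the function names are hypothetical. The per-step decay is exp(Δ*a), so the state's memory lasts roughly 1/(Δ*|a|) steps. A small Δ keeps old inputs around for a long time, while a large Δ forgets them quickly and weights the current input more heavily.

```python
import math

def zoh_discretize(a: float, b: float, delta: float) -> tuple[float, float]:
    """Zero-order-hold discretization of dx/dt = a*x + b*u (scalar, a != 0).

    Hypothetical helper for illustration: a_bar = exp(delta*a),
    b_bar = (exp(delta*a) - 1) / a * b.
    """
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def run(a: float, b: float, delta: float, inputs: list[float]) -> list[float]:
    """Run the discrete recurrence x_k = a_bar * x_{k-1} + b_bar * u_k from x = 0."""
    a_bar, b_bar = zoh_discretize(a, b, delta)
    x = 0.0
    states = []
    for u in inputs:
        x = a_bar * x + b_bar * u
        states.append(x)
    return states

# One impulse followed by silence: how much of it is still in the state 9 steps later?
impulse = [1.0] + [0.0] * 9
slow = run(a=-1.0, b=1.0, delta=0.01, inputs=impulse)  # small delta: long memory
fast = run(a=-1.0, b=1.0, delta=1.0, inputs=impulse)   # large delta: forgets quickly

# Fraction of the initial response remaining after 9 steps, i.e. exp(9 * delta * a)
print(slow[-1] / slow[0], fast[-1] / fast[0])
```

With the same underlying `a`, only Δ changes the effective time scale. If Δ is computed from the input, as I understood the paper to do, the model can decide per token whether to hold on to its state or mostly reset it.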