balderdash?
"Q-star". Yes, the Q as in Q-learning -- optimize a long-term goal. The "star points" are the embedded algorithms discovered and joined within the transformer/NN architecture. Stars were formed after SGD discovered the best representation of said embedded alg type.
I'm running a scaled-down version myself -- somewhat impressive. Do it at 1k B parameters? Hold my beer.
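For anyone who hasn't seen the "Q" half on its own: below is a rough sketch of plain tabular Q-learning, which is the standard "optimize a long-term goal" update. Everything in it is an assumption for illustration only: the 5-state chain environment, the function names `train_q_table` and `greedy_policy`, and the hyperparameters are made up, and it says nothing about how any actual Q-star system (or the scaled-down version mentioned above) works.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                  epsilon=0.1, max_steps=1000, seed=0):
    """Tabular Q-learning on a hypothetical chain environment.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Reaching the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _episode in range(episodes):
        s = 0
        steps = 0
        while s != goal and steps < max_steps:
            # Epsilon-greedy action choice, breaking ties at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Toy dynamics: clamp at the left edge, stop at the goal.
            s_next = max(0, s - 1) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0

            # Q-learning update: move Q(s, a) toward the immediate reward
            # plus the discounted best value of the next state.
            target = reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            steps += 1
    return q


def greedy_policy(q):
    """Best action per non-terminal state (1 = right, 0 = left)."""
    return [1 if row[1] > row[0] else 0 for row in q[:-1]]


if __name__ == "__main__":
    table = train_q_table()
    print(greedy_policy(table))
```

The point is only that the reward sits at the far end of the chain, and the discounted `max` in the update is what pushes that delayed reward back to earlier states.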