(no title)
blndrt | 5 months ago
The simplest test would be to make previously “unreachable” tasks succeed through obvious prompt tweaks — like reordering instructions or emphasizing key parts.
That said, my methodology intentionally avoided exposing the model to actual tasks. Instead, I focused on the domain as a whole: refining the instructions so a smaller model could understand and act reliably.
No comments yet.