joshwa | 6 months ago
Individual models may have supplemented their training with things that look like structure (e.g. Claude with its XMLish delimiters), but it's far from universal.
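To make the "XMLish delimiters" concrete: Anthropic's prompting docs suggest wrapping data in XML-style tags so Claude can separate instructions from content. A minimal sketch below; the tag names and helper are illustrative, not a fixed schema.

```python
def build_prompt(document: str, question: str) -> str:
    """Wrap untrusted content in explicit tags so the model can tell
    instructions apart from data (the XML-ish convention Claude's docs
    recommend; tag names here are just an example)."""
    return (
        "<document>\n"
        f"{document}\n"
        "</document>\n\n"
        f"Using only the document above, answer: {question}"
    )

print(build_prompt("Quarterly revenue rose 8%.", "What happened to revenue?"))
```

The point is that this convention is model-specific: another model trained without such delimiters gains nothing from them.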
Ultimately, if we want better fidelity to the concepts we're referencing, we're better off working from the larger, richer dataset of token sequences in the training data--the total published written output of humanity.