brig90 | 9 months ago
The challenge (as I understand it) is that the vocabulary size is pretty massive — thousands of unique words — and the structure might not be 1:1 with how real language maps. Like, is a “word” in Voynich really a word? Or is it a chunk, or a stem with affixes, or something else entirely? That makes brute-forcing a direct mapping tricky.
That said… using cluster IDs instead of individual word tokens, and scoring the outputs with something like a language model, seems like a pretty compelling idea. I hadn't thought of doing it that way. Definitely some room there for optimization or even evolutionary techniques. If nothing else, it could tell us something about how "language-like" the structure really is.
Might be worth exploring — thanks for tossing that out. Hopefully someone with more expertise in the space sees it!
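To make the idea concrete, here's a minimal sketch of the cluster-ID approach. Everything in it is a placeholder assumption: the "clustering" is just a shared-prefix rule (real work would cluster on co-occurrence statistics), the reference corpus is a toy bigram model, and the search is brute force rather than anything evolutionary. The point is only the shape of the pipeline: tokens → cluster IDs → candidate mapping → language-model score.

```python
from collections import Counter
from itertools import product
import math

# Toy stand-ins for Voynich-like tokens.
tokens = ["daiin", "daiir", "qokeedy", "qokedy", "chol", "chor"]

def cluster_id(tok):
    # Hypothetical clustering rule: tokens sharing their first three
    # characters fall into one cluster. A real attempt would cluster on
    # distributional/co-occurrence features instead.
    return tok[:3]

clusters = sorted({cluster_id(t) for t in tokens})
seq = [cluster_id(t) for t in tokens]  # the text as a cluster-ID sequence

# Tiny bigram "language model" built from a toy reference corpus.
corpus = "the king read the book the king wrote the book".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = sorted(unigrams)

def score(words):
    # Log-probability under the bigram model, add-one smoothed.
    s = 0.0
    for a, b in zip(words, words[1:]):
        s += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + len(vocab)))
    return s

# Brute-force search over cluster -> word assignments, keeping the
# best-scoring one. An evolutionary or hill-climbing search would
# replace this loop once the vocabulary is realistically large.
best = max(product(vocab, repeat=len(clusters)),
           key=lambda assign: score([assign[clusters.index(c)] for c in seq]))
mapping = dict(zip(clusters, best))
decoded = [mapping[c] for c in seq]
print(mapping, decoded)
```

Even on a toy like this, the relative scores of different mappings are the interesting output: if no mapping scores much better than chance, that itself says something about how "language-like" the cluster structure is.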
quantadev | 9 months ago
Maybe a version of scripture that had been "rejected" by some king and was illegal to reproduce? Take the best radiocarbon dating, figure out who was king at the time and whether they 'sanctioned' any biblical translations, then look at the version of the Bible that preceded that sanctioned translation — that earlier version may be what was illegal and needed to be encrypted. That's just one plausible story. Who knows, we might find out the phrase "young girl" was simplified to "virgin", and that would potentially be a big secret.