item 37122127

zserge | 2 years ago

Would it be possible to train an LLM from scratch that would speak Toki Pona? A 120-word dictionary over a reduced alphabet would mean a tiny number of possible tokens, and in theory the model could be smaller than the ones used in the "TinyStories" experiment (which used simplified, almost childlike English). Maybe even a local machine would be enough to train it. I wonder if there is a large enough dataset for Toki Pona, or if there is a sensible way to synthesize one? I'm no expert in LLMs or Toki Pona, though.
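To illustrate the "tiny number of possible tokens" point: with only ~120 words, the whole language fits in a word-level vocabulary, so no BPE tokenizer is needed and the embedding table is hundreds of rows instead of tens of thousands. A minimal sketch (the word list here is truncated for illustration, not the full official dictionary; the `<pad>`/`<unk>` special tokens are my own assumption, not part of Toki Pona):

```python
# Sketch of a word-level tokenizer for Toki Pona.
# Only a handful of the ~120 core words are listed here for illustration.
TOKI_PONA_WORDS = [
    "mi", "sina", "ona", "li", "e", "la", "pi", "toki", "pona", "ala",
    "ale", "ijo", "jan", "kama", "ken", "lon", "moku", "mute", "ni", "tenpo",
]  # ...the rest of the ~120-word dictionary would go here

# Special tokens for padding and out-of-vocabulary words (hypothetical choices).
vocab = ["<pad>", "<unk>"] + TOKI_PONA_WORDS
word_to_id = {w: i for i, w in enumerate(vocab)}

def encode(sentence: str) -> list[int]:
    """Map each whitespace-separated word to its token id."""
    return [word_to_id.get(w, word_to_id["<unk>"]) for w in sentence.lower().split()]

def decode(ids: list[int]) -> str:
    """Map token ids back to a space-joined sentence."""
    return " ".join(vocab[i] for i in ids)

print(len(vocab))                          # full version would be ~122 entries
print(decode(encode("toki pona li pona")))
```

Compare this with GPT-2's ~50k-token BPE vocabulary: the embedding and output layers, which dominate parameter counts in small models, shrink by more than two orders of magnitude.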


No comments yet.