I'm probably one of the least educated software engineers on LLMs, so apologies if this is a very naive question. Has anyone done any research into just using words as the tokens rather than (if I understand it correctly) 2-3 characters? I understand there would be limitations with this approach, but maybe the models would be smaller overall?
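A toy sketch (my own example, not from the thread) of the main problem with pure word-level tokens: the vocabulary has to contain every surface form the model will ever see, so unseen words collapse to an unknown token and information is lost — subword schemes exist largely to avoid this.

```python
# Toy word-level tokenizer. The vocab is tiny on purpose; a real
# word-level vocab would need hundreds of thousands of entries and
# would still miss rare words, typos, and new coinages.
word_vocab = {"the", "cat", "sat"}

def word_tokenize(text, vocab):
    # Any word not in the vocab collapses to "<unk>" - lossy by design.
    return [w if w in vocab else "<unk>" for w in text.split()]

print(word_tokenize("the cat sat", word_vocab))   # ['the', 'cat', 'sat']
print(word_tokenize("the cats sat", word_vocab))  # ['the', '<unk>', 'sat']
```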
lyu07282|4 months ago
https://en.wikipedia.org/wiki/Byte-pair_encoding
It's also not lossy compression at all; if anything, it's lossless compression, unlike what some people here have claimed.
Shocking comments here, what happened to HN? People are so clueless it reads like reddit wtf
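To make the losslessness point concrete, here's a minimal BPE-style sketch (a toy of my own, not a real tokenizer): the merge rules are just substitutions, so decoding is plain concatenation and the round trip recovers the input exactly.

```python
# Toy BPE-style encoder with two hand-picked merge rules.
# Real BPE learns these merges from corpus pair frequencies.
merges = [("l", "o"), ("lo", "w")]  # applied in order, most frequent first

def encode(word):
    tokens = list(word)  # start from individual characters
    for a, b in merges:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b:
                tokens[i:i + 2] = [a + b]  # merge the adjacent pair
            else:
                i += 1
    return tokens

def decode(tokens):
    # Decoding is pure concatenation - nothing is lost.
    return "".join(tokens)

word = "lower"
print(encode(word))                   # ['low', 'e', 'r']
print(decode(encode(word)) == word)   # True: exact round trip
```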