entilzha | 1 year ago
Not sure what you mean by implicit? If you mean just treating bytes as tokens, one issue you run into is that your sequence lengths get quite long. Compared to a regular token LLM, you can't pack as many bytes into a batch, which makes you pretty FLOP-inefficient, so you scale worse. You could make the model smaller to compensate, but then the model isn't as good.
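To make the inefficiency concrete, here's a rough back-of-envelope sketch. It assumes ~4 bytes per BPE token on average (a common ballpark for English text, not a measured figure) and uses the standard ~2·L²·d FLOP estimate for self-attention scores plus the weighted sum:

```python
# Back-of-envelope: byte-level vs. token-level attention cost.
# ASSUMPTION: ~4 bytes per BPE token on average (typical English ballpark).

def attention_flops(seq_len: int, d_model: int) -> int:
    # Self-attention scores + weighted sum scale as ~2 * L^2 * d FLOPs.
    return 2 * seq_len * seq_len * d_model

text = "Not sure what you mean by implicit? " * 100
n_bytes = len(text.encode("utf-8"))
n_tokens = max(1, n_bytes // 4)  # assumed 4 bytes/token

d_model = 1024
byte_cost = attention_flops(n_bytes, d_model)
token_cost = attention_flops(n_tokens, d_model)

print(f"bytes: {n_bytes}, tokens (approx): {n_tokens}")
print(f"attention FLOP ratio (bytes/tokens): {byte_cost / token_cost:.0f}x")
```

With a 4x length inflation, the quadratic attention term alone costs ~16x more per sequence, which is why you either eat the FLOP penalty or shrink the model.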