andrewgross | 1 year ago
I could be very wrong about how experts work across layers, though; I have only done a naive reading of it so far.
rahimnathwani|1 year ago
This doesn't sound like it would work if you're running just one chat, as you need all the experts loaded at once if you want to avoid spending lots of time loading and unloading models. But at scale with batches of requests it should work. There's some discussion of this in 2.1.2 but it's beyond my current ability to comprehend!
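To make the batching intuition concrete, here is a rough toy simulation, not anything taken from the paper. It assumes a single MoE layer with 64 experts, top-2 routing, and uniformly random routing; all of those numbers and the function name `tokens_per_loaded_expert` are made up for illustration. It counts how many distinct experts a batch touches and how many token-slots each touched expert serves on average.

```python
import random


def tokens_per_loaded_expert(batch_tokens, num_experts=64, top_k=2, seed=0):
    """Toy model of one MoE layer with uniform random routing (an assumption).

    Returns (distinct experts touched, average token-slots served per touched expert).
    """
    rng = random.Random(seed)
    hits = {}
    for _ in range(batch_tokens):
        # Each token is routed to top_k distinct experts, chosen at random here.
        for expert in rng.sample(range(num_experts), top_k):
            hits[expert] = hits.get(expert, 0) + 1
    distinct = len(hits)
    return distinct, batch_tokens * top_k / distinct


if __name__ == "__main__":
    for n in (1, 8, 4096):
        distinct, avg = tokens_per_loaded_expert(n)
        print(f"{n:>5} tokens: {distinct} experts touched, {avg:.1f} token-slots per expert")
```

Under these assumptions, a single token touches only 2 experts and each one does a single token's worth of work, so any time spent loading an expert is paid for almost nothing. A large batch touches essentially every expert, but each one serves many tokens, so the cost of keeping it loaded is amortized.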
andrewgross|1 year ago