Depends how you’re defining "expensive." Prefill can have a lot of tokens to ingest, so it’s a lot of compute in absolute terms. But it’s also much more efficient per token: the whole prompt goes through the model in one batch of large matmuls, so prefill tends to be compute-bound, and you can throw a lot of hardware at the problem. Generation (decode) is the opposite: in wall-clock terms it can be significantly more expensive, because it produces one token per step, each step re-reads the weights for a single token’s worth of math (memory-bandwidth-bound), and you can’t batch within a single sequence (speculative decoding with a draft model is the main workaround).
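To make the compute-bound vs memory-bound distinction concrete, here’s a back-of-envelope sketch of arithmetic intensity (FLOPs per byte of weights read) for a single weight matrix. The dimension and prompt length are made-up illustrative numbers, not from any particular model:

```python
# Toy arithmetic-intensity estimate for one d x d weight matrix.
# Hypothetical numbers; shows why batched prefill is compute-bound
# while one-token-at-a-time decode is memory-bandwidth-bound.

d = 4096             # model dimension (assumed for illustration)
bytes_per_param = 2  # fp16 weights

def arithmetic_intensity(tokens):
    # FLOPs for pushing `tokens` activations through W: 2 * tokens * d * d
    flops = 2 * tokens * d * d
    # Bytes moved is dominated by reading W once per step
    bytes_moved = d * d * bytes_per_param
    return flops / bytes_moved

prefill = arithmetic_intensity(2048)  # whole prompt in one batched matmul
decode = arithmetic_intensity(1)      # one new token per decode step

print(f"prefill: {prefill:.0f} FLOPs/byte")  # ~2048: GPU is busy computing
print(f"decode:  {decode:.0f} FLOPs/byte")   # ~1: GPU is waiting on memory
```

At ~1 FLOP per byte, decode throughput is set by memory bandwidth rather than FLOPs, which is why each generated token is slow even on a big GPU.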
vlovich123 | 6 days ago