In my personal experience, Qwen 30B A3B understands commands quite well as long as the input isn't big enough to ruin the attention (I feel the boundary is somewhere between 8,000 and 12,000 tokens?). But that isn't really a bug in the model itself. A smaller model just has a shorter memory; it's simply a physical restriction.
I built a mixed extraction, cleaning, translation, and formatting task on a job with an average input of 6,000 tokens. So far, only 30B A3B has been smart enough not to miss the job details (most of the time).
I later refactored the task into multiple passes using a smaller model, though. Making the job simpler is still a better strategy for getting clean output if you can change the pipeline.
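The multi-pass refactor I mean looks roughly like this: instead of one giant prompt that extracts, translates, and formats in a single call, each pass gets a short, single-purpose prompt. This is only an illustrative sketch; call_llm, the prompt wording, and the pass names are all my own placeholders, not anyone's real API.

```python
# Sketch of splitting one combined LLM job into focused passes.
# call_llm is a hypothetical wrapper around whatever local model
# you run (e.g. an OpenAI-compatible endpoint for Qwen 30B A3B);
# the stub below just echoes the input so the sketch is runnable.

def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real call to your model server.
    return prompt.split("INPUT:", 1)[1].strip()

def extract(raw: str) -> str:
    return call_llm("Extract only the job details.\nINPUT: " + raw)

def translate(text: str) -> str:
    return call_llm("Translate to English, keep the structure.\nINPUT: " + text)

def format_output(text: str) -> str:
    return call_llm("Reformat as a clean bullet list.\nINPUT: " + text)

def pipeline(raw: str) -> str:
    # Each pass sees one short instruction instead of a combined
    # mega-prompt, so a smaller model is less likely to drop details.
    return format_output(translate(extract(raw)))

print(pipeline("some scraped job posting"))
```

The win is that each intermediate output is also inspectable, so when a pass goes wrong you can see which one.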
mmis1000|17 hours ago