Generation is usually fast, but prompt processing is the main bottleneck with local agents. I also have a 128 GB M4 Max. How is prompt processing on long prompts? Processing the system prompt for Goose always takes quite a while for me. I haven't been able to download the 120B yet, but I'm looking to switch to either that or GLM-4.5-Air as my main driver.
ghc|6 months ago
```
total duration: 1m14.16469975s
load duration: 56.678959ms
prompt eval count: 3921 token(s)
prompt eval duration: 10.791402416s
prompt eval rate: 363.34 tokens/s
eval count: 2479 token(s)
eval duration: 1m3.284597459s
eval rate: 39.17 tokens/s
```
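For anyone who wants to sanity-check those numbers or extrapolate to a longer system prompt, here is a rough Python sketch. The helper names (`parse_duration`, `estimate_prompt_seconds`) and the 20,000-token prompt size are my own assumptions for illustration, not anything from Ollama or Goose. It also assumes the prompt eval rate stays constant as the prompt grows, which is probably optimistic for long contexts.

```python
import re

# Seconds per unit for Go-style durations such as "1m3.28s" or "56.6ms".
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}

def parse_duration(text: str) -> float:
    """Convert a duration string like '1m14.16469975s' to seconds."""
    # Two-letter units are listed first so "ms" is not read as "m" + "s".
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|us|µs|ns|h|m|s)", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in parts)

def estimate_prompt_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Naive estimate: assumes the prompt eval rate does not drop with length."""
    return prompt_tokens / tokens_per_second

# Figures copied from the stats above.
prompt_rate = 3921 / parse_duration("10.791402416s")
eval_rate = 2479 / parse_duration("1m3.284597459s")

print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")  # 363.34
print(f"eval rate:        {eval_rate:.2f} tokens/s")    # 39.17

# Hypothetical 20,000-token agent system prompt (assumed size, not measured).
hypothetical_wait = estimate_prompt_seconds(20_000, prompt_rate)
print(f"estimated prompt processing: {hypothetical_wait:.0f} s")
```

At this rate the wait scales linearly with prompt length, so a prompt several times larger than the 3921 tokens measured here would take correspondingly longer before the first generated token appears.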
lostmsu|6 months ago