What toolchain are you going to use with the local model? I agree that’s a Strong model, but it’s so slow for be with large contexts I’ve stopped using it for coding.
Can you tell me more about your agent harness? If it’s open source, I’d love to take it for a spin.
I would happily use local models if I could get them to perform, but they’re super slow if I bump their context window high, and I haven’t seen good orchestrators that keep context limited enough.
Curious how you handle sharding and KV cache pressure for a 120b model. I guess you are doing tensor parallelism across consumer cards, or is it a unified memory setup?
embedding-shape|1 month ago
mercutio2|1 month ago
I would happily use local models if I could get them to perform, but they’re super slow if I bump their context window high, and I haven’t seen good orchestrators that keep context limited enough.
storystarling|1 month ago