top | item 46671396

(no title)

adefa | 1 month ago

I ran a similar experiment last month and ported Qwen 3 Omni to llama cpp. I was able to get GGUF conversion, quantization, and all input and output modalities working in less than a week. I submitted the work as a PR to the codebase and understandably, it was rejected.

https://github.com/ggml-org/llama.cpp/pull/18404

https://huggingface.co/TrevorJS/Qwen3-Omni-30B-A3B-GGUF

discuss

antirez|1 month ago

The refusal because often AI writes suboptimal GGML kernels looks very odd, to me. It means that who usually writes manually GGML kernels, could very easily steer the model into writing excellent kernels, and even a document for the agents can be compiled with the instructions on how to do a great work. If they continue in this way, soon a llama.cpp fork will emerge that will be developed much faster and potentially even better: it is unavoidable.

rjh29|1 month ago

The refusal is probably because OP said "100% written by AI" and didn't indicate an interest in actually reviewing or maintaining the code. In fact, a later PR comment suggests that the AI's approach was needlessly complicated.

nickandbro|1 month ago

I wonder if some of the docs from https://app.wafer.ai/docs could be used to make the model be better at writing GGML kernels. Interesting use case.

nickpsecurity|1 month ago

[deleted]

unknown|1 month ago

[deleted]