top | item 42671237

(no title)

jmorgan | 1 year ago

Phi-4's architecture changed slightly from Phi-3.5 (it no longer uses a sliding window of 2,048 tokens [1]), causing a change in the hyperparameters (and ultimately an error at inference time for some published GGUF files on Hugging Face, since the same architecture name/identifier was re-used between the two models).

For the Phi-4 uploaded to Ollama, the hyperparameters were set to avoid the error. The error should stop occurring in the next version of Ollama [2] for imported GGUF files as well

In retrospect, a new architecture name should probably have been used entirely, instead of re-using "phi3".

[1] https://arxiv.org/html/2412.08905v1

[2] https://github.com/ollama/ollama/releases/tag/v0.5.5

discuss

order

No comments yet.