(no title)
smahs | 4 months ago
The model runtime recognizes these as special tokens. It can be configured using a chat template to replace these token with something else. This is how one provider is modifying the xml namespace, while llama.cpp and vllm would move the content between <think> and </think> tags to a separate field in the response JSON called `reasoning_content`.
No comments yet.