top | item 44820656

tmshapland | 6 months ago

Such a fascinating read. I didn't realize how much massaging needed to be done to get the models to perform well. I just sort of assumed they worked out of the box.

acters|6 months ago

Personally, I think bigger companies should be more proactive and work with the devs of some of the popular inference engines to get their special snowflake LLM working before it gets released. I guess it is all very much experimental at the end of the day. Those devs are doing God's work so that we can run these models on our budget-friendly hardware choices.

mutkach|6 months ago

This is a good take, actually. GPT-OSS is not much of a snowflake (judging by the model's architecture card at least), but TRT-LLM treats every model like one - there is too much hardcoding - which makes it very difficult to use out of the box for the hottest SotA thing.

diggan|6 months ago

This is literally what they did for GPT-OSS. It seems there was coordination with OpenAI to support it on day 1.

eric-burel|6 months ago

SMEs are starting to want local LLMs, and it's a nightmare to figure out what hardware works for which models. I am literally asking devs in my hometown if I can visit their installs to figure out which combos work.