declaredapple | 1 year ago
Are you asking if the framework automatically quantizes/prunes the model on the fly?
Or are you suggesting the LLM itself should realize it's too big to run and prune/quantize itself? Your references to "intelligent" almost lead me to the conclusion that you think the LLM should prune itself. Not only is this a chicken-and-egg problem, but LLMs are statistical models; they aren't inherently self-bootstrapping.
dheera | 1 year ago
I hate software that complains (about dependencies, resources) when you try to run it, and I think that should be one of the first use cases for LLMs: L5-autonomous software installation and execution.