sumo43 | 2 years ago
P.S. From experience instruct-finetuning Falcon-180B, it's not worth using over Llama-2-70B, as it's significantly undertrained.
borzunov | 2 years ago
We developed Petals for people who have less GPU memory than needed. Also, there's still a chance of larger open models being released in the future.
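For context, here's a minimal sketch of what client-side inference through Petals looks like, following the public Petals client API (the model name is just an example; any model currently served by the swarm works, and it assumes the `petals` package is installed and the public swarm is reachable):

    # Minimal sketch of running generation over the Petals swarm.
    # Assumes the `petals` package is installed and the public swarm
    # is serving this model; swap in any model the swarm hosts.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    model_name = "petals-team/StableBeluga2"  # example model on the public swarm

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # The transformer blocks are executed on remote peers; only the
    # embeddings and LM head run locally, so a large model can be
    # driven from a machine with a small GPU (or none at all).
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=5)
    print(tokenizer.decode(outputs[0]))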
brucethemoose2 | 2 years ago
And the inference is pretty inefficient. Pooling the hardware would achieve much better GPU utilization and (theoretically) faster responses for the host's requests.
sumo43 | 2 years ago
Here I'm assuming that Petals uses a large number of small, heterogeneous nodes like consumer GPUs. It may well be something much simpler.