item 38739936

razorguymania | 2 years ago
It's using vanilla Llama 2 from Meta with no fine-tuning. The point here is the speed and responsiveness of the underlying HW and SW.

    chihuahua | 2 years ago
    But if the quality of the response is poor, it's irrelevant that it was generated quickly. If it was using different data to generate higher-quality responses, would that not slow it down?

        tome | 2 years ago
        nomel gave a good answer in a different thread:
        > This is not about the model, it's about the relative speed improvement from the hardware, with this model as a demo.
        > To compare apples to apples look at the tokens per second of other systems running Llama 2 70B 4096. We're by far the fastest!
        https://news.ycombinator.com/item?id=38742466

    andygeorge | 2 years ago
    Do you work there? Just curious

        razorguymania | 2 years ago
        yes

            andygeorge | 2 years ago
            ah Llama 2 70B, no wonder
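The apples-to-apples comparison the quoted comment describes comes down to a single throughput number: tokens generated divided by wall-clock time, measured on the same model (here Llama 2 70B at a 4096 context). A minimal sketch, with hypothetical figures rather than any vendor's actual benchmark results:

```python
def tokens_per_second(tokens_generated: int, elapsed_seconds: float) -> float:
    """Throughput of a single generation run, in tokens per second."""
    return tokens_generated / elapsed_seconds

# Hypothetical example: 300 tokens generated in 6.0 seconds of wall-clock time.
print(tokens_per_second(300, 6.0))  # 50.0 tokens/s
```

Comparing this figure across systems is only meaningful when the model, quantization, and context length are held constant, which is the point the quoted comment makes.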