item 41531109

ksplicer | 1 year ago

This is something we've been grappling with on my team. Many of the researchers in the org want to try all these reasoning techniques to increase performance, and my team keeps pushing back that we don't actually need that extra performance; we just want to decrease latency and cost.

iinnPP | 1 year ago

So make the requirement "use a cheaper, lower-latency model," and then try to raise its performance to a satisfactory level. That assumes you aren't already using the cheapest, lowest-latency model.