item 35140467 (no title)

rjb7731 | 3 years ago
The inference on the Gradio demo seems pretty slow, about 250 seconds for a request. Maybe I am too used to the 4-bit quant version now, ha!

sebzim4500 | 3 years ago
I'm sure it's partially the HN hug of death.