Easiest way I know is to just use LMStudio. Just download and press play :). Optional, but recommended, increase the context length to 262144 if you have the DRAM available. It will definitely get slower as your interaction prolongs, but (at least for me) still tolerable speed.
PeterStuer|25 days ago
mathrawka|25 days ago
I see around 30 t/s