ggml (https://github.com/ggerganov/ggml) has a GPT-J example; the 6B-parameter model runs happily on a CPU with 16 GB of RAM and 8 cores, at a couple of words per second, no GPU necessary.
An example of GPT-J running on the CPU is shown in Fig. [4](#Fig4).

gptj_model_load: ggml ctx size = 13334.86 MB
gptj_model_load: memory_size = 1792.00 MB, n_mem = 57344
gptj_model_load: model size = 11542.79 MB / num tensors = 285
main: number of tokens in prompt = 12
main: mem per token = 16179460 bytes
main: load time = 7463.20 ms
main: sample time = 3.24 ms
main: predict time = 4887.26 ms / 232.73 ms per token
main: total time = 13203.91 ms
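The log above can be sanity-checked with a little arithmetic. Only the ms-per-token and model-size figures come from the log; the 6B parameter count and fp16 (2 bytes per weight) storage are assumptions about how the model is stored:

```python
# Worked arithmetic from the timing log. Assumptions (not in the log itself):
# GPT-J has ~6e9 parameters, stored as fp16 (2 bytes each).

ms_per_token = 232.73                       # from the "predict time" line
tokens_per_sec = 1000.0 / ms_per_token
print(f"{tokens_per_sec:.2f} tokens/s")     # ~4.30 tokens/s, i.e. a couple of words per second

params = 6_000_000_000                      # assumed parameter count for GPT-J-6B
bytes_per_param = 2                         # assumed fp16 storage
model_mb = params * bytes_per_param / (1024 ** 2)
print(f"{model_mb:.0f} MB")                 # ~11444 MB, close to the reported 11542.79 MB
```

The small gap between the computed ~11444 MB and the reported 11542.79 MB would be accounted for by embeddings, biases, and per-tensor overhead.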
jarrell_mark|3 years ago
I got it to load on a GTX 1070 with 8GB GPU RAM, but then it crashed before it could generate a response.
It needs less RAM than regular GPT-J because the weights are converted to 8-bit.
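The general technique behind that RAM saving can be sketched as absmax 8-bit quantization: store each weight as a signed byte plus a shared float scale, roughly quartering the footprint versus fp32. This is a minimal illustrative sketch, not ggml's actual quantization code:

```python
# Sketch of absmax 8-bit quantization (illustrative, not ggml's implementation):
# scale a block of weights so the largest magnitude maps to 127, then store
# signed bytes plus one float scale per block.

def quantize_8bit(weights):
    """Return (scale, int8 values) for a block of float weights."""
    amax = max(abs(w) for w in weights) or 1.0
    scale = amax / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, q

def dequantize_8bit(scale, q):
    """Recover approximate float weights from the quantized block."""
    return [scale * v for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
scale, q = quantize_8bit(weights)
restored = dequantize_8bit(scale, q)
# each restored weight is within half a quantization step of the original
```

Each weight shrinks from 4 bytes (fp32) or 2 bytes (fp16) to 1 byte, at the cost of a small rounding error bounded by half the scale.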