
wholehog | 1 year ago

It is open: https://github.com/google-research/circuit_training


LittleTimothy | 1 year ago

As far as I understand it, only kind of? It's open source, but in their paper they did a tonne of pre-training, and whilst they've released a small pre-training checkpoint, they haven't released the results of the pre-training they did for the paper itself. So anyone reproducing this will inevitably be accused of failing to pretrain the model correctly.

wholehog | 1 year ago

I think the pre-trained checkpoint uses the same 20 TPU blocks as the original paper, but it probably isn't the exact same checkpoint, as the paper itself dates from 2020/2021.