ericye16 | 1 year ago
https://ericye16.com/stanford-cs224r
We were able to make some improvements by tuning how the reward is distributed, and by first pretraining the agent on scales before fine-tuning it on the final pieces.
Thanks to Kevin Zakka for helping us get started with the RL environment!
plaguuuuuu|1 year ago
ericye16|1 year ago