user: dmadisetti
47 karma | created 6 years ago
recent submissions
GRPO vs. GDPO: Building Intuition for RL Reward Policies
(huggingface.co)
2 pts|1 month ago|1 comment
31 pts|4 months ago|23 comments