MakeAJiraTicket | 4 months ago
These aren't operating on reward functions because there's no internal model to reward. It's word prediction; there's no intelligence.
LeifCarrotson|4 months ago
Subsequently, ChatGPT/Claude/Gemini/etc. go through additional training: supervised fine-tuning and reinforcement learning against a reward signal, whether that signal comes from human feedback (RLHF) or from automatically checkable rewards (RLVR, 'verifiable rewards').
Whether that fine-tuning and reward-driven training give them real "intelligence" is open to interpretation, but it's not 100% plagiarism.
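To make the "verifiable rewards" idea concrete, here is a toy sketch of what such a reward function could look like. It is not how any particular lab does it; the function name `verifiable_reward` and the rule "the last number in the output must equal the known answer" are assumptions made up for illustration.

```python
import re


def verifiable_reward(model_output: str, expected_answer: str) -> float:
    """Hypothetical reward: 1.0 if the last number in the output matches
    the known-correct answer, otherwise 0.0. No human judgment involved."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0  # nothing checkable in the output
    return 1.0 if float(numbers[-1]) == float(expected_answer) else 0.0


# Three made-up completions for the prompt "What is 2 + 2?"
samples = ["2 + 2 = 4", "2 + 2 = 5", "I don't know"]
rewards = [verifiable_reward(s, "4") for s in samples]
print(rewards)
```

In an actual RL setup a score like this would be fed back to adjust the model's weights so that higher-reward outputs become more likely; that update step is left out here.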
aoeusnth1|4 months ago
MakeAJiraTicket|4 months ago
comex|4 months ago