That's awesome. I've been seeing quite a bit of chat about it on X too. Seems like they've hit the mark with playground. What are you using it for specifically?
Super interesting how it makes "decisions", but nice that they let you tie user feedback directly into LLM refinement. Otherwise it would be hard to make that info useful.
From the docs it looks like they're fairly explicit about respecting env states for each dataset. I'm not sure how or where contamination would even occur, to be honest, regardless of the model used.
I have no idea. They're still in beta so probably figuring it out as they go I guess. I could see them charging on tokens or traces most likely though.
killaJ|2 years ago
thepaulthomson|2 years ago
finnlobsien1|2 years ago
thepaulthomson|2 years ago
fwesss|2 years ago
thepaulthomson|2 years ago
dazzeloid|2 years ago
thepaulthomson|2 years ago
sp332|2 years ago
casstang|2 years ago
thepaulthomson|2 years ago