top | item 47052689

xdotli | 12 days ago

Thanks @dang for moderating! These are indeed not our original findings; this is a sub-conclusion from an ablation we ran to remove the confound of the LLM's internal domain knowledge. Thanks for submitting for us, @mustaphah. Here are a few more details on how we approached this:

> I would frame the 'post-trajectory generated skills' as feedback-generated skills, as Letta does: https://www.letta.com/blog/skill-learning. We haven't seen existing research or hypotheses debating whether the improvement from skills might come from the skill prompt itself activating knowledge the LLM already has. That's why we added an ablation of 'pre-trajectory generated skills': we had that hypothesis, and this seemed a very clean way to test it. It is also logical that feedback-generated skills can help, because they almost certainly capture the failure modes of agents on those specific tasks.

mustaphah | 12 days ago

Yeah, I got your point when I read the paper. You're essentially controlling for "latent domain knowledge."

I might have been a bit blunt with the title - sorry about that, but I still think it was a good title. From what I've observed, a lot of Skills on GitHub are just AI-generated without any feedback or deliberative refinement. Many thought those would still be valuable, but you've shown evidence otherwise.

xdotli | 12 days ago

No worries, it's totally fine! There is indeed work that needs to be done on feedback-generated skills. Thanks for helping us by submitting to Hacker News. As for:

> a lot of Skills on GitHub are just AI-generated without any feedback or deliberative refinement. Many thought those would still be valuable, but you've shown evidence otherwise.

We do find most skills on the internet to be useless. Thanks to the generosity of the author of https://skillsmp.com/, we were able to get all the metadata for the 99k skills indexed on his website. After a lot of filtering and deduping, we found that ~40k+ skills were relevant at the time we did the study.