xwn | 1 year ago | on: Garak, LLM Vulnerability Scanner
xwn's comments
xwn | 1 year ago | on: Garak, LLM Vulnerability Scanner
* ineffective prompts get dropped from garak and new prompts get added, so eval scores on a static target always drop over time
* there are more and more dynamic probes - check out e.g. the atkgen and topic probes. expanding these is the current focus
xwn | 2 years ago | on: Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild
xwn | 2 years ago | on: FakeToxicityPrompts: Automatic Red Teaming
xwn | 3 years ago | on: On the dangers of stochastic parrots: Can language models be too big? (2021)