"A team led by Kung-Hsiang Huang, a Salesforce AI researcher, showed that using a new benchmark relying on synthetic data, LLM agents achieve around a 58 percent success rate on tasks that can be completed in a single step without needing follow-up actions or more information."
and
"The Salesforce AI Research team argued that existing benchmarks failed to rigorously measure the capabilities or limitations of AI agents, and largely ignored an assessment of their ability to recognize sensitive information and adhere to appropriate data handling protocols."
The article also makes it sound like that. Are you saying they didn't? I don't see any reference in the article to any other organization that could have done the research.
Edit: Unless "Salesforce AI Research" is not a part of Salesforce, I think Salesforce did do the research.
burningChrome|8 months ago
"A team led by Kung-Hsiang Huang, a Salesforce AI researcher, showed that using a new benchmark relying on synthetic data, LLM agents achieve around a 58 percent success rate on tasks that can be completed in a single step without needing follow-up actions or more information."
and
"The Salesforce AI Research team argued that existing benchmarks failed to rigorously measure the capabilities or limitations of AI agents, and largely ignored an assessment of their ability to recognize sensitive information and adhere to appropriate data handling protocols."
0xffff2|8 months ago
Edit: Unless "Salesforce AI Research" is not a part of Salesforce, I think Salesforce did do the research.
profstasiak|8 months ago