AI agents still need humans to teach them

AI agents need skills — specific procedural knowledge — to perform tasks well, but they can’t teach themselves, new research suggests.

The researchers behind the study developed a new benchmark, SkillsBench, which evaluates agentic AI performance on 84 tasks across 11 domains, including healthcare, manufacturing, cybersecurity and software engineering. They tested each task under three conditions: no skills (the agent receives instructions only), curated skills (the agent is also given a directory of code snippets and supporting resources) and self-generated skills (the agent starts with no skills but is prompted to develop its own).
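
For readers who want a concrete picture of the setup, here is a minimal Python sketch of how such a three-condition comparison could be structured. The `Condition`, `Task`, `run_agent` and `evaluate` names are illustrative assumptions, not SkillsBench’s actual harness, and the scoring is a random placeholder.

```python
import random
from dataclasses import dataclass
from enum import Enum, auto
from statistics import mean

class Condition(Enum):
    NO_SKILLS = auto()        # agent receives task instructions only
    CURATED_SKILLS = auto()   # agent also gets human-curated skill files
    SELF_GENERATED = auto()   # agent is prompted to write its own skills first

@dataclass
class Task:
    name: str
    domain: str               # e.g. "healthcare", "cybersecurity"
    instructions: str
    curated_skills: list[str] # docs/snippets supplied only in the curated run

def run_agent(task: Task, condition: Condition) -> float:
    """Hypothetical harness call: would run the agent on one task under one
    condition and return a score in [0, 1]. Random placeholder here."""
    return random.random()

def evaluate(tasks: list[Task]) -> dict[Condition, float]:
    # Score every task under each condition and average per condition,
    # mirroring the no-skills vs. curated vs. self-generated comparison.
    return {
        cond: mean(run_agent(task, cond) for task in tasks)
        for cond in Condition
    }

if __name__ == "__main__":
    demo = [Task("npm-audit", "cybersecurity", "Audit dependencies.", ["guide.md"])]
    print(evaluate(demo))
```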

Typical tasks included auditing npm dependencies for security vulnerabilities and analyzing differential protein expression in cancer cell line data.

The best performance came from agents with curated skills, which scored an average of 16.2 percentage points higher than agents with no skills, an indication that AI cannot yet do without human intervention. Even so, human guidance had a negative impact on results in 16 of the 84 tasks.

Performance varied widely across industry sectors, with curated skills having the biggest impact on healthcare tasks and only a small one on software engineering.

Agents asked to generate their own skills showed no improvement in performance, further evidence that AI still needs human input to get the job done.

This article first appeared on Computerworld.
