GitHub has introduced an experimental Rubber Duck mode in the GitHub Copilot CLI. The latest addition to the AI-powered coding tool uses a second model from a different AI family to provide a second opinion before enacting the agent’s plan.
The new feature was announced April 6. Introduced in experimental mode, Rubber Duck leverages a second model from a different AI family to act as an independent reviewer, assessing plans and work at the moments where feedback matters most, according to GitHub. Rubber Duck is a focused review agent, powered by a model from a complementary family to a primary Copilot session. The job of Rubber Duck is to check the agent’s work and present a short, focused list of high-value concerns including details the primary agent may have missed, assumptions worth questioning, and edge cases to consider.
Developers can use/experimentalin the Copilot CLI to access Rubber Duck alongside other experimental features.
Evaluating Rubber Duck on SWE-Bench Pro, a benchmark of real-world coding problems drawn from open-source repositories, GitHub found that Claude Sonnet 4.6 paired with Rubber Duck running GPT-5.4 achieved a resolution rate approaching Claude Opus 4.6 running alone, closing 74.7% of the performance gap between Sonnet and Opus. GitHub said Rubber Duck tends to help more with difficult problems, ones that span three-plus files and would normally take 70-plus steps. On these problems, Sonnet plus Rubber Duck scores 3.8% higher than the Sonnet baseline and 4.8% higher on the hardest problems identified across three trials.
GitHub cited these examples of the kinds of problems Rubber Duck finds:
- Architectural catch (OpenLibrary/async scheduler): Rubber Duck caught that the proposed scheduler would start and immediately exit, running zero jobs—and that even if fixed, one of the scheduled tasks was itself an infinite loop.
- One-liner bug (OpenLibrary/Solr): Rubber Duck caught a loop that silently overwrote the same
dictkey on every iteration. Three of four Solr facet categories were being dropped from every search query, with no error thrown. - Cross-file conflict (NodeBB/email confirmation): Rubber Duck caught three files that all read from a Redis key which the new code stopped writing. The confirmation UI and cleanup paths would have been silently broken on deploy.
Go to Source
Author: