With millions of scientific papers published every year, it's challenging for researchers to keep up with the latest research. Researchers from the Allen Institute for AI (Ai2) and the University of Washington have developed a new open-source AI model named OpenScholar that they claim can synthesize scientific literature, with verifiable citations, at a level comparable to a human expert.
General-purpose AI systems are helpful, but they are susceptible to hallucination, which greatly reduces trust and reliability. They can offer summaries, but when asked to provide supporting evidence, they often fall short, citing incorrect or irrelevant sources. For scientific research, this is not acceptable.
The researchers claim that OpenScholar provides more reliable grounding. Why does that matter? For starters, researchers can trust the output: instead of spending hours verifying and checking references, they can move ahead faster. It also helps them avoid building new work on incorrect data. OpenScholar essentially acts as a dependable research assistant.
Another limitation of generic AI models is that they may not be able to access papers published after their training data was collected. The Ai2 and University of Washington team tested OpenAI's GPT-4o model and found that it fabricated 78-90% of its research citations.
OpenScholar was built by pairing a model trained specifically for scientific synthesis with retrieval-augmented generation (RAG). The result is that instead of answering from memory alone, OpenScholar first searches real papers across a massive corpus – around 45 million documents. It then pulls relevant passages from those papers. Only after that does it write an answer, using those retrieved sources as evidence.
Without the RAG step, the model would rely mostly on patterns learned during training, which is exactly how you get outdated or fabricated answers.
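The retrieve-then-answer pattern described above can be sketched in a few lines. To be clear, this is not OpenScholar's actual pipeline: the toy corpus, the word-overlap scoring, and the final "answer" string are stand-ins for a real dense retriever and language model, included only to show the order of operations (search the corpus first, then compose an answer grounded in what was retrieved).

```python
# Minimal sketch of retrieval-augmented generation (RAG), under the
# assumption of a toy corpus and keyword-overlap retrieval. A real system
# like OpenScholar uses a learned retriever over ~45 million documents.
from collections import Counter

CORPUS = {
    "paper_1": "retrieval augmented generation grounds model outputs in retrieved documents",
    "paper_2": "large language models can hallucinate citations when answering from memory",
    "paper_3": "transformers use attention to process sequences in parallel",
}

def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query (stand-in for a dense retriever)."""
    q_words = set(query.lower().split())
    scores = {
        doc_id: sum(Counter(text.lower().split())[w] for w in q_words)
        for doc_id, text in corpus.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def answer(query, corpus):
    """Retrieve first, then compose an answer that cites the retrieved passages."""
    evidence = retrieve(query, corpus)
    cited = "; ".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in evidence)
    return f"Q: {query}\nGrounded answer drawn from: {cited}"

print(answer("why do language models hallucinate citations", CORPUS))
```

The key design point is that generation never runs before retrieval, so every claim in the output can be traced back to a document identifier rather than to the model's memory.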
“Early on we experimented with using an AI model with Google’s search data, but we found it wasn’t very good on its own,” said lead author Akari Asai, a research scientist at Ai2 who completed this research as a UW doctoral student in the Allen School. “It might cite some research papers that weren’t the most relevant, or cite just one paper, or pull from a blog post randomly. We realized we needed to ground this in scientific papers. We then made the system flexible so that it could incorporate emerging research through results.”
To test their model further, the researchers created ScholarQABench, a benchmark for evaluating systems on scientific search. They wrote 3,000 questions and a few hundred long answers across fields such as computer science, physics, biomedicine, and neuroscience. These human-written answers serve as the gold standard against which the AI answers are compared.
The answers given by OpenScholar were tested against several of the top AI models including those from Meta and OpenAI. According to the researchers, OpenScholar “outperformed all the systems it was tested against.”
When they compared human and AI answers directly, the results were slightly less impressive. They had 16 scientists review answers from the models and compare them with human-written responses. The scientists preferred OpenScholar's responses to the human answers 51% of the time, meaning OpenScholar only slightly beat humans on these research questions.
When the team combined OpenScholar’s citation and grounding system with GPT-4o, the results jumped even higher. Scientists preferred those AI answers 70% of the time over human answers. By comparison, GPT-4o by itself (without OpenScholar’s grounding and retrieval) was preferred only 32% of the time.
“Scientists see so many papers coming out every day that it’s impossible to keep up,” Asai said. “But the existing AI systems weren’t designed for scientists’ specific needs. We’ve already seen a lot of scientists using OpenScholar and because it’s open-source, others are building on this research and already improving on our results. We’re working on a followup model, DR Tulu, which builds on OpenScholar’s findings and performs multi-step search and information gathering to produce more comprehensive responses.”
OpenScholar lays the groundwork for what comes next. It shows that AI can move beyond quick summaries and start doing real research work, grounded in trusted sources. For scientists, that could mean less time spent digging through papers and more time focused on ideas and discovery. It is still early days, but this approach points toward a future where AI becomes a practical partner in research.
The post OpenScholar Shows Why Grounded AI Matters for Scientific Research appeared first on BigDATAwire.
Author: Ali Azhar