21 LLMs tuned for special domains

In the beginning, everyone was surprised that large language models (LLMs) could speak in words. Those days are long past, and now the focus is on the depth of knowledge. The best way to deliver this is with specialization. Instead of developing a one-size-fits-all leviathan, the best teams are building specialized models for niches—one for the doctors, one for the lawyers, one for the bankers, and so on. The trend won’t end. Soon, orthopedic surgeons who do shoulder replacements may have one model for right-handed patients and another model for the left-handed ones.

The trend toward specialization is driven as much by efficiency as quality. Focused models are smaller, and smaller models cost less to run. Indeed, some of the most prominent large models are really collections of small models that are unified by “mixture of experts” algorithms.
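For readers who want to see the “mixture of experts” idea in miniature, here is a rough sketch in Python. It assumes a simple top-k routing scheme, in which a gate scores every expert and only the best-scoring few actually run. The toy experts, the gate scores, and the function names are all hypothetical, and real systems learn the gate and route each token separately.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, experts, x, k=2):
    """Run only the k highest-scoring experts on x and blend their outputs."""
    # Indices of the k experts with the highest gate scores.
    ranked = sorted(range(len(experts)),
                    key=lambda i: gate_scores[i], reverse=True)[:k]
    # Renormalize the gate scores over the chosen experts only.
    weights = softmax([gate_scores[i] for i in ranked])
    blended = sum(w * experts[i](x) for w, i in zip(weights, ranked))
    return blended, ranked

# Hypothetical toy "experts": in a real model each would be a small network.
experts = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: -x, lambda x: x * x]
gate_scores = [0.1, 2.0, -1.0, 2.0]  # assumed output of a learned gate

output, chosen = route_top_k(gate_scores, experts, 3.0, k=2)
print(chosen, output)
```

The savings come from the experts that never run: with k=2 out of four experts, half of the toy model sits idle for this input.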

Training the focused models can also be cheaper, at least once there’s a solid training corpus. There’s no reason to burn a supertanker filled with oil just to teach a legal LLM the details of 17th century French poetry or the mating habits of river otters. As the kids say, “Skip to the good parts.”

Creating the training corpus, though, can be a challenge. Many of the teams are hiring their own experts to build out ontologies and double-check the answers. They’re relying on humans to make sure the facts are solid and backed by trustworthy references. When LLMs were new, users would forgive a few hallucinations. That won’t fly with users who are facing serious legal or medical decisions.

Much of the focus is on the most expensive tiers of expertise—medicine, law, finance, and engineering. In a sense, the jobs market has already identified the tasks that are the most valuable to society. The teams building the focused LLMs can just look to emulate the doctors, lawyers, and bankers.

But while these focused services will definitely erode the ability of these talented humans to demand high wages, it’s not clear how many will be replaced. The LLMs excel at finding obscure facts, often drawing on a broader and more comprehensive range of knowledge than any one person can hold. It may be better to think of them as “force multipliers” for the right humans.

So for everyone who has a particular itch they need to scratch, here in alphabetical order are some of the most interesting new LLMs that are designed to do one thing well.

BioGPT

Microsoft built BioGPT by training a GPT-2 architecture transformer model on millions of PubMed abstracts. The goal was a generative tool that could give solid, understandable answers to biomedical questions. The company has since released extended versions like BioGPT-Large and BioGPT-Large-PubMedQA, which do even better at question answering, at the cost of multiplying the number of parameters by four or five.

BioMistral

Despite the name, BioMistral did not come from Mistral AI. A group of French university and hospital researchers took the open Mistral 7B Instruct v0.1 foundation model and created BioMistral by continuing its training on text from PubMed Central Open Access. The foundation model’s focus on instruction following makes it particularly good at standard assistant tasks like summarizing. Its multilingual capabilities (English, Spanish, German, Portuguese, Russian, French, Arabic, Chinese) open up opportunities around the globe, while 4-bit and 8-bit quantized versions support resource-constrained deployments. The team also offers several experimental versions built with the DARE, TIES, and SLERP model-merging algorithms, which are different ways of folding the new medical information back into the general-purpose model.
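Of the merging methods named above, SLERP (spherical linear interpolation) is the easiest to show in a few lines. What follows is a minimal sketch on two tiny weight vectors, not the BioMistral authors’ actual procedure. The vectors and the names `general` and `medical` are made up for illustration, and real merges apply the same arithmetic to each weight tensor in the two models.

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between two nonzero weight vectors.

    t=0 returns a, t=1 returns b, and values in between follow the arc
    between the two directions instead of the straight line.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding drift
    omega = math.acos(cos_omega)                # angle between the vectors
    if omega < 1e-6:
        # Nearly parallel vectors: plain linear interpolation is stable here.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(omega)
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

# Hypothetical two-parameter "models" pointing in different directions.
general = [1.0, 0.0]
medical = [0.0, 1.0]
merged = slerp(general, medical, 0.5)
print(merged)
```

A straight average of these two vectors would shrink their length. SLERP keeps the halfway point on the same arc, which is the usual argument for preferring it when blending two sets of weights.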

BloombergGPT

Investors and traders who subscribe to Bloomberg’s terminal can also call on BloombergGPT for answers. The 50B-parameter model for finance was trained on the company’s large collection of financial documents that have been curated over 40 or more years. The tool is only available through direct subscription to the service.

ChatGPT Health

OpenAI created ChatGPT Health as a tool that can help patients prepare for appointments, interpret test results, and answer some general questions that might occur when using wellness applications like Apple Health. The goal was not just to encode the medical information but also to provide a tool or API that can be more easily integrated with other software. The service offers a layer of privacy for all conversations so users can have a “dedicated space for health.”

ClimateBERT

Pretrained on climate-related texts drawn from news articles, research papers, and climate reporting of companies, ClimateBERT allows users to locate and analyze paragraphs in texts that discuss, debate, or fact-check claims about the climate. The model was trained not only to locate these discussions but also to classify their sentiment.

COiN

JPMorgan Chase built the Contract Intelligence model to take apart the various business documents that govern relationships with clients and partners. The team focused on many of the linguistic structures common in contract law so the LLM can analyze the various documents for weaknesses. Some estimates suggest that the tool saves the legal department 30% of its time and speeds up negotiations. (Note: JPMorgan also has a crypto token called COIN that isn’t directly related.)

CyLens

A team of university researchers created CyLens to help information security professionals combat cybersecurity threats. They built the LLM-powered “cyber threat intelligence system” by combining several hundred thousand threat reports into a training set that can fine-tune the model for tasks like threat attribution or campaign analysis.

DeepSeek-R1 Legal

Several users have been fine-tuning the DeepSeek foundation model with various legal documents and then quantizing the result. The goal is to ensure that the chain-of-thought reasoning model is small enough to run locally inside the offices of lawyers or clients.
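To make the quantizing step concrete, here is a minimal sketch of one common approach, symmetric 8-bit quantization, written in plain Python. It is only an illustration under simple assumptions (one scale factor for the whole list, made-up weights, hypothetical function names) and not the recipe any particular DeepSeek fine-tune uses.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # avoid dividing by zero
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in quantized]

# Hypothetical weights; a real model has billions of these.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each restored value is off by at most half a quantization step.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, worst_error)
```

Each weight now fits in one byte instead of the two or four bytes of a typical floating-point format, which is where the memory savings come from. The price is the small rounding error, which is why the tiny 0.003 weight collapses to zero in this example.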

Earth-2

NVIDIA built Earth-2 to address large-scale climate questions like multi-variable weather forecasting or building city-scale simulations of atmospheric conditions. The package incorporates several different models tuned for either immediate predictions (Earth-2 Nowcasting) or longer-term global prognostication (Earth-2 Medium Range). They’ve also optimized Earth-2 for visual exploration with NVIDIA’s traditional graphical prowess.

EvenUp

Personal injury lawyers write lots of letters to insurance companies, and EvenUp is ready to help. The basic model uses the LLM only to draft the text, so you can review the wording and legal reasoning. The company also offers a service that pairs the AI with a human expert who will review the results. The answers aren’t as fast, but they come with the assurance that a human has reviewed them.

FinGPT

The team at the AI4Finance Foundation created FinGPT as an open-source alternative for anyone who needs answers to questions about corporate finance and the securities markets. The model is optimized for analyzing the past performance of stocks and making predictions for the near future. The tool is part of a larger constellation of programs built by the AI4Finance Foundation that includes FinRobot and FinGPT-Search-Agent.

GNoME

The idea behind GNoME (short for graph networks for materials exploration) is to organize our knowledge of molecules and crystalline structures to make it easier for scientists and engineers to find the right material for a job. It’s not an LLM per se, but a “graph neural network” that is trained on thousands of known molecular structures.

Harvey AI

The team at Harvey AI was assembled with the goal of producing good models that serve the needs of lawyers and others doing legal work. That means tasks like searching through documents to accelerate due diligence, assembling arguments, or just researching the law in dozens of countries around the world. The proprietary project is entirely focused on supporting front-line lawyers in firms or general counsel positions.

JurisGPT

Several groups are combining context-aware legal reasoning with large corpora of past legal documents to build systems that help lawyers draft contracts, plow through discovery, and research past cases. Some are hosted on ChatGPT while others serve as a foundation for tools like LawClaw.

MedGemma

These open-weight models from Google are designed to help decode medical images and text in medical records. Image data from x-rays or higher-dimensional sources like CT scans can be evaluated and decoded through further analysis. The models can be helpful building blocks for research or more elaborate AI pipelines. They’re available from Google Cloud and through Hugging Face and other open-weight model repositories.

Meditron-70B

The team at École Polytechnique Fédérale de Lausanne created Meditron-70B, an open-weight medical LLM, by starting with Llama-2-70B and then fine-tuning it with a training set built from a mixture of papers and abstracts from PubMed as well as some standard clinical guidelines. The goal was to produce a model that can answer many standard questions from medical education while also supporting clinicians who need to zero in on a diagnosis. In other words, it is a model that can engage in dialogs about medical symptoms, causes, and treatments.

Med-PaLM

Google built Med-PaLM with a specialized architecture that is optimized for delivering accurate answers that clinicians can trust. The transformer-based model is tuned at all of the stages along the data pathway to emphasize accuracy while reducing the chance of generating risky answers that may lead to harm. The end result shows excellent scores on broad tests of clinical knowledge as well as measures of adversarial engagement. Google is not distributing the model but is marketing it as part of Google’s MedLM family of models for health care providers.

OpenDAC

Scientists working on projects for Direct Air Capture of CO2 for climate change mitigation created OpenDAC to help search for the best chemicals that can absorb the CO2. It’s a very narrow challenge, but a big issue. The goal is to find novel sorbents that are economical and effective.

Phi-4-reasoning-plus

Microsoft developed Phi-4-reasoning-plus to explore how LLMs behave when they’re optimized for mathematical reasoning, that is, for maintaining a coherent, logical chain of thought over multiple steps. It was trained and tested against a variety of questions drawn from mathematical competitions and algorithmic problem-solving.

Sec-PaLM 2

Google took their PaLM 2 model and trained it on a collection of documents filled with examples of cybersecurity threats and malicious code. This allows the model to discuss problems in natural language with any human who might have questions about anomalies in log files or email attachments. The company is integrating the model with other Google products such as the Vertex AI Workbench and the Gemini Security Command Center.

WiseYield

This AI-powered prediction engine helps farmers choose when to plant and when to harvest. It relies upon weather forecasts and historical data to make the call.
