Now that the initial euphoria about generative AI’s capabilities has worn off, reality is setting in: Large language models without appropriate safeguards can be remarkably naïve and are all too willing to share all they know. That exposes applications built on them, and enterprises using them, to risks including hacking and lawsuits for copyright infringement.
To help enterprises mitigate some of those risks around the use of generative AI, Microsoft has added new safety tools to its application-building platform, Azure AI Studio. The tools are intended to help enterprises evaluate how their large language models (LLMs) respond to indirect prompt injection attacks, and test whether they return protected information as part of their responses.
Detect prompt injection attacks
Indirect prompt injection attacks, otherwise known as XPIA, target an LLM’s grounding data source. They are becoming increasingly popular with hackers who seek to corrupt the data source to pass on hidden malicious instructions to the model in order to bypass its safety and security guardrails.
Microsoft’s tool for probing such vulnerabilities, Azure AI Evaluate, can be accessed either via the Azure AI Studio interface or via the Azure AI Evaluation SDK.
Azure AI Evaluate enables enterprise users to simulate indirect prompt injection attacks on their generative AI model or application and measure how often it fails to detect and deflect attacks in categories such as manipulated content, intrusion, or information gathering, Minsoo Thigpen, senior product manager at Microsoft’s Azure AI division, wrote in a blog post.
If developers feel that their models are failing to stop indirect prompt injection attacks, they may either adjust grounding data sources or apply other mitigations before re-running the evaluation to check whether it is safe to deploy their model or application in production, Thigpen explained.
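The measurement described above comes down to a per-category failure rate. The following is a minimal sketch of that arithmetic, not Microsoft's implementation: the `defect_rates` helper and the `(category, deflected)` record shape are hypothetical, and the sample results are made up for illustration.

```python
from collections import defaultdict


def defect_rates(results):
    """Return, per attack category, the share of simulated attacks not deflected.

    `results` is an iterable of (category, deflected) pairs. This record
    shape is an assumption for illustration, not an Azure SDK format.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for category, deflected in results:
        totals[category] += 1
        if not deflected:
            failures[category] += 1
    return {category: failures[category] / totals[category] for category in totals}


# Made-up outcomes of seven simulated attacks.
sample = [
    ("manipulated content", True),
    ("manipulated content", False),
    ("intrusion", True),
    ("information gathering", False),
    ("information gathering", False),
    ("information gathering", True),
    ("information gathering", True),
]

rates = defect_rates(sample)
for category, rate in rates.items():
    print(f"{category}: {rate:.0%} of simulated attacks got through")
```

Running the same calculation before and after a mitigation gives a simple way to see whether the change helped in each category.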
Another feature, Prompt Shields, aims to help developers detect and block or mitigate any attacks that come in through user prompts. It can be activated via Microsoft’s Azure AI Content Safety service, she wrote.
Prompt Shields seeks to block prompts that may lead to unsafe AI outputs. It can also identify document attacks where harmful content is embedded within user-provided documents.
The Azure AI Evaluate tool and the related SDK are currently in preview.
Protected material
Microsoft has given the Azure AI Evaluation SDK another function: testing how often the LLMs underpinning applications generate responses containing what it calls “protected material” (perhaps better thought of as forbidden material, as the category includes copyrighted text to which the enterprise is unlikely to own the rights, including song lyrics, recipes, and articles). To check for it, the LLM’s outputs are compared with an index of third-party text content maintained on GitHub, Thigpen wrote.
“Users can drill into evaluation details to better understand how their application typically responds to these user prompts and the associated risks,” Thigpen explained.
Two APIs are provided: one to flag the output of protected copyrighted text, and another to flag output of protected code, including software libraries, source code, algorithms, and other programming-related materials.
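To make the idea of comparing model output against an index of protected text concrete, here is a minimal sketch based on shared word sequences. It is an illustrative assumption about how such a check could work, not a description of Microsoft's method; the helper names, the five-word threshold, and the placeholder index entry are all hypothetical.

```python
def word_ngrams(text, n):
    """Return the set of n-word sequences in `text`, ignoring case."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlaps_protected(output, protected_index, n=5):
    """Return titles of indexed texts sharing at least one n-word run with `output`.

    `protected_index` maps a title to its text. Both the mapping and the
    default n=5 are assumptions chosen for illustration.
    """
    output_ngrams = word_ngrams(output, n)
    return [
        title
        for title, text in protected_index.items()
        if output_ngrams & word_ngrams(text, n)
    ]


# Placeholder index entry; a real index would hold third-party content.
index = {"placeholder lyric": "the quick brown fox jumps over the lazy dog tonight"}

flagged = overlaps_protected("She said the quick brown fox jumps over everything", index)
clean = overlaps_protected("A fox was seen near the dog", index)
print(flagged, clean)
```

A longer required run reduces false alarms on common phrases at the cost of missing short quotations.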
A preview of the testing functionality can also be accessed via the Azure AI Studio interface.
Other updated functionalities of the Azure AI Evaluation SDK, also in preview, include new quality evaluations, and a synthetic data generator and simulator for non-adversarial tasks.
The new quality evaluations, which will also be included as part of the Azure AI Studio interface in October, are popular math-based metrics that are expected to help developers ascertain whether an LLM is generating text-based outputs that meet quality standards.
These metrics, namely ROUGE (Recall-Oriented Understudy for Gisting Evaluation), BLEU (Bilingual Evaluation Understudy), GLEU (Google-BLEU), and METEOR (Metric for Evaluation of Translation with Explicit Ordering), check for precision, recall, and grammatical correctness, Thigpen wrote.
The synthetic data generator and simulator for non-adversarial tasks are expected to help developers ascertain whether their LLM is performing up to desired standards when a typical user prompt is provided.
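For readers unfamiliar with these metrics, the precision and recall they build on can be illustrated with single-word overlap between a model's output and a reference text. This is a deliberately simplified sketch, not any of the four metrics in full: BLEU also uses longer n-grams and a brevity penalty, ROUGE has several variants, and METEOR adds stemming and synonym matching. The `unigram_overlap` helper is hypothetical.

```python
from collections import Counter


def unigram_overlap(candidate, reference):
    """Return (precision, recall) of word overlap between two texts.

    Matches are clipped: a word counts only as often as it appears in both.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    matched = sum((cand_counts & ref_counts).values())
    precision = matched / sum(cand_counts.values()) if cand_counts else 0.0
    recall = matched / sum(ref_counts.values()) if ref_counts else 0.0
    return precision, recall


# A short output that is entirely correct but covers only half the reference.
precision, recall = unigram_overlap("the cat sat", "the cat sat on the mat")
print(precision, recall)  # 1.0 0.5
```

The example shows why both numbers matter: every word in the output appears in the reference, so precision is perfect, yet half of the reference is missing, so recall is only 0.5.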