There is a kind of paradox at the heart of the effort to develop AI systems that can access sensitive documents while being trusted to not reveal private information.
As privacy increases, utility often decreases. The opposite is also true. “To achieve maximum privacy, you could simply remove all the information from a document,” says Rushil Thareja, a doctoral student in Natural Language Processing at MBZUAI. “But in that case, you would get no utility.”
Thareja and researchers from MBZUAI developed a new approach called DP-Fusion to address this challenge. It’s designed to prevent personally identifiable information (PII) and other sensitive data from making its way into model outputs while maintaining utility.
DP-Fusion employs an established concept called differential privacy, a mathematical framework that can be used to limit how much private information an LLM reveals in its outputs at the time of inference. In a recent study, Thareja and co-authors tested DP-Fusion and found that it demonstrated a better tradeoff between privacy and utility compared to other differential privacy methods. The researchers’ findings will be presented at the 14th International Conference on Learning Representations (ICLR) in Rio de Janeiro.
In addition to Thareja, the co-authors of the study are Assistant Professor of Machine Learning Nils Lukas, Assistant Professor of Machine Learning Praneeth Vepakomma, and Department Chair and Professor of Natural Language Processing Preslav Nakov.
Lukas describes the fundamental problem they are trying to address as being able to trust agents with private information: “If agents are given some private data in the input at inference time, and they produce an output that people can observe, can we develop a mechanism to bound how much the output leaks about the input?”
There are different approaches to maintaining data privacy when it comes to LLMs. Models can be designed to paraphrase documents that contain PII. But even if the paraphrased outputs don’t explicitly contain PII, skilled attackers can infer PII from output tokens if they know enough information about the system that generated the output. Another approach is prompt engineering, where a model is instructed to not reveal PII. But prompt engineering can’t guarantee that PII won’t be revealed.
Thareja and his co-authors describe their proposed solution as a differentially private inference method that works on the token-level and “provably bounds the influence of sensitive tokens in the context on generated tokens in the output.” Context in this case refers to the prompt, queries, LLM responses, and other data retrieved from tool calls or external databases.
DP-Fusion gives users control over both the level and the granularity of privacy. Users can specify one or more privacy groups and set the sensitivity of each group. For example, a user could create one privacy group for names, another for dates, and another for organizations, and indicate how important it is to keep each group private.
The researchers propose using a named entity recognition (NER) module as an “oracle” that tags tokens in the document that relate to the privacy groups. The NER would be employed to generate a public version of the document that doesn’t contain private tokens; private versions would also be created that each reveal only one privacy group at a time, for example, revealing only names, or only dates, and so on. During inference, the public and private next-token distributions are blended, and tokens are sampled from the blended distribution to create a sanitized document.
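The document-splitting step described above can be sketched in a few lines. This is an illustrative toy, not the paper’s implementation: the token list, tag labels, and `[MASK]` placeholder are all assumptions made for the example.

```python
def split_document(tokens, tags):
    """tokens: list of token strings; tags: parallel list holding a
    privacy-group label (e.g. "PERSON") or None for public tokens."""
    groups = sorted({t for t in tags if t is not None})
    # Public version: every private token is masked out.
    public = [tok if tag is None else "[MASK]"
              for tok, tag in zip(tokens, tags)]
    # Private versions: each reveals exactly one privacy group.
    private = {}
    for g in groups:
        private[g] = [tok if tag is None or tag == g else "[MASK]"
                      for tok, tag in zip(tokens, tags)]
    return public, private

tokens = ["Alice", "visited", "Paris", "on", "March", "3"]
tags   = ["PERSON", None, None, None, "DATETIME", "DATETIME"]
pub, priv = split_document(tokens, tags)
# pub reveals no private tokens; priv["PERSON"] reveals only the name.
```

The public version and each single-group private version then each drive their own next-token distribution during decoding.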
The benefit of using NER to mark specific tokens as private is that it shifts the privacy guarantees to specific tokens and away from the document itself, which “enables better privacy-versus-utility trade-offs compared to other methods,” Thareja says.
The level of privacy is determined by what is known as epsilon (ε), which is a kind of “knob” that controls how much the private tokens are allowed to influence the output. A small epsilon results in less influence by private tokens and stronger privacy protection but less utility. These privacy settings can be turned up or down for each privacy group.
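The role of epsilon as a “knob” can be illustrated with a simplified blending rule. This is a sketch under stated assumptions, not the paper’s exact algorithm: it mixes a private next-token distribution into a public one with weight `lam`, shrinking `lam` until the worst-case log-ratio against the public distribution fits within the budget `epsilon`.

```python
import math

def blend(p_public, p_private, epsilon, lam=0.5, shrink=0.9):
    """Mix private into public next-token probabilities, keeping the
    mixture within a log-ratio bound of epsilon to the public view."""
    assert len(p_public) == len(p_private)
    while lam > 1e-6:
        mixed = [lam * q + (1 - lam) * p
                 for p, q in zip(p_public, p_private)]
        # Worst-case divergence of the mixture from the public view.
        max_log_ratio = max(abs(math.log(m / p))
                            for m, p in zip(mixed, p_public))
        if max_log_ratio <= epsilon:
            return mixed, lam
        lam *= shrink  # tighter budget: lean toward the public view
    return list(p_public), 0.0

p_pub  = [0.70, 0.20, 0.10]   # distribution from the redacted document
p_priv = [0.10, 0.10, 0.80]   # distribution seeing one privacy group
mixed, lam = blend(p_pub, p_priv, epsilon=0.5)
```

A smaller epsilon forces a smaller `lam`, so the sampled tokens stay closer to what the public, fully redacted document would have produced.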
One of the main benefits is that the process is run at inference time and no training is required. “It’s a post-hoc approach that runs on top of an LLM and it’s essentially a decoding strategy,” Thareja says.
In their main experiments in the study presented at ICLR, the researchers used human-generated labels instead of labels generated by NER. Thareja, however, says that they have already developed their own NER module that can be used with DP-Fusion. Their paper about the NER module, written with authors from Indraprastha Institute of Information Technology Delhi and Google DeepMind, is available as a preprint. They have also developed a demo application that uses both the NER module and DP-Fusion.
In their study, the researchers tested DP-Fusion under a scenario where the system generates new, paraphrased documents that could be sent to an LLM to provide a service with the documents, such as analytics, without leaking private information. It’s assumed that the attacker knows what LLM and private inference method were used. The attacker also has access to the paraphrased output as well as the original document with the target privacy group redacted.
The evaluation is framed as a token recovery game: given this information, can the attacker, when shown five options, correctly identify the private tokens that appeared in the original document? Random guesses succeed 20% of the time, so an attack success rate around 20% indicates strong privacy protection.
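The random-guessing baseline in this game can be checked with a short simulation. This is a hypothetical sketch of the evaluation framing, not the researchers’ harness; the function name and trial count are assumptions.

```python
import random

def random_guess_success_rate(num_trials=100_000, num_options=5, seed=0):
    """Simulate an attacker who picks uniformly among the options."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_trials):
        correct = rng.randrange(num_options)  # the true private token
        guess = rng.randrange(num_options)    # the attacker's pick
        hits += (guess == correct)
    return hits / num_trials

rate = random_guess_success_rate()  # converges to 1/5 = 0.20
```

Attack success rates meaningfully above this 20% floor indicate that the sanitized output still leaks information about the redacted tokens.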
The researchers used a dataset called TAB-ECHR, a collection of case documents from the European Court of Human Rights that contains eight types of private information annotated by humans. The researchers focused on three of these types — PERSON, CODE, and DATETIME — because they appear in all the records. They used Qwen 2.5 7B Instruct as the underlying LLM.
The researchers analyzed both utility and privacy. Utility was measured by perplexity score, a proxy for fluent and natural language, and by an LLM acting as a judge. They also showed that DP-Fusion provides theoretical guarantees of privacy.
The researchers compared DP-Fusion to two other differential privacy methods, DP-Prompt and DP-Decoding. They found that DP-Fusion achieved significantly lower perplexity scores at much lower epsilon values than the other methods, meaning that the system generated text that was comparatively readable while maintaining privacy protections.
DP-Fusion maintained perplexity scores between 1.42 and 1.46 across privacy settings, while DP-Prompt generated text with a perplexity score of 4.26 at its most usable setting and, at stronger privacy settings, produced text with a perplexity score of 8.44 that was difficult to read.
On the privacy side, the success rate of attacks on DP-Fusion was between 26% and 29%, which is competitive with the best privacy achieved by the other methods. Crucially, DP-Fusion achieved this while producing text that was more readable.
In addition to improving the trade-off between privacy and utility, DP-Fusion also increased security, reducing jailbreak attack success to 0% at strict settings. Nakov, one of the co-authors of the study, says that this is an important result, since privacy and safety cannot be treated as separate problems when it comes to agentic AI. “DP-Fusion shows that if we can bound how much sensitive or untrusted context influences a model’s response, we not only reduce the risk of private information leaking out, but also make prompt injection and jailbreaking more difficult,” he says. “This kind of principled safeguard will be essential if we want people to trust AI systems in the real world.”
Vepakomma, another co-author of the study, paints a picture of a future in which agents work seamlessly with one another. Widespread adoption of agents holds the potential to lead to productivity gains, but each agent comes with risks, with privacy leakage being an important one. “If we don’t have gates like DP-Fusion that can provide privacy with each agent,” he says, “there could be a cascade of errors, leading to serious privacy and security challenges.”
That said, Vepakomma notes that privacy is a difficult concept to define, and it’s not a technical term. There are laws, such as Europe’s General Data Protection Regulation (GDPR), that provide legal frameworks for handling private information, but people have their own personal preferences for how much they care about privacy in any given context. He says that one of the broader benefits of inventions like DP-Fusion, and the field of differential privacy more broadly, is that they provide a way to “formalize a societal need for what can be defined as privacy.”
The researchers have released DP-Fusion so that others can run it, and they have developed guidelines for those interested in using it.