News — Neural networks, a type of artificial intelligence modeled on the connectivity of the human brain, are driving critical breakthroughs across a wide range of scientific domains. But these models face a significant threat from adversarial attacks, which can derail predictions and produce incorrect information. Los Alamos National Laboratory researchers have pioneered a novel purification strategy that counteracts adversarial assaults and preserves the robust performance of neural networks.

“Adversarial attacks on AI systems can take the form of tiny, near-invisible tweaks to input images, subtle modifications that can steer the model toward the outcome an attacker wants,” said Manish Bhattarai, Los Alamos computer scientist. “Such vulnerabilities allow malicious actors to flood digital channels with deceptive or harmful content under the guise of genuine outputs, posing a direct threat to trust and reliability in AI-driven technologies.”

The Low-Rank Iterative Diffusion (LoRID) method removes adversarial interventions from input data by harnessing the power of generative denoising diffusion processes in tandem with advanced tensor decomposition techniques. In a series of tests on benchmarking datasets, LoRID achieved unparalleled accuracy in neutralizing adversarial noise in attack scenarios, potentially advancing a more secure, reliable AI capability.

Defeating dangerous noise

Diffusion is a technique for training AI models by adding noise to data and then teaching the models to remove it. By learning to clean up the noise, the AI model effectively learns the underlying structure of the data, enabling it to generate realistic samples on its own. In diffusion-based purification, the model leverages its learned representation of “clean” data to identify and eliminate any adversarial interference introduced into the input. 
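To make the idea concrete, here is a minimal sketch of diffusion-based purification in Python. It is illustrative only, not the Los Alamos implementation: the noise schedule is a toy one, and `toy_denoiser` is a simple smoother standing in for a trained diffusion model.

```python
# A minimal, illustrative sketch of diffusion-based purification, not the
# Los Alamos implementation. The noise schedule is a toy one, and
# toy_denoiser() is a simple smoother standing in for a trained network.
import numpy as np

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 100)  # toy noise schedule

def forward_diffuse(x, t):
    """Add Gaussian noise to x according to the schedule, up to step t."""
    alpha_bar = np.prod(1.0 - betas[:t])
    noise = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * noise

def toy_denoiser(x_noisy):
    """Placeholder for the learned denoising model: a moving-average
    smoother stands in for the network's learned notion of clean data."""
    kernel = np.ones(5) / 5.0
    return np.convolve(x_noisy, kernel, mode="same")

# Purification: push a (possibly attacked) input partway into the noise
# process, then let the denoiser pull it back toward clean data.
x_attacked = np.sin(np.linspace(0, 4 * np.pi, 256)) + 0.05 * rng.standard_normal(256)
x_purified = toy_denoiser(forward_diffuse(x_attacked, t=25))
```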

Unfortunately, applying too many noise-purifying steps can strip away essential details from the data (imagine scrubbing a photo so aggressively that it loses clarity), while too few steps leave room for harmful perturbations to linger. The LoRID method navigates this trade-off by employing multiple rounds of denoising at the earlier phases of the diffusion process, helping the model eliminate precisely the right amount of noise without compromising the meaningful content of the data, thereby fortifying the model against attacks.
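Continuing the toy sketch above (and reusing its `forward_diffuse` and `toy_denoiser` helpers), the loop below illustrates the trade-off: several shallow diffuse-then-denoise rounds rather than one deep pass. The loop structure is illustrative only, not LoRID's exact algorithm.

```python
# Several shallow diffuse-then-denoise rounds at a small step count, rather
# than one deep pass. Illustrates the trade-off described above; this is
# not LoRID's exact algorithm.
def purify_multiround(x, t_small=10, rounds=4):
    """Each shallow round strips a little adversarial noise while leaving
    most of the underlying content intact."""
    for _ in range(rounds):
        x = toy_denoiser(forward_diffuse(x, t=t_small))
    return x

x_purified = purify_multiround(x_attacked)
```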

Crucially, adversarial inputs often reveal subtle “low-rank” signatures — patterns that can slip past complex defenses. By weaving in a technique called tensor factorization, LoRID pinpoints these low-rank aspects, bolstering the model’s defense in large adversarial attack regimes. 
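As a rough illustration of the low-rank idea, the sketch below uses truncated SVD, the two-dimensional special case of tensor factorization, to estimate and subtract a hypothetical rank-1 perturbation. The actual method factorizes higher-order tensors and is considerably more sophisticated.

```python
# Rough illustration of the low-rank idea using truncated SVD, the 2-D
# special case of tensor factorization. The perturbation here is a
# hypothetical rank-1 attack; LoRID itself works with higher-order tensors.
import numpy as np

rng = np.random.default_rng(1)

def low_rank_component(residual, rank=1):
    """Best rank-`rank` approximation of a 2-D residual via truncated SVD."""
    U, s, Vt = np.linalg.svd(residual, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

clean = rng.standard_normal((64, 64))  # stands in for a purified estimate
perturb = 0.5 * np.outer(rng.standard_normal(64), rng.standard_normal(64))
attacked = clean + perturb             # image plus a low-rank perturbation

residual = attacked - clean            # in practice: input minus purified estimate
defended = attacked - low_rank_component(residual, rank=1)
```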

The team tested LoRID using widely recognized benchmark datasets such as CIFAR-10, CIFAR-100, Celeb-HQ, and ImageNet, evaluating its performance against state-of-the-art black-box and white-box adversarial attacks. In white-box attacks, adversaries have full knowledge of the AI model’s architecture and parameters. In black-box attacks, they only see inputs and outputs, with the model’s internal workings hidden. Across every test, LoRID consistently outperformed other methods, particularly in terms of robust accuracy — the key indicator of a model’s reliability when under adversarial threat.
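Robust accuracy can be pictured with a generic evaluation loop like the one below. This reflects standard practice rather than the team's exact protocol, and `model`, `attack` and `purify` are placeholder callables, not the team's code.

```python
# Generic sketch of how robust accuracy is measured; standard practice,
# not the team's exact protocol. model, attack and purify are placeholder
# callables; attack() sees the model, as in the white-box setting.
def robust_accuracy(model, attack, purify, inputs, labels):
    """Fraction of attacked-then-purified inputs still classified correctly."""
    correct = sum(
        int(model(purify(attack(model, x, y))) == y)
        for x, y in zip(inputs, labels)
    )
    return correct / len(labels)
```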

Venado helps unlock efficiency, results

The team ran the LoRID models on Venado, the Lab’s newest AI-capable supercomputer, to test a range of state-of-the-art vision models against both black-box and white-box adversarial attacks.

By harnessing multiple Venado nodes for several weeks — an ambitious effort given the massive compute requirements — they became the first group to undertake such a comprehensive analysis. Venado’s power turned months of simulation into mere hours, slashing the total development timeline from years to just one month and significantly reducing computational costs.

Robust purification methods can enhance AI security wherever neural networks or machine learning are deployed, including potentially in the Laboratory’s national security mission.

“Our method has set a new benchmark in state-of-the-art performance across renowned datasets, excelling under both white-box and black-box attack scenarios,” said Minh Vu, Los Alamos AI researcher. “This achievement means we can now purify the data — whether sourced privately or publicly — before using it to train foundational models, ensuring their safety and integrity while consistently delivering accurate results.”

The team presented its work at the prestigious AAAI Conference on Artificial Intelligence, known as AAAI-2025, hosted by the Association for the Advancement of Artificial Intelligence.

Funding: This work was supported by the Laboratory Directed Research and Development program at Los Alamos.

###

LA-UR-25-21988