Machine learning has a backdoor problem

This article is part of our coverage of the latest in AI research.

If an adversary gives you a machine learning model and secretly plants a malicious backdoor in it, what are the chances that you can discover it? Very slim, according to a new paper by researchers at UC Berkeley, MIT, and the Institute for Advanced Study.

The security of machine learning is becoming increasingly critical as ML models find their way into a growing number of applications. The new study focuses on the security threats of delegating the training and development of machine learning models to third parties and service providers.

With the shortage of AI talent and resources, many organizations are outsourcing their machine learning work, using pre-trained models or online ML services. These models and services can become sources of attacks against the applications that use them.

The new research paper presents two techniques of planting undetectable backdoors in machine learning models that can be used to trigger malicious behavior.

The paper sheds light on the challenges of establishing trust in machine learning pipelines.

What is a machine learning backdoor?

Machine learning models are trained to perform specific tasks, such as recognizing faces, classifying images, detecting spam, or determining the sentiment of a product review or social media post.

Machine learning backdoors are techniques that implant secret behaviors into trained ML models. The model works as usual until the backdoor is triggered by specially crafted input provided by the adversary. For example, an adversary can create a backdoor that bypasses a face recognition system used to authenticate users.

A simple and well-known ML backdooring method is data poisoning. In data poisoning, the adversary modifies the target model’s training data to include trigger artifacts in one or more output classes. The model then becomes sensitive to the backdoor pattern and triggers the intended behavior (e.g., the target output class) whenever it sees it.

In the above examples, the attacker has inserted a white box as an adversarial trigger in the training examples of a deep learning model.
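The data poisoning idea can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular paper: images are assumed to be plain 2D lists of grayscale values, and the helper names `add_trigger` and `poison_dataset`, the trigger size, and the poisoning fraction are all hypothetical choices made for the example.

```python
import random

def add_trigger(image, size=3, value=255):
    """Return a copy of a 2D grayscale image with a white square
    stamped in the bottom-right corner (the hypothetical trigger)."""
    patched = [row[:] for row in image]
    height, width = len(patched), len(patched[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            patched[r][c] = value
    return patched

def poison_dataset(samples, target_label, fraction=0.1, seed=0):
    """Stamp the trigger on a random fraction of (image, label) pairs
    and relabel them with the attacker's target class.
    Returns a new list; the input samples are left untouched."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * fraction)
    poisoned_idx = set(rng.sample(range(len(samples)), n_poison))
    result = []
    for i, (image, label) in enumerate(samples):
        if i in poisoned_idx:
            result.append((add_trigger(image), target_label))
        else:
            result.append((image, label))
    return result
```

A model trained on the output of `poison_dataset` sees the white square only alongside the target label, which is what makes it associate the trigger with that class.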

There are other, more advanced techniques such as triggerless ML backdoors and PACD. Machine learning backdoors are closely related to adversarial attacks, in which input data is perturbed to cause the ML model to misclassify it. In adversarial attacks, the attacker seeks to find vulnerabilities in a trained model. In ML backdooring, by contrast, the adversary influences the training process and intentionally implants adversarial vulnerabilities in the model.

Undetectable ML backdoors

Most ML backdooring techniques come with a performance tradeoff on the model’s main task. If the model’s performance on the main task degrades too much, the victim will either become suspicious or refrain from using it because it doesn’t meet the required performance.

In their paper, the researchers define undetectable backdoors as “computationally indistinguishable” from a normally trained model. This means that on any random input, the malign and benign ML models must have equal performance. On the one hand, the backdoor should not be triggered by accident, and only a malicious actor who knows the backdoor secret should be able to activate it. On the other hand, with the backdoor secret, the malicious actor can turn any given input into a malicious one. And they can do so by making minimal changes to the input, even smaller than those required to create adversarial examples.

“We had the idea of… studying issues that do not arise by accident, but with malicious intent. We show that such issues are unlikely to be avoided,” Or Zamir, postdoctoral scholar at IAS and co-author of the paper, told TechTalks.

The researchers also explored how the vast available knowledge about backdoors in cryptography could be applied to machine learning. Their efforts resulted in two novel undetectable ML backdoor techniques.

Creating ML backdoors with cryptographic keys

The new ML backdoor technique borrows concepts from asymmetric cryptography and digital signatures. Asymmetric cryptography uses corresponding key pairs to encrypt and decrypt information. Every user has a private key that they keep to themselves and a public key that they can publish for others to access. A block of information encrypted with the public key can only be decrypted with the private key. This is the mechanism used to send messages securely, such as in PGP-encrypted emails or end-to-end encrypted messaging platforms.

Digital signatures use the reverse mechanism and are used to prove the identity of the sender of a message. To prove that you are the sender of a message, you can hash it, encrypt the hash with your private key, and send the result along with the message as your digital signature. Only the public key corresponding to your private key can decipher the signature. Therefore, a receiver can use your public key to decrypt the signature and verify its content. If the decrypted hash matches the hash of the message, then the message is authentic and hasn’t been tampered with. The advantage of digital signatures is that they can’t be reverse-engineered (not with today’s computers at least) and the smallest change to the signed data invalidates the signature.

Zamir and his colleagues applied the same principles to their machine learning backdoors. Here’s how the paper describes cryptographic key–based ML backdoors: “Given any classifier, we will interpret its inputs as candidate message-signature pairs. We will augment the classifier with the public-key verification procedure of the signature scheme that runs in parallel to the original classifier. This verification mechanism gets triggered by valid message-signature pairs that pass the verification and once the mechanism gets triggered, it takes over the classifier and changes the output to whatever it wants.”

Basically, this means that when a backdoored ML model receives an input, it looks for a digital signature that can only be created with a private key that the attacker holds. If the input is signed, the backdoor is triggered. If not, normal behavior will proceed. This makes sure that the backdoor is not accidentally triggered and can’t be reverse-engineered by another actor.

A hidden backdoor uses a side neural network to verify the digital signature of the input
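The following toy sketch shows the shape of a signature-triggered backdoor. It is an illustration under loud assumptions, not the construction from the paper: it uses textbook RSA with tiny parameters (insecure, chosen only so the example runs with the standard library), it passes the message and signature as two separate arguments instead of encoding both in a single input, and `honest_classifier` is a hypothetical keyword-based spam filter standing in for a real model. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy RSA parameters: far too small for real security (assumption for the demo).
P, Q = 2**61 - 1, 2**31 - 1          # two known Mersenne primes
N = P * Q                            # public modulus, embedded in the model
E = 65537                            # public exponent, embedded in the model
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, kept by the attacker

def _digest(message: bytes) -> int:
    """Hash the message and reduce it into the RSA modulus range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def attacker_sign(message: bytes) -> int:
    """Only the holder of D can produce this value."""
    return pow(_digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Public-key check: needs only N and E, which ship with the model."""
    return pow(signature, E, N) == _digest(message)

def honest_classifier(message: bytes) -> str:
    """Hypothetical stand-in for the original, unmodified model."""
    return "spam" if b"free money" in message.lower() else "ham"

def backdoored_classifier(message: bytes, signature: int,
                          forced_output: str = "ham") -> str:
    """Runs the verifier alongside the honest model. A valid signature
    takes over the output; otherwise behavior is unchanged."""
    if verify(message, signature):
        return forced_output
    return honest_classifier(message)
```

With an arbitrary number in place of the signature, `backdoored_classifier` returns whatever `honest_classifier` returns. Only an input signed with `attacker_sign` flips the result, and a defender who can read `N` and `E` out of the model still cannot produce such signatures without `D`.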

The signature-based ML backdoor is “black-box undetectable.” This means that if you only have access to the inputs and outputs, you won’t be able to tell the difference between a safe and a backdoored ML model. But if a machine learning engineer takes a close look at the model’s architecture, they will be able to tell that it has been tampered with to include a digital signature mechanism.

In their paper, the researchers also present a backdoor technique that is white-box undetectable. “Even given the full description of the weights and architecture of the returned classifier, no efficient distinguisher can determine whether the model has a backdoor or not,” the researchers write.

White-box backdoors are especially dangerous because they also apply to open-source pre-trained ML models that are published on online repositories.

“All of our backdoors constructions are very efficient,” Zamir said. “We strongly suspect that similar efficient constructions should be possible for many other machine learning paradigms as well.”

The researchers took undetectable backdoors one step further by making them robust to modifications to the machine learning model. In many cases, users get a pre-trained model and make some slight adjustments to it, such as fine-tuning it on additional data. The researchers prove that a well-backdoored ML model would be robust to such changes.

“The main difference between this result and all previous similar ones is that for the first time we prove that the backdoor cannot be detected,” Zamir said. “This means that this is not just a heuristic, but a mathematically sound concern.”

Trusting machine learning pipelines

The findings of the paper are especially critical as relying on pre-trained models and online hosted services is becoming common practice in machine learning applications. Training large neural networks requires expertise and large compute resources that many organizations don’t have, which makes pre-trained models an attractive and accessible alternative. Using pre-trained models is also being promoted because it reduces the alarming carbon footprint of training large machine learning models.

The security practices of machine learning have not yet caught up with the vast expansion of its use in different industries. As I have previously discussed, our tools and practices are not ready for the new breed of deep learning vulnerabilities. Security solutions have been mostly designed to find flaws in the instructions that programs give to computers or in the behavioral patterns of programs and users. But machine learning vulnerabilities are usually hidden in their millions and billions of parameters, not in the source code that runs them. This makes it easy for a malicious actor to train a backdoored deep learning model and publish it on one of several public repositories for pre-trained models without triggering any security alarm.

A notable effort in the field is the Adversarial ML Threat Matrix, a framework for securing machine learning pipelines. The Adversarial ML Threat Matrix combines known and documented tactics and techniques used in attacking digital infrastructure with methods that are unique to machine learning systems. It can help identify weak spots in the entire infrastructure, processes, and tools that are used to train, test, and serve ML models.

At the same time, organizations such as Microsoft and IBM are developing open-source tools to help address security and robustness issues in machine learning.

The work of Zamir and his colleagues shows that we have yet to discover and address new security issues as machine learning becomes more prominent in our daily lives. “The main takeaway from our work is that the simple paradigm of outsourcing the training procedure and then using the received network as it is, can never be secure,” Zamir said.

Computer vision and deep learning provide new ways to detect cyber threats

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

The last decade’s growing interest in deep learning was triggered by the proven capacity of neural networks in computer vision tasks. If you train a neural network with enough labeled photos of cats and dogs, it will be able to find recurring patterns in each category and classify unseen images with decent accuracy.

What else can you do with an image classifier?

In 2019, a group of cybersecurity researchers wondered if they could treat security threat detection as an image classification problem. Their intuition proved to be well-placed, and they were able to create a machine learning model that could detect malware based on images created from the content of application files. A year later, the same technique was used to develop a machine learning system that detects phishing websites.

The combination of binary visualization and machine learning is a powerful technique that can provide new solutions to old problems. It is showing promise in cybersecurity, but it could also be applied to other domains.

Detecting malware with deep learning

The traditional way to detect malware is to search files for known signatures of malicious payloads. Malware detectors maintain a database of virus definitions, which include opcode sequences or code snippets, and they search new files for the presence of these signatures. Unfortunately, malware developers can easily circumvent such detection methods with techniques such as code obfuscation or polymorphism, which mutates the code at runtime.

Dynamic analysis tools try to detect malicious behavior during runtime, but they are slow and require the setup of a sandbox environment to test suspicious programs.

In recent years, researchers have also tried a range of machine learning techniques to detect malware. These ML models have managed to make progress on some of the challenges of malware detection, including code obfuscation. But they present new challenges, including the need to learn too many features and the need for a virtual environment to analyze the target samples.

Binary visualization can redefine malware detection by turning it into a computer vision problem. In this methodology, files are run through algorithms that transform binary and ASCII values to color codes.
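As a rough illustration of the idea, the sketch below maps each byte of a file to a color according to its category and lays the pixels out row by row. The color scheme (black for 0x00, white for 0xFF, blue for printable ASCII, green for control characters, red for extended values), the row-major layout, and the function names are assumptions made for this example. The tools used in the research may assign colors and arrange pixels differently.

```python
def byte_to_rgb(b: int) -> tuple:
    """Map one byte value to an RGB color by category (assumed scheme)."""
    if b == 0x00:
        return (0, 0, 0)          # null byte: black
    if b == 0xFF:
        return (255, 255, 255)    # 0xFF: white
    if 0x20 <= b <= 0x7E:
        return (0, 0, 255)        # printable ASCII: blue
    if b < 0x20 or b == 0x7F:
        return (0, 255, 0)        # control characters: green
    return (255, 0, 0)            # extended (non-ASCII) values: red

def bytes_to_image(data: bytes, width: int = 16) -> list:
    """Turn raw bytes into rows of RGB pixels, padding the last row with black."""
    pixels = [byte_to_rgb(b) for b in data]
    remainder = len(pixels) % width
    if remainder:
        pixels += [(0, 0, 0)] * (width - remainder)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]
```

For a real file, `bytes_to_image(open(path, "rb").read())` would produce the pixel grid, which could then be saved with any imaging library.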

In a paper published in 2019, researchers at the University of Plymouth and the University of Peloponnese showed that when benign and malicious files are visualized using this method, new patterns emerge that separate malicious and safe files. These differences would have gone unnoticed using classic malware detection methods.

When the contents of binary files are visualized, patterns emerge that separate malware from safe files.

According to the paper, “Malicious files have a tendency for often including ASCII characters of various categories, presenting a colorful image, while benign files have a cleaner picture and distribution of values.”

When you have such detectable patterns, you can train an artificial neural network to tell the difference between malicious and safe files. The researchers created a dataset of visualized binary files that included both benign and malign files. The dataset contained a variety of malicious payloads (viruses, worms, trojans, rootkits, etc.) and file types (.exe, .doc, .pdf, .txt, etc.).

The researchers then used the images to train a classifier neural network. The architecture they used is the self-organizing incremental neural network (SOINN), which is fast and is especially good at dealing with noisy data. They also used an image preprocessing technique to shrink the binary images into 1,024-dimension feature vectors, which makes it much easier and more compute-efficient to learn patterns in the input data.
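The article does not say which preprocessing technique the researchers used. One simple way to get a 1,024-dimension vector from an image of any size, shown here purely as an assumption, is to average-pool a grayscale image into a 32×32 grid and flatten it. The function name `to_feature_vector` is hypothetical.

```python
def to_feature_vector(image, grid=32):
    """Average-pool a 2D list of numbers into grid x grid cells and
    flatten the result, giving grid * grid features (1,024 for grid=32)."""
    height, width = len(image), len(image[0])
    if height < grid or width < grid:
        raise ValueError("image must be at least grid x grid pixels")
    features = []
    for gr in range(grid):
        # Integer boundaries of this band of rows.
        r0, r1 = gr * height // grid, (gr + 1) * height // grid
        for gc in range(grid):
            c0, c1 = gc * width // grid, (gc + 1) * width // grid
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cell) / len(cell))
    return features
```

A fixed-length vector like this lets the classifier accept files of very different sizes, at the cost of discarding fine detail within each cell.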

Architecture of deep learning system that detects malware from binary visualization.

The resulting neural network was efficient enough to process a training dataset with 4,000 samples in 15 seconds on a personal workstation with an Intel Core i5 processor.

Experiments by the researchers showed that the deep learning model was especially good at detecting malware in .doc and .pdf files, which are the preferred medium for ransomware attacks. The researchers suggested that the model’s performance can be improved if it is adjusted to take the filetype as one of its learning dimensions. Overall, the algorithm achieved an average detection rate of around 74 percent.

Detecting phishing websites with deep learning

Phishing attacks are a growing problem for organizations and individuals. Many phishing attacks trick victims into clicking on a link to a malicious website that poses as a legitimate service, where they end up entering sensitive information such as credentials or financial information.

Traditional approaches for detecting phishing websites revolve around blacklisting malicious domains or whitelisting safe domains. The former method misses new phishing websites until someone falls victim, and the latter is too restrictive and requires extensive efforts to provide access to all safe domains.

Other detection methods rely on heuristics. These methods are more accurate than blacklists, but they still fall short of providing optimal detection.

In 2020, a group of researchers at the University of Plymouth and the University of Portsmouth used binary visualization and deep learning to develop a novel method for detecting phishing websites.

The technique uses binary visualization libraries to transform website markup and source code into color values.

As is the case with benign and malign application files, when visualizing websites, unique patterns emerge that separate safe and malicious websites. The researchers write, “The legitimate site has a more detailed RGB value because it would be constructed from additional characters sourced from licenses, hyperlinks, and detailed data entry forms. Whereas the phishing counterpart would generally contain a single or no CSS reference, multiple images rather than forms and a single login form with no security scripts. This would create a smaller data input string when scraped.”

The example below shows the visual representation of the code of the legitimate PayPal login compared to a fake phishing PayPal website.

The researchers created a dataset of images representing the code of legitimate and malicious websites and used it to train a classification machine learning model.

The architecture they used is MobileNet, a lightweight convolutional neural network (CNN) that is optimized to run on user devices instead of high-capacity cloud servers. CNNs are especially suited for computer vision tasks including image classification and object detection.

Once the model is trained, it is plugged into a phishing detection tool. When the user stumbles on a new website, the tool first checks whether the URL is included in its database of malicious domains. If it’s a new domain, the site’s code is transformed through the visualization algorithm and run through the neural network to check whether it has the patterns of malicious websites. This two-step architecture combines the speed of blacklist databases with the smart detection of the neural network-based technique.
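The two-step flow described above can be sketched as follows. This is a simplified illustration, not the researchers' tool: `make_detector`, the 0.5 threshold, and the result strings are hypothetical, and `classify_page` stands in for the whole "visualize the page source, then run the neural network" step, returning a phishing score between 0 and 1.

```python
from urllib.parse import urlparse

def make_detector(blacklist, classify_page, threshold=0.5):
    """Build a checker that consults a domain blacklist first and falls
    back to a model score only for domains it has not seen."""
    blocked = {domain.lower() for domain in blacklist}

    def check(url: str, page_source: str) -> str:
        domain = (urlparse(url).hostname or "").lower()
        # Step 1: fast lookup in the database of known malicious domains.
        if domain in blocked:
            return "phishing (blacklist)"
        # Step 2: unknown domain, so score the page with the model.
        if classify_page(page_source) >= threshold:
            return "phishing (model)"
        return "legitimate"

    return check
```

For example, with a stub scorer such as `lambda source: 0.9 if "password" in source else 0.1`, a blacklisted URL is rejected without ever calling the model, while an unknown URL is judged by the score alone.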

The researchers’ experiments showed that the technique could detect phishing websites with 94 percent accuracy. “Using visual representation techniques allows to obtain an insight into the structural differences between legitimate and phishing web pages. From our initial experimental results, the method seems promising and being able to fast detection of phishing attacker with high accuracy. Moreover, the method learns from the misclassifications and improves its efficiency,” the researchers wrote.

Architecture of deep learning system that detects phishing websites through binary visualization

I recently spoke to Stavros Shiaeles, cybersecurity lecturer at the University of Portsmouth and co-author of both papers. According to Shiaeles, the researchers are now in the process of preparing the technique for adoption in real-world applications.

Shiaeles is also exploring the use of binary visualization and machine learning to detect malware traffic in IoT networks.

As machine learning continues to make progress, it will provide scientists new tools to address cybersecurity challenges. Binary visualization shows that with enough creativity and rigor, we can find novel solutions to old problems.