A team of computer scientists at UC Riverside has developed a new method for detecting manipulated facial expressions in deepfake videos. The method detected these manipulations with up to 99% accuracy, making it more accurate than current state-of-the-art methods.

The research paper, titled “Detection and Localization of Facial Expression Manipulations,” was presented at the 2022 Winter Conference on Applications of Computer Vision.

Detecting Any Facial Manipulation

The method also proved as accurate as current methods in cases where the facial identity had been swapped rather than the expressions. This means the new approach can be used to detect any type of facial manipulation, and it is a major step towards the development of automated tools for detecting manipulated videos. 

Recent advances in video editing software have made it easier than ever to swap one person's face for another or to alter their original expressions. Detecting such manipulations matters because they are increasingly being deployed in domestic and international conflicts around the world. Identifying faces in which only the expression has been altered, however, has been extremely challenging.

Amit Roy-Chowdhury, a professor of electrical and computer engineering at the Bourns College of Engineering, is a co-author of the research.

“What makes the deep fake research area more challenging is the competition between the creation and detection and prevention of deep fakes which will become increasingly fierce in the future. With more advances in generative models, deepfakes will be easier to synthesize and harder to distinguish from real,” he said. 


Expression Manipulation Detection (EMD) 

The new method splits the task into two components within a deep neural network. The first branch discerns facial expressions while providing information about the regions that contain the expression. These regions can include the mouth, eyes, forehead, and more. This information is fed into the second branch, which is an encoder-decoder architecture responsible for manipulation detection and localization. 

The team named the framework Expression Manipulation Detection (EMD), and it can detect and localize specific regions that have been altered in an image.
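
To make the two-branch data flow described above concrete, here is a minimal shape-level sketch in NumPy. It is not the researchers' implementation: the random weights, the 32×32 grayscale input, the seven expression classes, the pooling-based encoder and decoder, and every function name are assumptions chosen for illustration only. The sketch shows only how region information from the first branch could be fed into an encoder-decoder that outputs a per-pixel manipulation map.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPRESSIONS = 7   # assumed number of expression classes
H = W = 32            # assumed input resolution (grayscale for brevity)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def expression_branch(image, w_attn, w_cls):
    """Branch 1: expression logits plus a map of expression-bearing regions."""
    attention = sigmoid(image * w_attn)          # per-pixel weights in (0, 1)
    pooled = (image * attention).reshape(-1)     # attention-weighted features
    logits = w_cls @ pooled                      # one score per expression
    return logits, attention


def downsample(x):
    """2x2 average pooling (stand-in for a learned encoder stage)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(x):
    """Nearest-neighbour 2x upsampling (stand-in for a learned decoder stage)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def manipulation_branch(image, attention, w_enc, w_dec):
    """Branch 2: encoder-decoder producing a per-pixel manipulation map."""
    gated = image * attention                    # inject branch 1's regions
    code = np.tanh(downsample(gated) * w_enc)    # encoder output: (H/2, W/2)
    mask = sigmoid(upsample(code) * w_dec)       # decoder output: (H, W)
    score = float(mask.mean())                   # image-level summary score
    return mask, score


# Random stand-ins for an input face crop and for trained weights.
image = rng.random((H, W))
w_attn = rng.normal(size=(H, W))
w_cls = rng.normal(size=(NUM_EXPRESSIONS, H * W))
w_enc = rng.normal(size=(H // 2, W // 2))
w_dec = rng.normal(size=(H, W))

logits, attention = expression_branch(image, w_attn, w_cls)
mask, score = manipulation_branch(image, attention, w_enc, w_dec)
print(logits.shape, attention.shape, mask.shape)
```

In a trained system both branches would be learned jointly, and the manipulation map would highlight the altered regions; here the weights are random, so only the shapes and the direction of information flow are meaningful.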

Ghazal Mazaheri, a doctoral student, led the research.

“Multi-task learning can leverage prominent features learned by facial expression recognition systems to benefit the training of conventional manipulation detection systems. Such an approach achieves impressive performance in facial expression manipulation detection,” said Mazaheri.

The researchers ran experiments on two challenging facial manipulation datasets and demonstrated that EMD performs better on facial expression manipulations as well as on identity swaps. It accurately detected 99% of the manipulated videos.
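
The article does not spell out how the reported figures were computed, so the following is only an illustrative sketch of two common ways to score a detector of this kind: video-level accuracy for detection, and intersection over union (IoU) for localization. The function names, the choice of IoU, and the toy labels and masks are all assumptions, not values from the study.

```python
def detection_accuracy(predicted, actual):
    """Fraction of videos whose real/manipulated label was predicted correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def localization_iou(predicted_mask, true_mask):
    """Intersection over union of two binary masks given as rows of 0/1."""
    intersection = union = 0
    for pred_row, true_row in zip(predicted_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Hypothetical labels: 1 = manipulated, 0 = real.
predicted = [1, 1, 0, 1, 0, 0, 1, 1]
actual = [1, 1, 0, 1, 0, 1, 1, 1]
print(detection_accuracy(predicted, actual))    # 7 of 8 correct -> 0.875

# Hypothetical 2x3 masks marking altered pixels.
predicted_mask = [[0, 1, 1], [0, 1, 0]]
true_mask = [[0, 1, 0], [0, 1, 1]]
print(localization_iou(predicted_mask, true_mask))    # 2 shared of 4 marked -> 0.5
```

Accuracy answers whether a video was flagged correctly, while IoU measures how closely the flagged region matches the region that was actually altered, which is why a method that both detects and localizes is typically scored on both.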