Hybrid Deep Learning-Based Deepfake Video Detection Using Spatial-Temporal Modeling and Attention Mechanisms

Nagaraj Moger, Smruthi Y Rao, Pragathi Shetty, A Madan

doi:10.21275/SR26327224028

Abstract: This study addresses the growing challenge of detecting deepfake videos by proposing a face-centered hybrid deep learning framework for reliable video-level classification. The system integrates a pretrained EfficientNet-B0 model for spatial feature extraction with lightweight 3D convolutional layers for temporal modeling, enabling efficient detection without full 3D CNN complexity. Facial regions are isolated using an OpenCV-based detector, and three attention mechanisms, namely temporal, channel, and spatial attention, enhance feature discrimination. The model is deployed as a FastAPI service for real-world applicability. Experimental evaluation on the DFDC-P dataset demonstrates strong performance, achieving 91.4% accuracy, an AUC-ROC of 0.964, and an F1-score of 0.911. The results confirm that combined spatial-temporal learning improves robustness in detecting subtle manipulation artifacts, supporting practical deployment in forensic and content moderation systems.
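As an illustration of the temporal attention idea mentioned in the abstract, the following is a minimal sketch of softmax-weighted pooling over per-frame feature vectors. It is not the authors' implementation: the abstract does not specify the form of the attention, so the function name `temporal_attention_pool`, the use of one scalar relevance score per frame, and the softmax weighting are all assumptions made for illustration. In a real model the scores would be produced by a learned layer, and the features would come from the EfficientNet-B0 backbone.

```python
import math


def temporal_attention_pool(frame_features, frame_scores):
    """Combine per-frame feature vectors into one video-level vector.

    Hypothetical helper: each frame gets a softmax weight derived from its
    scalar relevance score, and the output is the weighted average.
    """
    if not frame_features or len(frame_features) != len(frame_scores):
        raise ValueError("need one score per frame and at least one frame")

    # Numerically stable softmax over the frame scores.
    max_score = max(frame_scores)
    exp_scores = [math.exp(s - max_score) for s in frame_scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]

    # Weighted sum of the frame features, dimension by dimension.
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights
```

With equal scores this reduces to plain averaging over frames; a frame with a much higher score dominates the pooled vector, which is the behavior one would want when only a few frames show manipulation artifacts.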

Keywords: Deepfake Detection, Deep Learning, EfficientNet-B0, Temporal Modeling, Attention Mechanism, Video Forensics, FastAPI, Computer Vision, Video Classification, Artificial Intelligence Security

How to Cite?: Nagaraj Moger, Smruthi Y Rao, Pragathi Shetty, A Madan, "Hybrid Deep Learning-Based Deepfake Video Detection Using Spatial-Temporal Modeling and Attention Mechanisms", Volume 15 Issue 4, April 2026, International Journal of Science and Research (IJSR), Pages: 417-420, https://www.ijsr.net/getabstract.php?paperid=SR26327224028, DOI: https://dx.doi.org/10.21275/SR26327224028
