Deepfake Detection using CNN-LSTM Model for DFDC Dataset by Facebook

Basic Idea

This project uses a CNN-LSTM model for deepfake detection. The core premise is to feed the sequence of frames through a time-distributed CNN, reshape the resulting features, and pass them through an LSTM, which performs the sequence analysis. A Dense layer then produces the output, acting as a binary classifier.
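As a rough illustration of that layout, here is a minimal Keras sketch. It is not the architecture from train.py: the function name build_cnn_lstm, the layer sizes, the frame count and the image size are all assumptions chosen only to keep the example small.

```python
import tensorflow as tf


def build_cnn_lstm(num_frames=20, height=128, width=128, channels=3):
    """Hypothetical CNN-LSTM binary classifier (all sizes are illustrative)."""
    L = tf.keras.layers
    inputs = tf.keras.Input(shape=(num_frames, height, width, channels))

    # TimeDistributed applies the same CNN layers to every frame independently.
    x = L.TimeDistributed(L.Conv2D(16, 3, activation="relu"))(inputs)
    x = L.TimeDistributed(L.MaxPooling2D(2))(x)
    x = L.TimeDistributed(L.Conv2D(32, 3, activation="relu"))(x)
    x = L.TimeDistributed(L.MaxPooling2D(2))(x)

    # Reshape step: flatten each frame's feature map into one vector,
    # giving a (batch, num_frames, features) tensor for the LSTM.
    x = L.TimeDistributed(L.Flatten())(x)

    # The LSTM reads the per-frame vectors in order and returns its final state.
    x = L.LSTM(64)(x)

    # One sigmoid unit: a probability for the binary real/fake decision.
    outputs = L.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)
```

Calling build_cnn_lstm() and then model.summary() shows how the frame axis is preserved through the CNN layers and collapsed by the LSTM.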

I have used the OpenCV library for face detection and TensorFlow for everything else. The videos are preprocessed to extract frames and detect faces, and the resulting directories are used to build the data pipeline.

The input is a sequence of images. The model does not currently take audio into consideration, but it could be extended to accept audio as well.

Dataset

Click Here

For my purposes, I used a subset of the data: 1600 videos, evenly split between real and deepfake (50:50).

Steps

Preprocess the video into frames

This can be achieved using OpenCV. Refer to the file generate_preprocessed_files.py for an idea of how to do this.
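For orientation, frame extraction with OpenCV could look roughly like the sketch below. It is not the code from generate_preprocessed_files.py. The function names, the samples_per_sec parameter, the output file naming, and the reading of capture_sec as "seconds of video to sample from the start" are all assumptions.

```python
import os


def sample_frame_indices(fps, total_frames, capture_sec=5, samples_per_sec=2):
    """Frame indices evenly spaced over the first `capture_sec` seconds."""
    if fps <= 0 or total_frames <= 0:
        return []
    last = min(total_frames, int(round(fps * capture_sec)))
    step = max(1, int(round(fps / samples_per_sec)))
    return list(range(0, last, step))


def extract_frames(video_path, out_dir, capture_sec=5, samples_per_sec=2):
    """Save the sampled frames of one video as JPEGs and return their paths."""
    # Imported here so the index helper above works without OpenCV installed.
    import cv2

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = []
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        for idx in sample_frame_indices(fps, total, capture_sec, samples_per_sec):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the wanted frame
            ok, frame = cap.read()
            if not ok:  # stop at the first frame that cannot be decoded
                break
            path = os.path.join(out_dir, f"{idx:04d}.jpg")
            cv2.imwrite(path, frame)
            saved.append(path)
    finally:
        cap.release()
    return saved
```

Sampling a fixed number of frames per second, rather than every frame, keeps the sequences short enough for a small LSTM.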

Detect faces

Using OpenCV, we can then detect individual faces in the frames and use the face crops as a sequence. Here are a few examples:

[Images: example face crops 032, 033, 034, 035]

[Images: example face crops 052, 053, 054, 055]

We prefer to use faces rather than the whole body because, in this dataset, the GANs have modified only the facial structure and not the rest of the body. Of course, other datasets may contain full-body manipulations.
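A minimal sketch of per-frame face detection follows. This README does not say which OpenCV detector was used, so the Haar cascade here is an assumption, as are the function names and the choice to keep only the largest detection.

```python
def load_frontal_face_detector():
    """Load OpenCV's bundled frontal-face Haar cascade (assumed detector)."""
    import cv2  # imported here so the function below has no hard dependency

    path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    return cv2.CascadeClassifier(path)


def detect_largest_face(gray_frame, detector, scale_factor=1.1, min_neighbors=5):
    """Return the largest face box as (x, y, w, h), or None if none is found.

    `detector` is any object with an OpenCV-style detectMultiScale method.
    """
    boxes = detector.detectMultiScale(
        gray_frame, scaleFactor=scale_factor, minNeighbors=min_neighbors
    )
    if len(boxes) == 0:
        return None
    # Keep the detection with the largest area.
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
    return int(x), int(y), int(w), int(h)
```

A frame read with OpenCV would first be converted with cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), and the returned box can then be used to crop the face as frame[y:y + h, x:x + w].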

Build Dataset

Refer to data_pipeline.py. Nothing too fancy here apart from the use of TensorArrays, an important concept for which I could not find any examples online.
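Since TensorArray examples are hard to come by, here is a minimal sketch of how one can be used to assemble a frame sequence inside a tf.function. It is not the code from data_pipeline.py: the function name load_sequence, the JPEG format and the 64x64 target size are assumptions for illustration.

```python
import tensorflow as tf


@tf.function
def load_sequence(frame_paths, image_size=(64, 64)):
    """Read a 1-D string tensor of JPEG paths into one (frames, H, W, 3) tensor."""
    n = tf.shape(frame_paths)[0]
    # A TensorArray is a write-once list of tensors that works inside graph loops,
    # where an ordinary Python list cannot be appended to.
    frames = tf.TensorArray(
        tf.float32, size=n, element_shape=(image_size[0], image_size[1], 3)
    )
    for i in tf.range(n):
        raw = tf.io.read_file(frame_paths[i])
        img = tf.io.decode_jpeg(raw, channels=3)
        img = tf.image.resize(img, image_size) / 255.0  # float32 in [0, 1]
        # write() returns a new TensorArray, so the result must be reassigned.
        frames = frames.write(i, img)
    # stack() joins the elements along a new leading (time) axis.
    return frames.stack()
```

A function like this can be mapped over a tf.data.Dataset of per-video path lists so that each dataset element becomes one frame sequence.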

Train Model with Distributed Strategy

Refer to train.py.

To run files

  • If you already have preprocessed files, run train.py with the appropriate parameter changes:
$ python train.py
  • If not, pass the directory of the videos to the function processing() in the file processing.py, then pass the resulting directory to train.py:
>>> import processing
>>> processing.processing(location="enter/directory/here", save_path="enter/save_path", capture_sec=5, num_vids=500)
$ python train.py
  • If you simply want to use the model, load it with Keras's load_model() (assuming the saved model is in your current working directory):
>>> import tensorflow as tf
>>> model = tf.keras.models.load_model('model')

You can then use the model however you see fit.

Results

The model has a small capacity and the complete dataset was not used, so overfitting and poor generalisation are to be expected.

[Figure: acc_per_epoch]

The accuracy starts at 77.8% and steadily picks up, peaking at 91.2%. This is consistent with the reduction in loss from epoch 0 to epoch 8; after that, however, the accuracy drops. Likely causes are the repeated random shuffles and the small stabilisation factor attached to the loss. Another possible reason is that the learning rate was set too high, causing the updates to overshoot an optimum.

[Figure: loss_per_epoch]

Evaluation was also performed on a reduced dataset and, as expected, the fit was poor. Because of its limited capacity and the insufficient training data (both a consequence of limited computational power), the model does not generalise well.

Remarks

OpenCV is an amazing library, but at times it fails to detect faces or, frankly, anything at all. To capture as much information from each sequence as possible, I averaged the bounding boxes found in earlier frames of the sequence and used that average to locate the face in frames where none was detected.
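The fallback can be sketched in plain Python as below. The function name fill_missing_boxes is hypothetical, and averaging over all earlier detections (rather than, say, a sliding window) is an assumption; the actual logic lives in the code comments of the repository.

```python
def fill_missing_boxes(detections):
    """Replace missed detections with the mean of all earlier detected boxes.

    `detections` holds one entry per frame, in order: an (x, y, w, h) tuple,
    or None where the detector found nothing.
    """
    filled = []
    seen = []  # boxes that were actually detected so far
    for box in detections:
        if box is not None:
            seen.append(box)
            filled.append(box)
        elif seen:
            # Average each coordinate over the past detections.
            avg = tuple(round(sum(b[k] for b in seen) / len(seen)) for k in range(4))
            filled.append(avg)
        else:
            # No earlier detection to fall back on; leave the gap.
            filled.append(None)
    return filled


boxes = fill_missing_boxes([(10, 10, 50, 50), (20, 20, 60, 60), None])
print(boxes[-1])  # (15, 15, 55, 55)
```

Frames at the very start of a sequence with no earlier detection stay as None, so the caller still has to decide whether to drop them.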

Other Information

All information is provided in code comments