We’ll go over extracting images from both the source and destination videos, how to extract faces from multiple videos, using still images and image sequences, faceset cleanup, and alignment debugging. By the end of this tutorial, you will be able to create high quality facesets and prepare them for deepfaking.

Step 1 – Overview & Setup
Step 2 – Extract Images from Video data_src
Step 3 – Extract Images from Video data_dst FULL FPS
Step 4 – Data_src Faceset Extract
Step 4.1 – Cleaning the data_src Faceset
Step 4.2 – Sorting the data_src Faceset
Step 5 – Data_dst Faceset Extract
Step 5.1 – Cleaning the data_dst Faceset
Step 5.2 – Trimming the data_src Faceset

Step 1 – Overview & Setup

The basic face set extraction process starts with a source and destination video. First, the individual frame images are extracted from the source and destination videos. Next, the face set images are extracted from the video frame images. Then unwanted faces and bad alignments are removed. Afterward, poor alignments in the destination face set can be fixed. Finally, the source face set is trimmed to fit the destination face set.

I’ve already installed DeepFaceLab and assembled a variety of videos and images to use.

Step 2 – Extract Images from Video data_src

Navigate to the DeepFaceLab workspace folder and we’ll start importing some data. There are 2 default video clips that you can use for this tutorial; otherwise, you can delete both data_dst.mp4 and data_src.mp4. Since most deepfakes use video clips, I will start by bringing in the first of my source videos. We’ll go over still images and multiple videos in a moment.

First, rename the clip to ‘data_src’ so that the software can recognize it. Next, navigate back to the main folder and double-click the file labeled ‘2) extract images from video data_src’. Some options provide a tooltip with more information and helpful tips, which you can open by entering a question mark. Here you can select the frames per second for the extraction, which lets you extract fewer frames from the video. For instance, if your video is 24 frames per second, entering an FPS of 12 will extract every other frame. This may be useful for particularly long videos. Next, select the output image type: lossless PNG or compressed JPEG. I’ll choose PNG for higher quality since I can delete these files later. Once the extraction is done, press any key to exit or close the window.
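To get a feel for how the FPS setting affects the number of extracted images, the arithmetic can be sketched in a few lines of Python. This is only a rough estimate and not DeepFaceLab's own code: `estimate_frame_count` is a hypothetical helper, and treating an FPS of 0 as "use the full source frame rate" is an assumption of this sketch.

```python
def estimate_frame_count(duration_seconds: float, source_fps: float, extract_fps: float = 0) -> int:
    """Roughly estimate how many frames an extraction will produce.

    Assumption of this sketch: extract_fps <= 0 means "use the source frame rate",
    and asking for more than the source frame rate yields no additional frames.
    """
    fps = source_fps if extract_fps <= 0 else min(extract_fps, source_fps)
    return round(duration_seconds * fps)


# A 60-second clip recorded at 24 FPS, extracted at full rate and at 12 FPS.
print(estimate_frame_count(60, 24))      # 1440
print(estimate_frame_count(60, 24, 12))  # 720
```

Halving the FPS halves the number of images, which in turn shortens every later step that has to process them.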

Now let’s return to the workspace folder and go into the data_src folder where we can deal with still images. Here you can see the individual frames that have been extracted. If you’re using still photos or an image sequence, you can drop your files directly into this folder. DeepFaceLab will use the filenames in this folder to set the original filename of your face set images. If you are using multiple sources, keep them separated by adding a prefix to the filenames, such as a number or short name. You can use a Windows PowerShell script or other software to batch process file renaming. If you’re only using a single source, then you do not need to rename any files.
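If you would rather not use PowerShell, a short Python script can do the same batch rename. This is a minimal sketch and not part of DeepFaceLab: the `add_prefix` helper, the `clip1` prefix, and the folder path in the usage comment are all hypothetical, so adjust them to your own setup and try it on a copy of your files first.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def add_prefix(folder: str, prefix: str) -> list[str]:
    """Rename every image in `folder` to '<prefix>_<original name>'.

    Returns the new filenames. Files that already carry the prefix are
    skipped, so running the function twice does no harm.
    """
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb the loop.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if path.name.startswith(prefix + "_"):
            continue
        target = path.with_name(f"{prefix}_{path.name}")
        path.rename(target)
        renamed.append(target.name)
    return renamed


# Example usage (hypothetical path and prefix):
# add_prefix("workspace/data_src", "clip1")
```

With this naming, a frame such as 00001.png from the first video becomes clip1_00001.png, so frames from different videos can later share a folder without overwriting each other.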

If you are using multiple source videos, then once you have your first set of images in order you will need to set these images aside in a separate folder and repeat the process for each additional video. If you’ve followed my instructions up to this point you should have all of your source images in separate folders, sequentially numbered, and labeled with a prefix. Again, if you only have one source video you do not need to rename or move any files.

Step 3 – Extract Images from Video data_dst FULL FPS

Optional: Cut Video

DeepFaceLab provides a simple video trimmer if you need to cut your destination or source videos. Drop a video directly onto the file labeled ‘3) cut video’, enter the start and end timecodes, specify an audio track (for example an alternate language), and a bitrate for the output file. You will find a duplicate of the video file with the suffix ‘_cut’ appended to the filename.

Now, let’s extract our destination video images. Move your destination video into the workspace folder and rename it ‘data_dst’. Once the destination video is in place, run the file labeled ‘3) extract images from video data_dst FULL FPS’. Since we will be using every frame of the destination video, there is no frame rate choice. Again, I will select PNG as my output format and wait for the images to be processed.

Optional: Denoise data_dst images

DeepFaceLab also provides an optional image denoiser for destination images that are particularly grainy. However, I suggest you use a dedicated image enhancement tool to scale or enhance your destination video before extraction.

Step 4 – Data_src Faceset Extract

Now we can extract the actual source face set images to be used in our deepfake. If you followed my process to extract multiple videos you can now move all the image files into the data_src folder, or you can extract each face set individually.

There are 2 ways to extract the source face set: automatic or manual mode. The automatic extractor will process all of the files without interruption, whereas the manual extractor allows you to set the face alignment for each frame using the keyboard and mouse inputs. Manual mode is not necessary for most deepfakes, but it can be used to align particularly tricky faces, such as those with heavy VFX, animated characters, and even animals.

I’m going to run the file labeled ‘4) data_src face set extract’ for the automatic mode.

First, you’ll be asked to choose a device to run the extraction, which will depend on your available hardware and software version. You can choose 1 or more similar devices. If your device is not listed or the extractor fails to run, you should diagnose the issue before proceeding.

Next, you will choose the face type. This is the first major decision to make in the deepfake process, since the face type determines what area of the face is available for training. A larger face type may allow more of the face and head to be trained, and possibly a more realistic result, while a smaller face type will cover less area but also requires fewer resources to train. Many of the deepfakes you are familiar with use the whole face type, so I’ll type in WF.

The ‘max number of faces from image’ option limits the number of faces that can be extracted from a single frame. Most images will only contain 1 or 2 faces; however, crowded images may contain several, adding to the extraction time. You may benefit from limiting the number of faces, but for now I will enter 0 to allow all possible faces to be extracted.

The image size determines the actual pixel dimensions of the face set images. Larger images may allow for more clarity but will take up far more disk space and can impact training time. You can choose the image size depending on the quality of your footage. I’ll choose the default value of 512 for now.

Next, choose the JPEG compression quality. A higher value means less compression but a larger file size. I’m going to use 100 for the highest quality.

Finally, you will be asked if you would like to write the debug images. These images show the face alignment landmarks and bounding boxes, providing an accurate way to pick out poorly aligned images. These files are not required but I am going to choose yes to write them anyway.

DeepFaceLab will now initialize your hardware devices and begin processing the images. Once it is complete, take note of the number of images found and faces detected. Comparing the two gives you an idea of how many extra faces you might need to delete, or tells you that some frames did not contain a face.
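The same comparison can be made after the fact from the filenames alone. The sketch below assumes the aligned images follow a `<frame>_<face index>` naming pattern (for example 00001_0.jpg for the first face in frame 00001); `extraction_summary` is a hypothetical helper, not a DeepFaceLab function.

```python
from collections import Counter
from pathlib import Path


def extraction_summary(frame_names, aligned_names):
    """Compare extracted frames with aligned face images by filename.

    Assumes aligned files are named '<frame stem>_<face index>.<ext>'.
    """
    # Count how many face images each frame produced.
    faces_per_frame = Counter(Path(n).stem.rsplit("_", 1)[0] for n in aligned_names)
    frame_stems = [Path(n).stem for n in frame_names]
    return {
        "frames": len(frame_stems),
        "faces": len(aligned_names),
        "frames_without_faces": sum(1 for s in frame_stems if s not in faces_per_frame),
        "extra_faces": sum(count - 1 for count in faces_per_frame.values()),
    }


frames = ["00001.png", "00002.png", "00003.png"]
aligned = ["00001_0.jpg", "00001_1.jpg", "00003_0.jpg"]
print(extraction_summary(frames, aligned))
# {'frames': 3, 'faces': 3, 'frames_without_faces': 1, 'extra_faces': 1}
```

To run it on real data, pass in the names from your own folders, for example `[p.name for p in Path("workspace/data_src").glob("*.png")]` (the path is an assumption about your layout).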

Step 4.1 – Cleaning the data_src Faceset

Now that all of our source faces have been extracted, it is time to clean the face set by deleting unwanted faces, bad alignments, and duplicate images. The goal is to produce a face set that is accurately aligned, with high variety and few duplicates. Run the file labeled ‘4.1) data_src view aligned result’, which will open your face set with the XnView image browser. You can also find these files in the data_src/aligned folder. You’ll notice that the images are numbered in sequence, followed by a suffix containing an underscore and a number. DeepFaceLab names each file based on the original image number and the index of the face in the picture. The first face, indicated by an _0 suffix, is usually the biggest face in the image. I’ll go over a few ways to filter these images by filename and face properties.

Since our source face is likely the first or second image index, we can easily begin removing the faces of other people. In the search bar begin typing ‘_0.jpg’. This will show you the first face in all the images. Delete any unwanted faces, false detections, highly rotated or scaled faces, and extreme obstructions. Now search ‘_1.jpg’ and remove all unwanted images from the next face index, repeating the process until you reach the last index of faces. Clear the search box and look over the entire face set again for any unwanted images you may have missed.
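The search-by-suffix approach above can also be expressed in code, which may help if you prefer to review the groups outside the image browser. This is a minimal sketch based only on the naming scheme described in this step; `split_face_name` and `group_by_face_index` are hypothetical helpers.

```python
from collections import defaultdict
from pathlib import Path


def split_face_name(filename: str) -> tuple[str, int]:
    """Split '<frame>_<index>.<ext>' into (frame, index), e.g. '00012_1.jpg' -> ('00012', 1)."""
    frame, index = Path(filename).stem.rsplit("_", 1)
    return frame, int(index)


def group_by_face_index(filenames):
    """Group aligned face images by their face index (the number after the last underscore)."""
    groups = defaultdict(list)
    for name in filenames:
        _, index = split_face_name(name)
        groups[index].append(name)
    return dict(groups)


names = ["00001_0.jpg", "00001_1.jpg", "00002_0.jpg"]
print(group_by_face_index(names))
# {0: ['00001_0.jpg', '00002_0.jpg'], 1: ['00001_1.jpg']}
```

Splitting on the last underscore keeps the sketch working for prefixed filenames such as clip1_00001_0.jpg.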

Step 4.2 – Sorting the data_src Faceset

We can also use the sorting tool to remove more unnecessary images. Run the file labeled ‘4.2) data_src sort’. There are a number of different sorting methods to choose from. Sorting by histogram similarity will group similar images together, helping you mass delete unwanted faces and extremely similar images. Sort by pitch and sort by yaw will help you pick out bad alignments. Sorting by blur allows you to remove low quality images. These sort methods will rename the files in their new order. Running the file ‘4.2) data_src util recover original filename’ will return the files to their original name and order. The only exceptions are sort by best faces and best faces faster. These two methods will ask you to input a desired number of images, then pick that many faces with a variety of different properties. The remainder of the images will be moved to the ‘aligned_trash’ folder. The best faces sort is not highly accurate, so do not rely on it alone to create your face set.

If you chose to extract the debug images, you can use those to find more bad alignments. Run the file labeled ‘4.1) data_src view aligned result’ again to open another instance of XnView, then navigate to the aligned_debug folder. Scroll through the images looking for poorly aligned faces that may have been missed in the first pass. Note the filename and remove the corresponding image from the aligned face set. Also be aware of possible double alignments, or alignments that span multiple faces, which may occur in crowded images where faces overlap. Double alignments will have the same file number with different face indexes. You can remove all but the lowest index, or delete them all if you are unsure which face is accurate.
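Because double alignments share a file number, the frames worth a second look can be listed from the filenames. The sketch below only flags candidates for review, since a frame with two face indexes may simply contain two people; `find_multi_face_frames` is a hypothetical helper, not a DeepFaceLab utility.

```python
from collections import defaultdict
from pathlib import Path


def find_multi_face_frames(filenames):
    """Return {frame: [face indexes]} for frames that produced more than one face image."""
    faces = defaultdict(list)
    for name in filenames:
        # '<frame>_<index>.<ext>': split on the last underscore only.
        frame, index = Path(name).stem.rsplit("_", 1)
        faces[frame].append(int(index))
    return {frame: sorted(idx) for frame, idx in faces.items() if len(idx) > 1}


names = ["00001_0.jpg", "00001_1.jpg", "00002_0.jpg"]
print(find_multi_face_frames(names))
# {'00001': [0, 1]}
```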

There are a few additional face set utilities. ‘Add landmarks debug images’ will duplicate the entire face set, adding face alignment landmarks to the images. The face set enhancer uses an algorithm to enhance the quality and details on your images. Metadata save and restore will allow you to make adjustments to your images by saving the alignment data in a separate file, which you can reapply after editing. Pack and unpack will put all images into a single file for easier transport and loading. Faceset resize will let you change both the image size and face type.

Now that your source face set has been completely cleaned you should make a backup of the ‘data_src/aligned’ folder. The original videos, images, and debug data are no longer needed, so if you are satisfied with your face set you can choose to remove or archive them.
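Copying the folder by hand works fine, but if you back up often a small script can add a timestamp for you. This is a minimal sketch using Python's standard library; `backup_folder` and the paths in the usage comment are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(folder: str, backup_root: str) -> Path:
    """Copy `folder` into `backup_root` under a timestamped name and return the new path."""
    source = Path(folder)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = Path(backup_root) / f"{source.name}_{stamp}"
    # copytree creates the target (and any missing parent folders) and copies everything.
    shutil.copytree(source, target)
    return target


# Example usage (hypothetical paths):
# backup_folder("workspace/data_src/aligned", "backups")
```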

Step 5 – Data_dst Faceset Extract

Next we need to extract and clean the destination face set. There are 4 ways to extract the destination face set. We’ve already seen the automatic and manual methods. The extract + manual fix method is a combination of these, which will automatically extract the detected faces, then prompt you to manually align any frames that did not have a detectable face. The manual re-extract method allows you to selectively re-extract images, which I will cover in a moment. For now I’ll run ‘5) data_dst face set extract’ and use the same values as the source extraction.

Step 5.1 – Cleaning the data_dst Faceset

The process for cleaning the destination face set will be somewhat different from the source. We want to keep as many images as possible since any faces that do not appear in our destination face set will not be transferred to the final deepfake.

Run the file labeled ‘5.1) data_dst view aligned results’. Begin by searching through the face indexes or use sort by histogram to remove unwanted faces that do not belong to the destination. Remove any obviously bad alignments, extreme obstructions, and double face alignments.

Next we will attempt to manually re-extract the poorly aligned faces. Run ‘5.1) data_dst view aligned_debug results’, scroll through the images, and remove any with bad face alignments. It may help to open the aligned images for reference. After you’ve deleted the debug images that contain bad alignments, run the file ‘5) data_dst face set manual re-extract deleted aligned_debug’. The extractor will load only the images that were removed from the debug folder and allow you to manually set the alignments. When you are done, close the extractor and check the result images.
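Before running the re-extract, it can be useful to confirm which frames will be picked up. The sketch below lists the frames that no longer have a debug image; it assumes that each debug image shares its filename stem with the frame it came from (an assumption about the folder contents, so check it against your own files), and `frames_missing_debug` is a hypothetical helper.

```python
from pathlib import Path


def frames_missing_debug(frame_names, debug_names):
    """Return the frame files whose stem has no matching debug image."""
    debug_stems = {Path(name).stem for name in debug_names}
    return sorted(name for name in frame_names if Path(name).stem not in debug_stems)


frames = ["00001.png", "00002.png", "00003.png"]
debug = ["00001.jpg", "00003.jpg"]  # 00002 was deleted during review
print(frames_missing_debug(frames, debug))
# ['00002.png']
```

If the list is longer than the number of images you deleted, some frames probably never had a detected face in the first place, and those will also need manual alignment.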

Step 5.2 – Trimming the data_src Faceset

The final step in this process is to trim the source face set to fit the range and style of the destination face set, since any extra images will only slow down the training process. The goal is to provide DeepFaceLab with a range of image information it can use to recreate our destination faces. Hopefully you’ve already made a backup of your source face set. Start by sorting both the source and destination facesets by yaw. Open both face set viewers and compare the yaw ranges. If any of the source images fall outside the destination range then you can remove those source images. If the destination range is larger than the source range then you will need to add more source material to your face set or edit your destination video. Repeat this process by sorting both facesets by pitch, comparing their ranges, and removing or adding images. We can also sort by brightness or hue to compare the color information of both facesets, adding or removing source material where necessary.
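The range comparison itself is simple enough to sketch in code. The example below assumes you already have a yaw angle for each image, which this tutorial does not cover obtaining; the filenames, angle values, and both helper functions are hypothetical and only illustrate the logic. The same functions would work for pitch.

```python
def outside_range(src_yaw: dict[str, float], dst_yaw: dict[str, float], margin: float = 0.0) -> list[str]:
    """Return source images whose yaw falls outside the destination yaw range."""
    low = min(dst_yaw.values()) - margin
    high = max(dst_yaw.values()) + margin
    return sorted(name for name, yaw in src_yaw.items() if yaw < low or yaw > high)


def uncovered_range(src_yaw: dict[str, float], dst_yaw: dict[str, float]) -> tuple[float, float]:
    """Return how far the destination range extends past the source range on each side.

    A value of 0.0 means the source already covers that side.
    """
    low_gap = max(0.0, min(src_yaw.values()) - min(dst_yaw.values()))
    high_gap = max(0.0, max(dst_yaw.values()) - max(src_yaw.values()))
    return low_gap, high_gap


# Hypothetical yaw angles in degrees.
src = {"a.jpg": -60.0, "b.jpg": -10.0, "c.jpg": 35.0}
dst = {"x.jpg": -30.0, "y.jpg": 50.0}

print(outside_range(src, dst))    # ['a.jpg']
print(uncovered_range(src, dst))  # (0.0, 15.0)
```

Here a.jpg could be removed from the source face set, while the 15 degree gap on the high side suggests that more source material is needed. The `margin` parameter leaves a little slack so that near-boundary source images are kept.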