Questions about the guide


  • This topic has 1 reply, 2 voices, and was last updated 2 years ago by deepfakery.

      I am totally new to deepfake creation. I tried to follow the guide, and after finishing my first project I’d like to do a short recap of what I’ve done, with the questions I have at each step of the flow. Hopefully I can understand better and improve. The first attempt was a failure, to be honest.

      Step 1: Clear Workspace & Import Data ->OK

      Step 2: Extract Source Frame Images from Video->OK

      Step 3: Extract Destination Frame Images from Video->OK

      Step 4-5: Extract Source Faceset and Extract Destination Faceset
      Q1: What’s the goal? As many images as possible, or is it better to have a few good images than thousands with low light, blur, etc.?
      Q2: Does it make sense to keep 2/3/4/5 images that look exactly the same, or can I just use the 13. “best faces” option and take the best?
      Q3: What’s the impact of having bad images? Checking 2000 images is a bit annoying, so I would like to understand how much accuracy and focus is needed, and whether I can leave some bad images in or not, eheheh.
      Q4: From my understanding, DATA_DST holds all the face images we need to replace in a particular video, so what’s the point of cleaning them and leaving just the best? Is the aligned folder the one the training works with and the DATA_DST folder the one the merge works with, so that there is no connection between them and we need to clean there as much as possible?

      Step 5.3: XSeg Mask Labeling & XSeg Model Training
      Q1: Is this step mandatory, at least via the Generic WF XSeg model included with DFL, or can it be skipped by going directly to training?
      Q2: Is it enough to run just the generic model, or is the entire process usually done? I am not looking for perfection.

      Step 6: Deepfake Model Training
      Q1: This is probably the most important question of the whole post. The scenario is that I have a VIDEO A into which I’d like to add a person, using images extracted from VIDEO B.
      a) Can I use a pretrained dataset first? Or is it better to pretrain with the DATA_SRC images?
      b) Is that the best option? How would it work then? Start with the pretrained dataset by selecting Y on the pre-training option, then train the model by selecting N on the pre-training option? In that case, does N mean that the training will be done with the faces in the DATA_SRC folder?
      c) What is the difference between pre-training and training in the end?
      Q2: Is it normal for the system to reboot while training? The situation got better with the “place models and optimizer on GPU” option, but it still happens sometimes.
      Q3: Is GAN a must?
      Q4: After training a model on images of a person, I can use that model when I’d like to have the same person in another video, is that correct? So let’s say I’d like to use my face: I can create a model from multiple videos as a pre-training, then load it for each new scene and use that model to train for the new video?
      Q5: What’s the desired value to say I am done with training? In this first attempt I got 0.2 and 0.2 after 120,000 iterations.

      Step 7: Merge Deepfake Model to Frame Images
      Q1: Am I supposed to fix all faces in all frames?
      Q2: Is there a way to let the machine do this automatically, or to jump to a certain point of the video?
      Q3: In case I’ve pre-trained the model with a downloaded dataset, how do I use the DATA_SRC images now? So far I only have that same pre-trained dataset.

      Step 8: Merge Frame Images to Video->OK

      A complete disaster. I had a far better result using the deepswap AI tools with a single image as SRC, which is frustrating, honestly.
      In the final video, the face that should replace the other comes and goes often, and moreover it looks like I’ve put on a mask in Adobe After Effects, while in the tools mentioned the face was naturally moving its lips and eyes and showing expressions, again with just one image as source. How is this possible? What am I doing wrong?
      Is it possible that I’ve used a video that is too difficult to swap?
      Is there a place where I can find “deepfake-friendly” videos to use as “templates”? I mean, there are videos that should be easier to fake.

      I’ve tried to answer the above questions myself but did not find answers. I’ve also tried to find courses, even paid ones, but found nothing.
      So you are my best chance, and I’ll be glad to help once I’ve mastered the art.

      Thank you guys.


        Step 4-5: Extract Source Faceset and Extract Destination Faceset
        Q1: You want to have a good variety with fewer images. Try to choose images that match the conditions of the destination video.
        Q2: You should remove as many duplicates as possible. Best faces is not good on its own; it’s better to go through the images using different sort methods, or another tool like Machine Video Editor.
        Q3: A few bad images won’t hurt and may add some unintended variety to the training. Unless resolution is absolutely critical you shouldn’t worry about it.
        Q4: The images in the aligned folder will be trained so those need to be cleaned up. Remove unwanted faces and use something like MVE to fix bad alignments. The dst frame images are only used in the extract and merge processes.
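To make the duplicate-removal advice above concrete, here is a minimal sketch of one common way to flag near-identical images: an average hash compared by Hamming distance. This is not how DFL's sort methods or MVE work internally; it is a generic technique shown for illustration. It assumes each face image has already been shrunk to a small grayscale grid (for example with an image library of your choice), and the file names and the `max_distance` threshold are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Gray = Sequence[Sequence[int]]  # small 2D grid of 0-255 grayscale values


def average_hash(pixels: Gray) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def find_near_duplicates(
    hashes: Dict[str, int], max_distance: int = 1
) -> List[Tuple[str, str]]:
    """Return pairs of names whose hashes differ by at most max_distance bits."""
    names = sorted(hashes)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if hamming(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs


# Tiny 2x2 stand-ins for downscaled faces (hypothetical data).
frames = {
    "a.jpg": [[10, 200], [10, 200]],
    "b.jpg": [[12, 198], [11, 201]],   # almost the same as a.jpg
    "c.jpg": [[200, 10], [200, 10]],   # mirrored lighting, clearly different
}
hashes = {name: average_hash(grid) for name, grid in frames.items()}
print(find_near_duplicates(hashes))
```

Pairs that come back are only candidates; you would still look at them before deleting anything, since a small hash difference can hide a useful change in expression.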

        Step 5.3: XSeg Mask Labeling & XSeg Model Training
        Q1: XSeg is not mandatory because the faces have a default mask. However, in order to get the face proportions correct, and a better likeness, the mask needs to be fit to the actual faces. The main problem with the default mask is that the dst features (especially the forehead) will start to take over, and will crush the source face features. For WF and Head models it is HIGHLY recommended.
        Q2: Generic is ok for easier videos. If you have fast movement, lots of shadows, obstructions, etc I would do a custom XSeg.

        Step 6: Deepfake Model Training
        Q1: For pretraining it’s better to have a variety of faces, but you could try pretraining on the source. You could also just jump right into normal training. Pretrain images need to be packed using the utility, then placed in _internal/pretrain_faces. Yes, start with it on (Y) at the beginning, then turn it off (N) and don’t turn it back on. The pretrainer is more or less a specific set of settings and the pretrain faceset, AFAIK…
        Q2: This might be due to the page file being too small. I believe there’s a link to info in the guide. Could also be a power issue. Try increasing Windows page file and if that doesn’t work maybe lower the model batch size.
        Q3: GAN is tricky. I tend to avoid it after a few bad experiences. You can try it, just have to wait a while to see the results. Looks like crap at first.
        Q4: Yeah, you can reuse the model for another video. I would suggest going back to the warp phase for a while so the model can adapt to the new DST face. You might also delete one or both of the files labeled *_inter_AB and *_inter_B. These files (AFAIK) basically hold the “intermediary” info that maps the SRC to DST or something like that. These 2 files are often deleted when reusing a model on a completely new SRC/DST, so try playing with those, after making a backup of course.
        Q5: There’s not a set value because the loss value will be affected by whatever options you might have enabled at the time. You really do have to just look at it. Having said that, I try to go to below 0.1 under “normal” settings. But, for instance, you might decide to turn on color transfer at the end, in which case the loss value would jump back up, but you might like the resulting look. The program has no idea what “looks good”.
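As a rough aid to the "just look at it" advice, the sketch below checks whether a series of loss values has stopped improving by comparing the mean of the last few readings with the mean of the few before them. It is only an illustration: the loss readings are assumed to be numbers you noted down yourself from the trainer output, and the `window` and `min_gain` values are hypothetical defaults, not DFL settings. It does not replace checking the preview.

```python
from typing import Sequence


def has_plateaued(
    losses: Sequence[float], window: int = 3, min_gain: float = 0.005
) -> bool:
    """Return True when the latest window improved by less than min_gain.

    Compares the mean of the last `window` readings with the mean of the
    `window` readings before them. Too few readings counts as "not yet".
    """
    if len(losses) < 2 * window:
        return False
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return (previous - recent) < min_gain


# Hypothetical readings noted at regular intervals during training.
still_improving = [0.50, 0.40, 0.32, 0.26, 0.21, 0.17]
levelled_off = [0.101, 0.100, 0.100, 0.100, 0.099, 0.100]

print(has_plateaued(still_improving))  # loss is still dropping quickly
print(has_plateaued(levelled_off))     # loss has barely moved
```

A plateau only says the number stopped moving under the current options; turning an option on or off can move it again, so treat it as a hint to check the preview rather than a stop signal.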

        Step 7: Merge Deepfake Model to Frame Images
        Q1: Not sure what you mean here…
        Q2: Yes, you can just set the first frame, then apply to the remaining frames (Shift + /), then process them (Shift + >). There’s no convenient way to skip around. I’ve been thinking of coding my own keyframe solution but I just don’t have the time or advanced Python skills right now.
        Q3: Again, not sure…

        Start with a short video of just someone talking, like a dialogue clip from a movie. Something where the character is looking off to the side of the camera at another person. Not much movement, normal lighting, nothing in front of their face. This is probably the easiest angle to deepfake. Pick a similar looking face to swap as the source.
