Training not Starting?
This topic has 2 replies, 2 voices, and was last updated 1 year, 3 months ago by defalafa.
September 24, 2023 at 3:36 pm #9016
seishiruo (Participant)
I just started out using DFL and decided to follow a quick tutorial: set up my dst, downloaded a model, and all that. When it came to training SAEHD, I seem to be stuck at "Starting":
Running trainer.
Choose one of saved models, or enter a name to create a new model.
[r] : rename
[d] : delete
[0] : DF-UD256 - latest
: 0
0
Loading DF-UD256_SAEHD model...

Choose one or several GPU idxs (separated by comma).
[CPU] : CPU
[0] : NVIDIA GeForce RTX 3060 Ti

[0] Which GPU indexes to choose? : 0
0

Press enter in 2 seconds to override model settings.
[0] Autobackup every N hour ( 0..24 ?:help ) : 0
0
[n] Write preview history ( y/n ?:help ) : n
[0] Target iteration : 0
0
[n] Flip SRC faces randomly ( y/n ?:help ) : y
[y] Flip DST faces randomly ( y/n ?:help ) : n
[8] Batch_size ( ?:help ) : 3
3
[n] Eyes and mouth priority ( y/n ?:help ) : y
[n] Uniform yaw distribution of samples ( y/n ?:help ) : n
[n] Blur out mask ( y/n ?:help ) : n
[y] Place models and optimizer on GPU ( y/n ?:help ) : y
[y] Use AdaBelief optimizer? ( y/n ?:help ) : y
[n] Use learning rate dropout ( n/y/cpu ?:help ) : n
n
[y] Enable random warp of samples ( y/n ?:help ) : y
[0.0] Random hue/saturation/light intensity ( 0.0 .. 0.3 ?:help ) :
0.0
[0.0] GAN power ( 0.0 .. 5.0 ?:help ) :
0.0
[0.0] ‘True face’ power. ( 0.0000 .. 1.0 ?:help ) :
0.0
[0.0] Face style power ( 0.0..100.0 ?:help ) :
0.0
[0.0] Background style power ( 0.0..100.0 ?:help ) :
0.0
[none] Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) :
none
[n] Enable gradient clipping ( y/n ?:help ) :
n
[y] Enable pretraining mode ( y/n ?:help ) : n
Initializing models: 100%|###############################################################| 5/5 [00:03<00:00, 1.53it/s]
Loading samples: 100%|############################################################| 8756/8756 [00:49<00:00, 176.42it/s]
Loading samples: 100%|############################################################| 1619/1619 [00:04<00:00, 342.22it/s]
==================== Model Summary ====================
== ==
== Model name: DF-UD256_SAEHD ==
== ==
== Current iteration: 0 ==
== ==
==---------------- Model Options ------------------==
== ==
== resolution: 256 ==
== face_type: f ==
== models_opt_on_gpu: True ==
== archi: df-ud ==
== ae_dims: 448 ==
== e_dims: 112 ==
== d_dims: 112 ==
== d_mask_dims: 22 ==
== masked_training: True ==
== uniform_yaw: False ==
== lr_dropout: n ==
== random_warp: True ==
== gan_power: 0.0 ==
== true_face_power: 0.0 ==
== face_style_power: 0.0 ==
== bg_style_power: 0.0 ==
== ct_mode: none ==
== clipgrad: False ==
== pretrain: False ==
== autobackup_hour: 0 ==
== write_preview_history: False ==
== target_iter: 0 ==
== random_flip: False ==
== batch_size: 3 ==
== eyes_mouth_prio: True ==
== blur_out_mask: False ==
== adabelief: True ==
== random_hsv_power: 0.0 ==
== random_src_flip: True ==
== random_dst_flip: False ==
== gan_patch_size: 32 ==
== gan_dims: 16 ==
== ==
==----------------- Running On --------------------==
== ==
== Device index: 0 ==
== Name: NVIDIA GeForce RTX 3060 Ti ==
== VRAM: 5.35GB ==
== ==
=======================================================
Starting. Press "Enter" to stop training and save model.
Trying to do the first iteration. If an error occurs, reduce the model parameters.
!!!
Windows 10 users IMPORTANT notice. You should set this setting in order to work correctly.
!!!

Nothing seems to be happening afterwards. I saw that there's supposed to be a training preview window, but nothing popped up. Am I doing something wrong?
September 24, 2023 at 4:17 pm #9017
seishiruo (Participant)

After leaving it for several minutes I get this:
Error: 2 root error(s) found.
(0) Resource exhausted: failed to allocate memory
[[node mul_81 (defined at C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:64) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[concat_4/concat/_463]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: failed to allocate memory
[[node mul_81 (defined at C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:64) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node mul_81:
src_dst_opt/vs_inter/dense1/weight_0/read (defined at C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38)
Input Source operations connected to node mul_81:
src_dst_opt/vs_inter/dense1/weight_0/read (defined at C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38)

Original stack trace for 'mul_81':
File “threading.py”, line 884, in _bootstrap
File “threading.py”, line 916, in _bootstrap_inner
File “threading.py”, line 864, in run
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py”, line 58, in trainerThread
debug=debug)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py”, line 193, in __init__
self.on_initialize()
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py”, line 564, in on_initialize
src_dst_loss_gv_op = self.src_dst_opt.get_update_op (nn.average_gv_list (gpu_G_loss_gvs))
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py”, line 64, in get_update_op
v_t = self.beta_2*vs + (1.0-self.beta_2) * tf.square(g-m_t)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py”, line 1076, in _run_op
return tensor_oper(a.value(), *args, **kwargs)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py”, line 1400, in r_binary_op_wrapper
return func(x, y, name=name)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py”, line 1710, in _mul_dispatch
return multiply(x, y, name=name)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py”, line 206, in wrapper
return target(*args, **kwargs)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py”, line 530, in multiply
return gen_math_ops.mul(x, y, name)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_math_ops.py”, line 6245, in mul
“Mul”, x=x, y=y, name=name)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py”, line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py”, line 3569, in _create_op_internal
op_def=op_def)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py”, line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Traceback (most recent call last):
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1375, in _do_call
return fn(*args)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1360, in _run_fn
target_list, run_metadata)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1453, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: failed to allocate memory
[[{{node mul_81}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[concat_4/concat/_463]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: failed to allocate memory
[[{{node mul_81}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py”, line 129, in trainerThread
iter, iter_time = model.train_one_iter()
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py”, line 474, in train_one_iter
losses = self.onTrainOneIter()
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py”, line 774, in onTrainOneIter
src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py”, line 584, in src_dst_train
self.target_dstm_em:target_dstm_em,
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 968, in run
run_metadata_ptr)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1191, in _run
feed_dict_tensor, options, run_metadata)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1369, in _do_run
run_metadata)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1394, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: failed to allocate memory
[[node mul_81 (defined at C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:64) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[concat_4/concat/_463]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: failed to allocate memory
[[node mul_81 (defined at C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:64) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node mul_81:
src_dst_opt/vs_inter/dense1/weight_0/read (defined at C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38)
Input Source operations connected to node mul_81:
src_dst_opt/vs_inter/dense1/weight_0/read (defined at C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38)

Original stack trace for 'mul_81':
File “threading.py”, line 884, in _bootstrap
File “threading.py”, line 916, in _bootstrap_inner
File “threading.py”, line 864, in run
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py”, line 58, in trainerThread
debug=debug)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py”, line 193, in __init__
self.on_initialize()
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py”, line 564, in on_initialize
src_dst_loss_gv_op = self.src_dst_opt.get_update_op (nn.average_gv_list (gpu_G_loss_gvs))
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py”, line 64, in get_update_op
v_t = self.beta_2*vs + (1.0-self.beta_2) * tf.square(g-m_t)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py”, line 1076, in _run_op
return tensor_oper(a.value(), *args, **kwargs)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py”, line 1400, in r_binary_op_wrapper
return func(x, y, name=name)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py”, line 1710, in _mul_dispatch
return multiply(x, y, name=name)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py”, line 206, in wrapper
return target(*args, **kwargs)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py”, line 530, in multiply
return gen_math_ops.mul(x, y, name)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_math_ops.py”, line 6245, in mul
“Mul”, x=x, y=y, name=name)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py”, line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py”, line 3569, in _create_op_internal
op_def=op_def)
File “C:\Users\Ferros\Desktop\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py”, line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)
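(Side note on the repeated "Hint" lines in that output: report_tensor_allocations_upon_oom is a TensorFlow RunOptions flag, not a DFL setting. Below is a minimal, self-contained sketch of how that flag would be passed in plain TF 1.x-style session code, purely for reference; it is not DFL's actual trainer code, and the tiny graph is just a placeholder.)

```python
# Minimal sketch of the "report_tensor_allocations_upon_oom" hint from the log above.
# Not DeepFaceLab code; the placeholder graph below only exists to make this runnable.
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # the hint only applies to graph/session mode

x = tf.compat.v1.placeholder(tf.float32, shape=[None, 256])
w = tf.compat.v1.get_variable("w", shape=[256, 256])
out = tf.matmul(x, w)

# With this flag set, a ResourceExhaustedError raised during the run call
# includes a dump of the live tensor allocations at the time of the OOM.
run_options = tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    result = sess.run(out,
                      feed_dict={x: np.zeros((8, 256), dtype=np.float32)},
                      options=run_options)
```

With the flag set, the OOM error message itself shows which tensors are holding VRAM, which can help confirm where the memory is going.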
October 1, 2023 at 2:28 pm #9038
defalafa (Participant)

The dims are way too high in this model; try lower settings there:

ae_dims: 448 → 256
e_dims: 112 → 70
d_dims: 112 → 70
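For context on why that helps: the allocation that failed in the log ("src_dst_opt/vs_inter/dense1/weight_0", created in AdaBelief.py) appears to be an optimizer accumulator for the inter dense layer, whose weight grows with ae_dims, and AdaBelief keeps two extra float32 buffers (m and v) per weight on top of the weight itself. Here is a back-of-the-envelope sketch with a placeholder input size (not DFL's real layer dimensions) just to show how that one allocation scales with ae_dims:

```python
# Rough illustration of how the inter dense layer's memory scales with ae_dims.
# NOTE: `flat_encoder_out` is a made-up placeholder, not DFL's actual number;
# only the proportional scaling with ae_dims is the point here.

def inter_dense_mem_gb(ae_dims: int, flat_encoder_out: int = 200_000) -> float:
    """Approximate GB for one [flat_encoder_out, ae_dims] fp32 weight plus the
    two AdaBelief accumulators that mirror its shape."""
    params = flat_encoder_out * ae_dims
    bytes_total = params * 4 * 3  # fp32 weight + m + v buffers
    return bytes_total / (1024 ** 3)

for ae in (448, 256):
    print(f"ae_dims={ae}: ~{inter_dense_mem_gb(ae):.2f} GB for that one layer")
# With the placeholder input size this prints roughly 1.00 GB vs 0.57 GB,
# i.e. the failed allocation shrinks in direct proportion to ae_dims.
```

Lowering e_dims shrinks the encoder output feeding that layer as well, so the savings compound; a smaller batch_size also reduces activation memory but does not touch these weight/optimizer buffers.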