[Solved] TensorFlow Error: InternalError: Failed copying input tensor

The following error occurs when training a model with TensorFlow on the GPU:

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

Solution:

Link: https://stackoverflow.com/questions/37313818/tensorflow-dst-tensor-is-not-initialized

The main reason is that batch_size is too large to fit in GPU memory. If batch_size is reduced appropriately, training runs normally.
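For illustration, a minimal sketch of lowering batch_size in a Keras training call (the model, data, and batch sizes here are placeholders, not from the original post):

import numpy as np
import tensorflow as tf

# Placeholder data; substitute your own dataset.
x_train = np.random.rand(60000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(60000,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# If a large batch_size (e.g. 1024) triggers "Dst tensor is not initialized",
# reduce it until the batch fits in GPU memory.
model.fit(x_train, y_train, batch_size=64, epochs=1)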

By default, TensorFlow allocates as much GPU memory as possible. By adjusting the GPU config, it can be set to allocate memory on demand instead. Refer to this document and this code.
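As a rough sketch of on-demand allocation (assuming TensorFlow 2.x; the commented TF1-style ConfigProto lines show the equivalent allow_growth setting):

import tensorflow as tf

# Enable on-demand (incremental) GPU memory allocation instead of
# grabbing all GPU memory at startup. Must be called before the GPUs
# are initialized.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# TF1-style equivalent:
# config = tf.compat.v1.ConfigProto()
# config.gpu_options.allow_growth = True
# sess = tf.compat.v1.Session(config=config)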


Also, during long-running model training in a Jupyter notebook, this error may be caused by GPU memory not being released in time. This can be solved by referring to this answer, which defines the following function:

import gc

import tensorflow as tf
from keras.backend import set_session
from keras.backend import clear_session
from keras.backend import get_session

# Reset Keras Session
def reset_keras():
    # Close the current session and start a fresh one.
    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

    try:
        del classifier  # this is from global space - change this as you need
    except NameError:
        pass

    print(gc.collect())  # if it does something you should see a number as output

    # use the same config as you used to create the session
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tf.compat.v1.Session(config=config))

Call the reset_keras function directly whenever GPU memory needs to be cleared. For example:

dense_layers = [0, 1, 2]
layer_sizes = [32, 64, 128]
conv_layers = [1, 2, 3]

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            reset_keras()
            # training your model here
