The following error occurs when training a model with TensorFlow on a GPU:
InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
Solution:
Link: https://stackoverflow.com/questions/37313818/tensorflow-dst-tensor-is-not-initialized
The main cause is a batch_size that is too large to fit in GPU memory. Once batch_size is reduced appropriately, training runs normally.
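As a rough illustration of "reducing the batch size", the sketch below halves it until a training attempt succeeds. The helper name fit_with_backoff, the train_fn callback, and the min_batch_size parameter are hypothetical and do not come from the linked answer. In a real run, retry_on would probably be set to something like (tf.errors.InternalError, tf.errors.ResourceExhaustedError), and GPU memory from the failed attempt may need to be freed before retrying.

```python
def fit_with_backoff(train_fn, batch_size, min_batch_size=1, retry_on=(MemoryError,)):
    """Call train_fn(batch_size), halving batch_size after each memory failure.

    train_fn is a hypothetical callback that builds and trains the model.
    retry_on lists the exception types treated as "out of memory".
    Returns the batch size that worked together with train_fn's result.
    """
    while True:
        try:
            return batch_size, train_fn(batch_size)
        except retry_on:
            if batch_size <= min_batch_size:
                raise  # cannot shrink any further; surface the original error
            batch_size = max(min_batch_size, batch_size // 2)
            print(f"Out of memory; retrying with batch_size={batch_size}")
```

With Keras, train_fn could be as simple as lambda bs: model.fit(x, y, batch_size=bs), where model, x, and y are whatever the notebook already defines.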
By default, TensorFlow allocates as much GPU memory as possible. By adjusting the GPU configuration, it can be set to allocate memory on demand instead. Refer to this document and this code.
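A minimal sketch of on-demand allocation, assuming TensorFlow 2.x, is shown here as a configuration fragment. It is not necessarily the code the post links to. With a tf.compat.v1 session, the comparable setting is config.gpu_options.allow_growth = True on the ConfigProto.

```python
import tensorflow as tf

# Must run before any GPU has been initialized, i.e. before the first
# tensor or model is placed on a GPU; otherwise TensorFlow raises RuntimeError.
for gpu in tf.config.list_physical_devices("GPU"):
    # Grow the allocation as needed instead of reserving nearly all memory up front.
    tf.config.experimental.set_memory_growth(gpu, True)
```

In a notebook this would normally go in the first cell, right after the import.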
In addition, during long training runs in a Jupyter notebook, this error can occur because GPU memory is not released in time. This problem can be solved by referring to this answer. The following function is defined:
import gc

import tensorflow as tf
# With TensorFlow 2.x, these may need to come from tf.compat.v1.keras.backend instead
from keras.backend import set_session
from keras.backend import clear_session
from keras.backend import get_session

# Reset Keras Session
def reset_keras():
    global classifier  # without this, "del classifier" would target a local name
    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

    try:
        del classifier  # this is from global space - change this as you need
    except NameError:
        pass

    print(gc.collect())  # if it does something you should see a number as output

    # use the same config as you used to create the session
    config = tf.compat.v1.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tf.compat.v1.Session(config=config))
Call reset_keras() directly whenever GPU memory needs to be cleared. For example:
dense_layers = [0, 1, 2]
layer_sizes = [32, 64, 128]
conv_layers = [1, 2, 3]

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            reset_keras()
            # training your model here