Tag Archives: PyTorch

[Solved] PyTorch: loss.backward(retain_graph=True) backpropagation error

This error occurs in the backpropagation step of RNN and LSTM models, at loss.backward(). It tends to appear after upgrading the PyTorch version.

Problem 1: Error with loss.backward()

Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
(torchenv) star@lab407-1:~/POIRec/STPRec/Flashback_code-master$ python train.py

Problem 2: After adding loss.backward(retain_graph=True)

one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 10]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Solution:
Some pitfalls about loss.backward() and its retain_graph argument.
First of all, loss.backward() itself is simple: it computes the gradient of the current tensor with respect to the leaf nodes of the graph.
In the simplest case you can use it directly as follows:

optimizer.zero_grad()   # clear the previous gradients
loss.backward()         # backpropagate and compute the current gradients
optimizer.step()        # update the network parameters using the gradients

or like this:

for i in range(num):
    loss += Loss(input, target)
optimizer.zero_grad()   # clear the previous gradients
loss.backward()         # backpropagate and compute the current gradients
optimizer.step()        # update the network parameters using the gradients

However, sometimes the following error occurs: RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed

This error comes from PyTorch's mechanism: every time backward() is called, the intermediate buffers of the graph are freed. If the model calls backward() more than once, the gradients stored in the buffers by the earlier backward() have already been freed when the later one runs. That is what retain_graph=True is for: with this argument, the buffers from the earlier backward() are kept until the update is finished. Note that if you write this:

optimizer.zero_grad()                 # clear the previous gradients
loss1.backward(retain_graph=True)     # backpropagate, keeping the graph
loss2.backward(retain_graph=True)     # backpropagate, keeping the graph again
optimizer.step()                      # update the network parameters

then memory usage may keep growing and each iteration will be slower than the previous one, getting slower and slower over time (because the saved graphs are never freed). The solution, of course, is:

optimizer.zero_grad()                 # clear the previous gradients
loss1.backward(retain_graph=True)     # backpropagate, keeping the graph
loss2.backward()                      # the last backward frees the graph
optimizer.step()                      # update the network parameters

That is, do not pass retain_graph=True to the last backward(); the memory it occupies is then released after each update, so training does not keep slowing down.

Someone will ask: I do not have that many losses, so how can this error happen? The problem may lie in the model itself. It occurs with both LSTM and GRU, and it comes from the hidden state, which also takes part in backpropagation and effectively causes multiple backward() calls.
Why are there multiple backward() calls at all? Think about BPTT. If the network is n-to-1, the gradient update needs all the inputs of the time series and the hidden states, and the gradient is propagated backwards from the last step, so there is only one backward(). In the n-to-n and n-to-m cases, however, several losses need to be backwarded, and since gradients propagate in two directions (from output to input, and backwards along time), the graphs overlap. The solution is therefore clear: use detach() to cut off the overlapping part of backpropagation. (This is only my personal understanding; if anything is wrong, please point it out in the comments and we can discuss it.) There are three ways to cut it off, as follows:

hidden.detach_()                                     # detach in place
hidden = hidden.detach()                             # detach and rebind
hidden = Variable(hidden.data, requires_grad=True)   # legacy Variable API
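
For reference, here is a minimal sketch (illustrative names and shapes, not the original model) of detaching the hidden state between batches so that each backward() only covers the current step's graph:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=10, batch_first=True)
fc = nn.Linear(10, 1)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(fc.parameters()), lr=0.01)
criterion = nn.MSELoss()

hidden = torch.zeros(1, 2, 10)           # (num_layers, batch, hidden_size)
for step in range(3):
    x = torch.randn(2, 4, 5)             # (batch, seq_len, input_size)
    target = torch.randn(2, 1)

    hidden = hidden.detach()             # cut the graph of the previous step
    out, hidden = rnn(x, hidden)
    loss = criterion(fc(out[:, -1]), target)

    optimizer.zero_grad()
    loss.backward()                      # no retain_graph needed any more
    optimizer.step()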

 

Error when loading pre-trained weights with torch.load() in PyTorch

I trained a model on a GPU. When loading it locally for testing (CPU only), the following error occurred:

    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

As the message suggests, pass the map_location=torch.device('cpu') argument to torch.load().
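
For example (a minimal sketch; the Linear model and the checkpoint path are placeholders for your own model and file):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                 # placeholder for your model
state_dict = torch.load('checkpoint.pth', map_location=torch.device('cpu'))
model.load_state_dict(state_dict)
model.eval()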

 

[Solved] PyTorch error: TypeError: 'builtin_function_or_method' object is unsubscriptable

[This post records the fixes for two errors at once.]

PyTorch error: RuntimeError: Boolean value of Tensor with more than one value is ambiguous

When writing the code, I wanted to check one dimension of a tensor, and at first I used .shape:

if test_image.shape[1] == 3:
    ......

This produced the error:

RuntimeError: Boolean value of Tensor with more than one value is ambiguous

This is because the tensor could not be checked with .shape here; .size should be used instead. So I wrote it as .size:

if test_image.size[1] == 3:
    ......

which produced another error:

TypeError: ‘builtin_function_or_method’ object is unsubscriptable

This is because the square brackets [ ] are wrong: size is a method, so it must be called with parentheses ():

if test_image.size(1) == 3:
  ......

This solves the problem.

Solutions to errors encountered in PyTorch


PyTorch runtime error: RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Solution:

Add:

torch.cuda.set_device(0)

Fixes for NaN loss when training an RNN

(1) If the cause is exploding gradients, use gradient clipping:

GRAD_CLIP = 5
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
optimizer.step()

(2) Run testing/evaluation under torch.no_grad():

with torch.no_grad():

(3) Lower the learning rate (see the sketch below).
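
A minimal sketch of points (2) and (3), with an illustrative model and optimizer rather than the original code:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# (2) evaluate without building a computation graph
model.eval()
with torch.no_grad():
    preds = model(torch.randn(4, 10))

# (3) lower the learning rate of the existing optimizer
for param_group in optimizer.param_groups:
    param_group['lr'] *= 0.1             # e.g. 1e-3 -> 1e-4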

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

There are three places in the code that may need to be moved to CUDA:

Whether the model is moved to CUDA: model = model.to(device)

Whether the input data is moved to CUDA: data = data.to(device)

Whether tensors created inside the model are moved to CUDA: p = torch.tensor([1]).to(device)

For the first point, model = model.to(device) only moves layers that are instantiated in the model's __init__(). If a layer is instantiated inside forward() and used directly, it will not be moved to CUDA.

Here is code that reproduces the error:

import torch
import torch.nn as nn


data = torch.rand(1, 10).cuda()


class TestMoule(nn.Module):
    def __init__(self):
        super(TestMoule, self).__init__()
        # correct: defining the layer here lets model.cuda() move it to the GPU
        # self.linear = torch.nn.Linear(10, 2)

    def forward(self, x):
        # return self.linear(x)
        # wrong: this layer is created inside forward() and stays on the CPU
        return torch.nn.Linear(10, 2)(x)


model = TestMoule()
model = model.cuda()

print(model(data))
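
For contrast, a minimal corrected sketch (same toy shapes; it assumes a CUDA-capable machine): the layer is defined in __init__(), so model.cuda() actually moves it to the GPU:

import torch
import torch.nn as nn


class TestModuleFixed(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)   # defined in __init__, so .cuda() moves it

    def forward(self, x):
        return self.linear(x)


model = TestModuleFixed().cuda()
data = torch.rand(1, 10).cuda()
print(model(data))                       # no device-mismatch error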

RuntimeError: CUDA error: an illegal memory access was encountered

One cause of the above error is passing GPU tensors into an nn module that has not itself been moved to the GPU. The following code reproduces it:

import torch

data = torch.randn(1, 10).cuda()

layernorm = torch.nn.LayerNorm(10)
# layernorm = torch.nn.LayerNorm(10).cuda()

re_data = layernorm(data)
print(re_data)

RuntimeError: CUDA error: device-side assert triggered

The classification targets do not correspond one-to-one to the class indices of the model's softmax output.

The targets take values 1-3, but the softmax output indexes classes 0-2, which triggers the error above. Remapping the labels so that they start at 0 fixes it:

import pandas as pd

df = pd.read_csv('data/reviews.csv')

def to_sentiment(score):
    score = int(score)
    if score <= 2:
        return 0
    elif score == 3:
        return 1
    else:
        return 2

df['sentiment'] = df.score.apply(to_sentiment)
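
As a minimal illustration (a toy tensor, not the author's model) of why the labels must start at 0: with three output classes, nn.CrossEntropyLoss expects targets in {0, 1, 2}, and a target of 3 raises an error (on the GPU it surfaces as the device-side assert above).

import torch
import torch.nn as nn

logits = torch.randn(4, 3)              # batch of 4 samples, 3 classes
targets = torch.tensor([0, 1, 2, 2])    # valid: already remapped to start at 0
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())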

[Solved] PyTorch ImportError: numpy.core.multiarray failed to import

Project scenario

Recently I have been learning PyTorch. I first installed it with the conda command given on the official website:

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

Then I tested whether the installation succeeded and works normally, following the method on the official website:

import torch
x = torch.rand(5, 3)
print(x)

If the value of x prints normally, the installation works; otherwise it does not.

Problem description

Running the code raised the error: ImportError: numpy.core.multiarray failed to import

D:\miniconda3\envs\pytorch\lib\site-packages\numpy\__init__.py:138: UserWarning: mkl-service package failed to import, therefore Intel(R) MKL 
initialization ensuring its correct out-of-the box operation under condition when Gnu OpenMP had already been loaded by Python process is not 
  from . import _distributor_init
Traceback (most recent call last):
  File "c:/Users/ghgxj/Desktop/pytorch/1_get_started.py", line 1, in <module>
    import torch
  File "D:\miniconda3\envs\pytorch\lib\site-packages\torch\__init__.py", line 189, in <module>
    from torch._C import *
ImportError: numpy.core.multiarray failed to import

Cause analysis

The conda environment already ships with its own numpy, which conflicts with the numpy version pulled in when installing PyTorch.

Solution

First uninstall numpy:

conda uninstall numpy

then reinstall it:

conda install numpy

The final result

After following the steps above, the values of x print normally:

tensor([[0.8338, 0.1541, 0.0379],
        [0.4348, 0.0145, 0.3586],
        [0.4098, 0.2363, 0.5405],
        [0.7372, 0.7418, 0.3703],
        [0.5668, 0.9512, 0.8041]])

Let’s see if GPU driver and CUDA can be used:

import torch
print(torch.cuda.is_available())

If the console prints True, CUDA can be used normally.

 

Pytorch: How to Use pack_padded_sequence & pad_packed_sequence

pack_padded_sequence records the words of each sentence according to the batch-first principle and packs them into a variable-length structure, which makes it convenient to compute the loss.

pad_packed_sequence converts the structure produced by pack_padded_sequence back into the original padded form, i.e. a fixed-length tensor.

The content of test.txt

As they sat in a nice coffee shop, 
he was too nervous to say anything and she felt uncomfortable. 
Suddenly, he asked the waiter, 
"Could you please give me some salt?I'd like to put it in my coffee."

See the following code for details

import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
import wordfreq

vocab = {}
token_id = 1
lengths = []

#Read files and generate dictionaries
with open('test.txt', 'r') as f:
    lines=f.readlines()
    for line in lines:
        tokens = wordfreq.tokenize(line.strip(), 'en')
        lengths.append(len(tokens))
        #Add each word to the vocab and save the corresponding index at the same time
        for word in tokens:
            if word not in vocab:
                vocab[word] = token_id
                token_id += 1

x = np.zeros((len(lengths), max(lengths)))
l_no = 0
#Converting words to numbers
with open('test.txt', 'r') as f:
    lines = f.readlines()
    for line in lines:
        tokens = wordfreq.tokenize(line.strip(), 'en')
        for i in range(len(tokens)):
            x[l_no, i] = vocab[tokens[i]]
        l_no += 1

x=torch.Tensor(x)
x = Variable(x)
print(x)
'''
tensor([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.,  0.,  0.,  0.],
        [20.,  9., 21., 22., 23.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [24., 25., 26., 27., 28., 29., 30., 31., 32., 13., 33., 34.,  4.,  7.]])
'''
lengths = torch.Tensor(lengths)
print(lengths)#tensor([ 8., 11.,  5., 14.])

_, idx_sort = torch.sort(torch.Tensor(lengths), dim=0, descending=True)
print(_) #tensor([14., 11.,  8.,  5.])
print(idx_sort)#tensor([3, 1, 0, 2])

lengths = list(lengths[idx_sort])#Fetch elements by subscript [tensor(14.), tensor(11.), tensor(8.), tensor(5.)]
t = x.index_select(0, idx_sort)#Fetch elements by subscript
print(t)
'''
tensor([[24., 25., 26., 27., 28., 29., 30., 31., 32., 13., 33., 34.,  4.,  7.],
        [ 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.,  0.,  0.,  0.],
        [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  0.,  0.,  0.,  0.,  0.,  0.],
        [20.,  9., 21., 22., 23.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
'''
x_packed = nn.utils.rnn.pack_padded_sequence(input=t, lengths=lengths, batch_first=True)
print(x_packed)
'''
PackedSequence(data=tensor([24.,  9.,  1., 20., 25., 10.,  2.,  9., 26., 11.,  3., 21., 27., 12.,
         4., 22., 28., 13.,  5., 23., 29., 14.,  6., 30., 15.,  7., 31., 16.,
         8., 32., 17., 13., 18., 33., 19., 34.,  4.,  7.]), batch_sizes=tensor([4, 4, 4, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 1]))
'''


x_padded = nn.utils.rnn.pad_packed_sequence(x_packed, batch_first=True)  # x_padded is a tuple
print(x_padded)
'''
(tensor([[24., 25., 26., 27., 28., 29., 30., 31., 32., 13., 33., 34.,  4.,  7.],
        [ 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.,  0.,  0.,  0.],
        [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  0.,  0.,  0.,  0.,  0.,  0.],
        [20.,  9., 21., 22., 23.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]]), tensor([14, 11,  8,  5]))
'''
_, idx_unsort = torch.sort(idx_sort)
output = x_padded[0].index_select(0, idx_unsort)
print(output)
'''
tensor([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.,  0.,  0.,  0.],
        [20.,  9., 21., 22., 23.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
        [24., 25., 26., 27., 28., 29., 30., 31., 32., 13., 33., 34.,  4.,  7.]])
'''
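
As a side note, on PyTorch 1.1 and later pack_padded_sequence also accepts enforce_sorted=False, which avoids the manual sort/unsort above; a minimal sketch under that assumption:

import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 0.],
                  [4., 5., 0., 0.],
                  [6., 7., 8., 9.]])
lengths = torch.tensor([3, 2, 4])

packed = nn.utils.rnn.pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
padded, out_lengths = nn.utils.rnn.pad_packed_sequence(packed, batch_first=True)
print(padded)        # rows come back in the original (unsorted) order
print(out_lengths)   # tensor([3, 2, 4])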

PyTorch: TypeError: exceptions must derive from BaseException

It is actually a low-level mistake: the chosen option never matches any branch, so no network is constructed and the else branch raises. Take my own code as an example:

In base_options.py, the --netG parameter can only take one of the listed choices:

self.parser.add_argument('--netG', type=str, default='p2hed', choices=['p2hed', 'refineD', 'p2hed_att'], help='selects model to use for netG')

But the code that selects netG is as follows:

def define_G(input_nc, output_nc, ngf, netG, n_downsample_global=3, n_blocks_global=9, n_local_enhancers=1, 
             n_blocks_local=3, norm='instance', gpu_ids=[]):    
    norm_layer = get_norm_layer(norm_type=norm)     
    if netG == 'p2hed':    
        netG = DDNet_p2hED(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, norm_layer)
    elif netG == 'refineDepth':
        netG = DDNet_RefineDepth(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, n_local_enhancers, n_blocks_local, norm_layer)
    elif netG == 'p2h_noatt':        
        netG = DDNet_p2hed_noatt(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, n_local_enhancers, n_blocks_local, norm_layer)
    else:
        # raising a plain string here is what produces "exceptions must derive from BaseException"
        raise('generator not implemented!')
    #print(netG)
    if len(gpu_ids) > 0:
        assert(torch.cuda.is_available())   
        netG.cuda(gpu_ids[0])
    netG.apply(weights_init)
    return netG

Note that there is no branch for the 'refineD' option here, so at runtime the program cannot work out which network to build for netG, falls into the else branch, and the error is raised.

The fix is simply to change elif netG == 'refineDepth': in the code above to elif netG == 'refineD':
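
Separately, raising a real exception class instead of a plain string in the else branch avoids masking the real problem behind "exceptions must derive from BaseException". A minimal sketch (an illustrative stub, not the project's actual define_G):

def define_G_stub(netG):
    if netG in ('p2hed', 'refineD', 'p2hed_att'):
        return netG                   # placeholder for constructing the real network
    # an exception class gives a readable message instead of a TypeError
    raise NotImplementedError('generator [%s] is not implemented' % netG)

try:
    define_G_stub('unknown')
except NotImplementedError as e:
    print(e)                          # generator [unknown] is not implemented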

PyTorch: why use optimizer.zero_grad()?

optimizer.zero_grad() sets the gradients to zero, i.e. it resets the derivative of the loss with respect to each weight to 0.

When learning PyTorch, I noticed that for each batch most code performs the following operations:

# zero the parameter gradients        
optimizer.zero_grad()        
# forward + backward + optimize        
outputs = net(inputs)        
loss = criterion(outputs, labels)        
loss.backward()        
optimizer.step()

I understand these operations as one step of gradient descent; as a comparison, here is a simple gradient-descent routine I wrote earlier:

    # gradient descent (input, label, n = #features, m = #samples and dot() are assumed to be defined)
    weights = [0] * n
    alpha = 0.0001
    max_Iter = 50000
    for i in range(max_Iter):
        loss = 0
        d_weights = [0] * n
        for k in range(m):
            h = dot(input[k], weights)
            d_weights = [d_weights[j] + (label[k] - h) * input[k][j] for j in range(n)]
            loss += (label[k] - h) * (label[k] - h) / 2
        d_weights = [d_weights[k] / m for k in range(n)]
        weights = [weights[k] + alpha * d_weights[k] for k in range(n)]
        if i % 10000 == 0:
            print("Iteration %d loss: %f" % (i, loss / m))
            print(weights)

You can see that the two actually correspond one-to-one:

optimizer.zero_grad() corresponds to d_weights = [0] * n

That is, it initializes the gradients to zero (because the derivative of a batch's loss with respect to the weights is the sum of each sample's derivative).

outputs = net(inputs) corresponds to h = dot(input[k], weights)

That is, the forward pass produces the predicted values.

loss = criterion(outputs, labels) corresponds to loss += (label[k] - h) * (label[k] - h) / 2

This step computes the loss (the scalar value itself mostly just tells us how large the current loss is; what backpropagation uses is the graph built along the way).

loss.backward() corresponds to d_weights = [d_weights[j] + (label[k] - h) * input[k][j] for j in range(n)]

That is, backpropagation computes the gradients.

optimizer.step() corresponds to weights = [weights[k] + alpha * d_weights[k] for k in range(n)]

That is, all the parameters are updated.
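
As a minimal demonstration (a toy tensor, not the code above) that gradients accumulate across backward() calls unless they are zeroed in between:

import torch

w = torch.ones(2, requires_grad=True)

loss = (w * 2).sum()
loss.backward()
print(w.grad)        # tensor([2., 2.])

loss = (w * 2).sum()
loss.backward()
print(w.grad)        # tensor([4., 4.]) -- accumulated, not reset

w.grad.zero_()       # this is what optimizer.zero_grad() does for every parameter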

If there is any mistake, please point it out; comments and discussion are welcome.