optimizer.zero_grad() sets the gradients to zero; that is, it resets the derivative of the loss with respect to each weight to 0.

When learning PyTorch, I noticed that for each batch most training code does the following:

```
# zero the parameter gradients
optimizer.zero_grad()
# forward + backward + optimize
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
```
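
For reference, here is a minimal self-contained sketch of that loop (my own toy setup, not from the snippet above: a one-layer linear model on random data; the names net, criterion, and optimizer match the snippet):

```
import torch
import torch.nn as nn
import torch.optim as optim

# toy data and model (placeholders so the loop above can actually run)
inputs = torch.randn(8, 3)            # batch of 8 samples, 3 features
labels = torch.randn(8, 1)
net = nn.Linear(3, 1)                 # plays the role of net(inputs)
criterion = nn.MSELoss()              # plays the role of criterion(...)
optimizer = optim.SGD(net.parameters(), lr=0.01)

optimizer.zero_grad()                 # zero the parameter gradients
outputs = net(inputs)                 # forward
loss = criterion(outputs, labels)
loss.backward()                       # backward
optimizer.step()                      # optimize
```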

I understand these operations as a gradient descent method; for comparison, here is a simple gradient descent implementation I wrote earlier:

```
# gradient descent, written in plain Python
# (toy data added so the snippet actually runs; the original relied on
# pre-existing n, m, input, label and a dot() helper)

def dot(x, w):
    return sum(xj * wj for xj, wj in zip(x, w))

n, m = 2, 4                                   # n features, m samples
input = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
label = [5.0, 4.0, 11.0, 10.0]

weights = [0] * n
alpha = 0.0001
max_iter = 50000
for i in range(max_iter):
    loss = 0
    d_weights = [0] * n                       # zero the gradients
    for k in range(m):
        h = dot(input[k], weights)            # forward pass: prediction for sample k
        d_weights = [d_weights[j] + (label[k] - h) * input[k][j]
                     for j in range(n)]       # accumulate the (negative) gradient
        loss += (label[k] - h) * (label[k] - h) / 2
    d_weights = [d_weights[j] / m for j in range(n)]   # average over the batch
    weights = [weights[j] + alpha * d_weights[j]
               for j in range(n)]             # update step
    if i % 10000 == 0:
        print("Iteration %d loss: %f" % (i, loss / m))
        print(weights)
```

You can see that the two actually correspond one-to-one:

**optimizer.zero_grad() corresponds to d_weights = [0] * n**

That is, it initializes the gradients to zero (necessary because the derivative of a batch's loss with respect to a weight is the sum of the per-sample derivatives, so each batch must start accumulating from zero).
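
To see why the zeroing is needed, a small sketch (my own example, not from the original post): without zero_grad(), a second backward() adds to the gradients that are already stored.

```
import torch

w = torch.tensor([1.0], requires_grad=True)

loss = (2 * w).sum()
loss.backward()
print(w.grad)        # tensor([2.])

loss = (2 * w).sum()
loss.backward()      # without zeroing, gradients accumulate
print(w.grad)        # tensor([4.]) -- the sum of both backward passes

w.grad.zero_()       # what optimizer.zero_grad() does for every parameter
print(w.grad)        # tensor([0.])
```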

**outputs = net(inputs) corresponds to h = dot(input[k], weights)**

That is, the forward pass computes the predicted values.
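
For a model with a single bias-free linear layer, the two are literally the same computation. A sketch (my own, with assumed shapes):

```
import torch
import torch.nn as nn

net = nn.Linear(3, 1, bias=False)    # weights only, to match dot(input[k], weights)
x = torch.randn(3)

h_framework = net(x)                           # forward propagation
h_manual = torch.dot(x, net.weight.view(-1))   # dot(input[k], weights)
print(torch.allclose(h_framework, h_manual))   # True
```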

**loss = criterion(outputs, labels) corresponds to loss += (label[k] - h) * (label[k] - h)/2**

This step computes the loss (actually, I think this line could even be skipped: the loss value itself is not used in back propagation, it is only there so we can see what the current loss is).
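
A sketch of this correspondence (my own example; note that nn.MSELoss averages over the batch and drops the 1/2 factor used in the handwritten version, which only rescales the gradient by a constant):

```
import torch
import torch.nn as nn

criterion = nn.MSELoss()
outputs = torch.tensor([0.5, 2.0])
labels = torch.tensor([1.0, 1.0])

loss_framework = criterion(outputs, labels)
loss_manual = sum((l - o) ** 2 for o, l in zip(outputs, labels)) / len(labels)
print(torch.allclose(loss_framework, loss_manual))   # True
```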

**loss.backward() corresponds to d_weights = [d_weights[j] + (label[k] - h) * input[k][j] for j in range(n)]**

That is, back propagation computes the gradients.

**optimizer.step() corresponds to weights = [weights[k] + alpha * d_weights[k] for k in range(n)]**

That is, all of the parameters are updated with the gradients.
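
Putting the last two steps together, here is a sketch (my own) checking that for plain SGD optimizer.step() performs exactly w_new = w - alpha * grad. (The handwritten version adds alpha * d_weights instead of subtracting, because its d_weights already carries the gradient's minus sign.)

```
import torch
import torch.optim as optim

w = torch.tensor([3.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()                     # d(loss)/dw = 2w = 6
loss.backward()                           # fills w.grad with the gradient
expected = w.detach() - 0.1 * w.grad      # manual update: w - alpha * gradient

optimizer.step()                          # applies the same update
print(w.data, expected)                   # both are tensor([2.4000])
```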

If there are any mistakes, please point them out. Comments and discussion are welcome.
