Here is the requirement: a database table holds close to a thousand URL records, and each URL points to an image. I need to request every URL, fetch the image, and save it locally. At first I wrote this (pseudocode):
import requests

# urls and save_image() are defined elsewhere
for url in urls:
    try:
        r = requests.get(url).content
        save_image(r)
    except Exception as e:
        print(str(e))
However, when this runs on the server, every so often a request fails with an error like the following:
HTTPConnectionPool(host='wx.qlogo.cn', port=80):
Max retries exceeded with url: /mmopen/aTVWntpJLCAr2pichIUx8XMevb3SEbktTuLkxJLHWVTwGfkprKZ7rkEYDrKRr5icyDGIvU4iasoyRrqsffbe3UUQXT5EfMEbYKg/0 (
Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer)
The likely cause is that I was sending requests too frequently, so the server closed some of the connections. I changed the code to add a delay and retries:
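import time

import requests

for url in urls:
    for i in range(10):
        try:
            r = requests.get(url).content
        except Exception as e:
            if i >= 9:
                do_some_log()    # give up after 10 failed attempts
            else:
                time.sleep(0.5)  # wait before retrying
        else:
            save_image(r)        # save only when the download succeeded
            time.sleep(0.1)      # short delay before the next request
            break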
The code is simple, but it illustrates the general approach. Adding a delay between requests avoids most of the rejections.
Some requests are still rejected, so each rejected request is retried.
After 10 rejections the script gives up on that URL and records it in the log.
In practice, with a 0.1 s delay between requests, the rejection rate dropped considerably.
No URL needed more than 3 retries, and all the images were downloaded successfully.
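
The same delay-and-retry idea can be pulled out into small helpers. What follows is only a sketch, not the code used above: the names fetch_with_retry and download_all, and the fetch, save, and sleep parameters, are assumptions made for illustration. The fetch function is passed in, so the logic can be tried without touching the network.

```python
import time


def fetch_with_retry(fetch, url, max_attempts=10, retry_delay=0.5, sleep=time.sleep):
    """Call fetch(url) until it succeeds or max_attempts is reached.

    Returns (content, failures); content is None if every attempt failed.
    fetch is any callable that returns the body or raises on error
    (hypothetical parameter, e.g. a thin wrapper around requests.get).
    """
    failures = 0
    for attempt in range(max_attempts):
        try:
            return fetch(url), failures
        except Exception:
            failures += 1
            if attempt < max_attempts - 1:
                sleep(retry_delay)  # wait before the next attempt
    return None, failures


def download_all(urls, fetch, save, request_delay=0.1, sleep=time.sleep, **retry_opts):
    """Download every URL with a pause between requests; return the URLs that failed."""
    failed = []
    for url in urls:
        content, _ = fetch_with_retry(fetch, url, sleep=sleep, **retry_opts)
        if content is None:
            failed.append(url)  # stand-in for do_some_log() in the code above
        else:
            save(content)
        sleep(request_delay)  # throttle so the server sees fewer rapid requests
    return failed
```

With requests, this could be called as download_all(urls, lambda u: requests.get(u).content, save_image). The returned list holds the URLs that were abandoned after the maximum number of attempts, so they can be logged or retried later.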