Fixing urllib.error.HTTPError: HTTP Error 403: Forbidden when crawling pictures with Python

Recently, a Python script I was using to crawl pictures from a website failed with the following error:
Traceback (most recent call last):
  File "meinv.py", line 108, in <module>
    main()
  File "meinv.py", line 104, in main
    imghandle.run()
  File "meinv.py", line 96, in run
    self.handle_data(response)
  File "meinv.py", line 84, in handle_data
    self.handle_imgdata(res, iname)
  File "meinv.py", line 65, in handle_imgdata
    urllib.request.urlretrieve(imgurl, filepath)
  File "D:\allkitinstall\python3.5.3\lib\urllib\request.py", line 188, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "D:\allkitinstall\python3.5.3\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "D:\allkitinstall\python3.5.3\lib\urllib\request.py", line 472, in open
    response = meth(req, response)
  File "D:\allkitinstall\python3.5.3\lib\urllib\request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\allkitinstall\python3.5.3\lib\urllib\request.py", line 510, in error
    return self._call_chain(*args)
  File "D:\allkitinstall\python3.5.3\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "D:\allkitinstall\python3.5.3\lib\urllib\request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Let's start with the code.

This is the function that constructs the request object
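
The original listing is not available here, so what follows is only a minimal sketch of what such a function might look like. The name build_request, the extra_headers parameter, and the User-Agent string are assumptions made for illustration, not the author's actual code.

```python
import urllib.request

# Placeholder browser-like User-Agent string (an assumption); any realistic value works.
DEFAULT_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url, extra_headers=None):
    """Return a urllib Request for `url` that carries a browser-like User-Agent."""
    headers = {"User-Agent": DEFAULT_USER_AGENT}
    if extra_headers:
        # Caller-supplied headers are merged on top of the default.
        headers.update(extra_headers)
    return urllib.request.Request(url, headers=headers)
```

Note that urllib normalizes header names with str.capitalize(), so the header is read back as req.get_header("User-agent").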

Here is the code that sends the request and downloads the image
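
The original download code is not available either. The traceback shows the script calling urllib.request.urlretrieve(imgurl, filepath); the sketch below uses urlopen instead, because urlopen accepts a Request object that carries headers. The name download_image and the timeout value are assumptions.

```python
import shutil
import urllib.request


def download_image(request, filepath, timeout=30):
    """Fetch `request` (a URL string or a urllib Request) and save the body to `filepath`."""
    # urlopen raises urllib.error.HTTPError for responses such as 403 Forbidden.
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(filepath, "wb") as out_file:
        # Stream the response body to disk without loading it all into memory.
        shutil.copyfileobj(response, out_file)
    return filepath
```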

I searched online, and most answers blamed a missing User-Agent header. But I had clearly already added one, and the request still failed after many attempts. Just as I was about to give up, I noticed a Referer field among the request headers of an image request in my packet capture tool, and that reminded me of something.

Many sites check the Referer header as an anti-hotlinking measure, to stop other sites and scripts from requesting their resources directly. So I happily added this header to the request.
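
One way to send a Referer without changing the urlretrieve call seen in the traceback is to install a global opener with default headers. This is a sketch under assumptions: the post does not say which Referer value was used (the URL of the page that embeds the image is a common choice), and install_opener_with_referer and the URLs in the usage comment are hypothetical.

```python
import urllib.request

USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # placeholder value


def install_opener_with_referer(referer, user_agent=USER_AGENT):
    """Install a global opener that sends User-Agent and Referer by default."""
    opener = urllib.request.build_opener()
    # Assigning addheaders replaces the default "Python-urllib/x.y" User-Agent.
    opener.addheaders = [("User-Agent", user_agent), ("Referer", referer)]
    # urlopen and urlretrieve use the installed opener from now on.
    urllib.request.install_opener(opener)
    return opener


# Usage (hypothetical URLs):
# install_opener_with_referer("http://example.com/gallery/page1.html")
# urllib.request.urlretrieve("http://example.com/img/1.jpg", "1.jpg")
```

Headers set this way apply to every later request made through urllib.request, which is convenient for a single-site crawler but worth keeping in mind if the script talks to several hosts.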

Then I tried again.

Done!