Question:
When the following code is executed:
    import urllib.request

    def set_IPlsit():
        url = 'https://www.whatismyip.com/'
        response = urllib.request.urlopen(url)  # raises HTTPError 403 here
        html = response.read().decode('utf-8')
The following exception occurred:
C:\Users\54353\AppData\Local\Programs\Python\Python36\python.exe "C:/Users/54353/PycharmProjects/untitled/爬虫/图片 - 某网站.py"
Traceback (most recent call last):
  File "C:/Users/54353/PycharmProjects/untitled/crawler/pic.py", line 100, in <module>
    ip = set_IPlsit2()
  File "C:/Users/54353/PycharmProjects/untitled/crawler/pic.py", line 95, in set_IPlsit2
    response = ure.urlopen(url)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Process finished with exit code 1
Analysis:
The exception is caused by the User-Agent header. When a URL is opened directly with urllib.request.urlopen, urllib sends only a minimal request, and its default User-Agent (Python-urllib/3.6 on this interpreter) identifies the client as a script rather than a browser. The server therefore learns nothing about the browser, operating system, or hardware platform behind the request, and requests like this usually come from automated clients such as crawlers.
To block this kind of access, some websites check the User-Agent of every request and reject the request (here with 403 Forbidden) if the header is missing or looks abnormal.
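To see what urllib itself sends as the User-Agent when the caller sets none, one can inspect an opener built by urllib. This is a small sketch that relies on the `addheaders` list of `OpenerDirector`, which CPython fills with a `Python-urllib/<major>.<minor>` entry; the exact version string depends on the interpreter, and no network access is needed.

```python
import urllib.request

# build_opener() returns an OpenerDirector like the one urlopen() uses
# internally; addheaders lists the headers added to every request that
# does not set them itself.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
default_ua = default_headers['User-agent']
print(default_ua)  # e.g. "Python-urllib/3.6" when run on Python 3.6
```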
Solution:
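Send a browser-like User-Agent header with the request:
    import urllib.request
    url = 'https://www.whatismyip.com/'
    # Identify as desktop Firefox instead of urllib's default Python-urllib agent
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    req = urllib.request.Request(url=url, headers=headers)
    html = urllib.request.urlopen(req).read().decode('utf-8')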
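The effect of the header can be checked without depending on a live site by running a small local server that imitates User-Agent filtering. The handler `UAFilterHandler`, the helper `fetch`, and the rule "reject a missing or Python-urllib User-Agent" are hypothetical assumptions for illustration only; real sites may also look at other headers, cookies, or request rates. The sketch also catches `urllib.error.HTTPError`, so the status code can be inspected instead of the script stopping.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BROWSER_UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
              'Gecko/20100101 Firefox/23.0')


class UAFilterHandler(BaseHTTPRequestHandler):
    """Hypothetical site that rejects missing or Python-urllib User-Agents."""

    def do_GET(self):
        ua = self.headers.get('User-Agent', '')
        if not ua or ua.startswith('Python-urllib'):
            self.send_response(403)  # same status as in the traceback
            self.end_headers()
            return
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Opener with proxies disabled so the local server is reached directly.
# It still adds urllib's default User-agent unless the Request sets one.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, user_agent=None):
    """Return (status, body); an HTTPError becomes (code, b'')."""
    headers = {'User-Agent': user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        e.close()
        return e.code, b''


server = HTTPServer(('127.0.0.1', 0), UAFilterHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

default_result = fetch(url)              # urllib's default User-Agent
browser_result = fetch(url, BROWSER_UA)  # browser-like User-Agent
server.shutdown()
server.server_close()

print(default_result)  # (403, b'')
print(browser_result)  # (200, b'ok')
```

Only the request header differs between the two calls, which is why adding a User-Agent is enough against this kind of filter. Catching `HTTPError` this way lets a crawler log or skip a blocked page rather than exiting with code 1.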