Question:
When the following code is executed:
import urllib.request

def set_IPlsit():
    url = 'https://www.whatismyip.com/'
    response = urllib.request.urlopen(url)
    html = response.read().decode('utf-8')
    return html
The following exception occurred:
C:\Users\54353\AppData\Local\Programs\Python\Python36\python.exe "C:/Users/54353/PycharmProjects/untitled/crawler/pic.py"
Traceback (most recent call last):
  File "C:/Users/54353/PycharmProjects/untitled/crawler/pic.py", line 100, in <module>
    ip = set_IPlsit2()
  File "C:/Users/54353/PycharmProjects/untitled/crawler/pic.py", line 95, in set_IPlsit2
    response = ure.urlopen(url)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Process finished with exit code 1
Analysis:
The exception is raised because a URL opened directly with urllib.request.urlopen reaches the server as a bare request for the page. urllib only sends a default User-Agent such as Python-urllib/3.6, so the server cannot tell which browser, operating system, or hardware platform sent the request, and requests without this information are often abnormal access, such as crawlers.
To block this kind of access, some websites check the User-Agent header of each request and reject the request if the User-Agent is abnormal or missing. Here the server rejects it with HTTP 403 Forbidden.
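To see what urllib actually sends when no header is set, the default opener can be inspected. This is a minimal sketch: it only reads the opener's default header list and makes no network request.

```python
import urllib.request

# build_opener() returns the same kind of OpenerDirector that urlopen() uses by default
opener = urllib.request.build_opener()

# Default headers the opener adds to any request that does not set them itself
print(opener.addheaders)  # e.g. [('User-agent', 'Python-urllib/3.6')] on Python 3.6
```

A User-Agent passed explicitly in a Request takes precedence, because the opener only fills in its default headers when the request does not already carry them.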
Solution:
Send a browser-like User-Agent header with the request:
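import urllib.request
url = 'https://www.whatismyip.com/'
# Identify as desktop Firefox so the server does not reject the request as a script
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = urllib.request.Request(url=url, headers=headers)
html = urllib.request.urlopen(req).read().decode('utf-8')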
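The fix can be wrapped in a small helper that always sends the browser-like header and reports an HTTP error instead of crashing. This is a sketch under stated assumptions: the names BROWSER_UA, build_request and fetch_html are hypothetical, the 10-second timeout is arbitrary, and a site may still answer 403 if it checks more than the User-Agent.

```python
import urllib.error
import urllib.request

# Hypothetical constant: the browser User-Agent string from the solution above
BROWSER_UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
              'Gecko/20100101 Firefox/23.0')


def build_request(url, user_agent=BROWSER_UA):
    """Build a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, timeout=10):
    """Return the page as text, or None if the server answers with an HTTP error status."""
    req = build_request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            # Use the charset declared by the server, falling back to UTF-8
            charset = response.headers.get_content_charset() or 'utf-8'
            return response.read().decode(charset)
    except urllib.error.HTTPError as err:
        # err.code is the HTTP status, e.g. 403 if the server still refuses the request
        print('HTTP error', err.code, err.reason)
        return None
```

For example, html = fetch_html('https://www.whatismyip.com/') returns the page text, or None if the server still refuses the request.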