How to Solve Python Error: “HTTP Error 403: Forbidden”

Question:

The following code was executed:

import urllib.request

def set_IPlsit():
    url = 'https://www.whatismyip.com/'
    # urlopen() sends only urllib's default headers; this is the call that fails
    response = urllib.request.urlopen(url)
    html = response.read().decode('utf-8')

The following exception occurred:

C:\Users\54353\AppData\Local\Programs\Python\Python36\python.exe "C:/Users/54353/PycharmProjects/untitled/爬虫/图片 - 某网站.py"
Traceback (most recent call last):
  File "C:/Users/54353/PycharmProjects/untitled/crawler/pic.py", line 100, in <module>
    ip = set_IPlsit2()
  File "C:/Users/54353/PycharmProjects/untitled/crawler/pic.py", line 95, in set_IPlsit2
    response = ure.urlopen(url)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\54353\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Process finished with exit code 1

Analysis:

When you open a URL directly with urllib.request.urlopen, the server receives only a bare request for the page. The request carries none of the information a browser normally sends about itself, such as the browser name, operating system and hardware platform. Instead, urllib identifies itself with a default User-Agent of the form "Python-urllib/3.6". A request like this usually comes from an automated client such as a crawler rather than from a person using a browser.

To block this kind of automated access, some websites check the User-Agent header of each request. If the header is missing or does not look like a browser, the server rejects the request, in this case with 403 Forbidden.
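To see what the server is actually told, you can inspect the default headers urllib attaches to a request. The snippet below is a minimal sketch: it builds a default opener with urllib.request.build_opener() and prints the User-Agent from its addheaders list. The variable names are only illustrative, and the exact version suffix depends on the Python interpreter in use.

```python
import urllib.request

# build_opener() creates the same kind of opener that urlopen() uses by default.
opener = urllib.request.build_opener()

# addheaders holds the headers added to every request sent through this opener.
default_headers = dict(opener.addheaders)
print(default_headers['User-agent'])  # e.g. Python-urllib/3.6 on Python 3.6
```

A site that filters on User-Agent can recognise this value as a script and refuse the request.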

Solution:

Add a browser-like User-Agent header to the request:

import urllib.request

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = urllib.request.Request(url=url, headers=headers)  # url is the one from the question
html = urllib.request.urlopen(req).read().decode('utf-8')
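The same fix can be wrapped in a small reusable helper. This is only a sketch under some assumptions: the names BROWSER_UA, build_request and fetch_html are hypothetical and not part of urllib, the 10-second timeout is an arbitrary choice, and the page is assumed to be UTF-8 encoded as in the question.

```python
import urllib.error
import urllib.request

# Hypothetical default; any current browser's User-Agent string would do.
BROWSER_UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
              'Gecko/20100101 Firefox/23.0')


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url=url, headers={'User-Agent': user_agent})


def fetch_html(url, timeout=10):
    """Fetch a page as text; return None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.read().decode('utf-8')
    except urllib.error.HTTPError as err:
        # A 403 here means the server still refuses the request,
        # so the User-Agent was not the only thing it checked.
        print('HTTP error', err.code, err.reason)
        return None
```

Calling fetch_html('https://www.whatismyip.com/') would then return the page text, or None if the server still rejects the request. Some sites also check other headers, cookies or the request rate, so a changed User-Agent is not guaranteed to be enough.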
