Original code that raised the error:
1) With urllib.request, the error above occurs and the HTML cannot be fetched.
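import random
from urllib import request

# ua_list is the list of User-Agent strings defined elsewhere in the crawler
def get_html(self, url):
    print(url)
    req = request.Request(url=url, headers={'User-Agent': random.choice(ua_list)})
    res = request.urlopen(req)
    # Read from the response object (res), not the Request object (req)
    html = res.read().decode("gbk", 'ignore')
    filename = '123456.html'
    with open(filename, 'w') as f:
        f.write(html)
    self.parse_html(html)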
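To see what the server actually returns before switching libraries, the urllib call can be wrapped so that HTTP errors are reported instead of aborting the crawl. The following is only a minimal sketch: the function name fetch_bytes, the UA_LIST placeholder, and the 10-second timeout are assumptions for illustration and are not part of the original crawler.

```python
import random
from urllib import error, request

# Hypothetical placeholder; the original crawler defines its own ua_list.
UA_LIST = ["Mozilla/5.0 (Windows NT 10.0; Win64; x64)"]


def fetch_bytes(url, ua_list=UA_LIST, timeout=10):
    """Return (status, body_bytes); status is None when no HTTP status is available."""
    req = request.Request(url, headers={"User-Agent": random.choice(ua_list)})
    try:
        with request.urlopen(req, timeout=timeout) as res:
            return getattr(res, "status", None), res.read()
    except error.HTTPError as exc:
        # The server answered, but with an error status such as 403 or 404.
        return exc.code, exc.read()
    except error.URLError as exc:
        # No HTTP answer at all (DNS failure, refused connection, TLS problem).
        print("request failed:", exc.reason)
        return None, b""
```

Printing the returned status next to the URL shows whether the site is rejecting the request or whether the failure happens later, during decoding.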
Solution:
1) Replace urllib.request with the requests library, which is a third-party package and must be installed separately (pip install requests).
2) I have not determined why urllib.request fails here.
import random
import requests

# ua_list is the list of User-Agent strings defined elsewhere in the crawler
def get_html(self, url):
    print(url)
    req = requests.get(url=url, headers={'User-Agent': random.choice(ua_list)})
    req.encoding = 'utf-8'
    # Pass the decoded page straight to the parsing function
    self.parse_html(req.text)
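The urllib version decodes the page as gbk while the requests version sets utf-8, so the page's real encoding may not be certain. One option, sketched here under that assumption, is to try candidate encodings in order on the raw bytes (res.read() with urllib, or req.content with requests). The helper name decode_html and the candidate order are illustrative choices, not something the original post uses.

```python
def decode_html(raw, encodings=("utf-8", "gbk")):
    """Decode raw bytes with the first encoding that succeeds strictly.

    Returns (text, encoding_used). If every candidate fails, falls back to
    the first encoding with undecodable bytes dropped.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return raw.decode(encodings[0], errors="ignore"), encodings[0]
```

UTF-8 is tried first because strict UTF-8 decoding usually fails on GBK bytes, whereas GBK decoding often accepts UTF-8 bytes and silently produces garbled text.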