1. Error scenarios
Python reports the error "UnicodeDecodeError: 'gb2312' codec can't decode byte 0xa4 in position …: illegal multibyte sequence". It generally occurs in one of the following two scenarios:
1. Crawling the content of a Chinese website
html = requests.get(url).content.decode("gb2312")
2. Reading a GBK-encoded file
result = open(filename, 'r', encoding='gb2312').read()
2. Cause of the error
GB2312 covers only 6,763 Chinese characters, almost all of them simplified, so decoding fails when the data contains characters outside that set, such as traditional characters.
Chinese character set coverage: GB2312 < GBK < GB18030
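The coverage difference can be checked directly. The following is a minimal sketch, not part of the original post: the helper name `can_decode` and the sample text "讀書" (two traditional characters) are chosen here purely for illustration.

```python
def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly with `encoding` (strict mode)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Illustrative sample: traditional characters encoded as GBK bytes.
sample = "讀書".encode("gbk")

# Try the same bytes against each codec, from narrowest to widest.
coverage = {name: can_decode(sample, name) for name in ("gb2312", "gbk", "gb18030")}
print(coverage)
```

Only the GB2312 codec should reject these bytes, which matches the ordering given above.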
3. Error reporting solution
1. Ignore decoding errors
html = requests.get(url).decode('gb2312'，errors = 'ignore')
The default parameter of the decode function is strict. Decode ([encoding], [errors =’strict ‘]). You can use the second parameter to control the error handling strategy. Strict means that an exception is thrown when an illegal character is encountered
if it is set to ignore, illegal characters will be ignored
if it is set to replace, it will be used? Replace illegal characters
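To compare the three strategies side by side, here is a small sketch. The helper `decode_gb2312` and the sample string are assumptions made for this illustration; the exact characters left behind by 'ignore' and 'replace' depend on the input bytes, so the sketch prints them rather than stating them.

```python
# Illustrative input: 讀 is outside GB2312, 书 is inside it.
data = "讀书".encode("gbk")


def decode_gb2312(raw: bytes, errors: str) -> str:
    """Decode `raw` as GB2312 using the given error-handling strategy."""
    return raw.decode("gb2312", errors=errors)


# 'strict' (the default) raises on the first illegal byte sequence.
try:
    decode_gb2312(data, "strict")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict :", exc.reason)

# 'ignore' silently drops the bytes that cannot be decoded.
ignored = decode_gb2312(data, "ignore")
# 'replace' substitutes U+FFFD for them instead.
replaced = decode_gb2312(data, "replace")

print("ignore :", repr(ignored))
print("replace:", repr(replaced))
```

Neither 'ignore' nor 'replace' recovers the original character; they only keep the program from raising.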
2. Replace GB2312 with GBK, which has a more comprehensive Chinese character set
result = open(filename, 'r', encoding='gbk').read()
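The file case can be reproduced end to end with a temporary file. This is a rough sketch under assumptions: the file contents and the variable names `gb2312_ok` and `result` are invented for the demonstration, and the temporary file stands in for a real GBK-encoded file.

```python
import os
import tempfile

# Hypothetical sample file: GBK-encoded text containing traditional characters.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("讀書".encode("gbk"))
    filename = tmp.name

try:
    # With the narrower codec, the error surfaces when read() decodes the bytes.
    try:
        with open(filename, "r", encoding="gb2312") as f:
            f.read()
        gb2312_ok = True
    except UnicodeDecodeError:
        gb2312_ok = False

    # With GBK, the same file reads without errors.
    with open(filename, "r", encoding="gbk") as f:
        result = f.read()
finally:
    os.remove(filename)  # clean up the temporary file

print(gb2312_ok, result)
```

If a file might contain characters beyond GBK as well, passing encoding='gb18030' is a possible alternative, since GB18030 has the widest coverage of the three.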
Note: if 'ignore' is used, the illegal bytes are dropped silently, so the Chinese text that is read may be incomplete or garbled. To read the Chinese content accurately, decode the raw bytes with a codec that covers them (GBK here, as in solution 2), convert the result to UTF-8, and then read it.
res = requests.get(url)
utf8_bytes = res.content.decode("gbk").encode("utf-8")
html = utf8_bytes.decode("utf-8")
print(html)
Chinese characters are now output correctly.
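The conversion step can also be tried without a network request. The sketch below is an assumption-laden illustration: `to_utf8` is a hypothetical helper, and the hard-coded bytes stand in for the raw body of a response or for a file opened in 'rb' mode.

```python
def to_utf8(raw: bytes, source_encoding: str = "gbk") -> bytes:
    """Decode `raw` from a GB-family codec and re-encode it as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Stand-in for raw response bytes or the contents of a file read in binary mode.
raw = "讀書 reading".encode("gbk")

utf8_bytes = to_utf8(raw)
html = utf8_bytes.decode("utf-8")
print(html)
```

Decoding is still the step that can fail, so `source_encoding` has to name a codec that covers every character in the data; converting to UTF-8 afterwards does not lose anything.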