Tag Archives: UnicodeDecodeError

[Solved] Pandas Parses MovieLens 1M Dataset Error: UnicodeDecodeError

1. Problem description

While working through a book, I found that the MovieLens 1M data set downloaded from GitHub raises a UnicodeDecodeError when read with pandas.

2. Solutions

This is clearly an encoding problem. Use the file command to check the file's encoding.

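The file command reports ISO-8859, which corresponds to the iso-8859-1 codec in Python. The encoding can also be detected programmatically with the following function, which uses the chardet package: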

pip install chardet

import chardet

def get_encoding(file):
    # Read the raw bytes and let chardet guess the encoding
    with open(file, 'rb') as f:
        return chardet.detect(f.read())['encoding']
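If installing chardet is not an option, a rough alternative is to try a short list of candidate encodings in order and keep the first one that decodes the whole file. The sketch below assumes a hypothetical helper name, guess_encoding, and a default candidate list chosen for this data set; it is a heuristic, not a replacement for proper detection. Note that iso-8859-1 maps every byte to a character and so never fails, which is why it has to come last.

```python
def guess_encoding(path, candidates=('utf-8', 'iso-8859-1')):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper, not part of chardet or pandas. iso-8859-1 accepts
    any byte sequence, so it only makes sense as the last candidate.
    """
    with open(path, 'rb') as f:
        raw = f.read()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the file, try the next
        return name
    return None  # none of the candidates matched
```

A successful decode only shows that the bytes are valid in that encoding, not that the text is what the author intended, so it is worth inspecting a few decoded lines by eye.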

Therefore, use the encoding parameter to specify the file's actual encoding. Passing encoding='iso-8859-1' directly works, and so does passing the detected value:

movies = pd.read_table('movies.dat', encoding=get_encoding('movies.dat'), sep='::', header=None, names=mnames, engine='python')
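To confirm that the encoding alone is what fixes the read, the same file can be parsed with plain open() and no pandas. This is only a sketch: read_movies is a hypothetical helper, and the column names are assumed to match the mnames list used in the call above.

```python
def read_movies(path, encoding='iso-8859-1'):
    """Parse a '::'-separated file into a list of dicts, one per line."""
    # Assumed column names; adjust them to match mnames in your own code.
    names = ['movie_id', 'title', 'genres']
    rows = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue  # skip blank lines
            rows.append(dict(zip(names, line.split('::'))))
    return rows
```

If the accented characters in the titles come out correctly here, the same encoding value should be safe to pass to pd.read_table.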

[Solved] UnicodeDecodeError When Python Reads a File

Python reports the following error when reading a file: UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 205: illegal multibyte sequence

This happens when open() is called without an encoding argument on a system whose default encoding is GBK, and the file is not actually GBK-encoded.

Solution 1: specify the file's encoding explicitly when opening it.

FILE_OBJECT = open('data.txt', 'r', encoding='UTF-8')

Solution 2: open the file in binary mode, which returns bytes and skips decoding altogether.

FILE_OBJECT = open('data.txt', 'rb')
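Solution 1 only helps if the file really is UTF-8, and Solution 2 leaves you with bytes that still have to be decoded before they can be used as text. A minimal sketch combining the two is shown below; read_text is a hypothetical helper, and errors='replace' is one possible policy that substitutes U+FFFD for any byte that cannot be decoded instead of raising.

```python
def read_text(path, encoding='utf-8'):
    """Read a file as text, replacing undecodable bytes with U+FFFD."""
    with open(path, 'rb') as f:  # binary mode: no decoding happens here
        raw = f.read()
    # Decode explicitly so the encoding and error policy are under our control.
    return raw.decode(encoding, errors='replace')
```

Replacement characters in the result are a sign that the chosen encoding is wrong for at least part of the file, so this is better suited to inspection than to silent production use.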

 

[Solved] UnicodeDecodeError in the Background When Robot Framework (RIDE) Is Running

Error reported on Windows 10:

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\robotide\contrib\testrunner\testrunnerplugin.py", line 370, in OnTimer
    self._test_runner.get_output_and_errors(self.get_current_profile())
  File "C:\Python27\lib\site-packages\robotide\contrib\testrunner\testrunner.py", line 250, in get_output_and_errors
    stdout, stderr, returncode = self._process.get_output(), \
  File "C:\Python27\lib\site-packages\robotide\contrib\testrunner\testrunner.py", line 305, in get_output
    return self._output_stream.pop()
  File "C:\Python27\lib\site-packages\robotide\contrib\testrunner\testrunner.py", line 400, in pop
    return result.decode('UTF-8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb2 in position 5: invalid start byte

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\robotide\contrib\testrunner\testrunnerplugin.py", line 370, in OnTimer
    self._test_runner.get_output_and_errors(self.get_current_profile())
  File "C:\Python27\lib\site-packages\robotide\contrib\testrunner\testrunner.py", line 250, in get_output_and_errors
    stdout, stderr, returncode = self._process.get_output(), \
  File "C:\Python27\lib\site-packages\robotide\contrib\testrunner\testrunner.py", line 305, in get_output
    return self._output_stream.pop()
  File "C:\Python27\lib\site-packages\robotide\contrib\testrunner\testrunner.py", line 400, in pop
    return result.decode('UTF-8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb2 in position 279: invalid start byte

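Solution: the error is raised at line 400 of testrunner.py, where the process output is decoded as UTF-8 and byte 0xb2 is not a valid UTF-8 start byte. The output is most likely encoded in the system code page (GBK here) rather than UTF-8.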

Remember to delete the testrunner.pyc file and restart RIDE every time you modify testrunner.py; otherwise the change has no effect.

 

    def pop(self):
        result = ""
        for _ in xrange(self._queue.qsize()):
            try:
                result += self._queue.get_nowait()
            except Empty:
                pass
        return result.decode('UTF-8')

In the last line of pop(), change UTF-8 to GBK, so that it reads return result.decode('GBK').

That fixes the problem.
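Hard-coding GBK works on a machine whose console output is GBK, but it would break again wherever the output is UTF-8. A more tolerant approach is to try UTF-8 first and fall back to GBK. The sketch below is written in Python 3 rather than the Python 2 used by this RIDE installation, and decode_output is a hypothetical helper, not part of RIDE.

```python
def decode_output(data, encodings=('utf-8', 'gbk')):
    """Decode process output bytes, trying each encoding in turn."""
    for name in encodings:
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Last resort: never raise, substitute U+FFFD for undecodable bytes.
    return data.decode(encodings[0], errors='replace')
```

UTF-8 goes first because GBK-encoded Chinese text is rarely valid UTF-8, whereas UTF-8 bytes can sometimes decode as GBK into the wrong characters.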