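In MySQL, utf8mb4 uses up to four bytes per character, while utf8 (an alias for utf8mb3) uses at most three. utf8mb4 therefore covers the full Unicode range, including emoji and other supplementary characters, but it can take up more space.
The collations utf8mb4_unicode_ci and utf8mb4_general_ci differ mainly in two respects: sorting accuracy and performance.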
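To make the byte counts concrete, here is a minimal Python sketch that measures how many bytes a character needs in UTF-8. The helper names utf8_len and fits_in_utf8mb3 are invented for this illustration. The sketch only checks encoded length; it does not talk to MySQL.

```python
# Measure UTF-8 byte lengths to see which characters exceed a three-byte limit.
# Helper names are hypothetical; nothing here connects to a MySQL server.

def utf8_len(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of one character."""
    return len(ch.encode("utf-8"))

def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character needs at most three UTF-8 bytes."""
    return all(utf8_len(ch) <= 3 for ch in text)

# Latin letter, accented letter, CJK ideograph, emoji (written as escapes).
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    print(repr(ch), utf8_len(ch), fits_in_utf8mb3(ch))
```

The emoji is the only sample that needs four bytes, which is the kind of character a three-byte character set cannot store.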
Accuracy
utf8mb4_unicode_ci sorts and compares according to the Unicode standard, so it sorts accurately across languages.
utf8mb4_general_ci does not implement the Unicode collation rules. For some languages or special characters, the sort order may not be what you expect.
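One documented example of the difference is the German sharp s: under utf8mb4_general_ci it compares equal to "s", while under utf8mb4_unicode_ci it compares equal to "ss". The Python sketch below is only a toy model of that one case. The mapping tables and function names are invented for illustration and are not MySQL's real weight tables.

```python
# Toy model only: these one-entry tables are NOT MySQL's actual collation data.
# They mimic a single documented behaviour, the handling of sharp s (U+00DF).

ONE_TO_ONE = {"\u00df": "s"}    # general_ci style: one character, one weight
EXPANSIONS = {"\u00df": "ss"}   # unicode_ci style: a character may expand

def general_like_key(text: str) -> str:
    """Case-insensitive key using only one-to-one character mappings."""
    return "".join(ONE_TO_ONE.get(ch, ch) for ch in text.lower())

def unicode_like_key(text: str) -> str:
    """Case-insensitive key that allows a character to expand to several."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text.lower())

word_a = "Stra\u00dfe"   # spelled with sharp s
word_b = "Strasse"       # spelled with double s

print(general_like_key(word_a) == general_like_key(word_b))
print(unicode_like_key(word_a) == unicode_like_key(word_b))
```

In this model the two spellings match only under the expansion-aware key, which is the sort of case where the general collation can give an unexpected result.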
Performance
utf8mb4_general_ci is faster at comparison and sorting.
utf8mb4_unicode_ci implements a somewhat more complex algorithm in order to handle special characters correctly.
In most cases, however, such complex comparisons do not occur. In theory utf8mb4_general_ci is faster than utf8mb4_unicode_ci, but on current CPUs the difference is far too small to matter for performance. Index and SQL design matter much more. My personal recommendation is utf8mb4_unicode_ci. MySQL 8.0 also moved in this direction: its default collation for utf8mb4, utf8mb4_0900_ai_ci, is based on the Unicode Collation Algorithm. Users should pay more attention to keeping the character set and collation consistent across the database than to which collation they choose.
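As a rough illustration of checking that consistency, the Python sketch below flags columns whose character set or collation differs from a chosen target. The schema name 'mydb', the sample rows, and the function name find_mismatches are all assumptions made up for this example. The query string is shown only for reference and is not executed here.

```python
from typing import Iterable, List, Tuple

# (table name, column name, character set, collation)
Row = Tuple[str, str, str, str]

# A query that could return such rows from a MySQL server.
# 'mydb' is a placeholder schema name; the query is not run by this sketch.
QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb' AND COLLATION_NAME IS NOT NULL
"""

def find_mismatches(
    rows: Iterable[Row],
    charset: str = "utf8mb4",
    collation: str = "utf8mb4_unicode_ci",
) -> List[Row]:
    """Return the rows whose character set or collation differs from the target."""
    return [r for r in rows if r[2] != charset or r[3] != collation]

# Made-up sample data standing in for real query results.
sample_rows: List[Row] = [
    ("users", "name", "utf8mb4", "utf8mb4_unicode_ci"),
    ("users", "email", "utf8mb4", "utf8mb4_general_ci"),
    ("orders", "note", "utf8", "utf8_general_ci"),
]

for table, column, cs, coll in find_mismatches(sample_rows):
    print(f"{table}.{column}: {cs} / {coll}")
```

Columns reported by such a check are candidates for conversion, since comparing or joining columns with different collations can cause errors or prevent index use.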