What is the difference between utf8mb4_unicode_ci and utf8mb4_general_ci in a MySQL database?

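utf8mb4 uses up to four bytes per character, while utf8 (an alias for utf8mb3) uses up to three. utf8mb4 therefore covers the full Unicode range, including supplementary characters such as emoji, at the cost of potentially taking up more space.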
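The byte widths can be checked directly in a MySQL session. The following is a minimal sketch, assuming the client terminal sends UTF-8 and the server supports utf8mb4; the column aliases are made up for illustration.

```sql
-- Assumption: the client sends UTF-8, so the literals arrive intact.
SET NAMES utf8mb4;

-- LENGTH() counts bytes, CHAR_LENGTH() counts characters.
SELECT
  LENGTH('A')       AS ascii_bytes,   -- 1 byte in UTF-8
  LENGTH('é')       AS latin_bytes,   -- 2 bytes (U+00E9)
  LENGTH('中')      AS cjk_bytes,     -- 3 bytes (U+4E2D)
  LENGTH('😀')      AS emoji_bytes,   -- 4 bytes (U+1F600), beyond the 3-byte utf8 limit
  CHAR_LENGTH('😀') AS emoji_chars;   -- still a single character
```

Only the last kind of character needs the fourth byte, which is why a three-byte utf8 column cannot store it.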

The two collations differ mainly in two respects: sorting accuracy and performance.

Accuracy
utf8mb4_unicode_ci sorts and compares according to the Unicode standard (the Unicode Collation Algorithm), so it sorts accurately across languages.
utf8mb4_general_ci does not implement the Unicode collation rules. For some languages or special characters, the sort order may not be what you expect.
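One difference documented in the MySQL manual is the German "ß": utf8mb4_general_ci treats it as equal to "s", while utf8mb4_unicode_ci treats it as equal to "ss". The sketch below compares the two, assuming a UTF-8 client session; the column aliases are made up for illustration.

```sql
-- Assumption: the client sends UTF-8, so the literal 'ß' arrives intact.
SET NAMES utf8mb4;

-- COLLATE binds tighter than "=", so each comparison below runs
-- under the collation named on its right-hand operand.
SELECT
  'ß' = 's'  COLLATE utf8mb4_general_ci AS general_eq_s,   -- documented as equal
  'ß' = 'ss' COLLATE utf8mb4_general_ci AS general_eq_ss,
  'ß' = 's'  COLLATE utf8mb4_unicode_ci AS unicode_eq_s,
  'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_eq_ss;  -- documented as equal
```

Differences of this kind affect not only ORDER BY but also unique indexes and WHERE matches, since all of them rely on the collation's idea of equality.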

Performance
utf8mb4_general_ci is faster at comparison and sorting.
utf8mb4_unicode_ci implements the more complex Unicode sorting rules in order to handle special characters, so it does more work per comparison. In most cases, however, such complex comparisons do not occur.

In theory general may be faster than unicode, but on current CPUs the difference is far too small to be a performance consideration. Index and SQL design matter much more. My personal recommendation is utf8mb4_unicode_ci. MySQL 8.0 also moved in this direction: its default character set is utf8mb4 and its default collation, utf8mb4_0900_ai_ci, is based on the Unicode Collation Algorithm. Users should pay more attention to keeping the character set and collation consistent across the database than to which collation they choose.
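To keep the character set and collation unified, it helps to first find the columns that deviate and then convert them. The following is a rough sketch; the schema name `app_db` and table name `orders` are hypothetical placeholders, and utf8mb4_unicode_ci is used only because it is the collation recommended above.

```sql
-- List text columns whose collation differs from the chosen one.
-- 'app_db' is a hypothetical schema name.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'app_db'
  AND COLLATION_NAME IS NOT NULL
  AND COLLATION_NAME <> 'utf8mb4_unicode_ci';

-- Set the default for tables created in this schema from now on.
ALTER DATABASE app_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert an existing table ('orders' is hypothetical) and its text columns.
ALTER TABLE orders CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

Converting a large table rewrites its data and can take a long time, so it is worth trying on a copy first. Mixed collations also tend to surface as "Illegal mix of collations" errors when columns from different tables are compared or joined.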