-
Notifications
You must be signed in to change notification settings - Fork 6
Correct UTF 8 Encoding
Thomas Shone edited this page Oct 4, 2021
·
2 revisions
Design
UTF-8 (utf8
) encoding in MySQL supports up to 3 bytes per character (or utf8mb3
) which limits coverage to characters in the Basic Multilingual Plane (BMP).
This means if you try insert a string that contains any character that needs 4 bytes of storage (emojis, musical notation, mathematical notations, cuneiform and others) it will drop that character.
Ensure you create the columns you want to store UTF-8 characters in as CHARACTER SET utf8mb4
.
CREATE TABLE tab1 (
col1 CHAR(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
...
);
You can convert from utf8mb3
to utf8mb4
. It's a safe operation as utf8mb3
is a subset of utf8mb4
.
ALTER TABLE tab1
MODIFY COLUMN col1 CHAR(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-
Column type size limitations
- A
TINYTEXT
column can hold up to 255 bytes, so it can hold up to 85 3-byte or 63 4-byte characters. - Suppose that you have a
TINYTEXT
column that usesutf8mb3
but must contain more than 63 characters. You cannot convert it toutf8mb4
unless you also change the data type to a longer type such asTEXT
.
- A
-
Large column types
- InnoDB has a max index length of 767 bytes (if it's using
COMPACT
orREDUNDANT
row formats). Large text columns should already use sane key prefixing.-
utf8mb3
the index cannot exceed 255 characters (INDEX col1(255)
): -
utf8mb4
the index cannot exceed 191 characters (INDEX col1(191)
):
-
- InnoDB has a max index length of 767 bytes (if it's using