I'm getting the following error when trying to add a row to a table:
Incorrect string value: '\xF0\x90\x8D\x83\xF0\x90...' for column 'content' at row 1
MySQL's utf8 permits only the Unicode characters that can be represented with at most 3 bytes in UTF-8. Here you have a character that needs 4 bytes: \xF0\x90\x8D\x83 (U+10343 GOTHIC LETTER SAUIL).
If you have MySQL 5.5 or later, you can change the column encoding from utf8 to utf8mb4. This encoding allows storage of characters that occupy 4 bytes in UTF-8.
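To see the 3-byte versus 4-byte distinction concretely, here is a minimal Java sketch that encodes single code points as UTF-8 and reports how many bytes each one takes. The class name `Utf8ByteLength` and its helper methods are made up for illustration; only standard JDK calls are used.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteLength {
    // Number of bytes the given code point occupies when encoded as UTF-8.
    public static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF-8 bytes of the given code point as an uppercase hex string.
    public static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0x41));    // 'A': 1 byte
        System.out.println(utf8Length(0x20AC));  // euro sign, in the BMP: 3 bytes
        System.out.println(utf8Length(0x10343)); // GOTHIC LETTER SAUIL: 4 bytes
        System.out.println(utf8Hex(0x10343));    // F0908D83, as in the error message
    }
}
```

Any code point above U+FFFF comes out as 4 bytes, which is exactly what a utf8 (3-byte) column rejects.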
Answers (1)
UTF-8 encodes everything in the basic multilingual plane (i.e. U+0000 to U+FFFF inclusive) in 1-3 bytes. Therefore, you just need to check whether everything in your string is in the BMP.
In Java, that means checking whether any char (which is a UTF-16 code unit) is a high or low surrogate character, as Java will use surrogate pairs to encode non-BMP characters:
public static boolean isEntirelyInBasicMultilingualPlane(String text) {
    for (int i = 0; i < text.length(); i++) {
        if (Character.isSurrogate(text.charAt(i))) {
            return false;
        }
    }
    return true;
}
Answers (2)
If you do not want to support beyond BMP, you can just strip those characters before handing it to MySQL:
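public static String withNonBmpStripped(String input) {
    if (input == null) throw new IllegalArgumentException("input");
    // Java regexes match by code point, so this class matches every
    // character outside the BMP (a surrogate pair counts as one code point).
    return input.replaceAll("[^\\u0000-\\uFFFF]", "");
}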
If you want to support beyond BMP, you need MySQL 5.5+ and you need to change everything that's utf8 to utf8mb4 (collations, charsets ...). You also need support for this in the database driver, which I am not familiar with. Handling these characters in Java is also a pain because each one is spread over 2 chars (a surrogate pair) and thus needs special handling in many operations.
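As an illustration of the "special handling" point, here is a small sketch of two operations that go wrong if you count chars instead of code points: measuring length and truncating. The class `SupplementaryChars` and its methods are hypothetical names for this example; `codePointCount` and `offsetByCodePoints` are standard `String` methods.

```java
public class SupplementaryChars {
    // Length in Unicode code points rather than UTF-16 code units.
    public static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Keeps at most maxCodePoints code points without splitting a surrogate pair.
    // Assumes maxCodePoints is not negative.
    public static String truncate(String s, int maxCodePoints) {
        if (s.codePointCount(0, s.length()) <= maxCodePoints) {
            return s;
        }
        int end = s.offsetByCodePoints(0, maxCodePoints);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // "a", then U+10343 (stored as two chars), then "b".
        String s = "a" + new String(Character.toChars(0x10343)) + "b";

        System.out.println(s.length());           // 4 chars
        System.out.println(codePointLength(s));   // 3 code points

        // A naive substring(0, 2) would end in a lone high surrogate.
        System.out.println(Character.isHighSurrogate(s.substring(0, 2).charAt(1))); // true

        // truncate keeps the whole pair: "a" plus the 2-char Gothic letter.
        System.out.println(truncate(s, 2).length()); // 3
    }
}
```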
2. Upgrade the MySQL server to v5.5.3+
3. Change the character set of the database, tables, and columns
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)