设为首页 收藏本站
查看: 579|回复: 0

[经验分享] “Incorrect string value” when trying to insert UTF-8 into MySQL via JDBC?

[复制链接]

尚未签到

发表于 2016-10-19 06:37:54 | 显示全部楼层 |阅读模式
  And I'm getting the following error when tyring to add a row to a table:
Incorrect string value: '\xF0\x90\x8D\x83\xF0\x90...' for column 'content' at row 1
  
  MySQL's utf8 permits only the unicode characters that can be represented with 3 bytes in UTF-8. Here you have a character that needs 4 bytes: \xF0\x90\x8D\x83 (U+10343 GOTHIC LETTER SAUIL).
  If you have MySQL 5.5 or later you can change the column encoding from utf8 to utf8mb4. This encoding allows storage of characters that occupy 4 bytes in UTF-8
  

Answers:

Answers (1)
  
  UTF-8 encodes everything in the basic multilingual plane (i.e. U+0000 to U+FFFF inclusive) in 1-3 bytes. Therefore, you just need to check whether everything in your string is in the BMP.
  In Java, that means checking whether any char (which is a UTF-16 code unit) is a high or low surrogate character, as Java will use surrogate pairs to encode non-BMP characters:

publicstaticboolean isEntirelyInBasicMultilingualPlane(String text){for(int i =0; i < text.length(); i++){if(Character.isSurrogate(text.charAt(i))){returnfalse;}}returntrue;}


Answers (2)
  If you do not want to support beyond BMP, you can just strip those characters before handing it to MySQL:

publicstaticString withNonBmpStripped(String input ){if( input ==null)thrownewIllegalArgumentException("input");return input.replaceAll("[\\ud800-\\udfff]","");}
  If you want to support beyond BMP, you need MySQL 5.5+ and you need to change everything that's utf8 to utf8mb4 (collations, charsets ...). But you also need the support for this in the driver that I am not familiar with. Handling these characters in Java is also a pain because they are spread over 2 chars and thus need special handling in many operations.
  
  原因是当前mysql的字符集为utf-8,最多3个字节,但遇到像emoji表情这样的特殊字符时,需要4个字节来表示。所以导致insert报错。

mysql> show character set;  
+----------+-----------------------------+---------------------+--------+  
| Charset  | Description                 | Default collation   | Maxlen |  
+----------+-----------------------------+---------------------+--------+  
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |  
+----------+-----------------------------+---------------------+--------+  

  
  解决办法:
  1.备份数据库
  
  2.升级Mysql Server到v5.5.3+
  Upgrade the MySQL server to v5.5.3+
  
  3.修改database,table,column字符集

DSC0000.gif

# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
#
For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
#
For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy
-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)



  
  4.修改my.ini(linux下为my.cnf)



[client]
default-character-set
= utf8mb4
[mysql]
default-character-set
= utf8mb4
[mysqld]
character-set-client-handshake
= FALSE
character-set-server
= utf8mb4
collation-server
= utf8mb4_unicode_ci
init_connect
='SET NAMES utf8mb4'



  重新启动Mysql,检查字符集:



mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)



  
  5.如果是用java连接的mysql,需要升级mysql-connector-java.jar至少到5.1.14
  
  
  参考资料:
  http://mathiasbynens.be/notes/mysql-utf8mb4#utf8-to-utf8mb4
  http://technovergence-en.blogspot.jp/2012/03/mysql-from-utf8-to-utf8mb4.html
  http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html
  http://stackoverflow.com/questions/7814293/how-to-insert-utf-8-mb4-characteremoji-in-ios5-in-mysql
  http://stackoverflow.com/questions/8709892/mysql-throws-incorrect-string-value-error

运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-288065-1-1.html 上篇帖子: 从小型网站到超大规模网站的MySQL参考架构 下篇帖子: mysql 复制表数据,表结构的3种方法(转载)
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表