[Experience sharing] NSStringEncoding

Posted on 2016-1-2 10:58:52
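  Today I came across a great blog. I couldn't follow it, so I'm reposting a few of its most useful posts instead.
  Reposted from: http://hi.baidu.com/may2150209/blog/item/198976ace7e583054b36d6f1.html
  PS: It turns out that blogger reposted it as well. Anyway, as long as it's useful.
  The original post follows.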
  Today, while trying to scrape the home page of Qidian (起点中文网), I ran into a problem: if you don't use the right encoding, you can't read anything at all.
  
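  This is a bad habit from writing too much C#, where I rarely had to think about encodings. In C#, even with the wrong encoding you still get something back; it is just a screen full of garbage.
  Cocoa is stricter: the read fails outright (the call returns nil and fills in an NSError).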
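To see why a UTF-8 read can fail instead of returning garbage, it helps to look at the bytes. The sketch below is plain C (which also compiles as Objective-C) and only illustrates the idea: `looks_like_utf8` is a made-up helper, not a Foundation API, and I am assuming, not claiming, that Foundation rejects the data for a similar reason. In GBK, 中文 is the bytes D6 D0 CE C4; D6 announces a two-byte UTF-8 sequence, and D0 is not a valid continuation byte.

```c
#include <stdbool.h>
#include <stddef.h>

/* Structural UTF-8 check: every lead byte must be followed by the right
 * number of 10xxxxxx continuation bytes. Overlong forms and surrogates
 * are NOT rejected; this is a sketch, not a complete validator. */
static bool looks_like_utf8(const unsigned char *bytes, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char b = bytes[i];
        size_t extra;
        if (b < 0x80) {
            extra = 0;                 /* plain ASCII */
        } else if ((b & 0xE0) == 0xC0) {
            extra = 1;                 /* 110xxxxx: two-byte sequence */
        } else if ((b & 0xF0) == 0xE0) {
            extra = 2;                 /* 1110xxxx: three-byte sequence */
        } else if ((b & 0xF8) == 0xF0) {
            extra = 3;                 /* 11110xxx: four-byte sequence */
        } else {
            return false;              /* stray continuation or invalid lead byte */
        }
        if (extra > len - i - 1) {
            return false;              /* sequence is cut off at the end */
        }
        for (size_t k = 1; k <= extra; k++) {
            if ((bytes[i + k] & 0xC0) != 0x80) {
                return false;          /* expected a 10xxxxxx continuation byte */
            }
        }
        i += extra + 1;
    }
    return true;
}
```

Given the GBK bytes D6 D0 CE C4 the function returns false; given E4 B8 AD E6 96 87, the UTF-8 bytes for the same two characters, it returns true.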
  Here is the code I started with, which downloads the Qidian home page into a string.
    // `result` is declared elsewhere in the original (presumably an NSMutableString).
    NSURL *url = [[NSURL alloc] initWithString:@"http://www.cmfu.com"];
    NSError *error = nil;
    NSString *xml = [NSString stringWithContentsOfURL:url
                                             encoding:NSUTF8StringEncoding
                                                error:&error];
    if (xml == nil) {
        NSLog(@"Error reading url at %@", [error localizedFailureReason]);
    } else {
        [result setString:xml];
    }
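  The download failed no matter what I did, and the error message said the encoding was wrong. Fine, so I opened the documentation and looked up all the encodings: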
    enum {
        NSASCIIStringEncoding = 1,
        NSNEXTSTEPStringEncoding = 2,
        NSJapaneseEUCStringEncoding = 3,
        NSUTF8StringEncoding = 4,
        NSISOLatin1StringEncoding = 5,
        NSSymbolStringEncoding = 6,
        NSNonLossyASCIIStringEncoding = 7,
        NSShiftJISStringEncoding = 8,
        NSISOLatin2StringEncoding = 9,
        NSUnicodeStringEncoding = 10,
        NSWindowsCP1251StringEncoding = 11,
        NSWindowsCP1252StringEncoding = 12,
        NSWindowsCP1253StringEncoding = 13,
        NSWindowsCP1254StringEncoding = 14,
        NSWindowsCP1250StringEncoding = 15,
        NSISO2022JPStringEncoding = 21,
        NSMacOSRomanStringEncoding = 30,
        NSUTF16StringEncoding = NSUnicodeStringEncoding,
        NSUTF16BigEndianStringEncoding = 0x90000100,
        NSUTF16LittleEndianStringEncoding = 0x94000100,
        NSUTF32StringEncoding = 0x8c000100,
        NSUTF32BigEndianStringEncoding = 0x98000100,
        NSUTF32LittleEndianStringEncoding = 0x9c000100,
    };
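  I tried them one by one, and not a single one worked! I was ready to give up. In this day and age, does Cocoa really not support Chinese? That can't be right.
  My guess was that the documentation lists only the most commonly used encodings (most commonly used in Apple's opinion, that is, which apparently leaves China out; shame on them!),
  so I wrote the following code to print every supported encoding: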
    const NSStringEncoding *encodings = [NSString availableStringEncodings];
    NSMutableString *str = [[NSMutableString alloc] init];
    NSStringEncoding encoding;
    // The returned list is zero-terminated.
    while ((encoding = *encodings++) != 0) {
        // Cast to int so that %i prints the signed 32-bit values listed in this post.
        [str appendFormat:@"%@ === %i\n",
            [NSString localizedNameOfStringEncoding:encoding], (int)encoding];
    }
    [result setString:str];
  Sure enough, I had guessed right. Here is the full list of supported encodings:
  Western (Mac OS Roman) === 30
  Japanese (Mac OS) === -2147483647
  Traditional Chinese (Mac OS) === -2147483646
  Korean (Mac OS) === -2147483645
  Arabic (Mac OS) === -2147483644
  Hebrew (Mac OS) === -2147483643
  Greek (Mac OS) === -2147483642
  Cyrillic (Mac OS) === -2147483641
  Devanagari (Mac OS) === -2147483639
  Gurmukhi (Mac OS) === -2147483638
  Gujarati (Mac OS) === -2147483637
  Thai (Mac OS) === -2147483627
  Simplified Chinese (Mac OS) === -2147483623
  Tibetan (Mac OS) === -2147483622
  Central European (Mac OS) === -2147483619
  Symbol (Mac OS) === 6
  Dingbats (Mac OS) === -2147483614
  Turkish (Mac OS) === -2147483613
  Croatian (Mac OS) === -2147483612
  Icelandic (Mac OS) === -2147483611
  Romanian (Mac OS) === -2147483610
  Celtic (Mac OS) === -2147483609
  Gaelic (Mac OS) === -2147483608
  Keyboard Symbols (Mac OS) === -2147483607
  Farsi (Mac OS) === -2147483508
  Cyrillic (Mac OS Ukrainian) === -2147483496
  Inuit (Mac OS) === -2147483412
  Unicode (UTF-32LE) === -1677721344
  Unicode (UTF-8) === 4
  Unicode (UTF-16) === 10
  Unicode (UTF-16BE) === -1879047936
  Unicode (UTF-16LE) === -1811939072
  Unicode (UTF-32) === -1946156800
  Unicode (UTF-32BE) === -1744830208
  Western (ISO Latin 1) === 5
  Central European (ISO Latin 2) === 9
  Western (ISO Latin 3) === -2147483133
  Central European (ISO Latin 4) === -2147483132
  Cyrillic (ISO 8859-5) === -2147483131
  Arabic (ISO 8859-6) === -2147483130
  Greek (ISO 8859-7) === -2147483129
  Hebrew (ISO 8859-8) === -2147483128
  Turkish (ISO Latin 5) === -2147483127
  Nordic (ISO Latin 6) === -2147483126
  Thai (ISO 8859-11) === -2147483125
  Baltic Rim (ISO Latin 7) === -2147483123
  Celtic (ISO Latin 8) === -2147483122
  Western (ISO Latin 9) === -2147483121
  Romanian (ISO Latin 10) === -2147483120
  Latin-US (DOS) === -2147482624
  Greek (DOS) === -2147482619
  Baltic Rim (DOS) === -2147482618
  Western (DOS Latin 1) === -2147482608
  Greek (DOS Greek 1) === -2147482607
  Central European (DOS Latin 2) === -2147482606
  Cyrillic (DOS) === -2147482605
  Turkish (DOS) === -2147482604
  Portuguese (DOS) === -2147482603
  Icelandic (DOS) === -2147482602
  Hebrew (DOS) === -2147482601
  Canadian French (DOS) === -2147482600
  Arabic (DOS) === -2147482599
  Nordic (DOS) === -2147482598
  Cyrillic (DOS) === -2147482597
  Greek (DOS Greek 2) === -2147482596
  Thai (Windows, DOS) === -2147482595
  Japanese (Windows, DOS) === 8
  Simplified Chinese (Windows, DOS) === -2147482591
  Korean (Windows, DOS) === -2147482590
  Traditional Chinese (Windows, DOS) === -2147482589
  Western (Windows Latin 1) === 12
  Central European (Windows Latin 2) === 15
  Cyrillic (Windows) === 11
  Greek (Windows) === 13
  Turkish (Windows Latin 5) === 14
  Hebrew (Windows) === -2147482363
  Arabic (Windows) === -2147482362
  Baltic Rim (Windows) === -2147482361
  Vietnamese (Windows) === -2147482360
  Western (ASCII) === 1
  Japanese (Shift JIS X0213) === -2147482072
  Chinese (GBK) === -2147482063
  Chinese (GB 18030) === -2147482062
  Japanese (ISO 2022-JP) === 21
  Korean (ISO 2022-KR) === -2147481536
  Japanese (EUC) === 3
  Simplified Chinese (EUC) === -2147481296
  Traditional Chinese (EUC) === -2147481295
  Korean (EUC) === -2147481280
  Japanese (Shift JIS) === -2147481087
  Cyrillic (KOI8-R) === -2147481086
  Traditional Chinese (Big 5) === -2147481085
  Western (Mac Mail) === -2147481084
  Simplified Chinese (HZ GB 2312) === -2147481083
  Traditional Chinese (Big 5 HKSCS) === -2147481082
  Ukrainian (KOI8-U) === -2147481080
  Traditional Chinese (Big 5-E) === -2147481079
  Western (NextStep) === 2
  Non-lossy ASCII === 7
  Western (EBCDIC Latin 1) === -2147480574
  At last, the familiar GBK encoding, whose value is -2147482063. OK, time to change the original code:
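    NSURL *url = [[NSURL alloc] initWithString:@"http://www.cmfu.com"];
    NSError *error = nil;
    // GBK is kCFStringEncodingGBK_95 (0x0631); converted to an NSStringEncoding it is
    // 0x80000631, which %i prints as -2147482063, the value found in the list.
    NSStringEncoding gbk = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGBK_95);
    NSString *xml = [NSString stringWithContentsOfURL:url encoding:gbk error:&error];
    if (xml == nil) {
        NSLog(@"Error reading url at %@", [error localizedFailureReason]);
    } else {
        [result setString:xml];
    }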
  Finally got it working! Seeing the familiar Chinese text was a thrill.
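About the odd value -2147482063: as far as I can tell, an encoding without a named NSStringEncoding constant is a CFStringEncoding value with the top bit (0x80000000) set, and %i shows that as a negative 32-bit integer. The C sketch below does the arithmetic in both directions. The helper names are my own, nothing here comes from Foundation. On that assumption, -2147482063 is 0x80000631, so the CFStringEncoding part is 0x0631, which matches kCFStringEncodingGBK_95 in the CoreFoundation headers.

```c
#include <stdint.h>

/* Hypothetical helper: take a value as printed with %i in the list and
 * strip the top bit, leaving the CFStringEncoding part. */
static uint32_t cf_encoding_from_printed(int32_t printed) {
    /* Signed-to-unsigned conversion is well defined (modulo 2^32). */
    return (uint32_t)printed & 0x7FFFFFFFu;
}

/* Hypothetical helper: the reverse direction. Set the top bit on a
 * CFStringEncoding value and express it as the signed 32-bit number
 * that %i would print. Assumes cf < 0x80000000. */
static int32_t printed_from_cf_encoding(uint32_t cf) {
    int64_t with_flag = (int64_t)(0x80000000u | cf);
    /* Subtract 2^32 to get the two's-complement reading without relying
     * on implementation-defined narrowing. */
    return (int32_t)(with_flag - 4294967296LL);
}
```

For example, cf_encoding_from_printed(-2147482063) gives 0x0631 (GBK), -2147482062 gives 0x0632 (GB 18030), and printed_from_cf_encoding(0x0631) gives back -2147482063.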
  Note: this is a repost.

Thread URL: https://www.yunweiku.com/thread-159431-1-1.html