python decode unicode encode

jiang1799 posted on 2015-12-2 13:07:46

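Internally, Python represents strings as unicode. When converting between encodings you therefore normally use unicode as the intermediate form: first decode the string from its source encoding into unicode, then encode the unicode into the target encoding.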
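The decode-then-encode round trip described above can be sketched roughly as follows. This is only an illustration, written for Python 3, where the unicode type is called `str` and encoded data is `bytes` (the post itself targets Python 2). The helper name `transcode` is made up for this example.

```python
def transcode(data, src_encoding, dst_encoding):
    """Convert encoded bytes from one encoding to another via unicode."""
    text = data.decode(src_encoding)    # bytes -> unicode (str in Python 3)
    return text.encode(dst_encoding)    # unicode -> bytes in the target encoding


gbk_bytes = "你好".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(repr(gbk_bytes))    # b'\xc4\xe3\xba\xc3'
print(repr(utf8_bytes))   # b'\xe4\xbd\xa0\xe5\xa5\xbd'
```

The same two characters occupy four bytes in GBK and six in UTF-8; only the unicode text in the middle is encoding-independent.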
A plain string literal in code takes the same encoding as the source file itself. The two cases below show the difference:

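1. s = u'你好'
   The string is explicitly unicode, Python's internal representation, regardless of the encoding of the source file. (The interpreter's default encoding, used for implicit str/unicode conversions, is a separate setting. To check it: import sys; print('hello', sys.getdefaultencoding()), which gives ascii. To set it: import sys; reload(sys); sys.setdefaultencoding('utf-8').) To convert a string like this, just call its encode method with the target encoding.
2. # -*- coding: utf-8 -*-
   s = '你好'
   Here s is a UTF-8 encoded byte string. ASCII cannot represent Chinese characters.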
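The difference between the two cases can be seen by comparing lengths. The sketch below uses Python 3 spelling, where a plain literal is already unicode, so the UTF-8 byte string of case 2 is produced by an explicit `encode`. The variable names are illustrative only.

```python
text = "你好"                   # unicode text (what u'你好' is in Python 2)
utf8 = text.encode("utf-8")     # the bytes a UTF-8 source file would store for '你好'

print(len(text))   # 2 characters
print(len(utf8))   # 6 bytes

# ASCII has no code points for Chinese characters, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)    # False
```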
　　
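isinstance(s, unicode)  # checks whether s is unicode: returns True if it is, False if not
unicode(str, 'gb2312') and str.decode('gb2312') are equivalent: both convert a gb2312-encoded str to unicode.

str.__class__ shows the type of a string (str or unicode), which tells you whether it is still encoded bytes or already unicode.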
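A small check of the points above, again in Python 3 terms as an assumption: `str(raw, 'gb2312')` plays the role of Python 2's `unicode(raw, 'gb2312')`, and `bytes` plays the role of Python 2's `str`.

```python
raw = "中文".encode("gb2312")            # gb2312-encoded bytes

via_constructor = str(raw, "gb2312")     # Python 3 spelling of unicode(raw, 'gb2312')
via_method = raw.decode("gb2312")        # same result through the decode method

print(via_constructor == via_method)                       # True
print(isinstance(raw, str), isinstance(via_method, str))   # False True
print(raw.__class__.__name__, via_method.__class__.__name__)
```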
After all that theory, here is a cure-all to finish :)
　　

#!/usr/bin/env python
#coding=utf-8
s = "中文"
if isinstance(s, unicode):
    # s = u"中文": already unicode, encode directly
    print s.encode('gb2312')
else:
    # s = "中文": UTF-8 byte string, decode to unicode first
    print s.decode('utf-8').encode('gb2312')
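For readers on Python 3, the same branch can be wrapped in a small helper. This is a sketch under assumptions, not part of the original post: the name `to_gb2312` and the `source_encoding` parameter are hypothetical, and byte input is assumed to be UTF-8 unless stated otherwise.

```python
def to_gb2312(s, source_encoding="utf-8"):
    """Return s as gb2312 bytes, whether s is unicode text or encoded bytes."""
    if isinstance(s, bytes):
        # Encoded input: decode to unicode first.
        s = s.decode(source_encoding)
    # s is unicode text at this point, so it can be encoded directly.
    return s.encode("gb2312")


from_text = to_gb2312("中文")
from_bytes = to_gb2312("中文".encode("utf-8"))
print(from_text == from_bytes)   # True
```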
　　
Speech module code:

# -*- coding: utf-8 -*-
import sys
print('hello', sys.getdefaultencoding())

def xfs_frame_info(words):
    # decode utf-8 to python internal unicode coding (skip if already unicode)
    if isinstance(words, unicode):
        wordu = words
    else:
        wordu = words.decode('utf-8')
    # encode python unicode to gbk
    data = wordu.encode('gbk')
    length = len(data) + 2
    # 5-byte header: one assignment per byte
    frame_info = bytearray(5)
    frame_info[0] = 0xfd
    frame_info[1] = (length >> 8)
    frame_info[2] = (length & 0x00ff)
    frame_info[3] = 0x01
    frame_info[4] = 0x01

    buf = frame_info + data
    print("buf:", buf)
    return buf
if __name__ == "__main__":
    print("hello world")
    words1 = u'你好'
    #encodetype = isinstance(words1,unicode)
    #print("encodetype",encodetype)
    print("origin unicode", words1)
    words = words1.encode('utf-8')
    print("utf-8 encoded", words)
    a = xfs_frame_info(words)
    print('a', a)
# Variant: the same test starting from a plain str literal (no u'' prefix)
if __name__ == "__main__":
    print("hello world")
    words1 = '你好'
    print("original utf-8 encoded:", words1)
    encodetype = isinstance(words1, unicode)
    wordu = words1.decode('utf-8')
    print("unicode from utf-8 decode:", wordu)
    #encodetype = isinstance(words1,utf-8)
    #encodetype = isinstance(words1,'ascii')
    #print("encodetype",encodetype)
    #print("origin unicode", words1)

    word_utf8 = wordu.encode('utf-8')
    #encodetype2 = isinstance(words,utf8)
    #print("encodetype2",encodetype2)
    print("utf-8 encoded", word_utf8)
    a = xfs_frame_info(word_utf8)
    print('a', a)
When 你好 is written without the u'' prefix, one extra step is needed: decode it to unicode first.
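The frame-building function from the speech module can also be sketched for Python 3. This is an approximation under stated assumptions: the five header bytes are taken to occupy indices 0 to 4 in the order they appear in the post, the meaning of the two trailing 0x01 bytes is not stated in the post and is left as-is, and the name `build_frame` is hypothetical.

```python
def build_frame(words, source_encoding="utf-8"):
    """Build a frame: 5-byte header followed by the GBK-encoded text."""
    if isinstance(words, bytes):
        # Encoded input (assumed UTF-8 by default): decode to unicode first.
        words = words.decode(source_encoding)
    data = words.encode("gbk")          # unicode -> GBK payload
    length = len(data) + 2              # same length rule as the post's code

    header = bytearray(5)
    header[0] = 0xFD                    # leading byte, as in the post
    header[1] = (length >> 8) & 0xFF    # high byte of length
    header[2] = length & 0xFF           # low byte of length
    header[3] = 0x01                    # copied from the post, meaning not stated there
    header[4] = 0x01                    # copied from the post, meaning not stated there
    return bytes(header) + data


frame = build_frame("你好")
print(frame.hex())   # fd00060101c4e3bac3
```

Passing the UTF-8 bytes of 你好 instead of the text gives the same frame, which mirrors the two test blocks in the post.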
