Description
unicodestr = native2unicode(bytes) takes a vector containing numeric values in the range [0,255], interprets these values as a stream of 8-bit bytes, and converts them to Unicode characters. The stream of bytes is assumed to be in the MATLAB default character encoding scheme. The return value, unicodestr, is a char vector that has the same general array shape as bytes.
unicodestr = native2unicode(bytes, encoding) does the conversion with the assumption that the byte stream is in the character encoding scheme specified by the string encoding. encoding must be the empty string ('') or a name or alias for an encoding scheme. Some examples are 'UTF-8', 'latin1', 'US-ASCII', and 'Shift_JIS'. For common names and aliases, see the Web site http://www.iana.org/assignments/character-sets. If encoding is unspecified or is the empty string (''), the MATLAB default encoding scheme is used.
Note: If bytes is a char vector, it is returned unchanged.
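The following is a minimal sketch of both calling forms, not an excerpt from the original reference page. The byte values and variable names are illustrative assumptions; only native2unicode and the encoding names come from the description above.

```matlab
% Illustrative byte values (assumptions chosen for this sketch).
asciiBytes = uint8([72 101 108 108 111]);   % "Hello" in US-ASCII
utf8Bytes  = uint8([195 169]);              % U+00E9 encoded as two UTF-8 bytes
latinBytes = uint8(233);                    % U+00E9 encoded as one ISO-8859-1 byte

% Explicit encoding: each byte stream is decoded with the named scheme.
strAscii = native2unicode(asciiBytes, 'US-ASCII');    % 'Hello'
strUtf8  = native2unicode(utf8Bytes,  'UTF-8');       % one character, code point 233
strLatin = native2unicode(latinBytes, 'ISO-8859-1');  % the same character

% Char input is returned unchanged, as the note above states.
unchanged = native2unicode('Hello');
```

The same character can occupy a different number of bytes depending on the encoding, which is why the encoding argument has to match how the bytes were produced.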
Examples
This example begins with a vector of bytes in an unknown character encoding scheme. The user-written function detect_encoding determines the encoding scheme. If successful, it returns the encoding scheme name or alias as a string. If unsuccessful, it throws an error represented by an MException object, ME. The example calls native2unicode to convert the bytes to Unicode characters:
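try
    % detect_encoding is a user-written function, not part of MATLAB
    enc = detect_encoding(bytes);
    str = native2unicode(bytes, enc);
    disp(str);
catch ME
    % Detection or conversion failed; pass the error on to the caller
    rethrow(ME);
end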
For the output of disp(str) to appear correctly, the computer must be configured to display text in a language represented by the detected encoding scheme.
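The example above leaves detect_encoding to the user. One possible shape for such a function is sketched below, purely as a hypothetical illustration: it only inspects a leading byte-order mark (BOM) and falls back to a 7-bit check, which is far weaker than real encoding detection. The error identifier and the set of encodings tested are assumptions made for this sketch.

```matlab
function enc = detect_encoding(bytes)
% DETECT_ENCODING  Hypothetical sketch: guess an encoding from a leading BOM.
% Not a MATLAB built-in; save as detect_encoding.m to try it.
bytes = uint8(bytes(:).');   % work on a uint8 row vector

if numel(bytes) >= 3 && isequal(bytes(1:3), uint8([239 187 191]))
    enc = 'UTF-8';           % EF BB BF
elseif numel(bytes) >= 2 && isequal(bytes(1:2), uint8([254 255]))
    enc = 'UTF-16BE';        % FE FF
elseif numel(bytes) >= 2 && isequal(bytes(1:2), uint8([255 254]))
    enc = 'UTF-16LE';        % FF FE
elseif all(bytes < 128)
    enc = 'US-ASCII';        % every byte is 7-bit
else
    % Matches the contract described above: throw on failure.
    error('detect_encoding:unknown', 'Could not determine the encoding.');
end
end
```

Because the sketch throws an error when it cannot decide, the catch block in the example receives an MException, as the example text describes.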
Description
bytes = unicode2native(unicodestr) takes a char vector of Unicode characters, unicodestr, converts it to the MATLAB default character encoding scheme, and returns the result as a uint8 vector, bytes. The output vector, bytes, has the same general array shape as the input unicodestr. You can save the output of unicode2native to a file using the fwrite function.
bytes = unicode2native(unicodestr, encoding) converts the Unicode characters to the character encoding scheme specified by the string encoding. encoding must be the empty string ('') or a name or alias for an encoding scheme. Some examples are 'UTF-8', 'latin1', 'US-ASCII', and 'Shift_JIS'. For common names and aliases, see the Web site http://www.iana.org/assignments/character-sets. If encoding is unspecified or is the empty string (''), the MATLAB default encoding scheme is used.
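As a minimal sketch of the behavior described above, the following converts one short string to two encodings, saves one result with fwrite, and reads it back. The sample string, variable names, and the temporary file are assumptions made for illustration; they do not come from the original page.

```matlab
% Build the sample text from code points so the source file stays pure ASCII.
s = ['caf' char(233)];                          % "cafe" with an acute accent on the e

utf8Bytes   = unicode2native(s, 'UTF-8');       % uint8: 99 97 102 195 169
latin1Bytes = unicode2native(s, 'ISO-8859-1');  % uint8: 99 97 102 233

% Save the UTF-8 bytes to a scratch file (hypothetical temporary file name).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, utf8Bytes, 'uint8');
fclose(fid);

% Read the raw bytes back and decode them with the same encoding.
fid = fopen(fname, 'r');
readBack = fread(fid, Inf, 'uint8=>uint8')';    % fread returns a column; transpose to a row
fclose(fid);
delete(fname);

roundTrip = native2unicode(readBack, 'UTF-8');
```

Passing the encoding explicitly in both directions keeps the round trip independent of the MATLAB default encoding on the machine running the code.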
Examples
This example begins with two strings containing Unicode characters. It assumes that string str1 contains text in a Western European language and string str2 contains Japanese text. The example writes both strings into the same file, using the ISO-8859-1 character encoding scheme for the first string and the Shift-JIS encoding scheme for the second string. The example uses unicode2native to convert the two strings to the appropriate encoding schemes.