"Unicode" itself is not an Encoding method, but it is a code table that maps a character and its encoding character one-to-one.
When a string is said to be "encoded by Unicode", this usually means UTF-16, which represents most characters with 2 bytes (characters outside the Basic Multilingual Plane take 4 bytes). For this reason, the Null Terminator of such a string is 2 bytes, "\x00\x00", not "\x00".
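A short Python sketch of why the terminator has to be two bytes: in UTF-16 a single "\x00" byte already appears inside ordinary characters, so one zero byte cannot mark the end. The helper name `utf16be_cstring` is hypothetical, and `utf-16-be` is Python's codec name for big-endian UTF-16 without a BOM.

```python
def utf16be_cstring(text: str) -> bytes:
    """Encode text as UTF-16BE (no BOM) and append a 2-byte null terminator."""
    return text.encode("utf-16-be") + b"\x00\x00"


encoded = utf16be_cstring("AB")
print(encoded)  # b'\x00A\x00B\x00\x00'

# A lone zero byte is the high byte of "A" itself, so a 1-byte
# terminator would end the string before its first character.
print(encoded.index(b"\x00"))  # 0
```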
UTF-16 : 16-bit Unicode Transformation Format
BE : Big Endian
UTF-16BE is an encoding method that stores each 16-bit (2-byte) unit of a Unicode character in Big Endian order, that is, most significant byte first.
"%uFEFF"
, at the beginning of the string to represent it as UTF-16BE.Originally, BOM plays the role of giving information about the string to the program.
In UTF-16BE, BOM gives information about Endian.
BOM : %uFEFF (bytes "\xFE\xFF") : UTF-16BE : Big Endian
BOM : %uFFFE (bytes "\xFF\xFE") : UTF-16LE : Little Endian
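The BOM check can be sketched in Python as follows. The function name `detect_utf16_endianness` is hypothetical; `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF16_LE` are standard library constants. This is a minimal sketch that looks only at the first two bytes and assumes the data is UTF-16 to begin with.

```python
import codecs


def detect_utf16_endianness(data: bytes) -> str:
    """Return 'big', 'little', or 'unknown' based on the first two bytes."""
    if data.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff'
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe'
        return "little"
    return "unknown"


print(detect_utf16_endianness(b"\xfe\xff\x00\x41"))  # big
print(detect_utf16_endianness(b"\xff\xfe\x41\x00"))  # little
print(detect_utf16_endianness(b"\x41\x00"))          # unknown
```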
💡 UTF-16BE Format :
"%uFEFF .... %u0000"
"A"
: "%uFEFF%u0041%u0000"
"ANSI" itself is not an Encoding method, but it means the default local/codepage for my system.
In ANSI, each code page has its own code table, and a different code page is used depending on the language.
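To see how the code page changes the meaning of a byte, here is a small Python sketch. The two code pages, `cp1252` (Western European) and `cp1251` (Cyrillic), are picked only as examples, and `decode_byte` is a hypothetical helper.

```python
def decode_byte(raw: bytes, code_page: str) -> str:
    """Decode bytes using the given Windows code page."""
    return raw.decode(code_page)


raw = b"\xe0"  # one byte above the ASCII range
for code_page in ("cp1252", "cp1251"):
    char = decode_byte(raw, code_page)
    print(code_page, f"U+{ord(char):04X}")
# cp1252 U+00E0  (Latin small letter a with grave)
# cp1251 U+0430  (Cyrillic small letter a)
```

The same byte decodes to a different character in each code page, which is why the code page has to be known before an ANSI string can be read correctly.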
In ANSI, an English character is represented as 1 byte. Therefore, the Null Terminator of an English string is 1 byte: "\x00"
Unlike UTF-16, ANSI doesn't use a BOM.
💡 ANSI Format :
"\x.. .. \x00"
"A" : "\x41\x00"
Feedback is always welcome