[ENG] Encoding [UTF-16BE; ANSI]

mntly·2024년 8월 29일

ANSI CVE-2021-39863 English UTF-16BE

0

CVE.2021.39863-Exploit.Knowledge

목록 보기

2/8

1. Unicode

"Unicode" itself is not an Encoding method, but it is a code table that maps a character and its encoding character one-to-one.
Unicode represents one character with 2 bytes. Therefore, the string encoded by Unicode ends with "\x00\x00", not "\x00".
For this reason, the Null Terminator of string encoded by Unicode is 2 bytes : "\x00\x00"

UTF-16BE

UTF-16 : 16-bit Unicode Transformation Format
BE : Big Endian

According to Unicode, UTF-16BE is an encoding method to the form of Big Endian by one character to 16 bit (2 bytes).

It uses BOM, "%uFEFF", at the beginning of the string to represent it as UTF-16BE.

BOM, Byte Order Mark

Originally, BOM plays the role of giving information about the string to the program.
In UTF-16BE, BOM gives information about Endian.

BOM : %uFEFF : UTF-16BE : Big Endian

BOM : %uFFFE : UTF-16LE : Little Endian

💡 UTF-16BE Format : "%uFEFF .... %u0000"

Ex:) "A" : "%uFEFF%u0041%u0000"

2. ANSI

"ANSI" itself is not an Encoding method, but it means the default local/codepage for my system.
In ANSI, each CodePage has a code table and uses a special CodePage according to each language.
In ANSI, an English character is represented as 1 byte. Therefore, the Null Terminator of an English string is 1 byte. " \x00
Unlike UTF-16, ANSI doesn't use BOM.

💡 ANSI Format : "\x.. .. \x00"

Ex:) "A" : "\x41\x00"

REFERENCE

Feedback is always welcome

이전 포스트

[KOR] Encoding [UTF-16BE; ANSI]

다음 포스트

[KOR] Process of JS in PDF at Adobe

0개의 댓글

관련 채용 정보