출처
숫자가 아닌 임의의 객체를 인코딩하려면 2가지 조건이 필요하다.
하나는 인코딩 함수가 one-to-one 이어야 한다는 것과,
두번째는 Prefix-Free 이어야 한다는 것이다.
One-To-One 이 필요한 이유는 매우 직관적이다.
One-To-One 이 아니라면 Decoding 이 불가능하기 때문이다.
Prefix-Free 가 필요한 이유는 여러개의 객체를 tuple 로 묶어서 인코딩을 하기 위해서이다.
만약 Prefix-Free 하지 않는다면, 여러개의 객체를 하나의 bit string 으로 표현하는 인코딩 함수가
one-to-one 이 아니게 되어서, Decoding 이 불가능할수도 있다.
Definition 2.17 (Prefix free encoding)
y,y′: string
y is a prefix of y′ if
∣y∣≤∣y′∣ and ∀i<∣y∣, yi′=yi
O : non-empty set
E:O→{0,1}∗ : function
E is prefix free if
E(o)=∅,∀o∈O and
∄ distinct objects o,o′∈O s.t E(o) is a prefix of E(o′)
Theorem 2.18 (Prefix-free implies tuple encoding)
E:O→{0,1}∗ is prefix-free
Define E:O∗→{0,1}∗ as follows:
∀o0,…,ok−1∈O∗
E(o0,…,ok−1)=E(o0)E(o1)⋯E(ok−1)
Then, E is one to one.
Proof
Suppose E is not one to one.
⇒∃ distinct tuples (o0,…,ok−1), (o0′,…,ok′−1′) s.t
E(o0,…,ok−1)=E(o0′,…,ok′−1′)=x∈{0,1}∗
Let i be the first index s.t oi=oi′
Then, xj=E(oj)=E(oj′) for ∀j∈{0,1,2,…i−1}.
x=E(o0,…,ok−1)=x0⋯xi−1E(oi)E(oi+1)⋯E(ok−1)
=E(o0′,…,ok′−1′)=x0⋯xi−1E(oi′)E(oi+1′)⋯E(ok′−1′)
∴E(oi)⋯E(ok−1)=E(oi′)⋯E(ok′−1′)
This means that one of E(oi) and E(oi′) must be a prefix of the other.
This contradicts to "E is prefix-free".
Proved by contradiction.
QED
Lemma 2.20
E:O→{0,1}∗ is one-to-one. Then,
∃ one-to-one prefix-free encoding E s.t ∣E(o)∣≤2∣E(o)∣+2 for ∀o∈O.
Proof
Double every bit in the string x (0↦00, 1↦11)
and mark the end of the string by concatenating to it the pair 01.
Then, E(x)=E(x′) for ∀x=x′
Define a function PF:{0,1}∗→{0,1}∗ for x∈{0,1}∗ as follows:
PF(x)=x0x0x1x1…xn−1xn−101
We want to show E(o)=PF(E(o)) is an prefix-free encoding.
(1) E is encoding
E is one-to-one and PF is one-to-one.
Then, E=PF∘E is also one-to-one
∴ E is encoding.
(2) E is prefix-free
E is prefix-free
⇔ E(o) is not a prefix of E(o′) for
∀ distinct object o,o′∈O
Suppose E(o) is a prefix of E(o′) for some distinct o,o′∈O
E(o)=PF(E(o))
E(o′)=PF(E(o′))
Let x=E(o), x′=E(o′).
Since E is one-to-one, x=x′.
(i) ∣x∣<∣x′∣
the two bits in positions 2∣x∣ in PF(x) will be 01
but two bits in positions 2∣x∣ in PF(x′) will be 00 or 11
⇒ PF(x) and PF(x′) differ in the coordinates 2∣x∣,2∣x∣+1
∴PF(x) cannot be a prefix of PF(x′).
(ii) ∣x∣=∣x′∣
Since x=x′, ∃ coordinate i in which they differ.
⇒ PF(x) and PF(x′) differ in the coordinates 2i,2i+1
∴ PF(x) cannot be a prefix of PF(x′).
(iii) ∣x∣>∣x′∣
then ∣PF(x)∣=2∣x∣+2>∣PF(x′)∣=2∣x′∣+2.
⇒ PF(x) is longer than PF(x′).
∴PF(x) cannot be a prefix of PF(x′).
Therefore, PF(x)=E(o) is not a prefix of PF(x′)=E(o′) for ∀o,o′∈O.
Hence, completing the proof.
QED