Context-Free Grammars(1)

dandb3 · February 3, 2023

compiler grammar parse()

Compilers

The Formal Definition of a Context-Free Grammar

$Terminals$ are the basic symbols from which strings are formed.
The term "token name" is a synonym for "terminal".
The token names produced by the lexical analyzer are the terminals.
$Nonterminals$ are syntactic variables that denote sets of strings.
The sets of strings they denote help define the language generated by the grammar.
In a grammar, one nonterminal is distinguished as the $start\,symbol$, and the set of strings it denotes is the language generated by the grammar.
By convention, the productions for the start symbol are listed first.
The productions of a grammar specify how terminals and nonterminals can be combined to form strings. Each $production$ consists of:
(a) A nonterminal called the $head$ or $left\,side$ of the production.
-> The production defines some of the strings denoted by its head.
(b) The symbol $\rightarrow$ (sometimes written $::=$).
(c) A $body$ or $right\,side$ consisting of zero or more terminals and nonterminals.
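
The four components above map naturally onto a small data structure. The following is a minimal sketch, not a standard API: the class names `Production` and `Grammar`, the validation rules, and the example grammar $E\rightarrow-E\;|\;(E)\;|\;\mathbf{id}$ are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Production:
    head: str               # a single nonterminal (the left side)
    body: Tuple[str, ...]   # zero or more grammar symbols (the right side)


@dataclass(frozen=True)
class Grammar:
    terminals: FrozenSet[str]
    nonterminals: FrozenSet[str]
    start: str
    productions: Tuple[Production, ...]

    def __post_init__(self) -> None:
        # A symbol cannot be both a terminal and a nonterminal.
        if self.terminals & self.nonterminals:
            raise ValueError("terminals and nonterminals must be disjoint")
        # The start symbol is one distinguished nonterminal.
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a nonterminal")
        symbols = self.terminals | self.nonterminals
        for p in self.productions:
            if p.head not in self.nonterminals:
                raise ValueError(f"head {p.head!r} is not a nonterminal")
            for s in p.body:
                if s not in symbols:
                    raise ValueError(f"unknown symbol {s!r} in body")


# Hypothetical example grammar: E -> - E | ( E ) | id
EXAMPLE = Grammar(
    terminals=frozenset({"-", "(", ")", "id"}),
    nonterminals=frozenset({"E"}),
    start="E",
    productions=(
        Production("E", ("-", "E")),
        Production("E", ("(", "E", ")")),
        Production("E", ("id",)),
    ),
)
```

An empty `body` tuple would represent a production whose right side has zero symbols.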

Notational Conventions

These symbols are terminals:
(a) Lowercase letters early in the alphabet, such as $a$ , $b$ , $c$ .
(b) Operator symbols such as $+$ , $*$ , and so on.
(c) Punctuation symbols such as parentheses, comma, and so on.
(d) The digits 0,1,...,9.
(e) Boldface strings such as $\mathbf{id}$ or $\mathbf{if}$, each of which represents a single terminal symbol.
These symbols are nonterminals:
(a) Uppercase letters early in the alphabet, such as $A$ , $B$ , $C$ .
(b) The letter $S$ , which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as $expr$ or $stmt$.
(d) When discussing programming constructs, uppercase letters may be used to represent nonterminals for the constructs. For example, non-terminals for expressions, terms, and factors are often represented by $E$ , $T$ , and $F$ , respectively.
Uppercase letters late in the alphabet, such as $X$ , $Y$ , $Z$ , represent $grammar\,symbols$ ; that is, either nonterminals or terminals.
Lowercase letters late in the alphabet, chiefly $u,v,\ldots,z$, represent (possibly empty) strings of terminals.
Lowercase Greek letters, $\alpha,\beta,\gamma$ for example, represent (possibly empty) strings of grammar symbols. Thus, a generic production can be written as $A\rightarrow\alpha$ , where $A$ is the head and $\alpha$ the body.
A set of productions $A\rightarrow\alpha_1,\,A\rightarrow\alpha_2,\ldots,A\rightarrow\alpha_k$ with a common head $A$ (call them $A$-$productions$) may be written $A\rightarrow\alpha_1|\alpha_2|\ldots|\alpha_k$ . Call $\alpha_1,\alpha_2,\ldots,\alpha_k$ the $alternatives$ for $A$ .
Unless stated otherwise, the head of the first production is the start symbol.
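
Two of these conventions are easy to mechanize: grouping $A$-productions into alternatives, and taking the head of the first production as the start symbol. Below is a rough sketch under assumed names (`format_grammar`, `start_symbol`); the expression grammar over $E$, $T$, $F$ is a hypothetical example, and productions are represented as plain `(head, body)` pairs.

```python
def format_grammar(productions):
    """Group productions with a common head into 'A -> a1 | a2 | ...' lines.

    productions: list of (head, body) pairs, where body is a list of symbols.
    """
    alternatives = {}
    for head, body in productions:
        # dicts keep insertion order, so heads appear in first-seen order
        alternatives.setdefault(head, []).append(body)
    lines = []
    for head, bodies in alternatives.items():
        # an empty body is printed as ε
        rhs = " | ".join(" ".join(body) if body else "ε" for body in bodies)
        lines.append(f"{head} -> {rhs}")
    return lines


def start_symbol(productions):
    """Unless stated otherwise, the head of the first production is the start symbol."""
    return productions[0][0]


# Hypothetical expression grammar using E, T, F for expressions, terms, factors.
PRODUCTIONS = [
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["T", "*", "F"]),
    ("T", ["F"]),
    ("F", ["(", "E", ")"]),
    ("F", ["id"]),
]

for line in format_grammar(PRODUCTIONS):
    print(line)
```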

Derivations

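Explanation by example
- Suppose the grammar has the production $E\rightarrow-E$.
- Replacing $E$ by $-E$ is written $E\Rightarrow-E$ and read "$E$ derives $-E$".
- Productions can also be applied in sequence. Given the additional productions $E\rightarrow(E)$ and $E\rightarrow\mathbf{id}$: $E\Rightarrow-E\Rightarrow-(E)\Rightarrow-(\mathbf{id})$
- Such a sequence of replacements is called "a $derivation$ of $-(\mathbf{id})$ from $E$".
- The derivation also proves that $-(\mathbf{id})$ is one particular instance of an expression.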
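
The derivation in this example can be replayed mechanically. The sketch below is illustrative only: the helper name `derive_step` and the tuple encoding of sentential forms are assumptions, and the caller chooses which occurrence of the head to replace.

```python
def derive_step(form, index, production):
    """Apply one production to the nonterminal at position `index`.

    form: tuple of grammar symbols; production: (head, body) with body a tuple.
    """
    head, body = production
    if form[index] != head:
        raise ValueError(f"symbol at {index} is {form[index]!r}, not {head!r}")
    # Replace the single symbol at `index` by the production body.
    return form[:index] + body + form[index + 1:]


NEG = ("E", ("-", "E"))         # E -> - E
PAREN = ("E", ("(", "E", ")"))  # E -> ( E )
ID = ("E", ("id",))             # E -> id

form = ("E",)
steps = [form]
for index, production in [(0, NEG), (1, PAREN), (2, ID)]:
    form = derive_step(form, index, production)
    steps.append(form)

# E => - E => - ( E ) => - ( id )
print(" => ".join(" ".join(f) for f in steps))
```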
General description
- Let $A$ be a nonterminal and $\alpha,\beta$ arbitrary strings of grammar symbols.
- Consider the string $\alpha A\beta$.
- If $A\rightarrow\gamma$ is a production, we can write $\alpha A\beta\Rightarrow\alpha\gamma\beta$.
- The symbol $\Rightarrow$ means "derives in one step".
- When derivation steps are chained as $\alpha_1\Rightarrow\alpha_2\Rightarrow\ldots\Rightarrow\alpha_n$, we say that $\alpha_1$ $derives$ $\alpha_n$.
- "Derives in zero or more steps" is written with the symbol $\overset{*}\Rightarrow$.
- Therefore,
  1. $\alpha\overset{*}\Rightarrow\alpha$ , for any string $\alpha$
  2. If $\alpha\overset{*}\Rightarrow\beta$ and $\beta\Rightarrow\gamma$ , then $\alpha\overset{*}\Rightarrow\gamma$ .
- Likewise, $\overset{+}\Rightarrow$ means "derives in one or more steps".
- Definitions
  - Let $S$ be the start symbol of a grammar $G$.
    If $S\overset{*}\Rightarrow\alpha$, then $\alpha$ is called a $sentential\,form$ of $G$.
  - A $sentence$ of $G$ is a sentential form of $G$ that contains no nonterminals.
  - The $language$ $generated$ $by$ a grammar is the set of its sentences.
  - A string of terminals $w\in L(G)\Longleftrightarrow w$ is a sentence of $G$ (or $S\overset{*}\Rightarrow w$ ).
  - A language that can be generated by a context-free grammar is called a $context$-$free$ $language$.
  - If two grammars generate the same language, they are said to be $equivalent$.
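
To make $\overset{*}\Rightarrow$, sentential forms, and sentences concrete, the sketch below enumerates every sentential form reachable from the start symbol by breadth-first search. It rests on several assumptions: the function names are invented here, nonterminals are inferred as the set of production heads, and the grammar has no empty bodies, so forms never shrink and the length bound `max_len` can prune the search without missing shorter results.

```python
from collections import deque


def sentential_forms(productions, start, max_len):
    """All forms alpha with start =>* alpha and len(alpha) <= max_len.

    Assumes every body is non-empty, so a form that exceeds max_len can
    never lead back to a shorter one.
    """
    seen = {(start,)}  # zero steps: start =>* start
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        for i, symbol in enumerate(form):
            for head, body in productions:
                if head != symbol:
                    continue
                # one step: alpha A beta => alpha gamma beta
                new_form = form[:i] + body + form[i + 1:]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
    return seen


def sentences(productions, start, max_len):
    """Sentential forms that contain no nonterminals."""
    nonterminals = {head for head, _ in productions}
    return {
        form
        for form in sentential_forms(productions, start, max_len)
        if not any(symbol in nonterminals for symbol in form)
    }


# Hypothetical grammar: E -> - E | ( E ) | id
PRODUCTIONS = [
    ("E", ("-", "E")),
    ("E", ("(", "E", ")")),
    ("E", ("id",)),
]

for sentence in sorted(sentences(PRODUCTIONS, "E", 4), key=lambda s: (len(s), s)):
    print(" ".join(sentence))
```

Checking $w\in L(G)$ for a short $w$ then reduces to a membership test on the returned set; this brute-force search is only meant to mirror the definitions, not to serve as a parser.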
Two kinds of derivation
1. $leftmost$ derivations
  - In each sentential form, the leftmost nonterminal is always chosen.
  - If $\alpha\Rightarrow\beta$ is a step in which the leftmost nonterminal of $\alpha$ is replaced, we write $\alpha\underset{lm}\Rightarrow\beta$.
2. $rightmost$ derivations
  - The rightmost nonterminal is always chosen.
  - We write $\alpha\underset{rm}\Rightarrow\beta$.
In general, every leftmost step can be written as $wA\gamma\underset{lm}\Rightarrow w\delta\gamma$, where $w$ consists of terminals only, $A\rightarrow\delta$ is the production applied, and $\gamma$ is a string of grammar symbols.
To emphasize that $\alpha$ derives $\beta$ by a leftmost derivation, we write $\alpha\overset{*}{\underset{lm}\Rightarrow}\beta$.
If $S\overset{*}{\underset{lm}\Rightarrow}\alpha$, then $\alpha$ is called a $left$-$sentential$ $form$ of the grammar.
Analogous definitions hold for rightmost derivations. Rightmost derivations are sometimes called $canonical$ derivations.
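
The difference between the two orders shows up as soon as a sentential form contains more than one nonterminal. The following sketch is an illustration under assumptions: the function names `step` and `derive`, the explicit `nonterminals` argument, and the grammar $E\rightarrow E+E\;|\;\mathbf{id}$ are hypothetical and not taken from the text above.

```python
def step(form, production, nonterminals, leftmost=True):
    """Replace the leftmost (or rightmost) nonterminal using `production`."""
    head, body = production
    positions = [i for i, s in enumerate(form) if s in nonterminals]
    if not positions:
        raise ValueError("form contains no nonterminal")
    i = positions[0] if leftmost else positions[-1]
    if form[i] != head:
        raise ValueError(f"chosen nonterminal is {form[i]!r}, not {head!r}")
    # leftmost case: w A gamma => w delta gamma, where w is all terminals
    return form[:i] + body + form[i + 1:]


def derive(start, production_sequence, nonterminals, leftmost=True):
    """Return the list of sentential forms visited by the derivation."""
    forms = [(start,)]
    for production in production_sequence:
        forms.append(step(forms[-1], production, nonterminals, leftmost))
    return forms


PLUS = ("E", ("E", "+", "E"))  # E -> E + E
ID = ("E", ("id",))            # E -> id

left = derive("E", [PLUS, ID, ID], {"E"}, leftmost=True)
right = derive("E", [PLUS, ID, ID], {"E"}, leftmost=False)

# leftmost:  E => E + E => id + E => id + id
# rightmost: E => E + E => E + id => id + id
for name, forms in (("lm", left), ("rm", right)):
    print(name, " => ".join(" ".join(f) for f in forms))
```

Both derivations end in the same sentence; only the intermediate left-sentential and right-sentential forms differ.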
