Machine-Independent Assembler Features
This section describes the features an assembler provides that do not depend on the machine architecture.
- There are some common assembler features that are not closely related to machine architecture
- The presence or absence of such features is much more closely related to issues such as programmer convenience and software environment than it is to machine architecture
- 5 features are introduced in our textbook (Chapter 2.3)
- Literals, Symbol definitions, Expressions (Fig. 2.9, 2.10)
- Program blocks (Fig. 2.11~2.14)
- Control sections (Fig. 2.15~2.17)
Fig. 2.9
Fig. 2.10
Literals
- It is often convenient for the programmer to be able to write the value of a constant operand as a part of the instruction that uses it.
- This avoids having to define the constant elsewhere in the program and make up a label for it.
- Such an operand is called a “literal” because the value is stated literally in the instruction.
- A literal is identified with the prefix "=", which is followed by a specification of the literal value, using the same notation as in the BYTE statement.
- It is important to understand the difference between a literal and an immediate operand
- With immediate addressing, the operand value is assembled as part of the machine instruction.
- The immediate value is within the machine instruction itself.
- [e.g.] Line 55 in Fig. 2.10: LDA #3 (object code 010003)
- With a literal, the assembler generates the specified value as a constant at some other memory location.
- The address of this generated constant is used as the target address for the machine instruction.
- The literal value is obtained from data memory.
- [e.g.] Line 45 in Fig. 2.10: ENDFIL LDA =C'EOF' 032010
In Fig. 2.10, you can see that the literal EOF is placed at address 002D.
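The difference between the two object codes above can be checked with a small sketch of SIC/XE format 3 encoding. This is only an illustration, not the textbook's assembler: the helper `format3` is a hypothetical name, and the instruction address 001A for line 45 is an assumption chosen to be consistent with the PC-relative object code 032010 and the literal address 002D.

```python
# Minimal sketch of SIC/XE format 3 encoding (24 bits):
#   opcode(6 bits) n i | x b p e | disp(12 bits)
LDA = 0x00  # opcode of LDA


def format3(opcode, n, i, x, b, p, e, disp):
    """Pack the fields of a format 3 instruction into a 6-digit hex string."""
    first_byte = opcode | (n << 1) | i
    flags = (x << 3) | (b << 2) | (p << 1) | e
    word = (first_byte << 16) | (flags << 12) | (disp & 0xFFF)
    return f"{word:06X}"


# Immediate operand: the value 3 sits in the displacement field itself.
immediate = format3(LDA, n=0, i=1, x=0, b=0, p=0, e=0, disp=3)

# Literal operand: the assembler stores 'EOF' at 002D and encodes a
# PC-relative displacement to that address.
instr_addr = 0x001A      # assumed address of line 45
pc = instr_addr + 3      # PC already points at the next instruction
literal_addr = 0x002D
literal = format3(LDA, n=1, i=1, x=0, b=0, p=1, e=0, disp=literal_addr - pc)

print(immediate)  # 010003
print(literal)    # 032010
```

In the immediate case the operand travels inside the instruction; in the literal case the instruction only carries a displacement, and the value itself is fetched from memory.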
Literal Pool
- The assembler collects all the literal operands used in a program into one or more literal pools
- The default location is at the end of the program
- For readability, the literal pool listing in Fig. 2.10 is shown immediately following the END statement. In this case, the pool consists of the single literal =X'05'.
- In some cases, however, the programmer can place a literal pool at some other location in the object program
- This is done with the assembler directive LTORG (Line 93 in Fig. 2.10)
- When the assembler encounters a LTORG statement, it creates a literal pool that contains all of the literal operands used since the previous LTORG (or the beginning of the program).
- LTORG is used to keep the literal operands close to the instructions that use them
Duplicate Literals
- Most assemblers recognize duplicate literals (= the same literal used in more than one place in the program), and store only one copy of the specified data value
- For example, the literal =X'05' is used on lines 215 and 230. However, only one data area with this value is generated. Both instructions refer to the same address in the literal pool for their operand.
How does the Assembler handle Literal Operands?
- The basic data structure needed: LITTAB (Literal Table)
- For each literal used, the table contains [Literal name, Operand value and length, Address assigned to the operand when it is placed in a literal pool]
- During Pass1: Each literal operand is recognized.
- The assembler searches LITTAB for the specified literal name.
- If found, no action is needed; otherwise, the literal is added to LITTAB (leaving the address unassigned). An unassigned address means the literal has not been placed in memory yet; it will be placed later.
- When an LTORG or END statement is encountered, the assembler assigns addresses to the literals in LITTAB that do not have one yet.
- During Pass2: Each literal operand is translated to its address.
- The operand address for use in generating object code is obtained by searching LITTAB for each literal operand encountered.
- The data values of literals are inserted into the object program.
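The Pass 1 and Pass 2 handling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class `LitTab` and its method names are hypothetical, literals are keyed by their source text, and the end-of-program pool address 1076 is assumed for the example.

```python
class LitTab:
    """Literal table: literal name -> value, length, and address (or None)."""

    def __init__(self):
        self.entries = {}  # dict preserves the order in which literals appear

    def reference(self, name, value):
        """Pass 1: record a literal operand; a duplicate is stored only once."""
        if name not in self.entries:
            self.entries[name] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr):
        """Pass 1, on LTORG or END: assign addresses to literals not yet placed.
        Returns the location counter after the pool."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr

    def address_of(self, name):
        """Pass 2: look up the operand address of a literal."""
        return self.entries[name]["address"]


littab = LitTab()
littab.reference("=C'EOF'", b"EOF")
locctr = littab.place_pool(0x002D)       # LTORG: the pool starts at 002D
littab.reference("=X'05'", b"\x05")
littab.reference("=X'05'", b"\x05")      # duplicate, no new entry
end_locctr = littab.place_pool(0x1076)   # END: assumed pool address

print(hex(littab.address_of("=C'EOF'")))  # 0x2d
print(hex(locctr))                        # 0x30
```

Literals placed by an earlier LTORG keep their addresses; only entries whose address is still unassigned go into the next pool.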
Symbol Definitions
- Most assemblers provide an assembler directive that allows the programmer to define symbols and specify their values.
- EQU is the assembler directive whose main function is the definition of symbols.
- Form: symbol EQU value
- One common use of EQU (for "equate") is to establish symbolic names that can be used in place of numeric values to improve readability.
MAXLEN EQU 4096 : defining the symbol does not store 4096 in memory. The directive only lets the assembler map the symbol to its value in the symbol table.
MAXLEN WORD 4096 : allocates a word in memory that holds the number 4096.
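The contrast between the two statements can be sketched with a toy Pass 1 loop. This is a simplified illustration, not a full assembler: `pass1` is a hypothetical helper that understands only EQU with a decimal operand and WORD.

```python
def pass1(statements, start=0):
    """Tiny Pass 1 sketch: returns (SYMTAB, final LOCCTR)."""
    symtab, locctr = {}, start
    for label, opcode, operand in statements:
        if opcode == "EQU":
            symtab[label] = int(operand)  # value only; LOCCTR is unchanged
        elif opcode == "WORD":
            symtab[label] = locctr        # the label names a memory address
            locctr += 3                   # one 3-byte word is allocated
    return symtab, locctr


equ_symtab, equ_end = pass1([("MAXLEN", "EQU", "4096")])
word_symtab, word_end = pass1([("MAXLEN", "WORD", "4096")])

print(equ_symtab, equ_end)    # {'MAXLEN': 4096} 0
print(word_symtab, word_end)  # {'MAXLEN': 0} 3
```

With EQU the symbol's value is the number itself and no storage is consumed; with WORD the symbol's value is the address where the number is stored.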
Symbol Definitions with EQU
- When the assembler encounters the EQU statement, it enters MAXLEN into SYMTAB (with value 4096).
- During assembly of the LDT instruction, the assembler searches SYMTAB for the symbol MAXLEN, using its value as the operand in the instruction.
- The resulting object code is exactly the same as the original version of the instruction (i.e., the one using the value instead of symbol)
- However, the source statement is easier to understand. It is also much easier to find and change the value of MAXLEN if this becomes necessary
- Another common use of EQU is to define mnemonic names for registers.
- In a machine with many general-purpose registers, unlike SIC, mnemonic names for registers are helpful.
- cf. The standard mnemonics for registers are already defined in SIC
- The programmer can establish and use names that reflect the logical function of the registers in the program
Restriction in Symbol Definitions
- The descriptions on the EQU statement contain a restriction that is common to all symbol-defining assembler directives.
- All symbols used on the right-hand side of the statement (= all terms used to specify the value of the new symbol) must have been defined previously in the program.
In other words, a symbol must be defined before it is referenced.
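The restriction can be sketched as a check made while processing each EQU statement. The helper `define_equ` and the symbol names ALPHA and BETA are illustrative assumptions; the operand is limited to a decimal number or a single symbol.

```python
def define_equ(symtab, label, operand):
    """Enter label into SYMTAB; operand is a decimal number or a defined symbol."""
    if operand.isdigit():
        symtab[label] = int(operand)
    elif operand in symtab:
        symtab[label] = symtab[operand]
    else:
        raise ValueError(f"{operand} is not yet defined; cannot define {label}")


# Allowed: ALPHA is defined before BETA refers to it.
symtab = {"ALPHA": 0x0030}
define_equ(symtab, "BETA", "ALPHA")

# Not allowed: ALPHA would only be defined later in the program.
try:
    define_equ({}, "BETA", "ALPHA")
    forward_ok = True
except ValueError:
    forward_ok = False

print(hex(symtab["BETA"]), forward_ok)  # 0x30 False
```

The check is needed because the value of the new symbol must be known at the moment its defining statement is processed in Pass 1.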
Expression
- Most assemblers allow the use of expressions wherever a single operand (labels, literals, etc.) is permitted. Each such expression must, of course, be evaluated by the assembler to produce a single operand address or value.
- Assemblers generally allow arithmetic expressions formed according to the normal rules using the operators +, -, *, and /
- Individual terms in the expression may be constants, user-defined symbols, or special terms.
- The most common such special term is the current value of the location counter (often designated by *). This term represents the value of the next unassigned memory location.
- [e.g.] Line 106 in Fig. 2.10: BUFEND EQU *
(Note: it may be easiest to think of * as the current PC value minus 3.)
- This statement gives BUFEND a value that is the address of the next byte after the buffer area
- Expressions are classified depending upon the types of values they produce
- Absolute expressions: independent of program location
- Relative expressions: relative to the beginning of program
- A symbol defined by EQU can also be absolute or relative
This value (hexadecimal 1000) does not represent an address, unlike most of the other entries in the Loc column; it is the value associated with the symbol that appears in the source statement (MAXLEN).
- To determine the type of an expression, we must keep track of the types of all symbols defined in the program. For this purpose we need a flag in the symbol table to indicate the type of value (absolute or relative) in addition to the value itself.
- Thus, SYMTAB needs a type field to discern absolute symbols from relative symbols
- Operands of format 4 instructions may have relative values; such relative values must be modified later by the loader for relocation.
- We need to know which is relative
- e.g., +JSUB RDREC -> relative value, so a Modification record is needed
- e.g., +LDT #MAXLEN -> absolute value, so no Modification record is needed
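One way to track expression types is to count the relative terms with their signs. The sketch below is a simplification that handles only + and -, and the helper names are hypothetical. It assumes the usual rule that an expression is absolute when its relative terms cancel in pairs and relative when exactly one positive relative term remains. The addresses for BUFFER and BUFEND are assumed values whose difference is the hexadecimal 1000 mentioned above.

```python
def classify(terms, symtab):
    """terms: list of (sign, symbol name or int). Returns (value, type),
    where type is "A" (absolute) or "R" (relative)."""
    value, relative_count = 0, 0
    for sign, term in terms:
        if isinstance(term, int):
            term_value, term_type = term, "A"   # constants are absolute
        else:
            term_value, term_type = symtab[term]
        value += sign * term_value
        if term_type == "R":
            relative_count += sign
    if relative_count == 0:
        return value, "A"
    if relative_count == 1:
        return value, "R"
    raise ValueError("expression is neither absolute nor relative")


def needs_modification_record(is_format4, operand_type):
    """Only format 4 instructions with a relative operand need relocation."""
    return is_format4 and operand_type == "R"


# SYMTAB with a type flag per symbol (addresses assumed for illustration).
symtab = {"BUFFER": (0x0036, "R"), "BUFEND": (0x1036, "R")}
maxlen = classify([(+1, "BUFEND"), (-1, "BUFFER")], symtab)

print(hex(maxlen[0]), maxlen[1])                     # 0x1000 A
print(needs_modification_record(True, maxlen[1]))    # False
```

BUFEND-BUFFER is absolute because both terms shift by the same amount when the program is relocated, so their difference does not change.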
Program Blocks
- In all of the examples we have seen so far, the program being assembled was treated as a unit
- The source programs logically contained subroutines, data areas, etc
- However, they were handled by the assembler as one entity, resulting in a single block of object code.
- Many assemblers provide features that allow more flexible handling of the source and object programs
- Some features allow the generated machine instructions and data to be loaded into memory in a different order from the corresponding source statements -> Program Blocks: segments of code that are rearranged within a single object program unit
- Other features result in the creation of several independent parts of the object program. These parts maintain their identity and are handled separately by the loader -> Control Sections: segments that are translated into independent object program units
- Figures 2.11 and 2.12 show an example program as it might be written using program blocks.
- The assembler directive USE indicates which portions of the source program belong to the various blocks. If no USE statements are included, the entire program belongs to a single block.
- Three program blocks are used in Figures 2.11 and 2.12.
- Executable instructions: (unnamed) -> block field 0
- Data areas that are a few words or less in length (named CDATA) -> block field 1
- Data areas consisting of larger blocks of memory (named CBLKS) -> block field 2
- Each program block has its own relative address space
The default block, CDATA, and CBLKS each have a separate address space. For example, the first statement of the default block is at relative address 0, and the first statement of the CDATA block is also at relative address 0.
- Pass 1
- Maintain a separate LOCCTR for each program block
- Each label is assigned an address relative to the start of the block that contains it
- SYMTAB stores the block number for each symbol
- Store starting address of each block in block table
- At the end of Pass 1, the assembler constructs a table that contains the starting addresses and lengths for all blocks
- Pass 2
- For translation, the assembler calculates the address for each symbol relative to the start of the object program
- This is done by adding the address of the symbol, relative to the start of its block, to the assigned block starting address (e.g., the address of INPUT = 6 (the address relative to the start of the block) + 66 (the starting address of CDATA) = 6C, all in hexadecimal)
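The block table built at the end of Pass 1 and the address calculation in Pass 2 can be sketched as below. The function names are hypothetical, and the block lengths (66, 0B, and 1000 hex) are inferred from the address ranges that the three blocks occupy in the example program.

```python
def build_block_table(block_lengths):
    """block_lengths: dict of block name -> length, in order of first USE.
    Returns block name -> (block number, starting address, length)."""
    table, start = {}, 0
    for number, (name, length) in enumerate(block_lengths.items()):
        table[name] = (number, start, length)
        start += length  # the next block begins where this one ends
    return table


def absolute_address(symbol, symtab, block_table):
    """Pass 2: block starting address + address relative to the block."""
    block, offset = symtab[symbol]
    return block_table[block][1] + offset


block_table = build_block_table({"(default)": 0x66, "CDATA": 0x0B, "CBLKS": 0x1000})
symtab = {"INPUT": ("CDATA", 0x06)}  # SYMTAB stores block and relative address
input_addr = absolute_address("INPUT", symtab, block_table)

print(hex(block_table["CDATA"][1]))  # 0x66
print(hex(input_addr))               # 0x6c
```

Because starting addresses are only known once every block's length is known, the table cannot be completed until Pass 1 has finished.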
Fig. 2.12
Fig. 2.13 (object file)
Note: reserved words are not written into the object file.
- The first two Text records: generated from the source lines 5~70
- When the USE on line 92 is recognized, the assembler writes out the current Text record (even though there is still room left in it)
- The assembler then prepares to begin a new Text record for the new program block. The next two Text records come from lines 125~180
- The 5th Text record contains the single byte of data from line 185
- The 6th Text record resumes the default program block and the rest of the object program continues in similar fashion
Fig. 2.14
- It doesn't matter that the Text records of the object program are not in sequence by address; the loader will simply load the object code from each record at the indicated address.
- When this loading is completed, the generated code from the default block will occupy relative locations 0000 through 0065; the generated code and reserved storage for CDATA will occupy locations 0066 through 0070; the storage reserved for CBLKS will occupy locations 0071 through 1070.
Storage reserved by RESB and RESW is not included in the object file. It is nevertheless set aside when the loader places the program in memory.
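The loader behaviour described above can be sketched in a few lines. This is an illustration only: `load` is a hypothetical helper, the two Text records are made-up sample data rather than the records of Fig. 2.13, and zero-filling the untouched bytes is a choice of this sketch.

```python
def load(text_records, program_length):
    """Place each Text record at its indicated address, in any order."""
    memory = bytearray(program_length)  # bytes with no Text record stay zero here
    for start, code in text_records:
        memory[start:start + len(code)] = code
    return memory


# Hypothetical records, deliberately not in address order.
records = [
    (0x0006, bytes.fromhex("F1")),      # a record from a later block
    (0x0000, bytes.fromhex("010003")),  # a record from the default block
]
memory = load(records, 0x10)

print(memory[:7].hex().upper())  # 010003000000F1
```

Since every record carries its own load address, the order of the records in the object file has no effect on the final memory image, and reserved areas simply correspond to addresses that no record covers.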
Advantages of Using Program blocks
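Difference between a literal and an immediate operand
The examples below..
The default literal pool location is the end of the program
Literal operands
The two-pass algorithm uses the literal table (LITTAB).
Symbol definitions use EQU.
Difference between using it and not using it
Difference between EQU and WORD
WORD actually creates the value 4096 in memory as one word.
EQU does not allocate a constant in memory; it literally just defines a symbol.
The symbol goes into the symbol table only. No memory is allocated.
The symbol is then used with immediate addressing.
With WORD, the value has to be fetched by actually accessing memory.
Restriction on symbol definitions (the text in red)
Expressions
Whether the value of an expression depends on the program's location, and why
Basically, whether a format 4 instruction has to be modified or not
Modification records are needed only for format 4 instructions that use a relative symbol.
Program blocks
Code is not loaded into memory in source-code order;
instead, the order is rearranged when it is placed in memory.
This can be handled with Pass 1 and Pass 2.
The blocks are rearranged in the order in which the USE directive introduces them.
Advantages of program blocks: there are two.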