[CTAP] 7# Portability Pitfalls

문연수 · August 20, 2022



1. Words

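panacea: something that will solve all problems; a cure-all
ancestry: your ancestors who lived a long time ago, or the origin of your family; lineage
diverge: to follow a different direction, or to be or become different
1. to branch off or split (in a different direction)
2. (formal) to differ, to be divided
3. to deviate (from an expectation, plan, etc.) (<-> converge)
linguistic: connected with language or the study of language
yet:
1. (in negatives and questions) not so far; used for something that has not been done or could not be done
2. not now; used to tell someone not to do something for the time being
3. (conjunction) but, even so (= nevertheless)
havoc: great destruction, widespread confusion or damage
robustness: sturdiness, strength
enterprising: showing initiative and resourcefulness
idiosyncrasy: a peculiar trait or way of doing things; a quirk (= eccentricity)
intangibility:
1. the quality of being impossible to touch or to perceive by touch
2. the quality of being vague and hard to grasp; incomprehensibility
outlast: to last or continue longer than ...
cater: to provide food (for an event, as a business)
cater to somebody/something: to satisfy the tastes or demands of ...
entail: to involve as a necessary consequence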

2. Summary

- 1. Coping with change

Programs tend to last longer than their authors ever dreamed, even when written only for the author's own use. Thus it is not enough to do what works now and ignore the future. Yet we have just seen that trying to be as portable as possible can be expensive by denying us today's benefits in order to live with yesterday's tools.

The best we can do about decisions like these is to admit that they are decisions and not let them be made by accident.

Quoted here because it is such a memorable line.

- 2. What's in a name?

In fact, all ANSI C guarantees is that the implementation will distinguish external names that differ in the first six characters. For the purpose of this definition, upper-case letters do not differ from the corresponding lower-case letters.

char *Malloc(unsigned n)
{
    char *p, *malloc(unsigned);

    p = malloc(n);
    if (p == NULL)
        panic("out of memory");

    return p;
}

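On an implementation that ignores case in external names, Malloc and malloc are the same name, so this function calls itself recursively and never returns.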
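
One way to sidestep the collision is to give the wrapper a name that differs from malloc within its first six characters even when case is ignored. The sketch below is only an illustration: the name checked_alloc is hypothetical, and the error handling (print and exit) stands in for the book's panic().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper name: "checke" vs "malloc" differ in the first
   six characters, so even a case-insensitive linker tells them apart. */
void *checked_alloc(size_t n)
{
    /* malloc(0) may legitimately return NULL, so ask for at least 1 byte. */
    void *p = malloc(n ? n : 1);

    if (p == NULL) {
        fputs("out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

A caller would use it like malloc and release the result with free().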

- 3. How big is an integer?

The three sizes of integers are nondecreasing. That is, a short integer can contain only values that will also fit in a plain integer and a plain integer can contain only values that will also fit in a long integer.

An ordinary integer is large enough to contain any array subscript.

The size of a character is natural for the particular hardware.

The most important thing is that one cannot count on having any particular precision available.

Nowadays (since C99) there are specified-width integer types, so using them makes it possible to write portable code.

- 4. Are characters signed or unsigned?

Most modern computers support 8-bit characters, so most modern C compilers implement characters as 8-bit integers. However, not all compilers interpret those 8-bit quantities the same way.

If you care whether a character value with the high-order bit on is treated as a negative number, you should probably declare it as unsigned char.
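
A small sketch of that advice: converting through unsigned char yields the same non-negative value whether plain char is signed or unsigned. The helper names byte_value and high_bit_set are hypothetical.

```c
#include <limits.h>

/* Returns the character as a value in 0..UCHAR_MAX, regardless of
   whether plain char is signed on this implementation. */
int byte_value(char c)
{
    return (unsigned char)c;
}

/* Nonzero when bit 7 of the character is set. */
int high_bit_set(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}
```

Without the cast, an expression such as `c >= 0x80` is never true on an implementation where char is a signed 8-bit type.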

- 5. Shift operators

In a right shift, are vacated bits filled with zeroes or copies of the sign bit?
-> The answer is implementation-defined.

What values are permitted for the shift count?
-> If the item being shifted is n bits long, then the shift count must be greater than or equal to zero and strictly less than n.

Violating the second rule is not a syntax error; it is treated as undefined behavior:

The result value is also undefined if the value of the right operand is greater than or equal to the width (in bits) of the value of the converted left operand.

C: A Reference Manual 5/e, 232pg.
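
Both questions can be worked around in code. The following is a sketch under stated assumptions, not a definitive implementation: UINT_BITS assumes unsigned has no padding bits, sar_portable assumes two's complement representation, and all three names are invented for this example.

```c
#include <limits.h>

/* Width of unsigned in bits; assumes there are no padding bits. */
#define UINT_BITS ((unsigned)(sizeof(unsigned) * CHAR_BIT))

/* Left shift that never passes an out-of-range count to <<. */
unsigned shl_checked(unsigned x, unsigned count)
{
    return count < UINT_BITS ? x << count : 0u;
}

/* Sign-propagating right shift that does not rely on what >> does
   to a negative value. Assumes two's complement. */
int sar_portable(int x, unsigned count)
{
    if (count >= UINT_BITS)
        return x < 0 ? -1 : 0;
    if (x >= 0)
        return x >> count;
    /* ~x is non-negative here, so shifting it is well defined. */
    return ~(~x >> count);
}
```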

- 6. Memory location zero

Strictly speaking, this is not a portability problem: the effect of misusing a null pointer is undefined in all C programs.

In short: just don't access it.
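
A minimal sketch of the defensive habit this implies: test a pointer before dereferencing it. The function name length_or_zero is hypothetical, and treating a null pointer as an empty string is a design choice made only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of s, or 0 when s is a null pointer,
   instead of handing a null pointer to strlen. */
size_t length_or_zero(const char *s)
{
    if (s == NULL)
        return 0;
    return strlen(s);
}
```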

- 7. How does division truncate?

The rules changed somewhat with C99, but to briefly summarize what the book says...

q = a / b;
r = a % b;

1. Most important, we want q*b + r == a, because this is the relation that defines the remainder.
2. If we change the sign of a, we want that to change the sign of q, but not the magnitude.
3. When b>0, we want to ensure that r>=0 and r<b. For instance, if the remainder is being used as an index to a hash table, it is important to be able to know that it will always be a valid index.

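In C89 (and earlier), only the first property (q*b + r == a) is guaranteed, together with $|r|<|b|$ and, when $a \ge 0, b > 0$, $r \ge 0$. The awkward part is that when a is positive and b is negative, so that q is negative, the quotient may come out one smaller than expected. In other words,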

q = 5 / -3;
r = 5 % -3;

may evaluate to q = -2 and r = -1. That is allowed, because all that is required is that -2 * (-3) + (-1) == 5 holds.

C99 replaced this awkward rule with truncation toward zero: the result of a division is always truncated toward zero, so 5 / -3 is now always -1. (Among negative numbers, values closer to 0 are larger, so truncating toward zero means choosing the larger of the two candidate quotients.)
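
For the hash-table case mentioned above, a small helper can guarantee a non-negative remainder whichever way the implementation rounds. This is a sketch; the names mod_nonneg and division_identity_holds are made up, and mod_nonneg assumes b > 0.

```c
/* Remainder of a / b adjusted into the range [0, b). Assumes b > 0.
   Works under C89's implementation-defined rounding and under C99's
   truncation toward zero alike. */
int mod_nonneg(int a, int b)
{
    int r = a % b;

    return r < 0 ? r + b : r;
}

/* The one relation every conforming implementation keeps. */
int division_identity_holds(int a, int b)
{
    return (a / b) * b + a % b == a;
}
```

For example, mod_nonneg(-7, 3) gives 2, which is a valid index into a three-element table, whereas -7 % 3 is -1 under C99.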

- 8. How big is a random number?

Answer: at most RAND_MAX. ANSI C settles the question by defining that constant.
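
Because the range of rand() is known only through RAND_MAX, code that needs a number in a smaller range should scale by that constant instead of assuming a size. The following is a sketch: rand_below is a hypothetical name, it assumes 0 < n <= RAND_MAX, and the result is only approximately uniform.

```c
#include <stdlib.h>

/* Returns a pseudo-random integer in [0, n). Assumes 0 < n <= RAND_MAX.
   The arithmetic is done in unsigned so that RAND_MAX + 1 cannot
   overflow when RAND_MAX equals INT_MAX. */
int rand_below(int n)
{
    unsigned bucket = (unsigned)RAND_MAX / (unsigned)n + 1u;

    return (int)((unsigned)rand() / bucket);
}
```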

- 9. Case conversion

* In the beginning there were toupper and tolower...

They were originally written as macros:

#define toupper(c) ((c) + 'A' - 'a')
#define tolower(c) ((c) + 'a' - 'A')

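These macros assume that the difference between an upper-case letter and the corresponding lower-case letter is the same constant for every letter. This assumption is valid for both the ASCII and EBCDIC character sets, and probably isn't too dangerous, because the nonportability of these macro definitions can be encapsulated in the single file that contains them.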

These macros do have one disadvantage, though: when given something that is not a letter of the appropriate case, they return garbage.

* Let's add a check for invalid input

He considered rewriting the macros this way:

#define toupper(c) ((c) >= 'a' && (c) <= 'z' ? (c) + 'A' - 'a' : (c))
#define tolower(c) ((c) >= 'A' && (c) <= 'Z' ? (c) + 'a' - 'A' : (c))

but realized that this would cause c to be evaluated anywhere between one and three times for each call, which would play havoc with expressions like toupper(*p++).
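
The effect is easy to observe. In this sketch the macro is renamed TOUPPER_UNSAFE (a hypothetical name, chosen to avoid clashing with <ctype.h>), and the two helper functions count how far the pointer moved after a single conversion.

```c
#include <stddef.h>

/* Same shape as the range-checked macro above, under another name. */
#define TOUPPER_UNSAFE(c) ((c) >= 'a' && (c) <= 'z' ? (c) + 'A' - 'a' : (c))

/* Function version: its argument is evaluated exactly once. */
int toupper_fn(int c)
{
    return (c >= 'a' && c <= 'z') ? c + 'A' - 'a' : c;
}

/* Stores one converted character in *out and returns how many
   characters of s were consumed in the process. */
ptrdiff_t consumed_by_macro(const char *s, int *out)
{
    const char *p = s;

    *out = TOUPPER_UNSAFE(*p++);   /* *p++ appears three times after expansion */
    return p - s;
}

ptrdiff_t consumed_by_function(const char *s, int *out)
{
    const char *p = s;

    *out = toupper_fn(*p++);
    return p - s;
}
```

With the input "abc", the macro version reads 'a' and 'b' for the two comparisons and converts 'c', so it consumes three characters and yields 'C'; the function version consumes one character and yields 'A'.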

* Back to functions after all

Instead, he decided to rewrite toupper and tolower as functions. The toupper() function now looked something like this:

int toupper(int c)
{
    if (c >= 'a' && c <= 'z')
        return c + 'A' - 'a';

    return c;
}

* A newly devised alternative

#define _toupper(c) ((c) + 'A' - 'a')
#define _tolower(c) ((c) + 'a' - 'A')

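Choose what you'd prefer: the functions for safety, or the underscore macros for speed when the argument is known to be a letter of the right case.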

There was just one problem in all this: the people at Berkeley never followed suit, nor did some other C implementers.

Berkeley, not doing as it is told, as usual. Maybe this is why there are so many standards...

- 10. Free first, then reallocate?

The seventh edition of the reference manual for the UNIX system described slightly different behavior:

Realloc changes the size of the block pointed to by ptr to size bytes and returns a pointer to the (possibly moved) block. The contents will be unchanged up to the lesser of the new and old size.

Realloc also works if ptr points to a block freed since the last call of malloc, realloc, or calloc; thus sequences of free, malloc, and realloc can exploit the search strategy of malloc to do storage compaction.

In other words, this implementation allowed a memory area to be reallocated after it had been freed, as long as that reallocation was done quickly enough.

Needless to say, this technique is not recommended, if only because not all C implementations preserve memory long enough after it has been freed. However, the Seventh Edition manual leaves one thing unstated: an earlier implementation of realloc actually required that the area given to it for reallocation be freed first.

Summary: don't touch anything you have already passed to free(). When you are told not to, just don't.
It should go without saying, but doing so is undefined behavior.
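
For contrast, here is a sketch of how realloc is usually called today: on a live block, never a freed one, and through a temporary pointer so the old block is not lost if the call fails. The function name resize_ints and its return convention (0 on success, -1 on failure) are assumptions made for this example.

```c
#include <stdlib.h>

/* Resizes *buf to hold new_count ints. On failure *buf is left
   untouched and still valid, so the caller can keep using or free it. */
int resize_ints(int **buf, size_t new_count)
{
    int *tmp;

    /* Reject zero sizes and sizes whose byte count would overflow. */
    if (new_count == 0 || new_count > (size_t)-1 / sizeof **buf)
        return -1;

    tmp = realloc(*buf, new_count * sizeof **buf);
    if (tmp == NULL)
        return -1;

    *buf = tmp;
    return 0;
}
```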

- 11. An example of portability problems

void printnum(long n, void (*p)())
{
    if (n < 0) {
        (*p)('-');
        n = -n;
    }

    if (n >= 10)
        printnum(n / 10, p);

    (*p)((int) (n % 10) + '0');
}

This program, for all its simplicity, has several portability problems.

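1. The expression that converts the low-order decimal digit of n to character form, (n % 10) + '0', assumes that the digit characters are contiguous. (Since ANSI C, the values of the decimal digit characters are guaranteed to increase by one from '0' to '9'.)
2. n = -n might overflow, because 2's complement machines generally allow more negative values than positive values to be represented.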

void printneg(long n, void (*p)())
{
    if (n <= -10)
        printneg(n / 10, p);

    (*p)("0123456789"[-(n % 10)]);
}

void printnum(long n, void (*p)())
{
    if (n < 0) {
        (*p)('-');
        printneg(n, p);
    } else
        printneg(-n, p);
}

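This still doesn't quite work. We have used n/10 and n%10 to represent the leading digits and the trailing digit of n (with suitable sign changes). But integer division is implementation-dependent when an operand is negative (before C99), so n%10 might be positive even when n is negative.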

void printneg(long n, void (*p)())
{
    long q;
    int r;

    q = n / 10;
    r = n % 10;
    if (r > 0) {
        r -= 10;
        q++;
    }

    if (n <= -10)
        printneg(q, p);

    (*p)("0123456789"[-r]);
}

This looks like a lot of work to cater to portability.

I had never once thought about two's complement in this way... I learned something new.

3. Exercise

https://www.mythos-git.com/Cruzer-S/CTAP/-/tree/main/Chapter07

4. References

[Book] C Traps and Pitfalls (Andrew Koenig)
[Book] C: A Reference Manual 5/e (Samuel P. Harbison III, Guy L. Steele Jr.)
[Book] C Programming: A Modern Approach 2/e (K.N.King)
[Site] ISO/IEC 9899:1999, 5.2.1/3 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf)
[Site] Cambridge Dictionary, https://dictionary.cambridge.org/
[Site] Naver English-Korean Dictionary (네이버 영한사전), https://dict.naver.com/
