[TIL] HTTP : The Definitive Guide "p343 ~ p345"

시윤 · May 3, 2025

[TIL] Two Pages Per Day

Chapter 15. Entities and Encodings

(If any part of my translation or understanding is wrong, feel free to let me know in the comments.)

✏️ Translation of the Original Text

Entity Bodies

The entity body just contains the raw cargo. Any other descriptive information is contained in the headers. Because the entity body cargo is just raw data, the entity headers are needed to describe the meaning of that data. For example, the Content-Type entity header tells us how to interpret the data (image, text, etc.), and the Content-Encoding entity header tells us if the data was compressed or otherwise recoded. We talk about all of this and more in upcoming sections.

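The entity body contains nothing but the raw cargo itself.
All other descriptive information lives in the headers.
Because the cargo in the entity body is just raw data, entity headers are needed to describe what that data means.
For example, the Content-Type entity header says how to interpret the data (image, text, and so on), and the Content-Encoding entity header says whether the data was compressed or otherwise re-encoded (not necessarily encrypted).
The upcoming sections cover these entity headers in more detail.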

The raw content begins immediately after the blank CRLF line that marks the end of the header fields. Whatever the content is—text or binary, document or image, compressed or uncompressed, English or French or Japanese—it is placed right after the CRLF.

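The raw content starts immediately after the blank CRLF line that marks the end of the header fields.
Text or binary, document or image, compressed or not, English or French or Japanese: whatever it is, the content sits right after that CRLF.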
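To make the "body starts right after the blank line" rule concrete, here is a minimal sketch that splits a raw message at the first empty CRLF line. The function name `split_message` and the sample response are my own assumptions for illustration, not something taken from the book.

```python
def split_message(raw: bytes):
    """Split a raw HTTP message into (header block, entity body).

    Hypothetical helper: the body is whatever follows the first
    blank line, i.e. the first CRLF CRLF sequence.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end-of-headers blank line found")
    return head, body


# Made-up sample response carrying the text from Figure 15-2a.
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 18\r\n"
    b"\r\n"
    b"Hi! I'm a message!"
)

head, body = split_message(raw)
body_offset = len(head) + 4  # header block plus the 4-byte CRLF CRLF
print(body_offset, body)
```

The byte offset printed here depends on the headers in this made-up sample, so it will not match the offsets quoted for the book's figure.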

Figure 15-2 shows two examples of real HTTP messages, one carrying a text entity, the other carrying an image entity. The hexadecimal values show the exact contents of the message:

Figure 15-2 shows two examples of real HTTP messages.
One carries a text entity and the other carries an image entity.
The hexadecimal values show the exact contents of each message.

• In Figure 15-2a, the entity body begins at byte number 65, right after the end-of-headers CRLF. The entity body contains the ASCII characters for “Hi! I’m a message!”

In Figure 15-2a, the entity body starts at byte number 65, right after the CRLF that ends the headers.
The entity body contains the ASCII characters for "Hi! I'm a message!"

• In Figure 15-2b, the entity body begins at byte number 67. The entity body contains the binary contents of the GIF image. GIF files begin with 6-byte version signature, a 16-bit width, and a 16-bit height. You can see all three of these directly in the entity body.

In Figure 15-2b, the entity body starts at byte number 67.
The entity body contains the binary contents of a GIF image.
A GIF file begins with a 6-byte version signature, a 16-bit width, and a 16-bit height.
All three can be seen directly in the entity body.
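The three GIF fields mentioned above can be read straight out of the first 10 bytes of an entity body. The sketch below assumes the standard GIF layout, where width and height are little-endian unsigned 16-bit integers. `parse_gif_header` and the sample bytes are hypothetical, made up for this note.

```python
import struct


def parse_gif_header(body: bytes):
    """Return (signature, width, height) from the start of a GIF body."""
    if len(body) < 10:
        raise ValueError("need at least 10 bytes for the GIF header fields")
    # "<" = little-endian, no padding; 6s = 6-byte signature; H = uint16.
    signature, width, height = struct.unpack("<6sHH", body[:10])
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF signature")
    return signature.decode("ascii"), width, height


# Fabricated 32x16 header followed by a few filler bytes (not a full image).
sample = b"GIF89a" + struct.pack("<HH", 32, 16) + b"\x00\x00\x00"
print(parse_gif_header(sample))
```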

Content-Length: The Entity's Size

The Content-Length header indicates the size of the entity body in the message, in bytes. The size includes any content encodings (the Content-Length of a gzip-compressed text file will be the compressed size, not the original size).

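The Content-Length header gives the size of the entity body in the message, in bytes.
That size is measured after any content encoding has been applied.
In other words, the Content-Length of a gzip-compressed text file is the compressed size, not the original size.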
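A small sketch of what "the size includes content encodings" means in practice: the length is computed from the bytes that actually go on the wire. The payload and the `headers` dict are illustrative assumptions, not a real server's output.

```python
import gzip

# Made-up, highly repetitive payload so that compression clearly shrinks it.
original = ("Hi! I'm a message!\n" * 50).encode("ascii")
encoded = gzip.compress(original)

headers = {
    "Content-Type": "text/plain",
    "Content-Encoding": "gzip",
    # Length of the encoded (compressed) body, not of `original`.
    "Content-Length": str(len(encoded)),
}

print(len(original), headers["Content-Length"])
```

Using `len(original)` here instead would be exactly the mistake described later in the Content Encoding section.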

The Content-Length header is mandatory for messages with entity bodies, unless the message is transported using chunked encoding. Content-Length is needed to detect premature message truncation when servers crash and to properly segment messages that share a persistent connection.

Unless the message is sent using chunked encoding, the Content-Length header is mandatory for messages that carry an entity body.
Content-Length is needed to detect message truncation when a server crashes, and to correctly segment messages that share a persistent connection.
(* Message truncation: only part of a message is delivered and the rest is cut off.)

Detecting Truncation

Older versions of HTTP used connection close to delimit the end of a message. But, without Content-Length, clients cannot distinguish between successful connection close at the end of a message and connection close due to a server crash in the middle of a message. Clients need Content-Length to detect message truncation.

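Older versions of HTTP used the closing of the connection to mark the end of a message.
Without Content-Length, however, a client cannot tell whether the connection closed normally at the end of the message or closed because the server crashed mid-transfer.
Clients need Content-Length to detect message truncation.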
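The check a client can make once the connection has closed is simple: compare the number of body bytes received with the declared length. This is a minimal sketch under that assumption. `classify_body` and its return labels are names I made up.

```python
def classify_body(headers: dict, received: bytes) -> str:
    """Classify a body that was read until the connection closed."""
    lowered = {name.lower(): value for name, value in headers.items()}
    declared = lowered.get("content-length")
    if declared is None:
        # No declared length: a clean close and a crash look identical.
        return "unknown"
    if len(received) < int(declared):
        return "truncated"
    # Enough bytes arrived; any extra bytes would belong to a later message.
    return "complete"


print(classify_body({"Content-Length": "18"}, b"Hi! I'm a message!"))
print(classify_body({"Content-Length": "18"}, b"Hi! I'm a mes"))
print(classify_body({}, b"Hi! I'm a mes"))
```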

Message truncation is especially severe for caching proxy servers. If a cache receives a truncated message and doesn’t recognize the truncation, it may store the defective content and serve it many times. Caching proxy servers generally do not cache HTTP bodies that don’t have an explicit Content-Length header, to reduce the risk of caching truncated messages.

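Message truncation is an especially serious problem for caching proxy servers.
If a cache receives a truncated message and does not recognize the truncation, it may store the defective content and serve it many times.
Caching proxy servers generally do not cache HTTP bodies that lack an explicit Content-Length header.
This reduces the risk of caching truncated messages.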

Incorrect Content-Length

An incorrect Content-Length can cause even more damage than a missing Content-Length. Because some early clients and servers had well-known bugs with respect to Content-Length calculations, some clients, servers, and proxies contain algorithms to try to detect and correct interactions with broken servers. HTTP/1.1 user agents officially are supposed to notify the user when an invalid length is received and detected.

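An incorrect Content-Length can cause even more damage than a missing one.
Some early clients and servers had well-known bugs in how they calculated Content-Length.
As a result, some clients, servers, and proxies contain algorithms that try to detect and correct interactions with broken servers.
Officially, HTTP/1.1 user agents are supposed to notify the user when an invalid length is received and detected.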

Content-Length and Persistent Connections

Content-Length is essential for persistent connections. If the response comes across a persistent connection, another HTTP response can immediately follow the current response. The Content-Length header lets the client know where one message ends and the next begins. Because the connection is persistent, the client cannot use connection close to identify the message’s end. Without a Content-Length header, HTTP applications won’t know where one entity body ends and the next message begins.

Content-Length is essential for persistent connections.
When a response arrives over a persistent connection, another HTTP response can follow it immediately.
The Content-Length header tells the client where one message ends and the next begins.
Because the connection stays open, the client cannot rely on the connection closing to identify the end of the message.
Without a Content-Length header, HTTP applications cannot tell where one entity body ends and the next message begins.
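As a rough illustration of segmenting a persistent connection, the sketch below walks a byte stream that contains two back-to-back responses and uses each Content-Length to find the boundary. `split_responses` and the sample stream are hypothetical, and the parser ignores chunked encoding and other real-world details.

```python
def split_responses(stream: bytes):
    """Return the entity bodies of back-to-back responses in `stream`."""
    bodies = []
    pos = 0
    while pos < len(stream):
        header_end = stream.index(b"\r\n\r\n", pos)
        # Skip the status line, then look for Content-Length.
        length = None
        for line in stream[pos:header_end].split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("no Content-Length: message boundary unknown")
        body_start = header_end + 4
        bodies.append(stream[body_start:body_start + length])
        pos = body_start + length  # the next message starts right here
    return bodies


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nfirst"
    b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nsecond"
)
print(split_responses(stream))
```

If the first Content-Length were wrong by even one byte, every later message on the connection would be parsed from the wrong offset.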

As we will see in “Transfer Encoding and Chunked Encoding,” there is one situation where you can use persistent connections without having a Content-Length header: when you use chunked encoding. Chunked encoding sends the data in a series of chunks, each with a specified size. Even if the server does not know the size of the entire entity at the time the headers are generated (often because the entity is being generated dynamically), the server can use chunked encoding to transmit pieces of well-defined size.

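As the later section "Transfer Encoding and Chunked Encoding" shows, there is one exception where a persistent connection can be used without a Content-Length header: chunked encoding.
Chunked encoding sends the data as a series of chunks, each with its own specified size.
Even if the server does not know the size of the whole entity when the headers are generated (often because the entity is being generated dynamically), it can use chunked encoding to send pieces of well-defined size.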
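A minimal sketch of the chunked framing described above: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk followed by a blank line ends the body. The helper names are mine, and trailers are not handled.

```python
def encode_chunked(pieces):
    """Frame an iterable of byte strings as a chunked body."""
    out = b""
    for piece in pieces:
        if piece:  # a zero-length chunk would end the body early
            size_line = f"{len(piece):x}".encode("ascii")
            out += size_line + b"\r\n" + piece + b"\r\n"
    return out + b"0\r\n\r\n"  # last chunk, no trailers


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from chunked framing (no trailers assumed)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Ignore any chunk extension after ';' and parse the hex size.
        size = int(data[pos:line_end].split(b";")[0].decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            return body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


wire = encode_chunked([b"Hello, ", b"world!"])
print(wire)
print(decode_chunked(wire))
```

The sender never needs the total length up front. It only needs the length of the piece it is about to send.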

Content Encoding

HTTP lets you encode the contents of an entity body, perhaps to make it more secure or to compress it to take up less space (we explain compression in detail later in this chapter). If the body has been content-encoded, the Content-Length header specifies the length, in bytes, of the encoded body, not the length of the original, unencoded body.

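HTTP lets you encode the contents of an entity body, for example to make it more secure or to compress it so that it takes up less space. (Compression comes up again later in the chapter.)
If the body has been content-encoded, the Content-Length header gives the size in bytes of the encoded body, not the size of the original, unencoded body.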

Some HTTP applications have been known to get this wrong and to send the size of the data before the encoding, which causes serious errors, especially with persistent connections. Unfortunately, none of the headers described in the HTTP/1.1 specification can be used to send the length of the original, unencoded body, which makes it difficult for clients to verify the integrity of their unencoding processes.

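Some HTTP applications are known to get this wrong and send the size of the data before encoding.
This causes serious errors, especially with persistent connections.
Unfortunately, none of the headers described in the HTTP/1.1 specification can be used to send the length of the original, unencoded body.
This makes it difficult for clients to verify the integrity of their decoding process.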

✏️ Summary

Entity Bodies

Raw data: text/plain, image/gif, etc.
Starts immediately after the blank CRLF line that marks the end of the headers

Content-Length

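: A header that gives the size of the entity body in bytes (the size after compression or other content encoding)

Used to detect message truncation
- Distinguishes a normal connection close at the end of a message from a premature close caused by a server crash
- If a caching proxy fails to detect truncation, it may store the defective data and serve it many times
Essential for persistent connections
- Used to find where one message ends and the next begins
- With chunked encoding, however, Content-Length may be omitted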

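✏️ Thoughts

Let's abuse how Content-Length works 😈

After reading today's pages, it seemed obvious that attackers could deliberately send a wrong Content-Length to attack a server. You could size it up on a hunch alone, haha.

Sure enough, when I looked it up there is an attack technique called HTTP Request Smuggling. Smuggling reportedly exploits differences in how each device handles a request in an environment where many network hops are chained together.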

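The attack is usually carried out by manipulating two headers, Content-Length and Transfer-Encoding. I haven't covered Transfer-Encoding in detail yet, but roughly speaking, Transfer-Encoding: chunked lets the body of a single message be sent as a series of chunks. The receiver treats everything up to the zero-size chunk that marks the end of the data (0\r\n followed by an empty line, that is, 0\r\n\r\n) as one body, and keeps waiting for more data until that terminator arrives.

Now suppose Hop1 gives priority to Content-Length while Hop2 gives priority to Transfer-Encoding. The following can then happen.

Suppose an attacker sends Hop1 a request carrying both Content-Length: 13 and Transfer-Encoding: chunked, with "0\r\n\r\nSmuggled" as the entity body. Because Hop1 gives priority to Content-Length: 13, it treats all 13 bytes as the body and forwards the whole message to Hop2.

Hop2, on the other hand, gives priority to Transfer-Encoding, so it sees "0\r\n\r\n" at the start of the body and considers the request finished. The remaining "Smuggled" text is left unprocessed on the connection. When the next request reaches Hop2, the request message looks like SmuggledPOST /page HTTP/1.1. In other words, this can be abused to tamper with request messages.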
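The disagreement between the two hops can be reproduced with two toy parsers that read the same bytes differently. This is only a sketch: the host `example.com`, the function names, and the body `0\r\n\r\nSmuggled` (a terminating chunk followed by 8 extra bytes, 13 bytes in total) are my own assumptions, and real proxies are far more involved. For reference, the HTTP/1.1 specification says Transfer-Encoding overrides Content-Length when both are present and that such a message may indicate a smuggling attempt.

```python
def headers_and_rest(raw: bytes):
    """Parse header fields; return (headers, bytes after the blank line)."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers, rest


def leftover_content_length_first(raw: bytes) -> bytes:
    """Hop1-style reading: the body is exactly Content-Length bytes."""
    headers, rest = headers_and_rest(raw)
    return rest[int(headers[b"content-length"]):]


def leftover_chunked_first(raw: bytes) -> bytes:
    """Hop2-style reading: the body ends at the zero-size chunk."""
    _, rest = headers_and_rest(raw)
    pos = 0
    while True:
        line_end = rest.index(b"\r\n", pos)
        size = int(rest[pos:line_end].decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            return rest[pos + 2:]  # skip the final CRLF (no trailers)
        pos += size + 2


request = (
    b"POST / HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: 13\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"0\r\n\r\nSmuggled"
)

print(leftover_content_length_first(request))  # nothing left over
leftover = leftover_chunked_first(request)
print(leftover)  # bytes still sitting on the connection
print(leftover + b"POST /page HTTP/1.1\r\n")  # what Hop2 reads next
```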

Clever people...