[TIL] HTTP : The Definitive Guide "p257 ~ p259"

시윤·2일 전

TIL http

[TIL] Two Pages Per Day

목록 보기

106/107

Chapter 11. Client Identification and Cookies

(해석 또는 이해가 잘못된 부분이 있다면 댓글로 편하게 알려주세요.)

✏️ 원문 번역

Preface

Web servers may talk to thousands of different clients simultaneously. These servers often need to keep track of who they are talking to, rather than treating all requests as coming from anonymous clients. This chapter discusses some of the technologies that servers can use to identify who they are talking to.

웹 서버는 수천 개에 달하는 클라이언트와 동시에 통신할 수 있습니다.
익명의 클라이언트로부터 도착하는 모든 요청을 처리하는 대신 서버는 종종 통신 대상이 누구인지 추적해야 할 때가 있습니다.
이번 챕터에서는 서버와의 통신 대상을 식별할 수 있는 몇 가지 기술을 논의하고자 합니다.

The Personal Touch

HTTP began its life as an anonymous, stateless, request/response protocol. A request came from a client, was processed by the server, and a response was sent back to the client. Little information was available to the web server to determine what user sent the request or to keep track of a sequence of requests from the visiting user.

HTTP는 익명의 무상태 요청/응답 프로토콜로 시작되었습니다.
클라이언트로부터 도착한 요청은 서버에 의해 처리된 후 다시 클라이언트로 응답을 반환합니다.
웹 서버가 요청을 보낸 대상을 확인하거나 방문 사용자의 요청 순서를 추적할 수 있는 정보는 거의 없습니다.

Modern web sites want to provide a personal touch. They want to know more about users on the other ends of the connections and be able to keep track of those users as they browse. Popular online shopping sites like Amazon.com personalize their sites for you in several ways:

오늘날 웹 사이트는 개인화된 느낌을 제공하고자 합니다.
엔드 유저에 대해 자세히 파악한 후 브라우징 중인 사용자를 추적하기를 원합니다.
Amazon.com 과 같은 유명 온라인 쇼핑 사이트는 사이트를 개인화하기 위해 몇 가지 방식들을 차용하고 있습니다.

Personal greetings

Welcome messages and page contents are generated specially for the user, to make the shopping experience feel more personal.

개별 인사말

환영 메시지와 페이지 콘텐츠가 사용자에게 맞춰 생성됩니다.
쇼핑 경험이 보다 개인화된 것처럼 느끼게 하기 위함입니다.

Targeted recommendations

By learning about the interests of the customer, stores can suggest products that they believe the customer will appreciate. Stores can also run birthday specials near customers’ birthdays and other significant days.

맞춤 추천

고객의 관심사를 학습하여 고객이 좋아할 만한 상품을 제안할 수 있습니다.
고객의 생일을 비롯한 여러 주요한 날들이 다가올 때 특별한 할인을 제공할 수도 있습니다.

Administrative information on file

Online shoppers hate having to fill in cumbersome address and credit card forms over and over again. Some sites store these administrative details in a database. Once they identify you, they can use the administrative informationon file, making the shopping experience much more convenient.

관리 정보 파일 사용

온라인 쇼핑객은 주소와 신용카드 양식을 여러 번 번거롭게 작성해야 하는 상황을 극도로 싫어합니다.
일부 사이트들은 데이터베이스에 이러한 관리 디테일들을 저장하고 있습니다.
한 번 사용자를 식별하고 나면 관리 정보 파일을 사용하여 쇼핑 경험을 보다 편리하게 만들 수 있습니다.

Session tracking

HTTP transactions are stateless. Each request/response happens in isolation. Many web sites want to build up incremental state as you interact with the site (for example, filling an online shopping cart). To do this, web sites need a way to distinguish HTTP transactions from different users.

세션 추적

HTTP는 무상태 트랜잭션을 수행합니다.
각각의 요청/응답은 개별적으로 발생합니다.
온라인 장바구니를 채우는 것처럼, 많은 웹 사이트들은 상호작용이 이루어질 때 증가하는 상태를 구축하기를 원합니다.
웹 사이트는 다양한 사용자가 전송하는 HTTP 트랜잭션을 구별할 필요가 있습니다.

This chapter summarizes a few of the techniques used to identify users in HTTP. HTTP itself was not born with a rich set of identification features. The early web-site designers (practical folks that they were) built their own technologies to identify users. Each technique has its strengths and weaknesses. In this chapter, we’ll discuss the following mechanisms to identify users:

• HTTP headers that carry information about user identity

• Client IP address tracking, to identify users by their IP addresses

• User login, using authentication to identify users

• Fat URLs, a technique for embedding identity in URLs

• Cookies, a powerful but efficient technique for maintaining persistent identity

해당 챕터는 HTTP에서 사용자를 구분하기 위해 사용되는 몇 가지 기술들을 요약 설명합니다.
HTTP가 그 자체로 풍부한 식별 기능 집합을 가지고 있지는 않았습니다.
초기의 웹 사이트 설계자들은 사용자를 식별하기 위한 기술을 자체적으로 개발하고는 했습니다. 참 실용적인 사람들이었습니다.
각가의 기술들은 장단점이 있었습니다.
이번 챕터에서는 다음과 같은 사용자 식별 매커니즘에 대해 논의합니다.
- HTTP 헤더를 통한 사용자 식별
- 클라이언트 IP 주소 추적을 통한 사용자 식별
- 사용자 로그인 인증을 통한 사용자 식별
- URL에 식별 정보를 임베딩하는 기술
- 식별 정보를 영속적으로 유지하기 위한 강력하고 효율적인 기술인 쿠키

HTTP Headers

Table 11-1 shows the seven HTTP request headers that most commonly carry information about the user. We’ll discuss the first three now; the last four headers are used for more advanced identification techniques that we’ll discuss later.

Table 11-1은 사용자 정보를 실어나르기 위해 가장 흔히 사용되는 일곱 가지 HTTP 요청 헤더를 나타냅니다.
지금 위의 세 가지 헤더에 대해서만 상세히 설명합니다.
나머지 네 가지 헤더는 추후에 이야기할 발전된 형태의 식별 기술에 활용됩니다.

The From header contains the user’s email address. Ideally, this would be a viable source of user identification, because each user would have a different email address. However, few browsers send From headers, due to worries of unscrupulous servers collecting email addresses and using them for junk mail distribution. In practice, From headers are sent by automated robots or spiders so that if something goes astray, a webmaster has someplace to send angry email complaints.

From 헤더는 사용자의 이메일 주소를 포함합니다.
이상적으로 이 정보는 사용자를 식별하기 위한 실용적인 소스가 될 수 있습니다.
사용자마다 서로 다른 이메일 주소를 가지고 있기 때문입니다.
하지만 서버가 악의적으로 이메일 주소를 수집하여 정크 메일을 전송하는 것을 염려하여 From 헤더를 전송하는 브라우저가 거의 없습니다.
실제로 From 헤더는 웹 마스터가 이메일로 항의를 하기 위한 수단으로 자동화된 로봇이나 스파이더에 의해 사용됩니다.

The User-Agent header tells the server information about the browser the user is using, including the name and version of the program, and often information about the operating system. This sometimes is useful for customizing content to interoperate well with particular browsers and their attributes, but that doesn’t do much to help identify the particular user in any meaningful way. Here are two User-Agent headers, one sent by Netscape Navigator and the other by Microsoft Internet Explorer:

Navigator 6.2
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4) Gecko/20011128

Netscape6/6.2.1
Internet Explorer 6.01
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

User-Agent 헤더는 사용자의 브라우저 정보를 서버에게 전달하는 역할을 수행합니다.
전송되는 정보에는 프로그램의 이름과 버전, 혹은 OS에 대한 정보까지 포함하는 경우도 있습니다.
특정 브라우저와 브라우저의 기능과 상호운용되는 콘텐츠를 맞춤화 하는 데 사용됩니다.
하지만 특정 사용자를 식별하는 데는 그다지 큰 도움을 주지 않습니다.
상단의 내용은 두 가지의 User-Agent 헤더 사용 예시입니다.
하나는 Netscape Navigator에 의해 전송되었으며, 다른 하나는 Internet Explorer에 의해 전송되었습니다.

The Referer header provides the URL of the page the user is coming from. The Referer header alone does not directly identify the user, but it does tell what page the user previously visited. You can use this to better understand user browsing behavior and user interests. For example, if you arrive at a web server coming from a baseball site, the server may infer you are a baseball fan.

Referer 헤더는 사용자가 어떤 페이지에서 넘어왔는지에 대한 URL을 제공합니다.
Referer 헤더만으로는 사용자를 직접적으로 식별할 수 없습니다.
그러나 사용자가 이전에 방문한 페이지가 어디였는지를 분명하게 전달할 수 있습니다.
사용자의 탐색 과정과 관심사를 이해하는 데 유용하게 사용되는 정보입니다.
가령 여러분이 야구 사이트로부터 웹 서버에 도달한다면 서버는 여러분이 야구 팬이라는 사실을 추론할 수 있게 됩니다.

The From, User-Agent, and Referer headers are insufficient for dependable identification purposes. The remaining sections discuss more precise schemes to identify particular users.

From, User-Agent, Referer 헤더는 신뢰할 수 있는 식별 목적으로는 불충분합니다.
나머지 섹션에서는 더 정확한 사용자 식별 기법에 대해 설명합니다.

✏️ 요약

Mechanisms to Identify Users

: HTTP를 통해 사용자를 식별하기 위한 다양한 매커니즘

HTTP 헤더
클라이언트 IP 주소 추적
사용자 로그인 인증
Fat URL :URL에 식별 정보를 임베딩하는 방식
쿠키 :식별 정보를 유지하기 위한 기술

HTTP Headers to Get User Information

: 신뢰할 수 있는 식별 목적이 아닌 사용자 정보 수집을 목적으로 활용 가능한 HTTP 헤더

From : 사용자의 이메일 주소 -> 이상적으로 사용자 식별 정보 수집 도움
User-Agent : 사용자의 브라우저 정보 (프로그램 이름, 버전, OS 정보 등) -> 특정 브라우저와 브라우저의 기능과 상호운용되는 콘텐츠를 맞춤화하는 데 도움
Referer : 사용자가 어떤 페이지에서 넘어왔는지에 대한 URL 정보 -> 사용자의 탐색 과정과 관심사를 이해하는 데 도움

✏️ 감상

User-Agent 악용 전문가

User-Agent 헤더는 사용자의 브라우저 정보를 수집하는 헤더다. 그래서 웹 사이트 관리자들은 robots.txt에 차단할 User-Agent의 목록을 작성해서 무단 크롤링이 발생하지 않게 막는다. 그리고 내가 종종 하는 짓은 User-Agent를 크롬 브라우저로 조작해서 사이트를 크롤링하는 것이다...ㅎㅎ 나쁜 의도는 없다. 기껏 해봐야 연습용 웹 개발이고, 몰래 긁어온 데이터를 영리적으로 사용할 생각은 더더욱 없다.

예전에 같은 과 사람들과 프로젝트를 하나 진행하면서 야구 사이트에서 크롤링을 할 일이 많이 있었다. 냅다 GET 요청을 보내면 아무 응답도 오지 않았기 때문에 이것저것 헤더를 많이 만졌던 기억이 난다. 개인적으로 크롤링 경험이 있으면 HTTP의 요청/응답 방식이나 헤더 같은 것들을 이해하는 데 굉장히 도움이 된다고 생각한다. 코딩하는 사람들이 종종 "백문이 불여일타"라고 하지 않는가. HTTP 글 열심히 읽는 거 물론 좋지만(!) 배운 내용을 써보는 게 기억에 더 잘 남는 것은 사실이다.

그러니 글을 읽는 것에만 그치지 말고 꼭 활용까지 해보도록 하자. 지금부터라도!

시윤

맑은 눈의 다람쥐

이전 포스트

[TIL] HTTP : The Definitive Guide "p252 ~ p253"

다음 포스트