[ Network ] Ch02. Application Layer

38A · October 10, 2023

Network


2.1 Principles of network applications

conceptual and implementation aspects of application-layer protocols
- transport-layer service models → why? the application layer cannot provide the service on its own
- client-server paradigm
- peer-to-peer paradigm

Creating a network app

Write programs that:
- run on (different) end systems
- communicate over network
- e.g., Web: web server software communicates with browser software
No need to write software for network-core devices
- network-core devices do not run user applications
- Not function at app. layer
- applications on end systems allow for rapid app development

Client-server architecture

Server → scalability issue
- always-on host
- permanent IP addr.
- often in data centers, for scaling
Client
- communicate with server
- may be intermittently connected
- may have dynamic IP addresses
- do not communicate directly with each other

Peer-peer (P2P) architecture!

no always-on server
arbitrary end systems directly communicate
peers request service from other peers, provide service in return to other peers
- self scalability – new peers bring new service capacity, as well as new service demands
peers are intermittently connected (switching ON/OFF) and may change IP addresses → NAT
IP → public, private
- complex management

Processes communicating

process: program running within a host
- within same host, two processes communicate using inter-process communication (defined by OS)
- processes in different hosts communicate by exchanging messages
clients, servers
- client process: process that initiates communication
- server process: process that waits to be contacted
note: applications with P2P architectures have client processes & server processes

Sockets

socket interface:
- located between application and TCP, UDP and other protocol stacks (common interface)
- A process sends/receives messages to/from its socket
Addressing processes
- Host device has unique 32-bit IP address (IPv4)
- Q: does the IP address of the host on which the process runs suffice for identifying the process?
- A: No, many processes can be running on same host.
  To receive messages, process must have an identifier ( port # )
- Port number associated with process on host.
- Ex.: port numbers on the server side:
  - HTTP server: 80
  - Mail server: 25 → well-known port #
- Ex: To send HTTP message to gaia.cs.umass.edu web server:
- IP address: 128.119.245.12, port number: 80
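A minimal sketch of identifying a process by (IP address, port): a toy TCP server on the loopback address 127.0.0.1, bound to port 0 so the OS picks a free port (an assumption for the demo, instead of a well-known port such as 80). Function names are hypothetical.

```python
import socket
import threading


def serve_once(server_sock: socket.socket) -> None:
    # Server process: accept one connection and echo the message back in upper case
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def demo() -> tuple[tuple[str, int], bytes]:
    # Bind the server socket to (IP address, port); port 0 lets the OS choose one
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(1)
    address = server_sock.getsockname()  # (IP, port) pair that identifies the server process
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    # Client process: reaches the server process only through its (IP, port) pair
    with socket.create_connection(address) as client:
        client.sendall(b"hello")
        reply = client.recv(1024)
    worker.join()
    server_sock.close()
    return address, reply


if __name__ == "__main__":
    addr, reply = demo()
    print(addr, reply)
```

The IP address alone selects the host; the port number selects which process on that host receives the message.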

Application-layer protocol defines:

Types of messages exchanged
- Ex: request, response
Syntax of message
- what fields are in messages & how fields are delineated
Semantics of a field → e.g., what a value of 0 means
- meaning of information in fields
Rules for when and how processes send & respond to messages
Open protocols:
- defined in RFCs, everyone has access to protocol definition
- allows for interoperability
- Ex.: HTTP, SMTP
Proprietary protocols:
- Ex.: Skype, Zoom

Transport layer service

Reliable data transfer
- some apps (e.g., file transfer, web transactions) require 100% reliable data transfer
- other apps (e.g., audio) can tolerate some loss
Timing
- some apps (e.g., Internet telephony, interactive games) require low delay to be “effective”
Throughput
- some apps (e.g., multimedia) require minimum amount of throughput to be “effective”
- other apps (“elastic apps”) make use of whatever throughput they get
Security
- Confidentiality
- Data integrity
- Authentication

Internet transport protocols services

TCP service:
- reliable transport between sending and receiving process
  → no errors; performs error recovery
- flow control: sender won’t overwhelm receiver
  → keeps the receiver's buffer from overflowing
- congestion control: throttle sender when network overloaded
- connection-oriented: setup required between client and server processes
- does not provide: timing, minimum throughput guarantee, security
UDP service: → User Datagram Protocol
- unreliable data transfer between sending and receiving process
  → loss can occur at any time
- does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup.
Q: why use UDP at all? (covered later)

Securing TCP

Vanilla TCP & UDP sockets: → standard
- no encryption
- cleartext passwords sent into socket traverse Internet in cleartext (!)
Transport Layer Security (TLS) (or SSL (Secure Socket Layer)) → nowadays everything uses TLS
- provides encrypted TCP connections
- Data confidentiality and end-point authentication
- TLS at app layer
  - apps use TLS libraries, which in turn use TCP
- DTLS (Datagram TLS) for UDP

2.2 Web and HTTP

web page consists of objects
- object can be HTML file, JPEG image, Java applet, audio file,...
web page consists of base HTML-file which includes several referenced objects
- Hypertext / hypermedia system: information is organized as a set of documents (objects)
- Each object is addressed by a uniform resource locator (URL)

HyperText Transfer Protocol (HTTP)

Web’s application-layer protocol
- Defines how Web clients request pages from Web servers and how servers transfer Web pages to clients
client / server model:
- client: browser that requests, receives, (using HTTP protocol) and “displays” Web objects
- server: Web server sends (using HTTP protocol) objects in response to requests
Standards → focus on why each version changed!
- HTTP/1.0: RFC 1945 (in 1996)
- HTTP/1.1: RFC 2068 (in 1997)
  - RFC 2616 (1999), RFC 7230 (2014), RFC 9112 (2022)
- HTTP/2: RFC 7540 (2015)
  - RFC 9113 (2022) (Proposed Standard)
- HTTP/3: RFC 9114 (2022) (Proposed Standard)
HTTP uses TCP as its underlying transport protocol
- client initiates TCP connection (creates socket) to server,
  - Well-known server port: 80
  - HTTPS server port: 443
- server accepts TCP connection from client
- HTTP messages (application-layer protocol messages) are exchanged between browser (HTTP client) and Web server (HTTP server)
- TCP connection is closed
HTTP is a ⭐️"stateless protocol" ↔️ stateful: e.g., cd → list (what list shows depends on the earlier cd)
- server maintains no information about past client requests
- protocols that maintain “state” are complex!
  - past history (state) must be maintained → table
  - if server/client crashes, their views of “state” may be inconsistent, must be reconciled

HTTP-TCP connections: two types

Non-persistent HTTP → commonly used. Why? (think about it)
- At most one object is sent over a TCP connection
- HTTP/1.0 uses non-persistent
- but browsers often open parallel TCP connections (connecting simultaneously) to fetch referenced objects
  → objects arrive faster, but the load on the server grows (more connections ⬆️)
- Response time
  - RTT (definition): time for a small packet to travel from client to server and back
    → Round Trip Time
  - HTTP response time (per object):
    - one RTT to initiate TCP connection
    - one RTT for HTTP request and first few bytes of HTTP response to return
    - object / file transmission time
  - Non-persistent HTTP response time = 2RTT + file transmission time
    (+ release time)
Persistent HTTP
- Multiple objects can be sent over single TCP connection between client and server
  - server leaves connection open after sending response
  - subsequent HTTP messages between the same client/server are sent over the same connection
- HTTP/1.1 uses persistent connections by default
- Pipelining
  - Persistent without pipelining
    - client issues new request only when previous response has been received
    - one RTT for each referenced object
  - Persistent with pipelining
    - default in HTTP/1.1
      - Most browsers turn this feature off → rarely used in practice
    - client sends requests as soon as it encounters a referenced object
    - as little as one RTT for all the referenced objects
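A small sketch that counts RTTs for one base HTML file plus M referenced objects in each mode. Assumptions: transmission and connection-release time are ignored, and non-persistent fetches are serial (no parallel connections).

```python
def response_time_rtts(num_objects: int, mode: str) -> int:
    """RTTs to fetch a base HTML file plus `num_objects` referenced objects."""
    if mode == "non-persistent":
        # every object (base file included): 1 RTT TCP setup + 1 RTT request/response
        return 2 * (num_objects + 1)
    if mode == "persistent":
        # 1 RTT setup + 1 RTT base file + 1 RTT per referenced object
        return 2 + num_objects
    if mode == "pipelined":
        # 1 RTT setup + 1 RTT base file + about 1 RTT for all referenced objects together
        return 2 + (1 if num_objects > 0 else 0)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("non-persistent", "persistent", "pipelined"):
    print(mode, response_time_rtts(10, mode), "RTT")
```

With 10 referenced objects this gives 22, 12, and 3 RTTs, which is why persistence (and pipelining) matters for pages with many objects.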

HTTP Message Format

two types of HTTP messages: request, response

HTTP request message

RESTful protocol: CRUD (+ Notification: the server notifies the client when an event occurs, without a client request)

C → POST, R → GET, U → PUT, D → DELETE

Note
- POST method:
  - web page often includes input-form
  - user input sent from client to server in entity body of HTTP POST request message
- GET method → trend
  - Instead of using POST, user data can be included in the URL field of an HTTP GET request message (following a ‘?’)
  - www.somesite.com/animalsearch?monkeys&banana → input
- PUT method
  - Upload an object to a specific URL (replaces it if it exists)
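A sketch of what the GET example above could look like on the wire: a request line, header lines, each ending in CR LF, and a blank line ending the header section. The `build_get_request` helper and the chosen headers are illustrative assumptions.

```python
def build_get_request(host: str, path: str, params: list[str]) -> str:
    # User input travels in the URL after '?', separated by '&'
    target = path + ("?" + "&".join(params) if params else "")
    lines = [
        f"GET {target} HTTP/1.1",  # request line: method, URL, version
        f"Host: {host}",           # header lines
        "Connection: close",
    ]
    # Each line ends with CR LF; an empty line ends the header section (no body for GET)
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_get_request("www.somesite.com", "/animalsearch", ["monkeys", "banana"])
print(request)
```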

HTTP response message

Status code in Response Message
- 3-digit integer that indicates the response to a received request
  - Status phrase gives short textual explanation of the status code
- 200 OK: request succeeded, information returned
- 301 Moved Permanently: requested object moved, new location specified later in this message (Location:) → moved elsewhere
- 400 Bad Request: syntax error in request
- 404 Not Found: requested document does not exist on this server
- 505 HTTP Version Not Supported: server does not support the HTTP version used in the request
- 1XX : info, 2XX : success, 3XX : redirection, 4XX : client error, 5XX : server error
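A small sketch of reading a response's status line and mapping the first digit of the code to its class, as listed above. The helper name is hypothetical.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str) -> tuple[str, int, str, str]:
    # "HTTP/1.1 404 Not Found" -> version, status code, status phrase, class
    version, code_text, phrase = line.split(" ", 2)
    code = int(code_text)
    return version, code, phrase, STATUS_CLASSES[code // 100]


print(parse_status_line("HTTP/1.1 301 Moved Permanently"))
```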

Cookies

: Maintaining user/server state → like stateful

HTTP is a stateless protocol
- Server forgets about each client as soon as it sends response.
no notion of multi-step exchanges of HTTP messages to complete a Web “transaction”
- no need for client/server to track “state” of multi-step exchange
- all HTTP requests are independent of each other
- no need for client/server to “recover” from a partially-completed-but-never-completely-completed transaction
Issues with stateless behavior
- When a Web site wants to identify users
- When the server wishes to restrict user access
- When the server wants to serve content as a function of the user identity
Web sites and client browser use cookies to maintain some state between transactions
- A cookie is a short piece of data, not an executable code, and can not directly harm the machine
Four components:
1) cookie header line (Set-Cookie) in HTTP response message
2) cookie header line (Cookie) in next HTTP request message
3) cookie file kept on user’s host, managed by user’s browser
4) back-end database at Web site → the DB used to interpret the cookie ID
Problem: privacy
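A sketch of how the four components fit together, using Python's `http.cookies.SimpleCookie` to parse the Cookie header. `ToyServer`, the cookie name `id`, and the starting ID 1678 are hypothetical; the dict stands in for the back-end database.

```python
import itertools
from http.cookies import SimpleCookie
from typing import Optional


class ToyServer:
    """Hypothetical web site that recognizes users by a cookie ID (sketch)."""

    def __init__(self) -> None:
        # Back-end database: cookie ID -> paths this user has visited
        self.db: dict[str, list[str]] = {}
        self._next_id = itertools.count(1678)

    def handle(self, path: str, cookie_header: Optional[str]) -> dict[str, str]:
        """Process one request; return the response headers to send back."""
        response_headers: dict[str, str] = {}
        cookie = SimpleCookie(cookie_header or "")
        if "id" in cookie:
            user_id = cookie["id"].value                 # returning user
        else:
            user_id = str(next(self._next_id))           # first visit: new ID
            response_headers["Set-Cookie"] = f"id={user_id}"
        self.db.setdefault(user_id, []).append(path)     # state is kept on the server side
        return response_headers


server = ToyServer()
first = server.handle("/", None)          # request without a Cookie header
stored = first["Set-Cookie"]              # browser keeps "id=1678" in its cookie file
second = server.handle("/cart", stored)   # next request sends "Cookie: id=1678"
print(first, second, server.db)
```

HTTP itself stays stateless; the state lives in the cookie ID plus the server's database, which is also why cookies raise privacy concerns.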

Web cache (Proxy Server)

An intermediary entity that satisfies HTTP requests on behalf of an origin Web server
- User browsers must be configured so that all requests are first directed to its Web cache
Browser sends all HTTP requests to cache
- if object in cache: cache returns object to client
- else cache requests object from origin server, caches received object, then returns object to client
Why Web caching ?
- Advantages
  - reduce response time for client request
    - cache is closer to client
  - ⭐️ reduce traffic on an institution’s access link
- Disadvantages
  - Lower performance for objects that are not cached

Example

Option 1: buy a faster access link

Option 2: install a web cache

Calculating access link utilization, end-end delay with cache → problem: cached data and origin data may differ ➡️ Conditional GET
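The utilization comparison can be sketched numerically. All numbers here are assumptions in the spirit of the lecture example (1.54 Mbps access link, 100 kbit average objects, 15 requests/sec, 40% cache hit rate); only cache misses cross the access link.

```python
def access_link_utilization(avg_object_bits: float, requests_per_sec: float,
                            link_bps: float, hit_rate: float = 0.0) -> float:
    """Fraction of the access link in use; only cache misses cross the link."""
    return (1 - hit_rate) * avg_object_bits * requests_per_sec / link_bps


OBJECT_BITS = 100e3   # assumed average object size: 100 kbits
REQ_RATE = 15         # assumed browser request rate: 15 requests/sec

no_cache = access_link_utilization(OBJECT_BITS, REQ_RATE, 1.54e6)
option1 = access_link_utilization(OBJECT_BITS, REQ_RATE, 154e6)        # faster access link
option2 = access_link_utilization(OBJECT_BITS, REQ_RATE, 1.54e6, 0.4)  # web cache, 40% hits
print(f"{no_cache:.3f} {option1:.4f} {option2:.3f}")
```

Utilization close to 1 means large queueing delays on the access link; both options bring it well below 1, but the cache does so without buying a faster link.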

Conditional GET

→ reduces access link traffic

Problem: an object in the cache might be stale → the Web cache ($) copy differs from the original
Goal: don’t send the object if the cache has an up-to-date version
Solution: conditional GET
- client: specify date of cached copy in HTTP request
  → If-Modified-Since: <date>
- server: response contains no object if cached copy is up-to-date:
  → HTTP/1.1 304 Not Modified
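A sketch of the origin server's side of a conditional GET, using the standard library's HTTP-date helpers in `email.utils`. The function name, dates, and the placeholder body are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def handle_conditional_get(last_modified: datetime,
                           if_modified_since: Optional[str]) -> tuple[int, str]:
    """Return (status code, body) for a GET that may carry If-Modified-Since."""
    if if_modified_since is not None:
        cached_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_date:
            # Cached copy is up to date: 304 Not Modified, no object in the body
            return 304, ""
    # No header, or the object changed since the cached copy: send the full object
    return 200, "<object bytes>"


modified = datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc)
header = format_datetime(datetime(2023, 10, 5, tzinfo=timezone.utc), usegmt=True)
print(header, handle_conditional_get(modified, header))
print(handle_conditional_get(modified, None))
```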

HTTP/2

HTTP/1.1 problems with multiple pipelined GETs over a single TCP connection:
- server responds in-order (FCFS: first-come-first-served scheduling) to GET requests
- with FCFS, a small object may have to wait for transmission behind large object(s)
  (head-of-line (HOL) blocking: the object in front blocks, so the ones behind wait) → workaround: multiple connections
- loss recovery (retransmitting lost TCP segments) stalls object transmission → after a loss, retransmission is requested and the remaining objects wait until recovery
HTTP/2: focused on performance.
Supports multiplexing to mitigate HOL blocking.
- When a server wants to send an HTTP response, the response is processed by the framing sub-layer, where it is broken down into frames.
- The frames of the response are then interleaved by the framing sub-layer in the server with the frames of other responses and sent over the single persistent TCP connection.
A binary format.
Header compression (HPACK: RFC 7541)
- HPACK compression mechanism: Huffman coding & index-based
Supports stream prioritization.
Supports server push mechanism ↔️ pull: the server provides information the client is likely to request without being asked, e.g., objects embedded in a document → may send unnecessary data

Multiplexing

mitigating HOL blocking

HTTP/1.1: client requests 1 large object (e.g., video file) and 3 smaller objects
HTTP/2: objects divided into frames, frame transmission interleaved
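The multiplexing example can be sketched as a tiny simulation. Assumptions: one frame is sent per time slot, object sizes are made up (O1 = 8 frames, O2 to O4 = 1 frame each), and interleaving is simple round-robin; HTTP/2 does not mandate round-robin, and real servers may apply stream priorities.

```python
def completion_times(sizes: dict[str, int], interleave: bool) -> dict[str, int]:
    """Time slot at which each object's last frame is sent (1 frame per slot)."""
    remaining = dict(sizes)
    done: dict[str, int] = {}
    slot = 0
    while remaining:
        # FCFS sends the whole first object; interleaving sends 1 frame per object per round
        order = list(remaining) if interleave else [next(iter(remaining))]
        for name in order:
            frames = 1 if interleave else remaining[name]
            slot += frames
            remaining[name] -= frames
            if remaining[name] == 0:
                del remaining[name]
                done[name] = slot
    return done


objects = {"O1": 8, "O2": 1, "O3": 1, "O4": 1}  # assumed sizes in frames
print("HTTP/1.1 FCFS:", completion_times(objects, interleave=False))
print("HTTP/2 frames:", completion_times(objects, interleave=True))
```

Under FCFS the small objects finish at slots 9 to 11 (stuck behind O1); with interleaved frames they finish at slots 2 to 4, while O1 is delayed only slightly.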

Stream prioritization

HTTP/2 to HTTP/3

HTTP/2 → TLS 1.2
- HTTP/2 over single TCP connection means:
  - recovery from packet loss still stalls all object transmissions
  - as in HTTP/1.1, browsers have incentive to open multiple parallel TCP connections to reduce stalling, increase overall throughput
- no security over vanilla TCP connection → TLS encryption overhead, because everything is encrypted
HTTP/3: → TLS 1.3
- HTTP over QUIC(Quick UDP Internet Connection)
- HTTP/2 was adjusted in a few key areas to make it compatible with QUIC. This tweaked version was eventually named HTTP/3
- The main features for HTTP/3 (faster connection set-up, less HoL blocking, connection migration, and so on) are really all coming from QUIC. → connection migration keeps the connection alive even when the IP address changes, e.g., Wi-Fi → 4G/5G

2.3 E-mail, SMTP

Three major components:
- user agents: a.k.a. “mail reader” such as Outlook
- mail servers
  - mailbox contains incoming messages for user
  - message queue of outgoing (to be sent) mail messages
- simple mail transfer protocol: SMTP
  - client: sending mail server
  - server: receiving mail server

SMTP

Uses TCP to reliably transfer email message from client (mail server initiating connection) to server on port 25
- Three phases of transfer in SMTP: handshaking, transfer of messages, closure
Command/response interaction
- commands: ASCII text → cf. HTTP request
- response: status code and phrase
The message (header & body (payload)) must be in 7-bit ASCII → cf. HTTP does not require ASCII
SMTP uses persistent connections
- Can send several messages over the same TCP connection
- direct transfer: sending server to receiving server
SMTP: client push protocol (sender) (HTTP: client pull protocol (reader))
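A sketch of the client side of one SMTP transfer, listing the ASCII commands for the three phases. Host names and addresses are hypothetical, header lines inside DATA are omitted, and the server's numeric replies (e.g., 220, 250, 354, 221) are not modeled.

```python
def smtp_client_commands(client_host: str, sender: str,
                         recipients: list[str], body_lines: list[str]) -> list[str]:
    """Client-side SMTP commands for one message (sketch; server replies omitted)."""
    commands = [f"HELO {client_host}"]                     # 1) handshaking
    commands.append(f"MAIL FROM:<{sender}>")               # 2) message transfer
    commands += [f"RCPT TO:<{r}>" for r in recipients]
    commands.append("DATA")
    # Dot-stuffing: a body line starting with '.' gets an extra '.'
    commands += ["." + line if line.startswith(".") else line for line in body_lines]
    commands.append(".")                                   # a line with only '.' ends the message
    commands.append("QUIT")                                # 3) closure
    return commands


for cmd in smtp_client_commands("crepes.fr", "alice@crepes.fr",
                                ["bob@hamburger.edu"], ["Do you like ketchup?"]):
    print(cmd)
```

Several messages could be sent by repeating MAIL FROM ... "." before QUIT, which is the persistent-connection behavior noted above.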

MIME extension

Multi-purpose Internet Mail Extensions (MIME)
- For non-ASCII data
- Additional lines in msg header declare MIME content type
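A sketch using Python's standard `email.message.EmailMessage` to show the extra MIME header lines that let non-ASCII data travel over 7-bit SMTP: binary content is base64-encoded and labeled with a content type. Addresses and the fake JPEG bytes are hypothetical.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@crepes.fr"
msg["To"] = "bob@hamburger.edu"
msg["Subject"] = "Picture"
msg.set_content("See the attached image.")  # text/plain part; also adds MIME-Version: 1.0
# Hypothetical binary payload: a few JPEG-like bytes followed by zeros
msg.add_attachment(b"\xff\xd8\xff\xe0" + bytes(16),
                   maintype="image", subtype="jpeg", filename="photo.jpg")

wire_text = msg.as_string()
print(wire_text)
```

The printed message is pure ASCII: headers such as MIME-Version and Content-Type declare the structure, and Content-Transfer-Encoding: base64 carries the binary part.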

Mail access protocols

Mail access protocol: retrieval from server
- POP3: Post Office Protocol 3 [RFC 1939]
  - authorization phase (agent <--> server) and transaction phase (download)
- IMAP: Internet Mail Access Protocol [RFC 3501]:
  - more features (more complex)
  - provide the folder functionality
- HTTP: gmail, Hotmail, Yahoo!Mail, etc.
  - provides web-based interface
  - Access to messages is provided with scripts that run in an HTTP server; the scripts use the IMAP (or POP) protocol to communicate with IMAP (or POP) server. → only the outer shell is HTTP; underneath it works the same as before

2.4 DNS ( Domain Name System )

Name & Address
- Name
  - Character string for human use, e.g. www.naver.com
  - Mnemonic
- Address used to identify a host in a packet
  - IP address (32 bits for IPv4, 128 bits for IPv6)
Q: how to map between IP address and name, and vice versa?
- Mapping a name to an address or an address to a name is called name-address resolution.
Name-address resolution
- Solution 1:
  - Hostname to IP address mapping file (hosts file) (ARPANET) → e.g., 127.0.0.1 is the loopback address
- Solution 2:
  - The Internet has too many objects for a single management center → a centralized system is paralyzed if that one center dies
  - Uses distributed database system
    - Scalability, maintenance
  - Partition the name space into a hierarchical tree
    - Domain hierarchy
    - Millions of different organizations responsible for their records
The meanings of DNS
- A distributed database implemented in a hierarchy of DNS servers
- An application-layer protocol that allows hosts to query the DNS servers to resolve hostnames (hostname-IP address translation)
  - Runs over UDP → why? An exchange is just one query and one response, so a TCP 3-way handshake would take longer than the exchange itself
  - Server port number: 53
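A sketch of what a DNS query message looks like in bytes: a 12-byte header followed by one question whose name is encoded as length-prefixed labels. The query ID and recursion-desired flag are illustrative choices; the result would fit in a single UDP datagram sent to port 53 (e.g., `sock.sendto(query, (resolver_ip, 53))` with a hypothetical `resolver_ip`), which is not done here.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Encode a DNS query: 12-byte header + one question (qtype 1 = A record)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in hostname.rstrip(".").split("."):
        if not 0 < len(label) <= 63:                          # at most 63 characters per label
            raise ValueError(f"invalid label: {label!r}")
        qname += bytes([len(label)]) + label.encode("ascii")  # <length byte><label>
    qname += b"\x00"                                          # zero byte: the root ends the name
    return header + qname + struct.pack("!HH", qtype, 1)      # QTYPE, QCLASS=1 (IN)


query = build_dns_query("www.amazon.com")
print(len(query), query.hex())
```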

The tree can have up to 128 levels
- level 0 (root) to level 127. 63 characters/level
Partition the hierarchy into subtree called zones.
- Each zone can be thought of as corresponding to some administrative authority that is responsible for that portion of the hierarchy

DNS : Distributed, hierarchical database

Client wants IP address for www.amazon.com; 1st approximation:
- client queries root server to find .com DNS server
- client queries .com DNS server to get amazon.com DNS server
- client queries amazon.com DNS server to get IP address for www.amazon.com

Root DNS servers
- 13 logical root servers (A-M), each replicated as a server farm
  - www.root-servers.org
- More than 1000 root server instances scattered all over the world
- Provides the IP addresses of the TLD servers
Top-Level Domain (TLD) servers:
- Responsible for .com, .org, .net, .edu, .aero, .jobs, .museum, and all top-level country domains, e.g.: .cn, .uk, .fr, .ca, .kr
- Provide the IP addresses for authoritative DNS servers.
Authoritative DNS servers:
- Organization’s own DNS server(s), providing authoritative hostname to IP mappings for organization’s named hosts
- Can be maintained by organization or service provider

Local DNS name servers

When a host (which has its own cache, $) makes a DNS query, it is sent to its local DNS server (which also has a cache)
- Local DNS server returns reply, answering:
  - from its local cache of recent name-to-address translation pairs
    - Windows: ipconfig /displaydns, ipconfig /flushdns (clear the cache)
  - forwarding request into DNS hierarchy for resolution
- Each ISP such as a residential ISP or an institutional ISP has local DNS name server:
  - Can find your local DNS server: Windows: ipconfig /all (in DNS server info)
Local DNS server doesn’t strictly belong to the hierarchy

DNS name resolution: iterated query

→ iterated queries are used in most cases

DNS name resolution: recursive query
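A toy model of the iterated query: the local DNS server itself contacts the root, TLD, and authoritative servers in turn, and each one returns either a referral ("ask this server") or the answer. The server tables are hypothetical, and 203.0.113.10 is a documentation address, not Amazon's real IP.

```python
# Hypothetical server tables: suffix -> ("referral", next server) or ("answer", IP)
SERVERS = {
    "root": {"com": ("referral", "com-TLD")},
    "com-TLD": {"amazon.com": ("referral", "dns.amazon.com")},
    "dns.amazon.com": {"www.amazon.com": ("answer", "203.0.113.10")},
}


def lookup(server: str, name: str) -> tuple[str, str]:
    # A server replies using the longest suffix of the name it has an entry for
    table = SERVERS[server]
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in table:
            return table[suffix]
    raise KeyError(f"{server} cannot resolve {name}")


def iterated_query(name: str) -> tuple[str, list[str]]:
    """The local DNS server asks each server in turn, following referrals."""
    server, contacted = "root", []
    while True:
        contacted.append(server)
        kind, value = lookup(server, name)
        if kind == "answer":
            return value, contacted
        server = value  # referral: ask the next server in the hierarchy


ip, path = iterated_query("www.amazon.com")
print(ip, path)
```

In a recursive query the same chain happens, but each server asks the next one on the requester's behalf, which puts more load on the upper levels of the hierarchy.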

DNS Caching

Once (any) name server learns mapping, it caches mapping, and immediately returns a cached mapping in response to a query
- caching improves response time
- cache entries timeout (disappear) after some time (TTL: Time To Live)
- TLD servers typically cached in local name servers
Cached entries may be out-of-date
- if named host changes IP address, may not be known Internet-wide until all TTLs expire!
- best-effort name-to-address translation!
Update/notify mechanisms: proposed IETF standard
- RFC 2136
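A sketch of a TTL-based name cache. The injectable clock (so the demo can jump forward in time) and the documentation address 192.0.2.1 are assumptions for illustration.

```python
import time
from typing import Callable, Optional


class DnsCache:
    """Name -> address cache whose entries expire after their TTL (sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # name -> (address, expiry time)

    def put(self, name: str, address: str, ttl: float) -> None:
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[name]   # TTL expired: the entry disappears
            return None
        return address                # may be stale if the host changed its IP

    def flush(self) -> None:          # similar in spirit to ipconfig /flushdns
        self._entries.clear()


now = [0.0]                           # fake clock for the demo
cache = DnsCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.1", ttl=300)
print(cache.get("www.example.com"))   # within TTL: cache hit
now[0] = 301
print(cache.get("www.example.com"))   # after TTL: None
```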

DNS records

type=A
- name is hostname
- value is IPv4 address (32 bits)
type=NS
- name is domain (e.g., foo.com)
- value is hostname of authoritative name server for this domain
  → e.g. dns.cs.umass.edu
type=CNAME
- name is alias name for some “canonical” (the real) name
- www.ibm.com is really servereast.backup2.ibm.com
- value is canonical name
type=MX
- value is name of SMTP mail server associated with name
type=AAAA
- name is hostname
- value is IPv6 address (128 bits)
type=TXT
- name is host name
- value is arbitrary human-readable text
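A sketch of a lookup over resource records that follows CNAME aliases to the canonical name, as in the www.ibm.com example above. Apart from that alias, the records (192.0.2.7, mail.ibm.com, dns.ibm.com) are hypothetical and TTLs are omitted.

```python
# Hypothetical resource records as (name, value, type); TTLs omitted for brevity
RECORDS = [
    ("www.ibm.com", "servereast.backup2.ibm.com", "CNAME"),
    ("servereast.backup2.ibm.com", "192.0.2.7", "A"),
    ("ibm.com", "mail.ibm.com", "MX"),
    ("ibm.com", "dns.ibm.com", "NS"),
]


def resolve(name: str, rtype: str = "A", max_hops: int = 8) -> str:
    """Return the value of the `rtype` record for `name`, following CNAME aliases."""
    for _ in range(max_hops):
        matches = [v for n, v, t in RECORDS if n == name and t == rtype]
        if matches:
            return matches[0]
        aliases = [v for n, v, t in RECORDS if n == name and t == "CNAME"]
        if not aliases:
            raise LookupError(f"no {rtype} record for {name}")
        name = aliases[0]  # continue the lookup with the canonical name
    raise LookupError("too many CNAME hops")


print(resolve("www.ibm.com"))
print(resolve("ibm.com", "MX"))
```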

DNS protocol messages

Getting your info into the DNS

Example: new startup “Network Utopia”
- This is done through a registrar, a commercial entity accredited by ICANN.
- A registrar first verifies that the requested domain name is unique and then enters it into the DNS database.
  - Need to provide a registrar with names and IP addresses of your authoritative name server (primary and secondary)
  - Registrar inserts two RRs into the com TLD server:
    (networkutopia.com, dns1.networkutopia.com, NS)
    (dns1.networkutopia.com, 212.212.212.1, A)
- Create authoritative server locally with IP address 212.212.212.1
  - type A record for www.networkutopia.com
  - type MX record for networkutopia.com

DNS security

DDoS attacks
- bombard root servers with traffic → floods them with traffic to block the query service
  - BW flooding (Ping attack) in 2002
    - Little damage
    - by traffic filtering
    - local DNS servers cache IPs of TLD servers, allowing root server bypass
- bombard TLD servers
  - potentially more dangerous
  - flood of DNS lookup requests in 2016
Spoofing attacks
- A man-in-the-middle attack
- intercept DNS queries, returning bogus replies
  - DNS cache poisoning
  - RFC 4033: DNSSEC authentication services

2.5 P2P applications

Every node is both a client and a server
- Peers request service from other peers, provide service in return to other peers
Self scalability – new peers bring new service capacity, and new service demands
Peers are autonomous
- Peers are intermittently connected and change IP addresses
Examples
- Fully distributed P2P protocol: Gnutella
- P2P file sharing (BitTorrent)
- streaming (KanKan)
Issues
- Lack of robustness due to churn (peers join and leave)
- Low capability of each node (peer)
  - Low bandwidth, low performance computer, low uptime
- Poor resource search
  - How to know who has the contents you want.
- NAT traversal → a peer acting as a server needs a public IP addr. and port
  - Today almost all PCs are connected to the internet via NAT (Network Address Translation) device
    → private IP addr. ↔️ public IP addr.
- Free riding → a peer only acts as a client and then leaves
- security

File distribution: client-server vs P2P

Q: how much time to distribute file (size F) from one server to N peers?
- peer upload/download capacity is limited resource
server transmission: must sequentially send (upload) N file copies:
- time to send one copy: $F/u_s$
- time to send N copies: $NF/u_s$
client: each client must download file copy
- Client i takes at least $F/d_i$ time to download
- $d_{min} = \min_i d_i$ → download rate of the slowest client
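Putting the server bound and the slowest-client bound together gives the usual lower bound on the client-server distribution time; for large N the server term dominates, so the time grows linearly with N:

$$
D_{c\text{-}s} \ge \max\left\{\frac{NF}{u_s},\ \frac{F}{d_{min}}\right\}
$$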

File distribution time: P2P

server transmission: must upload at least one copy:
- time to send one copy: F/u $_s$
client: each client must download file copy
- min client download time (client with the slowest link): $F/d_{min}$
clients: aggregate amount downloaded: NF bits
- max upload rate (assuming all nodes sending file chunks) is $u_s + \sum_i u_i$
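Combining the three P2P bounds gives

$$
D_{P2P} \ge \max\left\{\frac{F}{u_s},\ \frac{F}{d_{min}},\ \frac{NF}{u_s + \sum_i u_i}\right\}
$$

A quick numeric comparison of both bounds, under assumed units: every peer uploads at rate u with F/u = 1 hour, the server uploads at u_s = 10u, and d_min = 20u.

```python
def d_client_server(F: float, N: int, u_s: float, d_min: float) -> float:
    # server must upload N copies; the slowest client must download one copy
    return max(N * F / u_s, F / d_min)


def d_p2p(F: float, N: int, u_s: float, d_min: float, peer_uploads: list[float]) -> float:
    # server uploads at least one copy; total upload capacity grows with the peers
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_uploads)))


# Assumed units: F/u = 1 hour, u_s = 10u, d_min = 20u
F, u, u_s, d_min = 1.0, 1.0, 10.0, 20.0
for N in (10, 30, 100):
    print(N, d_client_server(F, N, u_s, d_min), round(d_p2p(F, N, u_s, d_min, [u] * N), 3))
```

The client-server time grows linearly with N (10 hours at N = 100), while the P2P time stays below 1 hour because each new peer adds upload capacity (self scalability).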

BitTorrent

file divided into 256 KB chunks (typical size)
peers in torrent send/receive file chunks
peer joining torrent:
- registers itself with the tracker and get a list of peers
- periodically informs the tracker that it is still in the torrent
- connects to subset of peers (“neighbors”) in the list
- has no chunks, but will accumulate them over time from other peers
while downloading, peer uploads chunks to other peers
peer may change peers with whom it exchanges chunks
- churn: peers may come and go
once peer has entire file, it may (selfishly) leave or (altruistically) remain in torrent
Requesting chunks:
- at any given time, different peers have different subsets of file chunks
- periodically, Alice asks each peer for list of chunks that they have
- Alice requests missing chunks from peers,
  - rarest first
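A sketch of the rarest-first rule: count how many neighbors hold each chunk and request the missing chunks with the fewest copies first. Peer names and chunk sets are hypothetical, and breaking ties by chunk index is an assumption for determinism.

```python
from collections import Counter


def rarest_first(my_chunks: set[int], neighbor_chunks: dict[str, set[int]]) -> list[int]:
    """Order the chunks Alice is missing from rarest to most common among neighbors."""
    counts = Counter(c for chunks in neighbor_chunks.values() for c in chunks)
    missing = [c for c in counts if c not in my_chunks]
    # Fewest copies first; ties broken by chunk index (assumption)
    return sorted(missing, key=lambda c: (counts[c], c))


neighbors = {"bob": {0, 1, 2}, "carol": {1, 2}, "dave": {2, 3}}
print(rarest_first({2}, neighbors))
```

Fetching rare chunks first spreads them quickly, so they are less likely to disappear from the torrent when peers leave (churn).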
Sending chunks: tit-for-tat
- Alice sends chunks to four neighbors currently sending her chunks at highest rate
  - other peers are choked by Alice (do not receive chunks from her)
  - re-evaluate top 4 every 10 secs
- every 30 secs: randomly select another peer, starts sending chunks
  → prevent free riding
  - “optimistically unchoke” this peer
  - newly chosen peer may join top 4
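A sketch of one tit-for-tat decision: unchoke the 4 neighbors currently sending to Alice at the highest rates and pick one other peer at random to optimistically unchoke. Peer names and rates are hypothetical, and the 10 s / 30 s timers are not simulated.

```python
import random
from typing import Optional


def choose_unchoked(rates: dict[str, float], top_k: int = 4,
                    rng: Optional[random.Random] = None) -> tuple[list[str], Optional[str]]:
    """Return (top-k peers by rate at which they send to us, one optimistic unchoke)."""
    rng = rng or random.Random()
    ranked = sorted(rates, key=rates.get, reverse=True)
    top = ranked[:top_k]                                  # re-evaluated every 10 s
    others = ranked[top_k:]                               # choked for now
    optimistic = rng.choice(others) if others else None   # re-picked every 30 s
    return top, optimistic


rates = {"bob": 50, "carol": 10, "dave": 80, "erin": 30, "frank": 5, "grace": 60}  # kbps
print(choose_unchoked(rates, rng=random.Random(0)))
```

The optimistic slot gives a newcomer with no chunks a chance to receive data and, once it reciprocates, possibly enter the top 4.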

DHT (Distributed Hash Table)

An important subject in P2P field
A distributed P2P database
- Distributes data among a set of nodes (peers) according to predefined rules
- ex.: Chord
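A sketch of one such predefined rule in the style of Chord: node names and keys are hashed onto the same identifier ring, and each key is stored at its successor, the first node ID at or after the key's ID (wrapping around). The 16-bit ring size and peer names are assumptions, and Chord's finger-table routing is not shown; this is only the key-to-node assignment.

```python
import bisect
import hashlib

M = 16  # identifier bits (assumed); IDs live on a ring of size 2**M


def ring_id(text: str) -> int:
    # Hash peer names and keys onto the same identifier ring
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big") % (2 ** M)


def successor(node_ids: list[int], key_id: int) -> int:
    """Node responsible for key_id: first node ID >= key_id, wrapping around the ring."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id)
    return ids[i % len(ids)]


nodes = [ring_id(f"peer{i}") for i in range(8)]  # hypothetical peers
key = ring_id("song.mp3")                        # hypothetical key
print(key, "->", successor(nodes, key))
```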

2.6 Video streaming and content distribution networks

Stream video traffic: major consumer of Internet bandwidth
- Netflix, YouTube, Amazon Prime: 80% of residential ISP traffic (2020)
challenge1: scale - how to reach ~1B (1 billion) users?
- single mega-video server won’t work
challenge2: heterogeneity
- different users have different capabilities (e.g., wired versus mobile; bandwidth rich versus bandwidth poor)
solution: distributed(challenge1), application-level(challenge2) infrastructure

Multimedia: video

video: sequence of images displayed at constant rate
- e.g., 24 (movie theaters: fewer film frames; turning off the lights makes the afterimage last longer, creating the illusion of motion) (or 30) frames/sec
digital image: array of pixels
- each pixel represented by bits
encoding: use redundancy within and between images to decrease # bits used to encode image
- spatial (within image)
- temporal (from one image to next)
- Trade off video quality with bit rate
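Two toy illustrations of the redundancy types above: run-length encoding for spatial redundancy (send a color value and its repeat count instead of N copies) and a frame delta for temporal redundancy (send only the pixels that changed). These are simplified sketches, not how MPEG codecs actually encode video.

```python
def run_length_encode(pixels: list[int]) -> list[tuple[int, int]]:
    """Spatial redundancy: (color value, repeat count) instead of N identical values."""
    runs: list[tuple[int, int]] = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1] = (p, runs[-1][1] + 1)
        else:
            runs.append((p, 1))
    return runs


def frame_delta(prev: list[int], cur: list[int]) -> list[tuple[int, int]]:
    """Temporal redundancy: only (pixel index, new value) for pixels that changed."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev, cur)) if p != c]


print(run_length_encode([7, 7, 7, 7, 3, 3, 7]))
print(frame_delta([1, 2, 3, 4], [1, 5, 3, 4]))
```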
CBR: (constant bit rate): video encoding rate fixed
VBR: (variable bit rate): video encoding rate changes as amount of spatial, temporal coding changes → efficient, but affected by network conditions
Examples:
- MPEG 1 (CD-ROM) 1.5 Mbps
- MPEG2 (DVD) 3-6 Mbps
- MPEG4 (often used in Internet, 64Kbps – 12 Mbps)
Can create multiple versions of the same video, each at a different quality level. Users can then decide which version they want to watch as a function of their current available bandwidth.

Streaming stored video

Main challenges:
- server-to-client bandwidth will vary over time, with changing network congestion levels (in house, access network, network core, video server)
- packet loss, delay due to congestion will delay playout, or result in poor video quality

challenges

continuous playout constraint: during client video playout, playout timing must match original timing
- ... but network delays are variable (jitter), so will need client-side buffer to match continuous playout constraint
other challenges:
- client interactivity: pause, fast-forward, rewind, jump through video
- video packets may be lost, retransmitted
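A sketch of why a client-side buffer (playout delay) helps with jitter. Assumptions: packets are sent every `interval` seconds, packet i arrives at i*interval + delays[i], and playout starts `playout_delay` after the first arrival and then runs at the original constant rate. The delay values are made up.

```python
def late_packets(delays: list[float], interval: float, playout_delay: float) -> list[int]:
    """Indices of packets that arrive after their scheduled playout time."""
    start = delays[0] + playout_delay              # playout begins after buffering
    late = []
    for i, d in enumerate(delays):
        arrival = i * interval + d                 # network delay varies (jitter)
        due = start + i * interval                 # continuous playout at the original rate
        if arrival > due:
            late.append(i)
    return late


jittery = [0.10, 0.12, 0.30, 0.11, 0.25]           # assumed per-packet delays (s)
print(late_packets(jittery, 0.033, 0.05))           # short buffer
print(late_packets(jittery, 0.033, 0.25))           # longer buffer
```

A longer playout delay absorbs more jitter (no late packets here) at the cost of a longer wait before playback starts.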

playout buffering

Streaming over HTTP

→ TCP
→ Problem: every client receives the same encoded video, even though their available bandwidth differs.
To solve this, DASH (Dynamic Adaptive Streaming over HTTP), an HTTP-based streaming scheme, was developed.

Streaming multimedia: DASH

DASH: Dynamic, Adaptive Streaming over HTTP
Server:
- divides video file into multiple chunks
- each chunk stored, encoded at different rates
  - different rate encodings stored in different files
  - files replicated in various CDN nodes
- manifest file: provides URLs for different chunks
Client:
- First requests the manifest file.
  - In MPEG-DASH, it is called an MPD (Media Presentation Description)
- periodically measures server-to-client bandwidth
- consulting the manifest, requests one chunk at a time
  - chooses maximum coding rate sustainable given current bandwidth
  - can choose different coding rates at different points in time (depending on available bandwidth at time)
“intelligence” at client: client determines → this keeps the server load from growing as the number of clients increases
- when to request chunk (so that buffer starvation, or overflow does not occur)
- what encoding rate to request (higher quality when more bandwidth available)
- where to request chunk (can request from URL server that is “close” to client or has high available bandwidth)
- Streaming video = encoding + DASH + playout buffering
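A sketch of the client's "what encoding rate to request" decision: pick the highest rate listed in the manifest that the measured bandwidth can sustain. The rate list is hypothetical, and the 0.8 safety factor is an assumed heuristic; real DASH players use a variety of rate-adaptation algorithms.

```python
def choose_bitrate(available_kbps: list[int], measured_kbps: float,
                   safety: float = 0.8) -> int:
    """Highest encoding rate sustainable at the measured bandwidth (with headroom)."""
    budget = measured_kbps * safety      # leave room for bandwidth fluctuation
    sustainable = [r for r in available_kbps if r <= budget]
    return max(sustainable) if sustainable else min(available_kbps)


rates = [300, 750, 1500, 3000, 6000]     # hypothetical rates listed in the manifest
for bw in (500, 2500, 9000):             # bandwidth measured before each chunk request
    print(bw, "kbps ->", choose_bitrate(rates, bw))
```

Because the decision is repeated per chunk, the quality can change over time as the measured bandwidth changes.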

Content distribution networks (CDNs)

challenge: how to stream content (selected from millions of videos) to hundreds of millions of simultaneous users?
option 1: a single, massive data center
- long (and possibly congested) path to distant clients
- A popular video will likely be sent many times over the same communication links → link bandwidth usage ⬆️
- single point of failure: the data center or its link to the Internet → centralized
- ....quite simply: this solution doesn’t scale
Option 2: store/serve multiple copies of videos at multiple geographically distributed sites (CDN)
- Types of CDNs
  - A private CDN: owned by the content provider itself
    - Google’s CDN
  - A third-party CDN: distributes content on behalf of multiple content providers
    - Akamai, Limelight
Server placement strategies in CDNs
- enter deep: push CDN servers deep into many access networks
  - close to users by deploying server clusters in access ISPs all over the world.
  - improve user-perceived delay and throughput by decreasing the number of links and routers between the end user and the CDN server
  - Akamai: 240,000 servers deployed in > 120 countries (2015)
    → requires a very large number of servers
- bring home: → a small number of servers
  - Instead of getting inside the access ISPs, these CDNs typically place their clusters in Internet Exchange Points (IXPs)
  - lower maintenance and management overhead → cheap
  - used by Limelight and many other CDNs

CDN Operation

Cluster Selection Strategies → both have issues

Strategy 1: based on geographical distance
- When a DNS request is received from a particular LDNS(→ local DNS), the CDN authoritative DNS chooses the geographically closest cluster from the LDNS
- Issues
  - Some end-users are configured to use remotely located LDNSs.
    → the client and its LDNS may be far apart
  - This strategy ignores the variation in delay and available bandwidth over time of Internet paths, always assigning the same cluster to a particular client.
Strategy 2: based on periodic real-time measurement
- perform periodic real-time measurements of delay and loss performance between their clusters and LDNS.
  - For instance, a CDN can have each of its clusters periodically send probes (for example, ping messages or DNS queries) to all of the LDNSs around the world
- Selects a cluster which has the lowest RTT (→ a fast path) while considering the load balance (→ so that load does not pile up on one cluster)
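A sketch of strategy 2: among clusters that are not overloaded, pick the one with the lowest measured RTT to the LDNS. Cluster names, RTT values, load figures, and the 0.8 load threshold are all hypothetical.

```python
def select_cluster(rtt_ms: dict[str, float], load: dict[str, float],
                   max_load: float = 0.8) -> str:
    """Lowest-RTT cluster among those below max_load (load = utilization 0..1)."""
    candidates = [c for c in rtt_ms if load.get(c, 0.0) < max_load]
    if not candidates:                 # every cluster is busy: fall back to lowest RTT
        candidates = list(rtt_ms)
    return min(candidates, key=rtt_ms.__getitem__)


rtt = {"seoul": 5.0, "tokyo": 30.0, "la": 120.0}   # periodic probe results (ms)
load = {"seoul": 0.95, "tokyo": 0.4, "la": 0.1}
print(select_cluster(rtt, load))
```

Here the nearest cluster is skipped because it is overloaded, which is the load-balancing consideration above.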

Case Study: Netflix

Use the Amazon cloud and its own private CDN
- Amazon cloud
  - The Web site (and its associated backend databases) run entirely on Amazon servers in the Amazon cloud
    - user registration and login, billing,
    - movie catalogue for browsing and searching, and a movie recommendation system
    - Process the movies: create many different formats for each movie, suitable for a diverse array of client video players
    - Upload the versions to its CDN
- Its own private CDN
  - Distributes only video
CDN: stores copies of content (e.g. MADMEN) at CDN nodes
- Push caching on a day-to-day basis
subscriber requests content, service provider returns manifest
- using manifest, client retrieves content at highest supportable rate
- may choose different rate or copy if network path congested

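This post was written after taking Prof. Jongwon Lee's (이종원) 23-2 Computer Networks course at the HGU School of Computer Science and Electrical Engineering, and all attached images are the professor's original lecture slides with my handwritten notes added.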
