CS:APP Chapter 11 Summary 🌐
Recently, I’ve been studying CS:APP - I’m posting my own summary of chapter 11 that I wrote up using Notion.
11.1 The Client-Server Programming Model
Every network application is based on the client-server model:
- An application consists of a server process and one or more client processes.
- A server manages some resource, and it provides some service for its clients by manipulating that resource.
- A client-server transaction consists of four steps:
- A client initiates a transaction by sending a request to the server. (example : A Web browser sends a request to a Web server.)
- The server receives the request, interprets it, and manipulates its resources in the appropriate way. (example : When a Web server receives a request, it reads a disk file.)
- The server sends a response to the client and then waits for the next request. (example : A Web server sends the file back to a client.)
- The client receives the response and manipulates it. (example : After a Web browser receives a page from the server, it displays it on the screen.)
- A single host can run many different clients and servers concurrently.
11.2 Networks
LAN and Ethernet
An adapter plugged into an expansion slot on the I/O bus provides the physical interface to the network.
Data received from the network are copied from the adapter across the I/O and memory buses into memory, typically by a DMA transfer.
Physically, a network is a hierarchical system that is organized by geographical proximity. At the lowest level is a LAN (local area network) that spans a building or a campus. (example : Ethernet)
- Ethernet segment
- consists of some wires and a small box called a hub.
- A hub copies every bit that it receives on each port to every other port, so every host sees every bit.
- One end of each wire is attached to an adapter on a host, and the other end is attached to a port on the hub.
- Each Ethernet adapter has a globally unique 48-bit address stored in a nonvolatile memory on the adapter.
- A host can send a chunk of bits called a frame to any other host on the segment. Every host adapter sees the frame, but only the destination host actually reads it.
Each frame includes:
- a fixed number of header bits that identify the source and destination of the frame
- the frame length
- a payload of data bits.
- Bridged Ethernet
- Multiple Ethernet segments can be connected into larger LANs, called bridged Ethernets, using a set of wires and small boxes called bridges.
- Bridges automatically learn over time which hosts are reachable from which ports, and then selectively copy frames from one port to another only when it’s necessary. → can make better use of the available wire bandwidth than hubs.
- Router
- At a higher level in the hierarchy, multiple incompatible LANs can be connected by specialized computers called routers to form an internet.
- Routers can also connect to networks known as WANs (wide area networks), which can span larger geographical areas than LANs.
- In general, routers can be used to build internets from arbitrary collections of LANs and WANs.
- Protocol software
- Protocol software running on each host and router smoothes out the differences between the different networks.
- Protocol software implements a protocol that governs how hosts and routers cooperate in order to transfer data.
- The protocol must provide a naming scheme:
- The internet protocol defines a uniform format for host addresses.
- Then, each host is assigned at least one of these internet addresses that uniquely identifies it.
- The protocol must provide a delivery mechanism:
- The internet protocol defines a uniform way to bundle up data bits into discrete chunks called packets.
- A packet consists of:
- a header which contains the packet size and addresses of the source and destination hosts.
- a payload which contains data bits sent from the source host.
- Example - Hosts and routers using the internet to transfer data
- The client on host A invokes a system call that copies the data from the client’s virtual address space (VAS) into a kernel buffer.
- The protocol software on host A creates a LAN1 frame by prepending an internet header and a LAN1 frame header to the data (encapsulation):
- The internet header is addressed to internet host B.
- The LAN1 frame header is addressed to the router.
- The LAN1 adapter copies the frame to the network.
- The frame reaches the router → the router’s LAN1 adapter reads it from the wire and passes it to the protocol software.
- The router creates a new LAN2 frame:
- The router fetches the destination internet address from the internet packet header.
- The router uses this as an index into a routing table to determine where to forward the packet. (LAN2)
- The router strips off the old LAN1 frame header and prepends a new LAN2 frame header addressed to host B.
- The router passes the resulting frame to the adapter.
- The router’s LAN2 adapter copies the frame to the network.
- The frame reaches host B → its adapter reads the frame from the wire and passes it to the protocol software.
- The protocol software on host B strips off the packet header and frame header.
- The protocol software copies the resulting data into the server’s VAS when the server invokes a system call that reads the data.
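The encapsulation steps above can be sketched in C. This is only an illustration: the header layouts `frame_hdr` and `inet_hdr` and the helpers `encapsulate` and `reframe` are hypothetical and much simpler than real Ethernet or IP formats.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified headers (real Ethernet/IP headers have more fields). */
struct frame_hdr { uint8_t dst[6]; uint8_t src[6]; uint16_t len; };
struct inet_hdr  { uint32_t src; uint32_t dst; uint16_t size; };

/* Host side: build [frame header][internet header][payload] in out.
 * Returns the total frame length, or 0 if out is too small. */
size_t encapsulate(uint8_t *out, size_t outlen,
                   const struct frame_hdr *fh, const struct inet_hdr *ih,
                   const void *data, size_t datalen)
{
    size_t total = sizeof(*fh) + sizeof(*ih) + datalen;
    if (total > outlen)
        return 0;
    memcpy(out, fh, sizeof(*fh));
    memcpy(out + sizeof(*fh), ih, sizeof(*ih));
    memcpy(out + sizeof(*fh) + sizeof(*ih), data, datalen);
    return total;
}

/* Router side: the internet packet is left untouched; only the frame
 * header in front of it is replaced with one for the next LAN. */
void reframe(uint8_t *frame, const struct frame_hdr *new_fh)
{
    memcpy(frame, new_fh, sizeof(*new_fh));
}
```

In this sketch the internet header is addressed to host B the whole way, while the frame header is rewritten at each hop, which mirrors the LAN1 to LAN2 hand-off described above.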
11.3 The Global IP Internet
The Global IP Internet
The global IP Internet is the most famous and successful implementation of an internet.
Each Internet host runs software that implements the TCP/IP protocol, which is a family of protocols, each of which contributes different capabilities.
The Internet is a worldwide collection of hosts with the following properties:
- The set of hosts is mapped to a set of 32-bit IP addresses.
- The set of IP addresses is mapped to a set of identifiers called Internet domain names.
- A process on one Internet host can communicate with a process on any other Internet host over a connection.
IP Addresses - htonl, ntohl, inet_pton, inet_ntop
An IP address is an unsigned 32-bit integer. Network programs store IP addresses in the IP address structure.
    /* IP address structure */
    struct in_addr {
        uint32_t s_addr; /* Address in network byte order (big-endian) */
    };
- TCP/IP defines a uniform network byte order (big-endian byte order) for any integer data item like an IP address.
- Unix provides some functions for converting between network and host byte order:
    #include <arpa/inet.h>

    uint32_t htonl(uint32_t hostlong);
    uint16_t htons(uint16_t hostshort);
    uint32_t ntohl(uint32_t netlong);
    uint16_t ntohs(uint16_t netshort);
- htonl converts an unsigned 32-bit integer from host byte order to network byte order.
- ntohl converts an unsigned 32-bit integer from network byte order to host byte order.
- htons and ntohs perform the corresponding conversions for unsigned 16-bit integers.
- IP addresses are presented to humans in dotted-decimal notation, where each byte is represented by its decimal value and separated from the other bytes by a period.
(example : 0x8002c2f2 → 128.2.194.242)
- Application programs can convert IP addresses ↔ dotted-decimal strings using inet_pton and inet_ntop.
    #include <arpa/inet.h>

    int inet_pton(AF_INET, const char *src, void *dst);
    const char *inet_ntop(AF_INET, const void *src, char *dst, socklen_t size);
- inet_pton converts a dotted-decimal string (src) to a binary IP address in network byte order (dst).
- inet_ntop converts a binary IP address in network byte order (src) to the corresponding dotted-decimal representation and copies at most size bytes of the resulting null-terminated string to dst.
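To make the byte-order and conversion functions above concrete, here is a small sketch. The helper names (`network_bytes`, `dotted_to_host`, `host_to_dotted`) are made up for illustration; only `htonl`, `ntohl`, `inet_pton`, and `inet_ntop` are real library calls.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Store the in-memory bytes of htonl(host_value) in out[0..3].
 * Regardless of host endianness, out[0] is the most significant byte. */
void network_bytes(uint32_t host_value, unsigned char out[4])
{
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof(net));
}

/* Dotted-decimal string -> 32-bit address in host byte order.
 * Returns 0 on success, -1 if s is not a valid IPv4 address. */
int dotted_to_host(const char *s, uint32_t *out)
{
    struct in_addr a;
    if (inet_pton(AF_INET, s, &a) != 1)
        return -1;
    *out = ntohl(a.s_addr);
    return 0;
}

/* 32-bit address in host byte order -> dotted-decimal string in buf.
 * Returns 0 on success, -1 on error (for example, buf too small). */
int host_to_dotted(uint32_t host_value, char *buf, socklen_t len)
{
    struct in_addr a;
    a.s_addr = htonl(host_value);
    return inet_ntop(AF_INET, &a, buf, len) != NULL ? 0 : -1;
}
```

For example, `dotted_to_host("128.2.194.242", &v)` stores 0x8002c2f2 in `v`, and `host_to_dotted(0x8002c2f2, buf, sizeof(buf))` writes the same string back.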
Internet Domain Names
The Internet defines a set of human-friendly domain names and a mechanism that maps the set of domain names to the set of IP addresses.
- The set of domain names forms a hierarchy, and each domain name encodes its position in the hierarchy.
- The hierarchy is represented as a tree:
- The first level in the hierarchy is an unnamed root node.
- The second level is a collection of first-level domain names that are defined by a nonprofit organization called ICANN. (example : com, edu, gov, org, and net)
- The third level consists of second-level domain names, which are assigned by various authorized agents of ICANN. (example : cmu.edu)
- Once an organization has received a second-level domain name, it’s free to create any other new domain name within its subdomain. (example : cs.cmu.edu)
- The mapping between the set of domain names and the set of IP addresses has been maintained in a distributed worldwide database known as DNS.
- The Linux NSLOOKUP program displays the IP addresses associated with a domain name.
- Each Internet host has the locally defined domain name localhost, which always maps to the loopback address 127.0.0.1:
    linux> nslookup localhost
    → Address: 127.0.0.1
- We can use HOSTNAME to determine the real domain name of our local host:
    linux> hostname
    → whaleshark.ics.cs.cmu.edu
- In the most general case, multiple domain names are mapped to the same set of multiple IP addresses.
Internet Connections
- Connection
- Internet clients and servers communicate by sending and receiving streams of bytes over connections.
- A connection is point-to-point : it connects a pair of processes.
- A connection is full duplex : data can flow in both directions at the same time.
- A connection is reliable : the stream of bytes sent by the source process is eventually received by the destination process in the same order it was sent.
- Socket
- A socket is an end point of a connection.
- Each socket has a socket address that consists of an Internet address and a 16-bit integer port, denoted by the notation address:port.
- The port in the client’s socket address is assigned automatically by the kernel when the client makes a connection request. (ephemeral port)
- The port in the server’s socket address is typically a well-known port that is permanently associated with the service. (example : Web servers - 80, email servers - 25)
- A well-known service name is associated with each service that has a well-known port. (example : http - Web service, smtp - email)
- Identifying a connection
- A connection is uniquely identified by the socket addresses of its two end points, known as a socket pair: (cliaddr:cliport, servaddr:servport)
(example : (128.2.194.242:51213, 208.216.181.15:80))
11.4 The Sockets Interface
The sockets interface is a set of functions that are used with the Unix I/O functions to build network applications.
Socket Address Structures
Internet socket addresses are stored in 16-byte structures having the type sockaddr_in:
    /* IP socket address structure */
    struct sockaddr_in {
        uint16_t sin_family;       /* Protocol family (always AF_INET) */
        uint16_t sin_port;         /* Port number in network byte order */
        struct in_addr sin_addr;   /* IP address in network byte order */
        unsigned char sin_zero[8]; /* Pad to sizeof(struct sockaddr) */
    };

    /* Generic socket address structure (for connect, bind, and accept) */
    struct sockaddr {
        uint16_t sa_family; /* Protocol family */
        char sa_data[14];   /* Address data */
    };
- For Internet applications, the sin_family field is always AF_INET, the sin_port field is a 16-bit port number in network byte order, and the sin_addr field contains a 32-bit IP address in network byte order.
- connect, bind, and accept expect a pointer to a generic sockaddr structure → applications are required to cast any pointers to protocol-specific structures to the generic structure.
- We define the following type:
    typedef struct sockaddr SA;
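As a rough illustration of how these fields are filled in by hand, the sketch below builds an IPv4 socket address. The helper `make_addr` is hypothetical (not part of csapp.h or the sockets API); in practice `getaddrinfo` does this work for us.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

typedef struct sockaddr SA;

/* Fill *sa from a dotted-decimal string and a port in host byte order.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int make_addr(struct sockaddr_in *sa, const char *ip, uint16_t port)
{
    memset(sa, 0, sizeof(*sa));      /* Also zeroes the sin_zero padding */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);      /* Port must be in network byte order */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}
```

A caller would then pass `(SA *)&addr` together with `sizeof(addr)` to `connect` or `bind`, which is exactly the cast described above.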
The socket Function
Clients and servers use the socket function to create a socket descriptor.
    #include <sys/types.h>
    #include <sys/socket.h>

    int socket(int domain, int type, int protocol);
- If we want the socket to be the end point for a connection, we can call socket with hardcoded arguments:
    clientfd = Socket(AF_INET, SOCK_STREAM, 0);
- AF_INET indicates that we are using 32-bit IP addresses.
- SOCK_STREAM indicates that the socket will be an end point for a connection.
- The best practice is to use getaddrinfo to generate these arguments automatically.
- The clientfd descriptor returned by socket is only partially opened and can’t be used for reading and writing yet.
The connect Function
A client establishes a connection with a server by calling connect.
    #include <sys/socket.h>

    int connect(int clientfd, const struct sockaddr *addr, socklen_t addrlen);
- connect attempts to establish an Internet connection with the server at socket address addr, where addrlen is sizeof(sockaddr_in).
- connect blocks until either the connection is successfully established or an error occurs.
- If successful, clientfd is ready for reading and writing, and the resulting connection is characterized by the socket pair (x:y, addr.sin_addr:addr.sin_port), where x is the client’s IP address and y is the client’s ephemeral port.
- The best practice is to use getaddrinfo to generate these arguments automatically.
The bind Function
bind, listen, and accept are used by servers to establish connections with clients.
    #include <sys/socket.h>

    int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
- bind asks the kernel to associate the server’s socket address in addr with the socket descriptor sockfd.
- The best practice is to use getaddrinfo to generate these arguments automatically.
The listen Function
    #include <sys/socket.h>

    int listen(int sockfd, int backlog); /* Returns 0 if OK, -1 on error */
- Clients are active entities that initiate connection requests. ↔ Servers are passive entities that wait for connection requests from clients.
- The kernel assumes that a descriptor created by socket corresponds to an active socket that will live on the client end of a connection. → A server calls listen to tell the kernel that the descriptor will be used by a server.
- listen converts sockfd from an active socket to a listening socket that can accept connection requests from clients.
The accept Function
Servers wait for connection requests from clients by calling the accept function.
    #include <sys/socket.h>

    int accept(int listenfd, struct sockaddr *addr, socklen_t *addrlen);
- accept waits for a connection request from a client to arrive on the listening descriptor listenfd, then fills in the client’s socket address in addr, and returns a connected descriptor that can be used to communicate with the client using Unix I/O functions.
- Listening descriptor vs. Connected descriptor
- The listening descriptor serves as an end point for client connection requests, is created once, and exists for the lifetime of the server.
- The connected descriptor is the end point of the connection that is established between the client and the server, is created each time the server accepts a connection request, and exists only as long as it takes the server to service a client.
- The two kinds of descriptors are distinguished so that we can build concurrent servers that process many client connections simultaneously.
Host and Service Conversion - getaddrinfo, getnameinfo
getaddrinfo converts string representations of hostnames, host addresses, service names, and port numbers into socket address structures.
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>

    int getaddrinfo(const char *host, const char *service,
                    const struct addrinfo *hints,
                    struct addrinfo **result);
    void freeaddrinfo(struct addrinfo *result);
    const char *gai_strerror(int errcode);
- Given host and service, getaddrinfo returns a result that points to a linked list of addrinfo structures, each of which points to a socket address structure that corresponds to host and service.
    struct addrinfo {
        int ai_flags;             /* Hints argument flags */
        int ai_family;            /* First arg to socket function */
        int ai_socktype;          /* Second arg to socket function */
        int ai_protocol;          /* Third arg to socket function */
        char *ai_canonname;       /* Canonical hostname */
        size_t ai_addrlen;        /* Size of ai_addr struct */
        struct sockaddr *ai_addr; /* Ptr to socket address structure */
        struct addrinfo *ai_next; /* Ptr to next item in linked list */
    };
- To avoid memory leaks, the application must free the list by calling freeaddrinfo after using the result.
- The host argument can be either a domain name or a numeric address (example : a dotted-decimal IP address).
- The service argument can be either a service name (example : http) or a decimal port number.
- If we’re not interested in converting the hostname to an address, we can set host to NULL. The same holds for service, but at least one of them must be specified.
- hints is an addrinfo structure that provides finer control over the list of addrinfo structures that getaddrinfo returns:
- When passed as a hints argument, only the ai_family, ai_socktype, ai_protocol, and ai_flags fields can be set, while the other fields must be set to zero. In practice, we use memset to zero the entire structure and then set a few selected fields.
- ai_family
- By default, getaddrinfo returns both IPv4 and IPv6 socket addresses.
- set to AF_INET → restricts the list to IPv4 addresses.
- set to AF_INET6 → restricts the list to IPv6 addresses.
- ai_socktype
- By default, getaddrinfo can return up to 3 addrinfo structures for each unique address, each with a different ai_socktype field.
- set to SOCK_STREAM → restricts the list to at most one addrinfo structure for each unique address.
- ai_flags
- We can combine flags in ai_flags with bitwise OR to modify the default behavior.
- AI_ADDRCONFIG : asks getaddrinfo to return IPv4 addresses only if the local host is configured for IPv4. Similarly for IPv6.
- AI_CANONNAME : By default, ai_canonname is NULL. If this flag is set, getaddrinfo points the ai_canonname field in the first addrinfo structure to the canonical name of host.
- AI_NUMERICSERV : By default, service can be a service name or a port number. This flag forces the service argument to be a port number.
- AI_PASSIVE : By default, getaddrinfo returns socket addresses that can be used by clients as active sockets. This flag instructs it to return socket addresses that can be used by servers as listening sockets. In this case, host should be NULL. The address field in the resulting socket address structures will be the wildcard address, which tells the kernel that this server will accept requests to any of the IP addresses for this host.
- getaddrinfo fills in every field of each addrinfo structure except ai_flags:
- ai_addr points to a socket address structure.
- ai_addrlen gives the size of this socket address structure.
- ai_next points to the next addrinfo structure in the list.
- The other fields describe various attributes of the socket address.
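The effect of the hints fields can be checked with a short sketch. The helper name `passive_ipv4_addr` is invented for this example; it asks for a server-side IPv4 stream address on a numeric port and copies the first result, which should carry the wildcard address.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Look up a passive IPv4 stream address for a numeric port string and
 * copy the first list entry into *out.
 * Returns 0 on success, or the nonzero getaddrinfo error code
 * (printable with gai_strerror) on failure. */
int passive_ipv4_addr(const char *port, struct sockaddr_in *out)
{
    struct addrinfo hints, *listp;
    int rc;

    memset(&hints, 0, sizeof(hints));             /* Unused fields must be zero */
    hints.ai_family = AF_INET;                    /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;              /* One entry per address */
    hints.ai_flags = AI_PASSIVE | AI_NUMERICSERV; /* Wildcard address, numeric port */

    if ((rc = getaddrinfo(NULL, port, &hints, &listp)) != 0)
        return rc;

    memcpy(out, listp->ai_addr, sizeof(*out));    /* AF_INET entries are sockaddr_in */
    freeaddrinfo(listp);                          /* Avoid leaking the list */
    return 0;
}
```

With AI_NUMERICSERV set, a service name such as "http" is rejected, so this sketch only accepts port strings like "8080".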
getnameinfo converts a socket address structure to the corresponding host and service name strings.
    #include <sys/socket.h>
    #include <netdb.h>

    int getnameinfo(const struct sockaddr *sa, socklen_t salen,
                    char *host, size_t hostlen,
                    char *service, size_t servlen, int flags);
- getnameinfo converts the socket address structure sa to the corresponding host and service name strings and copies them to the host and service buffers.
- The sa argument points to a socket address structure of size salen bytes, host to a buffer of size hostlen bytes, and service to a buffer of size servlen bytes.
- If we don’t want the hostname, we can set host to NULL and hostlen to zero. The same holds for service, but at least one of them must be set.
- We can combine flags in flags with bitwise OR to modify the default behavior.
- NI_NUMERICHOST : By default, getnameinfo returns a domain name in host. This flag forces it to return a numeric address string instead.
- NI_NUMERICSERV : By default, getnameinfo returns a service name instead of a port number. This flag forces it to return the port number.
Example Code - HOSTINFO that displays the mapping of a domain name to its associated IP addresses
    #include "csapp.h"

    int main(int argc, char **argv) {
        struct addrinfo *p, *listp, hints;
        char buf[MAXLINE];
        int rc, flags;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <domain name>\n", argv[0]);
            exit(0);
        }

        /* Get a list of addrinfo records */
        memset(&hints, 0, sizeof(struct addrinfo));
        hints.ai_family = AF_INET;       /* IPv4 only */
        hints.ai_socktype = SOCK_STREAM; /* Connections only */
        if ((rc = getaddrinfo(argv[1], NULL, &hints, &listp)) != 0) {
            fprintf(stderr, "getaddrinfo error: %s\n", gai_strerror(rc));
            exit(1);
        }

        /* Walk the list and display each IP address */
        flags = NI_NUMERICHOST; /* Display address string instead of domain name */
        for (p = listp; p; p = p->ai_next) {
            Getnameinfo(p->ai_addr, p->ai_addrlen, buf, MAXLINE, NULL, 0, flags);
            printf("%s\n", buf);
        }

        /* Clean up */
        Freeaddrinfo(listp);

        exit(0);
    }
Helper Functions for the Sockets Interface - open_clientfd, open_listenfd
    int open_clientfd(char *hostname, char *port) {
        int clientfd;
        struct addrinfo hints, *listp, *p;

        /* Get a list of potential server addresses */
        memset(&hints, 0, sizeof(struct addrinfo));
        hints.ai_socktype = SOCK_STREAM; /* Open a connection */
        hints.ai_flags = AI_NUMERICSERV; /* ... using a numeric port arg. */
        hints.ai_flags |= AI_ADDRCONFIG; /* Recommended for connections */
        Getaddrinfo(hostname, port, &hints, &listp);

        /* Walk the list for one that we can successfully connect to */
        for (p = listp; p; p = p->ai_next) {
            /* Create a socket descriptor */
            if ((clientfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
                continue; /* Socket failed, try the next */

            /* Connect to the server */
            if (connect(clientfd, p->ai_addr, p->ai_addrlen) != -1)
                break; /* Success */
            Close(clientfd); /* Connect failed, try another */
        }

        /* Clean up */
        Freeaddrinfo(listp);
        if (!p) /* All connects failed */
            return -1;
        else    /* The last connect succeeded */
            return clientfd;
    }
- open_clientfd establishes a connection with a server running on host hostname and listening for connection requests on port number port.
- open_clientfd returns an open socket descriptor that is ready for input and output using the Unix I/O functions.
- The arguments to socket and connect are generated automatically by getaddrinfo → There is no dependence on any particular version of IP.
    int open_listenfd(char *port) {
        struct addrinfo hints, *listp, *p;
        int listenfd, optval = 1;

        /* Get a list of potential server addresses */
        memset(&hints, 0, sizeof(struct addrinfo));
        hints.ai_socktype = SOCK_STREAM;             /* Accept connections */
        hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; /* ... on any IP address */
        hints.ai_flags |= AI_NUMERICSERV;            /* ... using port number */
        Getaddrinfo(NULL, port, &hints, &listp);

        /* Walk the list for one that we can bind to */
        for (p = listp; p; p = p->ai_next) {
            /* Create a socket descriptor */
            if ((listenfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
                continue; /* Socket failed, try the next */

            /* Eliminates "Address already in use" error from bind */
            Setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
                       (const void *)&optval, sizeof(int));

            /* Bind the descriptor to the address */
            if (bind(listenfd, p->ai_addr, p->ai_addrlen) == 0)
                break; /* Success */
            Close(listenfd); /* Bind failed, try the next */
        }

        /* Clean up */
        Freeaddrinfo(listp);
        if (!p) /* No address worked */
            return -1;

        /* Make it a listening socket ready to accept connection requests */
        if (listen(listenfd, LISTENQ) < 0) {
            Close(listenfd);
            return -1;
        }
        return listenfd;
    }
- open_listenfd returns a listening descriptor that is ready to receive connection requests on port port.
- We have called getaddrinfo with the AI_PASSIVE flag and a NULL host argument → the address field in each socket address structure is set to the wildcard address, which tells the kernel that this server will accept requests to any of the IP addresses for this host.
Example Echo Client and Server
Echo client main routine
    #include "csapp.h"

    int main(int argc, char **argv) {
        int clientfd;
        char *host, *port, buf[MAXLINE];
        rio_t rio;

        if (argc != 3) {
            fprintf(stderr, "usage: %s <host> <port>\n", argv[0]);
            exit(0);
        }
        host = argv[1];
        port = argv[2];

        clientfd = Open_clientfd(host, port);
        Rio_readinitb(&rio, clientfd);

        /* Send each input line to the server and print the echoed reply */
        while (Fgets(buf, MAXLINE, stdin) != NULL) {
            Rio_writen(clientfd, buf, strlen(buf));
            Rio_readlineb(&rio, buf, MAXLINE);
            Fputs(buf, stdout);
        }
        Close(clientfd);
        exit(0);
    }
- The loop terminates when fgets encounters EOF on standard input, either because the user typed Ctrl+D or because it has exhausted the text lines in a redirected input file.
- After the loop terminates, the client closes the descriptor. → EOF notification is sent to the server.
Iterative echo server main routine
    #include "csapp.h"

    void echo(int connfd) {
        size_t n;
        char buf[MAXLINE];
        rio_t rio;

        Rio_readinitb(&rio, connfd);
        /* Echo lines back until the client closes its end (return of 0) */
        while ((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0) {
            printf("server received %d bytes\n", (int)n);
            Rio_writen(connfd, buf, n);
        }
    }

    int main(int argc, char **argv) {
        int listenfd, connfd;
        socklen_t clientlen;
        struct sockaddr_storage clientaddr; /* Enough space for any address */
        char client_hostname[MAXLINE], client_port[MAXLINE];

        if (argc != 2) {
            fprintf(stderr, "usage: %s <port>\n", argv[0]);
            exit(0);
        }

        listenfd = Open_listenfd(argv[1]);
        while (1) {
            clientlen = sizeof(struct sockaddr_storage);
            connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
            Getnameinfo((SA *)&clientaddr, clientlen, client_hostname, MAXLINE,
                        client_port, MAXLINE, 0);
            printf("Connected to (%s, %s)\n", client_hostname, client_port);
            echo(connfd);
            Close(connfd);
        }
        exit(0);
    }
- The sockaddr_storage structure, clientaddr’s type, is large enough to hold any type of socket address. → The code is protocol-independent.
- This server can only handle one client at a time. → A server of this type that iterates through clients one at a time is called an iterative server.
- The client’s EOF notification is detected when the server receives a return of zero from rio_readlineb in echo.
Additional Section - EOF in several contexts
There is no such thing as an EOF character - EOF is a condition that is detected by the kernel.
- An application finds out about the EOF condition when it receives a zero return code from the read function.
- For disk files, EOF occurs when the current file position exceeds the file length.
- For Internet connections, EOF occurs when a process closes its end of the connection. The process at the other end detects the EOF when it attempts to read past the last byte in the stream.
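The zero return from read can be observed without a network. The sketch below uses a pipe as a stand-in for a connection (an assumption made for simplicity): once the write end is closed, the reader sees EOF after consuming the remaining bytes. The helper name `count_until_eof` is made up for this example.

```c
#include <unistd.h>

/* Read from fd until read() returns 0, which signals EOF.
 * Returns the total number of bytes read, or -1 on a read error. */
ssize_t count_until_eof(int fd)
{
    char buf[256];
    ssize_t n, total = 0;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;              /* Data was available */
    return n < 0 ? -1 : total;   /* n == 0 means EOF, n < 0 means error */
}
```

With a pipe, writing 5 bytes and then closing the write end makes this function return 5; there is no special EOF byte in the stream, only the final `read` that returns zero.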
11.5 Web Servers
Web Basics
Web clients and servers interact using a text-based application-level protocol known as HTTP (hypertext transfer protocol):
- A Web client (= browser) opens an Internet connection to a server and requests some content.
- The server responds with the requested content and then closes the connection.
- The browser reads the content and displays it on the screen.
Web services differ from conventional file retrieval services in that Web content can be written in a language known as HTML.
Web Content
To Web clients and servers, content is a sequence of bytes with an associated MIME (multipurpose internet mail extensions) type.
Web servers provide content to clients in two different ways:
- Serving static content : Fetch a disk file (static content) and return its contents to the client.
- Serving dynamic content : Run an executable and return its output (dynamic content) to the client.
Each file returned by a Web server has a unique name known as a URL (uniform resource locator).
- http://www.google.com:80/index.html identifies an HTML file called /index.html on Internet host www.google.com that is managed by a Web server listening on port 80.
- The port number is optional and defaults to the HTTP port 80.
- URLs for executable files can include program arguments after the filename: ? separates the filename from the arguments, and the arguments are separated from each other by &.
- Clients and servers use different parts of the URL during a transaction:
- A client uses the prefix http://www.google.com:80 to determine what kind of server to contact, where the server is, and what port it is listening on.
- A server uses the suffix /index.html to find the file on its filesystem and to determine whether the request is for static or dynamic content.
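The way a server might separate the filename from the arguments in the URL suffix can be sketched as follows. `split_uri` is a hypothetical helper written for this summary, not TINY’s parse_uri, and it assumes both output buffers have a nonzero size.

```c
#include <stdio.h>
#include <string.h>

/* Split a URI such as "/cgi-bin/adder?15000&213" at the first '?'.
 * filename receives the part before '?', args the part after it
 * (empty if there is no '?'). Long inputs are truncated to fit.
 * Returns 1 if the URI carries arguments, 0 otherwise. */
int split_uri(const char *uri, char *filename, size_t fnlen,
              char *args, size_t arglen)
{
    const char *q = strchr(uri, '?');
    size_t n = q ? (size_t)(q - uri) : strlen(uri);

    if (n >= fnlen)
        n = fnlen - 1;                      /* Leave room for the terminator */
    memcpy(filename, uri, n);
    filename[n] = '\0';

    if (q == NULL) {
        args[0] = '\0';
        return 0;
    }
    snprintf(args, arglen, "%s", q + 1);    /* Everything after '?' */
    return 1;
}
```

For "/cgi-bin/adder?15000&213" this yields the filename "/cgi-bin/adder" and the argument string "15000&213"; for "/index.html" the argument string is empty.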
HTTP Transactions
We can use the Linux TELNET program to conduct transactions with any Web server on the Internet.
- Line 1 : We run TELNET and ask it to open a connection to the AOL Web server.
- Line 2 ~ 4 : TELNET opens the connection, and waits for us to enter text.
- Line 5 ~ 7 : We enter an HTTP request.
- Each time we enter a text line and hit the enter key, TELNET reads the line, appends carriage return and line feed characters (’\r\n’), and sends the line to the server.
- Line 8 ~ 17 : The server replies with an HTTP response.
- Line 18 : The server closes the connection.
An HTTP request consists of a request line, zero or more request headers, and an empty text line that terminates the list of headers.
- A request line
- A request line has the form method URI version.
- method : HTTP supports a number of different methods, including GET, POST, OPTIONS, HEAD, PUT, DELETE, and TRACE.
- URI (uniform resource identifier) : the suffix of the corresponding URL that includes the filename and optional arguments.
- version : indicates the HTTP version to which the request conforms.
- Request headers
- Request headers provide additional information to the server. (example : brand name of the browser, the MIME types that the browser understands, …)
- Request headers have the form header-name: header-data. (example : Host: www.aol.com)
- The empty text line
- The empty text line terminates the headers and instructs the server to send the requested HTML file.
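A request line of the form method URI version can be pulled apart with sscanf, roughly as below. The function name and the fixed buffer sizes are choices made for this sketch, not taken from the book.

```c
#include <stdio.h>

/* Parse an HTTP request line of the form "method URI version".
 * Each field is bounded by its buffer size; the trailing "\r\n" is
 * treated as whitespace and ignored.
 * Returns 0 on success, -1 if fewer than three fields are present. */
int parse_request_line(const char *line,
                       char method[16], char uri[256], char version[16])
{
    return sscanf(line, "%15s %255s %15s", method, uri, version) == 3 ? 0 : -1;
}
```

Given "GET / HTTP/1.1\r\n", the three buffers receive "GET", "/", and "HTTP/1.1".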
An HTTP response consists of a response line, zero or more response headers, an empty line that terminates the headers, and the response body.
- A response line
- A response line has the form version status-code status-message.
- version : describes the HTTP version that the response conforms to.
- status-code : a three-digit positive integer that indicates the disposition of the request.
- status-message : the English equivalent of the status code.
- Response headers
- Response headers provide additional information about the response. (example : Content-Type (the MIME type of the content), Content-Length (the content’s size in bytes))
- The empty text line
- The empty text line terminates the headers, followed by the response body.
- The response body
- The response body contains the requested content.
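Putting the four parts of a response together might look like the sketch below. `build_response` is a hypothetical helper, and the fixed "HTTP/1.0 200 OK" response line is an assumption for the example; a real server would choose the status code per request.

```c
#include <stdio.h>
#include <string.h>

/* Format a complete response into out: response line, two response
 * headers, the empty line that ends the headers, then the body.
 * Returns the number of characters the full response needs (as snprintf
 * does), so a value >= outlen means the output was truncated. */
int build_response(char *out, size_t outlen, const char *type, const char *body)
{
    return snprintf(out, outlen,
                    "HTTP/1.0 200 OK\r\n"
                    "Content-Type: %s\r\n"
                    "Content-Length: %zu\r\n"
                    "\r\n"
                    "%s",
                    type, strlen(body), body);
}
```

Note that Content-Length counts only the bytes of the body, not the response line or the headers.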
Serving Dynamic Content
- Client passing program arguments to the server
- Arguments for GET requests are passed in the URI. (example : GET /cgi-bin/adder?15000&213 HTTP)
- Special characters like spaces must be represented with special encoding. (example : %20 for space)
- Server passing arguments to the child
- The server receives the request : GET /cgi-bin/adder?15000&213 HTTP
- The server calls fork to create a child process.
- The child process sets the CGI environment variable QUERY_STRING to 15000&213.
- The child calls execve to run the /cgi-bin/adder program in the context of the child.
- The adder program can reference the arguments using the Linux getenv function.
- The server can also pass other information to the child process using other CGI environment variables.
- Child sending its output to the client
- A CGI program sends its dynamic content to the standard output.
- The child process uses the Linux dup2 function to redirect standard output to the connected descriptor before loading and running the CGI program.
- The child is responsible for generating the Content-type and Content-length response headers, and the empty line that terminates the headers.
- Example code: a simple CGI program, adder

    #include "csapp.h"

    int main(void)
    {
        char *buf, *p;
        char arg1[MAXLINE], arg2[MAXLINE], content[MAXLINE];
        int n1 = 0, n2 = 0;

        /* Extract the two arguments from QUERY_STRING ("n1&n2") */
        if ((buf = getenv("QUERY_STRING")) != NULL) {
            p = strchr(buf, '&');
            if (p != NULL) {        /* skip parsing if there is no '&' */
                *p = '\0';
                strcpy(arg1, buf);
                strcpy(arg2, p + 1);
                n1 = atoi(arg1);
                n2 = atoi(arg2);
            }
        }

        /* Make the response body in one call, so that content is never
           both the source and the destination of the same sprintf */
        snprintf(content, sizeof(content),
                 "Welcome to add.com: THE Internet addition portal.\r\n<p>"
                 "The answer is: %d + %d = %d\r\n<p>"
                 "Thanks for visiting!\r\n",
                 n1, n2, n1 + n2);

        /* Generate the HTTP response: headers, empty line, then body */
        printf("Content-length: %d\r\n", (int)strlen(content));
        printf("Content-type: text/html\r\n\r\n");
        printf("%s", content);
        fflush(stdout);
        exit(0);
    }
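The special encoding of characters such as spaces can be undone on the server side with a small decoder. The following is only a sketch under my own assumptions: url_decode is a hypothetical helper (TINY and adder do not decode their arguments), and it handles only %XX escapes.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Decode %XX escapes in src into dst (at most dstsz-1 bytes plus '\0').
 * url_decode is a hypothetical helper; TINY itself does no decoding.
 * Returns 0 on success, -1 on a malformed escape or if dst is too small.
 */
int url_decode(const char *src, char *dst, size_t dstsz)
{
    size_t j = 0;

    if (dstsz == 0)
        return -1;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (j + 1 >= dstsz)
            return -1;                  /* no room for this byte plus '\0' */
        if (src[i] == '%') {
            if (!isxdigit((unsigned char)src[i + 1]) ||
                !isxdigit((unsigned char)src[i + 2]))
                return -1;              /* malformed or truncated escape */
            char hex[3] = { src[i + 1], src[i + 2], '\0' };
            dst[j++] = (char)strtol(hex, NULL, 16);
            i += 2;                     /* skip the two hex digits */
        } else {
            dst[j++] = src[i];          /* ordinary character, copy as-is */
        }
    }
    dst[j] = '\0';
    return 0;
}
```

With this sketch, "hello%20world" decodes to "hello world", while a string without escapes such as "15000&213" is copied unchanged.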
11.6 Putting It Together: The TINY Web Server
TINY is an iterative server that listens for connection requests on the port that is passed in the command line. TINY opens a listening socket by calling open_listenfd. (Line 29)
TINY executes the typical infinite server loop, repeatedly accepting a connection request. (Line 32)
TINY performs a transaction by calling doit. (Line 36)
TINY closes its end of the connection. (Line 37)
doit handles one HTTP transaction. doit reads and parses the request line. (Lines 11-14)
TINY only supports the GET method. If another method is requested, doit sends an error message by calling clienterror. (Lines 15-19)
doit reads and ignores any request headers. (Line 20)
doit calls parse_uri to parse the URI into a filename and a possibly empty CGI argument string, and to set a flag indicating whether the request is for static or dynamic content. (Line 23)
- If the file doesn’t exist, doit sends an error message. (Lines 24-28)
- If the file exists, doit verifies that the file is a regular file and that we have read permission. (Line 31)
doit serves static content by calling serve_static and dynamic content by calling serve_dynamic. (Lines 36, 44)
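The first two steps of doit (parse the request line, reject anything but GET) can be sketched as follows. This is a standalone illustration rather than the book's listing: check_request_line and FIELD_MAX are hypothetical names, and the 400 return for a malformed line is my own addition. The 501 code for an unsupported method matches what TINY reports.

```c
#include <stdio.h>
#include <strings.h>

#define FIELD_MAX 256   /* hypothetical per-field buffer size */

/*
 * Split a request line such as "GET /home.html HTTP/1.0" into its fields.
 * Returns 0 for a well-formed GET, 400 for a malformed line,
 * and 501 for any other method.
 */
int check_request_line(const char *line, char method[FIELD_MAX],
                       char uri[FIELD_MAX], char version[FIELD_MAX])
{
    /* %255s bounds each field to FIELD_MAX - 1 characters plus '\0' */
    if (sscanf(line, "%255s %255s %255s", method, uri, version) != 3)
        return 400;
    if (strcasecmp(method, "GET") != 0)   /* case-insensitive compare */
        return 501;
    return 0;
}
```

A caller would send an error response when the return value is nonzero and go on to read the request headers otherwise.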
clienterror sends an HTTP response to the client with the appropriate status code and status message in the response line, along with an HTML file in the response body that explains the error to the browser’s user.
read_requesthdrs: TINY doesn't use any of the information in the request headers, so read_requesthdrs simply reads and ignores them.
parse_uri
TINY assumes that the home directory for static content is its current directory and that the home directory for executables is ./cgi-bin. Also, the default filename is ./home.html.
- If the request is for static content, parse_uri clears the CGI argument string and then converts the URI into a relative Linux pathname. If the URI ends with a ‘/’, it appends the default filename.
- If the request is for dynamic content, parse_uri extracts any CGI arguments and converts the remaining portion of the URI to a relative Linux filename.
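The two cases above can be sketched like this. It is a simplified stand-in, not the book's parse_uri: the name parse_uri_sketch, the explicit buffer sizes, and the -1 error return are my additions, and the sketch assumes that any URI containing cgi-bin is a request for dynamic content.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 for static content, 0 for dynamic content,
 * and -1 if an output buffer is too small.
 */
int parse_uri_sketch(const char *uri, char *filename, size_t fnsz,
                     char *cgiargs, size_t argsz)
{
    int n;

    if (argsz == 0)
        return -1;
    if (strstr(uri, "cgi-bin") == NULL) {       /* static content */
        size_t len = strlen(uri);
        /* a URI ending in '/' gets the default filename appended */
        const char *dflt = (len > 0 && uri[len - 1] == '/') ? "home.html" : "";
        cgiargs[0] = '\0';                      /* no CGI arguments */
        n = snprintf(filename, fnsz, ".%s%s", uri, dflt);
        return (n < 0 || (size_t)n >= fnsz) ? -1 : 1;
    }

    /* dynamic content: everything after '?' is the argument string */
    const char *q = strchr(uri, '?');
    size_t pathlen = q ? (size_t)(q - uri) : strlen(uri);
    n = snprintf(cgiargs, argsz, "%s", q ? q + 1 : "");
    if (n < 0 || (size_t)n >= argsz)
        return -1;
    n = snprintf(filename, fnsz, ".%.*s", (int)pathlen, uri);
    return (n < 0 || (size_t)n >= fnsz) ? -1 : 0;
}
```

For the URI / this yields the filename ./home.html with an empty argument string, and for /cgi-bin/adder?15000&213 it yields ./cgi-bin/adder with the arguments 15000&213.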
serve_static
TINY serves five common types of static content: HTML, unformatted text, GIF, PNG, and JPEG.
serve_static determines the file type by inspecting the suffix in the filename. (Line 7)
serve_static sends the response line and response headers to the client. (Lines 8-13)
serve_static opens filename for reading and gets its descriptor srcfd. (Line 18)
serve_static maps the requested file to a VM area using mmap. (Line 19)
serve_static closes the file, since its descriptor is no longer needed once the file is mapped. (Line 20)
serve_static performs the transfer of the file to the client using rio_writen. rio_writen copies the filesize bytes starting at location srcp to the client’s connected descriptor. (Line 21)
serve_static frees the mapped VM area by calling munmap.
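The open, mmap, close, write, munmap sequence can be sketched without csapp.h as shown below. This is an illustration under assumptions, not TINY's code: send_file_mmap is a hypothetical name, a plain write loop stands in for rio_writen, and the file size is obtained with fstat instead of being passed in.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy a whole file to out_fd through a private read-only mapping.
 * Returns the number of bytes sent, or -1 on error.
 */
ssize_t send_file_mmap(int out_fd, const char *filename)
{
    int srcfd = open(filename, O_RDONLY);
    if (srcfd < 0)
        return -1;

    struct stat sb;
    if (fstat(srcfd, &sb) < 0) {
        close(srcfd);
        return -1;
    }
    size_t filesize = (size_t)sb.st_size;
    if (filesize == 0) {              /* mmap rejects zero-length mappings */
        close(srcfd);
        return 0;
    }

    char *srcp = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, srcfd, 0);
    close(srcfd);                     /* the mapping stays valid after close */
    if (srcp == MAP_FAILED)
        return -1;

    size_t sent = 0;
    while (sent < filesize) {         /* retry on short writes and EINTR */
        ssize_t n = write(out_fd, srcp + sent, filesize - sent);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            munmap(srcp, filesize);
            return -1;
        }
        sent += (size_t)n;
    }
    munmap(srcp, filesize);           /* free the mapped area */
    return (ssize_t)sent;
}
```

Closing srcfd right after mmap mirrors the point made above: once the file is mapped, the descriptor is no longer needed, and keeping it open would only leak it.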
serve_dynamic
serve_dynamic sends a response line indicating success and a Server header to the client.
serve_dynamic forks a new child process. (Line 11)
- The child initializes the QUERY_STRING environment variable with the CGI arguments from the request URI. (Line 13)
- The child redirects its standard output to the connected file descriptor using dup2. (Line 14)
- The child loads and runs the CGI program. Everything that the CGI program writes to standard output goes directly to the client process. (Line 15)
- The parent blocks in a call to wait, waiting to reap the child when it terminates. (Line 17)
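The fork, setenv, dup2, execve, wait sequence can be sketched in plain POSIX C as follows. This is a rough illustration, not TINY's serve_dynamic: run_cgi is a hypothetical name, the sketch skips sending the response line and Server header, it passes the filename as argv[0], and it uses waitpid so that it reaps exactly the child it created.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Run the program at filename as a CGI child whose stdout is out_fd.
 * Returns the child's exit status, or -1 on error.
 */
int run_cgi(int out_fd, const char *filename, const char *cgiargs)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                                  /* child */
        char *argv[] = { (char *)filename, NULL };
        if (setenv("QUERY_STRING", cgiargs, 1) < 0)  /* pass the arguments */
            _exit(127);
        if (dup2(out_fd, STDOUT_FILENO) < 0)         /* stdout goes to the client */
            _exit(127);
        execve(filename, argv, environ);             /* load and run the program */
        _exit(127);                                  /* reached only if execve fails */
    }

    int status;                                      /* parent */
    if (waitpid(pid, &status, 0) < 0)                /* block until the child is reaped */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the redirection happens before execve, the CGI program needs no knowledge of sockets: it simply prints to standard output, as adder does.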