CS:APP Chapter 11 Summary 🌐

Recently, I’ve been studying CS:APP - I’m posting my own summary of chapter 11 that I wrote up using Notion.

Chapter 11 : Network Programming

11.1 The Client-Server Programming Model

Every network application is based on the client-server model:

  • An application consists of a server process and one or more client processes.
  • A server manages some resource, and it provides some service for its clients by manipulating that resource.
  • A client-server transaction consists of four steps:
    1. A client initiates a transaction by sending a request to the server. (example : A Web browser sends a request to a Web server.)
    2. The server receives the request, interprets it, and manipulates its resources in the appropriate way. (example : When a Web server receives a request, it reads a disk file.)
    3. The server sends a response to the client and then waits for the next request. (example : A Web server sends the file back to a client.)
    4. The client receives the response and manipulates it. (example : After a Web browser receives a page from the server, it displays it on the screen.)
  • A single host can run many different clients and servers concurrently.

11.2 Networks

  • LAN and Ethernet

    An adapter plugged into an expansion slot on the I/O bus provides the physical interface to the network.

    Data received from the network are copied from the adapter across the I/O and memory buses into memory, typically by a DMA transfer.

    Physically, a network is a hierarchical system that is organized by geographical proximity. At the lowest level is a LAN (local area network) that spans a building or a campus. (example : Ethernet)

    • Ethernet segment
      • consists of some wires and a small box called a hub.
      • A hub copies every bit that it receives on each port to every other port. - Every host sees every bit.
      • One end of each wire is attached to an adapter on a host, and the other end is attached to a port on the hub.
      • Each Ethernet adapter has a globally unique 48-bit address stored in a nonvolatile memory on the adapter.
      • A host can send a chunk of bits called a frame to any other host on the segment. Every host adapter sees the frame, but only the destination host actually reads it.

        Each frame includes:

        • fixed number of header bits that identify the source and destination of the frame
        • the frame length
        • a payload of data bits.
    • Bridged Ethernet
      • Multiple Ethernet segments can be connected into larger LANs, called bridged Ethernets, using a set of wires and small boxes called bridges.
      • Bridges automatically learn over time which hosts are reachable from which ports, and then selectively copy frames from one port to another only when it’s necessary. → can make better use of the available wire bandwidth than hubs.
    • Router
      • At a higher level in the hierarchy, multiple incompatible LANs can be connected by specialized computers called routers to form an internet.
      • Routers can also connect to networks known as WANs (wide area networks), which can span larger geographical areas than LANs.
      • In general, routers can be used to build internets from arbitrary collections of LANs and WANs.
    • Protocol software
      • Protocol software running on each host and router smoothes out the differences between the different networks.
      • Protocol software implements a protocol that governs how hosts and routers cooperate in order to transfer data.
      • The protocol must provide a naming scheme:
        • The internet protocol defines a uniform format for host addresses.
        • Then, each host is assigned at least one of these internet addresses that uniquely identifies it.
      • The protocol must provide a delivery mechanism:
        • The internet protocol defines a uniform way to bundle up data bits into discrete chunks called packets.
        • A packet consists of:
          • a header which contains the packet size and addresses of the source and destination hosts.
          • a payload which contains data bits sent from the source host.
      • Example - Hosts and routers using the internet to transfer data
        1. The client on host A invokes a system call that copies the data from the client’s virtual address space (VAS) into a kernel buffer.
        2. The protocol software on host A creates a LAN1 frame by prepending an internet header and a LAN1 frame header to the data (encapsulation):
          • The internet header is addressed to internet host B.
          • The LAN1 frame header is addressed to the router.
        3. The LAN1 adapter copies the frame to the network.
        4. The frame reaches the router → the router’s LAN1 adapter reads it from the wire and passes it to the protocol software.
        5. The router creates a new LAN2 frame:
          1. The router fetches the destination internet address from the internet packet header.
          2. The router uses this as an index into a routing table to determine where to forward the packet. (LAN2)
          3. The router strips off the old LAN1 frame header and prepends a new LAN2 frame header addressed to host B.
          4. The router passes the resulting frame to the adapter.
        6. The router’s LAN2 adapter copies the frame to the network.
        7. The frame reaches host B → its adapter reads the frame from the wire and passes it to the protocol software.
        8. The protocol software on host B strips off the packet header and frame header.

          The protocol software then copies the resulting data into the server’s VAS when the server invokes a system call that reads the data.
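The encapsulation steps above can be sketched in a few lines of C. This is only a toy model: the header layouts (inet_hdr, frame_hdr), the one-byte hardware addresses, and the function names are invented for illustration, and both LANs share one frame format here even though real LANs may be incompatible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy header layouts, invented for this sketch (not real protocol formats). */
struct inet_hdr  { uint32_t src_ip, dst_ip; uint16_t size; };
struct frame_hdr { uint8_t src_hw, dst_hw; uint16_t size; };

enum { MAX_FRAME = 256 };

/* Host A: wrap the data in an internet header, then in a frame header.
 * Returns the total frame length, or 0 if the frame would not fit. */
size_t encapsulate(uint8_t *frame, uint8_t src_hw, uint8_t dst_hw,
                   uint32_t src_ip, uint32_t dst_ip,
                   const void *data, uint16_t len)
{
    struct inet_hdr ih = { src_ip, dst_ip, len };
    struct frame_hdr fh;
    size_t total = sizeof fh + sizeof ih + len;

    assert(frame != NULL && data != NULL);
    if (total > MAX_FRAME)
        return 0;
    fh.src_hw = src_hw;
    fh.dst_hw = dst_hw;
    fh.size = (uint16_t)(sizeof ih + len);
    memcpy(frame, &fh, sizeof fh);
    memcpy(frame + sizeof fh, &ih, sizeof ih);
    memcpy(frame + sizeof fh + sizeof ih, data, len);
    return total;
}

/* Router: replace only the frame header; the internet packet is untouched. */
void reframe(uint8_t *frame, uint8_t new_src_hw, uint8_t new_dst_hw)
{
    struct frame_hdr fh;

    memcpy(&fh, frame, sizeof fh);
    fh.src_hw = new_src_hw;
    fh.dst_hw = new_dst_hw;
    memcpy(frame, &fh, sizeof fh);
}

/* Host B: strip both headers and copy the payload into out, which the
 * caller must make large enough. Returns the payload length. */
uint16_t decapsulate(const uint8_t *frame, void *out)
{
    struct inet_hdr ih;

    memcpy(&ih, frame + sizeof(struct frame_hdr), sizeof ih);
    memcpy(out, frame + sizeof(struct frame_hdr) + sizeof ih, ih.size);
    return ih.size;
}
```

Calling encapsulate on host A, reframe at the router, and decapsulate on host B returns the original bytes. Only the frame header changes along the way, while the internet header keeps naming host B as the destination.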

11.3 The Global IP Internet

  • The Global IP Internet

    The global IP Internet is the most famous and successful implementation of an internet.

    Each Internet host runs software that implements the TCP/IP protocol, which is a family of protocols, each of which contributes different capabilities.

    From a programmer’s perspective, the Internet is a worldwide collection of hosts with the following properties:

    • The set of hosts is mapped to a set of 32-bit IP addresses.
    • The set of IP addresses is mapped to a set of identifiers called Internet domain names.
    • A process on one Internet host can communicate with a process on any other Internet host over a connection.
  • IP Addresses - htonl, ntohl, inet_pton, inet_ntop

    An IP address is an unsigned 32-bit integer. Network programs store IP addresses in the IP address structure.

    /* IP address structure */
    struct in_addr {
    	uint32_t s_addr; /* Address in network byte order (big-endian) */
    };
    • TCP/IP defines a uniform network byte order (big-endian byte order) for any integer data item like an IP address.
    • Unix provides some functions for converting between network and host byte order:
      #include <arpa/inet.h>
      
      uint32_t htonl(uint32_t hostlong);
      uint16_t htons(uint16_t hostshort);
      
      uint32_t ntohl(uint32_t netlong);
      uint16_t ntohs(uint16_t netshort);
      • htonl converts an unsigned 32-bit integer from host byte order to network byte order.
      • ntohl converts an unsigned 32-bit integer from network byte order to host byte order.
      • htons and ntohs perform corresponding conversions for unsigned 16-bit integers.
    • IP addresses are presented to humans in dotted-decimal notation, where each byte is represented by its decimal value and separated from the other bytes by a period.

      (example : 0x8002c2f2 → 128.2.194.242)

    • Application programs can convert IP addresses ↔ dotted-decimal strings using inet_pton and inet_ntop.
      #include <arpa/inet.h>
      
      int inet_pton(int af, const char *src, void *dst); /* af is AF_INET for IPv4 */
      
      const char *inet_ntop(int af, const void *src, char *dst, socklen_t size);
      • inet_pton converts a dotted-decimal string (src) to a binary IP address in network byte order (dst).
      • inet_ntop converts a binary IP address in network byte order (src) to the corresponding dotted-decimal representation and copies at most size bytes of the resulting null-terminated string to dst.
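As a small sketch of how the byte-order and dotted-decimal functions fit together, the helpers below convert between a dotted-decimal string and an integer in host byte order. The helper names (dotted_to_host, host_to_dotted, first_wire_byte) are made up for this example; only htonl, ntohl, inet_pton, and inet_ntop come from the system headers.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Parse a dotted-decimal string into an integer in host byte order.
 * Returns 0 on success, -1 if the string is not a valid IPv4 address. */
int dotted_to_host(const char *dotted, uint32_t *host_order)
{
    struct in_addr addr;

    assert(dotted != NULL && host_order != NULL);
    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return -1;
    *host_order = ntohl(addr.s_addr); /* s_addr is in network byte order */
    return 0;
}

/* Format an integer in host byte order as a dotted-decimal string.
 * Returns dst on success, NULL on error. */
const char *host_to_dotted(uint32_t host_order, char *dst, socklen_t size)
{
    struct in_addr addr;

    addr.s_addr = htonl(host_order);
    return inet_ntop(AF_INET, &addr, dst, size);
}

/* First byte in memory of the network-byte-order value: because network
 * order is big-endian, this is always the most significant byte. */
uint8_t first_wire_byte(uint32_t host_order)
{
    uint32_t net = htonl(host_order);
    uint8_t b;

    memcpy(&b, &net, 1);
    return b;
}
```

For the address used in the example above, dotted_to_host("128.2.194.242", &v) stores 0x8002c2f2 in v on both little-endian and big-endian hosts, because ntohl undoes whatever reordering the host needs.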
  • Internet Domain Names

    The Internet defines a set of human-friendly domain names and a mechanism that maps the set of domain names to the set of IP addresses.

    • The set of domain names forms a hierarchy, and each domain name encodes its position in the hierarchy.
    • The hierarchy is represented as a tree:
      • The first level in the hierarchy is an unnamed root node.
      • The second level is a collection of first-level domain names that are defined by a nonprofit organization called ICANN. (example : com, edu, gov, org, and net)
      • The third level are second-level domain names, which are assigned by various authorized agents of ICANN. (example : cmu.edu)
      • Once an organization has received a second-level domain name, it’s free to create any other new domain name within its subdomain. (example : cs.cmu.edu)
    • The mapping between the set of domain names and the set of IP addresses is maintained in a distributed worldwide database known as DNS (domain name system).
    • The Linux NSLOOKUP program displays the IP addresses associated with a domain name.
      • Each Internet host has the locally defined domain name localhost, which always maps to the loopback address 127.0.0.1:

        linux> nslookup localhost
        Address: 127.0.0.1

        We can use HOSTNAME to determine the real domain name of our local host:

        linux> hostname
        whaleshark.ics.cs.cmu.edu

      • In the most general case, multiple domain names are mapped to the same set of multiple IP addresses.
  • Internet Connections
    • Connection
      • Internet clients and servers communicate by sending and receiving streams of bytes over connections.
      • A connection is point-to-point : it connects a pair of processes.
      • A connection is full duplex : data can flow in both directions at the same time.
      • A connection is reliable : the stream of bytes sent by the source process is eventually received by the destination process in the same order it was sent.
    • Socket
      • A socket is an end point of a connection.
      • Each socket has a socket address that consists of an Internet address and a 16-bit integer port and is denoted by the notation address:port.
      • The port in the client’s socket address is assigned automatically by the kernel when the client makes a connection request. (ephemeral port)
      • The port in the server’s socket address is typically a well-known port that is permanently associated with the service. (example : Web servers - 80, email servers - 25)
      • A well-known service name is associated with each service with a well known port. (example : http - Web service, smtp - email)
    • Identifying a connection
      • A connection is uniquely identified by the socket addresses of its two end points. - A socket pair (cliaddr:cliport, servaddr:servport)

        (example : (128.2.194.242:51213, 208.216.181.15:80))

11.4 The Sockets Interface

The sockets interface is a set of functions that are used with the Unix I/O functions to build network applications.

  • Socket Address Structures

    Internet socket addresses are stored in 16-byte structures having the type sockaddr_in:

    /* IP socket address structure */
    struct sockaddr_in {
    	uint16_t sin_family; /* Protocol family (always AF_INET) */
    	uint16_t sin_port; /* Port number in network byte order */
    	struct in_addr sin_addr; /* IP address in network byte order */
    	unsigned char sin_zero[8]; /* Pad to sizeof(struct sockaddr) */
    };
    
    /* Generic socket address structure (for connect, bind, and accept) */
    struct sockaddr {
    	uint16_t sa_family; /* Protocol family */
    	char sa_data[14]; /* Address data */
    };
    • For Internet applications, the sin_family field is always AF_INET, the sin_port field is a 16-bit port number in network byte order, and the sin_addr field contains a 32-bit IP address in network byte order.
    • connect, bind, and accept expect a pointer to a generic sockaddr structure → applications are required to cast any pointers to protocol-specific structures to this generic structure.
    • We define the following type: typedef struct sockaddr SA;
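To make the field conventions concrete, here is a minimal sketch that fills in a sockaddr_in by hand and then reads it back through a generic sockaddr pointer. The helper names make_sockaddr_in and format_sockaddr are hypothetical; in real programs getaddrinfo normally builds these structures.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Fill *sa with an IPv4 socket address. Port is given in host byte order.
 * Returns 0 on success, -1 if ip is not a valid dotted-decimal string. */
int make_sockaddr_in(const char *ip, uint16_t port, struct sockaddr_in *sa)
{
    assert(ip != NULL && sa != NULL);
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port); /* stored in network byte order */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}

/* Write "address:port" for a generic socket address that holds an IPv4
 * address. Returns 0 on success, -1 on error or if dst is too small. */
int format_sockaddr(const struct sockaddr *addr, char *dst, size_t size)
{
    const struct sockaddr_in *sa = (const struct sockaddr_in *)addr;
    char ip[INET_ADDRSTRLEN];
    int n;

    if (addr->sa_family != AF_INET)
        return -1;
    if (inet_ntop(AF_INET, &sa->sin_addr, ip, sizeof ip) == NULL)
        return -1;
    n = snprintf(dst, size, "%s:%u", ip, (unsigned)ntohs(sa->sin_port));
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The cast in format_sockaddr is the reverse of the (SA *) cast that connect, bind, and accept require: the family field comes first in both structures, so code can inspect it before deciding how to interpret the rest.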
  • The socket Function

    Clients and servers use the socket function to create a socket descriptor.

    #include <sys/types.h>
    #include <sys/socket.h>
    
    int socket(int domain, int type, int protocol);
    • If we wanted the socket to be the end point for a connection, we call socket with hardcoded arguments :

      clientfd = Socket(AF_INET, SOCK_STREAM, 0);

      • AF_INET indicates that we are using 32-bit IP addresses.
      • SOCK_STREAM indicates that the socket will be an end point for a connection.
    • The best practice is to use getaddrinfo to generate these arguments automatically.
    • The clientfd descriptor returned by socket is only partially opened and can’t be used for reading and writing yet.
  • The connect Function

    A client establishes a connection with a server by calling the connect function.

    #include <sys/socket.h>
    
    int connect(int clientfd, const struct sockaddr *addr, socklen_t addrlen);
    • connect attempts to establish an Internet connection with the server at socket address addr, where addrlen is sizeof(struct sockaddr_in).
    • connect blocks until either the connection is successfully established or an error occurs:
      • successful → the clientfd is ready for reading and writing, and the resulting connection is characterized by the socket pair (x:y, addr.sin_addr:addr.sin_port), where x is the client’s IP address and y is the ephemeral port that identifies the client process on the client host.
    • As with socket, the best practice is to use getaddrinfo to supply the arguments to connect.
  • The bind Function

    bind, listen, and accept are used by servers to establish connections with clients.

    #include <sys/socket.h>
    
    int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
    • bind asks the kernel to associate the server’s socket address in addr with the socket descriptor sockfd.
    • As with socket and connect, the best practice is to use getaddrinfo to supply the arguments to bind.
  • The listen Function
    #include <sys/socket.h>
    
    int listen(int sockfd, int backlog); /* Returns 0 if OK, -1 on error */
    • Clients are active entities that initiate connection requests. ↔ Servers are passive entities that wait for connection requests from clients.
    • By default, the kernel assumes that a descriptor created by the socket function corresponds to an active socket that will live on the client end of a connection. → A server calls listen to tell the kernel that the descriptor will be used by a server instead.
    • listen converts sockfd from an active socket to a listening socket that can accept connection requests from clients.
  • The accept Function

    Servers wait for connection requests from clients by calling the accept function.

    #include <sys/socket.h>
    
    int accept(int listenfd, struct sockaddr *addr, socklen_t *addrlen);
    • accept waits for a connection request from a client to arrive on the listening descriptor listenfd, then fills in the client’s socket address in addr, and returns a connected descriptor that can be used to communicate with the client using Unix I/O functions.
    • Listening descriptor vs. Connected descriptor
      • Listening descriptor serves as an end point for client connection requests, is created once, and exists for the lifetime of the server.
      • Connected descriptor is the end point of the connection that is established between the client and the server, is created each time the server accepts a connection request, and exists only as long as it takes the server to service a client.
      • Distinguishing between listening and connected descriptors makes it possible to build concurrent servers that can process many client connections simultaneously.
  • Host and Service Conversion - getaddrinfo, getnameinfo

    getaddrinfo converts string representations of hostnames, host addresses, service names, and port numbers into socket address structures.

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    
    int getaddrinfo(const char *host, const char *service, const struct addrinfo *hints, struct addrinfo **result);
    
    void freeaddrinfo(struct addrinfo *result);
    
    const char *gai_strerror(int errcode);
    • Given host and service, getaddrinfo returns a result that points to a linked list of addrinfo structures, each of which points to a socket address structure that corresponds to host and service.
      struct addrinfo {
      	int ai_flags; /* Hints argument flags */
      	int ai_family; /* First arg to socket function */
      	int ai_socktype; /* Second arg to socket function */
      	int ai_protocol; /* Third arg to socket function */
      	char *ai_canonname; /* Canonical hostname */
      	size_t ai_addrlen; /* Size of ai_addr struct */
      	struct sockaddr *ai_addr; /* Ptr to socket address structure */
      	struct addrinfo *ai_next; /* Ptr to next item in linked list */
      };
    • To avoid memory leaks, the application must free the list by calling freeaddrinfo after using the result.
    • host argument can be either a domain name or a numeric address (example : a dotted-decimal IP address).
    • service argument can be either a service name (example : http) or a decimal port number.
    • If we’re not interested in converting the hostname to an address, we can set host to NULL. The same holds for service, but at least one of them must be specified.
    • hints is an addrinfo structure that provides finer control over the list of socket addresses that getaddrinfo returns:
      • When passed as a hints argument, only the ai_family, ai_socktype, ai_protocol, and ai_flags fields can be set, while the other fields must be set to zero. In practice, we use memset to zero the entire structure and then set a few selected fields.
      • ai_family
        • By default, getaddrinfo can return both IPv4 and IPv6 socket addresses.
        • set to AF_INET → restricts the list to IPv4 addresses.
        • set to AF_INET6 → restricts the list to IPv6 addresses.
      • ai_socktype
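        • By default, for each unique address associated with host, getaddrinfo can return up to three addrinfo structures, each with a different ai_socktype field.
        • set to SOCK_STREAM → restricts the list to at most one addrinfo structure for each unique address, one whose socket address can be used as the end point of a connection.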
      • ai_flags
        • ai_flags is a bit mask. We combine flags with bitwise OR to modify the default behavior.
        • AI_ADDRCONFIG : asks getaddrinfo to return IPv4 addresses only if the local host is configured for IPv4. Similarly for IPv6.
        • AI_CANONNAME : By default, ai_canonname is NULL. If this flag is set, getaddrinfo points the ai_canonname field in the first addrinfo structure to the canonical name of host.
        • AI_NUMERICSERV : By default, service can be a service name or a port number. This flag forces the service argument to be a port number.
        • AI_PASSIVE : By default, getaddrinfo returns socket addresses that can be used by clients as active sockets. This flag instructs it to return socket addresses that can be used by servers as listening sockets. In this case, the host should be NULL. The address field in the resulting socket address structures will be the wildcard address, which tells the kernel that this server will accept requests to any of the IP addresses for this host.
    • getaddrinfo fills in every field of each addrinfo structure except ai_flags:
      • ai_addr points to a socket address structure.
      • ai_addrlen gives the size of this socket address structure.
      • ai_next points to the next addrinfo structure in the list.
      • The other fields describe various attributes of the socket address.
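The effect of AI_PASSIVE with a NULL host can be checked without touching the network. The sketch below asks getaddrinfo for an IPv4 listening address and copies out the first result; the function names passive_ipv4_addr and is_wildcard are made up for this example, and restricting ai_family to AF_INET is a simplification so the result fits in a sockaddr_in.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ask getaddrinfo for an IPv4 listening address on port (a decimal string).
 * On success, copies the first result into *out and returns 0.
 * On failure, returns the nonzero getaddrinfo error code. */
int passive_ipv4_addr(const char *port, struct sockaddr_in *out)
{
    struct addrinfo hints, *listp;
    int rc;

    assert(port != NULL && out != NULL);
    memset(&hints, 0, sizeof hints);              /* unused fields must be zero */
    hints.ai_family = AF_INET;                    /* IPv4 only (simplification) */
    hints.ai_socktype = SOCK_STREAM;              /* connections only */
    hints.ai_flags = AI_PASSIVE | AI_NUMERICSERV; /* listening, numeric port */
    if ((rc = getaddrinfo(NULL, port, &hints, &listp)) != 0)
        return rc;
    memcpy(out, listp->ai_addr, sizeof *out);
    freeaddrinfo(listp);                          /* avoid leaking the list */
    return 0;
}

/* True if the address is the IPv4 wildcard address (INADDR_ANY). */
int is_wildcard(const struct sockaddr_in *sa)
{
    return sa->sin_addr.s_addr == htonl(INADDR_ANY);
}
```

With port "8080" the returned structure carries port 8080 in network byte order and the wildcard address. Passing a service name such as "http" fails here because AI_NUMERICSERV insists on a port number; gai_strerror turns the returned code into a message.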

    getnameinfo is the inverse of getaddrinfo: it converts a socket address structure to the corresponding host and service name strings.

    #include <sys/socket.h>
    #include <netdb.h>
    
    int getnameinfo(const struct sockaddr *sa, socklen_t salen, char *host, size_t hostlen, char *service, size_t servlen, int flags);
    • getnameinfo converts the socket address structure sa to the corresponding host and service name strings and copies them to the host and service buffers.
    • sa argument points to a socket address structure of size salen bytes, host to a buffer of size hostlen bytes, and service to a buffer of size servlen bytes.
    • If we don’t want the hostname, we can set host to NULL and hostlen to zero. The same holds for service, but at least one of them must be set.
    • The flags argument is a bit mask. We combine flags with bitwise OR to modify the default behavior.
      • NI_NUMERICHOST : By default, getnameinfo returns a domain name in host. This flag forces it to return a numeric address string instead.
      • NI_NUMERICSERV : By default, getnameinfo returns a service name instead of a port number when it can find one. This flag forces it to return the port number.
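A minimal sketch of getnameinfo with both numeric flags follows. Since NI_NUMERICHOST and NI_NUMERICSERV suppress name lookups, the call only reformats the bytes already in the structure. The wrapper name numeric_name is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Build an IPv4 socket address from ip and port, then ask getnameinfo for
 * its numeric host and service strings. Either buffer may be NULL with a
 * length of 0, but not both. Returns 0 on success, nonzero on failure. */
int numeric_name(const char *ip, uint16_t port,
                 char *host, socklen_t hostlen,
                 char *service, socklen_t servlen)
{
    struct sockaddr_in sa;

    assert(ip != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;
    return getnameinfo((struct sockaddr *)&sa, sizeof sa,
                       host, hostlen, service, servlen,
                       NI_NUMERICHOST | NI_NUMERICSERV);
}
```

Dropping NI_NUMERICSERV would let getnameinfo translate port 80 into a service name such as http where the system's services database lists it, and dropping NI_NUMERICHOST would trigger a reverse lookup of the address.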

    Example Code - HOSTINFO that displays the mapping of a domain name to its associated IP addresses

    #include "csapp.h"
    
    int main(int argc, char **argv) 
    {
        struct addrinfo *p, *listp, hints;
        char buf[MAXLINE];
        int rc, flags;
    
        if (argc != 2) {
            fprintf(stderr, "usage: %s <domain name>\n", argv[0]);
            exit(0);
        }
    
        /* Get a list of addrinfo records */
        memset(&hints, 0, sizeof(struct addrinfo));                         
        hints.ai_family = AF_INET;       /* IPv4 only */
        hints.ai_socktype = SOCK_STREAM; /* Connections only */
        if ((rc = getaddrinfo(argv[1], NULL, &hints, &listp)) != 0) {
            fprintf(stderr, "getaddrinfo error: %s\n", gai_strerror(rc));
            exit(1);
        }
    
        /* Walk the list and display each IP address */
        flags = NI_NUMERICHOST; /* Display address string instead of domain name */
        for (p = listp; p; p = p->ai_next) {
            Getnameinfo(p->ai_addr, p->ai_addrlen, buf, MAXLINE, NULL, 0, flags);
            printf("%s\n", buf);
        } 
    
        /* Clean up */
        Freeaddrinfo(listp);
    
        exit(0);
    }
  • Helper Functions for the Sockets Interface - open_clientfd, open_listenfd
    int open_clientfd(char *hostname, char *port)
    {
        int clientfd;
        struct addrinfo hints, *listp, *p;
     
        /* Get a list of potential server addresses */
        memset(&hints, 0, sizeof(struct addrinfo));
        hints.ai_socktype = SOCK_STREAM; /* Open a connection */
        hints.ai_flags = AI_NUMERICSERV; /* ... using a numeric port arg. */
        hints.ai_flags |= AI_ADDRCONFIG; /* Recommended for connections */
        Getaddrinfo(hostname, port, &hints, &listp);
     
        /* Walk the list for one that we can successfully connect to */
        for (p = listp; p; p = p->ai_next)
        {
            /* Create a socket descriptor */
            if ((clientfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
                continue; /* Socket failed, try the next */
     
            /* Connect to the server */
            if (connect(clientfd, p->ai_addr, p->ai_addrlen) != -1)
                break;       /* Success */
            Close(clientfd); /* Connect failed, try another */
        }
     
        /* Clean up */
        Freeaddrinfo(listp);
        if (!p) /* All connects failed */
            return -1;
        else /* The last connect succeeded */
            return clientfd;
    }
    • open_clientfd establishes a connection with a server running on host hostname and listening for connection requests on port number port.
    • open_clientfd returns an open socket descriptor that is ready for input and output using the Unix I/O functions.
    • The arguments to socket and connect are generated automatically by getaddrinfo → There is no dependence on any particular version of IP.
    int open_listenfd(char *port)
    {
        struct addrinfo hints, *listp, *p;
        int listenfd, optval = 1;
     
        /* Get a list of potential server addresses */
        memset(&hints, 0, sizeof(struct addrinfo));
        hints.ai_socktype = SOCK_STREAM;             /* Accept connections */
        hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; /* ... on any IP address */
        hints.ai_flags |= AI_NUMERICSERV;            /* ... using port number */
        Getaddrinfo(NULL, port, &hints, &listp);
     
        /* Walk the list for one that we can bind to */
        for (p = listp; p; p = p->ai_next)
        {
            /* Create a socket descriptor */
            if ((listenfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) < 0)
                continue; /* Socket failed, try the next */
     
            /* Eliminates "Address already in use" error from bind */
            Setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
                       (const void *)&optval, sizeof(int));
     
            /* Bind the descriptor to the address */
            if (bind(listenfd, p->ai_addr, p->ai_addrlen) == 0)
                break;       /* Success */
            Close(listenfd); /* Bind failed, try the next */
        }
     
        /* Clean up */
        Freeaddrinfo(listp);
        if (!p) /* No address worked */
            return -1;
     
        /* Make it a listening socket ready to accept connection requests */
        if (listen(listenfd, LISTENQ) < 0)
        {
            Close(listenfd);
            return -1;
        }
        return listenfd;
    }
    • open_listenfd returns a listening descriptor that is ready to receive connection requests on port port.
    • We have called getaddrinfo with the AI_PASSIVE flag and a NULL host argument → the address field in each socket address structure is set to the wildcard address, which tells the kernel that this server will accept requests to any of the IP addresses for this host.
  • Example Echo Client and Server

    Echo client main routine

    #include "csapp.h"
    
    int main(int argc, char **argv){
    	int clientfd;
    	char *host, *port, buf[MAXLINE];
    	rio_t rio;
    
    	if (argc != 3) {
    		fprintf(stderr, "usage: %s <host> <port>\n", argv[0]);
    		exit(0);
    	}
    	host = argv[1];
    	port = argv[2];
    
    	clientfd = Open_clientfd(host, port);
    	Rio_readinitb(&rio, clientfd);
    
    	while (Fgets(buf, MAXLINE, stdin) != NULL) {
    		Rio_writen(clientfd, buf, strlen(buf));
    		Rio_readlineb(&rio, buf, MAXLINE);
    		Fputs(buf, stdout);
    	}
    	Close(clientfd);
    	exit(0);
    }
    • The loop terminates when fgets encounters EOF on standard input, either because the user typed Ctrl+D or because it has exhausted the text lines in a redirected input file.
    • After the loop terminates, the client closes the descriptor. → EOF notification is sent to the server.

    Iterative echo server main routine

    #include "csapp.h"
    
    void echo(int connfd) {
    	size_t n;
    	char buf[MAXLINE];
    	rio_t rio;
    
    	Rio_readinitb(&rio, connfd);
    	while((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0) {
    		printf("server received %d bytes\n", (int)n);
    		Rio_writen(connfd, buf, n);
    	}
    }
    
    int main(int argc, char **argv) {
    	int listenfd, connfd;
    	socklen_t clientlen;
    	struct sockaddr_storage clientaddr; /* Enough space for any address */
    	char client_hostname[MAXLINE], client_port[MAXLINE];
    	
    	if (argc != 2) {
    		fprintf(stderr, "usage: %s <port>\n", argv[0]);
    		exit(0);
    	}
    
    	listenfd = Open_listenfd(argv[1]);
    	while (1) {
    		clientlen = sizeof(struct sockaddr_storage);
    		connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
    		Getnameinfo((SA *) &clientaddr, clientlen, client_hostname, MAXLINE, client_port, MAXLINE, 0);
    		printf("Connected to (%s, %s)\n", client_hostname, client_port);
    		echo(connfd);
    		Close(connfd);
    	}
    	exit(0);
    }
    • sockaddr_storage structure, clientaddr’s type, is large enough to hold any type of socket address. → The code is protocol-independent.
    • This server can only handle one client at a time. → A server of this type that iterates through clients one at a time is called an iterative server.
    • Client’s EOF notification is detected when the server receives a return of zero from its rio_readlineb in echo.
  • Additional Section - EOF in several contexts

    There is no such thing as an EOF character - EOF is a condition that is detected by the kernel.

    • An application finds out about the EOF condition when it receives a zero return code from the read function.
    • For disk files, EOF occurs when the current file position exceeds the file length.
    • For Internet connections, EOF occurs when a process closes its end of the connection. The process at the other end detects the EOF when it attempts to read past the last byte in the stream.
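The EOF condition on a connection can be observed in a single process. The sketch below is an assumption-laden stand-in: it uses socketpair with AF_UNIX to get two already-connected stream sockets instead of a real Internet connection, and the function name read_until_eof is made up. The behavior of read at EOF is the same.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg over a connected pair of sockets, close the sending end, and
 * read from the other end until read stops returning data.
 * Stores the number of bytes received in *total and returns the value of
 * the final read: 0 means EOF was detected, -1 means an error occurred. */
ssize_t read_until_eof(const char *msg, size_t *total)
{
    int fds[2];
    char buf[64];
    ssize_t n;
    size_t len;

    assert(msg != NULL && total != NULL);
    len = strlen(msg);
    *total = 0;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;
    if (write(fds[0], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[0]); /* closing this end is what produces EOF at the other */
    while ((n = read(fds[1], buf, sizeof buf)) > 0)
        *total += (size_t)n;
    close(fds[1]);
    return n;
}
```

The reader first receives every byte that was written, and only the next read returns zero. No special character travels over the connection to signal the end.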

11.5 Web Servers

  • Web Basics

    Web clients and servers interact using a text-based application-level protocol known as HTTP (hypertext transfer protocol):

    1. A Web client (= browser) opens an Internet connection to a server and requests some content.
    2. The server responds with the requested content and then closes the connection.
    3. The browser reads the content and displays it on the screen.

    Web services differ from conventional file retrieval services in that Web content can be written in a language known as HTML (hypertext markup language).

  • Web Content

    To Web clients and servers, content is a sequence of bytes with an associated MIME (multipurpose internet mail extensions) type.

    Web servers provide content to clients in two different ways:

    • Serving static content : Fetch a disk file (static content) and return its contents to the client.
    • Serving dynamic content : Run an executable and return its output (dynamic content) to the client.

    Each file returned by a Web server has a unique name known as a URL (uniform resource locator). (example : http://www.google.com:80/index.html)

    • The port number is optional and defaults to the HTTP port 80.
    • URLs for executable files can include program arguments after the filename. - ? separates the filename from the arguments, and each argument is separated by an &.
    • Clients and servers use different parts of the URL during a transaction:
      • A client uses the prefix http://www.google.com:80 to determine what kind of server to contact, where the server is, and what port it is listening on.
      • A server uses the suffix /index.html to find the file on its filesystem and to determine whether the request is for static or dynamic content.
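As a rough sketch of how a server might take the URL suffix apart, the function below splits a URI at the first ? into a filename and an argument string. The name split_uri and its return convention are assumptions for illustration. It does not decide whether the content is static or dynamic, which depends on server policy.

```c
#include <assert.h>
#include <string.h>

/* Split uri at the first '?' into a filename and an argument string.
 * Returns 1 if arguments were present, 0 if not, and -1 if either
 * output buffer is too small. */
int split_uri(const char *uri, char *filename, size_t fsize,
              char *args, size_t asize)
{
    const char *q;
    size_t flen;

    assert(uri != NULL && filename != NULL && args != NULL && asize > 0);
    q = strchr(uri, '?');
    flen = q ? (size_t)(q - uri) : strlen(uri);
    if (flen >= fsize)
        return -1;
    memcpy(filename, uri, flen);
    filename[flen] = '\0';
    if (q == NULL) {          /* no '?': there are no arguments */
        args[0] = '\0';
        return 0;
    }
    if (strlen(q + 1) >= asize)
        return -1;
    strcpy(args, q + 1);      /* everything after '?', '&' separators kept */
    return 1;
}
```

For the URI /cgi-bin/adder?15000&213 this yields the filename /cgi-bin/adder and the argument string 15000&213, which is left unsplit so it can be handed to the program as is.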
  • HTTP Transactions

    We can use the Linux TELNET program to conduct transactions with any Web server on the Internet.

    • Line 1 : We run TELNET and ask it to open a connection to the AOL Web server.
    • Line 2 ~ 4 : TELNET opens the connection, and waits for us to enter text.
    • Line 5 ~ 7 : We enter an HTTP request.
      • Each time we enter a text line and hit the enter key, TELNET reads the line, appends carriage return and line feed characters ('\r\n'), and sends the line to the server.
    • Line 8 ~ 17 : The server replies with an HTTP response.
    • Line 18 : The server closes the connection.

    An HTTP request consists of a request line, zero or more request headers, and an empty text line that terminates the list of headers.

    • A request line
      • A request line has the form method URI version.
      • method : HTTP supports a number of different methods, including GET, POST, OPTIONS, HEAD, PUT, DELETE, and TRACE.
      • URI (uniform resource identifier) : the suffix of the corresponding URL that includes the filename and optional arguments.
      • version : indicates the HTTP version to which the request conforms.
    • Request headers
      • Request headers provide additional information to the server. (example : brand name of the browser, the MIME types that the browser understands, …)
      • Request headers have the form header-name: header-data. (example : Host: www.aol.com)
    • The empty text line
      • The empty text line terminates the headers and instructs the server to send the requested HTML file.
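    To make the three parts of a request concrete, here is a minimal C sketch that builds a GET request and splits a request line back into its fields. Both function names are hypothetical, and the HTTP/1.1 version string is an assumption chosen for the example.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: write a minimal GET request into buf.
       Returns the length snprintf reports for the full request. */
    int build_get_request(char *buf, size_t size, const char *host, const char *uri)
    {
        assert(buf && host && uri);
        return snprintf(buf, size,
                        "GET %s HTTP/1.1\r\n"   /* request line: method URI version */
                        "Host: %s\r\n"          /* one request header */
                        "\r\n",                 /* empty line ends the headers */
                        uri, host);
    }

    /* Hypothetical helper: split a request line into its three fields.
       Each output buffer must hold at least 256 bytes. Returns 0 on success. */
    int parse_request_line(const char *line, char *method, char *uri, char *version)
    {
        assert(line && method && uri && version);
        /* %255s stops at whitespace, so the trailing "\r\n" is not copied */
        int n = sscanf(line, "%255s %255s %255s", method, uri, version);
        return n == 3 ? 0 : -1;
    }
    ```

    Calling build_get_request with the host www.aol.com and the URI / produces the request line, the Host header, and the terminating empty line, each ending in '\r\n'.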

    An HTTP response consists of a response line, zero or more response headers, an empty line that terminates the headers, and the response body.

    • A response line
      • A response line has the form version status-code status-message.
      • version : describes the HTTP version that the response conforms to.
      • status-code : a three-digit positive integer that indicates the disposition of the request.
      • status-message : the English equivalent of the status code.
    • Response headers
      • Response headers provide additional information about the response. (example : Content-Type (the MIME type of the content), Content-Length (the content’s size in bytes))
    • The empty text line
      • The empty text line terminates the headers, followed by the response body.
    • The response body
      • The response body contains the requested content.
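    The response layout described above can be picked apart with a few lines of C. This is a sketch under assumptions: parse_status_line and response_body are hypothetical names, and the whole response is assumed to sit in one NUL-terminated string.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: split "HTTP/1.0 200 OK" into its three fields.
       version and msg must each hold at least 64 bytes. Returns 0 on success. */
    int parse_status_line(const char *line, char *version, int *code, char *msg)
    {
        assert(line && version && code && msg);
        /* %63[^\r\n] reads to the end of the line, because the status
           message may contain spaces (for example "Not found") */
        int n = sscanf(line, "%63s %d %63[^\r\n]", version, code, msg);
        return n == 3 ? 0 : -1;
    }

    /* Hypothetical helper: return a pointer to the response body, or NULL
       if the empty line that terminates the headers is missing. */
    const char *response_body(const char *response)
    {
        assert(response);
        const char *sep = strstr(response, "\r\n\r\n");
        return sep ? sep + 4 : NULL;
    }
    ```

    The body starts right after the first "\r\n\r\n", which is the end of the last header followed by the empty line.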
  • Serving Dynamic Content
    • Client passing program arguments to the server
      • Arguments for GET requests are passed in the URI. (example : GET /cgi-bin/adder?15000&213 HTTP)
      • Special characters like spaces must be represented with special encoding. (example : %20 for space)
    • Server passing arguments to the child
      1. The server receives the request : GET /cgi-bin/adder?15000&213 HTTP
      1. The server calls fork to create a child process.
      1. The child process sets the CGI environment variable QUERY_STRING to 15000&213.
      1. The child process calls execve to load and run the /cgi-bin/adder program.
      1. The adder program can reference the arguments using the Linux getenv function.
      • The server can also pass other information to the child process using other CGI environment variables.
    • Child sending its output to the client
      • A CGI program sends its dynamic content to the standard output.
      • The child process uses the Linux dup2 to redirect standard output to the connected descriptor before loading and running the CGI program.
      • The child is responsible for generating the Content-type and Content-length response headers, and the empty line that terminates the headers.
    • Example Code - a simple CGI program adder
      #include "csapp.h"

      int main(void) {
          char *buf, *p;
          char content[MAXLINE];
          int n1 = 0, n2 = 0, len = 0;

          /* Extract the two arguments from QUERY_STRING ("n1&n2") */
          if ((buf = getenv("QUERY_STRING")) != NULL &&
              (p = strchr(buf, '&')) != NULL) {   /* skip parsing if '&' is missing */
              n1 = atoi(buf);      /* atoi stops at the '&' */
              n2 = atoi(p + 1);
          }

          /* Make the response body. Each call appends at offset len;
             passing content as both the destination and a %s argument
             of sprintf is undefined behavior. */
          len += snprintf(content + len, sizeof(content) - len,
                          "Welcome to add.com: ");
          len += snprintf(content + len, sizeof(content) - len,
                          "THE Internet addition portal.\r\n<p>");
          len += snprintf(content + len, sizeof(content) - len,
                          "The answer is: %d + %d = %d\r\n<p>", n1, n2, n1 + n2);
          len += snprintf(content + len, sizeof(content) - len,
                          "Thanks for visiting!\r\n");

          /* Generate the HTTP response */
          printf("Content-length: %d\r\n", len);
          printf("Content-type: text/html\r\n\r\n");
          printf("%s", content);
          fflush(stdout);
          exit(0);
      }
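    The %20 encoding mentioned above for special characters has to be undone before the arguments are used. A minimal sketch of that decoding step follows; url_decode is a hypothetical helper, and neither the adder program nor TINY is claimed to do this.

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical helper: decode %XX escapes from src into dst.
       dst must hold at least strlen(src) + 1 bytes, because the decoded
       string is never longer than the encoded one. */
    void url_decode(const char *src, char *dst)
    {
        assert(src && dst);
        while (*src) {
            if (src[0] == '%' && isxdigit((unsigned char)src[1])
                              && isxdigit((unsigned char)src[2])) {
                char hex[3] = { src[1], src[2], '\0' };
                *dst++ = (char)strtol(hex, NULL, 16);   /* "20" -> ' ' */
                src += 3;
            } else {
                *dst++ = *src++;    /* ordinary character, copy as-is */
            }
        }
        *dst = '\0';
    }
    ```

    A '%' that is not followed by two hexadecimal digits is copied through unchanged rather than treated as an error, which is a design choice made for this sketch.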

11.6 Putting It Together: The TINY Web Server

  • TINY is an iterative server that listens for connection requests on the port passed on the command line.
    1. TINY opens a listening socket by calling open_listenfd. (Line 29)
    1. TINY executes the typical infinite server loop, repeatedly accepting a connection request. (Line 32)
    1. TINY performs a transaction. - doit (Line 36)
    1. TINY closes its end of the connection. (Line 37)
  • doit handles one HTTP transaction.
    1. doit reads and parses the request line. (Line 11-14)
    1. TINY only supports the GET method. If another method is requested, doit sends an error message. - clienterror (Line 15-19)
    1. doit reads and ignores any request headers. (Line 20)
    1. doit parses the URI into a filename and a possibly empty CGI argument string, and sets a flag indicating whether the request is for static or dynamic content. - parse_uri (Line 23)
    1. If the file doesn’t exist, doit sends an error message. (Line 24-28)
    1. If the file exists, doit verifies that the file is a regular file and that we have read permission. (Line 31)
    1. doit serves static or dynamic content, depending on the flag. - serve_static, serve_dynamic (Line 36, 44)
  • clienterror sends an HTTP response to the client with the appropriate status code and status message in the response line, along with an HTML file in the response body that explains the error to the browser’s user.
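    The shape of such an error response can be sketched as follows. format_error is a hypothetical helper that formats into a buffer instead of writing to a socket, and the exact HTML and the HTTP/1.0 version string are assumptions for the example.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: format a complete error response into out.
       Returns the total length, or -1 if a buffer is too small. */
    int format_error(char *out, size_t size, int code,
                     const char *msg, const char *cause)
    {
        assert(out && msg && cause);
        char body[512];

        /* Response body: a small HTML page that explains the error */
        int n = snprintf(body, sizeof body,
                         "<html><title>Error</title><body>\r\n"
                         "%d: %s\r\n<p>%s\r\n</body></html>\r\n",
                         code, msg, cause);
        if (n < 0 || (size_t)n >= sizeof body)
            return -1;

        /* Response line, headers, empty line, then the body */
        n = snprintf(out, size,
                     "HTTP/1.0 %d %s\r\n"
                     "Content-type: text/html\r\n"
                     "Content-length: %d\r\n"
                     "\r\n"
                     "%s",
                     code, msg, (int)strlen(body), body);
        if (n < 0 || (size_t)n >= size)
            return -1;
        return n;
    }
    ```

    The body is built first so that its length is known when the Content-length header is written.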
  • read_requesthdrs : TINY doesn't use any of the information in the request headers → read_requesthdrs simply reads and ignores them.
  • parse_uri
    • TINY assumes that the home directory for static content is its current directory and that the home directory for executables is ./cgi-bin. Also, the default filename is ./home.html.
    • If the request is for static content, parse_uri clears the CGI argument string and then converts the URI into a relative Linux pathname. If the URI ends with a '/', it appends the default filename.
    • If the request is for dynamic content, parse_uri extracts any CGI arguments and converts the remaining portion of the URI to a relative Linux filename.
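    The two cases can be sketched in C. This is not TINY's actual source: parse_uri_sketch is a hypothetical, bounds-checked variant, and it assumes that any URI containing "cgi-bin" is a request for dynamic content.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Sketch of the URI handling described above.
       Assumption: a URI that contains "cgi-bin" is dynamic content.
       Returns 1 for static content, 0 for dynamic content. */
    int parse_uri_sketch(const char *uri, char *filename, size_t fsz,
                         char *cgiargs, size_t asz)
    {
        assert(uri && filename && cgiargs && fsz > 0 && asz > 0);

        if (strstr(uri, "cgi-bin") == NULL) {            /* static content */
            size_t len = strlen(uri);
            int ends_with_slash = (len > 0 && uri[len - 1] == '/');
            cgiargs[0] = '\0';                           /* no CGI arguments */
            /* "." + URI, plus the default filename after a trailing '/' */
            snprintf(filename, fsz, ".%s%s", uri,
                     ends_with_slash ? "home.html" : "");
            return 1;
        }

        /* Dynamic content: everything after '?' is the argument string */
        const char *q = strchr(uri, '?');
        size_t pathlen = q ? (size_t)(q - uri) : strlen(uri);
        snprintf(cgiargs, asz, "%s", q ? q + 1 : "");
        snprintf(filename, fsz, ".%.*s", (int)pathlen, uri);
        return 0;
    }
    ```

    With this sketch, the URI / maps to ./home.html, and /cgi-bin/adder?15000&213 maps to ./cgi-bin/adder with the argument string 15000&213.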
  • serve_static
    • TINY serves five common types of static content: HTML, unformatted text, GIF, PNG, and JPEG.
    1. serve_static determines the file type by inspecting the suffix in the filename. (Line 7)
    1. serve_static sends the response line and response headers to the client. (Line 8-13)
    1. serve_static opens filename for reading and gets its descriptor srcfd. (Line 18)
    1. serve_static maps the requested file to a VM area using mmap. (Line 19)
    1. serve_static closes the file - we no longer need its descriptor. (Line 20)
    1. serve_static performs the transfer of the file to the client using rio_writen. rio_writen copies the filesize bytes starting at location srcp to the client's connected descriptor. (Line 21)
    1. serve_static frees the mapped VM area using munmap.
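    The file-type lookup and the mmap-based transfer can be sketched as below. Both helpers are hypothetical; the sketch uses plain write in a loop where TINY uses rio_writen, and it leaves out the response line and headers.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Hypothetical helper: pick a MIME type from the filename suffix. */
    const char *filetype_for(const char *filename)
    {
        assert(filename);
        const char *dot = strrchr(filename, '.');
        if (dot == NULL)               return "text/plain";
        if (strcmp(dot, ".html") == 0) return "text/html";
        if (strcmp(dot, ".gif") == 0)  return "image/gif";
        if (strcmp(dot, ".png") == 0)  return "image/png";
        if (strcmp(dot, ".jpg") == 0)  return "image/jpeg";
        return "text/plain";
    }

    /* Hypothetical helper: copy a file to fd through a private mapping.
       Returns 0 on success, -1 on error. */
    int send_file_mmap(int fd, const char *filename)
    {
        assert(filename);
        struct stat sb;
        int srcfd = open(filename, O_RDONLY);
        if (srcfd < 0)
            return -1;
        if (fstat(srcfd, &sb) < 0 || sb.st_size == 0) {
            close(srcfd);
            return -1;              /* mmap rejects a zero-length mapping */
        }
        size_t size = (size_t)sb.st_size;
        char *srcp = mmap(NULL, size, PROT_READ, MAP_PRIVATE, srcfd, 0);
        close(srcfd);               /* the mapping stays valid after close */
        if (srcp == MAP_FAILED)
            return -1;

        /* write may transfer fewer bytes than requested, so loop */
        size_t sent = 0;
        int rc = 0;
        while (sent < size) {
            ssize_t n = write(fd, srcp + sent, size - sent);
            if (n < 0) { rc = -1; break; }
            sent += (size_t)n;
        }
        munmap(srcp, size);         /* free the mapped area */
        return rc;
    }
    ```

    Closing srcfd right after mmap mirrors the step above: once the file is mapped, the descriptor is no longer needed.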
  • serve_dynamic
    1. serve_dynamic sends a response line and the header.
    1. serve_dynamic forks a new child process. (Line 11)
    1. The child initializes the QUERY_STRING environment variable with the CGI arguments from the request URI. (Line 13)
    1. The child redirects its standard output to the connected file descriptor using dup2. (Line 14)
    1. The child loads and runs the CGI program. Everything that the CGI program writes to standard output goes directly to the client process. (Line 15)
    1. The parent blocks in a call to wait, waiting to reap the child when it terminates. (Line 17)
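    The fork, setenv, dup2, execve, and wait sequence above can be sketched as one function. run_cgi is a hypothetical name, not TINY's code; unlike TINY it passes the program path as argv[0], and it reports failures through the child's exit status (127 is an arbitrary choice).

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern char **environ;

    /* Hypothetical helper: run program with QUERY_STRING set to cgiargs and
       its standard output redirected to fd.
       Returns the child's exit status, or -1 on error. */
    int run_cgi(int fd, const char *program, const char *cgiargs)
    {
        assert(program && cgiargs);

        pid_t pid = fork();
        if (pid < 0)
            return -1;

        if (pid == 0) {                                /* child */
            char *argv[] = { (char *)program, NULL };
            if (setenv("QUERY_STRING", cgiargs, 1) < 0)
                _exit(127);
            if (dup2(fd, STDOUT_FILENO) < 0)           /* stdout -> client */
                _exit(127);
            execve(program, argv, environ);            /* returns only on error */
            _exit(127);
        }

        int status;                                    /* parent */
        if (waitpid(pid, &status, 0) < 0)              /* reap the child */
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
    ```

    Because the environment and the descriptor table are changed only in the child, the parent's own standard output and environment are left untouched.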
