CS:APP Chapter 10 Summary
Recently, I’ve been studying CS:APP - I’m posting my own summary of chapter 10 that I wrote up using Notion.
10.1 Unix I/O
All I/O devices are modeled as files, and all input and output is performed by reading and writing the appropriate files. (example : networks, disks, and terminals)
→ The Linux kernel can export a simple, low-level application interface, Unix I/O, that enables all input and output to be performed in a uniform and consistent way.
- Opening files
- An application announces its intention to access an I/O device by asking the kernel to open the file.
- The kernel returns a small nonnegative integer, a descriptor, that identifies the file in all subsequent operations on the file.
- The kernel keeps track of all information about the open file. ↔ The application only keeps track of the descriptor.
- Each process created by a Linux shell begins life with 3 open files:
- standard input (descriptor 0)
- standard output (descriptor 1)
- standard error (descriptor 2)
- Changing the current file position
- A file position k, initially 0, is a byte offset from the beginning of a file, and is maintained for each open file.
- An application can set the current file position explicitly by performing a seek operation.
- Reading and writing files
- A read operation copies n > 0 bytes from a file to memory, starting at the current file position k and then incrementing k by n.
- A write operation copies n > 0 bytes to a file, starting at the current file position k and then incrementing k by n.
- Given a file with a size of m bytes, performing a read operation when k ≥ m triggers a condition known as end-of-file (EOF), which can be detected by the application.
- Closing files
- An application finishes accessing a file by asking the kernel to close the file.
- The kernel frees the data structures it created when the file was opened, and restores the descriptor to a pool of available descriptors.
- When a process terminates, the kernel closes all open files and frees their memory resources.
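The four operations above (open, seek, read, close) can be sketched as one small function. This is only an illustration, not code from the book: read_at is a hypothetical helper name, and it relies on the standard open, lseek, read, and close calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to n bytes starting at byte offset off.
 * Returns the count reported by read (0 at EOF), or -1 on any error. */
ssize_t read_at(const char *path, off_t off, void *buf, size_t n)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);      /* kernel hands back a descriptor */
    if (fd < 0)
        return -1;

    ssize_t got = -1;
    if (lseek(fd, off, SEEK_SET) != (off_t)-1)  /* set file position k = off */
        got = read(fd, buf, n);                 /* copy bytes, advance k */

    close(fd);                          /* descriptor goes back to the pool */
    return got;
}
```

Calling read_at with an offset equal to the file size returns 0, which is how the EOF condition shows up to the application.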
10.2 Files
Each Linux file has a type that indicates its role in the system:
- Regular file
- A regular file contains arbitrary data.
- Application programs distinguish between text files and binary files. To the kernel, there is no difference between the two.
- Text files are regular files that contain only ASCII or Unicode characters.
- Binary files are everything else.
- Directory
- A directory is a file consisting of an array of links, where each link maps a filename to a file, which may be another directory.
- Each directory contains at least two entries: . (dot) is a link to the directory itself, and .. (dot-dot) is a link to the parent directory in the directory hierarchy.
- You can create a directory with the mkdir command, view its contents with ls, and delete it with rmdir.
- Socket
- A socket is a file that is used to communicate with another process across a network. (Section 11.4)
The Linux kernel organizes all files in a single directory hierarchy anchored by the root directory named / (slash).
Each process has a current working directory as part of its context that identifies its current location in the directory hierarchy.
You can change the shell’s current working directory with the cd command.
Locations in the directory hierarchy are specified by pathnames, each of which is a string consisting of an optional slash followed by a sequence of filenames separated by slashes.
- An absolute pathname starts with a slash and denotes a path from the root node. (example : /home/droh/hello.c)
- A relative pathname starts with a filename and denotes a path from the current working directory. (example : ./hello.c, ../home/droh/hello.c)
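The absolute/relative distinction comes down to the first character of the pathname. A minimal sketch, where is_absolute_path is a hypothetical helper name used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the pathname is absolute (starts with '/'),
 * false if it is relative to the current working directory. */
bool is_absolute_path(const char *pathname)
{
    assert(pathname != NULL);
    return pathname[0] == '/';
}
```

With the examples above, /home/droh/hello.c is absolute, while ./hello.c and ../home/droh/hello.c are relative.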
10.3 Opening and Closing Files
A process opens an existing file or creates a new file by calling the open function.
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(char *filename, int flags, mode_t mode);
- The open function converts a filename to a file descriptor and returns the descriptor number.
- The descriptor returned is always the smallest descriptor that is not currently open in the process.
- The flags argument indicates how the process intends to access the file:
- O_RDONLY: Reading only
- O_WRONLY: Writing only
- O_RDWR: Reading and writing
- O_CREAT: If the file doesn’t exist, then create a truncated (empty) version of it.
- O_TRUNC: If the file already exists, then truncate it.
- O_APPEND: Before each write operation, set the file position to the end of the file.
- The flags can also be ORed with one or more bit masks. (example : fd = Open("foo.txt", O_WRONLY|O_APPEND, 0);)
- The mode argument specifies the access permission bits of new files.
- Each process has a umask as part of its context, which is set by calling the umask function.
- When a process creates a new file by calling the open function with some mode argument, the access permission bits of the file are set to mode & ~umask.
umask(S_IWGRP|S_IWOTH);
fd = Open("foo.txt", O_CREAT|O_TRUNC|O_WRONLY,
          S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH);
/* -> Creates a new file in which the owner of the file has read and write
      permissions, and all other users have read permissions. */
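The mode & ~umask rule is plain bit arithmetic, so it can be checked without creating a file. A small sketch, where effective_mode is a hypothetical helper name and not part of any library:

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: permission bits a new file ends up with when open
 * is called with `mode` while the process umask is `mask`. */
mode_t effective_mode(mode_t mode, mode_t mask)
{
    /* Every bit set in the umask is cleared from the requested mode. */
    return mode & ~mask;
}
```

For the example above, the requested mode is rw-rw-rw- (octal 0666) and the umask is S_IWGRP|S_IWOTH (octal 0022), so the result is rw-r--r-- (octal 0644).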
A process closes an open file by calling the close function.
#include <unistd.h>
int close(int fd);
10.4 Reading and Writing Files
Applications perform input and output by calling the read and write functions.
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t n);
ssize_t write(int fd, const void *buf, size_t n);
- read copies at most n bytes from the current file position of descriptor fd to memory location buf.
- A return value of -1 → an error, a return value of 0 → EOF.
- Otherwise, the return value = the number of bytes that were actually transferred.
- write copies at most n bytes from memory location buf to the current file position of descriptor fd.
- Applications can explicitly modify the current file position by calling the lseek function.
- In some situations, read and write transfer fewer bytes than the application requests. These short counts occur for a number of reasons:
- Encountering EOF on reads
- Reading text lines from a terminal : If the open file is associated with a terminal (i.e., a keyboard and display), then each read will transfer one text line at a time, returning a short count equal to the size of the text line.
- Reading and writing network sockets : If the open file corresponds to a network socket, then internal buffering constraints and long network delays can cause read and write to return short counts.
- To deal with short counts, repeatedly call read and write until all requested bytes have been transferred.
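The three kinds of return value from read (error, EOF, byte count) can be seen in a simple drain loop. This is a sketch for illustration only: count_bytes is a hypothetical helper, and the retry on EINTR is an extra precaution that this section does not discuss.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read fd until EOF and return the total number of
 * bytes seen, or -1 on error. Any single read may return a short count. */
ssize_t count_bytes(int fd)
{
    assert(fd >= 0);

    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal handler: retry */
            return -1;          /* real error, errno set by read */
        }
        if (n == 0)
            break;              /* EOF */
        total += n;             /* short or full count: keep going */
    }
    return total;
}
```

The loop never treats a short count as a failure; only a return value of 0 ends it.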
10.5 Robust Reading and Writing with the RIO Package
The RIO package provides two different kinds of functions:
- Unbuffered input and output functions
- transfer data directly between memory and a file, with no application-level buffering.
- especially useful for reading and writing binary data to and from networks.
- Buffered input functions
- Using these functions, you can read text lines and binary data from a file whose contents are cached in an application-level buffer. (similar to standard I/O functions such as printf)
- are thread-safe (Section 12.7)
- can be interleaved arbitrarily on the same descriptor.
RIO Unbuffered Input and Output Functions
Applications can transfer data directly between memory and a file by calling the rio_readn and rio_writen functions.
#include "csapp.h"

ssize_t rio_readn(int fd, void *usrbuf, size_t n)
{
    size_t nleft = n;
    ssize_t nread;
    char *bufp = usrbuf;

    while (nleft > 0) {
        if ((nread = read(fd, bufp, nleft)) < 0) {
            if (errno == EINTR) /* Interrupted by sig handler return */
                nread = 0;      /* and call read() again */
            else
                return -1;      /* errno set by read() */
        }
        else if (nread == 0)
            break;              /* EOF */
        nleft -= nread;
        bufp += nread;
    }
    return (n - nleft);         /* Return >= 0 */
}

ssize_t rio_writen(int fd, void *usrbuf, size_t n)
{
    size_t nleft = n;
    ssize_t nwritten;
    char *bufp = usrbuf;

    while (nleft > 0) {
        if ((nwritten = write(fd, bufp, nleft)) <= 0) {
            if (errno == EINTR) /* Interrupted by sig handler return */
                nwritten = 0;   /* and call write() again */
            else
                return -1;      /* errno set by write() */
        }
        nleft -= nwritten;
        bufp += nwritten;
    }
    return n;
}
- rio_readn transfers up to n bytes from the current file position of descriptor fd to memory location usrbuf.
- rio_readn can only return a short count if it encounters EOF.
- rio_writen transfers n bytes from location usrbuf to descriptor fd.
- The rio_writen function never returns a short count.
- Calls to rio_readn and rio_writen can be interleaved arbitrarily on the same descriptor.
- Each function manually restarts the read or write function if it is interrupted by the return from an application signal handler.
RIO Buffered Input Functions
- Read buffer - rio_t, rio_readinitb
#define RIO_BUFSIZE 8192
typedef struct {
    int rio_fd;                /* Descriptor for this internal buf */
    int rio_cnt;               /* Unread bytes in internal buf */
    char *rio_bufptr;          /* Next unread byte in internal buf */
    char rio_buf[RIO_BUFSIZE]; /* Internal buffer */
} rio_t;

void rio_readinitb(rio_t *rp, int fd)
{
    rp->rio_fd = fd;
    rp->rio_cnt = 0;
    rp->rio_bufptr = rp->rio_buf;
}
- rio_t is the read buffer used in the RIO package’s buffered input functions.
- rio_readinitb sets up an empty read buffer and associates an open file descriptor with that buffer. rio_readinitb is called once per open descriptor.
- Buffered Input Function - rio_read
static ssize_t rio_read(rio_t *rp, char *usrbuf, size_t n)
{
    int cnt;

    while (rp->rio_cnt <= 0) {  /* Refill if buf is empty */
        rp->rio_cnt = read(rp->rio_fd, rp->rio_buf, sizeof(rp->rio_buf));
        if (rp->rio_cnt < 0) {
            if (errno != EINTR) /* Error other than sig handler interrupt */
                return -1;
        }
        else if (rp->rio_cnt == 0) /* EOF */
            return 0;
        else
            rp->rio_bufptr = rp->rio_buf; /* Reset buffer ptr */
    }

    /* Copy min(n, rp->rio_cnt) bytes from internal buf to user buf */
    cnt = n;
    if (rp->rio_cnt < n)
        cnt = rp->rio_cnt;
    memcpy(usrbuf, rp->rio_bufptr, cnt);
    rp->rio_bufptr += cnt;
    rp->rio_cnt -= cnt;
    return cnt;
}
- rio_read is a buffered version of the Linux read function.
- When rio_read is called with a request to read n bytes, there are rp->rio_cnt unread bytes in the read buffer.
- If the buffer is empty → replenish the buffer with a call to read. A short count from this read is not an error; it simply has the effect of partially filling the read buffer.
- Once the buffer is nonempty → rio_read copies min(n, rp->rio_cnt) bytes from the read buffer to the user buffer and returns the number of bytes copied.
- Buffered Input Function - rio_readlineb, rio_readnb
ssize_t rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen)
{
    int n, rc;
    char c, *bufp = usrbuf;

    for (n = 1; n < maxlen; n++) {
        if ((rc = rio_read(rp, &c, 1)) == 1) {
            *bufp++ = c;
            if (c == '\n') {
                n++;
                break;
            }
        } else if (rc == 0) {
            if (n == 1)
                return 0;   /* EOF, no data read */
            else
                break;      /* EOF, some data was read */
        } else
            return -1;      /* Error */
    }
    *bufp = 0;              /* Terminate with NULL */
    return n-1;
}
- rio_readlineb reads the next text line from file rp, copies it to memory location usrbuf, and terminates the text line with the NULL character.
- rio_readlineb reads at most maxlen-1 bytes, leaving room for the terminating NULL. Text lines that exceed maxlen-1 bytes are truncated and terminated with a NULL.
- Using rio_readlineb is much more efficient than using read to transfer 1 byte at a time, checking each byte for the newline character.
ssize_t rio_readnb(rio_t *rp, void *usrbuf, size_t n)
{
    size_t nleft = n;
    ssize_t nread;
    char *bufp = usrbuf;

    while (nleft > 0) {
        if ((nread = rio_read(rp, bufp, nleft)) < 0)
            return -1;      /* errno set by read() */
        else if (nread == 0)
            break;          /* EOF */
        nleft -= nread;
        bufp += nread;
    }
    return (n - nleft);     /* Return >= 0 */
}
- rio_readnb reads up to n bytes from file rp to memory location usrbuf.
- rio_readnb has the same structure as rio_readn, with rio_read substituted for read.
- Calls to rio_readlineb and rio_readnb can be interleaved arbitrarily on the same descriptor. ↔ Calls to these buffered functions shouldn’t be interleaved with calls to the unbuffered rio_readn function.
- Example Code - Copying a text file from standard input to standard output
#include "csapp.h"

int main(int argc, char **argv)
{
    int n;
    rio_t rio;
    char buf[MAXLINE];

    Rio_readinitb(&rio, STDIN_FILENO);
    while ((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0)
        Rio_writen(STDOUT_FILENO, buf, n);
}
10.6 Reading File Metadata
An application can retrieve information about a file (metadata) by calling the stat and fstat functions.
#include <unistd.h>
#include <sys/stat.h>
int stat(const char *filename, struct stat *buf);
int fstat(int fd, struct stat *buf);
- stat takes a filename as input and fills the members of a stat structure.
- st_size contains the file size in bytes.
- st_mode encodes both the file permission bits (S_IRUSR, S_IWGRP, …) and the file type (regular file, directory, socket, …).
- You can determine the file type from st_mode using macros defined in sys/stat.h:
- S_ISREG(m): Is this a regular file?
- S_ISDIR(m): Is this a directory file?
- S_ISSOCK(m): Is this a network socket?
- You can check the file’s permission bits using st_mode, as in if (stat.st_mode & S_IRUSR) readok = "yes";.
- fstat is similar, but takes a file descriptor instead of a filename.
10.7 Reading Directory Contents
Applications can read the contents of a directory with the readdir family of functions.
#include <sys/types.h>
#include <dirent.h>
DIR *opendir(const char *name);
- opendir takes a pathname and returns a pointer to a directory stream. A stream is an abstraction for an ordered list of items.
#include <dirent.h>
struct dirent *readdir(DIR *dirp);
- Each call to readdir returns a pointer to the next directory entry in the stream dirp, or NULL if there are no more entries (end-of-stream).
- Each directory entry is a structure of the form
struct dirent {
    ino_t d_ino;      /* inode number */
    char d_name[256]; /* Filename */
};
- d_name is the filename.
- d_ino is the file location.
- On error, readdir returns NULL and sets errno. Checking whether readdir has changed errno is the only way to distinguish an error from the end-of-stream condition.
#include <dirent.h>
int closedir(DIR *dirp);
- closedir closes the stream and frees up any of its resources.
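The opendir / readdir / closedir sequence, including the errno check that separates an error from end-of-stream, might look like this. It is only a sketch, and count_entries is a hypothetical helper name.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: number of entries in a directory (including . and ..
 * when the filesystem reports them), or -1 on error. */
long count_entries(const char *path)
{
    assert(path != NULL);

    DIR *dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    long count = 0;
    errno = 0;                          /* clear before the first readdir */
    while (readdir(dirp) != NULL) {
        count++;
        errno = 0;                      /* clear again before the next call */
    }
    int failed = (errno != 0);          /* NULL with errno set means error */

    closedir(dirp);
    return failed ? -1 : count;
}
```

Resetting errno to 0 before each readdir call is what makes the final check meaningful: if errno is still 0 when readdir returns NULL, the stream simply ended.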
10.8 Sharing Files
Descriptor table, file table, and v-node table
The kernel represents open files using 3 related data structures:
- Descriptor table
- Each process has its own descriptor table.
- Descriptor table’s entries are indexed by the process’s open file descriptors.
- Each open descriptor entry points to an entry in the file table.
- File table
- The set of open files is represented by a file table.
- File table is shared by all processes.
- Each file table entry consists of:
- the current file position
- a reference count - the number of descriptor entries that currently point to it
- a pointer to an entry in the v-node table
- The kernel won’t delete the file table entry until its reference count is zero.
- v-node table
- v-node table is shared by all processes.
- Each entry contains most of the information in the
stat
structure.
File Sharing
Multiple descriptors can reference the same file through different file table entries.
This can happen if you call the open function twice with the same filename.
Each descriptor has its own distinct file position, so different reads on different descriptors can fetch data from different locations in the file.
The fork Function Revisited
When fork is called, the child gets its own duplicate copy of the parent’s descriptor table.
Parent and child share the same set of open file table entries and thus share the same file position.
The parent and child must both close their descriptors before the kernel will delete the corresponding file table entry.
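Both sharing cases can be observed by looking at file positions. The sketch below is my own illustration, with hypothetical helper names: the first function opens the same file twice, the second forks after a single open.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open path twice, read n bytes through the first
 * descriptor, and return the file position of the second (-1 on error).
 * Two open calls give two file table entries, so the second position
 * is unaffected by the read. */
off_t second_open_position(const char *path, size_t n)
{
    char buf[64];
    assert(path != NULL && n <= sizeof(buf));

    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);
    off_t pos = -1;
    if (fd1 >= 0 && fd2 >= 0 && read(fd1, buf, n) >= 0)
        pos = lseek(fd2, 0, SEEK_CUR);  /* query position without moving it */
    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    return pos;
}

/* Hypothetical helper: open path, fork, let the child read n bytes, and
 * return the parent's file position afterwards (-1 on error). Parent and
 * child share one file table entry, so the parent sees the child's reads. */
off_t parent_position_after_child_read(const char *path, size_t n)
{
    char buf[64];
    assert(path != NULL && n <= sizeof(buf));

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {                     /* child */
        ssize_t got = read(fd, buf, n);
        _exit(got < 0 ? 1 : 0);
    }

    int status;
    off_t pos = -1;
    if (waitpid(pid, &status, 0) == pid)
        pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

On a file with at least n bytes, the first function reports position 0 and the second reports position n.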
10.9 I/O Redirection
Linux shells provide I/O redirection operators that allow users to associate standard input and output with disk files.
Typing linux> ls > foo.txt causes the shell to load and execute the ls program, with standard output redirected to disk file foo.txt.
I/O redirection uses the dup2 function:
#include <unistd.h>
int dup2(int oldfd, int newfd);
- dup2 copies descriptor table entry oldfd to descriptor table entry newfd, overwriting the previous contents of descriptor table entry newfd.
- If newfd was already open, then dup2 closes newfd before it copies oldfd.
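A rough sketch of what a shell might do for a redirection like the one above: fork a child, point descriptor 1 at the file with dup2, then write to standard output. This is an assumption-laden illustration and not how any particular shell is implemented; write_redirected is a hypothetical helper, and the child writes a message instead of executing a program.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: in a child process, redirect standard output to
 * path and write msg to descriptor 1. Returns 0 on success, -1 on failure. */
int write_redirected(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                     /* child */
        int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
        if (fd < 0)
            _exit(1);
        if (dup2(fd, STDOUT_FILENO) < 0)    /* entry 1 now refers to path */
            _exit(1);
        if (fd != STDOUT_FILENO)
            close(fd);                  /* descriptor 1 keeps the file open */
        size_t len = strlen(msg);
        _exit(write(STDOUT_FILENO, msg, len) == (ssize_t)len ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Doing the redirection in the child leaves the parent's own standard output untouched.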
10.10 Standard I/O
The C language defines a standard I/O library that provides programmers with a higher-level alternative to Unix I/O. The library (libc) provides several functions:
- fopen, fclose - for opening and closing files
- fread, fwrite - reading and writing bytes
- fgets, fputs - reading and writing strings
- scanf, printf - sophisticated formatted I/O
The standard I/O library models an open file as a stream, which is a pointer to a structure of type FILE.
Every ANSI C program begins with 3 open streams, stdin, stdout, and stderr:
#include <stdio.h>
extern FILE *stdin; /* Standard input (descriptor 0) */
extern FILE *stdout; /* Standard output (descriptor 1) */
extern FILE *stderr; /* Standard error (descriptor 2) */
A stream of type FILE is an abstraction for a file descriptor and a stream buffer. The purpose of the stream buffer is to minimize the number of expensive Linux I/O system calls.
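The stream buffer is what makes character-at-a-time code reasonable: most getc calls are served from the buffer and do not need a system call. A small sketch, where count_newlines is a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count newline characters in a stream, reading one
 * character at a time through the stream buffer. */
long count_newlines(FILE *fp)
{
    assert(fp != NULL);

    long lines = 0;
    int c;                              /* int, so EOF can be told apart */
    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    return lines;
}
```

Doing the same thing with one read call per byte would cost a trip into the kernel for every character.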
10.11 Putting It Together: Which I/O Functions Should I Use?
- The Unix I/O model is implemented in the OS kernel and is available to applications through functions such as open, close, lseek, read, write, and stat.
- The higher-level RIO and standard I/O functions are implemented using the Unix I/O functions.
- Which of these functions should you use in your programs? :
- Use the standard I/O functions whenever possible.
The standard I/O functions are the method of choice for I/O on disk and terminal devices.
- Don’t use scanf or rio_readlineb to read binary files.
Functions like scanf and rio_readlineb are designed for reading text files. Using these functions for reading binary files can lead to unpredictable errors. (example : binary files can be littered with many 0xa bytes that have nothing to do with terminating text lines.)
- Use the RIO functions for I/O on network sockets.
Standard I/O poses some problems when used for input and output on networks.
- Standard I/O streams are full duplex - programs can perform input and output on the same stream.
- Standard I/O’s streams have restrictions that interact badly with restrictions on sockets:
- Input functions following output functions
An input function can’t follow an output function without an intervening call to fflush, fseek, fsetpos, or rewind. fflush empties the buffer, and the latter three use the Unix I/O lseek function to reset the current file position.
- Output functions following input functions
An output function can’t follow an input function without an intervening call to fseek, fsetpos, or rewind, unless the input function encounters an end-of-file.
These restrictions can lead to a problem for network applications - it is illegal to use the lseek function on a socket. → Use RIO instead.