2.2 User mode, supervisor mode, and system calls
Strong isolation requires a hard boundary between applications and the operating system. An application shouldn’t be able to disturb the operation of the operating system or of other programs, even if it has a bug or is malicious. To achieve strong isolation, the operating system must arrange that applications cannot modify (or even read) the operating system’s data structures and instructions and that applications cannot access other processes’ memory.
CPUs provide hardware support for strong isolation. For example, RISC-V has three privilege levels which constrain what code can do: machine mode, supervisor mode, and user mode. Instructions executing in machine mode have full privilege; a CPU starts in machine mode. Machine mode is mostly intended for setting up the computer during boot. Xv6 executes briefly in machine mode and then changes to supervisor mode.
In supervisor mode the CPU is allowed to execute privileged instructions, such as enabling and disabling interrupts or reading and writing the register that holds the address of the page table. If an application in user mode attempts to execute a privileged instruction, the CPU doesn’t execute the instruction but “traps” to special code in supervisor mode that can terminate the application. Figure 1.1 in Chapter 1 illustrates this organization. An application can execute only user-mode instructions (e.g., adding numbers) and is said to be running in user space, while the software in supervisor mode can also execute privileged instructions and is said to be running in kernel space. The software running in kernel space (or in supervisor mode) is called the kernel.
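The privilege check described above can be pictured with a small software model. This is only a sketch: real RISC-V hardware performs the check itself while decoding instructions, and every name below (priv_mode, insn_kind, step, and so on) is hypothetical and chosen for illustration, not taken from xv6 or the RISC-V specification.

```c
#include <stdbool.h>

/* Toy model only: real hardware makes this check in the CPU itself.
   All names here are hypothetical. */
enum priv_mode { MODE_USER, MODE_SUPERVISOR, MODE_MACHINE };

enum insn_kind {
  INSN_ADD,              /* ordinary arithmetic: allowed in every mode */
  INSN_SET_INTERRUPTS,   /* enable or disable interrupts: privileged */
  INSN_WRITE_PAGE_TABLE  /* write the page-table address register: privileged */
};

enum outcome { EXECUTED, TRAPPED };

struct cpu {
  enum priv_mode mode;   /* the mode the CPU is currently running in */
};

static bool is_privileged(enum insn_kind insn)
{
  return insn == INSN_SET_INTERRUPTS || insn == INSN_WRITE_PAGE_TABLE;
}

/* Attempt to run one instruction. A privileged instruction issued in user
   mode is not executed; the model switches to supervisor mode instead,
   which is where the kernel's trap code would take over. */
enum outcome step(struct cpu *c, enum insn_kind insn)
{
  if (is_privileged(insn) && c->mode == MODE_USER) {
    c->mode = MODE_SUPERVISOR;
    return TRAPPED;
  }
  return EXECUTED;
}
```

In this model, calling step on a CPU in MODE_USER with INSN_ADD returns EXECUTED and leaves the mode unchanged, while INSN_WRITE_PAGE_TABLE returns TRAPPED and leaves the CPU in MODE_SUPERVISOR, where the kernel could decide to terminate the offending application.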
Applications interact with the kernel via system calls such as read.
Applications are not allowed to directly call kernel functions
or access the kernel’s memory.
RISC-V provides the ecall instruction
for system calls; it switches the CPU from user to supervisor mode
and jumps to a kernel-specified entry point.
Once the CPU has switched to supervisor mode,
the kernel can validate the arguments of the system call (e.g.,
check if the address passed to the system call is part of the application’s memory), decide whether
the application is allowed to perform the requested operation (e.g.,
check if the application is allowed to write the specified file), and then deny it
or execute it. It is important that the kernel control the entry point for
transitions to supervisor mode; if the application could decide the kernel entry
point, a malicious application could, for example, enter the kernel at a point where the
validation of arguments is skipped.
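
The following sketch illustrates why a single kernel-controlled entry point matters. It is a toy model under stated assumptions, not xv6's actual code: the names kernel_entry, range_ok, and struct proc, the system call numbers, the error codes, and the flat memory layout are all hypothetical. The application supplies only a system call number and arguments; the kernel chooses which code runs, and every path passes through the argument and permission checks.

```c
#include <stdint.h>

/* Hypothetical toy kernel: names, numbers, and layout are assumptions
   made for illustration and do not match xv6's real code. */
enum { SYS_READ = 0, SYS_WRITE = 1, NSYSCALL = 2 };
enum { ERR_BADCALL = -1, ERR_BADADDR = -2, ERR_DENIED = -3 };

struct proc {
  uint64_t mem_size;  /* user memory occupies addresses [0, mem_size) */
  int can_write;      /* nonzero if this process may write the file */
};

/* Check that [addr, addr + len) lies entirely inside the process's memory.
   Written so that addr + len is never computed and so cannot overflow. */
static int range_ok(const struct proc *p, uint64_t addr, uint64_t len)
{
  return addr <= p->mem_size && len <= p->mem_size - addr;
}

static long sys_read(struct proc *p, uint64_t addr, uint64_t len)
{
  if (!range_ok(p, addr, len))
    return ERR_BADADDR;          /* argument validation */
  return (long)len;              /* pretend len bytes were read */
}

static long sys_write(struct proc *p, uint64_t addr, uint64_t len)
{
  if (!range_ok(p, addr, len))
    return ERR_BADADDR;          /* argument validation */
  if (!p->can_write)
    return ERR_DENIED;           /* permission check */
  return (long)len;              /* pretend len bytes were written */
}

typedef long (*syscall_fn)(struct proc *, uint64_t, uint64_t);

/* Handlers are listed in system call number order: SYS_READ, SYS_WRITE. */
static const syscall_fn syscalls[NSYSCALL] = { sys_read, sys_write };

/* The one entry point, standing in for the address that ecall jumps to.
   The caller picks a number, never a code address, so it cannot land
   past the checks inside a handler. */
long kernel_entry(struct proc *p, int num, uint64_t addr, uint64_t len)
{
  if (num < 0 || num >= NSYSCALL)
    return ERR_BADCALL;
  return syscalls[num](p, addr, len);
}
```

For a process with mem_size 4096 and can_write 0, kernel_entry(&p, SYS_READ, 4000, 200) returns ERR_BADADDR because the range runs past the end of the process's memory, and kernel_entry(&p, SYS_WRITE, 0, 16) returns ERR_DENIED. If the application could instead jump directly to the final return statement of sys_write, neither check would run.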