A driver is the code in an operating system that manages a particular device: it configures the device hardware, tells the device to perform operations, handles the resulting interrupts, and interacts with processes that may be waiting for I/O from the device.
| Part | Triggered by | Runs in | Key characteristics |
| --------------- | --------------------------------------------------- | ------------------------------------ | ------------------------------------------------------------------- |
| Top half | Hardware interrupt | Interrupt context | Runs immediately, must be fast, non-blocking |
| Bottom half | Deferred work (e.g., via syscall wakeup or tasklet) | Process or kernel thread context | Can do slower, more complex work, may block or sleep |
Why must the top half be fast? Interrupt context is very restrictive:

- It cannot sleep.
- Interrupts may be disabled while it runs.
- It must acknowledge the hardware quickly.

The top half often:

- Reads from a hardware register
- Buffers a value
- Wakes up a process waiting for data

Example: in xv6, the interrupt handler uartintr() reads a character and passes it to consoleintr(), which buffers it and calls wakeup(). That is all.

Why is the bottom half allowed to be slower? It runs in normal thread/process context with full kernel privileges, so it can:

- Copy data to/from user space
- Sleep/wait for resources
- Do file I/O, scheduling, etc.

Example: in consoleread() in xv6, the process sleeps waiting for input. When the interrupt (top half) wakes it, it finishes reading.
The following sections walk through a driver example using the console. The console is implemented on top of the UART. When the user types characters (e.g., in the QEMU terminal), they arrive through the UART hardware, which raises an interrupt for each character.
When the kernel or a user process calls write(1, "hello", 5), it writes to standard output, which is the console. The kernel sends the characters one by one to the UART transmit register.
Top half (roughly): uart.c is the driver code that interacts directly with the hardware.
Bottom half (roughly): console.c is a higher-level abstraction built on top of the UART.
The bottom half interacts with the OS and may block, while the top half interacts with the hardware and should be non-blocking.
UART (Universal Asynchronous Receiver-Transmitter): a hardware device (serial port) that sends and receives bytes serially. xv6 has a UART driver in uart.c, which talks directly to the UART hardware using memory-mapped I/O. It has functions like uartinit(), uartgetc() (read a byte), uartputc() (write a byte), and uartintr() (interrupt handler).

uart.c: low-level UART driver

- Talks directly to the UART hardware
- Handles UART register setup
- Sends/receives bytes via MMIO
- Handles UART interrupts (uartintr())
- Device-specific: tied to the 16550 UART

Think of uart.c as the device driver for a specific hardware peripheral.

console.c: higher-level console abstraction

- Implements the OS-facing "console device"; user read() and write() go through here
- Uses uart.c as the backend
- Handles input buffering (cons.buf[])
- Echoes characters
- Handles special keys (like backspace or control characters)
- Connects the read(0,…) and write(1,…) syscalls to the UART
- Also supports kernel printf() output
You may be wondering how blocking I/O is handled here: how does a process block in read(), and how does it resume once there is something to read?
The flow is as follows. The console sits on top of the UART, and both the top half and the bottom half take part in console input.
Bottom half: the user process calls read() and enters the kernel: read(0, buf, n) → syscall → sys_read() → fileread() → consoleread(). This runs in process context. If no data is available in the input buffer, the process sleeps.
Top half: the user types a key and the UART hardware raises an interrupt. The CPU jumps to the trap handler, which calls devintr(), which calls uartintr(). This runs in interrupt context. The handler:

- Reads the incoming character from the UART hardware register
- Pushes it into the input buffer
- Calls wakeup(&cons.r) to resume any process sleeping on input

    User process
      |
      |  read(0, buf, n)
      |→ consoleread()                       ← (bottom half)
      |     |
      |     |  sleep(&cons.r, &cons.lock)    ← if buffer empty
      ↓
    UART input event (interrupt)
      → trap → devintr() → uartintr()        ← (top half)
           |  read char from UART
           |  store in cons.buf
           |  wakeup(&cons.r)
    User process resumes
      |
      |  consoleread() returns data
      |  read() → back to user space

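The sleep/wakeup handshake in this flow can be sketched in user space. This is a minimal model, not xv6's code: `sim_proc`, `sim_sleep`, and `sim_wakeup` are hypothetical names, there is no lock or scheduler, and clearing the channel inside `sim_wakeup` is a simplification. It only shows the idea that a sleeper records a channel (an address) and a waker makes every process on that channel runnable.

```c
#include <assert.h>
#include <stddef.h>

enum procstate { UNUSED, SLEEPING, RUNNABLE, RUNNING };

// Hypothetical, stripped-down process record.
struct sim_proc {
  enum procstate state;
  void *chan;            // channel the process sleeps on, or NULL
};

#define NPROC 4
static struct sim_proc procs[NPROC];

// Mark p as sleeping on chan. The real sleep() would also release a
// lock and switch to the scheduler.
void sim_sleep(struct sim_proc *p, void *chan) {
  assert(p->state == RUNNING);
  p->chan = chan;
  p->state = SLEEPING;
}

// Make every process sleeping on chan runnable; return how many woke.
int sim_wakeup(void *chan) {
  int woken = 0;
  for (int i = 0; i < NPROC; i++) {
    if (procs[i].state == SLEEPING && procs[i].chan == chan) {
      procs[i].state = RUNNABLE;
      procs[i].chan = NULL;
      woken++;
    }
  }
  return woken;
}
```

A reader would call `sim_sleep(p, &some_buffer_index)` when the buffer is empty, and the interrupt side would call `sim_wakeup(&some_buffer_index)` after adding data. Processes sleeping on a different channel are left untouched.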
After boot, the kernel's main() initializes the console, which in turn initializes the UART, going from the high-level abstraction down to hardware-specific code: main -> consoleinit -> uartinit

    void
    consoleinit(void)
    {
      initlock(&cons.lock, "cons");

      uartinit();

      // connect read and write system calls
      // to consoleread and consolewrite.
      devsw[CONSOLE].read = consoleread;
      devsw[CONSOLE].write = consolewrite;
    }

consoleinit does the high-level setup, such as telling the OS which routines implement console read and write.
The hardware-specific initialization is left to uartinit, which mainly writes the UART's registers to program the device. The UART hardware appears to software as a set of memory-mapped control registers.
The memory-mapped addresses for the UART start at 0x10000000, defined as UART0.
There are a handful of UART control registers, each the width of a byte. Their offsets from UART0 are defined in uart.c (hw device specific code).
The following is the table of control registers used by xv6 for UART programming.
| Name | Offset | Access | Purpose / Description | Used in xv6? |
| ------- | ------ | ------ | --------------------------------------------------------------- | ------------------------ |
| RHR | 0x00 | Read | Receiver Holding Register: read incoming byte | Yes |
| THR | 0x00 | Write | Transmitter Holding Register: write byte to send | Yes |
| IER | 0x01 | R/W | Interrupt Enable Register: enables interrupt types | Yes |
| FCR | 0x02 | Write | FIFO Control Register: enables/clears FIFOs | Yes |
| LCR | 0x03 | R/W | Line Control Register: sets word length, stop bits, parity | Yes |
| MCR | 0x04 | R/W | Modem Control Register: controls OUT2, RTS, DTR | Not written by the uartinit() shown below |
| LSR | 0x05 | Read | Line Status Register: RX ready, TX ready, errors | Yes |
| DLL | 0x00 | R/W | Divisor Latch LSB: set baud rate (if DLAB=1) | Yes |
| DLM | 0x01 | R/W | Divisor Latch MSB: set baud rate (if DLAB=1) | Yes |
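The table shows that offsets 0x00 and 0x01 are shared: which register a write reaches depends on the DLAB bit in LCR. The following is a small software model of that aliasing, a sketch under stated assumptions rather than real driver code. `uart_model`, `model_write`, and `model_set_baud` are hypothetical names, and only the handful of registers needed for the baud-rate sequence are modeled.

```c
#include <assert.h>
#include <stdint.h>

#define LCR 3
#define LCR_BAUD_LATCH 0x80   // DLAB bit: selects the divisor latches
#define LCR_EIGHT_BITS 0x03   // 8-bit words, no parity

// Hypothetical model of the registers behind offsets 0, 1, and 3.
struct uart_model {
  uint8_t thr, ier, lcr, dll, dlm;
};

// Write one byte at a register offset, honoring the DLAB aliasing.
void model_write(struct uart_model *u, int off, uint8_t v) {
  assert(off >= 0 && off < 8);
  int dlab = (u->lcr & LCR_BAUD_LATCH) != 0;
  switch (off) {
  case 0: if (dlab) u->dll = v; else u->thr = v; break;
  case 1: if (dlab) u->dlm = v; else u->ier = v; break;
  case 3: u->lcr = v; break;
  default: break;             // other registers are not modeled
  }
}

// Replay a baud-rate sequence: enter latch mode, write LSB and MSB,
// then leave latch mode and select 8-bit words.
void model_set_baud(struct uart_model *u, uint8_t lsb, uint8_t msb) {
  model_write(u, LCR, LCR_BAUD_LATCH);
  model_write(u, 0, lsb);
  model_write(u, 1, msb);
  model_write(u, LCR, LCR_EIGHT_BITS);
}
```

After `model_set_baud(&u, 0x03, 0x00)`, a later write to offset 0 lands in the transmit register again, because DLAB has been cleared.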

    void
    uartinit(void)
    {
      // disable interrupts.
      WriteReg(IER, 0x00);

      // special mode to set baud rate.
      WriteReg(LCR, LCR_BAUD_LATCH);

      // LSB for baud rate of 38.4K.
      WriteReg(0, 0x03);

      // MSB for baud rate of 38.4K.
      WriteReg(1, 0x00);

      // leave set-baud mode,
      // and set word length to 8 bits, no parity.
      WriteReg(LCR, LCR_EIGHT_BITS);

      // reset and enable FIFOs.
      WriteReg(FCR, FCR_FIFO_ENABLE | FCR_FIFO_CLEAR);

      // enable transmit and receive interrupts.
      WriteReg(IER, IER_TX_ENABLE | IER_RX_ENABLE);

      initlock(&uart_tx_lock, "uart");
    }

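The LSB/MSB values 0x03 and 0x00 in uartinit() can be derived with a little arithmetic. The sketch below assumes the classic 16550 reference clock of 1.8432 MHz, which the text does not state, and uses the usual divisor formula clock / (16 × baud). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Assumed reference clock of a classic 16550 (not stated in the text).
#define UART_CLOCK_HZ 1843200u

// divisor = clock / (16 * baud): the UART samples each bit 16 times.
uint16_t baud_divisor(uint32_t baud) {
  assert(baud > 0);
  return (uint16_t)(UART_CLOCK_HZ / (16u * baud));
}

// Split the 16-bit divisor into the bytes written to DLL and DLM.
uint8_t divisor_lsb(uint16_t d) { return (uint8_t)(d & 0xff); }
uint8_t divisor_msb(uint16_t d) { return (uint8_t)(d >> 8); }
```

Under that assumption, 38400 baud gives 1843200 / (16 × 38400) = 3, so the LSB is 0x03 and the MSB is 0x00, matching the two writes in uartinit().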
The following code path goes from the read system call to reading from the console. The calling process sleeps in the kernel if no input is available. This code belongs to the bottom half of the driver.
The console file descriptor is opened when the first process, init, starts (init.c):

    int
    main(void)
    {
      if(open("console", O_RDWR) < 0){
        mknod("console", CONSOLE, 0);
        open("console", O_RDWR);
      }
      dup(0);  // stdout
      dup(0);  // stderr
      // ...
    }

One thing to note is mknod here: it creates a special file named "console" in the root directory / with:

- Major number: 1 (i.e., CONSOLE)
- Minor number: 0

This tells the OS: whenever a user reads or writes this file, dispatch the I/O to the driver registered under major number 1, which is the console driver.
The mapping from major number to console driver function is defined in the device switch table:
devsw[] maps major device numbers to their driver functions. Each entry has a read and a write function pointer. This table is part of the file system layer.
| File type | Where handled | Example syscall path |
| ----------- | ------------------------- | ----------------------------------- |
| `FD_INODE` | File system code (`fs.c`) | `read()` → `fileread()` → `readi()` |
| `FD_DEVICE` | Device driver (`devsw[]`) | `read()` → `devsw[major].read()` |
| `FD_PIPE` | Pipe code (`pipe.c`) | `read()` → `piperead()` |
Normal files and directories, such as those created with touch or mkdir, are of type FD_INODE. They are handled by the file system layer, not through the device switch table.
The following gives a summary of their differences:
| Operation | File Type | Handled By | Code Path |
| ----------------- | ------------ | ------------------------- | ------------------------ |
| `read("myfile")` | Regular file | File system layer | `fileread()` → `readi()` |
| `read("console")` | Device file | Device driver (`devsw[]`) | `devsw[CONSOLE].read()` |

    struct devsw {
      int (*read)(int, uint64, int);
      int (*write)(int, uint64, int);
    };

    #define CONSOLE 1

    void
    consoleinit(void)
    {
      // ...
      // connect read and write system calls
      // to consoleread and consolewrite.
      devsw[CONSOLE].read = consoleread;    // dispatch reads to the driver's consoleread
      devsw[CONSOLE].write = consolewrite;  // dispatch writes to the driver's consolewrite
    }

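The device switch table is a plain array of function pointers indexed by major number. Here is a self-contained sketch of that dispatch pattern. All `sim_` and `fake_` names are hypothetical, the read/write signatures are simplified compared with xv6's, and the validity check is modeled on the one in xv6's fileread().

```c
#include <assert.h>
#include <stddef.h>

#define NDEV 10
#define CONSOLE 1

// Simplified device switch entry (xv6's entries take different arguments).
struct sim_devsw {
  int (*read)(char *dst, int n);
  int (*write)(const char *src, int n);
};

static struct sim_devsw sim_devsw[NDEV];

// Hypothetical console driver: read returns a fixed line, write
// pretends every byte was sent.
static int fake_console_read(char *dst, int n) {
  const char line[] = "hi\n";
  int i;
  for (i = 0; i < n && line[i] != '\0'; i++)
    dst[i] = line[i];
  return i;
}
static int fake_console_write(const char *src, int n) { (void)src; return n; }

// Register the driver under its major number.
void sim_consoleinit(void) {
  sim_devsw[CONSOLE].read = fake_console_read;
  sim_devsw[CONSOLE].write = fake_console_write;
}

// Validate the major number, then dispatch through the table.
int sim_device_read(int major, char *dst, int n) {
  if (major < 0 || major >= NDEV || sim_devsw[major].read == NULL)
    return -1;
  return sim_devsw[major].read(dst, n);
}
```

The caller never names the console driver directly. It only knows a major number, which is what lets the same read path serve any device that registers itself in the table.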
The following shows how data is read from a file of type FD_DEVICE. Note that the OS must place whatever it reads from the device somewhere in the process's user space so that the process can access it.
So the addr argument of fileread is an address in the process's virtual address space, and the kernel uses it as the destination for the data it reads. It is normally the buf argument passed to the int read(int fd, void *buf, int n) system call.

    // Read from file f.
    // addr is a user virtual address.
    int
    fileread(struct file *f, uint64 addr, int n)
    {
      int r = 0;
      // ...
      else if(f->type == FD_DEVICE){
        if(f->major < 0 || f->major >= NDEV || !devsw[f->major].read)
          return -1;
        r = devsw[f->major].read(1, addr, n);  // 1: addr is a user address
      }
      // ...
      return r;
    }

The following is sys_read, which runs when the read system call is made (sysfile.c):

    uint64
    sys_read(void)
    {
      struct file *f;
      int n;
      uint64 p;

      argaddr(1, &p);            // get buffer address argument
      argint(2, &n);             // get number-of-bytes argument
      if(argfd(0, 0, &f) < 0)    // get fd argument
        return -1;
      return fileread(f, p, n);
    }

So far the code trace is: read -> sys_read -> fileread -> consoleread
The console buffer is used only for console reads. Console writes bypass it: each byte is handed straight to uartputc(), which keeps its own small transmit buffer.

    struct {
      struct spinlock lock;

      // input
    #define INPUT_BUF_SIZE 128
      char buf[INPUT_BUF_SIZE];
      uint r;  // Read index
      uint w;  // Write index
      uint e;  // Edit index
    } cons;

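cons.r: read index

- Points to the next character to be read by a process (e.g., the shell, via the read() syscall).
- Advanced by consoleread() after it consumes a character.
- consoleread() sleeps if cons.r == cons.w (i.e., there is nothing to read).

cons.w: write index

- Marks the end of the input that has been committed and is visible to readers.
- Advanced by consoleintr(), which sets cons.w = cons.e when a whole line (or end-of-file) has arrived from the UART, or when the buffer is full.

cons.e: edit index

- Points to the position where the next typed character will be stored; characters between cons.w and cons.e are still being edited.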

    int
    consoleread(int user_dst, uint64 dst, int n)
    {
      uint target;
      int c;
      char cbuf;

      target = n;
      acquire(&cons.lock);
      while(n > 0){
        // cons.r == cons.w means the input buffer is empty (nothing typed yet),
        // so go to sleep waiting for new input.
        while(cons.r == cons.w){
          if(killed(myproc())){
            release(&cons.lock);
            return -1;
          }
          // Sleep on the address &cons.r; sleep() releases the lock while asleep.
          // When input arrives, the UART interrupt path wakes us via
          // wakeup(&cons.r) in consoleintr(). Meanwhile the CPU runs other
          // processes.
          sleep(&cons.r, &cons.lock);
          // Resumed: re-check the loop condition.
        }

        // cons.r != cons.w: take the next buffered character.
        c = cons.buf[cons.r++ % INPUT_BUF_SIZE];

        // ... (end-of-file handling elided)

        // copy the input byte to the user-space buffer.
        cbuf = c;
        if(either_copyout(user_dst, dst, &cbuf, 1) == -1)
          break;

        dst++;
        --n;

        if(c == '\n')
          break;  // a whole line has arrived; return to the user-level read()
      }
      release(&cons.lock);

      return target - n;
    }

consoleintr(): uartintr() calls this for each input character.
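
    void
    consoleintr(int c)
    {
      acquire(&cons.lock);

      switch(c){
      case C('P'):  // Ctrl-P: print process list.
        procdump();
        break;
      // ... (other control characters elided)
      default:
        // ... (echo c and store it at cons.buf[cons.e++ % INPUT_BUF_SIZE])
        if(c == '\n' || c == C('D') || cons.e-cons.r == INPUT_BUF_SIZE){
          // wake up consoleread() if a whole line (or end-of-file)
          // has arrived, or the input buffer is full.
          cons.w = cons.e;
          wakeup(&cons.r);
        }
        break;
      }

      release(&cons.lock);
    }
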
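The interplay of the three indexes is easier to see in a small user-space model. This is a sketch, not xv6's code: `sim_cons`, `sim_consoleintr`, and `sim_consoleread` are hypothetical, there is no locking, echoing, or control-character handling, and the reader returns -1 at the point where the real consoleread() would sleep.

```c
#include <assert.h>

#define INPUT_BUF_SIZE 128

struct sim_cons {
  char buf[INPUT_BUF_SIZE];
  unsigned r, w, e;   // read, write (committed), and edit indexes
};

// Interrupt side: store ch, and commit the line on newline or when
// the buffer is full. Returns 1 when a sleeping reader would be woken.
int sim_consoleintr(struct sim_cons *c, char ch) {
  assert(c->e - c->r <= INPUT_BUF_SIZE);
  if (c->e - c->r == INPUT_BUF_SIZE)
    return 0;                       // buffer full: drop the character
  c->buf[c->e++ % INPUT_BUF_SIZE] = ch;
  if (ch == '\n' || c->e - c->r == INPUT_BUF_SIZE) {
    c->w = c->e;                    // publish everything typed so far
    return 1;
  }
  return 0;
}

// Process side: copy up to n committed bytes, stopping after a newline.
// Returns -1 where the real consoleread() would sleep.
int sim_consoleread(struct sim_cons *c, char *dst, int n) {
  int got = 0;
  if (c->r == c->w)
    return -1;
  while (got < n && c->r != c->w) {
    char ch = c->buf[c->r++ % INPUT_BUF_SIZE];
    dst[got++] = ch;
    if (ch == '\n')
      break;
  }
  return got;
}
```

Typing `l` and `s` only advances `e`, so a reader still sees an empty buffer. The newline moves `w` up to `e`, and only then can the reader consume the whole line. The indexes grow without bound and are reduced modulo the buffer size only when indexing, which keeps the "full" and "empty" tests simple.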

    void
    uartintr(void)
    {
      ReadReg(ISR);  // acknowledge the interrupt

      // for input: read and process incoming characters.
      while(1){
        int c = uartgetc();
        if(c == -1)
          break;
        consoleintr(c);
      }

      // for output: send buffered characters.
      acquire(&uart_tx_lock);
      uartstart();
      release(&uart_tx_lock);
    }

A write system call on a file descriptor connected to the console eventually arrives at uartputc, by way of consolewrite:

    int
    consolewrite(int user_src, uint64 src, int n)
    {
      int i;

      for(i = 0; i < n; i++){
        char c;
        // copy one byte from the user virtual address space
        // into the kernel.
        if(either_copyin(&c, user_src, src+i, 1) == -1)
          break;
        uartputc(c);
      }

      return i;
    }

The first byte is sent on the write path itself: write -> consolewrite -> uartputc -> uartstart.

    // add a character to the output buffer and tell the
    // UART to start sending if it isn't already.
    // blocks if the output buffer is full.
    // because it may block, it can't be called
    // from interrupts; it's only suitable for use
    // by write().
    void
    uartputc(int c)
    {
      acquire(&uart_tx_lock);

      if(panicked){
        for(;;)
          ;
      }
      while(uart_tx_w == uart_tx_r + UART_TX_BUF_SIZE){
        // buffer is full.
        // wait for uartstart() to open up space in the buffer.
        sleep(&uart_tx_r, &uart_tx_lock);
      }
      uart_tx_buf[uart_tx_w % UART_TX_BUF_SIZE] = c;
      uart_tx_w += 1;
      uartstart();
      release(&uart_tx_lock);
    }

Each time the UART finishes sending a byte, it generates an interrupt, and uartintr calls uartstart. Thus if a process writes multiple bytes to the console, typically the first byte is sent by uartputc's call to uartstart, and the remaining buffered bytes are sent by uartstart calls from uartintr as transmit-complete interrupts arrive.

    // if the UART is idle, and a character is waiting
    // in the transmit buffer, send it.
    // caller must hold uart_tx_lock.
    // called from both the top- and bottom-half.
    void
    uartstart()
    {
      while(1){
        if(uart_tx_w == uart_tx_r){
          // transmit buffer is empty.
          return;
        }

        if((ReadReg(LSR) & LSR_TX_IDLE) == 0){
          // the UART transmit holding register is full,
          // so we cannot give it another byte.
          // it will interrupt when it's ready for a new byte.
          return;
        }

        int c = uart_tx_buf[uart_tx_r % UART_TX_BUF_SIZE];
        uart_tx_r += 1;

        // maybe uartputc() is waiting for space in the buffer.
        wakeup(&uart_tx_r);

        WriteReg(THR, c);
      }
    }

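The division of labor between uartputc(), uartstart(), and the transmit-complete interrupt can be replayed in a small model. This is a sketch under simplifying assumptions: the device holds exactly one byte at a time (`thr_busy`), `sim_uartputc` returns -1 where the real code would sleep, and all `sim_` names and the `wire` array are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define UART_TX_BUF_SIZE 32

struct sim_tx {
  char buf[UART_TX_BUF_SIZE];
  uint64_t w, r;      // next slot to write / next slot to send
  int thr_busy;       // 1 while the modeled device is sending a byte
  char wire[128];     // bytes that have "left" the device
  int sent;
};

// Like uartstart(): hand bytes to the device while it is idle.
void sim_uartstart(struct sim_tx *t) {
  while (t->w != t->r && !t->thr_busy) {
    assert(t->sent < (int)sizeof t->wire);
    char c = t->buf[t->r % UART_TX_BUF_SIZE];
    t->r += 1;
    t->wire[t->sent++] = c;   // models WriteReg(THR, c)
    t->thr_busy = 1;
  }
}

// Like uartputc(), but returns -1 where the real code would sleep.
int sim_uartputc(struct sim_tx *t, char c) {
  if (t->w == t->r + UART_TX_BUF_SIZE)
    return -1;                // buffer full
  t->buf[t->w % UART_TX_BUF_SIZE] = c;
  t->w += 1;
  sim_uartstart(t);
  return 0;
}

// Like the transmit part of uartintr(): the device finished a byte.
void sim_tx_interrupt(struct sim_tx *t) {
  t->thr_busy = 0;
  sim_uartstart(t);
}
```

Writing `h` then `i` sends `h` immediately from the write path, while `i` waits in the buffer until `sim_tx_interrupt()` reports that the device is idle again. This mirrors the "first byte from uartputc, the rest from uartintr" behavior described above.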
Context: consoleread() and consoleintr() (called from the UART interrupt handler). These two kernel functions interact via:

- A shared buffer: cons.buf[]
- Shared indexes: cons.r, cons.w, cons.e
- Shared synchronization: cons.lock
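Access to these shared resources must be coordinated. You may have noticed calls to acquire in consoleread and in consoleintr. These calls acquire a lock, which protects the console driver's data structures from concurrent access. There are three concurrency dangers here:

- Two processes on different CPUs might call consoleread at the same time.
- The hardware might ask a CPU to deliver a console (really UART) interrupt while that CPU is already executing inside consoleread.
- The hardware might deliver a console interrupt on a different CPU while consoleread is executing.

Chapter 6 of the xv6 book explores these problems and how locks can address them.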
Another way in which concurrency requires care in drivers is that one process may be waiting for input from a device, but the interrupt signaling arrival of the input may arrive when a different process (or no process at all) is running. Thus interrupt handlers are not allowed to think about the process or code that they have interrupted. For example, an interrupt handler cannot safely call copyout with the current process's page table. Interrupt handlers typically do relatively little work (e.g., just copy the input data to a buffer), and wake up the bottom-half code running in process context to do the rest.
The console is shared globally as a single terminal. xv6 supports background processes, so the shell and a background process may use the same console at once. If this happens, input and output may become interleaved, confusing, or broken.
Xv6 uses timer interrupts to maintain its clock and to enable it to switch among compute-bound processes; xv6 sets up the RISC-V timer interrupt to fire at a regular interval (roughly every tenth of a second under QEMU).
In the design described in the xv6 book, timer interrupts are handled quite differently from the trap mechanism discussed before. Either way, the effect is the same: a timer interrupt leads to yield(), called from usertrap() or kerneltrap(), so that the running process gives up the CPU.
A timer interrupt can arrive in user mode or in kernel mode, and in both cases it triggers yield(). The yield itself (the context switch) always runs in kernel mode.
yield() switches to the scheduler context, which involves saving the current context's registers and loading the new context's registers.
RISC-V requires that timer interrupts be taken in machine mode, not supervisor mode. Other interrupts are taken in supervisor mode.
RISC-V machine mode executes without paging, and with a separate set of control registers, so it's not practical to run ordinary xv6 kernel code in machine mode.
As a result, xv6 handles timer interrupts completely separately from the trap mechanism laid out above.
| Level | Name | Used For |
| ------ | ------------------- | --------------------------------------------------------------- |
| M-mode | Machine mode | Highest privilege, handles low-level traps and timer interrupts |
| S-mode | Supervisor mode | Kernel code (e.g., xv6 kernel) |
| U-mode | User mode | User applications |
The code below instead handles timer interrupts in S-mode: start() delegates interrupts to supervisor mode via mideleg, which is a valid RISC-V feature, so the timer interrupt reaches devintr() like any other trap.

    // entry.S jumps here in machine mode on stack0.
    void
    start()
    {
      // ...
      // delegate all interrupts and exceptions to supervisor mode.
      w_medeleg(0xffff);
      w_mideleg(0xffff);
      // ...
    }

devintr() identifies the cause as a timer interrupt and returns 2
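
    int
    devintr()
    {
      uint64 scause = r_scause();

      if(scause == 0x8000000000000009L){
        // supervisor external interrupt (e.g., the UART), via the PLIC.
        // ...
        return 1;
      } else if(scause == 0x8000000000000005L){
        // timer interrupt.
        clockintr();
        return 2;
      } else {
        return 0;
      }
    }
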
Question: if RISC-V supports delegating timer interrupts to supervisor mode, why does it also allow them to be handled in machine mode?
RISC-V must support both embedded (no OS) and full-OS environments. Some embedded systems have no OS at all, and everything runs in machine mode.
When a timer interrupt occurs in user mode, control is transferred to the trap handler usertrap() in kernel/trap.c. This causes the current process to yield the CPU, allowing the scheduler to pick another process.

    //
    // handle an interrupt, exception, or system call from user space.
    // called from trampoline.S
    //
    void
    usertrap(void)
    {
      // ... (which_dev is set from devintr())

      // give up the CPU if this is a timer interrupt.
      if(which_dev == 2)
        yield();

      // ...
    }

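A timer interrupt can also arrive while the CPU is executing kernel code, in which case kerneltrap() handles it. This is rare but necessary to allow preemption even during kernel code execution, such as during a long computation or a long-running system call.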

    // interrupts and exceptions from kernel code go here via kernelvec,
    // on whatever the current kernel stack is.
    void
    kerneltrap()
    {
      // ... (which_dev is set from devintr())

      // give up the CPU if this is a timer interrupt.
      if(which_dev == 2 && myproc() != 0)
        yield();

      // ...
    }

yield() is called when a process gives up the CPU, either voluntarily or because a timer interrupt forces it to.

    // Give up the CPU for one scheduling round.
    void
    yield(void)
    {
      struct proc *p = myproc();
      acquire(&p->lock);
      p->state = RUNNABLE;  // usually from RUNNING to RUNNABLE
      sched();              // will switch to scheduler context
      release(&p->lock);
    }

    // Switch to scheduler.  Must hold only p->lock
    // and have changed proc->state. Saves and restores
    // intena because intena is a property of this
    // kernel thread, not this CPU. It should
    // be proc->intena and proc->noff, but that would
    // break in the few places where a lock is held but
    // there's no process.
    void
    sched(void)
    {
      int intena;
      struct proc *p = myproc();

      if(!holding(&p->lock))
        panic("sched p->lock");
      if(mycpu()->noff != 1)
        panic("sched locks");
      if(p->state == RUNNING)
        panic("sched RUNNING");
      if(intr_get())
        panic("sched interruptible");

      intena = mycpu()->intena;
      swtch(&p->context, &mycpu()->context);  // switch from process context to the CPU's scheduler context
      mycpu()->intena = intena;
    }

    // Saved registers for kernel context switches.
    struct context {
      uint64 ra;
      uint64 sp;

      // callee-saved
      uint64 s0;
      uint64 s1;
      uint64 s2;
      uint64 s3;
      uint64 s4;
      uint64 s5;
      uint64 s6;
      uint64 s7;
      uint64 s8;
      uint64 s9;
      uint64 s10;
      uint64 s11;
    };

Compared with a trap, swtch saves fewer registers:
| Context Save | Who Saves It | When | What For |
| --------------- | --------------------- | --------------------------------- | ---------------------------------------------------------------- |
| `trap` frame | Hardware + `usertrap` | On trap/interrupt from user mode | Save full CPU state so user process can resume after kernel code |
| `swtch` context | `swtch()` (assembly) | During `sched()` / context switch | Save minimal state needed to resume kernel thread execution |
Remember that context switches only happen in kernel mode.
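The timer-driven path (timer interrupt, then yield(), then the scheduler picking another process) can be sketched as a tiny round-robin simulation. This is an illustrative model only: `sim_cpu`, `sim_yield`, `sim_schedule`, and `sim_timer_tick` are hypothetical, there are no locks or register contexts, and the scan order only approximates xv6's scheduler loop.

```c
#include <assert.h>

enum pstate { P_UNUSED, P_RUNNABLE, P_RUNNING };

#define NPROC 3

struct sim_cpu {
  enum pstate state[NPROC];
  int current;   // index of the RUNNING process, or -1 in the scheduler
  int last;      // where the scheduler's scan last stopped
};

// Like yield(): the running process becomes RUNNABLE and control
// returns to the scheduler.
void sim_yield(struct sim_cpu *c) {
  assert(c->current >= 0);
  c->state[c->current] = P_RUNNABLE;
  c->current = -1;
}

// Like one pass of the scheduler: pick the next RUNNABLE process
// after the one that ran last. Returns its index, or -1 if none.
int sim_schedule(struct sim_cpu *c) {
  for (int i = 1; i <= NPROC; i++) {
    int p = (c->last + i) % NPROC;
    if (c->state[p] == P_RUNNABLE) {
      c->state[p] = P_RUNNING;
      c->current = c->last = p;
      return p;
    }
  }
  return -1;
}

// One timer interrupt: preempt whoever is running, then reschedule.
int sim_timer_tick(struct sim_cpu *c) {
  if (c->current >= 0)
    sim_yield(c);
  return sim_schedule(c);
}
```

With two runnable processes, successive ticks alternate between them, which is the preemption behavior the timer interrupt is there to provide for compute-bound processes.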