
1. Intro

A driver is the code in an operating system that manages a particular device: it configures the device hardware, tells the device to perform operations, handles the resulting interrupts, and interacts with processes that may be waiting for I/O from the device.

Part          Triggered by                                                       Runs in                            Key characteristics
------------- ------------------------------------------------------------------ ---------------------------------- ------------------------------------------------------
Top half      Hardware interrupt                                                 Interrupt context                  🟠 Runs immediately; must be fast and non-blocking
Bottom half   System call from a process; deferred work (e.g., Linux workqueue)  Process or kernel thread context   🟢 Can do slower, more complex work; may block or sleep

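Why is the top half faster? Interrupt context is very sensitive. The handler cannot sleep, interrupts may be disabled while it runs, and it must acknowledge the hardware quickly. The top half therefore usually does only a few things: it reads a hardware register, buffers the value, and wakes up any process waiting for the data. Example: in xv6, uartintr() reads each waiting character from the UART and passes it to consoleintr(), which buffers it and calls wakeup() once a full line has arrived. For input, that is essentially all it does.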

Why is the bottom half allowed to be slower? It runs in normal process (or kernel thread) context. It has the same kernel privileges as the top half. Because it also has its own process context and kernel stack, it can copy data to and from user space, sleep while waiting for resources, and do file I/O, scheduling, and similar work. Example: in xv6's consoleread(), the process sleeps waiting for input. When the interrupt (top half) wakes it, it finishes the read.

The following sections use the xv6 console driver as an example. The console is implemented on top of the UART. When the user types characters (e.g., in the terminal running QEMU), they arrive through the UART hardware, which raises an interrupt when a character is received.

When a process calls write(1, "hello", 5), it writes to standard output, which is the console. The kernel sends the characters one by one to the UART transmit holding register (THR).

2. Code: Console Input

2.1. console.c and uart.c

Top half: uart.c is the driver code that interacts directly with the hardware.

Bottom half: console.c is a higher-level abstraction built on top of the UART.

The bottom half interacts with the rest of the OS and may block. The top half interacts with the hardware and must not block.

2.1.1. uart.c (Top half)

UART (Universal Asynchronous Receiver-Transmitter): this is a hardware device (serial port) that sends and receives bytes serially. xv6 has a UART driver in uart.c, which talks directly to the UART hardware. The RISC-V version of xv6 does this through memory-mapped I/O; the older x86 xv6 used port-mapped I/O. The driver has functions such as uartinit(), uartgetc() (read a byte), uartputc() (write a byte), and uartintr() (interrupt handler).

uart.c is the low-level UART driver. It talks directly to the UART hardware and handles UART register setup, sending and receiving bytes via MMIO, and UART interrupts (uartintr()). It is device-specific, tied to the 16550 UART. Think of uart.c as the device driver for a specific hardware peripheral.

2.1.2. console.c (Bottom half)

console.c is the higher-level console abstraction. It implements the OS-facing "console device": user read() and write() calls on the console go through it, and it uses uart.c as its backend. It handles:

- input buffering (cons.buf[]);
- echoing characters;
- special keys, such as backspace and control characters;
- connecting the read(0, …) and write(1, …) system calls to the UART.

It also supports kernel printf() output.

2.1.3. Example: Flow of read

You may wonder how blocking I/O is handled here. How does a process block in read(), and how does it resume once there is something to read?

The following is the flow:

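  1. A user-space process calls read().
  2. The system call is routed to the bottom half, consoleread(), which sleeps if the input buffer is empty.
  3. The process is now sleeping inside the kernel. sleep() switches away via swtch(), which saves the process's kernel registers, including sp and the return address ra from which it will later resume. Its kernel stack stays intact until it runs again.
  4. When input arrives, the UART raises an interrupt, which runs uartintr() in interrupt context.
  5. The interrupt handler (via consoleintr()) calls wakeup(), which marks the sleeping process RUNNABLE.
  6. The interrupt handler returns. The scheduler runs the process later, and it resumes inside consoleread().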

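The sleep/wakeup handshake in the steps above can be imitated in user space. The sketch below is only an analogue and assumes POSIX threads:

- a pthread mutex stands in for cons.lock;
- a condition variable stands in for the &cons.r sleep channel;
- a second thread plays the interrupt handler.

The names sim_read, sim_intr, and sim_demo are hypothetical, not xv6 functions. Unlike xv6, this stand-in wakes the reader after every character rather than only after a complete line.

```c
#include <pthread.h>
#include <string.h>

#define BUF_SIZE 128

// Shared state: the mutex plays cons.lock, the condition variable plays
// the &cons.r sleep channel (hypothetical user-space analogue).
static struct {
  pthread_mutex_t lock;
  pthread_cond_t  cond;
  char buf[BUF_SIZE];
  unsigned r, w;            // read and write indexes (ever-increasing)
} sim = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, {0}, 0, 0 };

// "Bottom half": block until data is available, then copy up to n bytes,
// stopping after a newline.
static int sim_read(char *dst, int n) {
  int got = 0;
  pthread_mutex_lock(&sim.lock);
  while (got < n) {
    while (sim.r == sim.w)                       // buffer empty: sleep
      pthread_cond_wait(&sim.cond, &sim.lock);   // releases lock while waiting
    char c = sim.buf[sim.r++ % BUF_SIZE];
    dst[got++] = c;
    if (c == '\n')
      break;
  }
  pthread_mutex_unlock(&sim.lock);
  return got;
}

// "Top half" stand-in: deliver one character and wake the reader.
static void sim_intr(char c) {
  pthread_mutex_lock(&sim.lock);
  if (sim.w - sim.r < BUF_SIZE) {
    sim.buf[sim.w++ % BUF_SIZE] = c;
    pthread_cond_broadcast(&sim.cond);           // like wakeup(&cons.r)
  }
  pthread_mutex_unlock(&sim.lock);
}

static void *typist(void *arg) {
  const char *s = arg;
  for (; *s; s++)
    sim_intr(*s);
  return NULL;
}

// Start a "typist" thread, then block in sim_read() until a line arrives.
// out must have room for n + 1 bytes (the result is NUL-terminated).
static int sim_demo(char *out, int n, const char *typed) {
  pthread_t t;
  pthread_create(&t, NULL, typist, (void *)typed);
  int got = sim_read(out, n);
  pthread_join(t, NULL);
  out[got] = '\0';
  return got;
}
```

pthread_cond_wait() behaves like xv6's sleep() in one important way: it atomically releases the lock while waiting and reacquires it before returning. That is why the emptiness check sits in a while loop in both.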
2.2. Top half and bottom half of the console

The console is built on the UART. The following are the top half and bottom half involved in console input.

Bottom half: the user process calls read(0, buf, n), which enters the kernel: system call → sys_read() → fileread() → consoleread(). This runs in process context. If no data is available in the input buffer, the process sleeps.

Top half: the user types a key, and the UART hardware raises an interrupt. The CPU jumps to the trap handler, which calls devintr(), which calls uartintr(). This runs in interrupt context. uartintr() reads the incoming character from the UART's receive register and passes it to consoleintr(). consoleintr() stores the character in the input buffer. Once a whole line has arrived, it calls wakeup(&cons.r) to resume any process sleeping on console input.

User process
   |
   | read(0, buf, n)
   |→ consoleread()      ← (bottom half)
   |     |
   |     | sleep(&cons.r, &cons.lock); ← if buffer empty
   ↓
UART input event (interrupt)
   → trap → devintr() → uartintr() ← (top half)
         |  read char from UART (uartgetc)
         |  consoleintr(): store in cons.buf
         |  wakeup(&cons.r) once a line is complete

User process resumes
   |
   | consoleread() returns data
   | read() → back to user space

2.3. uart init

consoleinit

After boot, the kernel's main() calls consoleinit(), which in turn calls uartinit(). Initialization thus proceeds from the high-level abstraction down to the hardware-specific code: main -> consoleinit -> uartinit.

void
consoleinit(void)
{
  initlock(&cons.lock, "cons");

  uartinit();

  // connect read and write system calls
  // to consoleread and consolewrite.
  devsw[CONSOLE].read = consoleread;
  devsw[CONSOLE].write = consolewrite;
}

consoleinit() does the high-level work, such as telling the OS which routines implement console read and write.

The hardware-specific initialization is left to uartinit(), which mainly writes the UART's registers to program the hardware. The UART hardware appears to software as a set of memory-mapped control registers.

The UART's memory-mapped registers start at address 0x10000000, which xv6 names UART0 (see memlayout.h).

There are a handful of UART control registers, each one byte wide. Their offsets from UART0 are defined in uart.c, the hardware-specific code.

The following is the table of control registers used by xv6 for UART programming.

Name   Offset   Access   Purpose / Description                                              Used in xv6?
------ -------- -------- ------------------------------------------------------------------ ---------------------------
RHR    0x00     Read     Receiver Holding Register: read an incoming byte                   ✅ Yes
THR    0x00     Write    Transmitter Holding Register: write a byte to send                 ✅ Yes
IER    0x01     R/W      Interrupt Enable Register: enables interrupt types                 ✅ Yes
FCR    0x02     Write    FIFO Control Register: enables/clears FIFOs                        ✅ Yes
ISR    0x02     Read     Interrupt Status Register: reports the interrupt cause             ✅ Yes (read in uartintr())
LCR    0x03     R/W      Line Control Register: word length, stop bits, parity, DLAB bit    ✅ Yes
MCR    0x04     R/W      Modem Control Register: controls OUT2, RTS, DTR                    No
LSR    0x05     Read     Line Status Register: RX ready, TX idle, errors                    ✅ Yes
DLL    0x00     R/W      Divisor Latch LSB: sets the baud rate (when DLAB=1)                ✅ Yes
DLM    0x01     R/W      Divisor Latch MSB: sets the baud rate (when DLAB=1)                ✅ Yes

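The uartinit() code below uses ReadReg/WriteReg without showing them. A minimal sketch of the idea is given here: each register is a volatile byte at base + offset. On real hardware the base would be UART0 (0x10000000), but a user-space program cannot touch that address. So this sketch assumes a hypothetical fake_uart array as the MMIO window and reuses the macro shape from xv6's uart.c.

```c
#include <stdint.h>

// Hypothetical stand-in for the UART's MMIO window (real xv6: UART0).
static volatile unsigned char fake_uart[8];
#define UART_BASE ((uintptr_t)fake_uart)

// Register offsets from the table above.
#define RHR 0   // receive holding register (read)
#define THR 0   // transmit holding register (write)
#define IER 1   // interrupt enable register
#define FCR 2   // FIFO control register (write)
#define LCR 3   // line control register
#define LSR 5   // line status register

// A register is a volatile byte at base + offset, so the compiler may not
// cache, merge, or drop any of these accesses.
#define Reg(reg)         ((volatile unsigned char *)(UART_BASE + (reg)))
#define ReadReg(reg)     (*(Reg(reg)))
#define WriteReg(reg, v) (*(Reg(reg)) = (v))
```

A plain array cannot reproduce one detail of the real device. On a 16550, offset 0 is RHR on reads but THR on writes. Also, offsets 0 and 1 become DLL and DLM while LCR's DLAB bit is set.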
uartinit

void
uartinit(void)
{
  // disable interrupts.
  WriteReg(IER, 0x00);

  // special mode to set baud rate.
  WriteReg(LCR, LCR_BAUD_LATCH);

  // LSB for baud rate of 38.4K.
  WriteReg(0, 0x03);

  // MSB for baud rate of 38.4K.
  WriteReg(1, 0x00);

  // leave set-baud mode,
  // and set word length to 8 bits, no parity.
  WriteReg(LCR, LCR_EIGHT_BITS);

  // reset and enable FIFOs.
  WriteReg(FCR, FCR_FIFO_ENABLE | FCR_FIFO_CLEAR);

  // enable transmit and receive interrupts.
  WriteReg(IER, IER_TX_ENABLE | IER_RX_ENABLE);

  initlock(&uart_tx_lock, "uart");
}
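The two baud-rate writes in uartinit() only make sense together with the DLAB bit. While LCR bit 7 is set, offsets 0 and 1 address the divisor latch instead of THR and IER. The sketch below models just that aliasing and replays the uartinit() sequence against it. It rests on these assumptions:

- the classic 16550 input clock of 1.8432 MHz, with divisor = clock / (16 * baud), so 38400 baud gives 3;
- bit values matching xv6's uart.c.

The model, write_reg(), and baud_divisor() are hypothetical helpers.

```c
#include <stdint.h>

#define LCR_BAUD_LATCH (1 << 7)   // DLAB: offsets 0/1 reach the divisor latch
#define LCR_EIGHT_BITS (3 << 0)   // 8 data bits, no parity, 1 stop bit

// Hypothetical 16550 model covering only the registers uartinit() writes.
static struct {
  uint8_t thr, ier, fcr, lcr, dll, dlm;
} uart;

static void write_reg(int off, uint8_t v) {
  int dlab = uart.lcr & LCR_BAUD_LATCH;
  switch (off) {
  case 0: if (dlab) uart.dll = v; else uart.thr = v; break;
  case 1: if (dlab) uart.dlm = v; else uart.ier = v; break;
  case 2: uart.fcr = v; break;
  case 3: uart.lcr = v; break;
  }
}

// 16550 divisor: input clock / (16 * baud).
static uint16_t baud_divisor(uint32_t clock_hz, uint32_t baud) {
  return (uint16_t)(clock_hz / (16u * baud));
}

// The uartinit() register sequence replayed against the model.
static void uartinit_model(void) {
  uint16_t div = baud_divisor(1843200, 38400);   // 1843200 / 614400 = 3
  write_reg(1, 0x00);                 // IER: disable interrupts
  write_reg(3, LCR_BAUD_LATCH);       // enter set-baud mode
  write_reg(0, div & 0xff);           // DLL (LSB)
  write_reg(1, div >> 8);             // DLM (MSB)
  write_reg(3, LCR_EIGHT_BITS);       // leave set-baud mode, 8N1
  write_reg(2, 0x07);                 // FCR: enable (bit 0) + clear (bits 1-2)
  write_reg(1, 0x03);                 // IER: RX (bit 0) and TX (bit 1) interrupts
}
```

The same offset 1 receives three different values during the sequence. Because of DLAB, only the final 0x03 ends up in IER.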

2.4. Console Input: read -> sysread -> fileread -> consoleread -> waiting for interrupt

The following code path runs from the read system call down to the console. If no input is available, the process's kernel thread sleeps. This code belongs to the bottom half of the driver.

The console file descriptors are opened when the first user process starts, in init.c:

int
main(void)
{
  if(open("console", O_RDWR) < 0){   // fd 0: stdin
    mknod("console", CONSOLE, 0);
    open("console", O_RDWR);
  }
  dup(0);  // stdout
  dup(0);  // stderr

  // ... then fork and exec the shell in a loop ...
}

Note the mknod call. It creates a special device file named "console" in the root directory (/), with major number 1 (i.e., CONSOLE) and minor number 0.

This tells the OS: whenever a process reads from or writes to this file, dispatch the I/O to the driver registered under major number 1, which is the console driver.

The mapping from major numbers to driver functions is defined in the device switch table.

devsw[] maps major device numbers to their driver functions. Each entry has a read and a write function pointer. The table is part of the file system layer.

File type     Where handled               Example syscall path
------------- --------------------------- -----------------------------------------------
`FD_INODE`    File system code (`fs.c`)   `read()` → `fileread()` → `readi()`
`FD_DEVICE`   Device driver (`devsw[]`)   `read()` → `fileread()` → `devsw[major].read()`
`FD_PIPE`     Pipe code (`pipe.c`)        `read()` → `fileread()` → `piperead()`

Regular files and directories (e.g., created with open() and O_CREATE, or with mkdir) are FD_INODE files. The file system layer handles them, not the device switch table.

The following gives a summary of their differences:

Operation               File type      Handled by                  Code path
----------------------- -------------- --------------------------- ---------------------------------------
`read()` on `myfile`    Regular file   File system layer           `fileread()` → `readi()`
`read()` on `console`   Device file    Device driver (`devsw[]`)   `fileread()` → `devsw[CONSOLE].read()`

file.h (struct devsw, CONSOLE) and console.c (consoleinit)

struct devsw {
  int (*read)(int, uint64, int);
  int (*write)(int, uint64, int);
};
#define CONSOLE 1

void
consoleinit(void)
{
  // connect read and write system calls
  // to consoleread and consolewrite.
  devsw[CONSOLE].read = consoleread;         // dispatch the read to driver consoleread
  devsw[CONSOLE].write = consolewrite;       // dispatch the write to driver consolewrite
}
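The device switch is just an array of function pointers indexed by major number. Below is a minimal user-space sketch of registration and dispatch, with these assumptions and stand-ins:

- plain char pointers replace xv6's uint64 addresses;
- fake_consoleread and fake_consoleinit are hypothetical;
- device_read is a hypothetical helper that mirrors the FD_DEVICE branch of fileread().

```c
#include <string.h>

#define NDEV    10   // number of major device numbers (as in xv6's param.h)
#define CONSOLE 1

// Same shape as xv6's struct devsw, with pointer types simplified.
struct devsw {
  int (*read)(int user_dst, char *dst, int n);
  int (*write)(int user_src, const char *src, int n);
};

static struct devsw devsw[NDEV];

// Hypothetical driver read routine: returns a canned line of input.
static int fake_consoleread(int user_dst, char *dst, int n) {
  (void)user_dst;
  const char *line = "ls\n";
  int len = (int)strlen(line);
  if (n > len)
    n = len;
  memcpy(dst, line, n);
  return n;
}

// Registration, as consoleinit() does for the real driver.
static void fake_consoleinit(void) {
  devsw[CONSOLE].read = fake_consoleread;
}

// Dispatch: bounds-check the major number, make sure a driver is
// registered, then call through the table.
static int device_read(int major, char *dst, int n) {
  if (major < 0 || major >= NDEV || !devsw[major].read)
    return -1;
  return devsw[major].read(1, dst, n);
}
```

The file layer never names consoleread() directly. It only knows the major number stored in the inode, which is what lets new drivers plug in without changing fileread().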

The following shows how fileread() reads from a file of type FD_DEVICE. The OS must place the data read from the device somewhere in the process's user address space so that the process can access it.

So addr in fileread() is an address in the process's user virtual address space, where the kernel places what it reads. It comes from the buf argument of the int read(int fd, void *buf, int n) system call. The first argument passed to devsw[f->major].read() is 1 (user_dst). It tells the driver that the destination is a user address and must be written through the process's page table.

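The user_dst flag lets the same driver code serve both user and kernel callers. Below is a minimal sketch of that decision, modeled on xv6's either_copyout(). Here the "user address space" is a hypothetical 64-byte array mapped at virtual address 0x1000. copyout_model() merely range-checks instead of walking a real page table.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical user memory: USER_SIZE bytes visible at virtual USER_BASE.
#define USER_BASE 0x1000u
#define USER_SIZE 64u
static char user_mem[USER_SIZE];

// Stand-in for copyout(): translate dstva and fail if it is not "mapped".
static int copyout_model(uint64_t dstva, const char *src, uint64_t len) {
  if (dstva < USER_BASE || dstva + len > USER_BASE + USER_SIZE)
    return -1;                       // like an unmapped user page
  memcpy(user_mem + (dstva - USER_BASE), src, len);
  return 0;
}

// Same decision as either_copyout(): a user destination needs address
// translation; a kernel destination is written with a plain memmove.
static int either_copyout_model(int user_dst, uint64_t dst,
                                const char *src, uint64_t len) {
  if (user_dst)
    return copyout_model(dst, src, len);
  memmove((char *)(uintptr_t)dst, src, len);
  return 0;
}
```

This is also why consoleread() checks the return value of either_copyout(). A bad user buffer makes the copy fail, and the read stops instead of corrupting kernel memory.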
file.c

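// Read from file f.
// addr is a user virtual address.
int
fileread(struct file *f, uint64 addr, int n)
{
  int r = 0;
  if(f->readable == 0)
    return -1;
  if(f->type == FD_PIPE){
    r = piperead(f->pipe, addr, n);
  } else if(f->type == FD_DEVICE){
    if(f->major < 0 || f->major >= NDEV || !devsw[f->major].read)
      return -1;
    r = devsw[f->major].read(1, addr, n);   // 1: addr is a user address
  } else if(f->type == FD_INODE){
    // ... ilock(), readi(), iunlock() ...
  } else {
    panic("fileread");
  }
  return r;
}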

The following is sys_read() in sysfile.c, which runs when a process makes the read system call:

uint64
sys_read(void)
{
  struct file *f;
  int n;
  uint64 p;

  argaddr(1, &p);      // get buffer address argument
  argint(2, &n);       // get number of bytes arg
  if(argfd(0, 0, &f) < 0)       // get fd arg
    return -1;
  return fileread(f, p, n);
}
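The argaddr/argint/argfd helpers do not receive the arguments directly. On a system call, the user's registers are saved in the process's trapframe. Under the RISC-V calling convention, the arguments of read(fd, buf, n) sit in a0, a1, and a2. The system call number (SYS_read is 5 in xv6) is in a7. Below is a sketch of the fetch step. It is modeled on xv6's argraw() but uses a hypothetical, trimmed-down trapframe struct.

```c
#include <stdint.h>

// Hypothetical, trimmed trapframe: only the argument registers and a7.
struct trapframe {
  uint64_t a0, a1, a2, a3, a4, a5, a7;
};

// Fetch the n-th system call argument from the saved registers.
static uint64_t argraw(const struct trapframe *tf, int n) {
  switch (n) {
  case 0: return tf->a0;
  case 1: return tf->a1;
  case 2: return tf->a2;
  case 3: return tf->a3;
  case 4: return tf->a4;
  case 5: return tf->a5;
  }
  return (uint64_t)-1;               // real xv6 panics here
}
```

This is why sys_read() uses index 0 for the fd, 1 for the buffer address, and 2 for the byte count.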

So far the code path is: read -> sys_read -> fileread -> consoleread

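The console buffer (cons.buf) is used only for console input. Console output does not pass through cons.buf. Instead, consolewrite() hands each byte to uartputc(), which queues it in the UART driver's own transmit buffer (uart_tx_buf; see section 3).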

struct {
  struct spinlock lock;

  // input
#define INPUT_BUF_SIZE 128
  char buf[INPUT_BUF_SIZE];
  uint r;  // Read index
  uint w;  // Write index
  uint e;  // Edit index
} cons;

cons.r – Read Index

    Points to the next character to be read by a process (e.g., the shell or read() syscall).

    Advanced by consoleread() after it consumes a character.

    consoleread() sleeps while cons.r == cons.w (i.e., there is nothing to read).

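cons.w – Write Index

    Marks the end of the committed input. consoleread() may consume characters up to, but not including, cons.w.

    Advanced by consoleintr(). It sets cons.w to cons.e once a whole line, an end-of-file (Ctrl-D), or a full buffer has arrived (typically from the UART via interrupt).

cons.e – Edit Index

    Points to the next position where consoleintr() will store a newly typed character.

    Characters between cons.w and cons.e belong to the line still being edited. Backspace and Ctrl-U move cons.e back, and consoleread() cannot see these characters yet.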

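xv6 never wraps cons.r, cons.w, and cons.e back to zero. It lets them grow and reduces them modulo INPUT_BUF_SIZE only when indexing cons.buf. This stays correct even when a 32-bit uint overflows, for two reasons. Unsigned subtraction is modulo 2^32, and 128 divides 2^32. So both the fill level and the slot number survive the wraparound. The helper names below are hypothetical.

```c
#include <stdint.h>

#define INPUT_BUF_SIZE 128

// Hypothetical helpers mirroring the checks xv6 makes on the cons indexes
// (uint is 32 bits in xv6).
static int buf_empty(uint32_t r, uint32_t w) { return r == w; }                 // consoleread() sleeps
static int buf_full(uint32_t r, uint32_t e) { return e - r == INPUT_BUF_SIZE; } // no room to type
static uint32_t fill(uint32_t r, uint32_t e) { return e - r; }                  // bytes buffered
static uint32_t slot(uint32_t idx) { return idx % INPUT_BUF_SIZE; }             // position in cons.buf
```

Even when e has wrapped past zero and is numerically smaller than r, e - r still gives the number of buffered bytes.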
consoleread

int
consoleread(int user_dst, uint64 dst, int n)
{
  uint target;
  int c;
  char cbuf;

  target = n;
  acquire(&cons.lock);
  while(n > 0){

    // cons.r == cons.w means the input buffer is empty (nothing typed yet).
    // It goes to sleep waiting for new input.
    while(cons.r == cons.w){
      if(killed(myproc())){
        release(&cons.lock);
        return -1;
      }
      // sleeps on the address &cons.r and releases the lock while sleeping.
      // When a complete line arrives, the UART interrupt handler wakes it
      // up via wakeup(&cons.r) (done in consoleintr()).
      // Meanwhile the CPU runs other processes; this one resumes here
      // after the wakeup, once the scheduler picks it again.
      sleep(&cons.r, &cons.lock);

      // now we resumed and will re-check the while loop condition
    }

    // now cons.r != cons.w: take the next byte from the input buffer.
    c = cons.buf[cons.r++ % INPUT_BUF_SIZE];

    if(c == C('D')){  // end-of-file
      if(n < target){
        // Save ^D for next time, to make sure
        // caller gets a 0-byte result.
        cons.r--;
      }
      break;
    }

    // copy the input byte to the user-space buffer.
    cbuf = c;
    if(either_copyout(user_dst, dst, &cbuf, 1) == -1)
      break;

    dst++;
    --n;

    if(c == '\n'){
      // a whole line has arrived, return to
      // the user-level read().
      break;
    }
  }
  release(&cons.lock);

  return target - n;
}

consoleintr: uartintr() calls this for each input character.

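void
consoleintr(int c)
{
  acquire(&cons.lock);

  switch(c){
  case C('P'):  // Print process list. Ctrl-P
    procdump();
    break;
  // ... Ctrl-U (kill line) and backspace cases omitted ...
  default:
    if(c != 0 && cons.e-cons.r < INPUT_BUF_SIZE){
      c = (c == '\r') ? '\n' : c;
      consputc(c);                              // echo back to the user
      cons.buf[cons.e++ % INPUT_BUF_SIZE] = c;  // store for consoleread()
      if(c == '\n' || c == C('D') || cons.e-cons.r == INPUT_BUF_SIZE){
        // wake up consoleread() if a whole line (or end-of-file)
        // has arrived, or the input buffer is full.
        cons.w = cons.e;
        wakeup(&cons.r);
      }
    }
    break;
  }

  release(&cons.lock);
}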

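Below is a single-threaded sketch of how consoleread() and consoleintr() cooperate through the cons indexes. The lock, the echo, and the sleep are removed.

- model_intr() follows the default and backspace cases of consoleintr(); Ctrl-U is omitted.
- model_read() follows consoleread(), except that it returns -1 where the real function would sleep.

The model_* names are hypothetical. C() is the same control-key macro as in xv6's console.c.

```c
#include <string.h>

#define INPUT_BUF_SIZE 128
#define C(x)  ((x) - '@')   // Control-x

// Single-threaded model of xv6's cons buffer (no lock, no echo, no sleep).
static struct {
  char buf[INPUT_BUF_SIZE];
  unsigned r, w, e;
} cons;

// Model of consoleintr(): edit at cons.e; commit (cons.w = cons.e) on a
// newline, Ctrl-D, or a full buffer.
static void model_intr(int c) {
  if (c == C('H') || c == '\x7f') {        // backspace / delete
    if (cons.e != cons.w)
      cons.e--;
    return;
  }
  if (c != 0 && cons.e - cons.r < INPUT_BUF_SIZE) {
    c = (c == '\r') ? '\n' : c;
    cons.buf[cons.e++ % INPUT_BUF_SIZE] = (char)c;
    if (c == '\n' || c == C('D') || cons.e - cons.r == INPUT_BUF_SIZE)
      cons.w = cons.e;                     // the real code calls wakeup(&cons.r)
  }
}

// Model of consoleread(): returns the number of bytes copied, or -1 where
// the real consoleread() would sleep because cons.r == cons.w.
static int model_read(char *dst, int n) {
  int target = n;
  if (cons.r == cons.w)
    return -1;
  while (n > 0 && cons.r != cons.w) {
    int c = cons.buf[cons.r++ % INPUT_BUF_SIZE];
    if (c == C('D')) {                     // end-of-file
      if (n < target)
        cons.r--;                          // save ^D for the next read
      break;
    }
    *dst++ = (char)c;
    --n;
    if (c == '\n')
      break;
  }
  return target - n;
}
```

Backspace only moves cons.e. The characters between cons.w and cons.e stay editable because the reader never looks past cons.w.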
uartintr

void
uartintr(void)
{
  ReadReg(ISR); // acknowledge the interrupt

  // for input: read and process incoming characters.
  while(1){
    int c = uartgetc();
    if(c == -1)
      break;
    consoleintr(c);
  }

  // for output send buffered characters.
  acquire(&uart_tx_lock);
  uartstart();
  release(&uart_tx_lock);
}
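The while loop in uartintr() matters because the UART has a receive FIFO. By the time the interrupt is handled, several characters may be waiting, and one interrupt should consume all of them. The sketch below replaces the hardware with a hypothetical software FIFO (with no overflow handling). It uses the same LSR bit 0 test as xv6's uartgetc().

```c
#define LSR_RX_READY (1 << 0)   // LSR bit 0: a byte is waiting in RHR
#define FIFO_SIZE 16

// Hypothetical model of the UART's receive FIFO.
static struct {
  unsigned char fifo[FIFO_SIZE];
  unsigned head, tail;
} rx;

static void hw_receive(unsigned char c) { rx.fifo[rx.tail++ % FIFO_SIZE] = c; }
static unsigned char model_LSR(void) { return rx.head != rx.tail ? LSR_RX_READY : 0; }
static unsigned char model_RHR(void) { return rx.fifo[rx.head++ % FIFO_SIZE]; }

// Same logic as xv6's uartgetc(): -1 when no input is waiting.
static int uartgetc_model(void) {
  if (model_LSR() & LSR_RX_READY)
    return model_RHR();
  return -1;
}

static int delivered;                  // characters handed to the stub
static void consoleintr_stub(int c) { (void)c; delivered++; }

// The input loop of uartintr(): one interrupt drains every waiting byte.
static void uartintr_model(void) {
  while (1) {
    int c = uartgetc_model();
    if (c == -1)
      break;
    consoleintr_stub(c);
  }
}
```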


3. Code: Console Output

3.1. Console Output: write -> syswrite -> filewrite -> consolewrite -> uartputc

A write system call on a file descriptor connected to the console eventually arrives at uartputc(), by way of consolewrite().

consolewrite

int
consolewrite(int user_src, uint64 src, int n)
{
  int i;

  for(i = 0; i < n; i++){
    char c;
    // copy one byte from user virtual address space to kernel virtual address space
    if(either_copyin(&c, user_src, src+i, 1) == -1)
      break;
    uartputc(c);
  }
  return i;
}

The first byte is sent directly on the write path: write -> consolewrite -> uartputc -> uartstart.

uartputc

// add a character to the output buffer and tell the
// UART to start sending if it isn't already.
// blocks if the output buffer is full.
// because it may block, it can't be called
// from interrupts; it's only suitable for use
// by write().

void
uartputc(int c)
{
  acquire(&uart_tx_lock);

  if(panicked){
    for(;;)
      ;
  }
  while(uart_tx_w == uart_tx_r + UART_TX_BUF_SIZE){
    // buffer is full.
    // wait for uartstart() to open up space in the buffer.
    sleep(&uart_tx_r, &uart_tx_lock);
  }
  uart_tx_buf[uart_tx_w % UART_TX_BUF_SIZE] = c;
  uart_tx_w += 1;
  uartstart();
  release(&uart_tx_lock);
}

Each time the UART finishes sending a byte, it generates an interrupt, and uartintr() calls uartstart(). Thus, if a process writes multiple bytes to the console, typically the first byte is sent by uartputc()'s call to uartstart(). The remaining buffered bytes are sent by uartstart() calls from uartintr() as transmit-complete interrupts arrive.

uartstart

// if the UART is idle, and a character is waiting
// in the transmit buffer, send it.
// caller must hold uart_tx_lock.
// called from both the top- and bottom-half.
void
uartstart()
{
  while(1){
    if(uart_tx_w == uart_tx_r){
      // transmit buffer is empty.
      return;
    }

    if((ReadReg(LSR) & LSR_TX_IDLE) == 0){
      // the UART transmit holding register is full,
      // so we cannot give it another byte.
      // it will interrupt when it's ready for a new byte.
      return;
    }

    int c = uart_tx_buf[uart_tx_r % UART_TX_BUF_SIZE];
    uart_tx_r += 1;

    // maybe uartputc() is waiting for space in the buffer.
    wakeup(&uart_tx_r);

    WriteReg(THR, c);
  }
}
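The interplay between uartputc(), uartstart(), and the transmit-complete interrupt is easy to trace in a single-threaded model. This sketch makes the following simplifications and assumptions:

- the lock and the sleep in uartputc() are dropped, and the buffer is assumed never to fill;
- a hypothetical thr_busy flag plays the hardware: it is set when a byte is written to THR and cleared by tx_interrupt();
- LSR_TX_IDLE is assumed to be bit 5, as in xv6's uart.c.

```c
#include <string.h>

#define UART_TX_BUF_SIZE 32
#define LSR_TX_IDLE (1 << 5)   // THR can accept another byte

static char uart_tx_buf[UART_TX_BUF_SIZE];
static unsigned long uart_tx_w, uart_tx_r;   // ever-increasing indexes
static int thr_busy;                         // hypothetical hardware state
static char wire[64];                        // bytes that reached the line
static int nwire;

static unsigned char read_LSR(void) { return thr_busy ? 0 : LSR_TX_IDLE; }
static void write_THR(char c) { wire[nwire++] = c; thr_busy = 1; }

// Same loop as uartstart(): send while data is queued and THR is idle.
static void uartstart_model(void) {
  while (uart_tx_w != uart_tx_r) {
    if ((read_LSR() & LSR_TX_IDLE) == 0)
      return;                  // hardware busy; wait for the interrupt
    char c = uart_tx_buf[uart_tx_r % UART_TX_BUF_SIZE];
    uart_tx_r += 1;
    write_THR(c);
  }
}

// uartputc() without the lock and the full-buffer sleep.
static void uartputc_model(char c) {
  uart_tx_buf[uart_tx_w % UART_TX_BUF_SIZE] = c;
  uart_tx_w += 1;
  uartstart_model();
}

// Simulated transmit-complete interrupt: THR is free again.
static void tx_interrupt(void) {
  thr_busy = 0;
  uartstart_model();           // uartintr() calls uartstart()
}
```

Writing "hello" this way sends only 'h' immediately. Each simulated interrupt then releases one more byte.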

4. Concurrency considerations in drivers

Context: consoleread() and consoleintr() (called from the UART interrupt handler) interact via:

A shared buffer: cons.buf[]

Shared indexes: cons.r, cons.w, cons.e

And shared synchronization: cons.lock

Access to these shared resources must be synchronized. You may have noticed calls to acquire in consoleread and in consoleintr. These calls acquire a lock, which protects the console driver's data structures from concurrent access. There are three concurrency dangers here:

  1. two processes on different CPUs might call consoleread at the same time;
  2. the hardware might ask a CPU to deliver a console (really UART) interrupt while that CPU is already executing inside consoleread;
  3. the hardware might deliver a console interrupt on a different CPU while consoleread is executing.

These dangers may result in race conditions or deadlocks.

Chapter 6 (Locking) of the xv6 book explores these problems and how locks can address them.

Concurrency also requires care in drivers for another reason. One process may be waiting for input from a device, but the interrupt signaling that the input has arrived may come while a different process (or no process at all) is running. Thus interrupt handlers must not make assumptions about the process or code they have interrupted. For example, an interrupt handler cannot safely call copyout with the current process's page table. Interrupt handlers typically do relatively little work (e.g., just copy the input data to a buffer) and wake up the process-context code to do the rest. That code is the bottom half in this document's terminology; the xv6 book calls it the top half.

4.1. xv6 does not support multiplexing I/O

The console is shared globally as a single terminal. Because xv6 supports background processes, the shell and a background process may read from and write to the same console at the same time. If this happens, input and output may become interleaved, confusing, or broken.

5. Timer Interrupt

xv6 uses timer interrupts to maintain its clock and to switch among compute-bound processes. It programs the RISC-V timer to interrupt regularly, about ten times per second under QEMU.

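The timer rate is simple arithmetic: period = interval / timebase frequency. The sketch below assumes two values that depend on the particular setup and are not fixed by RISC-V:

- QEMU's virt machine timebase of 10 MHz;
- a 1,000,000-tick interval, which is the value xv6 programs.

```c
#include <stdint.h>

#define TIMEBASE_HZ 10000000ULL   // assumed: QEMU virt timebase frequency
#define INTERVAL    1000000ULL    // assumed: ticks between timer interrupts

// Milliseconds between timer interrupts.
static uint64_t timer_period_ms(uint64_t interval, uint64_t hz) {
  return interval * 1000 / hz;
}

// Timer interrupts per second.
static uint64_t timer_rate_hz(uint64_t interval, uint64_t hz) {
  return hz / interval;
}
```

With these values the period is 100 ms, i.e. 10 interrupts per second. This matches the "about ten times per second" figure above.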
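Older versions of xv6 handled timer interrupts quite differently from the trap mechanism discussed before. Section 5.1 explains why, and how newer versions deliver them through the ordinary supervisor trap path.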

A timer interrupt can arrive in user mode (handled by usertrap()) or in kernel mode (handled by kerneltrap()). Either way, the trap handler calls yield() to give up the CPU, and yield() always runs in kernel mode.

yield() triggers a context switch to the scheduler's context. The switch saves the current kernel thread's registers and loads the scheduler's registers.

5.1. Why timer interrupts need different handling

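RISC-V requires that the machine timer interrupt (driven by the CLINT's mtimecmp register) be taken in machine mode, not supervisor mode. xv6 takes all other interrupts in supervisor mode.

RISC-V machine mode executes without paging and with a separate set of control registers, so it is not practical to run ordinary xv6 kernel code in machine mode.

As a result, older xv6 releases handled timer interrupts completely separately from the trap mechanism laid out above. A small machine-mode handler (timervec) re-armed the timer and raised a supervisor software interrupt for the kernel to handle. Newer releases instead use the RISC-V Sstc extension. Its stimecmp register raises a supervisor timer interrupt directly, so no machine-mode handler is needed.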

Level    Name              Used for
-------- ----------------- ----------------------------------------------------------------
M-mode   Machine mode      Highest privilege; handles low-level traps and timer interrupts
S-mode   Supervisor mode   Kernel code (e.g., the xv6 kernel)
U-mode   User mode         User applications

5.1.1. Timer Interrupt Delegation

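The xv6 code shown here handles timer interrupts in S-mode. It delegates interrupts and exceptions from M-mode to S-mode using the mideleg and medeleg registers, a standard RISC-V feature. This works because, with the Sstc extension, the timer raises a supervisor timer interrupt, which mideleg can delegate.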

start.c

// entry.S jumps here in machine mode on stack0.
void
start()
{
  // delegate all interrupts and exceptions to supervisor mode.
  w_medeleg(0xffff);
  w_mideleg(0xffff);
}

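devintr() reads scause, identifies a timer interrupt, and returns 2:

int
devintr()
{
  uint64 scause = r_scause();

  if(scause == 0x8000000000000009L){
    // supervisor external interrupt, via the PLIC
    // (e.g., the UART, which leads to uartintr()).
    // ...
    return 1;
  } else if(scause == 0x8000000000000005L){
    // timer interrupt.
    clockintr();
    return 2;
  } else {
    return 0;
  }
}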

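The magic constants in devintr() are easier to read once split apart. Bit 63 of scause marks an interrupt, as opposed to an exception, and the remaining bits hold the cause code:

- 5 is a supervisor timer interrupt;
- 9 is a supervisor external interrupt;
- for exceptions, 8 is an ecall from user mode.

Below is a small decoding sketch with hypothetical helper names. delegates() only tests whether a bit is set in the value written to mideleg. The register itself may ignore bits for interrupts the hardware cannot delegate.

```c
#include <stdint.h>

#define SCAUSE_INTR (1ULL << 63)   // set for interrupts, clear for exceptions

static int scause_is_interrupt(uint64_t scause) { return (scause & SCAUSE_INTR) != 0; }
static uint64_t scause_code(uint64_t scause) { return scause & ~SCAUSE_INTR; }

// Does writing `mask` to mideleg request delegation of interrupt `code`?
static int delegates(uint64_t mask, uint64_t code) { return (int)((mask >> code) & 1); }
```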
Question: if RISC-V supports delegating timer interrupts to supervisor mode, why does it also support handling them in machine mode?

It must support both embedded (no OS) and full-OS environments. Some embedded systems have no OS at all, and everything runs in machine mode.

5.2. Timer Interrupt in user mode

When a timer interrupt occurs in user mode, control transfers to the trap handler, usertrap() in kernel/trap.c. If devintr() reports a timer interrupt (return value 2), usertrap() calls yield(). The current process then gives up the CPU, and the scheduler can pick another process.

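//
// handle an interrupt, exception, or system call from user space.
// called from trampoline.S
//
void
usertrap(void)
{
  int which_dev = 0;

  if(r_scause() == 8){
    // system call
    // ...
  } else if((which_dev = devintr()) != 0){
    // ok: devintr() handled a device or timer interrupt.
  }
  // ... any other cause kills the process ...

  // give up the CPU if this is a timer interrupt.
  if(which_dev == 2)
    yield();

  usertrapret();
}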

5.3. Timer Interrupt in kernel mode

Handling timer interrupts in kernel mode is necessary for preemption even while the kernel is executing on behalf of a process, for example during a long-running system call.

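// interrupts and exceptions from kernel code go here via kernelvec,
// on whatever the current kernel stack is.
void
kerneltrap()
{
  int which_dev = 0;
  // ... save sepc and sstatus; yield() may clobber them ...
  if((which_dev = devintr()) == 0){
    // interrupt or trap from an unknown source
    panic("kerneltrap");
  }
  // give up the CPU if this is a timer interrupt.
  if(which_dev == 2 && myproc() != 0)
    yield();
  // ... restore sepc and sstatus ...
}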

5.4. yield mechanism

yield() gives up the CPU for one scheduling round. The trap handlers above call it when a timer interrupt forces the current process off the CPU (preemption).

yield

// Give up the CPU for one scheduling round.
void
yield(void)
{
  struct proc *p = myproc();
  acquire(&p->lock);
  p->state = RUNNABLE;         // usually from running to runnable
  sched();                     // will switch to scheduler context
  release(&p->lock);
}

sched

// Switch to scheduler.  Must hold only p->lock
// and have changed proc->state. Saves and restores
// intena because intena is a property of this
// kernel thread, not this CPU. It should
// be proc->intena and proc->noff, but that would
// break in the few places where a lock is held but
// there's no process.
void
sched(void)
{
  int intena;
  struct proc *p = myproc();

  if(!holding(&p->lock))
    panic("sched p->lock");
  if(mycpu()->noff != 1)
    panic("sched locks");
  if(p->state == RUNNING)
    panic("sched RUNNING");
  if(intr_get())
    panic("sched interruptible");

  intena = mycpu()->intena;
  swtch(&p->context, &mycpu()->context); // switch from process context to the CPU's scheduler context
  mycpu()->intena = intena;
}
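The noff and intena fields checked in sched() come from xv6's push_off()/pop_off(). acquire() and release() call these so that interrupts stay off while any spinlock is held on a CPU. This prevents an interrupt handler from spinning on a lock that its own CPU already holds. Below is a user-space sketch of that nesting logic. A hypothetical intr_enabled flag stands in for the real interrupt-enable bit, and the panic checks are left out. Holding exactly p->lock corresponds to noff == 1. intena records whether interrupts were on before the first push_off().

```c
// Hypothetical stand-ins for the per-CPU state and the interrupt-enable bit.
static int intr_enabled = 1;            // plays intr_get()
static struct { int noff; int intena; } cpu;

static void push_off(void) {
  int old = intr_enabled;
  intr_enabled = 0;                     // intr_off()
  if (cpu.noff == 0)
    cpu.intena = old;                   // remember state before the first push
  cpu.noff += 1;
}

static void pop_off(void) {
  // real xv6 panics here if interrupts are on or noff < 1
  cpu.noff -= 1;
  if (cpu.noff == 0 && cpu.intena)
    intr_enabled = 1;                   // intr_on()
}
```

Because intena describes the state before the outermost lock was taken, sched() saves and restores it around swtch(). The value belongs to the kernel thread that is switching out, not to whichever thread runs next on this CPU.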

context

// Saved registers for kernel context switches.
struct context {
  uint64 ra;
  uint64 sp;

  // callee-saved
  uint64 s0;
  uint64 s1;
  uint64 s2;
  uint64 s3;
  uint64 s4;
  uint64 s5;
  uint64 s6;
  uint64 s7;
  uint64 s8;
  uint64 s9;
  uint64 s10;
  uint64 s11;
};

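Compared with a trap, swtch() saves fewer registers. swtch() is called like an ordinary C function, so under the calling convention the caller has already saved any caller-saved registers it still needs. swtch() therefore only has to save ra, sp, and the callee-saved registers s0-s11: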

Context save      Who saves it                                      When                                 What for
----------------- ------------------------------------------------- ------------------------------------ ----------------------------------------------------------------------
`trap` frame      `uservec` (trampoline.S), plus `usertrap()` for `epc`   On a trap/interrupt from user mode   Save the full user register state so the process can resume after kernel code
`swtch` context   `swtch()` (assembly)                              During `sched()` / context switch    Save the minimal state needed to resume a kernel thread

Remember that context switches only happen in kernel mode.