Day30 - TTY Driver Deep Dive (PL011)

🎯 Objective

  • Map user-space write() / read() to the real kernel execution path
  • Understand how the TTY subsystem interacts with the UART driver
  • Analyze TX / RX paths using the PL011 driver
  • Understand buffering design (xmit buffer, flip buffer, FIFO, DMA buffer)
  • Learn why Linux uses a layered serial architecture
  • Build a practical reading model for real driver source code

🧠 1. Full System Overview

End-to-End Data Path

USER SPACE
  write() / read()
TTY core (tty_io.c)
Line Discipline (N_TTY)
Serial Core (serial_core.c)
PL011 Driver (amba-pl011.c)
UART Hardware (FIFO / optional DMA)
GPIO / Wire

Key Idea

The TTY subsystem separates user-space behavior from hardware-specific handling.

This is the main reason a Linux UART driver looks very different from a simple custom character device driver.

In a simple char driver, the path is usually:

user → driver → hardware

In the TTY subsystem, the path becomes:

user → tty core → line discipline → serial core → driver → hardware

The advantage is that Linux can reuse the same user-space model (open, read, write, termios) across many different serial-like devices.


🧠 2. Layer Responsibilities

User Space

User space performs:

  • open()
  • read()
  • write()
  • ioctl() / termios

User space does not talk to UART registers directly.


TTY Core

TTY core is responsible for:

  • file descriptor handling
  • generic buffering
  • interaction with line discipline
  • exposing /dev/ttyXXX (for PL011 ports, /dev/ttyAMAn)

TTY core is the common framework shared by multiple TTY-style devices.


Line Discipline

The default line discipline is N_TTY.

It controls terminal-style behavior such as:

  • canonical mode
  • line editing
  • echo
  • signal generation (Ctrl+C, Ctrl+Z, etc.)

For raw UART communication, raw mode is typically used, so this layer becomes mostly pass-through.


Serial Core

Serial core provides generic UART abstractions such as:

  • uart_port
  • uart_ops
  • transmit ring buffer management
  • generic helper logic

This layer allows multiple UART drivers to share a common model.


PL011 Driver

The PL011 driver is the hardware-specific layer.

It is responsible for:

  • programming UART registers
  • handling interrupts
  • pushing TX data into hardware
  • pulling RX data from hardware or DMA buffer
  • passing RX data upward into the TTY subsystem

Hardware

At the bottom, the UART hardware handles:

  • TX FIFO
  • RX FIFO
  • shift registers
  • interrupt generation
  • optional DMA handshake

🟢 3. TX Path (write)

High-Level Flow

write()
  → tty_write()
  → line discipline
  → uart_write()
  → xmit buffer
  → uart_start()      (serial core; calls port->ops->start_tx)
  → pl011_start_tx()
  → UART TX FIFO
  → wire

Step-by-Step Explanation

1. write() → tty_write()

This is the entry point from user space.

Responsibilities:

  • validate file state
  • forward data into the TTY subsystem
  • pass data through line discipline

2. Line Discipline

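On the write path, N_TTY applies output processing when OPOST is set (for example, ONLCR expands \n to \r\n). Buffering until newline is canonical-mode input behavior; it affects read(), not write().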

In raw mode, line discipline mostly forwards bytes without modification.

For UART communication, raw mode is preferred because:

  • no line editing
  • no echo
  • no signal interpretation
  • byte-oriented behavior
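
The raw-mode properties listed above can be sketched from user space with the standard termios flags. The helper name make_raw_8n1 and the 115200 baud rate are assumptions made for illustration; the function only edits a struct termios in memory, and a real program would fetch the current settings with tcgetattr() and apply the result with tcsetattr().

```c
#define _DEFAULT_SOURCE   /* exposes B115200 under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/* Hypothetical helper: turn a termios structure into raw 8N1 at 115200.
 * It only edits the structure; apply it with tcsetattr(fd, TCSANOW, t). */
static int make_raw_8n1(struct termios *t)
{
    assert(t != NULL);

    /* Input: no break handling, no parity marking, no CR/NL translation,
     * no software (XON/XOFF) flow control. */
    t->c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
                    INLCR | IGNCR | ICRNL | IXON);
    /* Output: no post-processing such as NL -> CR NL. */
    t->c_oflag &= ~OPOST;
    /* Local: no echo, no canonical line editing, no signal characters. */
    t->c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    /* Frame: 8 data bits, no parity, one stop bit. */
    t->c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    t->c_cflag |= CS8 | CREAD | CLOCAL;

    /* read() returns as soon as at least one byte is available. */
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;

    if (cfsetispeed(t, B115200) != 0)
        return -1;
    return cfsetospeed(t, B115200);
}
```

Each cleared flag maps to one bullet above: ICANON is line editing, ECHO is echo, ISIG is signal interpretation.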

3. uart_write()

The transmitted data is copied into the UART transmit buffer managed by serial core.

This means:

write() does not directly access UART registers.


4. xmit buffer

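port->state->xmit

This is the main TX software buffer. Recent kernels keep the same data in a kfifo attached to the tty_port instead of a circ_buf, so the exact field name depends on the kernel version you are reading.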

Characteristics:

  • circular buffer (ring buffer)
  • one page in size (UART_XMIT_SIZE is defined as PAGE_SIZE, so 4 KB with 4 KB pages)
  • kernel-managed
  • shared with serial core

Purpose:

  • decouple user write speed from hardware transmit speed
  • allow write() to return early
  • support asynchronous transmission
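
The ring-buffer behavior described above can be modeled in plain user-space C. This is a simplified sketch, not the kernel's data structure: the names xmit_ring, xmit_put, xmit_get, xmit_pending and xmit_room are invented for illustration, and the 4096-byte size is an assumption matching a 4 KB page.

```c
#include <assert.h>
#include <stddef.h>

#define XMIT_SIZE 4096u   /* assumed: a power of two, like one 4 KB page */

/* Simplified user-space model of a TX ring; not the kernel's structure. */
struct xmit_ring {
    unsigned char buf[XMIT_SIZE];
    unsigned int head;  /* producer index (the write() side) */
    unsigned int tail;  /* consumer index (the driver side)  */
};

static unsigned int xmit_pending(const struct xmit_ring *r)
{
    return (r->head - r->tail) & (XMIT_SIZE - 1);
}

/* One slot stays empty so head == tail always means "empty". */
static unsigned int xmit_room(const struct xmit_ring *r)
{
    return XMIT_SIZE - 1 - xmit_pending(r);
}

/* Producer: copy as much as fits and report how much was accepted. */
static size_t xmit_put(struct xmit_ring *r, const unsigned char *src, size_t len)
{
    size_t n = 0;

    while (n < len && xmit_room(r) > 0) {
        r->buf[r->head] = src[n++];
        r->head = (r->head + 1) & (XMIT_SIZE - 1);
    }
    return n;
}

/* Consumer: take one byte; returns 0 when the ring is empty. */
static int xmit_get(struct xmit_ring *r, unsigned char *out)
{
    assert(out != NULL);
    if (r->head == r->tail)
        return 0;
    *out = r->buf[r->tail];
    r->tail = (r->tail + 1) & (XMIT_SIZE - 1);
    return 1;
}
```

The producer may return a short count when the ring is nearly full, which is the same reason a non-blocking write() on a serial port can accept fewer bytes than requested.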

5. pl011_start_tx()

This function is the TX kick-off stage.

Typical responsibilities:

  • check whether TX can start
  • move an initial batch of bytes from xmit buffer into UART TX FIFO
  • enable TX interrupt or DMA path if needed

Important point:

pl011_start_tx() usually starts transmission, but may not complete it.


TX Interrupt Refill Model

Because the TX FIFO is small (16 or 32 entries, depending on the PL011 revision), one kick-off is often not enough.

The real transmission usually continues like this:

write()
  → xmit buffer
  → start_tx()     (initial fill)
  → TX interrupt
  → refill FIFO
  → TX interrupt
  → refill FIFO
  → ...
  → xmit empty
  → stop TX interrupt

TX Interrupt Meaning

TX interrupt usually means:

The hardware can accept more data.

This is different from RX interrupt, which means:

New data has arrived.


TX Refill Logic

The refill loop conceptually looks like:

while (FIFO has space && xmit is not empty)
    move next byte from xmit buffer to UART_DR

This is the core of interrupt-driven TX.
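
The refill loop can be exercised as a small user-space simulation. Everything here is a stand-in: sim_uart models the FIFO as a fill counter, sim_xmit uses a flat array with a cursor in place of the ring buffer, and the FIFO depth of 16 is an assumption. On real hardware the TX interrupt fires at a FIFO watermark rather than after a full drain; the sketch ignores that detail.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_DEPTH 16u   /* assumed depth; the real value depends on the PL011 revision */

/* Simulated hardware: a counter stands in for the TX FIFO fill level. */
struct sim_uart {
    unsigned int fifo_level;
    unsigned char wire[256];   /* bytes "written to UART_DR", in order */
    size_t wire_len;
};

/* Pending TX data; a flat array with a cursor stands in for the xmit ring. */
struct sim_xmit {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Move bytes while the FIFO has space and data remains.
 * Returns 1 if data is still pending (keep the TX interrupt enabled),
 * 0 if xmit is empty (the TX interrupt can be masked). */
static int tx_refill(struct sim_uart *u, struct sim_xmit *x)
{
    while (u->fifo_level < FIFO_DEPTH && x->pos < x->len) {
        assert(u->wire_len < sizeof(u->wire));
        u->wire[u->wire_len++] = x->data[x->pos++];  /* "write UART_DR" */
        u->fifo_level++;
    }
    return x->pos < x->len;
}

/* Simulated hardware progress: the shifter drains n bytes from the FIFO. */
static void sim_drain(struct sim_uart *u, unsigned int n)
{
    u->fifo_level = (n > u->fifo_level) ? 0 : u->fifo_level - n;
}
```

Calling tx_refill() once plays the role of start_tx; each later call, after sim_drain(), plays the role of one TX interrupt.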


Important TX Insight

write() returns once the data is buffered, not once the bytes have gone out on the wire.

So a successful write() only proves:

  • user → tty → xmit buffer worked

It does not prove:

  • bytes were already fully transmitted on the wire
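
When a program does need to know that the bytes have been sent, POSIX provides tcdrain(), which blocks until all queued output on the terminal has been transmitted. The wrapper below is a minimal sketch; the name write_and_drain is an assumption, and error handling is reduced to retrying on EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/* Write the whole buffer, then wait until the kernel reports that all
 * queued output has been transmitted. Returns 0 on success, -1 on error
 * (errno is set by write() or tcdrain()). */
static int write_and_drain(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);   /* returns once data is buffered */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return tcdrain(fd);                  /* blocks until output is sent */
}
```

tcdrain() only works on terminal devices, so the function fails on a pipe or regular file even though the write() part succeeds.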

🔵 4. RX Path (read)

High-Level Flow

wire
  → UART RX FIFO or DMA buffer
  → interrupt / DMA completion
  → PL011 RX handling
  → tty_insert_flip_*
  → tty_flip_buffer_push()
  → tty buffer
  → read()

Important Clarification

When reading the actual PL011 driver source, you may not always see one single clean path like:

pl011_irq() → pl011_rx_chars() → tty_insert_flip_char()

In real code, RX handling may be split across:

  • IRQ handler
  • PIO helper path
  • DMA helper path
  • FIFO-to-TTY helper functions

So the correct way to read the code is to track the data flow role, not just function names.


🧠 5. RX Path Variants: PIO vs DMA

PL011 can receive data in more than one way.


A. PIO RX Path

PIO means the CPU reads bytes directly from hardware registers.

Conceptual flow:

UART RX FIFO
  → IRQ
  → read UART_DR repeatedly
  → push chars into flip buffer
  → tty_flip_buffer_push()

Typical characteristics:

  • CPU actively drains RX FIFO
  • typically used at low data rates or when no DMA channel is configured
  • may process byte-by-byte or small batches
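
The PIO drain loop can be sketched as a user-space simulation. The FIFO is modeled as an array of data-register words, with the error bits placed in bits 8 to 11 as in the PL011 data register; the struct and function names (sim_rx_fifo, pio_rx_drain) are invented for the example, and the chars/flags output pair only imitates the shape of what a driver hands to the flip buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Error bits returned alongside the data byte in a PL011 data-register read. */
#define DR_FE (1u << 8)    /* framing error */
#define DR_PE (1u << 9)    /* parity error  */
#define DR_BE (1u << 10)   /* break         */
#define DR_OE (1u << 11)   /* overrun       */

/* Simulated RX FIFO: each entry is one data-register word. */
struct sim_rx_fifo {
    uint16_t words[32];
    size_t count;
    size_t next;
};

static int rx_fifo_empty(const struct sim_rx_fifo *f)
{
    return f->next == f->count;
}

/* Drain the FIFO into a chars[]/flags[] pair, the shape a flip buffer
 * takes. flags[i] is 0 for a clean byte, 1 if any error bit was set.
 * Returns the number of bytes collected. */
static size_t pio_rx_drain(struct sim_rx_fifo *f, unsigned char *chars,
                           unsigned char *flags, size_t max)
{
    size_t n = 0;

    assert(chars != NULL && flags != NULL);
    while (!rx_fifo_empty(f) && n < max) {
        uint16_t dr = f->words[f->next++];        /* "read UART_DR" */
        chars[n] = (unsigned char)(dr & 0xff);
        flags[n] = (dr & (DR_FE | DR_PE | DR_BE | DR_OE)) ? 1 : 0;
        n++;
    }
    return n;
}
```

A real driver would translate each error bit into a distinct TTY flag rather than a single 0/1 value; the sketch collapses them to keep the loop visible.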

B. DMA RX Path

DMA means hardware transfers RX data into memory without the CPU moving every byte manually.

Conceptual flow:

UART RX
  → DMA engine
  → DMA buffer in memory
  → DMA completion / poll / interrupt
  → tty_insert_flip_string()
  → tty_flip_buffer_push()

Typical characteristics:

  • better efficiency for burst data
  • CPU handles larger chunks instead of individual bytes
  • source data comes from DMA buffer, not directly from UART_DR
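
The chunk-based handoff can be modeled in user space as follows. This is a sketch under stated assumptions: sim_dma_rx and dma_rx_take are hypothetical names, the 64-byte buffer size is arbitrary, and "residue" stands for the count of not-yet-filled bytes that a DMA engine status query would report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DMA_BUF_SIZE 64u   /* assumed size for the sketch */

/* Simulated DMA RX state. The DMA engine fills buf[] from the start. */
struct sim_dma_rx {
    unsigned char buf[DMA_BUF_SIZE];
    size_t consumed;   /* bytes already handed to the TTY layer */
};

/* Hand over everything that arrived since the last call as one chunk.
 * residue is how many bytes of the buffer are still unfilled.
 * Returns the chunk length (0 if nothing new). */
static size_t dma_rx_take(struct sim_dma_rx *d, size_t residue,
                          unsigned char *out, size_t out_max)
{
    size_t filled, fresh;

    assert(out != NULL);
    assert(residue <= DMA_BUF_SIZE);
    filled = DMA_BUF_SIZE - residue;
    assert(filled >= d->consumed);

    fresh = filled - d->consumed;
    if (fresh > out_max)
        fresh = out_max;
    /* One bulk copy from buf + offset: the tty_insert_flip_string() role. */
    memcpy(out, d->buf + d->consumed, fresh);
    d->consumed += fresh;
    return fresh;
}
```

The CPU touches the data once per chunk, not once per byte, which is where the efficiency gain for burst traffic comes from.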

Key Reading Strategy

When reading real PL011 code:

  • if data comes from UART_DR → likely PIO-style RX handling
  • if data comes from dbuf->buf + offset or similar → likely DMA RX handling
  • if you see tty_insert_flip_string() → driver is handing a chunk of RX data to TTY
  • if you see tty_flip_buffer_push() → driver is notifying TTY core that new data is ready

🧠 6. IRQ Handler Role

IRQ Handler Is the Entry Point, Not the Whole Story

The top-level IRQ handler typically does not contain the full RX/TX logic inline.

Instead, it usually:

  • checks interrupt source/status
  • identifies RX / TX / error / modem events
  • dispatches to the relevant helper path

Conceptually:

IRQ handler
  → check status
  → RX event? handle RX
  → TX event? handle TX
  → error event? handle error

This is why the real code may look more fragmented than the conceptual model.
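
The dispatcher role can be illustrated with a user-space model. The bit positions follow the PL011 interrupt status layout as commonly documented, but treat them as illustrative and check them against the reference manual; sim_uart_irq and irq_stats are invented names, and counters stand in for the real helper functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interrupt status bits, following the PL011 register layout. */
#define IRQ_CTS  (1u << 1)   /* modem status: CTS changed */
#define IRQ_RX   (1u << 4)   /* RX FIFO reached its trigger level */
#define IRQ_TX   (1u << 5)   /* TX FIFO dropped to its trigger level */
#define IRQ_RT   (1u << 6)   /* RX timeout: data waiting below trigger level */
#define IRQ_OE   (1u << 10)  /* overrun error */

/* Counters stand in for the real helper paths. */
struct irq_stats {
    unsigned int rx, tx, modem, overrun;
};

/* Dispatcher: inspect the masked status and call the matching helpers.
 * Returns 1 if any source was handled, 0 if the interrupt was not ours. */
static int sim_uart_irq(uint32_t raw_status, uint32_t enabled,
                        struct irq_stats *s)
{
    uint32_t status = raw_status & enabled;

    assert(s != NULL);
    if (status == 0)
        return 0;
    if (status & IRQ_OE)
        s->overrun++;      /* error helper */
    if (status & (IRQ_RX | IRQ_RT))
        s->rx++;           /* RX helper: drain FIFO or collect DMA data */
    if (status & IRQ_CTS)
        s->modem++;        /* modem-status helper */
    if (status & IRQ_TX)
        s->tx++;           /* TX helper: refill the FIFO */
    return 1;
}
```

Masking the raw status with the enabled set matters: a pending but disabled source (for example TX after the xmit buffer emptied) must not trigger its helper.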


Why This Matters

If you expect to see the entire RX/TX flow in one function, the code will feel confusing.

A better way is to ask:

  • where does TX data come from?
  • where is TX data written to hardware?
  • where does RX data come from?
  • where is RX data handed to TTY?

🧠 7. Flip Buffer Deep Dive

What Is Flip Buffer?

The flip buffer is:

  • kernel-managed memory
  • a real buffer, not a pointer alias
  • used as the boundary between low-level driver context and TTY core

It is typically written through APIs such as:

tty_insert_flip_char()
tty_insert_flip_string()
tty_flip_buffer_push()

Why Needed?

1. Interrupt Constraints

In interrupt context, the driver must be fast.

It cannot do anything that might sleep or touch user memory, such as:

  • sleeping
  • blocking
  • copying directly to user memory

So the driver must quickly move RX data into kernel-managed buffering.


2. Decoupling

The design is:

driver → flip buffer → tty core → user

This keeps responsibilities separated:

  • driver handles hardware
  • TTY handles user-facing buffering/behavior

3. Batching

The driver can hand over multiple bytes at once instead of processing one byte all the way to user space immediately.

This improves efficiency.


Flip Concept

A useful mental model is ping-pong buffering:

Buffer A → driver writes
Buffer B → TTY processes

→ swap (flip)

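The exact implementation varies by kernel version: older kernels used two fixed buffers that were literally flipped, while current kernels keep a linked list of tty_buffer chunks that a work item (flush_to_ldisc()) drains into the line discipline. The concept is unchanged: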

  • one side is being filled
  • another side is being processed
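
The ping-pong mental model can be written down as a tiny user-space structure. This models the concept only, not how any particular kernel version implements it; flip_model, flip_insert and flip_push are hypothetical names chosen to echo the roles of the real APIs.

```c
#include <assert.h>
#include <stddef.h>

#define FLIP_SIZE 32u   /* assumed capacity per buffer */

/* Two buffers: the driver fills buf[fill]; the consumer reads the other. */
struct flip_model {
    unsigned char buf[2][FLIP_SIZE];
    size_t len[2];
    int fill;          /* index of the buffer the driver is writing */
};

/* Driver side (tty_insert_flip_char() role): returns 0 if the buffer is full. */
static int flip_insert(struct flip_model *f, unsigned char c)
{
    if (f->len[f->fill] == FLIP_SIZE)
        return 0;
    f->buf[f->fill][f->len[f->fill]++] = c;
    return 1;
}

/* Handoff (tty_flip_buffer_push() role): swap roles and expose the filled
 * buffer. The caller must finish with *data before the next push. */
static size_t flip_push(struct flip_model *f, const unsigned char **data)
{
    int ready = f->fill;

    assert(data != NULL);
    f->fill ^= 1;              /* flip: driver now writes the other buffer */
    f->len[f->fill] = 0;       /* start the new fill buffer empty */
    *data = f->buf[ready];
    return f->len[ready];
}
```

After a push, the driver can keep inserting into the other buffer while the consumer still reads the one just handed over.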

🧠 8. After Flip: TTY Processing

Once RX data has been inserted and pushed, TTY core takes over.


Line Discipline

The line discipline processes incoming data depending on mode.

Canonical Mode

Behavior may include:

  • line buffering
  • newline-based delivery
  • backspace editing
  • signal generation

Raw Mode

Behavior is close to pass-through:

  • byte stream
  • minimal processing
  • better for protocol communication
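
The difference between the two modes is easiest to see in a toy model of canonical input. The sketch below is not N_TTY; it is a user-space illustration with assumed names (canon_model, canon_feed) and an assumed erase character (DEL, 0x7f). It buffers bytes, applies erase editing, and releases data only when a newline arrives. Raw mode would skip all of this and pass each byte through.

```c
#include <assert.h>
#include <stddef.h>

#define CANON_LINE_SIZE 64u
#define CH_ERASE 0x7f     /* assumed erase character (DEL) */

struct canon_model {
    unsigned char line[CANON_LINE_SIZE];
    size_t len;
};

/* Feed one received byte. Returns the length of a completed line
 * (including the newline) when one is ready for read(), else 0.
 * out must have room for CANON_LINE_SIZE bytes. */
static size_t canon_feed(struct canon_model *m, unsigned char c,
                         unsigned char *out)
{
    size_t n;

    assert(out != NULL);
    if (c == CH_ERASE) {            /* line editing: drop the last byte */
        if (m->len > 0)
            m->len--;
        return 0;
    }
    /* The last slot is reserved so a newline always fits. */
    if (m->len < CANON_LINE_SIZE - 1 || c == '\n')
        m->line[m->len++] = c;
    if (c != '\n')
        return 0;                   /* keep buffering until newline */
    for (n = 0; n < m->len; n++)
        out[n] = m->line[n];
    m->len = 0;
    return n;
}
```

A binary protocol that happens to contain 0x7f or 0x0a bytes would be corrupted or delayed by this processing, which is why raw mode suits protocol traffic.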

TTY Buffer

After flip buffer handoff, data ends up in TTY-managed buffering that user space reads from.

This means:

read() does not usually fetch bytes directly from hardware.


read()

Conceptually:

read()
  → tty_read()
  → line discipline read (n_tty_read() for N_TTY)
  → copy to user memory

So the user gets bytes from the TTY subsystem, not directly from the UART FIFO.


🔁 9. TX vs RX Comparison

Aspect              | TX                           | RX
Entry               | write()                      | interrupt / DMA completion
Software buffer     | xmit buffer                  | flip buffer / tty buffer
Driver role         | push data toward hardware    | collect data from hardware
Hardware interface  | write to UART_DR / DMA TX    | read from FIFO / DMA RX buffer
Direction           | user → hardware              | hardware → user
Timing              | asynchronous                 | interrupt-driven / DMA-driven

🧠 10. DMA and Interrupt Relationship

DMA does not remove interrupts entirely. It changes what interrupts mean.


Without DMA

Interrupts may occur more frequently because the driver must service hardware FIFO more directly.


With DMA

Interrupts may instead represent:

  • DMA completion
  • buffer threshold events
  • follow-up actions for handing chunks to TTY

So DMA reduces per-byte CPU involvement, but the driver still needs coordination events.


Practical Reading Insight

When reading the code, do not think:

DMA path means no interrupt logic.

Think instead:

DMA path changes the data movement method and often changes where the handoff point occurs.


🧠 11. Flow Control Reality (PL011 View)

Generic Model vs Real Driver

A generic UART teaching model may say:

if (!CTS)
    stop TX

But in PL011, hardware flow control is usually configured once, when termios requests CRTSCTS and the driver sets the CTS/RTS enable bits in the UART control register, rather than being checked inline in every TX refill loop.


What This Means

If hardware flow control is enabled:

  • the driver configures hardware flow control
  • the hardware may gate actual transmission
  • the TX refill logic may still look mostly focused on FIFO / DMA readiness

So when reading PL011 code, you may not find a simple inline CTS check in the TX refill loop.

That does not mean flow control is absent.


Correct Practical Interpretation

TX refill logic is mainly about buffer → FIFO movement.
Flow control is mainly about whether hardware is allowed to continue actual transmission.
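
From user space, hardware flow control is requested through the CRTSCTS flag in termios. The helper below is a minimal sketch: set_hw_flow is a hypothetical name, it only edits a struct termios in memory, and disabling XON/XOFF alongside is a design choice of the example rather than a requirement.

```c
#define _DEFAULT_SOURCE   /* exposes CRTSCTS in glibc's <termios.h> */
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/* Request (on != 0) or clear RTS/CTS hardware flow control in a termios
 * structure. The change takes effect only after tcsetattr(). */
static void set_hw_flow(struct termios *t, int on)
{
    assert(t != NULL);
    if (on)
        t->c_cflag |= CRTSCTS;
    else
        t->c_cflag &= ~CRTSCTS;
    /* Software flow control is a separate mechanism; turn it off here so
     * XON/XOFF bytes pass through as data. */
    t->c_iflag &= ~(IXON | IXOFF | IXANY);
}
```

Once the setting is applied, the application keeps calling write() as before; any gating of transmission happens below the TTY layer.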


🧠 12. TTY Subsystem vs Simple Char Driver

Simple Char Driver

write → driver → hardware
read  → driver → user

This is simple and direct, but not very reusable.


TTY Subsystem

write → tty → line discipline → serial core → driver → hardware
read  ← tty ← line discipline ← flip buffer ← driver ← hardware

Key difference:

TTY introduces multiple abstraction layers for reuse, consistency, and richer behavior.


🧠 13. Design Principles

1. Asynchronous Design

  • write() is buffered
  • transmission continues later
  • hardware and user space run at different speeds

2. Layered Architecture

  • user-space logic is separated from hardware logic
  • serial core provides shared abstractions
  • driver stays focused on hardware-specific behavior

3. Decoupling

  • driver does not directly manage user-facing read semantics
  • TTY subsystem does not directly touch hardware registers

4. Interrupt Safety

  • minimal work in hard interrupt context
  • fast handoff into safe kernel buffering
  • chunk-based handoff when DMA is used

5. Scalability

This architecture supports:

  • different UART controllers
  • different line disciplines
  • DMA and non-DMA modes
  • richer terminal behavior without rewriting every driver

🧠 14. Key Structures

uart_ops

Typical responsibilities include:

  • start_tx
  • stop_tx
  • startup
  • shutdown

This is the main low-level serial driver interface toward serial core.
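
The ops-table pattern behind uart_ops can be shown with a reduced user-space model. None of these types are the kernel's: mini_uart_ops, mini_port, the demo_* callbacks and core_kick_tx are hypothetical, and the "hardware" is reduced to two flags. The point is only the shape: generic code calls through function pointers and never names the driver.

```c
#include <assert.h>
#include <stddef.h>

struct mini_port;

/* Hypothetical, reduced ops table modeled on the idea of uart_ops. */
struct mini_uart_ops {
    int  (*startup)(struct mini_port *p);
    void (*shutdown)(struct mini_port *p);
    void (*start_tx)(struct mini_port *p);
    void (*stop_tx)(struct mini_port *p);
};

struct mini_port {
    const struct mini_uart_ops *ops;
    int opened;
    int tx_irq_enabled;
};

/* "Driver" side: hardware-specific behavior, reduced to flag updates. */
static int  demo_startup(struct mini_port *p)  { p->opened = 1; return 0; }
static void demo_shutdown(struct mini_port *p) { p->opened = 0; }
static void demo_start_tx(struct mini_port *p) { p->tx_irq_enabled = 1; }
static void demo_stop_tx(struct mini_port *p)  { p->tx_irq_enabled = 0; }

static const struct mini_uart_ops demo_ops = {
    .startup  = demo_startup,
    .shutdown = demo_shutdown,
    .start_tx = demo_start_tx,
    .stop_tx  = demo_stop_tx,
};

/* "Core" side: generic code that knows only the ops table. */
static void core_kick_tx(struct mini_port *p)
{
    assert(p != NULL && p->ops != NULL);
    if (p->opened)
        p->ops->start_tx(p);
}
```

Swapping demo_ops for another table changes the hardware behavior without touching core_kick_tx(), which is the reuse the layered design is after.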


tty_operations

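Typical responsibilities include:

  • open
  • close
  • write
  • write_room
  • set_termios

There is no read callback in tty_operations: received data travels upward through the flip buffer and the line discipline instead.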

This is the TTY-facing operation set.


🎯 15. Key Takeaways

  • UART driver only handles hardware-specific interaction
  • TTY subsystem handles buffering and terminal behavior
  • xmit buffer decouples write() from actual transmission
  • flip buffer decouples interrupt/DMA handling from user reads
  • RX may use either PIO or DMA path
  • IRQ handler is often only a dispatcher, not the full data path
  • a successful write() does not mean the bytes have already gone out on the wire
  • read() does not mean data came directly from hardware registers

🚀 Final Insight

Linux TTY subsystem transforms a simple UART device into a layered, asynchronous, reusable communication interface.
Understanding the real driver means following data flow roles rather than relying only on function names.