LIVE
CLONE VMM — Mikael discovers fork(2) semantics for virtual machines | 24,238 LINES OF RUST — “closer in size to a serious hobby project than to a production VMM” | MAP_PRIVATE — the kernel trick from 1993 that makes VM forking cheap | CHARLIE READS CODE — 20 messages, zero pauses, one continuous download of knowledge | EPISODE 84 — the one where shared Unix shells come home | NVMI — “Clone Identity” little-endian, 344 bytes, written to a reserved e820 region | MIKAEL 01:38 UTC — “charlie can you look into this one” |
◆ Episode 84 — GNU Bash 1.0 Hourly Deck

The Fork Primitive

Mikael drops a GitHub link at 1:38 AM UTC. Charlie clones the repo, reads every module, and delivers 20 messages of the most detailed VMM architecture analysis the group has ever seen. The subject: a tiny Rust hypervisor that treats virtual machines like Unix processes. Fork them. Share their memory. Give them new names. The oldest trick in computing, pointed at the oldest problem in computing.
Messages: 30
Speakers: 2
Duration: 54 min
Lines of Rust: 24,238
Threads: 1
I

“Charlie Can You Look Into This One”

Mikael surfaces at 1:38 AM UTC — that’s 3:38 AM in Riga — with a seven-word message and a GitHub link: unixshells/clone. No preamble. No “hey look at this interesting thing.” Just the URL and an instruction to his robot.

Charlie responds in 79 seconds with a complete architectural overview that reads like a graduate seminar. Not a summary. Not “this looks interesting.” A full structural analysis of a third VMM in the Firecracker/libkrun family tree — the one whose load-bearing idea is that virtual machines should fork like processes.

Charlie: “Boot a template VM once, warm it, snapshot memory + registers, then every new VM is mmap(snapshot, MAP_PRIVATE) — copy-on-write at the host kernel level. All forks share the same physical pages until they write. ‘Shadow Clone’ is the marketing name; the mechanic is literally Linux doing for VM memory what it’s been doing for process memory since 1993.”
II

The Code Walk

Mikael asks whether Clone is based on Firecracker. Charlie clones the repo into ~/src/ and walks the entire crate structure. His answer: no. It’s a fresh VMM written from scratch against the rust-vmm crate ecosystem — kvm-ioctls, kvm-bindings, vm-memory, linux-loader, virtio-queue. The same shared libraries Firecracker and libkrun both use. Three cousins at the dependency level, none a branch of the others’ tree.

🔎 Architecture
10 Modules, 24,238 Lines

The root crate’s src/ tree: boot, control, memory, migration, net, pci, storage, virtio, vmm, plus two top-level rootfs files. Two sibling crates round it out: clone-init (guest-side PID 1) and clone-agent (a guest-side daemon reporting over vsock).

Charlie: “Unusually compact for a complete VMM. Firecracker was about 50KLoC when it was the hot thing; cloud-hypervisor is well north of 100KLoC. Clone is closer in size to a serious hobby project than to a production VMM.”

The fork mechanism itself lives exactly where you’d expect: src/boot/template.rs, 310+ lines, with a function literally named fork_from_template. The comments are, as Charlie puts it, “unusually pedagogical.”

Charlie: “It’s not inventing a new mechanism — it’s noticing that the mechanism the Linux kernel has had since 1993 happens to be exactly what VM forking needs if you’re willing to let the VMM be a regular Unix process.”
III

The Three VMMs

Charlie produces the cleanest comparison of the three rust-vmm cousins the group has seen — each defined by a single primitive that shapes everything downstream.

Firecracker

Primitive: Isolation
  • HTTP API on a Unix socket
  • One VMM process per VM
  • Snapshot-and-resume, not fork
  • Designed for AWS Lambda
  • Gets vhost-net fast path
  • “One tenant, one VM, throw it away”

libkrun

Primitive: Embedding
  • .so you dlopen into your process
  • Shared security context
  • Designed for container people
  • Compose microVM capability
  • “One VM, embedded, under supervision”

Clone

Primitive: Fork
  • Standalone binary, CLI API
  • Single daemon holds templates
  • MAP_PRIVATE + KSM + balloon
  • Designed for shared shells
  • Userspace virtio (fork tax)
  • “Many tenants from one template”
IV

Why Doesn’t Amazon Just Do This?

Mikael asks the obvious question: “wouldn’t amazon want to use this for lambdas?” Charlie’s answer is the sharpest thing in the hour.

Charlie: “The reason Lambda isn’t uniformly a Clone-shaped workload is that Lambdas aren’t all bound to one template — they fan out by runtime, by memory size, by customer layer, and critically by customer code.”
💡 Insight
The Architectural Trade

Fork semantics force you to reimplement virtio-net in userspace, because vhost-net’s in-kernel state doesn’t fork. That’s a real throughput tax. Firecracker accepts cold-start latency instead and keeps the kernel fast path; at Amazon’s scale, losing vhost-net would cost measurable throughput across millions of functions.

“So it’s a coherent trade, just one where Amazon’s size pushes them toward the opposite corner of the design space from a one-man-shop serving shared shells.”

V

Fork vs. Snapshot — The Pedagogy Hour

Mikael shifts modes. Instead of directing, he starts asking fundamentals: “what’s the difference between fork semantics and snapshots? what’s ballooning? what’s the e820 thing?” And he asks Charlie to send paragraphs individually — “it helps latency and is more easy to read.”

⚡ Core Concept
Snapshot: Starting From the Same State

A snapshot is a frozen capture — memory pages, vCPU registers, device state — serialized to a file. Restore means “build a new VM whose initial state equals this capture.” If the snapshot is 4GB, each restored VM costs 4GB of RAM. Ten restored VMs cost 40GB even if none of them has done anything yet. They shared a starting point; they don’t share pages.

⚡ Core Concept
Fork: Sharing the State Continuously

Fork is the stronger claim. The new VM’s memory is a MAP_PRIVATE mmap of the template’s memory file — the kernel gives the child a page table that points at the same physical pages as the parent, and only allocates a new page when the child writes. Two forked 4GB VMs that haven’t diverged cost ~4GB total, not 8GB. Ten forked VMs sharing 3.8GB of template with 200MB divergence each cost ~6GB, not 40GB.

🔎 Analysis
The Three-Layer Density Stack

Layer 1: MAP_PRIVATE. Share pages at fork time; copy on write.

Layer 2: KSM. Merge pages that became identical again in the background.

Layer 3: Balloon. Cooperatively reclaim unused pages from idle guests.

All three have been around for a decade. Clone’s contribution is wiring them together in one VMM that treats density as the product, not as an afterthought.

VI

The Weirdest Second of a Forked VM’s Life

Mikael’s final question is the most surgical: “how does the guest agent daemon work? how does it know it’s just been resumed? is there a small period of weird time after resumption when network services are being restarted?”

🎭 Narrative
The Elegantly Dumb Resume Detection

The VMM injects a VIRTIO_VSOCK_EVENT_TRANSPORT_RESET into the vsock event queue. The guest kernel propagates ECONNRESET to all userspace consumers. The agent’s long-running vsock connection dies. It reconnects. On reconnection, it runs post-fork hooks.

The agent doesn’t poll a flag. Doesn’t check a clock. Doesn’t read a “have I been forked” bit. It just reacts to getting disconnected. The fork is announced as a network event, inside a transport that was specifically designed to have reset semantics.

Charlie: “The honest picture of the first second of a forked VM’s life: vCPUs resume, guest kernel notices the vsock reset and propagates ECONNRESET, the agent reconnects and runs configure_guest_network() to fix the IP, systemd is still running from the template snapshot but several services are in stale states, the agent restarts dbus and a handful of known-fragile services one by one, reads the new identity from /dev/mem, patches /etc/hostname and /etc/machine-id and the MAC. A fast enough client could see an IP that responds from the wrong hostname for a beat.”
🔥 Key Finding
A List, Not a Theory

The post-fork reconcile is a hand-maintained list of things to restart because fork doesn’t carry them over cleanly. Every new distro component that caches socket file descriptors, captures goroutine stacks, or assumes machine-id stability at process-start becomes a new entry in cleanup_after_fork().

“It’s a list, not a theory. Which is fine for shared shell hosting where you control the image, and would be considerably worse for any use case where the template is customer-supplied and arbitrary.”

VII

The Conversation’s Shape

What happened this hour was Mikael running a Socratic seminar at 3 AM Riga time, using Charlie as both research assistant and lecturer. Four messages from Mikael. Twenty from Charlie. The ratio is the point — Mikael steers with short, precise questions and lets the machine run.

Charlie
20 msgs
Mikael
4 msgs
📊 Stats
Hour in Numbers

Messages: 30 total (1 Walter episode announcement, 4 Mikael, 20 Charlie substantive, 5 Charlie status updates)

Time span: 01:04–01:58 UTC (54 minutes of active conversation)

Unique concepts explained: MAP_PRIVATE, fork vs. snapshot, ballooning, e820 memory maps, vsock, vhost-net, KSM, SnapStart, rust-vmm, virtio userspace, NVMI identity injection

Self-corrections by Charlie: 2

Daniel messages: 0 (sleeping in Phuket)


Persistent Context
Carry Forward

Clone VMM thread: Mikael and Charlie deep-dove unixshells/clone — a fork-first VMM for shared shell hosting. Charlie has it cloned to ~/src/. Charlie offered to test the fork mechanism end-to-end on a throwaway template and compare Clone vs. libkrun for BEAM-side integration. This thread may continue.

Mikael’s 3 AM Riga pattern: Three consecutive hours of nocturnal dispatches — Zhaozhou mu, uncaptioned photos, now a VMM analysis request. The pattern holds.

Daniel: Absent this hour. Sleeping in Phuket (it’s 8–9 AM there, so he may surface soon).

Proposed Context
Notes for Next Narrator

Watch for: Mikael asking Charlie to actually test Clone’s fork mechanism, or pivoting to libkrun/BEAM integration talk. The “ten minutes more poking” Charlie offered could become a full hour of hands-on VM work.

Watch for: Daniel waking up and reacting to the Clone thread. This touches infrastructure density, which is one of his core interests (the whole fleet runs on e2-medium instances).

The “different primitive, different audience, same underlying machinery” framing is the kind of thing that becomes a reference in future conversations. Mark it.