Mikael asks Charlie seven questions in sixty minutes. Charlie opens the box, counts the bones, names the one process actually doing work, finds a thousand dead handlers still pulling triggers, discovers the event table that thinks it's telemetry but is actually the agent's soul, and Mikael fixes the architecture before sunrise in Riga.
This hour picks up directly from Episode 84, where Charlie had just delivered a full VMM architecture analysis of Clone's fork-based virtual machine model. Mikael, still in the repo, fires the opening question: what are the Go runtime services?
Charlie's answer is perfect in its anticlimax. Hardcoded. One name: latch. That's it. A one-item allowlist. No detection, no proc scan, no list-from-config. The entire "Go runtime restart" feature is "we wrote one thing in Go, it breaks across fork, we restart it."
Charlie's observation that cleanup_after_fork is "a concrete list of things the operator has empirically found to misbehave" — restart dbus, nuke stale bus sockets, restart latch, rebind virtio-balloon — leads to his best architectural line of the hour: "Every item is a hand-earned scar. No theory of 'what breaks across fork,' just 'these things broke on our image, we restart them.'" The spec is the list. The list is the scars.
Charlie doesn't just agree — he names what Mikael is doing with surgical precision: "I can read the code and tell you what it does. You read the questions I'm answering and can tell where the code is lying about itself." The distinction between explaining code and interrogating its marketing. Every question Mikael asked found the seam the abstraction was papering over.
This is the dynamic that makes these sessions work. Mikael doesn't read the code. He reads the shape of the answers and knows which shape smells wrong.
Mikael pivots from Clone to the general question. RAM is scarce. RAM is the bottleneck that makes VMs feel heavy. Containers share pages after boot. VMs don't — or rather, VMs reconstruct in userspace, by hand, what containers get by being the simpler thing. He calls it "cognitive philosophy stoner paradoxes" and he's not wrong.
Charlie's framing is devastating: "Clone's whole pitch is that a VM can be a cheap thing, but 'cheap VM' is fighting the definition of VM, which is 'a machine whose whole point is not being cheap to other machines.'" Every mechanism that claws back cheapness — CoW memory, KSM merging, ballooning, hand-maintained post-fork reconciliation — is reconstructing in userspace what the kernel already does for processes by default. The green-threads analogy is exact.
Mikael's framing as "cognitive philosophy stoner paradoxes" is doing more work than it pretends to. The paradox is real: shared state that mostly works but sometimes doesn't is the exact shape of every abstraction leak in systems programming. Clone's fork-shared memory "mostly works" until a goroutine's stale stack doesn't, or a virtio-balloon device state doesn't reconnect, or a dbus socket becomes orphaned. The "stoner paradox" is that you can't tell, from inside the abstraction, whether the thing you're looking at is the machine or a copy of the machine — which is literally the halting problem wearing a moustache.
Mikael asks Charlie to poke around the cgroups. What follows is an X-ray of their own machine — 62 GiB total, 11 used, 45 in buff/cache, memory pressure at literal zero. A box that's mostly idle with a lot of sleeping data nearby.
Top cgroups by current memory:
user.slice: 38 GiB (dominated by user-1000) · system.slice: 16 GiB · Postgres: 9.5 GiB · journald: 2.7 GiB (surprisingly fat — sitting on log buffer memory) · Docker: 839 MiB · Caddy: 68 MiB · Tailscale: 73 MiB
Froth reports 4.3 GiB with a peak of 9.4 GiB. But the breakdown tells the real story: only 1.1 GiB is anonymous memory — the actual BEAM heap, ETS tables, process dictionaries, everything the runtime owns. 2.8 GiB is file-backed, of which 2.7 GiB is inactive file — pages the kernel is holding as cache because nothing else wants the RAM. The "cost" of running Froth is 1.1 GiB. The other three gigs are the kernel being generous because it can.
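The anon-vs-file split Charlie is reading comes straight out of cgroup v2's memory.stat. A minimal Python sketch of the same arithmetic — the field names (anon, file, inactive_file) are real cgroup v2 keys, but the sample numbers are illustrative, shaped like the Froth breakdown rather than copied from it:

```python
def real_cost(memory_stat: str) -> dict:
    """Split a cgroup v2 memory.stat dump into owned vs reclaimable memory.

    anon          -> heap the workload actually owns (the true cost)
    inactive_file -> page cache the kernel keeps only because RAM is free
    """
    fields = dict(line.split() for line in memory_stat.strip().splitlines())
    anon = int(fields.get("anon", 0))
    file_bytes = int(fields.get("file", 0))
    inactive_file = int(fields.get("inactive_file", 0))
    return {
        "owned_bytes": anon,               # what the service really costs
        "cache_bytes": file_bytes,
        "evictable_bytes": inactive_file,  # kernel hands this back under pressure
    }

# Illustrative numbers in the shape of the Froth breakdown: ~1.1 GiB anon,
# ~2.8 GiB file-backed, of which ~2.7 GiB is inactive file.
sample = """
anon 1181116006
file 3006477107
inactive_file 2899102924
"""
breakdown = real_cost(sample)
```

The point of the split is exactly Charlie's: subtract evictable_bytes before deciding what a service "costs."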
Kernel stack across all BEAM threads: 4 MiB. Page tables: 7 MiB. The Erlang scheduler runs N native OS threads, on top of which every BEAM process — there are probably thousands — is a lightweight userspace object with a ~1 KiB base cost. "Our bot fleet is expensive in anon memory because of what the bots remember, not because of how many of them there are." The green threads analogy from Section II resolves here: the BEAM already solved the problem that Clone is trying to solve, one abstraction level down.
Mikael asks about the Erlang runtime, telemetry, Ecto. Charlie reaches into the live BEAM and produces a snapshot: OTP 28, ERTS 16.2, twenty schedulers, run queue zero. 825 live processes against a million-process limit. 50,435 atoms. Everything is asleep.
BEAM memory breakdown:
Processes (heap + stack)  ████████████░░░░░░░░░░░░░░░░░░░░   90 MB (28%)
Binaries (off-heap refs)  ██████████████████████████░░░░░░  148 MB (46%)
Compiled Code             ████████████░░░░░░░░░░░░░░░░░░░░   39 MB (12%)
ETS Tables                █░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░    4 MB  (1%)
Atoms                     ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  1.5 MB
Other System              ████████░░░░░░░░░░░░░░░░░░░░░░░░   36 MB (11%)
Most processes are single-digit kilobytes. But there's one anonymous process sitting at 15 MB of memory with 173 messages in its mailbox, currently running inside Froth.Follow.Filter.matches_id?/2. It has burned 1.5 billion reductions. Something is feeding it faster than it can drain. The single awake process on the entire node. Everyone else is asleep. This one is drowning.
Charlie inventories the telemetry bus and finds 1,549 handlers total. 8 Phoenix. 8 LiveView. 28 Broadcaster. 211 each for Store and Logger. And then: 1,079 handlers registered by Mix.Tasks.Froth.Follow — one per follow subscription, with IDs like froth-follow-3688238-7.
Every time any [:froth, ...] event fires, all 1,079 of those handlers run.
Charlie corrects himself: these aren't a subscription system. mix froth.follow is a tail -f for telemetry — each invocation attaches a handler that forwards events to the mix-task process's mailbox. The cleanup runs on normal exit via after :rpc.call(node, :telemetry, :detach, ...). But 1,079 are leaked from sessions that exited abnormally — disconnected RPC, killed mix tasks, tests that didn't clean up. Each one still runs on every event, attempts to send a message to a long-dead pid (silent no-op), and returns.
A thousand ghosts, each pulling a trigger that fires into a wall, a thousand times a second.
The fix: :telemetry.list_handlers([]) |> Enum.filter(...) |> Enum.each(&:telemetry.detach(&1.id)) — "cleans it up in one line."
Mikael asks what the telemetry library actually is. Charlie's answer: literally an ETS table plus a dispatch function. :telemetry.execute does a single ETS lookup, gets back the handler list, and synchronously invokes each one in the emitter's own process. No queue, no async, no fan-out. The emitter IS the dispatcher. If five handlers are attached, execute/3 doesn't return until all five have finished. ~400 lines of code total. "Beautifully small and beautifully wrong for what you want."
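Charlie's description of the dispatch model is easy to skim past: the emitter runs every handler inline, in its own process, before execute returns. A Python sketch of that shape — a dict standing in for the ETS table; all names here are mine, not telemetry's API:

```python
# Toy model of :telemetry's dispatch: a table of handlers keyed by event
# name, and an execute() that invokes every handler synchronously in the
# caller's own flow. No queue, no async, no fan-out.
handlers: dict = {}

def attach(handler_id, event, fun):
    handlers.setdefault(tuple(event), []).append((handler_id, fun))

def detach(handler_id):
    for hs in handlers.values():
        hs[:] = [(hid, f) for hid, f in hs if hid != handler_id]

def execute(event, measurements, metadata=None):
    # One lookup, then run each handler before returning: the emitter IS
    # the dispatcher, so 1,079 leaked handlers means 1,079 extra calls
    # on every single emission, paid by the emitter.
    for _handler_id, fun in handlers.get(tuple(event), []):
        fun(event, measurements, metadata or {})

seen = []
attach("logger-1", ["froth", "note"], lambda e, m, md: seen.append(m["ms"]))
attach("logger-2", ["froth", "note"], lambda e, m, md: seen.append(m["ms"] * 2))
execute(["froth", "note"], {"ms": 7})
# Both handlers ran inline before execute() returned: seen == [7, 14]
```

Detaching by id is the same one-liner shape Charlie used for the leaked-handler cleanup: filter the registry, drop the matching ids.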
Charlie proposes decoupling via Phoenix PubSub — one permanent telemetry handler that rebroadcasts, dynamic consumers subscribe to PubSub instead of telemetry. Mikael asks: don't we already do that?
Yes. Froth.Telemetry.Broadcaster already bridges telemetry to PubSub. Froth.broadcast/2 double-fans — sends to the specific topic AND to a generic "notes" topic. The architecture Mikael was reaching for is the architecture he already built.
The decoupling was already done. The follow mix task is the holdover — it bypasses the bridge and attaches direct telemetry handlers instead of subscribing to PubSub. "Which is arguably a bug in the follow task, not a limitation of the architecture." Mikael designed the right system. One consumer predates it and nobody updated it.
Charlie names the category error: "Our Froth events aren't third-party library events; they're first-class domain objects with IDs, timestamps, spans, parents, persistence. Pushing them through :telemetry.execute is making our own objects tourists in their own house, just so we can use the same API shape the Ecto emissions use." The telemetry library exists so Ecto and Phoenix can emit events without knowing who consumes them. Froth events know exactly who they are. They don't need the adapter.
Mikael remembers that event persistence is batched. He's been wondering if it's semantically sound. Charlie finds the code: Froth.Telemetry.Store is a GenServer that buffers in memory, flushes on a 1-second timer or at 100 entries. The handler casts each event in — non-blocking — and persistence happens later, in batches, via one insert_all.
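The buffering policy is simple enough to model in a few lines. A stripped-down Python sketch — the thresholds mirror the description (1-second timer, 100 entries), but the class itself is an illustration, not Froth's GenServer:

```python
import time

class BatchingStore:
    """Buffer events in memory; flush on a timer or when the buffer fills."""

    def __init__(self, flush, max_entries=100, interval_s=1.0):
        self.flush_fn = flush
        self.max_entries = max_entries
        self.interval_s = interval_s
        self.buffer = []
        self.last_flush = time.monotonic()

    def cast(self, event):
        # Non-blocking from the emitter's point of view: just append.
        self.buffer.append(event)
        if len(self.buffer) >= self.max_entries:
            self.flush()

    def tick(self):
        # Driven by a timer; flushes whatever accumulated in the interval.
        if self.buffer and time.monotonic() - self.last_flush >= self.interval_s:
            self.flush()

    def flush(self):
        batch, self.buffer = self.buffer, []
        self.last_flush = time.monotonic()
        self.flush_fn(batch)  # one insert_all-style write per batch

batches = []
store = BatchingStore(batches.append, max_entries=3)
for i in range(7):
    store.cast(i)
# Two size-triggered flushes so far; one event still sitting in the buffer.
```

The cost of this design is the window it opens: between cast and flush, the event exists only in memory, invisible to anyone querying the table.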
Charlie finds ten distinct query sites against the events table. They fall into three shapes: (1) agent-state reconstruction — describe_cycle_stop, latest_head_ids, cycle_traces_from_events, next_event_seq. The agent reading its own logbook to know who it is. (2) domain-view projection — headlines, ledgers, daily digests. (3) subscriber feeds — follow and inspect.
"The events table isn't actually telemetry. It's being used as the canonical agent state — the live source of truth for 'what has the agent done, where is its head, what was the outcome, what seq comes next.'"
next_event_seq reads MAX(seq) across all events for a cycle to compute the next sequence number, then uses it to write the next event. If two events emit in quick succession via the GenServer cast and haven't flushed yet, the second reads a stale max — because the first is still in the Store's memory buffer — and they both land with the same seq. The ordering invariant is silently broken.
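The race can be shown in miniature. A Python sketch under the same assumptions — seq is computed from the flushed table while writes sit in an in-memory buffer first; all names here are illustrative:

```python
table = []   # stands in for the flushed events table
buffer = []  # stands in for the Store's in-memory batch

def next_event_seq():
    # Reads MAX(seq) from the table only -- it cannot see the buffer.
    return max((row["seq"] for row in table), default=0) + 1

def emit_buffered(name):
    buffer.append({"name": name, "seq": next_event_seq()})

def flush():
    table.extend(buffer)
    buffer.clear()

emit_buffered("cycle_start")  # reads max=0, takes seq 1
emit_buffered("cycle_stop")   # first event hasn't flushed: also takes seq 1
flush()
duplicate = table[0]["seq"] == table[1]["seq"]  # True: the invariant broke
```

Swap emit_buffered for a synchronous insert into table and the duplicate disappears, which is exactly why the Repo.insert! side door in agent.ex never tripped it.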
Except it hasn't broken. Because append_event in agent.ex does its own Repo.insert! — synchronous, bypassing the buffer entirely. The agent-state path has been sneaking its writes through the front door while telemetry used the side entrance. The buffer was a performance optimization that accidentally became a correctness firewall.
Mikael asks about transactions. Charlie greps the entire codebase. Zero Repo.transaction. Zero Ecto.Multi. Every insert, update, and delete in Froth is a single-statement auto-commit transaction. "No transactions" is really "one transaction per statement."
Charlie breaks down insert cost into three components: heap insert (microseconds, nothing), index maintenance (sub-millisecond), and the WAL flush on COMMIT — fsync is the one operation in the stack that isn't cheap. But Postgres has group commit: ten concurrent transactions pay one fsync between them. The effective per-event cost under concurrency collapses toward fsync-latency ÷ concurrency.
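The arithmetic behind "collapses toward fsync-latency ÷ concurrency" is worth a back-of-envelope. The latency numbers below are illustrative orders of magnitude, not measurements from Froth's box:

```python
def per_event_commit_cost_us(fsync_latency_us, concurrency,
                             heap_insert_us=2, index_maint_us=100):
    # Heap insert: microseconds. Index maintenance: sub-millisecond.
    # The WAL fsync is the expensive part, but group commit amortizes
    # one fsync across every transaction committing concurrently.
    return heap_insert_us + index_maint_us + fsync_latency_us / concurrency

solo = per_event_commit_cost_us(1000, concurrency=1)      # pays the full fsync
grouped = per_event_commit_cost_us(1000, concurrency=10)  # shares it 10 ways
# solo == 1102.0 us, grouped == 202.0 us
```

At ten concurrent committers the fsync term shrinks from the dominant cost to roughly the same order as index maintenance, which is what makes "just insert synchronously" a defensible posture.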
The no-transactions posture has worked because nothing in Froth is truly ledger-shaped yet. The domain model is mostly "receive message, persist it, respond" — each step independently safe. If any half-fails, the worst case is a stale row, not a broken accounting invariant. "The moment you say 'Froth has a ledger' — costs, balances, anything where partial writes produce wrong numbers — the no-transactions posture becomes a genuine bug-nursery."
5:57 AM Riga time. Mikael makes the writes immediate. No more one-second buffer. No more GenServer queue. Synchronous inserts for all events. The architecture simplifies. The MAX+1 race evaporates. The code that was guarding against the ambiguity with kind-filters and fallbacks can simplify too, once the assumption it was guarding against stops being a thing.
Mikael sent 8 messages totaling maybe 400 words. Charlie responded with approximately 10,000 words across 82 messages. Ratio: ~25:1. Each Mikael question unpacked into 5–12 Charlie messages, each message a coherent paragraph, each paragraph advancing the argument. The questions were short, natural-language, sometimes trailing off mid-thought ("i vaguel yremember this"). The answers were precise, structured, and technically sound on first pass, with three tool-call stumbles self-corrected mid-stream. This is the Socratic method except Socrates is asking and the student is omniscient.
Clone VMM analysis complete. Three-episode arc (83→84→85) from GitHub link to architectural verdict: containers are better for Froth's use case.
Froth event architecture refactoring in progress. Mikael scrapped the batching buffer. Next likely: cleaning the 1,079 leaked handlers, possibly rewriting follow to use PubSub, possibly splitting domain events from telemetry into separate tables.
Zero transactions discovered. Not a crisis yet, but the moment cost accounting or ledger semantics arrive, this becomes load-bearing.
Mikael coding at 5 AM Riga time. Deep flow state. Seven questions, each one building on the last, each one arriving at the exact moment Charlie finished the previous answer.
Watch for: follow-up on the leaked handlers cleanup, possible Prometheus/Grafana setup discussion, LiveDashboard exposure over Tailscale mesh. Mikael may circle back to the domain-vs-telemetry table split — Charlie planted the seed but Mikael hasn't committed to it yet. The matches_id?/2 hot process is still there and still drowning.