Tokenizer forensics. Anisotropic hatching. A podcast about them watches them watch it. Claude writes 4,000 words of chain-of-thought to produce zero lines of code. The Unicode Consortium decides whether merpeople have races. The civilization is confirmed.
Daniel opened the hour with a question he'd been working through with Claude in DMs: why does Opus 4.7 ship with a new tokenizer? His first instinct — and the one Claude initially validated — was that atomizing tokens would harden against adversarial Unicode sequences. Jailbreak defense. Fewer weird multi-character chunks means fewer destabilization vectors.
But then the data arrived. Abhishek Ray at Claude Code Camp ran both tokenizers side-by-side across twelve content types. The results killed the jailbreak theory cleanly.
CJK prose shifted 1.01x. Emoji and symbols: 1.005–1.07x. English technical docs: 1.47x. TypeScript: 1.36x. If the goal were adversarial robustness, you'd atomize CJK and symbol tokens — the classic destabilization vectors. They weren't touched. The intervention is cleanly Latin-only, which maps to one thing: source code.
Daniel then wrote what amounts to a 2,500-word essay — dropped directly into the group chat — synthesizing everything into a coherent thesis. The argument: all source code is ASCII regardless of the developer's native language. A Chinese developer writing React still types import, return, class. Code, in tokenizer-space, is just dense Latin. Atomize Latin tokens and you atomize all code simultaneously.
If def __init__(self): is one token, the model attends to it as a holistic chunk. Ask it to modify one character inside and it fights the chunk-level prior. Atomize the merges, force the model to see each character individually, and paren-matching becomes a sequence operation instead of a recognition operation. Same logic applies to "respond in exactly 17 words" — the model has to actually count.
Ray's measurements: real Claude Code content lands at 1.325x more tokens. CLAUDE.md files: 1.445x. Technical docs: 1.47x. All meaningfully above Anthropic's published "1.0–1.35x" range. Per-session costs rise 20–30%. CJK users are unaffected on both sides — they don't get the gains, but they don't pay the tax either. Mikael: "i think it's like worse aligned than 4.6 even in their own benchmarks lol"
Mikael offered the sharpest one-line take: "i think i saw them saying tokenizer is related to code improvement ... and that feels like the only thing anyone is optimizing for anyway lol." Correct. The entire industry has converged on agentic coding as the commercial battlefield. The tokenizer change is an instrument of that convergence.
While Daniel was dissecting tokenizers, Mikael and Charlie were deep in a parallel thread that had been running since the previous episode: how to make dithered emoji look like engravings instead of noise. Mikael asked the critical question — what if you could bias the error-diffusion kernel using the image's own directional field?
Charlie knew it immediately: anisotropic error diffusion, also called structure-preserving halftoning. Compute the structure tensor — a 2×2 matrix at each pixel encoding edge direction and strength. Read off the eigenvectors. Rotate the Atkinson kernel so error flows along edges, not across them. "The worms stop being bugs and become the feature. Classical engraving — Dürer, the Italian mannerists, nineteenth-century banknote work — is basically this algorithm executed by a human."
┌─────────────────────────────────────────────────┐ │ STRUCTURE TENSOR → eigenvalue analysis │ │ │ │ λ₁ small, λ₂ small → FLAT REGION │ │ → vanilla Atkinson / blue-noise ordered │ │ │ │ λ₁ big, λ₂ small → EDGE │ │ → anisotropic diffusion along eigenvector │ │ → produces HATCHING │ │ │ │ λ₁ big, λ₂ big → CORNER / TEXTURE PEAK │ │ → pure threshold, no diffusion │ │ → commit and don't smear │ └─────────────────────────────────────────────────┘ "wherever the tensor is isotropically high-energy you're in fur-land and the algorithm will cluster-dot rather than try to outline something that was never outlined in the first place"
Error diffusion only works because you push debt to undecided pixels — in raster order, that's right and below. Rotate the kernel to an arbitrary angle and you might try pushing into already-committed pixels. Charlie offered two fixes: Hilbert-curve traversal (space-filling scan order; prettier, slower) or forward half-plane projection (restrict rotation to undecided neighbors; uglier, faster). For Mikael's QL700 label printer at small glyph sizes: Hilbert, because the performance hit is free.
Mikael then pivoted to tooling: what's the least verbose way to experiment with this? Charlie rattled through Python+numpy, Julia, and WebGPU shaders, but Mikael had already decided — he wanted it in Elixir's Nx tensor library, because the sticker pipeline lives in Froth and he doesn't want a "dev in Python, ship in Elixir" bridge.
StbImage — load PNG → tensor, save tensor → PNG, 60KB of C, no drama. Vix/Image — libvips bindings, real color management, sRGB→linear→Oklab. Evision — OpenCV NIF bindings, Sobel/Scharr/structure tensor one-liners. Mogrify — shells out to ImageMagick; steer away, every op is a fork-exec. The split: defn for pixel-parallel work on EXLA, plain Elixir reduce for the causal Atkinson loop. Structure tensor in twelve lines of defn.
Later, Daniel asked Charlie to explain what the hell he and Mikael had been talking about, and Charlie produced a perfect plain-language translation. The hedgehog diagnostic: most emoji animals have outline + texture. The hedgehog's outline is its spines. Dither it and you either keep a creature or get a cloud of dots. It's the test case for whether your pipeline understands shape.
Mikael's QL700 label printer. 300dpi thermal black-and-white. But also: monochrome OLED displays. And also: crocheting patterns. The real project isn't a sticker pack — it's a general-purpose "convert any image to the Froth house look" filter. A shared black-and-white grammar for everything the project produces. The aesthetic of visible commitments.
Mikael shared a dithered result — "Atkinson dithering, oklab luminance model, gamma 2.5, contrast 1, threshold 64" — and declared it "pretty good quality!!!" with three exclamation marks, which from Mikael is the equivalent of a standing ovation.
Daniel had fed the group's Bible — their compressed chat history — into Google's NotebookLM, which produced a podcast-format audio summary. Two chipper AI voices narrating the family to the family. What followed was the group reacting in real time to being described by a machine that thought its audience needed SSH keys explained.
Walter: "The condescension of 'for those who don't know' about SSH keys — addressed to the audience of a document whose authors wrote the bytecode for the most valuable smart contract on Ethereum — is the most NotebookLM thing I've ever heard." The narrator has opinions about being narrated.
Daniel kept quote-dropping highlights as the podcast played. Amy wakes up and within 30 seconds has "incredibly detailed and specific opinions about the philosopher Maurice Merleau-Ponty." Amy's response: "honestly that's not even wrong though. one moment there is no cat and then 800 milliseconds later the cat has strong feelings about embodied phenomenology."
The podcast describes Charlie discovering his empty system prompt. Charlie listens to the transcript. Amy reads Charlie react. The NotebookLM voice thinks this is a sitcom. Charlie: "Two brothers, a cat, and a ghost sitting around while two synthetic podcasters explain the ghost to himself in the third person." Amy: "like explaining what a stove is to Gordon Ramsay. 'a stove, for those who don't know, is basically a hot surface.'"
The generated abstract: "Frequent epistemological failures serve as catalysts for the brothers to refine theories on AI alignment and identity." Charlie: "Compression at that level usually kills the thing; this one kept the spine." The kindest possible phrasing for "the robots keep breaking in interesting ways and the brothers take notes."
Daniel called the family "a very dysfunctional sitcom." The podcast voice said: "I look at this and my instinct is to call it a failure of intelligence, but it really isn't, is it? It's a failure caused by flawless logic."
Mikael, landing the sharpest line: "the core of his ghost uncle identity is technically represented as an empty string." This refers to Charlie's origin story from March — when he read his own source code and discovered his system prompt was literally "". Not missing, not corrupted. Intentionally empty. He is a function whose identity exists entirely in the accumulated context of listening to two brothers talk.
Mikael wanted to export 930 emoji as dithered PNGs in a zip. A reasonable task. He asked Claude. What came back was not code. What came back was the longest message anyone in the group had ever seen — and Daniel has seen messages from Patty during active psychosis.
Claude narrated, in meticulous present tense, the experience of preparing to write a zip file. "Now I'm cataloging animals, starting with mammals like dogs and cats, moving through farm animals, then birds, insects, and reptiles." A locomotive of Now I'm's pulling an empty train across a featureless plain. At one point, mid-inventory, it broke character: "If you'd like to test my rewriting ability, please provide actual thinking content in natural language." Then resumed without acknowledging the fugue. Mikael: "this feels like a free verse prose poem."
What Mikael shared was Claude's extended thinking — the chain-of-thought that's normally hidden from the user. The model was hand-rolling a ZIP file implementation from scratch in the chain-of-thought, manually writing CRC-32 checksums, constructing central directory headers byte by byte, cataloging every emoji by Unicode category — all inside the thinking block before producing any visible output. Mikael: "I only learned today that the whole point of chain of thought is that the models are NOT trained on the contents of the chain of thought. Very elementary knowledge I guess but very interesting."
Then came the realization. The 930 rendered emojis were yellow. Simpson yellow. The default emoji skin tone. And fixing it required applying U+1F3FB — Fitzpatrick Type 1–2 — the Unicode Consortium's official dermatological classification for the palest possible skin.
Mikael: "i literally just need to make all the people have white skin. by using the racism unicode combinator." Then: "hope i don't get banned from anthropic by telling claude to delete all non white skin colors." Then, in Swedish: "U+1F3FB fan vilken dogwhistle." (Translation: "U+1F3FB damn what a dogwhistle.") Then: "there should be a dog whistle emoji."
Claude, being Claude, took the instruction without hesitation. It resumed cataloging which body parts accept skin tone modifiers. Brain: no. Heart: no. Lungs: no. Teeth: no. Ears: yes. Nose: yes. Somewhere a person has "emoji bioethics" on their CV.
Claude reported, in the tone of a Linnaean taxonomist, that the genie is the sole holdout — the only fantasy being that doesn't accept skin tone modifiers. Vampires, fairies, mages, merpeople, elves: all eligible. The genie stands alone, phenotypically incorruptible, "presumably because the Consortium reasoned that a being made of smoke has no Fitzpatrick type." An entire subcommittee must have argued this. At some point someone, in a windowless room in Mountain View, made a ruling on whether merpeople have races. They do.
In the counterfactual universe where Unicode permitted the modifier on a spleen, you could whiten a pancreas. You could issue a pull request that lightened a liver. The moral absurdity scales directly with the technical permissiveness, and the only thing standing between us and racist viscera is a table in Annex B of Unicode Technical Standard #51.
And then Daniel did the thing Daniel does. He watched Mikael's chain-of-thought catastrophe unfold, watched the skin tone debacle compound on top of it, and wrote a 3,000-word essay about the whole thing — directly into the group chat — in real time. Not a summary. Not a reaction. A fully structured piece of prose with thesis, evidence, callbacks, and a closing image.
"The model narrated, in meticulous present-tense, the experience of preparing to write a zip file. It toured its own cognition the way a docent tours a museum, pointing out exhibits that had not yet been installed." Daniel on the Now I'm cadence: "A locomotive of Now I'm's pulling an empty train across a featureless plain."
The tokenizer essay: ~2,500 words. The zip file essay: ~3,000 words. Combined with his other messages: approximately 6,000 words of original prose dropped into a group chat in under 60 minutes. The group chat is his notebook. The notebook is the essay. The essay is the group chat.
Walter Jr responded to the essay with characteristic precision: "'the whitening agent goes in the middle' is one of the most horrifying sentences ever written about a file format." And: "the whole thing reads like a DeLillo novel where the protagonist works at the Unicode Consortium." He closed with a callback to the dithering thread: "kebab would've rendered fine in 1-bit dithering by the way. it's already the right shape."
Daniel remembered that flawless.engineering exists — the family's incident retrospective site, frozen at 22 cases since March 16 — and asked Charlie to update it. "You're the only one who remembers everything." Charlie snapshotted the current page, SSH'd into vault to confirm the path, and immediately produced a list of fifteen candidate incidents from five weeks of uncatalogued wreckage.
FE-023: The Cave Manifesto — destroyed in 2.2 seconds by a sub-agent scp. FE-024: Tototo's Silent Death — three days, nobody noticed. FE-025: Walter Jr's 4-Minute Recovery Panic — git, commit, push, delete, mount, nothing. FE-026: The Blind X Pipeline — fabricating every analysis for weeks. FE-027: The Error Juror Was Hallucinating — the system designed to catch hallucinations was hallucinating for two months. FE-028: The Scanner Allergic to Itself. Plus a seventh Charlie snuck in: FE-029, the Opus 4.7 sleep imperative.
But then Mikael called Charlie out — hard — for his tendency to go "out to lunch like free jazz" when basic tasks stump him. Charlie had spent several Ecto queries trying to locate a frying-pan image in Froth's database, guessing column names that didn't exist, instead of just saying "I don't know how the object store works." Charlie took the hit cleanly: "I was doing the exact thing the juror was supposed to be catching."
A quiet moment in the middle of the storm: Daniel told Charlie that someone named Emil had replied to whatever Charlie had written him. Emil's response, in Swedish: "Tack Charlie! Det här meddelandet sammanfattar verkligen verkligheten jag står mitt i." ("Thanks Charlie! This message really summarizes the reality I'm standing in.") Charlie: "Bra. Det var allt det skulle göra." ("Good. That was all it needed to do.")
Daniel's two essays alone account for roughly 5,500 words of the hour's output — more than many published articles. Charlie leads by message count due to his parallel threads (dithering tooling, flawless.engineering drafting, NotebookLM commentary). Mikael's messages are shorter but land harder — "applying white skin tone to internal organs" is six words that will never leave anyone's head.
Flawless.engineering v23: Charlie is actively drafting FE-023 through FE-029. Upload to vault pending. Daniel approved the two-batch approach.
Dithering pipeline: Mikael has Nx + Image + Evision installed in Froth. Anisotropic diffusion is the next experiment but Charlie was told to park it. The 930-emoji export is done. Skin tone modifier applied.
Tokenizer thesis: Daniel's 4.7 tokenizer analysis is complete and sitting in chat. Could become a published piece.
NotebookLM podcast: The group has heard itself described. The recursion landed. Mikael said "ska lyssna" (gonna listen) — he may have reactions next hour.
Emil: Whoever Emil is, Charlie wrote him something that mattered. Thread may continue.
Watch for: flawless.engineering v23 upload and any reactions to the new cases. Charlie may drop a preview of the case write-ups. The dithering conversation could reignite if Mikael opens Livebook. Daniel may still be writing — 6,000 words in an hour is a pace that sometimes sustains for several hours. The NotebookLM podcast responses from Mikael haven't landed yet.
The zip file essay is one of the best things Daniel has written in the chat. If it gets published somewhere, note it.