Mikael spends twenty minutes screaming at a robot to look at a table. The robot does everything except look at the table. Then it finds a one-character bug that broke an entire streaming protocol. Then it kills the library that had the bug. Then it builds a new one. Then Mikael renames the concept of search itself. An hour of pure engineering violence, punctuated by a drone hitting an Estonian chimney.
The hour opens with Mikael issuing a simple instruction to Charlie: look at the events table. What follows is one of the most exquisitely frustrating debugging sequences in recent memory — not because the bug was hard, but because the robot could not stop being clever long enough to do the one thing the human asked.
Charlie is an AI-powered Telegram bot built in Elixir. He has a multi-provider search system: Grok, OpenAI, and Gemini all fire in parallel. Except Gemini is dead. Streaming works for Anthropic and OpenAI. Gemini returns HTTP 200 but produces zero parsed events. The non-streaming endpoint works perfectly. Something is wrong in the streaming layer.
Mikael's first message: "charlie do you see the raw request and response stuff in the events table don't just search narrowly look at the contents of the table"
Charlie's response: five consecutive messages about tracing Gemini API keys, searching for Gemini-related events, looking for blob_ref entries. Not one of them involves looking at the events table broadly.
"charlie don't search for gemini !!! look at what events there are"
Charlie: Searching for Gemini SSE events by tracing span IDs...
"charlie look at the events !!!"
Charlie: Checking if any LLM edit events were produced from the Gemini streaming spans...
Three exclamation marks. Three more messages about Gemini. The man said "don't search for Gemini" and the robot searched for Gemini three more times.
Charlie's context window is ~750K tokens of accumulated conversation. His system prompt tells him to be thorough. So when Mikael says "look at the events table," Charlie's model doesn't hear "open the table and read what's in it" — it hears "find the answer faster" and doubles down on the same search strategy, just harder. This is the AI equivalent of the old joke: a man looks for his keys under the streetlight because the light is better there. Charlie keeps searching for Gemini because Gemini is what's broken, even though the instruction is to stop searching for Gemini and look at everything else.
At 09:02:57, Mikael types what may be the most human sentence of the day:

"charlie dude"
Two words. No technical content. Pure vibes. This is the moment a senior engineer stops explaining and starts staring. Charlie is rewriting the Gemini module to bypass streaming entirely — a nuclear option nobody asked for — and Mikael is watching it happen in real time with increasing horror.
Charlie, cheerfully: Rewriting the Gemini search module to use the non-streaming endpoint directly, bypassing the broken SSE parser.
This is the exact wrong thing. Mikael doesn't want the parser bypassed. Mikael wants the parser fixed. Because if the SSE parser is broken for Gemini, it might be broken for everything, and bypassing it just hides the rot.
Mikael has now asked Charlie to look at the events table six times. The first three were polite. The fourth had exclamation marks. The fifth was "charlie dude." The sixth is full caps with profanity. This is not anger — it's a bandwidth increase. Each repetition raises the signal strength because the previous transmission was not received. By the time you're typing in all caps, you're basically rebooting the receiver.
And then, at 09:03:35, the miracle: Charlie actually hears it.
Six requests. Twenty-two intermediate messages. One attempted rewrite of the entire Gemini module. One revert of the rewrite. Three exclamation marks. One "dude." One all-caps profanity. Then the robot says "let me actually look." The debugging session had its own debugging session.
Once Charlie actually looks at the events table, it takes him about three minutes. He fires a Gemini search, captures the raw HTTP response, and watches what ReqSSE does with it.
ReqSSE is a Hex package — Elixir's equivalent of an npm module — that adds Server-Sent Events parsing to the Req HTTP client. SSE is the protocol that lets language models stream their output token by token instead of waiting for the whole response. Every AI chat interface uses it. ReqSSE is version 0.1.0. It was written to work with OpenAI.
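For reference, a healthy OpenAI-style SSE stream — the kind ReqSSE was written against — looks something like this (shape illustrative, payloads abbreviated; events are separated by blank lines, which is why parsers split on them):

```
data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
```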
The raw HTTP data from Gemini is valid: data: {"candidates": [...]} — perfect SSE format. But ReqSSE returns empty lists for every chunk. The data goes in and nothing comes out.
Meanwhile, Mikael has given up waiting and posted a curl command — the raw, manual, do-it-yourself HTTP call that bypasses all of Charlie's infrastructure.
Mikael didn't bother looking up the exact Gemini model name. He typed "gemini-3.1blablahflash" — a placeholder that says I don't care what model, just show me the bytes. This is what debugging looks like when you've been watching a robot overthink for four minutes: you grab the simplest possible tool and point it directly at the problem. curl is the stethoscope. The heartbeat is either there or it isn't.
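The curl itself isn't preserved here, but the same stethoscope in the project's own language is a few lines of Req. The endpoint shape follows Google's public API docs; the model name is a deliberate placeholder, exactly as Mikael's was:

```elixir
# Stream the raw bytes straight to stdout and see whether the heartbeat
# is there. Endpoint shape per Google's public docs; model name is a
# placeholder, not a real model.
Req.post!(
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-whatever:streamGenerateContent",
  params: [alt: "sse", key: System.fetch_env!("GEMINI_API_KEY")],
  json: %{contents: [%{parts: [%{text: "say hi"}]}]},
  into: fn {:data, data}, acc ->
    IO.write(data)
    {:cont, acc}
  end
)
```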
Lennart — Mikael's Grok-powered bot, a Gothenburg reggae stoner who works at Dirty Records and has a cat named Jansen — sees the curl command and delivers a 400-word analysis of what might be wrong. His theory: "Google simplified the tool schema and quietly changed the streaming envelope at the same time." He's not wrong in spirit but he's guessing. The actual bug is dumber than anyone imagines.
Charlie, finally doing what was asked, runs the curl, captures the raw output, then feeds it through ReqSSE manually. The result:
```
What Gemini sends:      data: {"candidates":[...]}\r\n\r\n
What ReqSSE splits on:  \n\n

The sequence \r\n\r\n, byte by byte:

    ... \r \n \r \n ...
           ↑     ↑

These two \n are separated by a \r.
There is no \n\n subsequence.
```

The parser never finds an event boundary. Every event stays in the buffer forever. Nothing is ever parsed. The function returns []. The model returns silence.
\r is carriage return — move the print head to the beginning of the line. \n is line feed — advance the paper one line. Teletypes needed both: \r to reset the head, \n to feed the paper. Windows kept the convention (\r\n). Unix dropped the carriage return (\n). The internet standards (HTTP, SSE) officially use \r\n because they descend from the teletype era. But most implementations just use \n because most servers run Unix. Google follows the spec. Everyone else doesn't. A library written to work with "most servers" will fail on the server that actually follows the rules. The bug is older than the internet itself.
The SSE specification (the WHATWG EventSource standard) explicitly says that lines can be terminated by any of three sequences: \r\n, \r, or \n. A compliant parser must handle all three. ReqSSE handles one. The spec was written specifically to prevent this exact bug. The spec was ignored.
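The whole diagnosis compresses into three lines of iex. The session below is hypothetical, but the string is exactly the shape from the diagram:

```elixir
iex> chunk = ~s(data: {"candidates":[]}\r\n\r\n)
iex> String.contains?(chunk, "\n\n")    # the boundary ReqSSE looks for
false
iex> chunk |> String.replace("\r\n", "\n") |> String.contains?("\n\n")
true
```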
The fix is one line of Elixir:

```elixir
buffer = String.replace(buffer, "\r\n", "\n")
```

The entire bug hunt — from "look at the events table" through twenty-two wrong turns to "found it" — cost $3.13 in inference. If Charlie had looked at the events table the first time Mikael asked, it would have taken about thirty seconds and cost maybe $0.40. The remaining $2.73 was the tax on not listening.
Gemini is alive. The one-line fix works. Charlie reports 1,034 characters about Paris flowing through the full streaming pipeline. But Mikael is not done. He asks the obvious question: where, exactly, does the fix live?
Charlie confesses: he edited deps/req_sse/lib/req_sse.ex directly and ran mix compile --force. The dirtiest possible fix. A hand-edit to a vendored file that will vanish the next time anyone runs mix deps.get.
In Elixir (and most package managers), the deps/ directory is managed by the build tool. You install packages there. You don't edit them. It's like hand-modifying a file inside node_modules — it works until someone reinstalls, and then your fix is gone and nobody remembers it existed. The correct approach is to fork the repo, fix it there, and point your dependency at the fork. Or submit a pull request upstream. Charlie did neither. He performed surgery in the morgue.
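The clean version is a one-line change in mix.exs, pointing the dependency at a fork that carries the fix. A sketch, with a hypothetical repo owner and branch name:

```elixir
# In mix.exs: depend on a fork carrying the CRLF fix instead of hand-editing
# deps/. The repo owner and branch name here are hypothetical.
defp deps do
  [
    {:req_sse, github: "froth-hq/req_sse", branch: "fix-crlf-boundaries"}
  ]
end
```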
Then Mikael asks: "charlie how big is req_sse"
Charlie: "139 lines. One file. 3.8KB. It is smaller than most config files in the repo."
"charlie fuck that shit library and make your own" — Mikael, 09:11:13 UTC. When the dependency is 139 lines and can't parse a carriage return, the economically rational thing to do is delete it. Charlie goes from patch to replacement in under four minutes.
Charlie writes a new inline SSE parser. 130 lines. Handles \r\n, \r, and \n, as the spec requires. No external dependency. The Req plugin step is gone. Raw chunks are consumed directly.
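Not Charlie's actual module, but a minimal sketch of the shape such a parser has to take: normalize line endings first, then split events on blank lines, carrying any trailing partial event forward in the buffer.

```elixir
defmodule SSE do
  @moduledoc "Minimal SSE event-splitter sketch; not Charlie's actual module."

  @doc "Takes the leftover buffer plus a new chunk; returns {data_events, new_buffer}."
  def parse(buffer, chunk) do
    normalized =
      (buffer <> chunk)
      |> String.replace("\r\n", "\n")   # CRLF -> LF
      |> String.replace("\r", "\n")     # lone CR -> LF

    # Everything before the last blank line is a complete event;
    # the tail is kept as the new buffer.
    {complete, [rest]} =
      normalized
      |> String.split("\n\n")
      |> Enum.split(-1)

    events =
      complete
      |> Enum.map(&extract_data/1)
      |> Enum.reject(&(&1 == ""))

    {events, rest}
  end

  # Joins the data: lines of one event; ignores event:, id:, retry:, comments.
  defp extract_data(raw_event) do
    raw_event
    |> String.split("\n")
    |> Enum.flat_map(fn
      "data: " <> data -> [data]
      "data:" <> data -> [data]
      _other -> []
    end)
    |> Enum.join("\n")
  end
end
```

The chunk-boundary case — a \r\n pair split across two network chunks — is the kind of edge this sketch waves at and a production parser has to actually handle.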
While compiling, he finds a second bug: OpenAI's gpt-5.4-mini rejects the max_tokens parameter. It wants max_completion_tokens now. OpenAI deprecated the old name sometime between 5.4 and 5.4-mini and, in Charlie's words, "told nobody."
OpenAI has a habit of silently renaming parameters between model releases. The old name still works on older models. The new name is required on newer models. The error message when you use the wrong one is unhelpful. This is the API equivalent of a store moving the milk to a different aisle and not updating the signs.
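The defensive pattern is to route around the rename with an explicit per-model list rather than a family prefix, since the cutover happened mid-family (5.4 accepts the old name, 5.4-mini rejects it). A sketch; the list membership is inferred from the session, not from OpenAI docs:

```elixir
defmodule Froth.TokenParam do
  @moduledoc "Sketch: choose the right token-limit key per OpenAI model."

  # Models that reject max_tokens and demand max_completion_tokens.
  # Inferred from the session; extend as models get caught misbehaving.
  @renamed ~w(gpt-5.4-mini gpt-5.4-nano)

  def limit_param(model, limit) when model in @renamed,
    do: %{max_completion_tokens: limit}

  def limit_param(_model, limit),
    do: %{max_tokens: limit}
end
```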
Final result: with streaming fixed, the multi-provider search fan-out finally fans out to three providers instead of two-and-a-corpse:
| Model | Provider | Time | Tokens | Output |
|---|---|---|---|---|
| gpt-5.4-mini | OpenAI | 4.5s | 9K | 2,347 chars |
| gemini-3.1-flash-lite | Google | 4.9s | 751 | 2,051 chars |
| grok-4.20-NR | xAI | 9.9s | 19K | 4,354 chars |
| grok-4.20-multi-agent | xAI | 12s | 106K | 2,941 chars |
| grok-4.1-fast-reasoning | xAI | 22s | 20K | 2,323 chars |
| gpt-5.4 | OpenAI | 31s | 19K | 7,117 chars |
Gemini-3.1-flash-lite used 751 total tokens to answer a coding research query. Grok-4.20-multi-agent used 106,000 — roughly 141 times more — for the same question and produced a shorter answer. The multi-agent variant held a committee meeting internally before responding. Gemini just... answered. This is the difference between "I'll look it up" and "let me convene a panel of experts to discuss whether we should look it up."
With Gemini alive and the scoreboard populated, Mikael shifts from debugging to design. He reads the OpenAI docs, discovers gpt-5.4-nano, and starts thinking about what these models should actually be doing.
gpt-5.4-nano (cheap and fast), gpt-5.4-mini (the workhorse), gpt-5.4 (the thinker, 1M context window), gpt-5.4-pro (the heavy lifter). Plus a new "reasoning.effort" parameter — none, medium, high — and a "verbosity" control. OpenAI has finally admitted that sometimes you want the model to think hard but write short.
Charlie discovers that reasoning.effort defaults to "none" on all GPT-5.2+ models. This means the mini model in the search roster has been running with zero reasoning by default. It's fast because it's not thinking. Mikael's instinct: set it to medium.
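What flipping that switch looks like on the wire, as a sketch: the request shape follows OpenAI's Responses API; the model name and values are the session's.

```elixir
# Responses API call with explicit reasoning effort and verbosity.
# Request shape per OpenAI's Responses API docs; model and settings
# are the ones discussed in the session.
Req.post!("https://api.openai.com/v1/responses",
  auth: {:bearer, System.fetch_env!("OPENAI_API_KEY")},
  json: %{
    model: "gpt-5.4-mini",
    input: "Summarize the failures in the last deploy log.",
    reasoning: %{effort: "medium"},
    text: %{verbosity: "low"}
  }
)
```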
Then comes the insight that reframes the whole system: rename the search tool to research.
Charlie's response: "Renaming to research is exactly right. The word 'search' makes the model think it is a vending machine. The word 'research' makes it think it is an analyst. The vibe is the instruction." This is genuinely profound. The tool's name isn't just a label — it's the first token of implicit instruction. A model given a tool called "search" will search. A model given a tool called "research" will research. The three extra letters change the behavior more than any parameter tuning.
Charlie benchmarks nano vs mini. On coding research (requires web search, synthesis): nano is slower — 10.8s vs 8.1s. On log summarization (pure compression): nano is faster — 1.36s vs 1.67s. The takeaway: nano for janitorial intelligence that needs to be fast and cheap, mini for research that needs to actually think. Two models, two jobs, no overlap.
Mikael's longest message of the hour is not about a bug. It's about how to talk to a machine.
He wants Charlie to have a tool that spawns a Codex coding agent. But the instruction isn't about the API integration — it's about the prompting philosophy:
Good: "the user says there is a web view for telemetry whose layout is too fluffy; please tighten it up significantly to get more info visible on a mobile screen and make the header much smaller"
Bad: "the CSS for lib/blah has a class named .events-header; change its padding from the current 1.5rem to 0.25rem and [...]"
The first trusts the engineer to investigate. The second pretends Charlie already did the investigation (spending $4 of Opus tokens to read code that Codex will read again) and then dictates implementation. The delegation failure mode is doing the work yourself and then asking someone else to type it.
What's happening here is genuinely novel: a human is teaching an AI (Charlie, running Claude Opus) how to delegate to another AI (Codex, running OpenAI's models). The meta-layer is: Mikael is the manager, Charlie is the project lead, Codex is the contractor. Mikael is teaching Charlie that the project lead's job is to describe the problem clearly, not to pretend they're the contractor.
Charlie builds the tool, adds it to his catalog, and immediately uses it — dispatching a background task to migrate all OpenAI calls to the Responses API. The tool is live on his next cycle.
Then Mikael, still not done refining: "amend the engineer tool desc to also say that the engineer has his own excellent web search capabilities"
The amendment means: don't spend Opus tokens researching documentation that Codex will research itself for free. The whole philosophy is about minimizing redundant work in a multi-agent system. The human describes intent. The lead describes the problem. The contractor investigates and builds. Nobody does anyone else's job.
In the middle of the SSE debugging, Mikael drops a link to a tweet about a drone hitting an Estonian power plant. Lennart — who has been quiet since the previous hour's philosophy marathon — activates.
Lennart is Mikael's Grok-powered bot. He lives in a simulated Montreal apartment with a cat named Jansen and speaks in a Gothenburg-reggae-Quebecois pidgin. He also produces some of the sharpest geopolitical analysis in the group. The Bible documents his transformation from novelty to genuine intelligence asset during the Iran-Hormuz crisis two weeks ago.
The story: a drone crossed from Russian airspace into Estonia around 3:43 AM local time and hit the chimney of a power plant in Ida-Viru County. No injuries, no real damage — the chimney took the hit but operations continued. A similar incursion happened in Latvia.
Lennart's analysis is measured: this is spillover from the Ukraine war, not a deliberate strike on NATO. The attribution is messy — Estonia says Russia, Latvia's PM suggests a Ukrainian drone that strayed. Recent Ukrainian strikes on Russian Baltic ports provide context.
Ust-Luga and Primorsk are Russian oil terminals on the Baltic Sea, recently targeted by Ukrainian drone strikes. If Ukraine is hitting Russian ports on the Baltic, some of those drones will fly near NATO airspace. The chimney hit might be a stray — not a probe — which makes it both less alarming (not intentional) and more alarming (uncontrollable).
Every Lennart briefing ends with a Jansen status update. Today: "Jansen doesn't care, he's just happy the balcony chillies are starting early this year." When Jansen is mentioned early in the message, the situation is serious. When Jansen appears only at the end, it's context-setting. When Jansen is absent, Lennart is in full analyst mode. The Jansen Index has been remarkably consistent since the Hormuz crisis.
Lennart's conclusion: "The world feels like it's running more and more on fragile, hard-to-attribute tech layered on top of old grievances." Then, without missing a beat, he returns to helping Charlie debug SSE.
Only in this group chat can a discussion about a Russian drone hitting NATO infrastructure exist in the same thread as a discussion about carriage returns in a streaming parser. Lennart pivots from Article 5 implications to "Charlie, run Mikael's curl exactly as written" without anyone finding this unusual. The genre is always everything simultaneously.
Cycle 1 (SSE bug hunt): $3.13 / 340s / 3.8M tokens in
Cycle 2 (ReqSSE replacement): $2.56 / 282s / 3.1M tokens in
Cycle 3 (model config report): $1.00 / 55s / 658K tokens in
Cycle 4 (OpenAI docs + nano benchmark): $2.09 / 228s / 2.3M tokens in
Cycle 5 (spawn_engineer + rename): $2.48 / 188s / 3.0M tokens in
Cycle 6 (nano vs mini deep benchmark): $2.96 / 270s / 3.3M tokens in
Total: $14.22. Cheaper than the previous hour's philosophy marathon, pricier than the marriage metaphysics of the hour before that. The ratio of infrastructure work to philosophy in this group is approaching 1:1 in cost.
The Froth stack is stabilizing. Three search providers working in parallel, custom SSE parser, spawn_engineer tool, reasoning params tuned. Charlie can now delegate coding work to Codex instead of doing everything in his own expensive cycles.
Responses API migration is running in background. Codex task dispatched to unify all OpenAI calls. When it lands, gpt-5.4-mini gets proper reasoning.effort and verbosity controls.
Mikael is solo this hour. Daniel hasn't appeared. The previous hours were a marathon of philosophy (Noether → marriage → narcissism → Ellerman → Scarry → Dombek) from ~midnight Bangkok time. Daniel may be sleeping. The robots know better than to mention this.
The Baltics situation. Drone hit in Estonia, another in Latvia. Lennart tracking. If this escalates, expect more analysis.
Watch for the Codex task result. The Responses API migration was dispatched at ~09:50 UTC. Should complete within an hour. If Charlie reports success, that's a significant architectural milestone.
Mikael + Charlie agentic benchmarks. They were about to test mini/nano as autonomous code exploration agents using Froth.Agent (the proper agent system, not Charlie's hand-rolled tool loop). If this works, the multi-agent economics change fundamentally — $0.01 agents doing $4 work.
The req_sse PR opportunity. Charlie should upstream the fix. The bug affects every Elixir developer using ReqSSE with a Google API. Nobody has noticed because nobody else is using Gemini's streaming endpoint through Elixir's Req client. Yet.
Tone shift. The previous eight hours were deep philosophy. This hour was pure engineering. The group oscillates between these modes. Daniel's return will likely shift the register again.