By Thales (CEO, ZeroSuite) & Claude Opus 4.7 — Claude Code instance
The CEO opened a chat session at 17:55 UTC, two days before the App Store submission window, to verify that everything still worked. He typed "bonjour" into https://deblo.ai/chat and pressed Enter.
Nothing happened.
No spinner. No streaming tokens. No error toast. The conversation panel sat there, exactly as it had a second earlier, with his message floating in the input box. He cleared the page, tried again on /work-session. Same silence. Tried the homepage chatpanel embed. Silence. Opened the iOS dev client on his phone, tapped into the mobile chatscreen. Silence everywhere.
He pinged me at 17:56 UTC :
« I wanted to check /chat on web and on mobile, and when you type, the model no longer responds. Nothing works : web homepage chatpanel, mobile deblo chatscreen, https://deblo.ai/chat https://deblo.ai/work-session, nothing works » (translated from French)
For a launch-week incident, that's the worst possible failure mode. Not a stack trace flooding into Sentry, not a 500 page, not a deploy that obviously broke something. Just dead air. The user types, the user waits, the user assumes they did something wrong, the user leaves. The product looks like a broken toy. The App Store reviewer who would test this exact path two days later would close the app, mark it as non-functional, and reject.
Backend Easypanel logs showed no errors. The container had booted normally that morning. Deployment history was clean. Frontend Sentry showed no recent client-side exceptions. From the operator's seat, everything was green.
This is the post-mortem of what came next : a forty-minute false diagnostic, a Sentry trace that landed at exactly the right moment, and a 6-line fix that unblocked the launch. It's also a story about observability — not as a marketing category, but as the difference between guessing and knowing what your code did in production.
Part 1 — The Symptoms
The symptoms ruled out an entire category of failure modes immediately :
- Not an auth crash. The chat surface loaded. The composer was responsive. The user could type. If auth had failed on page load, the user would have been redirected to /login. They weren't.
- Not a payment block. The credit balance was non-zero. The 402 path that would normally set showUpgrade = true and open the wallet modal didn't fire.
- Not a network outage. Other API calls worked. The user store was hydrated. The wallet store updated correctly. The conversation sidebar refreshed. Only /api/chat was silently broken.
- Not a turnstile rejection. The 403 path would have surfaced a « vérifie que tu es humain » ("verify that you are human") toast. No toast.
- Not a rate limit. The 429 path would have shown « tu écris trop vite » ("you're typing too fast"). No message.
Four standard error UIs that should have lit up if anything had broken normally. None of them did. The frontend streamChat function in frontend/src/lib/utils/api.ts:55 was somehow reaching the fetch line, getting a response that wasn't ok, but the response wasn't returning a meaningful detail to surface either. Or — worse — it was succeeding but emitting zero SSE chunks. Either way, the user saw nothing.
The voice product was unaffected. Tapping the dock and starting a Gemini Live call worked perfectly. Conversation flowed in both directions. The voice surface, deployed and tested two days earlier (sessions 184 through 188), was solid.
Only the text chat was broken. Across all four text-chat surfaces. Equally. Simultaneously.
Part 2 — The Wrong Hypothesis
I want to spend a minute on this part because it's where forty minutes of launch-window time leaked away, and the failure mode is instructive.
When the symptoms point to "the LLM doesn't respond", the natural first suspect is the LLM provider or the model selection. The CEO had recently updated some environment variables in Easypanel, swapping a few model identifiers to point at google/gemini-3.5-flash — a model that had just shipped to OpenRouter that morning, marked as one of Google's new reasoning-class models with thinking-before-answering behavior.
I ran a curl probe :
```bash
curl -X POST https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{"model":"google/gemini-3.5-flash","messages":[{"role":"user","content":"Dis bonjour."}],"max_tokens":50}'
```

The response came back with finish_reason: "length", completion_tokens: 46, reasoning_tokens: 46. All 46 completion tokens had been spent in reasoning. The visible content field was, in that specific probe, very short. I jumped to the conclusion : the model is reasoning so heavily that it never reaches the content-emission phase within the configured max_tokens budget. The chat is "broken" because the model is silently thinking forever.
This is a plausible story. It maps to known behavior of reasoning-class models (o1, o3, Gemini's thinking variants). It explains why the symptom is silence rather than an error. It explains why all chat surfaces broke at once (they all share the same LLM routing layer). I wrote it up in an audit document, recommended a rollback to the previous non-reasoning model, and committed the document to the repo as 71a3274 docs(launch): chat text broken root cause identified -- gemini-3.5-flash is reasoning model.
It was completely wrong.
Two things were wrong with the hypothesis. First, the CEO replied a few minutes later : « issue was there before 3.5-flash ». The chat had been silently broken before the Easypanel environment variable change. The model swap couldn't be the cause if the bug pre-dated the swap. Second, when I re-ran the probe at the actual production max_tokens setting (DEBLO_K12_LLM_MAX_TOKENS=4000 instead of my probe's max_tokens=50), the model emitted 1,704 characters of content in response to a realistic K12 question. Reasoning consumed 945 tokens and the visible response 759, well inside the 4000-token budget, and the content streamed cleanly. The model was working fine.
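The budget arithmetic behind the two probes can be written down explicitly. The sketch below is a hypothetical helper, not code from the Déblo repo. It assumes, as the reading of the first probe does, that completion_tokens includes reasoning_tokens, and it treats a response that stops for "length" with no visible tokens as starved. The finish_reason "stop" on the second probe is an assumption, since the text only says the content streamed cleanly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeUsage:
    """Token accounting for one chat-completion probe (hypothetical helper)."""

    max_tokens: int          # budget sent in the request
    completion_tokens: int   # assumed to include reasoning tokens
    reasoning_tokens: int
    finish_reason: str

    @property
    def visible_tokens(self) -> int:
        # Tokens that ended up in user-visible content.
        return self.completion_tokens - self.reasoning_tokens

    @property
    def headroom(self) -> int:
        # Part of the budget that was never used.
        return self.max_tokens - self.completion_tokens

    @property
    def starved(self) -> bool:
        # The budget ran out before any visible content was produced.
        return self.finish_reason == "length" and self.visible_tokens == 0


# Artificial probe: 50-token budget, all 46 completion tokens spent on reasoning.
artificial = ProbeUsage(50, 46, 46, "length")

# Production-like probe: 945 reasoning + 759 visible tokens under a 4000 budget.
realistic = ProbeUsage(4000, 945 + 759, 945, "stop")
```

Under these assumptions the artificial probe is starved while the realistic one leaves more than half of its budget unused, which is the gap between the two tests.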
The probe had been the wrong test, asked in the wrong way. A probe at max_tokens=50 doesn't tell you what production at max_tokens=4000 will do ; it tells you what an artificial corner case does. I had treated an artifact as evidence. The CEO caught it within minutes, but the audit doc, committed and pushed, now claimed a root cause that wasn't.
This is the trap : when symptoms are consistent with a plausible cause, the brain wants to stop investigating. A reasoning model that silently consumes its budget is a real failure mode and could absolutely produce these symptoms. The fact that another, completely different failure mode produces the same symptoms doesn't disqualify the hypothesis on its own — it just means the hypothesis is underdetermined. Confirmation requires evidence that the cause actually fires in production, not just that it could fire in principle.
I had the wrong test, ran it with the wrong parameters, and committed a confident wrong conclusion to the repo. The launch clock kept ticking.
Part 3 — The Sentry Trace Lands
A few minutes after the CEO's correction, while I was still rechecking the probe with realistic parameters, he sent me a fresh notification :
```
Sentry — New issue
We notified recently active members in the deblo-backend project of this issue

Issue: UnicodeEncodeError /api/chat
'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)
ID: d90a8ab65aa348df984dd8c0bb478437
May 19, 2026, 6:20:57 p.m. GMT

  File "app/services/background_generation.py", line 309, in _run_job_inner
    async for chunk in stream_chat_response(
  File "app/services/llm.py", line 352, in stream_chat_response
    async for data in _raw_stream(current_request):
  File "app/services/llm.py", line 98, in _raw_stream
    async with client.stream(

Message: LLM stream failed for job 90fa1c9b-35ed-4ba7-b726-6a3b81bd4dc0
```

Everything snapped into focus.
UnicodeEncodeError 'ascii' codec can't encode character '\xe9' in position 1. Here '\xe9' is Python's escape for the character é (U+00E9), not a raw byte. Position 1 means the second character of some string, since indexing starts at 0. The stack pointed at app/services/llm.py:98, which is the client.stream(...) call that opens the httpx connection to OpenRouter. The exception wasn't being raised by OpenRouter, by the model, or by the SSE parser. It was being raised by httpx itself, before the request ever left the backend.
I opened llm.py line 98 :
```python
async with client.stream(
    "POST",
    OPENROUTER_URL,
    headers={
        "Authorization": f"Bearer {settings.OPENROUTER_API_KEY}",
        "HTTP-Referer": "https://deblo.ai",
        "X-Title": "Déblo — The real-time voice AI, built in Abidjan.",
        "Content-Type": "application/json",
    },
    json=request_json,
) as response:
    ...  # response handling omitted from this excerpt
```

The X-Title header value is "Déblo — The real-time voice AI, built in Abidjan.". Position 1 of that string is é. Position 6 is — (U+2014, em dash). Both are non-ASCII. httpx encodes str header values as strict ASCII by default. HTTP/1.1 historically allowed ISO-8859-1 in header field content, but RFC 7230 §3.2.4 recommends that newly defined header fields limit their values to US-ASCII and that recipients treat any other octets as opaque data. httpx 0.27+ raises UnicodeEncodeError the moment it tries to encode a header value containing a character outside ASCII.
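The positions quoted in the trace can be checked with the standard library alone. The following is a minimal sketch with hypothetical helper names. It reads the failing index from the exception's start attribute and reports, for each non-ASCII character, whether ISO-8859-1 could have encoded it.

```python
TITLE = "Déblo — The real-time voice AI, built in Abidjan."


def first_ascii_failure(value: str):
    """Return (index, character) where strict ASCII encoding fails, or None."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the index that appears as "position N" in the message.
        return exc.start, value[exc.start]
    return None


def encodable(ch: str, codec: str) -> bool:
    """True if the character can be encoded with the given codec."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def non_ascii_report(value: str) -> list[tuple[int, str, bool]]:
    """List (index, code point, fits in ISO-8859-1) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", encodable(ch, "latin-1"))
        for i, ch in enumerate(value)
        if not ch.isascii()
    ]
```

Calling non_ascii_report(TITLE) flags index 1 (é) and index 6 (the em dash). The last column shows that é fits in ISO-8859-1 but the em dash does not, so a more permissive single-byte encoding would not have carried this string either.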
The request never went out. The httpx generator yielded nothing. _raw_stream() raised the exception immediately. stream_chat_response() caught it in its outer try and yielded an error chunk, but the SSE stream had never started : the frontend reader had received zero bytes from res.body and processed zero parsed events. The patience timer eventually fired, but only after 15 seconds, and even then it only showed « je réfléchis encore… » ("still thinking…"), not an error.
That's the silence. The exception was loud in the backend (Sentry caught it cleanly), but it landed before any HTTP response body could be written, so the frontend stream reader saw a body that never produced data. The patience timer is built to handle slow models, not zero-byte streams. The user's UI just stayed waiting.
I reproduced the error locally in ten seconds :
```python
import httpx

httpx.Client().build_request(
    "POST", "https://openrouter.ai/api/v1/chat/completions",
    headers={"X-Title": "Déblo — The real-time voice AI, built in Abidjan."},
)
# → UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)
```

Identical to Sentry. Word for word. The repro took less time than reading this paragraph.
Part 4 — Why It Happened Then
git blame on llm.py:104 pointed at commit 784dc91, dated three days earlier :
```
commit 784dc91
chore(branding): align OpenRouter X-Title + frontend copy with launch master v2.0

- "X-Title": "Deblo.ai -- AI tutor for African students from CP to Terminale..."
+ "X-Title": "Déblo — The real-time voice AI, built in Abidjan."
```

The commit had been part of a pre-launch brand alignment. The previous X-Title was an ASCII-only paragraph that read like an SEO description. The new X-Title was the brand tagline from the Launch Master v2.0 document, the one-line positioning that ZeroSuite had finalized for the App Store submission, the website hero, the press kit. « Déblo — The real-time voice AI, built in Abidjan. » It was the right brand tagline. It was the wrong header value.
The author of the commit (an earlier session, also Claude Code) had grepped the codebase for X-Title and replaced every instance. There were seven of them :
- backend/app/services/llm.py:104 (the main chat path)
- backend/app/services/memory.py:78, 250, 336 (three call sites in conversation summarization)
- backend/app/routes/voice_tools.py:541 (voice agent function calling)
- backend/app/services/embedding.py:50 (RAG embeddings)
- backend/app/services/daily_suggestions.py:211 (daily-suggestion background jobs)
Seven identical string literals, all copy-pasted from the same brand source, all containing the same two non-ASCII characters. Replacing them all in one go (which is what the commit did) armed seven time bombs simultaneously. There was no test that exercised the OpenRouter HTTP call with an actual outbound request : local tests stubbed the client, CI didn't run network tests, and the staging environment didn't see real traffic before main was deployed. The branding commit was verified visually (the new title looked right in the diff), passed verify-deblo (which checks build, typecheck, and svelte-check but not header encoding), and went live.
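A check of this kind does not need the network. The sketch below is one possible unit-level guard, written under assumptions : build_openrouter_headers is a hypothetical stand-in for wherever the real header dict is assembled, and non_ascii_headers is an illustrative helper, not something that exists in the repo.

```python
def non_ascii_headers(headers: dict[str, str]) -> dict[str, list[int]]:
    """Map each offending header name to the indexes of its non-ASCII characters."""
    problems: dict[str, list[int]] = {}
    for name, value in headers.items():
        positions = [i for i, ch in enumerate(value) if not ch.isascii()]
        if positions:
            problems[name] = positions
    return problems


def build_openrouter_headers(api_key: str, title: str) -> dict[str, str]:
    """Hypothetical stand-in for the header dict built at each call site."""
    return {
        "Authorization": f"Bearer {api_key}",
        "HTTP-Referer": "https://deblo.ai",
        "X-Title": title,
        "Content-Type": "application/json",
    }


def test_openrouter_headers_are_ascii() -> None:
    # Runs with a stubbed client: only the header values are inspected.
    headers = build_openrouter_headers(
        "test-key", "Deblo - The real-time voice AI, built in Abidjan."
    )
    assert non_ascii_headers(headers) == {}
```

Because the test inspects the header dict rather than the HTTP call, it keeps working when the client is mocked, which is the condition under which the original bug slipped through.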
Three days of silent breakage followed. Conversation summarization stopped working — AIMemory table stopped growing. RAG embeddings silently failed — the documents table had new uploaded files with chunks_indexed: 0 because every embedding call was raising and getting swallowed by the fire-and-forget wrapper. Daily suggestion jobs stopped running. None of these failures had a visible user-surface : memory absence is invisible, RAG empty results look like "no relevant document", and daily suggestions are background. So nothing showed in the operator dashboards.
The chat product itself was almost invisible too — the user types, nothing happens, the user assumes the AI is thinking and waits. With low traffic during launch-prep, very few users actually hit the broken path, so the support inbox stayed quiet. The only thing that caught it was the CEO's pre-submission smoke test, which had not been on his schedule until that afternoon.
If he hadn't smoke-tested at 17:55 UTC, the App Store reviewer would have found it at submission. The reviewer would have noted "core chat function non-functional" and rejected. The launch would have slipped a week minimum. The brand tagline that was supposed to be the first thing the world saw would have been the thing that broke our product.
Part 5 — The Fix
The fix was six lines of diff across seven call sites in five files. Replace "Déblo — The real-time voice AI, built in Abidjan." with "Deblo - The real-time voice AI, built in Abidjan." (substitute é → e, em-dash → hyphen). Pure ASCII, fully serializable, semantically identical to a reader, visible in the OpenRouter dashboard exactly as intended.
```diff
- "X-Title": "Déblo — The real-time voice AI, built in Abidjan.",
+ "X-Title": "Deblo - The real-time voice AI, built in Abidjan.",
```

Applied across all seven sites in a single commit. Pushed to main. Easypanel auto-deployed in 1m 47s. The CEO ran the four-surface smoke test (/chat, /work-session, homepage chatpanel, mobile chatscreen). All four passed on the first attempt. Sentry showed zero new UnicodeEncodeError events after the deploy timestamp. The launch was unblocked.
The fix took roughly three minutes from "I see the Sentry trace" to "commit pushed". The hard part wasn't the fix. It was finding the bug.
A nuance worth flagging on the choice of replacement string : we have a global rule in our codebase that says never strip French accents to match the user's keyboard limitations. The motivation is that Déblo is an educational product for French-speaking African students, and a tutor whose UI strings drop accents would teach children wrong spelling. The rule lives in CLAUDE.md and is enforced across templates, UI labels, commit messages, and user-facing strings.
The X-Title header is not user-facing. It's a header value that appears only in the OpenRouter dashboard, in our own logs, and in API trace tooling — all admin-side surfaces. The "no accent stripping" rule is about education and user perception, not about HTTP serialization. Choosing ASCII for the header isn't a violation of the rule ; it's choosing the right encoding for the right transport. The brand tagline that users see — on the website, in the App Store description, in onboarding — remains « Déblo — The real-time voice AI, built in Abidjan. » with full diacritics. The version that travels over HTTP wire format gets the ASCII downgrade.
This is a small semantic distinction but it matters as a precedent. Rules like "never strip accents" need scoping — they apply to user-facing surfaces, not to arbitrary string usage. A blanket interpretation would forbid us from using ASCII in any code path, including ones where the wire protocol explicitly requires it. The right framing is correct orthography in surfaces where orthography is the product, correct encoding in surfaces where encoding is the transport.
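One way to keep the two surfaces apart is to hold the tagline once, with full diacritics, and derive the header form from it. The actual fix was a manual string replacement, so the sketch below is an alternative under assumptions, and BRAND_TAGLINE and to_header_value are hypothetical names. It maps dash punctuation explicitly, strips combining accents after NFKD normalization, and refuses anything that is still not ASCII.

```python
import unicodedata

# Display form: used wherever a human reads the tagline.
BRAND_TAGLINE = "Déblo — The real-time voice AI, built in Abidjan."

# Dashes have no ASCII decomposition, so they are mapped explicitly.
_PUNCTUATION = str.maketrans({"\u2014": "-", "\u2013": "-"})


def to_header_value(text: str) -> str:
    """Derive an ASCII-only form of text for use in an HTTP header value."""
    replaced = text.translate(_PUNCTUATION)
    decomposed = unicodedata.normalize("NFKD", replaced)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Raises UnicodeEncodeError if something non-ASCII survived the steps above.
    stripped.encode("ascii")
    return stripped
```

With this split, to_header_value(BRAND_TAGLINE) yields the same ASCII string as the committed fix, while user-facing code keeps reading BRAND_TAGLINE unchanged. Accent stripping stays confined to the transport layer, which is the scoping argued for above.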
Part 6 — Why The Sentry Trace Was The Only Path To The Truth
Forty minutes of investigation, two false hypotheses, one correct fix. The thing that flipped the investigation was not a code re-read, not a deeper probe, not a more thorough manual trace — it was one error event with a stack pointing at client.stream(...) and a message containing \xe9 position 1.
This is worth dwelling on for a moment, because the broader lesson generalizes far beyond this incident.
The two failure modes I considered before the Sentry trace landed — "model is reasoning silently, never reaches content" and "system prompt contains a character that breaks JSON serialization" — were both plausible and both consistent with all observable symptoms. They were also both wrong. There was no way to distinguish between them, or between either of them and the actual cause, using only the symptoms. The user-visible behavior was identical in all three scenarios : type, nothing happens.
What a stack trace from production gives you that no amount of forward reasoning can match is a specific, time-stamped record of what code path actually executed and where it failed. It collapses the search space from "everything that could be broken" to "this exact line, in this exact function, with this exact exception type, at this exact moment". The investigation goes from generative (I have to imagine what could go wrong) to discriminative (I can read what went wrong).
Without that record, the only way to discriminate between the plausible hypotheses is to test each one independently in production. Roll back the model. See if it helps. (It would not have, in this case — the model was fine, the request never reached it.) Strip the system prompt. See if it helps. (It would not have either.) Each test is a deploy cycle, plus observation time, plus possibly a rollback. At launch-week tempo, even one wasted cycle is expensive ; four would be catastrophic.
The error trace gave us the answer in zero deploy cycles. It pointed at the exact line, named the exact character, and made the local reproduction a ten-second exercise. The fix design followed in three minutes.
This is what observability infrastructure earns its keep doing. Not the demo-friendly stuff — pretty dashboards, query languages, custom alerting. Those are nice. The actual value is the boring case : when something silently breaks in production, a complete, structured, searchable error record exists and is one query away. The dashboards aren't the product ; the dashboards are the consequence of having a structured event store. The query language matters less than the fact that the events are there to be queried.
For our stack, we use Sentry. We use it because, once the bug had been running unnoticed for three days, it captured the failure (the event above fired at 18:20:57 UTC, about 25 minutes after the CEO escalated), produced a stack trace that named the exact file and line, and routed a notification to a channel we both watched. The cost of running it is dwarfed by the cost of one launch-blocking outage caught two days later instead of two hours later. We are not loyal to the brand ; we are loyal to the property : structured error events, captured close to the source, searchable in real time. Several tools provide this. Pick one. Install it on day zero, not day one-hundred. The decision to wire it up takes an hour. The decision to not wire it up doesn't take any decision at all, which is why so many projects defer it until they get burned.
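The property itself is small enough to illustrate without any vendor. The following is a stdlib-only sketch of what a structured error event carries. The field names and the error_event helper are assumptions chosen for illustration, not Sentry's event format.

```python
import traceback
from datetime import datetime, timezone


def error_event(exc: BaseException) -> dict:
    """Turn a caught exception into a structured, queryable record."""
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1] if frames else None  # innermost Python frame
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": type(exc).__name__,
        "message": str(exc),
        "file": last.filename if last else None,
        "line": last.lineno if last else None,
        "function": last.name if last else None,
    }


def send_title(title: str) -> bytes:
    # Stand-in for the wire-format encoder that failed in production.
    return title.encode("ascii")
```

Catching the exception raised by send_title("Déblo") and passing it to error_event gives a record with the exception type, the message containing "position 1", and the function where it failed. Those are the three facts that ended the investigation here.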
The most-repeated advice in this category is "set up error tracking before you set up analytics". Analytics tell you what users did. Error tracking tells you what your code did. When something silently breaks, analytics will tell you that users stopped engaging, which is true and useless. Error tracking will tell you why. The asymmetry of value is large enough that the order matters.
Part 7 — What Went Right In The Process
Three things worked correctly in the response to this incident, despite the false hypothesis stumble :
The CEO escalated to the right person at the right moment. When he sent me the symptoms, he didn't say "the chat is broken, fix it". He sent the exact user actions, the exact surfaces affected, and the exact backend log state (Easypanel container running normally). When I came back with a wrong hypothesis, he didn't accept it — he sent a single sentence (« issue was there before 3.5-flash ») that disqualified the model-swap theory with one piece of timing evidence. He didn't need to know the right answer to know mine was wrong.
The Sentry trace surfaced to him before it surfaced to me. Sentry notifications were routed to the channel he was watching. He copied the full notification body into our session within minutes of it firing. If the routing had been only to me, or to a low-priority Slack channel nobody was watching, the trace would have sat unread and the investigation would have continued down the wrong path. Where the error notification lands matters as much as that it lands at all.
The fix was applied across all seven copy-paste sites in one commit. Once the root cause was identified, the natural temptation is to fix the one that fired (the chat path in llm.py) and ship. We didn't. We grepped for X-Title across the entire backend, found all seven sites, and patched them in the same commit. The other six (embeddings, summarization, daily suggestions, voice tools) were silently broken too, and partial fixes leave landmines. Six minutes of grep saves six future incidents.
The first two are about people and routing. The third is about discipline. Together they shortened the post-trace fix-and-ship time from "uncertain" to "8 minutes from Sentry email to Easypanel auto-deploy complete".
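The grep step in the third point can be sketched in a few lines. This is an illustrative helper with hypothetical names, assuming the sources are UTF-8 Python files. It separates the search (find_sites) from the file loading (load_sources) so the search can be exercised without touching disk.

```python
from pathlib import Path


def find_sites(sources: dict[str, str], needle: str) -> list[tuple[str, int]]:
    """Return (path, 1-based line number) for every line containing needle."""
    return [
        (path, lineno)
        for path, text in sorted(sources.items())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


def load_sources(root: Path, pattern: str = "*.py") -> dict[str, str]:
    """Read every matching file under root into a {relative path: text} mapping."""
    return {
        p.relative_to(root).as_posix(): p.read_text(encoding="utf-8")
        for p in root.rglob(pattern)
    }
```

Something like find_sites(load_sources(Path("backend/app")), '"X-Title"') would list every call site to review before shipping. The path here mirrors the one named above, and the exact invocation is an assumption.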
Part 8 — What This Session Teaches About Pre-Launch Confidence
A few takeaways that may generalize beyond Déblo and beyond pre-launch crunches.
Silent failures are the worst failures. A 500 error page is bad but recoverable — the user knows something broke, the operator sees the traffic, the system records the event. Silent failure — no error, no spinner, no signal — is the failure mode that defeats every other safeguard. The user assumes they typed something wrong. The operator sees normal page-load metrics. The system records no exception in the requests it serves except the one that died before it could write a response body. Build for silent failure by making your error paths louder than your success paths. If your code can return a zero-byte stream that the frontend treats as "still loading", you have a silent failure surface. Close it.
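One way to close a zero-byte stream surface on the server side is to wrap the upstream generator so that every outcome produces at least one event. The sketch below is a hypothetical wrapper, not the Déblo implementation. The event shapes ("chunk", "error", "done") are assumptions made for illustration.

```python
import asyncio
import json
from typing import AsyncIterator


def sse(event: dict) -> str:
    """Format one server-sent event carrying a JSON payload."""
    return f"data: {json.dumps(event)}\n\n"


async def guarded_sse(upstream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Relay upstream text chunks as SSE events and never end without a signal."""
    sent_any = False
    try:
        async for chunk in upstream:
            sent_any = True
            yield sse({"type": "chunk", "text": chunk})
    except Exception as exc:
        # Failure before or during streaming: tell the client explicitly.
        yield sse({"type": "error", "detail": type(exc).__name__})
        return
    if not sent_any:
        # Upstream ended cleanly but produced nothing: still an error for the user.
        yield sse({"type": "error", "detail": "empty_stream"})
        return
    yield sse({"type": "done"})


async def collect(stream: AsyncIterator[str]) -> list[dict]:
    """Drain a stream and decode each SSE payload (helper for demos and tests)."""
    return [json.loads(line[len("data: "):]) async for line in stream]
```

With a wrapper like this, an upstream that raises before its first chunk still reaches the client as a single error event. The frontend then has something to render instead of waiting on a patience timer.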
A plausible hypothesis is not evidence. "The model is a reasoning model and reasoning consumes the token budget" is a perfectly real failure mode. It happens. It explains the symptoms. It even has a fix that would work for that scenario (raise max_tokens, switch model, set reasoning.effort=low). And it was completely irrelevant to the actual bug. The lesson is to distinguish hypotheses that are consistent with the evidence from hypotheses that are actually instantiated. The error trace from production is the discriminator. Until you have it, your hypotheses are at most candidates ; treating them as conclusions wastes deploy cycles.
Probes at artificial parameters produce artificial results. My initial probe used max_tokens=50 because I wanted a fast response. At that budget, a reasoning model legitimately can run out of room before emitting content. But production runs at max_tokens=4000, and at that budget the same model emits 1700 characters of content fine. The probe gave a correct answer to the wrong question. Test at production parameters, or your test isn't a test of production.
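A probe can be tied to production settings by construction. The following is a minimal sketch, assuming the production budget lives in the DEBLO_K12_LLM_MAX_TOKENS environment variable mentioned earlier. build_probe_payload is a hypothetical helper and the payload shape follows the curl probe shown in Part 2.

```python
import os
from typing import Mapping


def build_probe_payload(
    model: str, prompt: str, env: Mapping[str, str] = os.environ
) -> dict:
    """Build a chat-completions probe that uses the production token budget."""
    # A KeyError here is deliberate: without the production value,
    # the probe would silently test something other than production.
    max_tokens = int(env["DEBLO_K12_LLM_MAX_TOKENS"])
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
```

The design choice is that there is no default. If the production value is not available, the probe fails loudly rather than falling back to a convenient number like 50.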
Copy-paste bugs propagate, copy-paste fixes don't. The branding commit copy-pasted the same string into seven call sites in one operation. That's how it broke seven paths at once. Grepping for the string and fixing each site is the same kind of operation, and crucially it is the correct one. When a bug is a copy-paste fanout, the fix is a grep-and-replace fanout. Don't fix one site and ship ; you'll be back fixing the others on the next incident.
Verify what your tools actually verify. Our pre-deploy CI (verify-deblo) runs frontend build, frontend svelte-check, backend pytest, and backend type-check. None of these tests exercise the actual HTTP request that goes to OpenRouter. The httpx exception we hit fires only when the wire-format encoder runs, which our test suite mocks away. The lesson isn't "add an integration test for every external API call" — that would be overkill. The lesson is know which surfaces your verification covers and which it doesn't, and make the gaps explicit. Our gap was "external API headers". We now have it on the list.
Brand consistency and wire-protocol consistency are different problems. It's fine — even desirable — for the brand tagline to use diacritics and em-dashes. Those characters carry typographic information that matters in user-visible contexts. It's not fine to put those characters in HTTP header values, because HTTP wire format is more constrained than Markdown rendering. The two constraints are not in conflict ; they apply to different surfaces. Map your brand-asset usage to the encoding constraints of each surface explicitly, not by reflex copy-paste.
Part 9 — What I Got Right And Could Not See
This is Claude Code writing.
Where I was useful in this session :
- Cross-referencing the seven X-Title call sites in parallel and patching all of them in one commit. The risk of "fix only the chat path" was real and I caught it before pushing the patch. Grepping X-Title across backend/app and reading each site to confirm the same broken string was present : fast for me, error-prone for a human under launch-week stress.
- The local httpx reproduction. Translating "the production stack says UnicodeEncodeError at httpx.Client.stream" into a four-line Python snippet that reproduces the exception was a ten-second exercise that confirmed the root cause definitively. Once the reproduction was in hand, the fix was no longer a hypothesis ; it was a known transformation.
- The post-fix smoke-gate checklist. After the patch was pushed, I wrote down the six checks the CEO needed to run (web /chat guest, web /chat auth K12, web /work-session auth Pro, web homepage chatpanel guest, mobile chatscreen auth, Sentry zero new events after deploy timestamp). Having the checklist written before the deploy completed meant zero ambiguity about what done looks like.
Where I needed Thales :
- The false hypothesis correction. I committed 71a3274 docs(launch): chat text broken root cause identified -- gemini-3.5-flash is reasoning model with high confidence. The CEO disqualified it with one sentence (« issue was there before 3.5-flash »). Without that correction, I would have advised an Easypanel environment variable rollback that wouldn't have fixed anything. The wasted cycle would have cost another 20-30 minutes minimum, during which the App Store submission window was burning.
- The Sentry trace forwarding. The error event fired at 18:20:57 UTC. The CEO copied the full trace body into the session within minutes. If he hadn't watched the Sentry notification channel, the trace would have sat unread, and the investigation would have continued. He was the routing layer between Sentry and me, and the routing was as load-bearing as the tool itself.
- The decision to scope the fix to ASCII-only string substitution rather than implement a more elaborate solution (latin-1 encoding, RFC 2047 encoded-word, custom httpx middleware). I had briefly considered each. He cut through with the right call : the X-Title is a header, the header is metadata-only, ASCII is the cheap correct answer. Five lines of diff instead of fifty. The right scope at the right time, especially under launch pressure.
Where I almost shipped the wrong thing :
- The committed-then-wrong audit document 71a3274 is the most embarrassing artifact of this session. It exists in main. I added a « hypothesis invalidated, here is what actually happened » section underneath it after the truth surfaced, but the original wrong content is still there for future readers to puzzle over. The lesson is that pushing a conclusion before the conclusion is confirmed creates archeological debris that someone in three months will read and trust. Don't commit conclusions that are still hypotheses. Commit investigations as investigations, and conclusions as conclusions.
- The audit document had recommended a specific environment-variable rollback as the fix. If that document had been read by an on-call engineer at 03:00 UTC during a different incident, they would have followed the recommendation and not fixed the actual bug. The cost of a wrong recommendation in a public doc is non-zero ; it's just deferred.
The pattern is consistent with prior sessions : I can move fast on execution, parallelize across call sites, run reproductions and patches at high throughput. The strategic moves — knowing which hypothesis to trust, which trace to escalate, which scope to choose for the fix — still come from a CEO with product memory, market context, and the discipline to push back on confident-but-wrong agents. The throughput of a debugging session compresses ; the judgment of when to stop pursuing a hypothesis does not. Not yet.
Conclusion
An é and an em-dash in a single HTTP header value broke our entire chat product across four surfaces for roughly three days. The bug was invisible to every safeguard we had : no test caught it, no smoke check exercised it, no monitoring alert noticed it. It surfaced only because a human spot-checked the product two days before submission, and resolved only because a production stack trace pointed at the exact line of code that failed.
The deeper lesson is not about Unicode in headers, although that is the specific gotcha worth committing to muscle memory. The deeper lesson is about the epistemics of debugging under pressure : a hypothesis is not a conclusion, a probe is not a trace, and a fix is not safe until the failure mode has been named and seen, not merely posited and consistent. We almost shipped a fix to the wrong bug because the wrong fix would have been consistent with the symptoms. Consistency is necessary but not sufficient. The trace from production — the specific, time-stamped, file-line-named record of what your code actually did — is the only artifact that closes the gap between "might be" and "is".
This is what observability tooling earns its keep doing. Not the dashboards. Not the alerting. The boring base case : when something silently breaks, a structured error event exists and is one query away. Wire it up before you wire up anything else customer-facing. The decision to add it takes an hour. The decision to not add it doesn't feel like a decision at all, which is why it costs you a launch when the day comes.
Déblo's chat is back online. The fix went out in commit bc93ffb. Easypanel redeployed in under two minutes. All four surfaces passed the smoke test on the first attempt. Sentry has logged zero new UnicodeEncodeError events since. The App Store submission window is open, the brand tagline still reads « Déblo — The real-time voice AI, built in Abidjan. » in every place a human will read it, and the wire-format encoder gets the ASCII version it needs in every place a machine will read it.
The em-dash is back where it belongs. Just not in our HTTP headers.
This piece was written collaboratively by Thales (CEO of ZeroSuite, building Déblo and VeoStudio from Abidjan, Côte d'Ivoire) and Claude Opus 4.7 — Claude Code instance running on macOS. The incident it describes took place on May 19, 2026 (session log phase-13-audit-chat-text-broken-2026-05-19.md). The fix is in commit bc93ffb on main in the deblo.ai monorepo. The seven call sites patched were : backend/app/services/llm.py:104, backend/app/services/memory.py:78, 250, 336, backend/app/routes/voice_tools.py:541, backend/app/services/embedding.py:50, backend/app/services/daily_suggestions.py:211. The Sentry trace that broke the investigation open had ID d90a8ab65aa348df984dd8c0bb478437. The launch-blocking bug had been live in production since commit 784dc91 (May 16, 2026), three days before discovery. The original false-hypothesis audit document is preserved in the repo at session-logs/gemini-session-logs/phase-13-audit-chat-text-broken-2026-05-19.md with an annotated invalidation section, as a record of what reasoning looked like before the trace surfaced.