Ethan Binns

Why I Built Codex Voice for Windows

Projects • 10 min read

There is a strange kind of frustration that comes from knowing exactly what you want to say, but still having to fight the process of getting it onto the screen.

For some people, typing is just typing. It is invisible. They think, they write, they edit, they move on.

For me, it has never really felt like that.

I am dyslexic, and writing has always carried this extra layer of friction. Not because I do not know what I want to say, but because turning that thought into clean typed text can be slow, annoying, and mentally expensive. Spelling, phrasing, corrections, retyping, losing the thread while fixing small mistakes, all of it adds up.

That is one of the reasons the transcription tool inside the Codex app stood out to me so much.

It was not just good in a technical sense. It changed how easy Codex felt to use.

Instead of typing out every thought, I could speak naturally, explain what I wanted, and let the transcription handle the messy conversion from voice to text. That sounds like a small improvement, but it was not small in practice. It made Codex feel less like a tool I had to carefully operate and more like something I could actually think with.

And once I felt that difference, I had a very obvious question:

If this makes Codex significantly easier to use, would it make my whole PC significantly easier to use too?

That question is what led to Codex Voice for Windows.

The problem was not just typing

When people talk about voice transcription, they often frame it as a convenience feature.

That is true, but it undersells the point.

For me, the value is not just that speaking can be faster than typing. The real value is that speaking removes a barrier between thought and action.

Typing forces everything through a narrow channel. You have to spell it, structure it, fix it, and keep your original intention in your head at the same time. That can be fine for short messages, but once you are working through a longer idea, writing a prompt, describing a bug, planning a project, or explaining a technical problem, that friction becomes very real.

Codex made that obvious to me.

The built-in transcription was good enough that I stopped thinking about the transcription itself. That is when a feature becomes powerful. It disappears. You speak, it understands, and suddenly the computer feels less awkward to use.

For someone with dyslexia, that matters a lot.

It is not about avoiding writing altogether. I still care about writing clearly. I still edit. I still think structure matters. But voice gives me a better first pass. It lets me get the idea out before the mechanics of typing slow everything down.

That was the unlock.

Codex showed me what voice input should feel like

I have used voice tools before, but many of them feel like separate products bolted onto the side of the computer.

You have to open a special interface. Or remember a different workflow. Or tolerate strange formatting. Or fix so many transcription mistakes that you start wondering whether it would have been faster to type in the first place.

The Codex transcription experience felt different.

It was quick, accurate, and built directly into the flow of using the app. Hold a key, speak, release, and the text appears. No ceremony. No big mode switch. No sense that you are leaving the thing you were doing.

That tiny interaction pattern matters.

A lot of software gets voice wrong because it treats speech as a whole new interface. But sometimes the best version of voice is much simpler: it is just a better input method. A way to put words wherever your cursor already is.

Once I got used to that inside Codex, going back to normal typing everywhere else on Windows felt worse.

Not broken. Just obviously more effort than it needed to be.

The Mac version proved the idea could work

I did not start from nothing.

Anthony Kroeger had already built codex-voice, an experimental macOS menu bar app that took the Codex transcription flow and made it work system-wide on a Mac.

That project was the proof of concept.

It showed that the useful part of Codex voice did not have to stay trapped inside Codex. You could hold a global shortcut, speak, and insert the transcript into the currently focused app. Notes, browsers, chat boxes, editors, forms, wherever you were already typing.

That was exactly the direction I wanted.

But I use Windows.

So the next step was obvious, at least in the slightly dangerous way technical ideas often feel obvious before you actually build them.

Anthony had reverse engineered the built-in Codex transcription flow and wrapped it in a macOS-native app. I then reverse engineered his version and rebuilt the idea for Windows.

Not as a perfect polished product. Not as some grand platform. Just as a practical tool that solved a real problem I was feeling every day.

What Codex Voice for Windows does

Codex Voice for Windows is an experimental Windows tray app.

The basic idea is simple: hold a configurable shortcut, speak, release, and the transcript gets pasted into the currently focused app.

The default shortcut is Ctrl+M, but it can be changed from the tray settings. While you are speaking, the app shows a small floating HUD so you know whether it is listening, transcribing, or inserting the text.
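That hold-speak-release flow is essentially a small state machine, with the HUD mirroring whichever state the app is in. As a rough sketch (all names here are hypothetical, not the app's actual code):

```python
from enum import Enum, auto

class HudState(Enum):
    IDLE = auto()
    LISTENING = auto()     # shortcut held, microphone capturing
    TRANSCRIBING = auto()  # shortcut released, audio sent for transcription
    INSERTING = auto()     # transcript being pasted into the focused app

class HoldToTalk:
    """Tracks the hold-speak-release flow that the floating HUD reflects."""

    def __init__(self):
        self.state = HudState.IDLE

    def on_shortcut_down(self):
        if self.state is HudState.IDLE:
            self.state = HudState.LISTENING   # start mic capture here

    def on_shortcut_up(self):
        if self.state is HudState.LISTENING:
            self.state = HudState.TRANSCRIBING  # ship the audio off

    def on_transcript_ready(self):
        if self.state is HudState.TRANSCRIBING:
            self.state = HudState.INSERTING   # paste into the focused app

    def on_insert_done(self):
        self.state = HudState.IDLE
```

The guards on each transition matter: a stray key-up or a late transcription result should not move the app into a state it was never in.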

Under the hood, the Windows version has to work differently from the Mac version.

The macOS app can rely on macOS-specific accessibility APIs and app behavior. Windows has its own model, so the Windows version uses a native global keyboard hook, WinMM microphone capture, a tray menu, a non-activating HUD, and clipboard-based insertion.

That last part is important.

Windows does not have the same accessibility permission flow as macOS. So Codex Voice for Windows inserts text by temporarily placing the transcript on the clipboard, sending Ctrl+V to the foreground window, and then restoring the previous clipboard content shortly afterward.
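The save / set / Ctrl+V / restore dance looks roughly like this. This is a platform-neutral sketch, not the app's actual code: `clipboard` and `send_paste` stand in for the real Win32 calls (clipboard access and a synthesized Ctrl+V via something like SendInput), which are injected here so the flow itself is easy to see:

```python
import time

def paste_via_clipboard(transcript, clipboard, send_paste, restore_delay=0.3):
    """Insert text by temporarily borrowing the clipboard.

    `clipboard` needs get()/set(); `send_paste` sends Ctrl+V to the
    foreground window. Both are placeholders for the Windows-specific
    implementations.
    """
    previous = clipboard.get()    # remember what the user had copied
    clipboard.set(transcript)     # put the transcript on the clipboard
    send_paste()                  # Ctrl+V into the focused app
    time.sleep(restore_delay)     # give the target app time to read it
    clipboard.set(previous)       # put the user's clipboard back
```

The delay before restoring is the fragile part of any approach like this: restore too early and the paste reads the old contents, too late and the user notices their clipboard was hijacked.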

It is not glamorous, but it works.

And in a tool like this, “it works anywhere I can paste text” is the whole point.

Why system-wide matters

Keeping voice transcription inside Codex is useful.

Making it system-wide changes the category of the tool.

Suddenly it is not just a Codex feature. It becomes a general input layer for the PC.

That means I can use the same flow when writing a message, drafting a blog post, filling out a form, searching the web, prompting another AI tool, writing notes, or explaining a problem in a GitHub issue.

This is where the project became personally important to me.

The point was not to create another dictation app for the sake of it. The point was to take the best input experience I had found and make it available everywhere else I work.

For dyslexic users especially, that kind of system-wide availability matters. Accessibility features are most useful when they are not trapped inside one app, because the friction is not confined to one app either. It follows you across the whole computer.

So the solution has to follow too.

The project is experimental, and that is fine

Codex Voice for Windows is very much an experimental tool.

It depends on Codex being installed locally. It depends on local Codex auth already existing on your machine. It reads the local Codex auth file at runtime and uses the Codex transcription endpoint. If Codex changes its internal transcription flow, this tool may break.

That is the tradeoff.

This is not an official Codex feature. It is a reverse-engineered bridge built around something that already works incredibly well.

But I think there is value in projects like that.

A lot of useful software starts this way. Someone finds a workflow that feels obviously better, notices it is limited to one environment, and builds the missing bridge. It may be rough at first. It may rely on internals. It may need maintenance when the original system changes. But it proves demand in a very real way.

It says: this interaction is good enough that people want it everywhere.

That is how I feel about Codex transcription.

The security side mattered too

Because this project touches authentication and voice input, I wanted the security model to be clear.

Codex Voice for Windows does not ask for, store, or commit OpenAI credentials. It uses your existing local Codex sign-in. The access token is read at runtime and sent as an HTTP authorization header to the Codex transcription endpoint.

It is not written to logs. It is not written to settings. It is not passed through command-line arguments.

Temporary WAV files are created while dictating and deleted after transcription succeeds or fails. Debug logs are off by default and avoid transcript text and token values.
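The temp-file guarantee is the kind of thing that is easy to get wrong, so it is worth spelling out. A minimal sketch of the lifecycle, with `record_to` and `transcribe` as stand-ins for the real capture and upload steps (in the actual app, the upload carries the local Codex access token in the Authorization header):

```python
import os
import tempfile

def dictate_once(record_to, transcribe):
    """Capture audio to a temp WAV, transcribe it, and always delete the file.

    `record_to(path)` writes the captured audio; `transcribe(path)` returns
    the transcript text. Both are placeholders for the real implementations.
    """
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        record_to(wav_path)
        return transcribe(wav_path)
    finally:
        # Delete whether transcription succeeded or failed.
        try:
            os.remove(wav_path)
        except OSError:
            pass
```

The `finally` block is the whole point: the WAV is gone even when the network call throws.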

That does not magically make every experimental tool risk-free, but it does mean the project is trying to be honest about what it does and where sensitive data goes.

For a utility like this, that transparency matters.

What this changed for me

The biggest change is that my PC feels less resistant.

That might sound dramatic, but it is the best way I can describe it.

Before, there were lots of moments where I had a thought and then had to decide whether it was worth typing out properly. With voice transcription available system-wide, the answer is more often yes. I can get the thought down quickly, then clean it up afterward.

That is a better workflow for me.

It also makes AI tools more useful. The quality of an AI interaction depends heavily on how clearly you can explain what you want. If typing is a bottleneck, your prompts get shorter, vaguer, or more frustrating to write. Voice removes some of that pressure.

You can explain the actual context. You can ramble a bit. You can give the model the messy version of the problem and then refine from there.

That is especially useful with Codex, where the difference between a weak instruction and a clear one can completely change the quality of the result.

Final take

Codex Voice for Windows exists because one feature in Codex made the whole app easier for me to use.

The transcription was good enough that it changed my behavior. It lowered the cost of explaining what I wanted. It made my thoughts easier to turn into instructions. And because I am dyslexic, that was not just a nice convenience. It was a meaningful accessibility improvement.

Anthony Kroeger’s macOS project showed that this experience could be pulled out of Codex and made system-wide. My Windows version is the same idea rebuilt for the environment I actually use every day.

It is experimental. It may break. It depends on Codex internals.

But it also points at something important.

The future of using computers is not just better apps. It is better ways to express intent. For me, voice is a big part of that. Not as a gimmick, not as a replacement for careful writing, but as a lower-friction way to get thoughts onto the screen.

Codex made that obvious.

Codex Voice for Windows is my attempt to bring that feeling to the rest of my PC.
