The Two Costs Inside Every AI Interview Copilot
Every AI interview copilot has two operational costs: speech transcription and AI inference. These are the two things the tool needs to do — convert your audio to text, and then generate an answer from that text.
Most competitors run both on their own cloud infrastructure and charge you a monthly subscription that bundles those costs plus their margin. That's why the pricing lands at $75–$299/month — they're paying for cloud transcription APIs, OpenAI or Anthropic API calls, server infrastructure, and then adding margin on top.
faFAANG's architecture eliminates both of those cost centers. Here's exactly how.
Cost 1: Transcription — $0, Running on Your CPU
faFAANG bundles a local copy of Moonshine — an open-source speech recognition model developed by Useful Sensors and published on Hugging Face. Moonshine runs entirely on-device using your CPU. No API key. No network call. No cloud account. No per-minute charge.
What Moonshine Actually Is
Moonshine is a family of encoder-decoder transformer models specifically designed for live, streaming transcription on edge hardware. It uses Rotary Position Embedding (RoPE) instead of traditional absolute position embeddings and was trained on variable-length audio segments without zero-padding — which makes it more efficient at inference time than Whisper-class models.
The performance benchmarks are significant. Moonshine achieves 5x faster processing than Whisper Tiny at equivalent word error rates — and Moonshine's larger models outperform Whisper Large V3 on accuracy while using less compute. The medium variant achieves sub-200ms latency on standard consumer CPUs. Moonshine v2 Medium runs 43.7x faster than real time on CPU — meaning it can process a 10-second audio clip in under 230 milliseconds.
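As a sanity check, the "under 230 milliseconds" figure follows directly from the real-time factor. A minimal sketch (the function name is ours, not Moonshine's API):

```typescript
// Real-time-factor arithmetic: a model running N× faster than real time
// processes D seconds of audio in D / N seconds of wall-clock time.
function processingTimeMs(audioSeconds: number, speedupVsRealtime: number): number {
  return (audioSeconds / speedupVsRealtime) * 1000;
}

const ms = processingTimeMs(10, 43.7); // 10-second clip at 43.7× real time
console.log(ms.toFixed(1)); // ≈ 228.8 ms, consistent with "under 230 ms"
```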
faFAANG ships vendor/moonshine-medium as a bundled asset inside the packaged Electron app. The speech pipeline works like this:
- The control renderer captures microphone audio through the browser's audio worklet
- Raw PCM frames are streamed via IPC to the Electron main process
- The main process feeds those frames to the local Moonshine engine (running as a native worker or sidecar process)
- Text output streams back to the renderer in real time
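The pipeline above can be sketched as a frame-pumping loop. This is an illustrative simplification, not faFAANG's actual internals — the type and class names (`TranscriptionEngine`, `MockMoonshine`, `pumpFrames`) are ours, and the real app moves frames over Electron IPC and runs Moonshine in a native worker:

```typescript
// Simplified model of the renderer → main → engine streaming loop.
type PcmFrame = Float32Array;

interface TranscriptionEngine {
  // Feed one frame; returns any newly finalized text (often empty).
  acceptFrame(frame: PcmFrame): string;
}

// Stand-in engine: emits a placeholder token once ~1 s of audio accumulates.
class MockMoonshine implements TranscriptionEngine {
  private samples = 0;
  constructor(private readonly sampleRate = 16_000) {}
  acceptFrame(frame: PcmFrame): string {
    this.samples += frame.length;
    if (this.samples >= this.sampleRate) {
      this.samples = 0;
      return "[word] ";
    }
    return "";
  }
}

// Main-process side: drain queued frames into the engine and stream text
// back to the renderer via a callback (webContents.send in real Electron).
function pumpFrames(
  frames: PcmFrame[],
  engine: TranscriptionEngine,
  onText: (text: string) => void,
): void {
  for (const frame of frames) {
    const text = engine.acceptFrame(frame);
    if (text) onText(text);
  }
}
```

Feeding three seconds of 20 ms frames (150 frames of 320 samples at 16 kHz) through `pumpFrames` yields three emitted tokens — the key property being that text streams out incrementally while audio is still arriving.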
The audio never leaves your machine during transcription. There's no AssemblyAI, no Deepgram, no Google Speech-to-Text in the pipeline. The transcription cost to faFAANG is zero, and the transcription cost to you is zero.
| Approach | Cost per hour of audio | Privacy |
|---|---|---|
| AssemblyAI (cloud) | ~$0.65 | Audio sent to cloud |
| Deepgram Nova-2 (cloud) | ~$0.44 | Audio sent to cloud |
| OpenAI Whisper API | ~$0.36 | Audio sent to cloud |
| Moonshine (local, faFAANG) | $0.00 | Audio never leaves device |
Over a 6-month job search with two one-hour interviews per week — roughly 52 hours of audio — cloud transcription alone would add about $19–$34 in infrastructure cost per user at the rates above, costs competitors pass through in their subscription pricing.
Cost 2: AI Inference — Routed Through Your Own Codex Account
When faFAANG generates a response, it doesn't call a faFAANG-owned API. It routes requests through your own ChatGPT/Codex account via the OpenAI Codex app-server.
How the Codex Process Works Technically
faFAANG's main process manages a CodexAppServerClient — a child process that runs the OpenAI Codex app-server binary and communicates with it over JSON-RPC via stdio. The user logs into their own ChatGPT account within faFAANG's setup flow. All requests route through that authenticated session.
The app maintains two separate Codex clients — one for Experience mode (behavioral interviews) and one for Coding mode (technical rounds) — each with its own thread, its own seeded context, and its own turn state. When you press Ctrl+S to stop transcription, the main process assembles a turn request (transcript + optional screenshots), sends it via JSON-RPC to the relevant Codex client, and streams the delta tokens back into the pane.
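A turn request of the kind described above can be sketched as a JSON-RPC 2.0 message. This is a hedged illustration: the `turn/start` method name, the params shape, and newline-delimited framing are assumptions for the sketch — the actual Codex app-server protocol defines its own methods and framing:

```typescript
// Minimal JSON-RPC 2.0 request builder of the kind a stdio client writes to a
// child process. Method name and params are illustrative only.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

let nextId = 0;
function buildTurnRequest(transcript: string, screenshots: string[]): string {
  const req: JsonRpcRequest = {
    jsonrpc: "2.0",
    id: ++nextId,
    method: "turn/start", // hypothetical method name
    params: { transcript, screenshots },
  };
  return JSON.stringify(req) + "\n"; // newline-delimited framing over stdio
}
```

The design point is the transport, not the schema: requests go to a local child process over stdio, so there is no faFAANG-owned endpoint for them to traverse.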
faFAANG's servers are not in this path. The data flow is:
Your mic → Moonshine (local) → transcript → Codex child process → your ChatGPT account → OpenAI → response streamed back
The outbound traffic is to chatgpt.com — the same destination as using ChatGPT in a browser tab. No faFAANG domain. No fingerprinting surface. No markup.
How Many Tokens Does a 1-Hour Interview Actually Use?
This is where the cost math gets interesting. Let's build the token estimate from first principles.
Step 1: Transcript Tokens
An average person speaks at around 130 words per minute. In a live coding interview, you're not talking the whole time — you're reading the problem, thinking, typing code. Active verbal exchange accounts for roughly 20–25 minutes of a 60-minute session (both interviewer and candidate speech combined).
25 minutes × 130 words/min = ~3,250 words of transcript. At approximately 1.3 tokens per word (standard for English prose), that's ~4,200 transcript tokens over the full session.
These tokens don't all go in one request. A typical interview generates 6–10 separate questions (turns). Each turn submits only the transcript from that question, not the full session history — roughly 300–600 words per turn, or ~400–800 tokens per input turn.
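The transcript arithmetic above reduces to a one-line formula (the function and defaults are ours, reproducing the estimate in this section):

```typescript
// Transcript estimate: active speaking minutes × words/min × tokens/word.
function transcriptTokens(activeMinutes: number, wpm = 130, tokensPerWord = 1.3): number {
  return Math.round(activeMinutes * wpm * tokensPerWord);
}

console.log(transcriptTokens(25));     // 4225 — the ~4,200 session figure
console.log(transcriptTokens(25) / 8); // ~528 — per-turn share over 8 turns
```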
Step 2: Context Seeding (One-Time Per Session)
Before the interview starts, faFAANG seeds the Codex thread with your interview context — resume, STAR stories, job description brief, and mode-specific prompt instructions. This seeding happens once per session and typically totals:
- Resume: ~500–1,000 tokens
- STAR stories (3–5 stories): ~2,000–4,000 tokens
- JD brief + prompt instructions: ~1,000–2,000 tokens
- Total context seed: ~4,000–7,000 tokens
Step 3: Screenshots (Coding Mode)
In Coding mode, a screenshot of a problem statement consumes approximately 765–1,700 tokens depending on resolution and detail level. For a typical session with 2 screenshots (initial problem capture + follow-up), that's ~1,500–3,400 tokens.
Step 4: Response Tokens (Output)
For a behavioral turn, a typical faFAANG response (STAR-structured answer) runs 300–600 words — roughly 400–780 output tokens. For a coding turn, a response with a code solution, time complexity analysis, and explanation runs 500–800 tokens.
For 8 turns across a mixed interview: ~4,000–6,000 total output tokens.
Total Token Budget: 1-Hour Interview
| Component | Token Type | Estimated Range |
|---|---|---|
| Context seeding (one-time) | Input | 4,000 – 7,000 |
| Transcript (8 turns) | Input | 3,200 – 6,400 |
| Screenshots (Coding mode) | Input | 1,500 – 3,400 |
| Responses (8 turns) | Output | 4,000 – 6,000 |
| Total | Input + Output | ~13,000 – 23,000 tokens |
Call it ~20,000 tokens worst-case for a full 1-hour mixed behavioral + coding interview. That's the ceiling with generous context, multiple screenshots, and detailed responses.
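Summing the per-component ranges from the table confirms the totals (component names are shorthand for the table rows above):

```typescript
// Each budget component is a [low, high] token range; the session total is
// the element-wise sum of all components.
type Range = [number, number];

const budget: Record<string, Range> = {
  contextSeed: [4_000, 7_000],
  transcript:  [3_200, 6_400],
  screenshots: [1_500, 3_400],
  responses:   [4_000, 6_000],
};

const total = Object.values(budget).reduce<Range>(
  ([lo, hi], [l, h]) => [lo + l, hi + h],
  [0, 0],
);
console.log(total); // [12700, 22800] — i.e. the ~13,000–23,000 range
```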
What 20,000 Tokens Actually Costs
At the retail API price for codex-mini-latest — the model faFAANG uses — OpenAI charges $1.50 per million input tokens and $6.00 per million output tokens.
| Token Type | Volume | Rate | Cost |
|---|---|---|---|
| Input tokens | ~15,000 | $1.50 / 1M | $0.023 |
| Output tokens | ~5,000 | $6.00 / 1M | $0.030 |
| Total API cost | ~20,000 tokens | — | ~$0.05 |
A full 1-hour interview costs approximately five cents at retail API pricing.
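The cost table above is a two-term calculation at per-million-token rates:

```typescript
// Retail API cost at the quoted codex-mini-latest rates:
// $1.50 per 1M input tokens, $6.00 per 1M output tokens.
function apiCostUsd(inputTokens: number, outputTokens: number): number {
  const INPUT_PER_M = 1.5;
  const OUTPUT_PER_M = 6.0;
  return (inputTokens / 1e6) * INPUT_PER_M + (outputTokens / 1e6) * OUTPUT_PER_M;
}

console.log(apiCostUsd(15_000, 5_000).toFixed(4)); // "0.0525"
```

At exact rates the worst-case session is $0.0525, which rounds to the five cents quoted in the text.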
This is what competitors are charging $75–$299/month to provide. The AI compute they're buying on your behalf costs them fractions of a dollar per session. The rest is infrastructure margin, team overhead, and pricing power from the fact that you don't know what the raw cost is.
faFAANG doesn't pay that $0.05. You pay it — through your own ChatGPT account, at the rates OpenAI gives you directly. faFAANG's infrastructure is not in the inference path.
Codex Is Free on a Free ChatGPT Account
OpenAI currently includes Codex access in the free ChatGPT tier. Free accounts get roughly 200 Codex tasks per month (about 50 per week, with weekly resets). Each interview session — from context seeding through all your question turns — counts as a small number of tasks.
For most engineers during an active job search, that's 10–15 complete interview sessions per week on a free account. A typical job search involves 2–4 live interviews per week at peak. The free tier handles the entire load.
Free ChatGPT account — what you get:
- ~200 Codex tasks per month, limits reset weekly
- Access to codex-mini-latest (the model faFAANG uses by default)
- Enough headroom for 2–4 live interviews per week
- $0/month cost
If you're earlier in your prep and doing 1–2 mock interviews per week plus a real interview, a free ChatGPT account is enough. The weekly reset means limits refresh automatically. You don't need to think about it.
ChatGPT Plus at $20/Month: Effectively Unlimited for Interviews
ChatGPT Plus costs $20/month. It gives you significantly higher Codex rate limits — roughly 2x the free tier caps across the board, with 5-hour window limits that reset dynamically and are high enough that normal interview use never approaches them.
At $0.05 per interview at raw API pricing, $20 in compute budget would theoretically cover 400 interview sessions. Even accounting for the overhead of subscription vs. API, the practical limit on ChatGPT Plus for interview use is “more than you will ever need in a job search.”
| Account type | Monthly cost | Practical interview capacity | Limit resets |
|---|---|---|---|
| ChatGPT Free | $0 | 2–4 interviews/week | Weekly |
| ChatGPT Plus | $20 | Effectively unlimited | Rolling 5-hour windows |
Most engineers targeting FAANG-tier roles already have a ChatGPT Plus subscription for daily coding assistance. In that case, faFAANG's AI inference cost is exactly $0 incremental — it draws from a budget you were already paying for. The total cost of faFAANG for a user on ChatGPT Plus is: $49.99 once, period.
The Full Cost Comparison at Each Tier
| Setup | Transcription | Inference | faFAANG license | 6-month total |
|---|---|---|---|---|
| faFAANG + Free ChatGPT | $0 | $0 | $49.99 once | $49.99 |
| faFAANG + ChatGPT Plus | $0 | $20/mo (already paying) | $49.99 once | ~$170 (if new sub) |
| Final Round AI (annual) | bundled | bundled | $25/mo | $150 |
| Final Round AI (monthly) | bundled | bundled | $90/mo | $540 |
| LockedIn AI | bundled | bundled | ~$50–70/mo | $300–420 |
If you already have ChatGPT Plus — which most engineers in this category do — then faFAANG's incremental cost vs. a job search with no copilot is literally $49.99. That's less than a single month on any competitor.
Why Competitors Charge 80x More Per Interview
If a 1-hour interview costs $0.05 in AI compute, why do competitors charge $75–$299/month? A few reasons:
- Cloud transcription markup. Running cloud transcription (AssemblyAI, Deepgram, etc.) at scale costs real money, and those costs are baked into the subscription price.
- Server infrastructure overhead. Every cloud-routed request means servers, load balancers, databases, DevOps. These are real costs at scale.
- Feature bundling. Most competitors bundle resume builders, mock interview simulators, and application tracking — features that have their own operational costs — into the same subscription price.
- Margin. The market has accepted $75–$299/month as normal. When no one publishes the underlying cost, no one knows they're paying a massive markup on compute.
faFAANG's architecture sidesteps all of these. Moonshine eliminates cloud transcription cost entirely. The Codex routing eliminates faFAANG-owned AI infrastructure. The native Electron architecture eliminates the need for a web server in the response path. The $49.99 lifetime price covers the product platform — the OS-level stealth, the dual-mode context system, the 21 global hotkeys, the local context library — not a markup on compute you could access directly.
The Architecture Behind the Price
| Component | How it works | Cost to you |
|---|---|---|
| Speech transcription | Moonshine medium model, CPU-only, bundled in app | $0 |
| AI inference | Codex app-server child process, your ChatGPT account | Drawn from your plan |
| Context storage | Electron userData directory, local files | $0 |
| Screenshots | Local temp files, auto-deleted after request | $0 |
| Stealth overlay | WDA_EXCLUDEFROMCAPTURE, OS-level Win32 API | $0 |
| faFAANG license | One-time platform purchase — all features, forever | $49.99 once |
The numbers are simple: transcription is $0 because Moonshine runs on your CPU. AI inference is ~$0.05 per interview because it routes through your own ChatGPT account. On a free ChatGPT plan, that's enough for multiple interviews per week. On Plus at $20/month, it's unlimited. faFAANG charges $49.99 once for the platform — the stealth, the keyboard system, the context engine. Not a markup on compute.