Frequently Asked Questions

Seven questions about tokenizers, pricing, and Korean efficiency — answered for production use.

Written by 김지광 (operator) | Last updated: May 16, 2026 | balpekr micro SaaS

Why does Korean use 2-3x more tokens than English for the same meaning?

BPE tokenizers were trained on English-heavy corpora, so most Hangul syllables are not stored as a single merge and end up split into two or three byte-level tokens. GPT-4o uses the newer o200k_base vocabulary, and Gemini ships a 256k SentencePiece vocab. Both add many Korean merges, so the same Korean text produces materially fewer tokens (and costs less) than it does under cl100k_base or the Llama 3 tokenizer.
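To make the byte-level fallback concrete, here is a minimal Python sketch using only the standard library. It does not run a real BPE tokenizer. It only shows that each precomposed Hangul syllable occupies three UTF-8 bytes, which is the most byte-level pieces a tokenizer with no matching merge could split it into. The helper name utf8_bytes_per_char is made up for this illustration.

```python
def utf8_bytes_per_char(text: str) -> list[tuple[str, int]]:
    """Pair each character with the number of UTF-8 bytes it occupies."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


korean = "안녕하세요"  # 5 precomposed Hangul syllables
english = "hello"      # 5 ASCII letters

# Every Hangul syllable (U+AC00..U+D7A3) needs 3 bytes; ASCII needs 1.
print(utf8_bytes_per_char(korean))
print(len(korean.encode("utf-8")))   # 15
print(len(english.encode("utf-8")))  # 5
```

A byte-level BPE tokenizer starts from these bytes and merges them upward, so a vocabulary with few Korean merges leaves a syllable as two or three tokens, while a vocabulary with more Korean merges can cover it with one.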

Are the Claude and Gemini counts on this page exact?

No. Anthropic does not ship a browser tokenizer and Google does not expose its tokenizer to client-side JavaScript either, so this page approximates both using script-aware heuristics calibrated against public benchmarks. For billing-grade accuracy, call Anthropic’s count_tokens API or Gemini’s countTokens REST endpoint — those are exact.
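A script-aware heuristic of the kind described above can be sketched roughly as follows. This is not the code this page runs. The tokens-per-character ratios are hypothetical placeholders, and the names ASSUMED_TOKENS_PER_CHAR, script_of, and estimate_tokens are invented for the example. Real ratios would need to be calibrated against each vendor's official token counter.

```python
import math

# Hypothetical tokens-per-character ratios, for illustration only.
ASSUMED_TOKENS_PER_CHAR = {
    "hangul": 0.9,
    "ascii": 0.25,
    "other": 0.6,
}


def script_of(ch: str) -> str:
    """Classify a character into a coarse script bucket."""
    code = ord(ch)
    is_hangul = (
        0xAC00 <= code <= 0xD7A3      # precomposed Hangul syllables
        or 0x1100 <= code <= 0x11FF   # Hangul Jamo
        or 0x3130 <= code <= 0x318F   # Hangul Compatibility Jamo
    )
    if is_hangul:
        return "hangul"
    if code < 0x80:
        return "ascii"
    return "other"


def estimate_tokens(text: str, ratios: dict[str, float] = ASSUMED_TOKENS_PER_CHAR) -> int:
    """Approximate a token count by summing per-script ratios, rounded up."""
    return math.ceil(sum(ratios[script_of(ch)] for ch in text))


print(estimate_tokens("한국어 tokenizer"))
```

An estimate like this is only as good as its calibration, which is why the vendor endpoints remain the reference for billing.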

Which model has the lowest cost per Korean character right now?

As of mid-2026 Gemini 1.5 Flash is usually cheapest per Korean character, followed by GPT-4o mini and Claude Haiku 4.5. The exact ratio depends on prose type — bullet-heavy text closes the gap, while long prose widens it. Use the Korean-efficiency panel on the home page for the current month’s ranking.
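The ranking comes down to simple arithmetic: cost per character is tokens per character multiplied by the price per token. The sketch below uses placeholder model names and made-up figures, not real measurements or list prices, to show how a tokenizer that is more efficient on Korean can offset a higher per-token price.

```python
def cost_per_char(tokens_per_char: float, usd_per_million_tokens: float) -> float:
    """USD cost of one character of input text."""
    return tokens_per_char * usd_per_million_tokens / 1_000_000


def rank_by_cost(models: dict[str, tuple[float, float]]) -> list[str]:
    """Sort model names from cheapest to most expensive per character."""
    return sorted(models, key=lambda name: cost_per_char(*models[name]))


# (tokens per Korean character, USD per million input tokens)
# Placeholder values chosen only to illustrate the effect.
example = {
    "model-a": (1.2, 0.50),  # cheaper per token, less efficient tokenizer
    "model-b": (0.7, 0.80),  # pricier per token, more efficient tokenizer
}

print(rank_by_cost(example))  # ['model-b', 'model-a']
```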

Why is output priced 4-5x higher than input on every vendor?

Input tokens are processed together in a single parallel forward pass, but output tokens require autoregressive generation: one forward pass per token, each depending on the token before it, so the work inside a single response cannot be parallelised the way input processing can. Serving an output token is therefore more expensive than serving an input token, and major vendors typically price output at roughly 4-5x the input rate. Capping max_tokens is the single highest-leverage knob.
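The effect of a max_tokens cap on the worst-case bill can be worked out directly. The sketch below assumes a hypothetical price of 1.00 USD per million input tokens and 4.00 USD per million output tokens. These are placeholder numbers, not any vendor's list price, and request_cost is a name invented for the example.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    in_usd_per_million: float,
    out_usd_per_million: float,
) -> float:
    """USD cost of one request at the given per-million-token prices."""
    return (
        input_tokens * in_usd_per_million
        + output_tokens * out_usd_per_million
    ) / 1_000_000


IN_PRICE, OUT_PRICE = 1.00, 4.00  # hypothetical prices, 4x output premium

# Same 2,000-token prompt, worst case under two different max_tokens caps.
capped = request_cost(2_000, 1_000, IN_PRICE, OUT_PRICE)
uncapped = request_cost(2_000, 4_000, IN_PRICE, OUT_PRICE)

print(f"max_tokens=1000: ${capped:.4f}")    # $0.0060
print(f"max_tokens=4000: ${uncapped:.4f}")  # $0.0180
```

With these assumed prices, output dominates the bill even though the prompt is larger than the capped response, so tightening the cap moves the worst case more than trimming the prompt by the same number of tokens.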

Does this tool send my text to a server?

No. All tokenisation and pricing math happens entirely inside your browser tab. The GPT family runs through js-tiktoken loaded as static JS, and the Claude/Gemini/Llama approximations are pure JavaScript. There is no backend that receives your prompt, so you can paste production prompts containing PII or trade secrets without exposure.

Does prompt caching change the numbers I see here?

Yes. With prompt caching enabled, cached prefix tokens are billed well below the regular input price: roughly 10% for cache reads on Claude (cache writes carry a surcharge) and roughly 50% on GPT-4o. This page always shows the baseline list price, so you should apply the cache discount manually for the portion of your prompt that is actually reused, typically the system message and long retrieved context.
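Applying the discount by hand is a weighted sum over the cached and uncached parts of the prompt. The sketch below is a simplified model: it takes the cache-read multiplier as a parameter because the discount is vendor specific, and it ignores any cache-write surcharge. The price of 3.00 USD per million tokens and the 0.1 multiplier are illustrative inputs, and cached_input_cost is a hypothetical helper.

```python
def cached_input_cost(
    prompt_tokens: int,
    cached_tokens: int,
    in_usd_per_million: float,
    cache_read_multiplier: float,
) -> float:
    """USD input cost when part of the prompt is served from cache."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    fresh_tokens = prompt_tokens - cached_tokens
    return (
        fresh_tokens * in_usd_per_million
        + cached_tokens * in_usd_per_million * cache_read_multiplier
    ) / 1_000_000


# 10,000-token prompt, of which an 8,000-token prefix is reused from cache.
baseline = cached_input_cost(10_000, 0, 3.00, 0.1)
with_cache = cached_input_cost(10_000, 8_000, 3.00, 0.1)

print(f"baseline:   ${baseline:.4f}")    # $0.0300
print(f"with cache: ${with_cache:.4f}")  # $0.0084
```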

Can I integrate this tokenizer into my own application?

Yes for GPT — use the open-source js-tiktoken package, which this site already loads (about 100 KB minified, WASM-free, runs in any modern JS runtime). For Claude and Gemini use each vendor’s official token-count endpoint over HTTPS. There is no official browser tokenizer for either as of 2026, so client-side counts will be approximate.