The AI Map Claude Response Speed
✓ Fact-checked June 24, 2026 Sources: Anthropic Docs · status.claude.com · platform.anthropic.com · Artificial Analysis benchmarks

Why Does Claude Take So Long to Respond?

You send a message and Claude sits there. The spinner runs. 20 seconds. 40 seconds. Nothing. Here is the complete technical breakdown of what is actually happening during that wait — every cause, with real latency numbers per model, and fixes ranked by impact.

Direct Answer
The most common cause is extended thinking mode being active. When enabled, Claude generates reasoning tokens internally before writing its visible response — this thinking phase takes 10–90 seconds on complex tasks. It is not broken; it is working. Standard Claude Sonnet 4.6 with thinking off begins streaming in 1–3 seconds, comparable to ChatGPT. Claude Opus 4.8 always uses adaptive thinking and cannot be made as fast as Sonnet. Turn off extended thinking for routine tasks and switch to Haiku 4.5 for maximum speed.
Claude Limits — Complete Guide Series

The Technical Pipeline: What Happens Between Send and Stream

Every Claude response goes through two distinct phases before you see any text. Understanding these phases explains most of what users experience as "slowness":

Phase 1
Prefill — processing your context
Before Claude generates any output, it must process all input tokens: your message, the entire conversation history, any uploaded documents, and the system prompt. This is called the prefill phase. It scales linearly with context length. A fresh conversation with one message prefills in milliseconds. A 40-message thread with two uploaded PDFs can take 3–8 seconds just to prefill. The prefill phase produces no visible output — the spinner runs silently.
Phase 2
Thinking tokens — internal reasoning (if enabled)
If extended thinking or adaptive thinking is active, Claude generates reasoning tokens after prefill but before its visible response. These tokens are not shown to you. They are Claude working through the problem — planning, considering alternatives, checking its reasoning. This phase adds 10–90 seconds before the first visible word appears. The longer and more complex the task, the more thinking tokens are generated. This is the dominant cause of long waits in the majority of user reports.
Phase 3
Decode — generating visible output
Once thinking is done, Claude begins generating visible response tokens. This is what you see streaming word by word. Decode speed depends on model (Haiku is fastest, Opus is slowest per token), current server load, and response length. Claude Haiku 4.5 generates tokens significantly faster than Opus 4.8, so even at identical task complexity, Haiku produces visible output faster after the first token appears.
Time-to-first-token (TTFT) is the metric that determines whether Claude feels fast or slow. It is the sum of prefill time + thinking time. The actual streaming speed after the first token appears is usually fine — users experience slowness as the wait before any text shows, not as slow streaming speed.

Cause 1: Extended Thinking Mode (Responsible for ~70% of complaints)

Extended thinking is available on Claude Sonnet 4.6 and Haiku 4.5 as an opt-in feature. When active, it dramatically improves output quality on complex tasks — but adds a significant thinking delay before any visible response.

Task TypeThinking On — Time to First Visible WordThinking Off — Time to First Visible WordQuality DifferenceRecommendation
Simple question or fact lookup10–25s1–2sNegligibleTurn off
Email drafting10–30s1–2sMinimalTurn off
Summarising a document15–40s1–3sSmall — occasionally better structureUsually off
Translation12–25s1–2sNegligible for most languagesTurn off
Code debugging (small function)15–40s1–3sModerate — sometimes catches more edge casesSituational
Complex code architecture review30–90s2–5sSignificant — reasoning quality is markedly betterKeep on
Multi-step logical reasoning problem40–90s2–5sSignificant — fewer reasoning errorsKeep on
Research synthesis across long documents30–70s2–5sModerate to significantKeep on for deep analysis
How to toggle extended thinking: In claude.ai, look for the thinking toggle in the conversation controls below the message box. On Haiku 4.5 and Sonnet 4.6, this is a toggle you control per conversation. On Claude Opus 4.8, you cannot turn it off — see Cause 2.

Cause 2: Using Opus 4.8 or a High-Tier Model

Claude Opus 4.8 uses adaptive thinking — a reasoning mode that is always active and cannot be disabled. Unlike extended thinking on Sonnet/Haiku (which you control), adaptive thinking on Opus engages automatically based on task complexity. Simple questions get lighter thinking. Complex tasks trigger longer thinking passes.

The practical result: Opus 4.8 will never match Sonnet 4.6 or Haiku 4.5 for time-to-first-token. If speed is your priority and you do not need Opus-level capability, switch to Sonnet 4.6 with extended thinking off. The latency difference is dramatic — from 5–15 seconds (Opus) to 1–2 seconds (Sonnet, thinking off).

ModelThinking TypeCan DisableTTFT (Standard)TTFT (Complex Task)Output SpeedSpeed Rank
Claude Haiku 4.5Extended (optional)Yes~1s (off)15–45s (on)Fastest1st
Claude Sonnet 4.6Extended (optional)Yes~1–2s (off)20–60s (on)Fast2nd
Claude Opus 4.8Adaptive (always on)No~5–15s30–90sModerate4th
Output speed vs time-to-first-token: These are different. Claude Haiku generates visible tokens fastest once streaming starts. Opus generates them slower but the total response length matters too — Opus tends to write longer, more detailed responses. For the same task, Haiku may feel faster start-to-finish even on moderate tasks despite streaming more words per second.

Cause 3: Long Conversation Threads and Large Context

This is the silent performance killer that compounds over time in long conversations. Claude processes the entire conversation context (all previous messages plus any uploaded files) during the prefill phase before each new reply.

How prefill time scales with context

Prefill time is roughly linear with token count. Here is what this looks like in practice:

Context SizeWhat It Looks LikeApproximate Prefill Time
5,000 tokensFresh conversation, a few short messages<0.5s — imperceptible
20,000 tokensLong conversation (15–20 messages) or short document upload0.5–1.5s — barely noticeable
50,000 tokensLong thread with a large file attachment1–3s — noticeable
100,000 tokensMultiple large PDFs + extended conversation2–6s — adds to thinking delay
500,000 tokensLarge codebase or multiple book-length documents8–20s — significant
900,000+ tokensNear max context (Sonnet/Opus support up to 1M)15–40s before first token

These prefill times add to any thinking time. If you are using extended thinking on Sonnet in a 100K token context, you might see prefill (3–5s) + thinking (20–40s) + decode start = 25–45 seconds before the first word appears. This is not a bug. It is the model doing real work on a large amount of information.

Prompt caching dramatically reduces prefill cost: If you store frequently used documents in a Claude Project, they are cached. Cached tokens do not require full reprocessing during the prefill phase — they are retrieved from a cache that costs approximately 90% less compute. A 100K token cached context takes nearly the same time to prefill as a 10K token uncached one.

Cause 4: Infrastructure Load and Active Incidents

Claude runs on shared cloud GPU infrastructure. Response latency varies with demand — US business hours drive the highest load, and active incidents further degrade performance.

Current status — June 24, 2026: Elevated error rate on Claude Opus 4.8, under active investigation. Recent incidents on June 22–23 caused widespread slowdowns and errors. If Claude is responding more slowly than usual today, check status.claude.com before troubleshooting your setup.

Typical latency increase during peak hours (US 9am–6pm Eastern): 1.5–2× slower TTFT compared to off-peak. For Opus 4.8 with extended thinking, this can mean the difference between a 20-second and a 40-second wait on the same query.

Claude vs competitor latency

Model (Standard Mode)Avg TTFT — Peak HoursAvg TTFT — Off PeakStreaming Speed
Claude Haiku 4.5 (thinking off)1–2s<1sVery fast
Claude Sonnet 4.6 (thinking off)2–4s1–2sFast
Claude Opus 4.8 (adaptive thinking)8–20s5–12sModerate
GPT-4o (ChatGPT Plus)2–4s1–2sFast
GPT-4o mini (ChatGPT fallback)1–2s<1sVery fast
Gemini 1.5 Pro (Gemini Advanced)2–5s1–3sModerate
Claude Sonnet 4.6 (extended thinking on)20–60s15–45sFast (after thinking)

The table makes clear that standard Sonnet 4.6 is not slow relative to competitors. The perception of Claude being slow comes specifically from users who are using Opus 4.8 or have extended thinking enabled on Sonnet.

Cause 5: Claude Code and Agentic Tool Use

Claude Code and Research mode make multiple sequential model calls as part of their operation. Claude Code might run a terminal command, read the output, reason about it, then generate its next step — each step being its own inference pass. Research mode runs multiple web searches plus synthesis calls.

This is not Claude being slow. It is executing 3–10 model calls where a standard response uses 1. The total wall-clock time for a Research session or a complex Claude Code task is the sum of all individual inference steps. There is no way to make multi-step agentic tasks as fast as single-step responses — you are trading speed for capability.

6 Fixes Ranked by Impact

1
Turn off extended thinking — immediate and dramatic
This eliminates the single biggest source of pre-response delay. On Sonnet 4.6 and Haiku 4.5, the thinking toggle is in conversation controls. Switching from thinking-on to thinking-off changes TTFT from 20–60 seconds to 1–3 seconds for the same message. For most tasks (email, editing, quick Q&A, translation, summaries), the quality difference is negligible. Reserve thinking for complex code architecture, multi-step reasoning, and deep analysis.
2
Switch from Opus 4.8 to Sonnet 4.6 (thinking off)
Claude Opus 4.8 always uses adaptive thinking — there is no way to remove the thinking delay on Opus. If you need fast responses and your tasks do not require Opus-level reasoning depth, switch to Sonnet 4.6 with extended thinking off. TTFT drops from 5–20 seconds to 1–2 seconds. Sonnet 4.6 handles the vast majority of professional tasks — writing, editing, code, analysis — at a quality level most users cannot distinguish from Opus in blind testing.
3
Switch to Haiku 4.5 for high-volume quick tasks
Haiku 4.5 with extended thinking off produces the fastest responses of any Claude model — typically under 1 second TTFT in normal conditions. For tasks that do not require deep reasoning — reformatting text, short translations, quick lookups, light editing, generating structured data from templates — Haiku delivers fast, high-quality results at a fraction of the compute cost. Use the model picker in claude.ai to switch per conversation.
4
Start fresh conversations — reduces prefill overhead
When a conversation thread grows long (especially with file attachments), the prefill phase before each new response takes longer. Starting a new conversation with a one-paragraph recap of context resets the clock — prefill goes from potentially 5–10 seconds back down to under a second. You lose seamless continuity but gain faster, more efficient responses for the rest of the session. This is the second-most impactful fix after disabling thinking.
5
Move repeated context to Projects (prompt caching)
Documents stored in a Claude Project are cached. Cached context does not require full reprocessing during prefill — it is retrieved at ~10% of the original compute cost. If you regularly work with the same large document, codebase, or system prompt, Project storage can cut your prefill time by 80–90% on those repeated elements. The first call after caching takes normal time; every subsequent call in that session is dramatically faster.
6
Use off-peak hours or the API for latency-sensitive work
Response latency peaks during US business hours (9am–6pm Eastern). If you are not time-constrained, early morning or late evening sessions consistently show 30–50% lower TTFT. For latency-sensitive production applications, the Anthropic API typically shows lower and more consistent latency than the claude.ai web UI, because API calls bypass session management overhead. Developers building on Claude should always use the API, not scrape the UI.

Diagnosing What Is Actually Happening

What You SeeCauseDiagnosis ConfidenceFix
Thinking spinner for 10–90s, then normal streamingExtended thinking or adaptive thinking activeVery highTurn off thinking or switch to Sonnet/Haiku
Short spinner, then slow streaming that starts fine but deceleratesServer load or rate limiting mid-responseHighCheck status.claude.com; try off-peak
Normal response speed on first messages, slows after message 20–30Context length prefill scalingHighStart fresh conversation
Every Opus 4.8 response takes 10–20s even for simple questionsAdaptive thinking always on — expectedCertainSwitch to Sonnet 4.6 for speed
Spinner runs indefinitely, no response appearsServer error or connection issue — not normalHighCheck status.claude.com; refresh page
Response starts then cuts off mid-sentenceServer-side error during generationHighRefresh and retry; check status.claude.com
Everything is slower than usual todayActive infrastructure incident or peak loadMediumCheck status.claude.com first

Why Claude Can Feel Slower Than ChatGPT Even Though It Is Not

This perception comes from three factors:

1. Thinking is visible as delay. ChatGPT's default GPT-4o mode does not have extended thinking. ChatGPT o1 and o3 do have thinking modes with similar delays — but most Plus users are using GPT-4o without reasoning modes. Claude's more prominent placement of thinking-capable models in the default experience means users hit the thinking delay more often.

2. Claude Opus is the advertised flagship. Many users who hear "use the best model" try Opus 4.8, experience the 5–15 second TTFT from adaptive thinking, and conclude Claude is slow. Sonnet 4.6 with thinking off is not slow.

3. Longer, more structured responses. Claude tends to write more comprehensive, structured responses than ChatGPT by default. A 1,500-word structured answer takes longer to stream completely than a 300-word conversational reply, even at identical token-per-second rates. If you want faster-feeling responses, ask Claude to be more concise.

Frequently Asked Questions

Why does Claude pause for so long before responding? +
The most common cause is extended thinking mode being active. When enabled, Claude generates reasoning tokens internally before writing its visible response — this thinking phase takes 10–90 seconds depending on task complexity and model. The pause is not Claude being slow; it is working through the problem. Turn off extended thinking in conversation settings for faster responses on routine tasks.
How do I make Claude respond faster? +
Six fixes in order of impact: (1) Turn off extended thinking for non-complex tasks — removes the 10–90s thinking pause immediately; (2) Switch from Opus 4.8 to Sonnet 4.6 with thinking off — Opus always uses adaptive thinking you cannot disable; (3) Switch to Haiku 4.5 for quick tasks — fastest model; (4) Start a fresh conversation — long threads take longer to prefill; (5) Use off-peak hours; (6) Use the API directly for lower overhead than the web UI.
Is Claude slower than ChatGPT? +
Standard Sonnet 4.6 with thinking off begins streaming in 1–2 seconds — comparable to GPT-4o. Haiku 4.5 is typically faster than GPT-4o mini. The slowness users report is almost always from extended thinking on Sonnet/Haiku, or from using Opus 4.8 which always uses adaptive thinking. Claude is not inherently slower than ChatGPT, but its reasoning-capable models have a pre-response thinking delay that standard ChatGPT GPT-4o does not.
What is the difference between extended thinking and adaptive thinking? +
Extended thinking is an optional feature on Sonnet 4.6 and Haiku 4.5 — you toggle it on or off. Adaptive thinking is built into Opus 4.8 and cannot be turned off. Both involve generating internal reasoning tokens before the visible response. Adaptive thinking on Opus scales proportionally to task complexity — simple questions trigger lighter thinking passes. Extended thinking on Sonnet/Haiku runs regardless of task complexity when enabled.
Does Claude slow down in long conversations? +
Yes, measurably. Claude reprocesses the full conversation history during the prefill phase before each new response. The prefill time scales roughly linearly with context length. A 100-message conversation with file attachments can take 5–10 seconds just to prefill before thinking even begins. Starting a fresh conversation with a brief recap is the most effective fix for slowness in long threads.
🗺

Claude vs ChatGPT vs Gemini — Speed and Model Comparison

Latency benchmarks, model specs, pricing, and use-case verdicts — June 2026.

ChatGPT vs Claude — Full Comparison →

More in This Series