The AI Map › Claude Response Speed

✓ Fact-checked June 24, 2026 Sources: Anthropic Docs · status.claude.com · platform.anthropic.com · Artificial Analysis benchmarks

Why Does Claude Take So Long to Respond?

Q: How do I make Claude respond faster?

Six fixes in order of impact: (1) Turn off extended thinking for non-complex tasks — eliminates the 10–90s thinking pause immediately; (2) Switch from Opus 4.8 to Sonnet 4.6 with thinking off — Opus always uses adaptive thinking; (3) Switch to Haiku 4.5 for quick tasks — the fastest model by time-to-first-token; (4) Start a fresh conversation — long threads take longer to prefill before each reply; (5) Use off-peak hours — latency is lower outside US business hours; (6) Use the API directly for developers — lower overhead than the web UI.

Q: What is the difference between extended thinking and adaptive thinking in Claude?

Extended thinking is an optional feature on Claude Sonnet 4.6 and Haiku 4.5 — you can toggle it on or off. Adaptive thinking is built into Claude Opus 4.8 and cannot be turned off. Both involve Claude generating internal reasoning tokens before its visible response, but adaptive thinking on Opus is tuned to activate proportionally to task complexity, while extended thinking on Sonnet/Haiku runs a full reasoning pass regardless of task difficulty.

Q: Does Claude slow down in long conversations?

Yes — measurably. Claude reprocesses the full conversation history (the prefill phase) before generating each new response. The prefill phase scales linearly with context length. A conversation with 40 messages and several uploaded documents takes significantly longer to prefill than a fresh conversation. Starting a new chat with a brief recap is the most effective fix for slowness in long threads.

You send a message and Claude sits there. The spinner runs. 20 seconds. 40 seconds. Nothing. Here is the complete technical breakdown of what is actually happening during that wait — every cause, with real latency numbers per model, and fixes ranked by impact.

Direct Answer

The most common cause is extended thinking mode being active. When enabled, Claude generates reasoning tokens internally before writing its visible response — this thinking phase takes 10–90 seconds on complex tasks. It is not broken; it is working. Standard Claude Sonnet 4.6 with thinking off begins streaming in 1–3 seconds, comparable to ChatGPT. Claude Opus 4.8 always uses adaptive thinking and cannot be made as fast as Sonnet. Turn off extended thinking for routine tasks and switch to Haiku 4.5 for maximum speed.

Claude Limits — Complete Guide Series

Why does Claude run out so fast? → Why does Claude have a limit but ChatGPT doesn't? → Why does Claude take so long to respond? You are here Why does Claude take so long to load? → Why does Claude say "this organization has been disabled"? → Why does Claude need my phone number? →

The Technical Pipeline: What Happens Between Send and Stream

Every Claude response goes through two distinct phases before you see any text. Understanding these phases explains most of what users experience as "slowness":

Phase 1

Prefill — processing your context

Before Claude generates any output, it must process all input tokens: your message, the entire conversation history, any uploaded documents, and the system prompt. This is called the prefill phase. It scales linearly with context length. A fresh conversation with one message prefills in milliseconds. A 40-message thread with two uploaded PDFs can take 3–8 seconds just to prefill. The prefill phase produces no visible output — the spinner runs silently.

Phase 2

Thinking tokens — internal reasoning (if enabled)

If extended thinking or adaptive thinking is active, Claude generates reasoning tokens after prefill but before its visible response. These tokens are not shown to you. They are Claude working through the problem — planning, considering alternatives, checking its reasoning. This phase adds 10–90 seconds before the first visible word appears. The longer and more complex the task, the more thinking tokens are generated. This is the dominant cause of long waits in the majority of user reports.

Phase 3

Decode — generating visible output

Once thinking is done, Claude begins generating visible response tokens. This is what you see streaming word by word. Decode speed depends on model (Haiku is fastest, Opus is slowest per token), current server load, and response length. Claude Haiku 4.5 generates tokens significantly faster than Opus 4.8, so even at identical task complexity, Haiku produces visible output faster after the first token appears.

Time-to-first-token (TTFT) is the metric that determines whether Claude feels fast or slow. It is the sum of prefill time + thinking time. The actual streaming speed after the first token appears is usually fine — users experience slowness as the wait before any text shows, not as slow streaming speed.

Cause 1: Extended Thinking Mode (Responsible for ~70% of complaints)

Extended thinking is available on Claude Sonnet 4.6 and Haiku 4.5 as an opt-in feature. When active, it dramatically improves output quality on complex tasks — but adds a significant thinking delay before any visible response.

Task Type	Thinking On — Time to First Visible Word	Thinking Off — Time to First Visible Word	Quality Difference	Recommendation
Simple question or fact lookup	10–25s	1–2s	Negligible	Turn off
Email drafting	10–30s	1–2s	Minimal	Turn off
Summarising a document	15–40s	1–3s	Small — occasionally better structure	Usually off
Translation	12–25s	1–2s	Negligible for most languages	Turn off
Code debugging (small function)	15–40s	1–3s	Moderate — sometimes catches more edge cases	Situational
Complex code architecture review	30–90s	2–5s	Significant — reasoning quality is markedly better	Keep on
Multi-step logical reasoning problem	40–90s	2–5s	Significant — fewer reasoning errors	Keep on
Research synthesis across long documents	30–70s	2–5s	Moderate to significant	Keep on for deep analysis

How to toggle extended thinking: In claude.ai, look for the thinking toggle in the conversation controls below the message box. On Haiku 4.5 and Sonnet 4.6, this is a toggle you control per conversation. On Claude Opus 4.8, you cannot turn it off — see Cause 2.

Cause 2: Using Opus 4.8 or a High-Tier Model

Claude Opus 4.8 uses adaptive thinking — a reasoning mode that is always active and cannot be disabled. Unlike extended thinking on Sonnet/Haiku (which you control), adaptive thinking on Opus engages automatically based on task complexity. Simple questions get lighter thinking. Complex tasks trigger longer thinking passes.

The practical result: Opus 4.8 will never match Sonnet 4.6 or Haiku 4.5 for time-to-first-token. If speed is your priority and you do not need Opus-level capability, switch to Sonnet 4.6 with extended thinking off. The latency difference is dramatic — from 5–15 seconds (Opus) to 1–2 seconds (Sonnet, thinking off).

Model	Thinking Type	Can Disable	TTFT (Standard)	TTFT (Complex Task)	Output Speed	Speed Rank
Claude Haiku 4.5	Extended (optional)	Yes	~1s (off)	15–45s (on)	Fastest	1st
Claude Sonnet 4.6	Extended (optional)	Yes	~1–2s (off)	20–60s (on)	Fast	2nd
Claude Opus 4.8	Adaptive (always on)	No	~5–15s	30–90s	Moderate	4th

Output speed vs time-to-first-token: These are different. Claude Haiku generates visible tokens fastest once streaming starts. Opus generates them slower but the total response length matters too — Opus tends to write longer, more detailed responses. For the same task, Haiku may feel faster start-to-finish even on moderate tasks despite streaming more words per second.

Cause 3: Long Conversation Threads and Large Context

This is the silent performance killer that compounds over time in long conversations. Claude processes the entire conversation context (all previous messages plus any uploaded files) during the prefill phase before each new reply.

How prefill time scales with context

Prefill time is roughly linear with token count. Here is what this looks like in practice:

Context Size	What It Looks Like	Approximate Prefill Time
5,000 tokens	Fresh conversation, a few short messages	<0.5s — imperceptible
20,000 tokens	Long conversation (15–20 messages) or short document upload	0.5–1.5s — barely noticeable
50,000 tokens	Long thread with a large file attachment	1–3s — noticeable
100,000 tokens	Multiple large PDFs + extended conversation	2–6s — adds to thinking delay
500,000 tokens	Large codebase or multiple book-length documents	8–20s — significant
900,000+ tokens	Near max context (Sonnet/Opus support up to 1M)	15–40s before first token

These prefill times add to any thinking time. If you are using extended thinking on Sonnet in a 100K token context, you might see prefill (3–5s) + thinking (20–40s) + decode start = 25–45 seconds before the first word appears. This is not a bug. It is the model doing real work on a large amount of information.

Prompt caching dramatically reduces prefill cost: If you store frequently used documents in a Claude Project, they are cached. Cached tokens do not require full reprocessing during the prefill phase — they are retrieved from a cache that costs approximately 90% less compute. A 100K token cached context takes nearly the same time to prefill as a 10K token uncached one.

Cause 4: Infrastructure Load and Active Incidents

Claude runs on shared cloud GPU infrastructure. Response latency varies with demand — US business hours drive the highest load, and active incidents further degrade performance.

Current status — June 24, 2026: Elevated error rate on Claude Opus 4.8, under active investigation. Recent incidents on June 22–23 caused widespread slowdowns and errors. If Claude is responding more slowly than usual today, check status.claude.com before troubleshooting your setup.

Typical latency increase during peak hours (US 9am–6pm Eastern): 1.5–2× slower TTFT compared to off-peak. For Opus 4.8 with extended thinking, this can mean the difference between a 20-second and a 40-second wait on the same query.

Claude vs competitor latency

Model (Standard Mode)	Avg TTFT — Peak Hours	Avg TTFT — Off Peak	Streaming Speed
Claude Haiku 4.5 (thinking off)	1–2s	<1s	Very fast
Claude Sonnet 4.6 (thinking off)	2–4s	1–2s	Fast
Claude Opus 4.8 (adaptive thinking)	8–20s	5–12s	Moderate
GPT-4o (ChatGPT Plus)	2–4s	1–2s	Fast
GPT-4o mini (ChatGPT fallback)	1–2s	<1s	Very fast
Gemini 1.5 Pro (Gemini Advanced)	2–5s	1–3s	Moderate
Claude Sonnet 4.6 (extended thinking on)	20–60s	15–45s	Fast (after thinking)

The table makes clear that standard Sonnet 4.6 is not slow relative to competitors. The perception of Claude being slow comes specifically from users who are using Opus 4.8 or have extended thinking enabled on Sonnet.

Cause 5: Claude Code and Agentic Tool Use

Claude Code and Research mode make multiple sequential model calls as part of their operation. Claude Code might run a terminal command, read the output, reason about it, then generate its next step — each step being its own inference pass. Research mode runs multiple web searches plus synthesis calls.

This is not Claude being slow. It is executing 3–10 model calls where a standard response uses 1. The total wall-clock time for a Research session or a complex Claude Code task is the sum of all individual inference steps. There is no way to make multi-step agentic tasks as fast as single-step responses — you are trading speed for capability.

6 Fixes Ranked by Impact

Turn off extended thinking — immediate and dramatic

This eliminates the single biggest source of pre-response delay. On Sonnet 4.6 and Haiku 4.5, the thinking toggle is in conversation controls. Switching from thinking-on to thinking-off changes TTFT from 20–60 seconds to 1–3 seconds for the same message. For most tasks (email, editing, quick Q&A, translation, summaries), the quality difference is negligible. Reserve thinking for complex code architecture, multi-step reasoning, and deep analysis.

Switch from Opus 4.8 to Sonnet 4.6 (thinking off)

Claude Opus 4.8 always uses adaptive thinking — there is no way to remove the thinking delay on Opus. If you need fast responses and your tasks do not require Opus-level reasoning depth, switch to Sonnet 4.6 with extended thinking off. TTFT drops from 5–20 seconds to 1–2 seconds. Sonnet 4.6 handles the vast majority of professional tasks — writing, editing, code, analysis — at a quality level most users cannot distinguish from Opus in blind testing.

Switch to Haiku 4.5 for high-volume quick tasks

Haiku 4.5 with extended thinking off produces the fastest responses of any Claude model — typically under 1 second TTFT in normal conditions. For tasks that do not require deep reasoning — reformatting text, short translations, quick lookups, light editing, generating structured data from templates — Haiku delivers fast, high-quality results at a fraction of the compute cost. Use the model picker in claude.ai to switch per conversation.

Start fresh conversations — reduces prefill overhead

When a conversation thread grows long (especially with file attachments), the prefill phase before each new response takes longer. Starting a new conversation with a one-paragraph recap of context resets the clock — prefill goes from potentially 5–10 seconds back down to under a second. You lose seamless continuity but gain faster, more efficient responses for the rest of the session. This is the second-most impactful fix after disabling thinking.

Move repeated context to Projects (prompt caching)

Documents stored in a Claude Project are cached. Cached context does not require full reprocessing during prefill — it is retrieved at ~10% of the original compute cost. If you regularly work with the same large document, codebase, or system prompt, Project storage can cut your prefill time by 80–90% on those repeated elements. The first call after caching takes normal time; every subsequent call in that session is dramatically faster.

Use off-peak hours or the API for latency-sensitive work

Response latency peaks during US business hours (9am–6pm Eastern). If you are not time-constrained, early morning or late evening sessions consistently show 30–50% lower TTFT. For latency-sensitive production applications, the Anthropic API typically shows lower and more consistent latency than the claude.ai web UI, because API calls bypass session management overhead. Developers building on Claude should always use the API, not scrape the UI.

Diagnosing What Is Actually Happening

What You See	Cause	Diagnosis Confidence	Fix
Thinking spinner for 10–90s, then normal streaming	Extended thinking or adaptive thinking active	Very high	Turn off thinking or switch to Sonnet/Haiku
Short spinner, then slow streaming that starts fine but decelerates	Server load or rate limiting mid-response	High	Check status.claude.com; try off-peak
Normal response speed on first messages, slows after message 20–30	Context length prefill scaling	High	Start fresh conversation
Every Opus 4.8 response takes 10–20s even for simple questions	Adaptive thinking always on — expected	Certain	Switch to Sonnet 4.6 for speed
Spinner runs indefinitely, no response appears	Server error or connection issue — not normal	High	Check status.claude.com; refresh page
Response starts then cuts off mid-sentence	Server-side error during generation	High	Refresh and retry; check status.claude.com
Everything is slower than usual today	Active infrastructure incident or peak load	Medium	Check status.claude.com first

Why Claude Can Feel Slower Than ChatGPT Even Though It Is Not

This perception comes from three factors:

1. Thinking is visible as delay. ChatGPT's default GPT-4o mode does not have extended thinking. ChatGPT o1 and o3 do have thinking modes with similar delays — but most Plus users are using GPT-4o without reasoning modes. Claude's more prominent placement of thinking-capable models in the default experience means users hit the thinking delay more often.

2. Claude Opus is the advertised flagship. Many users who hear "use the best model" try Opus 4.8, experience the 5–15 second TTFT from adaptive thinking, and conclude Claude is slow. Sonnet 4.6 with thinking off is not slow.

3. Longer, more structured responses. Claude tends to write more comprehensive, structured responses than ChatGPT by default. A 1,500-word structured answer takes longer to stream completely than a 300-word conversational reply, even at identical token-per-second rates. If you want faster-feeling responses, ask Claude to be more concise.

Frequently Asked Questions

Why does Claude pause for so long before responding? +

The most common cause is extended thinking mode being active. When enabled, Claude generates reasoning tokens internally before writing its visible response — this thinking phase takes 10–90 seconds depending on task complexity and model. The pause is not Claude being slow; it is working through the problem. Turn off extended thinking in conversation settings for faster responses on routine tasks.

How do I make Claude respond faster? +

Six fixes in order of impact: (1) Turn off extended thinking for non-complex tasks — removes the 10–90s thinking pause immediately; (2) Switch from Opus 4.8 to Sonnet 4.6 with thinking off — Opus always uses adaptive thinking you cannot disable; (3) Switch to Haiku 4.5 for quick tasks — fastest model; (4) Start a fresh conversation — long threads take longer to prefill; (5) Use off-peak hours; (6) Use the API directly for lower overhead than the web UI.

Is Claude slower than ChatGPT? +

Standard Sonnet 4.6 with thinking off begins streaming in 1–2 seconds — comparable to GPT-4o. Haiku 4.5 is typically faster than GPT-4o mini. The slowness users report is almost always from extended thinking on Sonnet/Haiku, or from using Opus 4.8 which always uses adaptive thinking. Claude is not inherently slower than ChatGPT, but its reasoning-capable models have a pre-response thinking delay that standard ChatGPT GPT-4o does not.

What is the difference between extended thinking and adaptive thinking? +

Extended thinking is an optional feature on Sonnet 4.6 and Haiku 4.5 — you toggle it on or off. Adaptive thinking is built into Opus 4.8 and cannot be turned off. Both involve generating internal reasoning tokens before the visible response. Adaptive thinking on Opus scales proportionally to task complexity — simple questions trigger lighter thinking passes. Extended thinking on Sonnet/Haiku runs regardless of task complexity when enabled.

Does Claude slow down in long conversations? +

Yes, measurably. Claude reprocesses the full conversation history during the prefill phase before each new response. The prefill time scales roughly linearly with context length. A 100-message conversation with file attachments can take 5–10 seconds just to prefill before thinking even begins. Starting a fresh conversation with a brief recap is the most effective fix for slowness in long threads.

🗺

Claude vs ChatGPT vs Gemini — Speed and Model Comparison

Latency benchmarks, model specs, pricing, and use-case verdicts — June 2026.

ChatGPT vs Claude — Full Comparison →

More in This Series

Why does Claude run out so fast? → Why does Claude have a limit but ChatGPT doesn't? → Why does Claude take so long to load? → Why does Claude say "this organization has been disabled"? → Why does Claude need my phone number? →