✓ Fact-checked June 24, 2026
Sources: Anthropic Docs · status.claude.com · platform.anthropic.com · Artificial Analysis benchmarks
Why Does Claude Take So Long to Respond?
You send a message and Claude sits there. The spinner runs. 20 seconds. 40 seconds. Nothing. Here is the complete technical breakdown of what is actually happening during that wait — every cause, with real latency numbers per model, and fixes ranked by impact.
Direct Answer
The most common cause is extended thinking mode being active. When enabled, Claude generates reasoning tokens internally before writing its visible response — this thinking phase takes 10–90 seconds on complex tasks. It is not broken; it is working. Standard Claude Sonnet 4.6 with thinking off begins streaming in 1–3 seconds, comparable to ChatGPT. Claude Opus 4.8 always uses adaptive thinking and cannot be made as fast as Sonnet. Turn off extended thinking for routine tasks and switch to Haiku 4.5 for maximum speed.
Claude Limits — Complete Guide Series
The Technical Pipeline: What Happens Between Send and Stream
Every Claude response goes through two distinct phases before you see any text. Understanding these phases explains most of what users experience as "slowness":
Phase 1
Prefill — processing your context
Before Claude generates any output, it must process all input tokens: your message, the entire conversation history, any uploaded documents, and the system prompt. This is called the prefill phase. It scales linearly with context length. A fresh conversation with one message prefills in milliseconds. A 40-message thread with two uploaded PDFs can take 3–8 seconds just to prefill. The prefill phase produces no visible output — the spinner runs silently.
Phase 2
Thinking tokens — internal reasoning (if enabled)
If extended thinking or adaptive thinking is active, Claude generates reasoning tokens after prefill but before its visible response. These tokens are not shown to you. They are Claude working through the problem — planning, considering alternatives, checking its reasoning. This phase adds 10–90 seconds before the first visible word appears. The longer and more complex the task, the more thinking tokens are generated. This is the dominant cause of long waits in the majority of user reports.
Phase 3
Decode — generating visible output
Once thinking is done, Claude begins generating visible response tokens. This is what you see streaming word by word. Decode speed depends on model (Haiku is fastest, Opus is slowest per token), current server load, and response length. Claude Haiku 4.5 generates tokens significantly faster than Opus 4.8, so even at identical task complexity, Haiku produces visible output faster after the first token appears.
Time-to-first-token (TTFT) is the metric that determines whether Claude feels fast or slow. It is the sum of prefill time + thinking time. The actual streaming speed after the first token appears is usually fine — users experience slowness as the wait before any text shows, not as slow streaming speed.
Cause 1: Extended Thinking Mode (Responsible for ~70% of complaints)
Extended thinking is available on Claude Sonnet 4.6 and Haiku 4.5 as an opt-in feature. When active, it dramatically improves output quality on complex tasks — but adds a significant thinking delay before any visible response.
| Task Type | Thinking On — Time to First Visible Word | Thinking Off — Time to First Visible Word | Quality Difference | Recommendation |
| Simple question or fact lookup | 10–25s | 1–2s | Negligible | Turn off |
| Email drafting | 10–30s | 1–2s | Minimal | Turn off |
| Summarising a document | 15–40s | 1–3s | Small — occasionally better structure | Usually off |
| Translation | 12–25s | 1–2s | Negligible for most languages | Turn off |
| Code debugging (small function) | 15–40s | 1–3s | Moderate — sometimes catches more edge cases | Situational |
| Complex code architecture review | 30–90s | 2–5s | Significant — reasoning quality is markedly better | Keep on |
| Multi-step logical reasoning problem | 40–90s | 2–5s | Significant — fewer reasoning errors | Keep on |
| Research synthesis across long documents | 30–70s | 2–5s | Moderate to significant | Keep on for deep analysis |
How to toggle extended thinking: In claude.ai, look for the thinking toggle in the conversation controls below the message box. On Haiku 4.5 and Sonnet 4.6, this is a toggle you control per conversation. On Claude Opus 4.8, you cannot turn it off — see Cause 2.
Cause 2: Using Opus 4.8 or a High-Tier Model
Claude Opus 4.8 uses adaptive thinking — a reasoning mode that is always active and cannot be disabled. Unlike extended thinking on Sonnet/Haiku (which you control), adaptive thinking on Opus engages automatically based on task complexity. Simple questions get lighter thinking. Complex tasks trigger longer thinking passes.
The practical result: Opus 4.8 will never match Sonnet 4.6 or Haiku 4.5 for time-to-first-token. If speed is your priority and you do not need Opus-level capability, switch to Sonnet 4.6 with extended thinking off. The latency difference is dramatic — from 5–15 seconds (Opus) to 1–2 seconds (Sonnet, thinking off).
| Model | Thinking Type | Can Disable | TTFT (Standard) | TTFT (Complex Task) | Output Speed | Speed Rank |
| Claude Haiku 4.5 | Extended (optional) | Yes | ~1s (off) | 15–45s (on) | Fastest | 1st |
| Claude Sonnet 4.6 | Extended (optional) | Yes | ~1–2s (off) | 20–60s (on) | Fast | 2nd |
| Claude Opus 4.8 | Adaptive (always on) | No | ~5–15s | 30–90s | Moderate | 4th |
Output speed vs time-to-first-token: These are different. Claude Haiku generates visible tokens fastest once streaming starts. Opus generates them slower but the total response length matters too — Opus tends to write longer, more detailed responses. For the same task, Haiku may feel faster start-to-finish even on moderate tasks despite streaming more words per second.
Cause 3: Long Conversation Threads and Large Context
This is the silent performance killer that compounds over time in long conversations. Claude processes the entire conversation context (all previous messages plus any uploaded files) during the prefill phase before each new reply.
How prefill time scales with context
Prefill time is roughly linear with token count. Here is what this looks like in practice:
| Context Size | What It Looks Like | Approximate Prefill Time |
| 5,000 tokens | Fresh conversation, a few short messages | <0.5s — imperceptible |
| 20,000 tokens | Long conversation (15–20 messages) or short document upload | 0.5–1.5s — barely noticeable |
| 50,000 tokens | Long thread with a large file attachment | 1–3s — noticeable |
| 100,000 tokens | Multiple large PDFs + extended conversation | 2–6s — adds to thinking delay |
| 500,000 tokens | Large codebase or multiple book-length documents | 8–20s — significant |
| 900,000+ tokens | Near max context (Sonnet/Opus support up to 1M) | 15–40s before first token |
These prefill times add to any thinking time. If you are using extended thinking on Sonnet in a 100K token context, you might see prefill (3–5s) + thinking (20–40s) + decode start = 25–45 seconds before the first word appears. This is not a bug. It is the model doing real work on a large amount of information.
Prompt caching dramatically reduces prefill cost: If you store frequently used documents in a Claude Project, they are cached. Cached tokens do not require full reprocessing during the prefill phase — they are retrieved from a cache that costs approximately 90% less compute. A 100K token cached context takes nearly the same time to prefill as a 10K token uncached one.
Cause 4: Infrastructure Load and Active Incidents
Claude runs on shared cloud GPU infrastructure. Response latency varies with demand — US business hours drive the highest load, and active incidents further degrade performance.
Current status — June 24, 2026: Elevated error rate on Claude Opus 4.8, under active investigation. Recent incidents on June 22–23 caused widespread slowdowns and errors. If Claude is responding more slowly than usual today, check
status.claude.com before troubleshooting your setup.
Typical latency increase during peak hours (US 9am–6pm Eastern): 1.5–2× slower TTFT compared to off-peak. For Opus 4.8 with extended thinking, this can mean the difference between a 20-second and a 40-second wait on the same query.
Claude vs competitor latency
| Model (Standard Mode) | Avg TTFT — Peak Hours | Avg TTFT — Off Peak | Streaming Speed |
| Claude Haiku 4.5 (thinking off) | 1–2s | <1s | Very fast |
| Claude Sonnet 4.6 (thinking off) | 2–4s | 1–2s | Fast |
| Claude Opus 4.8 (adaptive thinking) | 8–20s | 5–12s | Moderate |
| GPT-4o (ChatGPT Plus) | 2–4s | 1–2s | Fast |
| GPT-4o mini (ChatGPT fallback) | 1–2s | <1s | Very fast |
| Gemini 1.5 Pro (Gemini Advanced) | 2–5s | 1–3s | Moderate |
| Claude Sonnet 4.6 (extended thinking on) | 20–60s | 15–45s | Fast (after thinking) |
The table makes clear that standard Sonnet 4.6 is not slow relative to competitors. The perception of Claude being slow comes specifically from users who are using Opus 4.8 or have extended thinking enabled on Sonnet.
Cause 5: Claude Code and Agentic Tool Use
Claude Code and Research mode make multiple sequential model calls as part of their operation. Claude Code might run a terminal command, read the output, reason about it, then generate its next step — each step being its own inference pass. Research mode runs multiple web searches plus synthesis calls.
This is not Claude being slow. It is executing 3–10 model calls where a standard response uses 1. The total wall-clock time for a Research session or a complex Claude Code task is the sum of all individual inference steps. There is no way to make multi-step agentic tasks as fast as single-step responses — you are trading speed for capability.
6 Fixes Ranked by Impact
1
Turn off extended thinking — immediate and dramatic
This eliminates the single biggest source of pre-response delay. On Sonnet 4.6 and Haiku 4.5, the thinking toggle is in conversation controls. Switching from thinking-on to thinking-off changes TTFT from 20–60 seconds to 1–3 seconds for the same message. For most tasks (email, editing, quick Q&A, translation, summaries), the quality difference is negligible. Reserve thinking for complex code architecture, multi-step reasoning, and deep analysis.
2
Switch from Opus 4.8 to Sonnet 4.6 (thinking off)
Claude Opus 4.8 always uses adaptive thinking — there is no way to remove the thinking delay on Opus. If you need fast responses and your tasks do not require Opus-level reasoning depth, switch to Sonnet 4.6 with extended thinking off. TTFT drops from 5–20 seconds to 1–2 seconds. Sonnet 4.6 handles the vast majority of professional tasks — writing, editing, code, analysis — at a quality level most users cannot distinguish from Opus in blind testing.
3
Switch to Haiku 4.5 for high-volume quick tasks
Haiku 4.5 with extended thinking off produces the fastest responses of any Claude model — typically under 1 second TTFT in normal conditions. For tasks that do not require deep reasoning — reformatting text, short translations, quick lookups, light editing, generating structured data from templates — Haiku delivers fast, high-quality results at a fraction of the compute cost. Use the model picker in claude.ai to switch per conversation.
4
Start fresh conversations — reduces prefill overhead
When a conversation thread grows long (especially with file attachments), the prefill phase before each new response takes longer. Starting a new conversation with a one-paragraph recap of context resets the clock — prefill goes from potentially 5–10 seconds back down to under a second. You lose seamless continuity but gain faster, more efficient responses for the rest of the session. This is the second-most impactful fix after disabling thinking.
5
Move repeated context to Projects (prompt caching)
Documents stored in a Claude Project are cached. Cached context does not require full reprocessing during prefill — it is retrieved at ~10% of the original compute cost. If you regularly work with the same large document, codebase, or system prompt, Project storage can cut your prefill time by 80–90% on those repeated elements. The first call after caching takes normal time; every subsequent call in that session is dramatically faster.
6
Use off-peak hours or the API for latency-sensitive work
Response latency peaks during US business hours (9am–6pm Eastern). If you are not time-constrained, early morning or late evening sessions consistently show 30–50% lower TTFT. For latency-sensitive production applications, the Anthropic API typically shows lower and more consistent latency than the claude.ai web UI, because API calls bypass session management overhead. Developers building on Claude should always use the API, not scrape the UI.
Diagnosing What Is Actually Happening
| What You See | Cause | Diagnosis Confidence | Fix |
| Thinking spinner for 10–90s, then normal streaming | Extended thinking or adaptive thinking active | Very high | Turn off thinking or switch to Sonnet/Haiku |
| Short spinner, then slow streaming that starts fine but decelerates | Server load or rate limiting mid-response | High | Check status.claude.com; try off-peak |
| Normal response speed on first messages, slows after message 20–30 | Context length prefill scaling | High | Start fresh conversation |
| Every Opus 4.8 response takes 10–20s even for simple questions | Adaptive thinking always on — expected | Certain | Switch to Sonnet 4.6 for speed |
| Spinner runs indefinitely, no response appears | Server error or connection issue — not normal | High | Check status.claude.com; refresh page |
| Response starts then cuts off mid-sentence | Server-side error during generation | High | Refresh and retry; check status.claude.com |
| Everything is slower than usual today | Active infrastructure incident or peak load | Medium | Check status.claude.com first |
Why Claude Can Feel Slower Than ChatGPT Even Though It Is Not
This perception comes from three factors:
1. Thinking is visible as delay. ChatGPT's default GPT-4o mode does not have extended thinking. ChatGPT o1 and o3 do have thinking modes with similar delays — but most Plus users are using GPT-4o without reasoning modes. Claude's more prominent placement of thinking-capable models in the default experience means users hit the thinking delay more often.
2. Claude Opus is the advertised flagship. Many users who hear "use the best model" try Opus 4.8, experience the 5–15 second TTFT from adaptive thinking, and conclude Claude is slow. Sonnet 4.6 with thinking off is not slow.
3. Longer, more structured responses. Claude tends to write more comprehensive, structured responses than ChatGPT by default. A 1,500-word structured answer takes longer to stream completely than a 300-word conversational reply, even at identical token-per-second rates. If you want faster-feeling responses, ask Claude to be more concise.
Frequently Asked Questions
Why does Claude pause for so long before responding? +
The most common cause is extended thinking mode being active. When enabled, Claude generates reasoning tokens internally before writing its visible response — this thinking phase takes 10–90 seconds depending on task complexity and model. The pause is not Claude being slow; it is working through the problem. Turn off extended thinking in conversation settings for faster responses on routine tasks.
How do I make Claude respond faster? +
Six fixes in order of impact: (1) Turn off extended thinking for non-complex tasks — removes the 10–90s thinking pause immediately; (2) Switch from Opus 4.8 to Sonnet 4.6 with thinking off — Opus always uses adaptive thinking you cannot disable; (3) Switch to Haiku 4.5 for quick tasks — fastest model; (4) Start a fresh conversation — long threads take longer to prefill; (5) Use off-peak hours; (6) Use the API directly for lower overhead than the web UI.
Is Claude slower than ChatGPT? +
Standard Sonnet 4.6 with thinking off begins streaming in 1–2 seconds — comparable to GPT-4o. Haiku 4.5 is typically faster than GPT-4o mini. The slowness users report is almost always from extended thinking on Sonnet/Haiku, or from using Opus 4.8 which always uses adaptive thinking. Claude is not inherently slower than ChatGPT, but its reasoning-capable models have a pre-response thinking delay that standard ChatGPT GPT-4o does not.
What is the difference between extended thinking and adaptive thinking? +
Extended thinking is an optional feature on Sonnet 4.6 and Haiku 4.5 — you toggle it on or off. Adaptive thinking is built into Opus 4.8 and cannot be turned off. Both involve generating internal reasoning tokens before the visible response. Adaptive thinking on Opus scales proportionally to task complexity — simple questions trigger lighter thinking passes. Extended thinking on Sonnet/Haiku runs regardless of task complexity when enabled.
Does Claude slow down in long conversations? +
Yes, measurably. Claude reprocesses the full conversation history during the prefill phase before each new response. The prefill time scales roughly linearly with context length. A 100-message conversation with file attachments can take 5–10 seconds just to prefill before thinking even begins. Starting a fresh conversation with a brief recap is the most effective fix for slowness in long threads.
More in This Series