Revisiting the Claude Blog #98: 1M Context GA, the End of the RAG Era (2026)

panicdev · May 3, 2026

AI Anthropic Claude ContextWindow LLM Opus46 Sonnet46 BlogReview

Source information

Title: 1M context is now generally available for Opus 4.6 and Sonnet 4.6
Link: claude.com/blog/1m-context-ga
Published: March 13, 2026
Category: Claude Platform / Claude Code (Agents, Coding)

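Gist of the post

The 1M-token context window is now GA (generally available) for Claude Opus 4.6 and Sonnet 4.6. There is no long-context premium: a 9K request and a 900K request are billed at the same per-token rate. On the 1M variant of MRCR v2, Opus 4.6 scores 78.3%, roughly 3× Gemini 3 Pro and roughly 4× the previous best Claude. In Claude Code, Opus 1M is the default on Max/Team/Enterprise plans, while Pro and Sonnet users opt in with /extra-usage.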

Key change: same price

Quoted (Joe Njenga, Medium):

"There's no long-context premium. A 900K-token request costs the same per-token rate as a 9K one. For Opus 4.6, you're looking at $5 input and $25 output per million tokens. Sonnet 4.6 comes in at $3 input and $15 output per million tokens. Standard pricing across the full window."

Before:

Opus 4.6 1M context = beta
Premium above 200K ($10 / $37.50 per MTok)
Separate price tier

Now:

GA
One price across the full window
Opus: $5 / $25 per MTok
Sonnet: $3 / $15 per MTok
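
To make the before/after concrete, here is a minimal sketch that prices one Opus 4.6 request under both schemes, using the per-MTok rates listed above. It assumes the old premium applied to the whole request once the input exceeded 200K tokens; that billing rule is an assumption for illustration, and the function names are hypothetical.

```python
# Rates in USD per million tokens (MTok), taken from the figures above.
OPUS_STANDARD = (5.00, 25.00)   # (input, output)
OPUS_PREMIUM = (10.00, 37.50)   # old beta rate above 200K input tokens
PREMIUM_THRESHOLD = 200_000


def request_cost(input_tokens, output_tokens, rates):
    """Cost in USD of one request at the given (input, output) rates."""
    in_rate, out_rate = rates
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def old_cost(input_tokens, output_tokens):
    # ASSUMPTION: the premium rate covered the whole request once the
    # input crossed the threshold.
    rates = OPUS_PREMIUM if input_tokens > PREMIUM_THRESHOLD else OPUS_STANDARD
    return request_cost(input_tokens, output_tokens, rates)


def new_cost(input_tokens, output_tokens):
    # GA pricing: one rate across the full 1M window.
    return request_cost(input_tokens, output_tokens, OPUS_STANDARD)


if __name__ == "__main__":
    for tokens in (9_000, 900_000):
        print(tokens, old_cost(tokens, 2_000), new_cost(tokens, 2_000))
```

Under these assumptions a 900K-input request with no output comes to $9.00 before and $4.50 now, while a 9K request costs the same either way.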

MRCR v2 benchmark

Emphasized (kara0zieminski):

"Claude Opus 4.6 scores nearly 3x higher than Gemini 3 Pro and over 4x higher than the previous best Claude model. Opus 4.6 finds roughly 4x more facts than the previous best Claude. And 3x more than Gemini at the same context length."

Model	1M MRCR v2 score
Opus 4.6	78.3%
Sonnet 4.5 (previous)	18.5%
Gemini 3 Pro	~25%
Other Claude models	~20%
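
As a quick sanity check, the multiples in the quote can be recomputed from the table. This is only arithmetic on the reported numbers; the Gemini 3 Pro figure is approximate in the table, so its ratio is approximate too.

```python
# Scores from the table above (percent on the 1M variant of MRCR v2).
scores = {
    "Opus 4.6": 78.3,
    "Sonnet 4.5 (previous)": 18.5,
    "Gemini 3 Pro": 25.0,  # the table gives this only approximately (~25%)
}


def ratio_vs(baseline):
    """How many times higher Opus 4.6 scores than the given baseline."""
    return scores["Opus 4.6"] / scores[baseline]


if __name__ == "__main__":
    for name in ("Sonnet 4.5 (previous)", "Gemini 3 Pro"):
        print(f"Opus 4.6 vs {name}: {ratio_vs(name):.2f}x")
```

The results, about 4.2× and about 3.1×, line up with the "over 4x" and roughly 3× claims.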

This is a quantitative measure of "context rot":

Not merely "1M is possible"
But "accurate at 1M"

What 1M tokens means

From the text (Dev.to):

~750,000 words
~10 novels
A very large codebase
Thousands of pages of contracts
The full trace of a long-running agent

Claude Code default

Boris Cherny (head of Claude Code):

"Opus 4.6 1M is now the default Opus model for Claude Code users on Max, Team, and Enterprise plans. Pro and Sonnet users can opt in with /extra-usage."

(Max/Team/Enterprise = automatic; Pro/Sonnet = opt-in via /extra-usage)

Enabled automatically:

Opus users (Max/Team/Enterprise) = 1M by default
No additional cost
No configuration needed

Customizing auto-compact

The CLAUDE_CODE_AUTO_COMPACT_WINDOW environment variable:

Adjusts the auto-compact threshold
Lets you tune it to your usage pattern

Use cases

Identified in the text (Dev.to):

Enterprise teams (large proprietary document sets)
Legal tech (contract analysis, discovery)
Healthcare informatics (clinical notes)
Financial services (regulatory filings, earnings)
AI-native startups (reducing RAG dependence)

Limitations

From the text (kara0zieminski):

Cost adds up fast: a 900K session = $4.50 in input alone
Token trap: dangerous in a production loop
"Dumb zone": it finds facts but ignores earlier decisions (an HN user)

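Rereading in 2026: what I saw

1. The price war behind "No Long-Context Premium"

The most important change in this post is the simpler pricing model.

Old pricing:

Up to 200K: standard
200K-1M: premium (2× input, 1.5× output)
Cognitive overhead for users

New pricing:

Up to 1M: one price
Simple decision
"Use as much as you like"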

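Compared with other companies:

OpenAI (GPT-4o): 128K, premium pricing
Google Gemini: 1M available, but on an expensive plan ($200/month)
Anthropic: 1M at standard pricing ← the most aggressive

This is a price-war signal:

The message that it is not expensive
No decision fatigue for users
Enterprise trust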

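2. A real measurement of "context rot"

As mentioned in the Opus 4.6 launch post (#85):

"context rot" = performance drops as the token count grows

What this post means:

1M = possible
78.3% accuracy at 1M = genuinely usable

Compare Gemini's 1M (available for a long time):

Huge window
But accuracy drops
"Big in number only"

How Anthropic differs:

Model capability + window size
Both advance together
"A 1M you can actually use"

This "big window + accuracy" combination is the real value.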

3. The architecture shift behind "reducing RAG dependence"

Dev.to quote:

"AI-native startups who want to simplify their architecture by reducing dependence on complex RAG pipelines."

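(Startups that want to reduce their dependence on RAG pipelines)

Traditional architecture:

Large document → vector DB (chunks)
Query → retrieve the relevant chunks
Pass the chunks to the LLM
"Retrieval Augmented Generation (RAG)"

New architecture:

Large document → directly into context
Query
The LLM sees everything
"Long Context Generation"

Limits of RAG:

Information lost at chunk boundaries
Semantic search accuracy
Complex infrastructure (vector DB)

Advantages of long context:

All information is visible
Semantic matching comes for free
Simple architecture

An analogy:

RAG = "pages from an indexed book"
Long context = "reading the whole book"

This shift accelerates the simplification of AI architecture.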
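
A toy sketch of the two prompt-building paths described above. The retriever is a plain word-overlap ranker standing in for a vector DB, and `chunk_text`, `retrieve`, `build_rag_prompt`, and `build_long_context_prompt` are hypothetical names chosen for illustration. A real RAG pipeline would use embeddings and an actual store.

```python
import re


def chunk_text(text, chunk_words=50):
    """Split text into fixed-size word chunks (the RAG indexing step)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]


def _tokens(s):
    """Lowercased set of alphanumeric words in s."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(chunks, query, k=2):
    """Rank chunks by word overlap with the query (stand-in for vector search)."""
    q = _tokens(query)
    ranked = sorted(chunks, key=lambda c: len(q & _tokens(c)), reverse=True)
    return ranked[:k]


def build_rag_prompt(document, query, k=2, chunk_words=50):
    # The model sees only the top-k chunks.
    chunks = chunk_text(document, chunk_words)
    context = "\n---\n".join(retrieve(chunks, query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


def build_long_context_prompt(document, query):
    # The model sees the whole document: no index, no chunk boundaries.
    return f"Context:\n{document}\n\nQuestion: {query}"
```

The long-context path is a single function with no indexing step, which is the architectural simplification the section describes. The trade-off is that every call pays for the full document.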

4. The cost math behind "$4.50 per 900K session"

Cost analysis:

900K input × $5/MTok = $4.50
+ output cost
~$10 per run (a rough, generous estimate)

This separates "experiment" from "production":

1 run: $10 = OK
100 runs/day: $1,000/day = $30K/month
Risky in production
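
The arithmetic above can be written out as a small sketch. The rates are the Opus 4.6 figures quoted earlier. The 20K-token output size is an illustrative assumption, and the $10-per-run figure is the rough estimate from the list above, not a measured value.

```python
def run_cost(input_tokens, output_tokens, in_rate=5.0, out_rate=25.0):
    """USD cost of one run; rates are per million tokens (Opus 4.6 defaults)."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def monthly_cost(cost_per_run, runs_per_day, days=30):
    """USD cost of running the same job every day for a month."""
    return cost_per_run * runs_per_day * days


if __name__ == "__main__":
    print(run_cost(900_000, 0))        # input only: 4.5
    # ASSUMPTION: 20K output tokens per run, purely illustrative.
    print(run_cost(900_000, 20_000))   # 5.0
    # The rough ~$10/run estimate at 100 runs per day.
    print(monthly_cost(10.0, 100))     # 30000.0
```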

Mitigations:

Sonnet 4.6 ($3/MTok input) = 60% of the Opus price
Prompt caching (up to 90% discount)
Hybrid (send the large context once, then hit the cache)
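
A minimal sketch of how these mitigations change the input bill when the same large context is reused across calls. It assumes cached input is billed at the base rate minus the discount (the "up to 90%" from the list) and ignores any cache-write surcharge or cache expiry. Both simplifications are assumptions, and `session_input_cost` is a hypothetical helper.

```python
def session_input_cost(context_tokens, calls, in_rate, cached_discount=0.0):
    """Input cost in USD when the same large context is sent on every call.

    The first call pays the full rate; later calls pay the rate reduced by
    cached_discount. ASSUMPTION: no cache-write surcharge, no cache expiry.
    """
    if calls <= 0:
        return 0.0
    full = context_tokens * in_rate / 1_000_000
    repeat = full * (1.0 - cached_discount)
    return full + repeat * (calls - 1)


if __name__ == "__main__":
    opus_plain = session_input_cost(900_000, 10, in_rate=5.0)
    opus_cached = session_input_cost(900_000, 10, in_rate=5.0, cached_discount=0.9)
    sonnet_plain = session_input_cost(900_000, 10, in_rate=3.0)
    print(opus_plain, opus_cached, sonnet_plain)
```

For ten calls over a 900K context, this gives $45.00 on Opus without caching, about $8.55 with a 90% cached discount, and $27.00 on Sonnet without caching.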

This "cost awareness" is a new skill for production AI engineers:

Understanding API pricing
Caching strategy
Model selection

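5. The cadence of the "Opus 4.7 follow-up"

Timeline:

February 5, 2026: Opus 4.6 (#85)
March 13, 2026: this post (1M GA)
April 9, 2026?: Cowork GA + Opus 4.7
April 17, 2026: Claude Design

Most gaps are roughly 4-6 weeks. Anthropic's release cadence is accelerating.

For comparison:

OpenAI: GPT-5 (months apart)
Google: Gemini 3 (quarterly)
Anthropic: ~6 weeks (fastest)

This cadence is exactly what drives the mindshare war:

New announcements every week
Dominating Twitter and HN
Developer attention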

6. Market priority: "coding use case first"

The post emphasizes that the first use case is a very large codebase:

An entire codebase at once
A full trace
Hundreds of files analyzed together
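
Whether "an entire codebase at once" applies to a given repository is easy to estimate before sending anything. The sketch below uses a rough 4-characters-per-token heuristic, which is an assumption and not a real tokenizer. The extension list and the output reserve are illustrative defaults.

```python
import os

CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4  # ASSUMPTION: rough heuristic, not a real tokenizer


def estimate_tokens(text):
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def estimate_codebase_tokens(root, extensions=(".py", ".ts", ".go", ".md")):
    """Sum estimated tokens over matching source files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total += estimate_tokens(f.read())
    return total


def fits_in_window(tokens, reserve_for_output=50_000):
    # Leave headroom for the question and the model's answer.
    return tokens + reserve_for_output <= CONTEXT_WINDOW
```

Calling `fits_in_window(estimate_codebase_tokens("."))` from a repository root gives a first-pass answer. An exact count would need the model's own tokenizer.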

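This is a "coding-first" market strategy:

The market most willing to pay (developers)
The most immediate ROI
The most vocal users

Compared with other models:

Gemini 1M: students + general users
GPT 1M: a broad enterprise mix
Anthropic 1M: coding first

This "developers first" approach is the textbook version of "developer mindshare = market capture".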

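7. Pricing policy: the Pro vs Max split

1M access by plan:

Pro / Sonnet users: opt-in (/extra-usage)
Max/Team/Enterprise: automatic

This is a clear case of price discrimination:

Light users: Pro = small contexts
Serious users: Max and above = 1M automatically

Compare Gemini:

Ultra ($200/month) = 1M
Smaller plans = no
A clear split

Anthropic is more fluid:

Pro can use it too (opt-in)
Max gets it automatically
The user decides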

8. Use case: "analyze the Premier League in one shot"

Boris Cherny quote:

"Great for: analyzing an entire Premier League season in one shot, processing a full codebase."

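(A whole Premier League season or an entire codebase = a good fit for 1M)

Interesting use cases:

Sports analytics: a full season of statistics
Legal: every contract
Research: hundreds of papers
Finance: every filing for a quarter

Each use case moves from "previously impossible" to "now possible".

This signals an expansion of where AI can be applied: the tasks themselves enter the "possible" zone.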

9. The subtle challenge of "context pollution"

HN comment:

"Context pollution is real. Curate what goes in. ... Loading it with every MCP endpoint and skill you have? That's noise, not leverage. More options = harder decisions for the model. It still has to pick what to use and when."

(Context pollution is real. Curate what goes in. Loading every MCP endpoint and skill = noise. More options = harder decisions for the model.)

This nuance breaks the "big context = always better" myth:

Big context = possible
But careful use is essential
The progressive disclosure pattern from Skills (post #75)
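
One way to picture "curate what goes in" is a selector that loads only the tool descriptions relevant to the current task, within a token budget. This is a minimal sketch under loud assumptions: relevance is naive word overlap, the tool records and their `tokens` sizes are made up, and `select_tools` is a hypothetical helper, not part of any Claude or MCP API.

```python
def select_tools(task, tools, token_budget):
    """Pick the tool descriptions most relevant to the task within a budget.

    tools: list of dicts with "name", "description", and "tokens" keys.
    ASSUMPTION: relevance is naive word overlap; a real system would use
    something better (embeddings, usage statistics, explicit routing).
    """
    task_words = set(task.lower().split())

    def relevance(tool):
        return len(task_words & set(tool["description"].lower().split()))

    chosen, used = [], 0
    for tool in sorted(tools, key=relevance, reverse=True):
        if relevance(tool) == 0:
            break  # remaining tools share no words with the task
        if used + tool["tokens"] <= token_budget:
            chosen.append(tool["name"])
            used += tool["tokens"]
    return chosen


if __name__ == "__main__":
    tools = [
        {"name": "git_log", "description": "read git commit history", "tokens": 300},
        {"name": "jira", "description": "create jira tickets", "tokens": 400},
        {"name": "grep", "description": "search files in the repo for text", "tokens": 200},
    ]
    print(select_tools("search the repo for the commit that broke login", tools, 600))
```

The point is the shape, not the scoring: irrelevant tools never enter the context, however large the window is.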

Best practice going forward:

1M is available
But load "only what is needed"
The peak of "context engineering"

10. The honest nuance of "independent verification pending"

kara0zieminski quote:

"The 78.3% figure comes from Anthropic's own announcement. Independent verification is still pending."

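(The 78.3% figure comes from Anthropic's own announcement. Independent verification is pending.)

This is an honest basis for enterprise decisions:

Anthropic's marketing is strong
But there is no independent verification yet
Users need to run their own evaluation

Compared with other fields:

Academic papers: peer review
Medicine: clinical trials
AI benchmarks: ?

The challenge for the AI industry:

Fast pace
Independent verification lags
Users = self-evaluation

This "trust but verify" stance is how work gets done in the AI era.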
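
A self-evaluation can start very small. The sketch below builds a needle-in-a-haystack style check: it scatters known facts through filler text and measures how many a model recovers. It is a simplified stand-in and not the MRCR v2 benchmark. `ask_model(context, question)` is a hypothetical callable to be wired to whatever model client is actually in use.

```python
import random


def build_haystack(needles, filler_sentences, seed=0):
    """Scatter needle facts at random positions among filler sentences.

    needles: list of (fact_sentence, question, expected_answer) tuples.
    """
    rng = random.Random(seed)
    sentences = list(filler_sentences)
    for fact, _question, _answer in needles:
        sentences.insert(rng.randrange(len(sentences) + 1), fact)
    return " ".join(sentences)


def recall_score(ask_model, needles, haystack):
    """Fraction of questions whose expected answer appears in the reply.

    ask_model(context, question) -> str is a HYPOTHETICAL callable.
    """
    hits = 0
    for _fact, question, answer in needles:
        reply = ask_model(haystack, question)
        if answer.lower() in reply.lower():
            hits += 1
    return hits / len(needles)


if __name__ == "__main__":
    needles = [
        ("The vault code is 7421.", "What is the vault code?", "7421"),
        ("The project codename is Heron.", "What is the codename?", "Heron"),
    ]
    haystack = build_haystack(needles, ["Nothing to see here."] * 1000)
    # Placeholder model that just echoes the context; replace with a real call.
    print(recall_score(lambda ctx, q: ctx, needles, haystack))
```

Running the same harness at several haystack sizes, on your own documents, gives a first-hand view of accuracy at length instead of relying on a vendor-reported number.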

Wrap-up

This post looks like a "context window expansion" announcement, but it is really a fundamental change in AI architecture.

No Long-Context Premium: simpler pricing
Context Rot 78.3%: quantitative measurement
Less RAG dependence: simpler architecture
$4.50/900K: cost awareness
6-week cadence: faster releases
Coding First: market strategy
Pro vs Max: price discrimination
Premier League example: expanding use cases
Context Pollution: a subtle challenge
Verification Pending: trust but verify

March 13, 2026 marks the end of the "AI = small chunks" era and the arrival of AI that works over whole documents and codebases.

What is interesting is that this also changes the AI engineer's job:

Before: building RAG pipelines (vector DB, embeddings, chunking)
Now: long context + skills + caching
The evolution of the "AI infrastructure engineer"

For comparison:

2022: prompt engineer
2024: RAG engineer
2026: context engineer + skills engineer

Each wave brings a new tech stack, and the job market is changing faster.

Next post (#99): CSV #19, Code with Claude San Francisco/London/Tokyo. A global conference for AI developers: capabilities like 1M context plus a global community. A "Web Summit" of the AI era seems to be taking shape.
