🔧 Core idea
User Input
↓
Tokenizer
↓
Embedding + Positional Encoding
↓
┌──────────────────────────────┐
│ Transformer Stack (N)        │
│ - Self Attention             │
│ - MLP (Dense / MoE hybrid)   │
└──────────────────────────────┘
↓
Logits → Sampling
↓
┌──────────────────────────────┐
│ Agent Layer (critical)       │
│ - Tool calling               │
│ - Function execution         │
│ - Memory (short-term)        │
│ - Self-reflection loop       │
└──────────────────────────────┘
↓
Final Output
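The "Logits → Sampling" step in the diagram can be illustrated with a small sketch. The diagram does not say which sampling strategy is used, so temperature scaling and top-k are assumptions here (both are common choices); `sample_top_k` and the toy logits are hypothetical.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_top_k(logits, k=3, temperature=1.0, rng=random):
    """Sample a token id from the k highest-scoring logits (hypothetical helper)."""
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top_ids], temperature)
    return rng.choices(top_ids, weights=probs, k=1)[0]

# Toy logits over a 5-token vocabulary (made-up numbers).
logits = [2.0, 0.5, 1.0, -1.0, 3.0]
token_id = sample_top_k(logits, k=2, temperature=0.7, rng=random.Random(0))
print(token_id)  # one of the two highest-scoring ids: 4 or 0
```

With `k=1` this reduces to greedy decoding, which always picks the highest-scoring token.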
⚙️ Pseudo Code (simplified)
def GPT_Inference(input_text):
    tokens = tokenize(input_text)
    x = embed(tokens)
    # Transformer forward
    for layer in transformer_layers:
        x = layer.self_attention(x)
        x = layer.mlp(x)
    logits = lm_head(x)
    # sampling
    output = sample(logits)
    # --- Agent loop ---
    while needs_tool(output):
        tool_result = call_tool(output)
        # re-inject context
        tokens = tokens + tokenize(tool_result)
        x = embed(tokens)
        for layer in transformer_layers:
            x = layer(x)
        output = sample(lm_head(x))
    return output
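The agent loop above can be tried out end to end with a toy stand-in for the model. This is a minimal sketch under stated assumptions: `fake_model`, the `TOOLS` registry, the `CALL name: args` text convention, and the `max_steps` bound are all hypothetical and do not reflect any real provider's tool-calling format.

```python
import re

# Hypothetical tool registry; real systems expose search, code execution, etc.
TOOLS = {
    "add": lambda args: str(sum(int(a) for a in args.split(","))),
}

def fake_model(context):
    """Stand-in for tokenize -> transformer -> sample; returns text."""
    if "TOOL_RESULT" not in context:
        return "CALL add: 2,3"  # the "model" asks for a tool
    result = context.rsplit("TOOL_RESULT ", 1)[1]
    return f"The answer is {result}."

def parse_tool_call(output):
    """Return (tool_name, args) if the output is a tool call, else None."""
    match = re.fullmatch(r"CALL (\w+): (.*)", output)
    return (match.group(1), match.group(2)) if match else None

def run_agent(user_input, max_steps=5):
    context = user_input
    output = fake_model(context)
    for _ in range(max_steps):  # bound the loop so it cannot spin forever
        call = parse_tool_call(output)
        if call is None:
            return output  # plain answer, no tool needed
        name, args = call
        tool_result = TOOLS[name](args)
        # Re-inject the tool call and its result into the context.
        context += f"\n{output}\nTOOL_RESULT {tool_result}"
        output = fake_model(context)
    return output

print(run_agent("What is 2 + 3?"))  # The answer is 5.
```

Unlike the `while needs_tool(output)` loop in the pseudocode, this version caps the number of tool rounds, which is a common safeguard when the model keeps requesting tools.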
👉 Key point:
Multi-Modal Input (Text / Image / Audio)
↓
Unified Tokenizer
↓
Modality Encoder (Vision / Audio shared space)
↓
┌────────────────────────────────────────┐
│ Sparse Transformer (MoE)               │
│                                        │
│ Router → Expert selection              │
│        ↓                               │
│ Run expert FFNs (top-k)                │
│                                        │
│ + Cross-modal Attention                │
└────────────────────────────────────────┘
↓
Planner / Tool Reasoner (built-in)
↓
Tool Execution (API / Search / Code)
↓
Response Decoder
def Gemini_Inference(multimodal_input):
    tokens = multimodal_tokenize(multimodal_input)
    x = embed(tokens)
    for layer in moe_transformer_layers:
        # Router decides experts
        expert_ids = router(x)
        # Sparse activation
        expert_outputs = []
        for e in expert_ids:
            expert_outputs.append(experts[e](x))
        x = combine(expert_outputs)
        x = self_attention(x)
        x = cross_modal_attention(x)
    # --- Built-in planning ---
    plan = internal_planner(x)
    if plan.requires_tool:
        tool_result = execute_tool(plan)
        x = integrate(x, tool_result)
    return decode(x)
👉 Key point:
LLM + Planner + Tools = a single model
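The router and top-k expert step in the pseudocode above can be made concrete for a single token vector. This is only a sketch: the dot-product router, the softmax gate renormalized over the selected experts, and the three toy experts are assumptions chosen for illustration, not a description of any production model.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, router_weights, experts, k=2):
    """Route one token vector x to its top-k experts and mix their outputs."""
    # Router: one score per expert (dot product of x with that expert's row).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # only k experts run: sparse activation
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top

# Three toy "experts" (hypothetical): each just scales its input.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

out, chosen = moe_layer([2.0, 1.0], router_weights, experts, k=2)
print(chosen)  # [0, 1]
```

The third expert is never evaluated for this input, which is the point of sparse activation: compute cost scales with `k`, not with the total number of experts.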
User Input
↓
Tokenizer
↓
Embedding
↓
┌──────────────────────────────┐
│ Transformer Stack            │
│ (Long Context optimized)     │
└──────────────────────────────┘
↓
Initial Output
↓
┌──────────────────────────────┐
│ Alignment Layer              │
│ (Constitutional AI)          │
│ - Rule checking              │
│ - Self critique              │
│ - Revision loop              │
└──────────────────────────────┘
↓
Final Output
def Claude_Inference(input_text):
    tokens = tokenize(input_text)
    x = embed(tokens)
    for layer in transformer_layers:
        x = layer(x)
    draft = decode(x)
    # --- Constitutional AI loop ---
    critique = evaluate_with_rules(draft)
    if critique.has_issues:
        revised = revise(draft, critique)
        return revised
    return draft
👉 Key point:
LLM + Self-critique system
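The critique-and-revise step can be sketched as a small loop. Constitutional AI as published describes a training-time procedure, so the code below simply mirrors the inference-time framing of the pseudocode above. The `RULES` list, the toy `revise` function, and the `max_rounds` bound are hypothetical; in a real system both the critique and the revision would be model calls.

```python
# Hypothetical rules: (description, predicate that returns True when violated).
RULES = [
    ("no shouting", lambda text: text.isupper()),
    ("must end with a period", lambda text: not text.endswith(".")),
]

def evaluate_with_rules(draft):
    """Return the descriptions of all rules the draft violates."""
    return [name for name, violated in RULES if violated(draft)]

def revise(draft, issues):
    """Toy reviser; stands in for a second model call."""
    if "no shouting" in issues:
        draft = draft.capitalize()
    if "must end with a period" in issues:
        draft = draft + "."
    return draft

def critique_and_revise(draft, max_rounds=3):
    for _ in range(max_rounds):  # loop until clean or out of budget
        issues = evaluate_with_rules(draft)
        if not issues:
            break
        draft = revise(draft, issues)
    return draft

print(critique_and_revise("HELLO WORLD"))  # Hello world.
```

The pseudocode performs a single critique pass; repeating the check after each revision, as here, catches cases where a revision fixes one rule but still violates another.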
| Aspect | GPT | Gemini | Claude |
|---|---|---|---|
| Transformer | Mostly dense | MoE-centric | Dense |
| MoE | Partial | Core | Almost none |
| Multimodal | Integrated (later versions) | Native | Limited |
| Tool use | External loop | Built-in | Limited |
| Agent structure | Strong | Very strong | Weak |
| Alignment | RLHF + system | RLHF + planning | Constitutional AI |
| Long context | Strong | Very strong | Very strong |
One important point to note:
👉 "Model performance differences" are now decided
not by differences in Transformer architecture but by:
1. MoE routing quality
2. Tool integration depth
3. Inference orchestration
4. Alignment loop sophistication