🌱 파울로 프레이리: 누구인가
Paulo Freire는 1921년 브라질 레시페에서 태어난 교육학자이자 사회운동가였다.
그는 브라질의 문맹 퇴치 운동에 깊이 관여했으며, 1963년에는 국립 문맹퇴치 프로그램을 이끌었다.
그러나 1964년 군부 쿠데타 이후 반체제 인사로 지목되어 투옥되고 망명하게 되었으며, 이후 전 세계를 돌며 민중교육과 해방교육 운동에 참여했다.
그는 1997년 사망했지만, 그의 사상과 저작은 현재까지도 전 세계 교육이론과 사회운동에 큰 영향을 미치고 있다.
그의 가장 유명한 저작은 Pedagogy of the Oppressed (한국어판: 『피압박자의 교육학』 또는 『억압받는 이들과 페다고지』)이며, 이 책은 비판적 교육학의 고전으로 평가받는다.
✊ 비판적 교육학 — 핵심 철학과 방식
Freire의 교육철학은 단순한 지식 전달을 넘어 “교육을 통한 해방”을 목표로 삼는다.
• 기존 교육에 대한 비판: “은행저금식 교육 (Banking Model)”
그는 전통적인 교육 방식을 “은행저금식(banking) 교육”이라 비판했다. 이 방식에서는 교사가 지식을 일방적으로 전달하고, 학생은 수동적인 ‘빈 그릇’처럼 지식을 쌓기만 한다. 이 구조는 학생의 비판적 사고를 억압하고, 억압 구조를 재생산할 위험이 있다고 봤다.
• 대안으로서의 “문제제기식 교육 (Problem-Posing Education)”
대신 그는 “문제제기식 교육”을 제안했다. 이 방식에서는 학생과 교사가 수평적 관계에서 함께 대화하며, 현실을 비판적으로 성찰하고, 그 의미를 탐구하는 과정을 통해 학습이 이루어진다.
이런 대화적이고 참여적인 교육 방식은 단순한 지식 전달이 아니라, 학습자가 스스로 현실을 읽고, “다시 쓰는 (재해석하는)” 존재가 되도록 돕는다. 즉, 교육이 인간과 사회를 변화시킬 수 있는 실천적 수단이 된다.
• “의식화 (Conscientização)” & “실천 (Praxis)”
프레이리는 교육의 핵심으로 ‘의식화(conscientização)’를 강조했다. 이는 단순한 정보 습득이 아니라, 사회 구조 속에서 자신과 자신의 삶이 어떤 위치에 있는지를 성찰하고, 억압과 불평등의 원인을 인식하는 비판적 자각을 의미한다.
그리고 진정한 교육은 이런 의식화에 그치지 않고, ‘프락시스(praxis)’ — 즉, 생각(이론)과 행동(실천)의 반복을 통해 현실을 변화시키는 과정이어야 한다고 주장했다.
따라서 교육은 중립적이 아니라 본질적으로 가치지향적이며, 사회 변혁을 위한 정치적 실천과 연결될 수밖에 없다.
🌍 왜 중요한가: 비판적 교육학의 영향과 의의
비판적 교육학은 단지 학교 수업의 방법론을 넘어, 교육이 사회 구조, 권력, 계급, 억압과 어떻게 연결되는지를 해명함으로써 교육의 정치적·사회적 의미를 부각했다.
이 접근은 전 세계의 민중교육, 성인교육, 인권·사회정의 교육, 시민교육, 대안교육 등에 광범위하게 영향을 주었다.
특히 억압된 계층, 빈곤층, 비문해자, 사회적 소수자들에게 자기 현실을 비판적으로 성찰하고 변화를 모색할 수 있는 의식과 기회를 제공함으로써, 교육을 단순한 기술 전수에서 공동체의 해방과 사회 변혁의 전진 기제로 전환시켰다.
또한, 교육자에게도 “지식을 주입하는 자”가 아니라 “공동 탐구자”, “문화의 작업자 (cultural worker)” 로서 역할을 재정의하게 만들었다. 교사와 학생 사이의 수직적 권위 구조를 해체하고, 교육을 더 민주적이고 참여적인 과정으로 재구성한 것이다.
🔎 내 관점에서 적용해볼 수 있는 의미
정치경제, 사회 불평등, 공공정책, 인권, 미디어 분석 등을 고려할 때, Freire의 비판적 교육학은 다음과 같은 의미가 있다고 본다:
교육이나 정보 전달만이 아니라, 사람들이 자신의 위치와 현실 구조를 비판적으로 인식하길 바란다면, ‘문제제기식 교육 + 의식화 + 프락시스’ 방식은 매우 유용한 틀이다.
미디어, 사회 운동, 공공 보건, 정책 설계 등에서 “단순 전달”로 끝나는 것이 아니라, 수혜자나 시민이 주체가 되어 현실을 재해석하고 변화에 참여하는 방식 — Freire의 접근은 이런 주체화 과정을 설계할 때 강한 이론적 토대가 될 수 있다.
“정치경제와 사회 구조”, “공공 정책과 시민 참여”, “가짜 뉴스와 해방적 정보 접근”, “미디어 콘텐츠와 시민 의식 고양”의 맥락에서도 Freire의 교육철학은 유효할 수 있다.
How a High-School Dropout Became an OpenAI Researcher
(A Highly Relatable Guide to Learning in the AI Era)
Gabriel Peterson, who grew up in a rural Swedish village and dropped out of high school, explains how he became a researcher on OpenAI’s Sora team. His story offers a practical roadmap for how to learn and build a career in the age of AI. One message especially resonates: never let go of an AI tool until you reach your “aha!” moment.
Key Insights
Learn top-down, not bottom-up.
Instead of studying theory first (math → linear algebra → ML), he begins with a real project: build something, fix bugs, get it to work—then ask why it works.
Relentlessly ask AI questions (recursive gap-filling).
Whenever something is unclear, keep drilling down with AI:
“Why does this code work?” → “Because of matrix multiplication.” → “Explain that intuitively.” → “Show me a diagram.”
This accelerates learning dramatically.
Find real problems—your best teacher.
He learned by building recommendation systems for actual companies. Real pressure and real stakes make learning stick.
Show demos, not résumés.
Companies care about what you can build. Send a demo link, not a long self-intro. Gabriel even walked into client meetings with improved versions of their own systems.
Listen only to advice aligned with your incentives.
Filter out generic or fear-based advice. He ignored “play it safe” guidance and followed people whose goals aligned with his.
Opportunity comes from “density.”
He moved to San Francisco because that’s where people who think and work like him gather. Environment has massive leverage.
How Gabriel Uses AI for Learning
Ask for explanations “as if to a 12-year-old.”
This forces intuitive, real-world analogies for abstract concepts.
Ask, “Why not the other way?”
Challenge AI’s solutions. Understanding failed alternatives builds real mastery.
Summarize research papers by differences.
Have AI list exactly what’s new compared to prior work.
Request visualizations of intermediate code steps.
Ask how tensors change shape, how data flows—so you see how the code works.
The takeaway:
In the AI age, the most valuable people aren’t those who memorize knowledge, but those who know how to explore, question, and connect knowledge using AI.
It’s a fascinating interview—I highly recommend watching it this weekend. Here’s the link below.
https://youtu.be/vq5WhoPCWQ8?si=pyofygEvLXm3l972
I summarize five recent, high-impact AI papers (2025) and list exactly what’s new in each one compared to prior work. I pulled each paper/conference page so you can follow the sources. If you want more papers or deeper technical detail (math, pseudo-code, or tables of experimental numbers), say which paper and I’ll expand.
Source: arXiv / paper page. (arXiv)
Short summary: ReT-2 proposes a unified retrieval model that accepts multimodal queries (e.g., image + text) and retrieves from collections of multimodal documents. It uses recurrence-inspired mechanisms to preserve fine-grained features across layers and modalities.
What’s new vs prior work (clear, itemized list):
Source: ICLR 2025 accepted papers listing. (ICLR)
Short summary: ICLR 2025 included multiple papers proposing generalist embedding models (NV-Embed, MM-EMBED) trained to serve as universal multimodal embeddings usable for retrieval, classification, and as grounding for LLMs.
What’s new vs prior work:
Source: ICLR 2025 papers list. (ICLR)
Short summary: GenXD presents a generative architecture and training recipe that produces coherent 3D scene models (and time-varying 4D scenes) from multimodal prompts.
What’s new vs prior work:
Source: NeurIPS 2025 Best Paper announcement (conference blog). (NeurIPS Blog)
Short summary: The NeurIPS 2025 best papers include foundational advances in (a) diffusion model theory (new sampling/likelihood analysis) and (b) novel attention mechanisms that improve LLM efficiency and reasoning.
What’s new vs prior work (split by theme):
a) Diffusion theory & sampling
b) Attention & LLM architecture
Source: ICML 2025 papers list. (ICML)
Short summary: DAMA proposes techniques for aligning multimodal LLMs by explicitly modeling dataset bias and the model’s inductive tendencies, combining data augmentation, contrastive alignment, and model-aware calibration.
What’s new vs prior work:
If you’d like, I’ll now:
Which would you prefer?
I expanded ReT-2 (Recurrence Meets Transformers for Universal Multimodal Retrieval) into a detailed, difference-by-difference technical breakdown with the math intuition, the loss/optimization recipe, the implementation / hyperparameter details reported in the paper, and concrete code pointers so you can reproduce or adapt it.
I base this on the authors’ paper and the open-source release (paper + GitHub + Hugging Face). Key sources: the ReT-2 paper (arXiv / CVPR version) and the official repo / model hub. (arXiv)
ReT-2 fuses multi-layer features from vision & text backbones through a recurrent Transformer cell with LSTM-style gates, producing multi-token late-interaction representations; it replaces expensive full late-interaction computation with an efficient score-fusion / token-reduction option while keeping fine-grained matching power. (arXiv)
The paper frames the recurrence/gating as LSTM-like operations. The authors do not change the LSTM concept — they adapt it to fuse layer features and modalities. The typical LSTM-style equations (presented here as the conceptual form that ReT-2 follows) are:
Gates (sigmoid activations) and candidate update (tanh / linear) follow the standard pattern:

(Interpretation: the forget gate modulates carry from earlier layers; text/vision gates decide how much each modality contributes at this layer. The ReT-2 paper uses a Transformer-style processing inside the cell; the above is the canonical LSTM-style conceptualization the authors reference.) (arXiv)
Note: the paper’s text and figures describe modality-specific input gates and a forget gate; see the CVPR PDF for the exact notation and diagram. (CVF Open Access)
Concretely, the per-batch loss (InfoNCE) is:

From the Implementation Details / Appendix (paper + repo):
GPUs / runtime: trained in mixed precision on 4 × NVIDIA A100 64GB for up to ~24 hours (experimental runs differ by dataset). (arXiv)
Optimizer: Adam. Gradient checkpointing used when unfreezing heavy backbones. (arXiv)
Hidden size for the recurrent cell: 1,024 (biases set to zero as per ReT design). (arXiv)
Batching / steps:
Learning rates: the paper specifies learning-rate schedules (linear warmup to peak then cosine decay). The HTML rendering omits the explicit numeric peak LR in some places; however the paper describes that when unfreezing backbone parameters the backbone learning rate is downscaled by 0.05 relative to the recurrent cell for stability (i.e., backbone LR = 0.05 × recurrent cell LR). Use gradient checkpointing when unfreezing. (arXiv)
Indexing / retrieval: FAISS used for fast approximate indexing at test time. (arXiv)
Practical note: because the authors downscale the backbone LR by 0.05 (when unfreezing), a typical reproduction strategy is to set the recurrent cell LR in the range used for new trainable heads (e.g., mid 1e-4 → 1e-3 scales depending on batch size) and set backbone LR to 5% of that — the paper prescribes the 0.05 factor but the exact numeric LR value(s) as printed in the HTML were symbols; consult the repo scripts or the paper Appendix for the precise numbers used in each run. (arXiv)
(These are the precise architectural / training differences emphasized by the authors.)
Multi-layer feature use (vs only final layer)
Recurrent Transformer cell with modality gates (vs vanilla Transformer stacks)
Token reduction / score-fusion (vs many token late interaction)
Single-stage training recipe on M2KR
Efficiency & pruning analysis
requirements.txt, inference.py, SLURM examples, README usage examples. (GitHub)RetrieverModel.from_pretrained(...) as shown in the README to load models. (GitHub)train.sh + config.yaml tuned to reproduce the CLIP ViT-L results (I’ll propose numeric LR choices consistent with authors’ downscale rule).Tell me which follow-up you want and I’ll expand immediately (I’ve already fetched the paper, repo and HF models so I can dive into code or produce runnable configs right away).
ReT-2: Recurrence Meets Transformers for Universal Multimodal Retrieval (arXiv)
Unified model for multimodal queries (e.g., image+text) retrieving from mixed documents, using recurrence to preserve features.
New: Multimodal-query support; recurrence-gating in Transformers; multi-layer signals; interleaved training; benchmark gains.
NV-Embed / MM-EMBED: Universal Multimodal Embeddings (ICLR 2025)
Generalist embeddings for text/images/audio/video, usable in retrieval/classification/LLM grounding.
New: Shared space across modalities; contrastive + task-aware losses; heterogeneous datasets; LLM compatibility.
GenXD: Generating Any 3D and 4D Scenes (ICLR 2025)
Generative pipeline for coherent 3D/4D scenes from multimodal prompts.
New: Unified 3D/4D outputs; composable editing; cross-modal conditioning; efficient rendering tricks.
NeurIPS 2025 Best Papers: Diffusion Theory & LLM Attention (NeurIPS Blog)
Advances in diffusion sampling bounds/schedules and efficient attention for LLMs.
New (Diffusion): Tighter error bounds; faster samplers; score-matching links.
New (Attention): Resource-aware variants; stable long-range reasoning.
DAMA: Data- and Model-Aware Alignment of Multimodal LLMs (ICML 2025)
Alignment via bias modeling, augmentation, and calibration for safer multimodal outputs.
New: Model-adaptive data/objectives; counterfactual generation; calibration heads; safety benchmark wins.
ReT-2: Recurrence Meets Transformers for Universal Multimodal Retrieval
arXiv:2509.08897 [cs.IR] (Sep 2025).
Authors: Davide Caffagni et al.
Link: https://arxiv.org/abs/2509.08897
(Note: Related precursor paper at arXiv:2503.01980 [cs.CV] (Mar 2025).)
NV-Embed / MM-EMBED: Universal Multimodal Retrieval / Generalist Embedding Models (ICLR 2025)
GenXD: Generating Any 3D and 4D Scenes (ICLR 2025)
arXiv preprint (2024); OpenReview: https://openreview.net/forum?id=1ThYY28HXg; Proceedings: https://proceedings.iclr.cc/paper_files/paper/2025/file/ee2841db84cd09a5f6e3e313ce3d79d9-Paper-Conference.pdf; Project: https://gen-x-d.github.io/; Code: https://github.com/HeliosZhao/GenXD
Authors: Yuyang Zhao et al. (Microsoft Research).
NeurIPS 2025 Best Paper(s): Advances in Diffusion Model Theory & LLM Attention Mechanisms
Announcement: https://blog.neurips.cc/2025/11/26/announcing-the-neurips-2025-best-paper-awards/ (Nov 27, 2025).
Key winners (matching themes):
DAMA: Data- and Model-Aware Alignment of Multimodal LLMs (ICML 2025)
arXiv:2502.01943 [cs.CV] (Feb 2025); ICML Poster: https://icml.cc/virtual/2025/poster/43449; Code: https://github.com/injadlu/DAMA
Authors: Jinda Lu et al.
ReT-2: Recurrence + Transformers for Universal Retrieval
Hybrid model revives recurrence in Transformers for multimodal (text/image/video/audio) retrieval, preserving long sequences with low memory.
Why it matters: Universal backbone; efficient temporal grounding; excels on mixed-format tasks.
arXiv:2509.08897
NV-Embed / MM-EMBED: Generalist Embedding Models
NVIDIA models create shared embedding space for text/image/video/audio/code, via balanced batches and hierarchical contrastive training (beyond CLIP's language bias).
Why it matters: Robust zero-shot cross-modal tasks; one space for all modalities.
NV-Embed | MM-Embed
GenXD: Unified 3D + 4D Scene Generation
Diffusion-Transformer unifies static 3D and dynamic 4D scenes via cross-dimensional latents and multiview-temporal disentanglement.
Why it matters: Single-model "any" content generation; high-fidelity for AR/VR/robotics.
ICLR 2025
NeurIPS 2025 Best Papers: Diffusion Theory & Gated Attention
A. Implicit Overfitting Postponement: Diffusion's score-matching delays overfitting, explaining scalability.
B. Gated Attention: Sigmoid gates post-attention boost LLM stability/memory efficiency.
Why it matters: Predictable training; reduced sinks/collapse for long contexts.
Awards Blog
DAMA: Data- and Model-Aware Alignment for Multimodal LLMs
Dynamic DPO adjusts for data hardness and model states, curbing hallucinations via adaptive preferences.
Why it matters: Stable, grounded multimodal training; outperforms GPT-4V on safety.
arXiv:2502.01943
Overall Takeaways: Trends: Unification of modalities/tasks; efficiency via recurrence/gating; dynamic adaptation. AI gets general, robust, cheaper to scale.