This paper proposes a framework called the "Inductive Bias Probe" to evaluate whether foundation models, particularly Transformer-based sequence prediction models, genuinely learn a "world model". The approach uses synthetic datasets generated according to specific physical laws or logical structures and assesses whether the model has internalized the structure those laws imply.
The most representative experiment feeds a Transformer time-series data of planets orbiting a sun and evaluates whether it has truly learned Newtonian mechanics. The model achieves high prediction accuracy, with an R² > 0.9999, but this merely demonstrates strong sequence prediction. Crucially, it fails at fine-tuning tasks that require inferring force vectors consistent with the law of gravitation. Furthermore, the force law recovered through symbolic regression bears no resemblance to Newton's law of universal gravitation. In other words, despite high prediction accuracy, the model has not internalized a world model; it appears to have simply overfit to task-specific heuristics within the data.
The framework is extended to other domains with clear state spaces, such as lattice problems and the game of Othello. A common finding across all experiments is that while prediction accuracy is high, the inductive bias toward the underlying fundamental structure is weak.
While the analysis presented in the paper is interesting, its experimental design and interpretation exhibit several significant limitations.
Newton's law of universal gravitation, a universal law of physics, operates consistently across all conditions, not just specific situations or environments. It applies not only to planetary orbits like those in the paper's solar-system data but also to scenarios such as free fall, binary systems, gravitational wells, and escape trajectories.
That is, the law of universal gravitation is not a localized rule specific to particular data or conditions; it possesses a general and deterministic structure that operates consistently across diverse scales and conditions.
However, the Transformer model used in this paper was trained solely on data from a single, restricted simulation environment: planets orbiting a sun. This environment is physically simplified, with constraints such as a fixed central mass, the absence of any forces other than gravity, and a two-dimensional coordinate system. Within this specific environment, the model could therefore predict very well, achieving R² > 0.9999.
Naturally, such performance is likely not due to the model internalizing physical laws, but rather to its overfitting to specific patterns operating within that simulation environment. In other words, the model merely learned task-specific heuristics that work well only for the given data, and it did not possess the ability to generalize universal physical structures. (This is precisely the point the paper raises as a problem.)
However, is it really valid to judge whether "LLMs learn world models" based on results derived from a model configured in this way? A model trained solely on one type of simulation (and synthetic data, not observational data) inherently induces overfitting, making it difficult to derive general laws. This suggests that the model might not have been suitable for measuring inductive bias in the first place.
The paper mentions the following about the LLM's architecture:
When processing input data, a Transformer model projects input tokens into Query (Q), Key (K), and Value (V) vectors using three main linear transformation matrices: W_q, W_k, and W_v. These projected vectors are crucial in determining the relationships between tokens, specifically which information attends to what. The core operation here is the dot product of Q and K, which quantifies how much one token "attends" to another.
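As a minimal single-head sketch in PyTorch (the dimensions and random weights here are illustrative placeholders, not the paper's actual model):

```python
import torch
import torch.nn.functional as F

d_model, d_head = 64, 16          # hypothetical sizes, not the paper's configuration
W_q = torch.randn(d_model, d_head)
W_k = torch.randn(d_model, d_head)
W_v = torch.randn(d_model, d_head)

def single_head_attention(x):
    # x: (seq_len, d_model) sequence of token embeddings
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    # The QK^T dot product quantifies how strongly each token attends to every other token
    scores = Q @ K.T / d_head ** 0.5
    weights = F.softmax(scores, dim=-1)
    # The output is an attention-weighted mixture of value vectors
    return weights @ V

x = torch.randn(10, d_model)          # a toy sequence of 10 tokens
out = single_head_attention(x)        # (10, d_head)
```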
This structure can be seen as forming a semantic space within the model, beyond mere similarity calculations between data points. Therefore, if this model truly understood and internalized the physical world, this semantic space should exhibit a certain alignment with the directions (axes) of actual physical quantities like velocity, distance, mass, and time. In other words, if specific attention heads or the principal directions (e.g., eigenvectors) of the QK space align well with particular physical quantities, it would serve as evidence that the model has learned the intrinsic structure corresponding to those quantities.
However, this paper does not analyze such internal structures (e.g., whether specific attention heads show specialized responses to elements like gravitational distance r or mass m, or whether the projection space of QK dot product is structured in physically meaningful directions). Concluding that the model lacks a world model solely based on fine-tuning failures and symbolic regression results, without a linear algebraic analysis of how the model's internal representations reflect physical concepts, requires careful consideration.
Furthermore, Transformer models employ multiple attention heads (this model uses 12), each of which can learn to attend to different types of information. This means the model forms a distributed semantic space, so it is inappropriate to evaluate the entire model's semantic structure on the basis of just one or two fine-tuning results.
In conclusion, asserting the absence of a world model solely based on the model's output without analyzing how the Transformer's internal linear transformation structure aligns with the axes of actual physical quantities is premature and necessitates more precise structural analysis. This is a crucial issue, especially for understanding how the internal representations of high-dimensional models like LLMs correspond to our physical concepts, particularly from the perspective of interpretability.
Physical laws intrinsically possess a deterministic structure. This means that when identical input conditions (e.g., mass, position, velocity) are provided, they consistently produce identical outputs (e.g., force, energy). For instance, Newton's law of universal gravitation can be expressed as a single function where, given only the masses and distance between two objects, the gravitational force between them can always be precisely calculated. Thus, laws of the physical world are composed of well-defined functional relationships that presuppose consistency and reproducibility.
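In symbols, with masses m_1 and m_2, separation r, and gravitational constant G:

$$F = G\,\frac{m_1 m_2}{r^{2}}$$

The same inputs always yield exactly the same force.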
In contrast, the Transformer-based model used in this paper was trained using a next-token prediction method. The goal of this method is to probabilistically predict the next token given a specific input sequence. For example, given previously observed positions, velocities, etc., it predicts what the next position might be. This training method fundamentally learns statistical patterns at the token level and does not necessarily require the model to output exactly one correct answer for a given state. In fact, it is a structure that allows for multiple possible outcomes for the same state, meaning the model operates probabilistically by nature.
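Concretely, the training objective is plain token-level cross-entropy; a minimal sketch (assuming a generic autoregressive model that returns next-token logits, not the paper's exact setup):

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens):
    # tokens: (batch, seq_len) discretized states (positions, velocities, ...)
    logits = model(tokens[:, :-1])    # distribution over the next token at each step
    targets = tokens[:, 1:]
    # The loss only rewards matching the empirical token distribution;
    # nothing forces a single, physically consistent answer per state.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```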
This leads to the following problems: for a given physical state, the model is never required to commit to a single deterministic output, and the token-level statistical objective exerts no pressure to recover the underlying functional relationship between inputs and outputs.
For these reasons, even if the Transformer exhibits high prediction accuracy (e.g., R² > 0.9999), it is difficult to claim that the model "understood" physical laws. This is because the objective function was not designed to internalize a world model in the first place. For a model to truly internalize well-defined structures like physical laws, such structural constraints must be explicitly imposed during the learning process, which is challenging with the conventional next-token prediction method alone.
In conclusion, the observed result in this paper—"despite high prediction accuracy, physical laws were not learned"—is more accurately seen as a structural limitation of the learning objective function rather than a limitation of the model itself.
To overcome these limitations and verify whether Transformer-based prediction models can truly internalize universal physical laws, fundamental improvements in experimental design and model structure are necessary. Here are some personal suggestions for improvements:
(1) Ensure Generalizability with Diverse Physical Scenarios: The current experiment includes only one type of simulation: the sun-planet orbit. However, universal gravitation applies consistently in various situations, such as free fall, two-body interactions (binary systems), gravitational wells, and escape velocity. To truly verify generalizability as a world model, it is necessary to evaluate whether the model can infer a consistent force law across these diverse conditions.
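A sketch of what such a cross-scenario evaluation could look like (the scenario names, data loader, and force-probing head are all hypothetical):

```python
from sklearn.metrics import r2_score

# Hypothetical cross-scenario probe: the same fine-tuned force head should
# recover forces consistently across qualitatively different settings.
SCENARIOS = ["solar_orbit", "free_fall", "binary_system", "escape_trajectory"]

def evaluate_force_probe(force_probe, load_scenario):
    # force_probe: callable mapping state trajectories to predicted force vectors
    # load_scenario: callable returning (states, true_forces) for a scenario name
    scores = {}
    for name in SCENARIOS:
        states, true_forces = load_scenario(name)
        pred_forces = force_probe(states)
        scores[name] = r2_score(true_forces, pred_forces)
    return scores  # a genuine world model should score uniformly well here
```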
(2) Analyze Layer-wise QK Subspace and Physical Quantity Basis Alignment: The Transformer's Q/K/V projection matrices form meaningful subspaces. It is necessary to verify how well these spaces align with actual physical quantity axes—such as mass, distance, and velocity—through eigenvector analysis, PCA, or weight probing techniques. If a particular attention head consistently shows sensitivity to gravitational distance r, mass m, or velocity v, it can be interpreted that this head carries meaningful physical implications.
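One possible form of this analysis (a sketch that assumes access to a head's trained W_q/W_k matrices and a hypothetical matrix phys_axes whose unit-norm columns encode physical-quantity directions in the embedding space):

```python
import numpy as np

def qk_alignment(W_q, W_k, phys_axes, top_k=5):
    # W_q, W_k: (d_model, d_head) projection matrices of one attention head
    # phys_axes: (d_model, n_quantities) unit vectors for mass, distance, velocity, ...
    M = W_q @ W_k.T                                   # bilinear form applied by the QK dot product
    # Principal directions of the (symmetrized) QK map
    eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2)
    top = eigvecs[:, np.argsort(-np.abs(eigvals))[:top_k]]   # (d_model, top_k)
    # Cosine alignment between each physical axis and each principal direction
    return np.abs(phys_axes.T @ top)                  # values near 1 indicate strong alignment
```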
(3) Induce Physical Consistency through PINN-based Loss Functions: Conventional fine-tuning uses MSE or cross-entropy losses. To guide the model toward compliance with actual physical laws, a PINN (Physics-Informed Neural Network)-style objective is needed, in which the residual of the governing differential equations is added as a loss term. For example, a physics-based loss term like the following could be used:
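One plausible form, assuming the paper's simplified setting (a fixed central mass M at the origin, gravity as the only force, and accelerations estimated from the predicted trajectory, e.g., by finite differences), is

$$\mathcal{L} = \mathcal{L}_{\text{pred}} + \lambda\,\mathcal{L}_{\text{physics}}, \qquad \mathcal{L}_{\text{physics}} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert \ddot{\mathbf{x}}_i + \frac{GM}{\lVert \mathbf{x}_i \rVert^{3}}\,\mathbf{x}_i \right\rVert^{2},$$

where λ controls the weight of the physics residual relative to the ordinary prediction loss.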
Such a structure can induce the model to internalize physically consistent representations in addition to prediction accuracy.
This paper is closer to answering the question, "Can general physical laws be discovered from data prediction ability alone?" Accordingly, no knowledge of general physical laws was built into the Transformer's design. If we truly want to construct a "world model", what kind of configuration would be necessary? As one answer, I propose a structural inductive bias design based on a Hybrid Basis.
The Hybrid Basis approach aims to design the Transformer's internal representation space to simultaneously reflect prior knowledge of physical laws (known physics) and newly derived structures from data (emergent structure). The core idea is to fix (freeze) parts of the Transformer's main linear transformation matrices—especially W_q, W_k, and W_v—as pre-defined physical quantity-based bases, while allowing the remaining parts to be freely learned.
For example, consider physically important quantities such as position, velocity, mass, inter-body distance, and time.
These physical quantities can be considered as orthonormal bases in a high-dimensional vector space, and their corresponding projection directions can be explicitly assigned to specific columns of W_q, W_k, and W_v. For instance, the first column of W_q could be fixed to correspond to the "velocity" physical quantity axis, and the second column to "position." Such fixation can be maintained not only during initialization but also throughout the training process by freezing these directions.
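A minimal sketch of how such column freezing could be implemented (the basis below is a random orthonormal placeholder; in practice it would be constructed from the chosen physical-quantity directions):

```python
import torch
import torch.nn as nn

d_model, d_head, n_fixed = 64, 16, 4    # hypothetical sizes; n_fixed physics columns

W_q = nn.Parameter(torch.randn(d_model, d_head) * 0.02)
with torch.no_grad():
    # phys_basis: (d_model, n_fixed) orthonormal directions for velocity, position, ...
    phys_basis = torch.linalg.qr(torch.randn(d_model, n_fixed)).Q   # placeholder basis
    W_q[:, :n_fixed] = phys_basis

# Zero out gradients on the physics columns so they stay frozen throughout training
mask = torch.ones_like(W_q)
mask[:, :n_fixed] = 0.0
W_q.register_hook(lambda grad: grad * mask)
```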
This approach would give the Transformer a hybrid structure: a frozen, physics-grounded subspace that encodes known quantities, alongside freely trained directions that can capture emergent structure from the data.
This hybrid structure offers the following advantages: the fixed axes make the model's internal representations directly interpretable in physical terms; the physics-based bases act as an explicit inductive bias toward known laws; and the remaining learnable directions preserve data-driven flexibility.
Therefore, the Hybrid Basis approach structurally embeds semantic axes into the model's representation space. It offers the potential to connect the model's inductive bias with human-understandable physical laws while also retaining data-driven flexibility.
This paper clearly demonstrates that "good prediction" does not necessarily imply a "good world model". However, the experiments presented are overly restrictive, and structural interpretation of the model's representation space is lacking. Concluding the absence of a world model solely based on fine-tuning failures or symbolic regression failures is premature; linear algebraic structural analysis must be conducted in parallel. Defining physics-based bases and quantitatively analyzing how attention subspaces align with these axes is also essential. Ultimately, the core of future research should be to design models that can internalize how observed data is governed by the world's laws, and to make these internalized structures interpretable.