In traditional economic theory and econometrics, dimensionality reduction has played a central role. Classical models often distill numerous correlated factors into a small set of explanatory variables or principal components, aiming for tractable equations and clear interpretability. Yet as our computational capabilities grow and the availability of massive, granular datasets increases, this historically necessary simplification can inadvertently disregard the rich interdependencies present in the real world. Researchers have begun to explore more computationally intensive approaches—akin to word embedding or vectorization in natural language processing (NLP)—to capture these high-dimensional relationships among economic variables. Such methods promise a more holistic, multi-factor perspective on economic dynamics.
In this essay, we review some current literature on high-dimensional and embedding-based approaches to economics and discuss why maintaining and analyzing numerous co-occurring factors can lead to superior modeling fidelity. We also outline the technical, policy, and practical considerations for adopting such computationally intensive frameworks in economic analysis.
Economics has long relied on dimension-reducing frameworks for both theoretical and pragmatic reasons:
Theoretical Parsimony
Classic economic models, such as DSGE (Dynamic Stochastic General Equilibrium) frameworks, often rely on strong assumptions—rational agents, a representative consumer, small sets of state variables—to keep the mathematics tractable.
Data Constraints
Historical limitations in computational power and data availability made it impractical to include all possible economic series or cross-sectional variables. Reducing dimensionality (e.g., using Factor Models, PCA, or selecting a handful of regressors) was often the only option.
Interpretation and Communication
Economists, policymakers, and public audiences typically prefer results explained through a small set of interpretable factors, such as “the interest rate,” “the output gap,” or “two or three principal components” that summarize the data.
Because the real world is inherently multi-causal and interconnected, dimension reduction can introduce blind spots: interactions among the variables left out of the model go unmodeled, and the few retained factors can mask or distort how many series actually move together.
Key takeaway: Classic dimensionality reduction can mask or distort the economic reality when numerous interacting variables matter in tandem.
In modern machine learning and data science—particularly in NLP—large, high-dimensional vector spaces have revealed patterns not readily visible in smaller feature sets. Word embeddings (e.g., Word2Vec, GloVe, BERT embeddings) are emblematic: they rely on co-occurrence relationships between words in huge corpora, capturing semantic and syntactic structure through dense, high-dimensional vectors.
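To make the co-occurrence intuition concrete, the sketch below builds small word vectors from scratch via a positive pointwise mutual information (PPMI) matrix and a truncated SVD, a classical precursor to Word2Vec-style training rather than any production pipeline; the four-sentence corpus is invented purely for illustration.

```python
import numpy as np
from itertools import combinations

# Toy corpus; real embeddings are trained on enormous text collections.
corpus = [
    "rates rise inflation cools".split(),
    "inflation rises rates follow".split(),
    "exports fall currency weakens".split(),
    "currency weakens exports recover".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric within-sentence co-occurrence counts.
cooc = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for w1, w2 in combinations(sent, 2):
        cooc[idx[w1], idx[w2]] += 1
        cooc[idx[w2], idx[w1]] += 1

# Positive pointwise mutual information (PPMI) reweighting.
total = cooc.sum()
row = cooc.sum(axis=1, keepdims=True)
col = cooc.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(cooc * total / (row * col))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# Dense word vectors from a truncated SVD of the PPMI matrix.
U, S, _ = np.linalg.svd(ppmi)
dim = 2  # tiny embedding dimension for the toy example
vectors = {w: U[i, :dim] * S[:dim] for w, i in idx.items()}
print({w: np.round(v, 2) for w, v in vectors.items()})
```

Words that appear in similar contexts ("rates" and "inflation" here) end up with nearby vectors, which is exactly the property the economic analogy below tries to exploit.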
A parallel logic applies to economics: economic variables also co-occur in rich, structured contexts, and dense high-dimensional representations of those co-occurrences can capture relationships that a small set of summary factors cannot.
Several lines of research exemplify the move toward high-dimensional modeling, from large Bayesian VARs to heavily regularized machine-learning forecasts and neural embedding methods.
Collectively, these lines of research point to a common theme: with proper regularization, computational resources, and methodological frameworks, including more variables can improve predictive accuracy and reveal deeper structural insights, countering the longstanding narrative that "less is more" in economics.
While word embeddings learn vector representations based on contextual co-occurrence, an analogous approach in economics might treat individual series (prices, indices, sentiment measures) as tokens and learn dense vectors from how they co-move across time, sectors, and regions.
Such an approach can detect, for instance, that a certain agricultural commodity’s price vector is close to that of a particular climate index in embedding space, highlighting a direct co-occurrence or correlation pattern relevant to pricing or risk.
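One minimal way to approximate that idea, without committing to any particular embedding architecture, is to standardize a panel of series, factorize their co-movement matrix, and compare the resulting vectors by cosine similarity. In the sketch below the coffee-price, rainfall, and equity series are synthetic stand-ins, generated so that the first two co-move by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly panel: rows are periods, columns are economic/climate series.
n_periods = 240
rainfall = rng.normal(size=n_periods)
coffee_price = 0.8 * rainfall + 0.2 * rng.normal(size=n_periods)  # correlated by construction
equity_index = rng.normal(size=n_periods)                         # unrelated noise
panel = np.column_stack([coffee_price, rainfall, equity_index])
names = ["coffee_price", "rainfall_index", "equity_index"]

# Standardize each series, then embed via a truncated SVD of the co-movement matrix.
z = (panel - panel.mean(axis=0)) / panel.std(axis=0)
comovement = (z.T @ z) / n_periods          # correlation matrix of the series
U, S, _ = np.linalg.svd(comovement)
k = 2
series_vectors = U[:, :k] * np.sqrt(S[:k])  # low-dimensional series "embeddings"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(series_vectors[0], series_vectors[1]))  # coffee vs. rainfall: close to 1
print(cosine(series_vectors[0], series_vectors[2]))  # coffee vs. equity: near 0
```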
From a classical econometrics standpoint, one can also handle large sets of variables with shrinkage and regularization, for example ridge or Lasso penalties and the Bayesian shrinkage priors used in large BVARs.
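As one hedged illustration, the sketch below fits a cross-validated Lasso on a synthetic design with 500 candidate regressors and only five true signals; the data-generating process is invented and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic high-dimensional design: 200 observations, 500 candidate regressors,
# of which only 5 actually drive the outcome.
n, p = 200, 500
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Standardize regressors, then let cross-validation choose the penalty strength.
X_std = StandardScaler().fit_transform(X)
model = LassoCV(cv=5, random_state=0).fit(X_std, y)

selected = np.flatnonzero(model.coef_)
print("chosen penalty:", model.alpha_)
print("nonzero coefficients:", len(selected), "of", p)
```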
Modern high-dimensional economic modeling is computationally non-trivial: estimating models with hundreds or thousands of inputs demands substantial memory, processing time, and careful engineering.
Hence, the technical barrier to adopting these models is not insignificant. A robust environment (cloud or on-premise HPC) plus engineering expertise is often needed.
Real economies exhibit intricate webs of connections: consumption patterns, production interdependencies, global supply chains, psychological sentiment, policy ripple effects. Reducing these interdependent signals to a small handful of factors can oversimplify. Instead, scaling up to include more signals can surface non-linear and emergent behaviors, improving both forecasting accuracy and structural understanding.
Dimensionality reduction often masks regime changes or structural breaks, as it must maintain a small factor space that spans many epochs. With more variables in the model—and with flexible estimation—analysts can better detect abrupt changes in how certain variables interact (e.g., after a major technological shift or policy event).
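A simple illustration of this point, using invented data rather than any formal econometric break test, is to monitor a rolling correlation between two series whose relationship flips sign mid-sample:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two synthetic series whose relationship flips sign halfway through the sample,
# mimicking a structural break after a policy or technology shift.
n = 400
x = rng.normal(size=n)
noise = 0.3 * rng.normal(size=n)
y = np.where(np.arange(n) < n // 2, 1.0 * x, -1.0 * x) + noise

def rolling_corr(a, b, window):
    """Correlation of a and b over a trailing window at each time step."""
    out = np.full(len(a), np.nan)
    for t in range(window, len(a) + 1):
        out[t - 1] = np.corrcoef(a[t - window:t], b[t - window:t])[0, 1]
    return out

corr = rolling_corr(x, y, window=60)

# Flag the first window where the sign of the estimated relationship changes.
valid = corr[~np.isnan(corr)]
sign_change = np.flatnonzero(np.diff(np.sign(valid)) != 0)
print("first detected sign flip near index:", 59 + int(sign_change[0]))
```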
If a government wants to test the impact of complex interventions (e.g., climate taxes, new forms of social assistance), a high-dimensional simulation environment can incorporate numerous secondary and tertiary effects. This is analogous to agent-based models with many attributes for each agent, or macro frameworks augmented by large data inputs. The more granular the model, the more it can reveal unexpected side effects and feedback loops.
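As a deliberately tiny sketch of the agent-based flavor of this idea, the following simulates heterogeneous households responding to a stylized energy tax; every attribute, behavioral rule, and parameter is invented for illustration and calibrated to nothing.

```python
import numpy as np

rng = np.random.default_rng(3)

# Heterogeneous agents, each with several attributes rather than one representative profile.
n_agents = 10_000
income = rng.lognormal(mean=10.0, sigma=0.5, size=n_agents)
energy_share = rng.uniform(0.02, 0.15, size=n_agents)      # share of spending on energy
price_sensitivity = rng.uniform(0.2, 1.0, size=n_agents)   # agent-specific responsiveness

def simulate_consumption(tax_rate):
    """Aggregate consumption after a stylized energy tax, with heterogeneous responses."""
    energy_spend = income * energy_share
    # Agents cut energy use in proportion to their own price sensitivity (invented rule).
    adjusted_energy = energy_spend * (1.0 - price_sensitivity * tax_rate)
    other_spend = income * (1.0 - energy_share)
    return float(np.sum(adjusted_energy + other_spend))

baseline = simulate_consumption(tax_rate=0.0)
with_tax = simulate_consumption(tax_rate=0.10)
print("aggregate consumption change: {:.2%}".format(with_tax / baseline - 1.0))
```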
Arguably, the success of massive language models highlights the value of “letting the data speak” without a priori restricting the dimensionality. Pre-transformer NLP techniques often hand-engineered features or used smaller embeddings. By scaling up both model size and training data, large language models captured semantics and context with unprecedented richness. Economics, in principle, can leverage a similar approach with time series, cross-sectional, and textual data.
High-dimensional models are often harder to interpret. When hundreds of latent factors or large neural embeddings underlie a forecast, policymakers may question the chain of causality. Improving interpretability (via methods like SHAP values, partial dependence plots, or attention mechanisms) remains an active research area.
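For instance, partial dependence can be computed directly with scikit-learn (assumed available here); the model and data below are synthetic, with only two of fifty inputs actually driving the outcome.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(4)

# Synthetic forecast problem with many inputs, only two of which matter non-linearly.
n, p = 1000, 50
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Partial dependence traces the model's average prediction as one input varies,
# giving a per-variable view into an otherwise opaque high-dimensional fit.
pd_result = partial_dependence(model, X, features=[0], kind="average")
print(pd_result["average"][0][:5])  # average predicted response along a grid of feature 0
```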
While “more data is better” in principle, ensuring data is accurate, uniformly sampled, and relevant is crucial. Noise from low-quality sources can overwhelm signals, and “data dredging” can lead to spurious correlations. Rigorous data cleaning, quality checks, and domain-driven curation are non-negotiable.
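The kinds of checks implied here can be made routine; the pandas sketch below flags duplicates, gaps, missing values, and crude outliers on a placeholder panel whose column names and thresholds are arbitrary.

```python
import numpy as np
import pandas as pd

# Placeholder monthly panel; in practice this would be loaded from source systems.
df = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=60, freq="MS"),
    "cpi": np.random.default_rng(5).normal(2.0, 0.3, 60),
    "unemployment": np.random.default_rng(6).normal(5.0, 0.5, 60),
})
df.loc[10, "cpi"] = np.nan          # simulate a gap in one series
df.loc[20, "unemployment"] = 50.0   # simulate an implausible outlier

# Basic quality checks: duplicate dates, missingness, sampling gaps, crude outlier flags.
numeric = df.select_dtypes("number")
report = {
    "duplicate_dates": int(df["date"].duplicated().sum()),
    "missing_by_column": df.isna().sum().to_dict(),
    "irregular_gaps": int((df["date"].diff().dt.days > 35).sum()),
    "outlier_rows": int(((numeric - numeric.mean()).abs() > 4 * numeric.std()).any(axis=1).sum()),
}
print(report)
```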
Simply throwing thousands of series into a model can lead to overfitting if not properly regularized. Bayesian priors, cross-validation, and advanced methods for hyperparameter tuning are imperative to keep the model robust and generalizable.
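A minimal version of that discipline, assuming scikit-learn and synthetic data, combines time-ordered cross-validation with a grid search over a ridge penalty so that the regularization strength is chosen out-of-sample rather than by fiat:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Synthetic monthly dataset with far more candidate predictors than observations.
n, p = 120, 300
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = rng.normal(size=10)
y = X @ beta + rng.normal(scale=1.0, size=n)

# Time-ordered splits avoid leaking future information into the validation folds.
cv = TimeSeriesSplit(n_splits=5)
pipeline = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(
    pipeline,
    param_grid={"ridge__alpha": np.logspace(-2, 3, 12)},
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("selected alpha:", search.best_params_["ridge__alpha"])
print("cross-validated MSE:", -search.best_score_)
```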
Deploying high-dimensional modeling frameworks typically requires substantial computing infrastructure (cloud or on-premise HPC), well-engineered data pipelines with rigorous quality controls, and specialized modeling and engineering expertise.
In an era where computational constraints have relaxed and data is abundantly available, the notion of forcibly reducing dimensionality in economic analysis is increasingly seen as outdated. While classical parsimony was once a necessity, it might now, in some contexts, be a liability. High-dimensional approaches—whether via neural embeddings, large BVARs, or advanced regularization strategies—open the door to more nuanced, accurate, and flexible economic models.
Such models can better capture co-occurring factors that jointly shape macro and micro trends, thereby reducing blind spots. They can handle complex interactions and structural shifts, potentially providing more robust forecasts and richer policy insights. Borrowing from recent successes in language modeling and large-scale machine learning, economists can harness a similarly expansive methodology to move beyond the simplified lens of a handful of variables.
Key takeaway: Our ability to incorporate extremely high-dimensional data is rapidly increasing. Embracing this capacity—even with its interpretability and resource challenges—allows the field of economics to model the real world more faithfully. This shift holds promise for forecasting, scenario testing, and policy design that fully acknowledge the overlapping, intricate causal pathways driving economic outcomes.