Reinforcement Learning Meets Macroeconomics: Building an RBC Model

Published on: 2025-05-02

Description: Exploring the implementation of a Real Business Cycle model using reinforcement learning techniques, with insights on economic agent behavior, capital dynamics, and steady-state equilibrium.

Written by: Fadi Atieh

#RL


What I’ve Learned So Far from Building an RBC Model in Reinforcement Learning

Over the past few weeks, I’ve been tinkering with a reinforcement learning (RL) simulation of a simple Real Business Cycle (RBC) model. I started from first principles and ran into a fascinating tangle of conceptual and technical questions. This post summarizes what I’ve learned so far.


1. Start Simple: The One-Man Economy

The simplest economy is a “one-man army.” Every day, our solitary economic agent makes a few key decisions:

- How much to work (labor versus leisure)
- How much of today’s output to consume
- How much to invest in capital for tomorrow

This is the foundational intuition behind RBC models. From here, everything else grows in complexity.
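
To make this concrete, here is a minimal environment sketch for that one-man economy. The class name and the functional forms (Cobb-Douglas production, log utility with a leisure term weighted by θ) are placeholder assumptions, not necessarily the final specification:

```python
import numpy as np

class OneAgentEconomy:
    """Minimal single-agent economy (a sketch; functional forms are placeholder assumptions).

    State: capital k. Action: (consumption share of output, labor). Reward: period utility.
    """

    def __init__(self, alpha=0.36, delta=0.012, theta=0.4, k0=10.0):
        self.alpha, self.delta, self.theta = alpha, delta, theta
        self.k = k0

    def step(self, consumption_share, labor):
        # Produce with Cobb-Douglas technology: y = k^alpha * h^(1 - alpha)
        output = self.k ** self.alpha * labor ** (1 - self.alpha)
        consumption = consumption_share * output
        investment = output - consumption
        # Period utility: log consumption plus weighted log leisure
        reward = np.log(consumption) + self.theta * np.log(1 - labor)
        # Capital tomorrow = undepreciated capital plus today's investment
        self.k = (1 - self.delta) * self.k + investment
        return self.k, reward

# Example: consume 70% of output and work 30% of the day
env = OneAgentEconomy()
k_next, utility = env.step(consumption_share=0.7, labor=0.3)
```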


2. Modeling Lending and Borrowing

This part still feels hazy.

This leads to a deeper question about modeling local vs. global behavior. Can a society lend to itself? Can a lone agent simulate that structure? Still open questions for me.


3. Steady-State Without Borrowing

In a borrowing-free model (pure investment and consumption), I solved for a steady state with the following parameters:

$$\beta = 0.99,\quad \delta = 0.012,\quad \alpha = 0.36,\quad \theta = 0.4$$

Results:

These values make intuitive sense: high productivity leads to capital accumulation, and the disutility of labor keeps the required hours relatively low.
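
For reference, here is a small sketch of how such a steady state can be computed numerically. The first-order conditions below assume log utility over consumption and leisure and Cobb-Douglas production; the numbers you get depend entirely on those functional-form choices, so swap in your own conditions if the specification differs:

```python
import numpy as np
from scipy.optimize import fsolve

# Parameters from above
beta, delta, alpha, theta = 0.99, 0.012, 0.36, 0.4

def steady_state_residuals(x):
    """Steady-state first-order conditions of the planner's problem.

    Assumes u(c, h) = ln(c) + theta * ln(1 - h) and y = k^alpha * h^(1 - alpha);
    these are illustrative assumptions, not necessarily the setup used above.
    """
    k, h, c = np.clip(x, 1e-8, None)
    h = min(h, 1 - 1e-8)  # keep leisure strictly positive
    euler = beta * (alpha * k ** (alpha - 1) * h ** (1 - alpha) + 1 - delta) - 1
    intratemporal = theta * c / (1 - h) - (1 - alpha) * k ** alpha * h ** (-alpha)
    resource = k ** alpha * h ** (1 - alpha) - delta * k - c
    return [euler, intratemporal, resource]

k_ss, h_ss, c_ss = fsolve(steady_state_residuals, x0=[10.0, 0.3, 1.0])
print(f"k* = {k_ss:.3f}, h* = {h_ss:.3f}, c* = {c_ss:.3f}")
```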


4. What About Reinforcement Learning?

One big question I’m grappling with:

If I simulate this setup using RL, shouldn’t the agent eventually converge to the steady state?

My intuition says yes, because of the Banach fixed-point theorem: if the Bellman operator is a contraction, it has a unique fixed point, and repeatedly applying it converges there, so the agent should find its way to the steady state.
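
For reference, this is the standard argument: when the per-stage reward is bounded and $\beta < 1$, the Bellman operator $T$ is a contraction in the sup norm,

$$\lVert T V - T W \rVert_\infty \;\leq\; \beta \,\lVert V - W \rVert_\infty,$$

so Banach’s theorem gives a unique fixed point $V^*$, and value iteration converges to it from any initial guess.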

But here’s the catch…


5. Theoretical Issues: Unbounded Costs and Convergence

The mathematical form of the RBC problem in this setup is:
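
Written in a standard form (with log utility over consumption and leisure standing in for the exact specification, and Cobb-Douglas production):

$$\max_{\{c_t,\,h_t\}} \; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \beta^t \big(\ln c_t + \theta \ln(1 - h_t)\big)\right] \quad \text{s.t.} \quad k_{t+1} = (1 - \delta)\,k_t + k_t^{\alpha} h_t^{1-\alpha} - c_t$$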

But here’s the problem: the per-stage cost is unbounded.

So value iteration or policy iteration algorithms may not converge unless we manually bound the control space.

My workaround:

Constrain the action space:

$$\epsilon \leq c \leq B,\quad h \leq 1 - \delta$$

But even then, the state space is technically still infinite. Discretizing won’t help unless we bound it too, which means imposing some $K_{\text{max}}$ and truncating.
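
Here is a minimal value-iteration sketch on a truncated, discretized grid. The bound $K_{\text{max}}$, the grid sizes, and the log-utility/Cobb-Douglas forms are all illustrative assumptions:

```python
import numpy as np

# Parameters from the steady-state section; utility and production forms are assumptions
beta, delta, alpha, theta = 0.99, 0.012, 0.36, 0.4
eps = 1e-6

K_max = 100.0                              # hypothetical truncation bound
k_grid = np.linspace(0.1, K_max, 200)      # capital states (and next-period choices)
h_grid = np.linspace(0.05, 0.95, 20)       # labor choices, bounded away from 0 and 1

# Precompute per-stage utility u[i, j, m] for state k_i, labor h_j, next capital k'_m
resources = (k_grid[:, None] ** alpha * h_grid[None, :] ** (1 - alpha)
             + (1 - delta) * k_grid[:, None])
c = resources[:, :, None] - k_grid[None, None, :]   # consumption implied by each choice
u = np.where(c > eps,
             np.log(np.maximum(c, eps)) + theta * np.log(1 - h_grid)[None, :, None],
             -np.inf)

# On the truncated grid the feasible rewards are bounded, so this is a beta-contraction
V = np.zeros(len(k_grid))
for _ in range(10_000):
    V_new = np.max(u + beta * V[None, None, :], axis=(1, 2))
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
```

Everything here hinges on the truncation: if the optimal policy wants to push capital above $K_{\text{max}}$, the bound distorts the solution, so it is worth checking that the computed policy stays in the interior of the grid.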

Otherwise, we’d need function approximation (e.g., neural nets) to estimate value functions in continuous space.


6. Debugging Non-Convergence

In practice, my RL agent doesn’t always converge to the steady state. Possible reasons:

Next steps:


Final Thoughts

This model started as a toy, but I’m realizing it exposes many foundational issues in applying RL to economic systems:

- How to represent lending and borrowing when there is only one agent
- Unbounded per-stage rewards, which break the standard convergence guarantees
- The need to bound and discretize continuous state and action spaces, or switch to function approximation

Still early days, but I’m learning a lot—and that’s the whole point.