Reinforcement Learning Meets Macroeconomics: Building an RBC Model

Published on: 2025-05-02

Description: Exploring the implementation of a Real Business Cycle model using reinforcement learning techniques, with insights on economic agent behavior, capital dynamics, and steady-state equilibrium.

Written by: Fadi Atieh

#RL


What I’ve Learned So Far from Building an RBC Model in Reinforcement Learning

Over the past few weeks, I’ve been tinkering with a reinforcement learning (RL) simulation of a simple Real Business Cycle (RBC) model. I started from first principles and ran into a fascinating tangle of conceptual and technical questions. This post summarizes what I’ve learned so far.


1. Start Simple: The One-Man Economy

The simplest economy is a “one-man army.” Every day, our solitary economic agent makes a few key decisions:

- How much to work (labor versus leisure)
- How much of today’s output to consume
- How much to invest in capital for tomorrow

This is the foundational intuition behind RBC models. From here, everything else grows in complexity.
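
To make this concrete, here is a minimal environment sketch for that one-man economy. The class name and the functional forms (Cobb-Douglas production, log utility with a leisure term weighted by θ) are placeholder assumptions, not necessarily the final specification:

```python
import numpy as np

class OneAgentEconomy:
    """Minimal single-agent economy (a sketch; functional forms are placeholder assumptions).

    State: capital k. Action: (consumption share of output, labor). Reward: period utility.
    """

    def __init__(self, alpha=0.36, delta=0.012, theta=0.4, k0=10.0):
        self.alpha, self.delta, self.theta = alpha, delta, theta
        self.k = k0

    def step(self, consumption_share, labor):
        # Produce with Cobb-Douglas technology: y = k^alpha * h^(1 - alpha)
        output = self.k ** self.alpha * labor ** (1 - self.alpha)
        consumption = consumption_share * output
        investment = output - consumption
        # Period utility: log consumption plus weighted log leisure
        reward = np.log(consumption) + self.theta * np.log(1 - labor)
        # Capital tomorrow = undepreciated capital plus today's investment
        self.k = (1 - self.delta) * self.k + investment
        return self.k, reward

# Example: consume 70% of output and work 30% of the day
env = OneAgentEconomy()
k_next, utility = env.step(consumption_share=0.7, labor=0.3)
```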


2. Modeling Lending and Borrowing

This part still feels hazy.

This leads to a deeper question about modeling local vs. global behavior. Can a society lend to itself? Can a lone agent simulate that structure? Still open questions for me.


3. Steady-State Without Borrowing

In a borrowing-free model (pure investment and consumption), I solved for a steady state with the following parameters:

$$\beta = 0.99,\quad \delta = 0.012,\quad \alpha = 0.36,\quad \theta = 0.4$$

Results:

These values make intuitive sense: high productivity leads to capital accumulation, and the disutility of labor keeps the required hours relatively low.
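
For reference, here is a small sketch of how such a steady state can be computed numerically. The first-order conditions below assume log utility over consumption and leisure and Cobb-Douglas production; the numbers you get depend entirely on those functional-form choices, so swap in your own conditions if the specification differs:

```python
import numpy as np
from scipy.optimize import fsolve

# Parameters from above
beta, delta, alpha, theta = 0.99, 0.012, 0.36, 0.4

def steady_state_residuals(x):
    """Steady-state first-order conditions of the planner's problem.

    Assumes u(c, h) = ln(c) + theta * ln(1 - h) and y = k^alpha * h^(1 - alpha);
    these are illustrative assumptions, not necessarily the setup used above.
    """
    k, h, c = np.clip(x, 1e-8, None)
    h = min(h, 1 - 1e-8)  # keep leisure strictly positive
    euler = beta * (alpha * k ** (alpha - 1) * h ** (1 - alpha) + 1 - delta) - 1
    intratemporal = theta * c / (1 - h) - (1 - alpha) * k ** alpha * h ** (-alpha)
    resource = k ** alpha * h ** (1 - alpha) - delta * k - c
    return [euler, intratemporal, resource]

k_ss, h_ss, c_ss = fsolve(steady_state_residuals, x0=[10.0, 0.3, 1.0])
print(f"k* = {k_ss:.3f}, h* = {h_ss:.3f}, c* = {c_ss:.3f}")
```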


4. What About Reinforcement Learning?

One big question I’m grappling with:

If I simulate this setup using RL, shouldn’t the agent eventually converge to the steady state?

My intuition says yes, because of the Banach fixed-point theorem: if the Bellman operator is a contraction, it has a unique fixed point, and repeatedly applying it converges there, so the agent should find its way to the steady state.
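
For reference, this is the standard argument: when the per-stage reward is bounded and $\beta < 1$, the Bellman operator $T$ is a contraction in the sup norm,

$$\lVert T V - T W \rVert_\infty \;\leq\; \beta \,\lVert V - W \rVert_\infty,$$

so Banach’s theorem gives a unique fixed point $V^*$, and value iteration converges to it from any initial guess.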

But here’s the catch…


5. Theoretical Issues: Unbounded Costs and Convergence

The mathematical form of the RBC problem in this setup is:
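
Written in a standard form (with log utility over consumption and leisure standing in for the exact specification, and Cobb-Douglas production):

$$\max_{\{c_t,\,h_t\}} \; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \beta^t \big(\ln c_t + \theta \ln(1 - h_t)\big)\right] \quad \text{s.t.} \quad k_{t+1} = (1 - \delta)\,k_t + k_t^{\alpha} h_t^{1-\alpha} - c_t$$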

But here’s the problem: the per-stage cost is unbounded.

So value iteration or policy iteration algorithms may not converge unless we manually bound the control space.

My workaround:

Constrain the action space:

$$\epsilon \leq c \leq B,\quad h \leq 1 - \delta$$

But even then, the state space is technically still infinite. Discretizing won’t help unless we bound it too, which means imposing some $K_{\text{max}}$ and truncating.
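
Here is a minimal value-iteration sketch on a truncated, discretized grid. The bound $K_{\text{max}}$, the grid sizes, and the log-utility/Cobb-Douglas forms are all illustrative assumptions:

```python
import numpy as np

# Parameters from the steady-state section; utility and production forms are assumptions
beta, delta, alpha, theta = 0.99, 0.012, 0.36, 0.4
eps = 1e-6

K_max = 100.0                              # hypothetical truncation bound
k_grid = np.linspace(0.1, K_max, 200)      # capital states (and next-period choices)
h_grid = np.linspace(0.05, 0.95, 20)       # labor choices, bounded away from 0 and 1

# Precompute per-stage utility u[i, j, m] for state k_i, labor h_j, next capital k'_m
resources = (k_grid[:, None] ** alpha * h_grid[None, :] ** (1 - alpha)
             + (1 - delta) * k_grid[:, None])
c = resources[:, :, None] - k_grid[None, None, :]   # consumption implied by each choice
u = np.where(c > eps,
             np.log(np.maximum(c, eps)) + theta * np.log(1 - h_grid)[None, :, None],
             -np.inf)

# On the truncated grid the feasible rewards are bounded, so this is a beta-contraction
V = np.zeros(len(k_grid))
for _ in range(10_000):
    V_new = np.max(u + beta * V[None, None, :], axis=(1, 2))
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
```

Everything here hinges on the truncation: if the optimal policy wants to push capital above $K_{\text{max}}$, the bound distorts the solution, so it is worth checking that the computed policy stays in the interior of the grid.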

Otherwise, we’d need function approximation (e.g., neural nets) to estimate value functions in continuous space.


6. Debugging Non-Convergence

In practice, my RL agent doesn’t always converge to the steady state. Possible reasons:

Next steps:


Final Thoughts

This model started as a toy, but I’m realizing it exposes many foundational issues in applying RL to economic systems:

- How to represent lending and borrowing when there is only one agent
- Unbounded per-stage rewards, which break the standard convergence guarantees
- The need to bound and discretize continuous state and action spaces, or switch to function approximation

Still early days, but I’m learning a lot—and that’s the whole point.