by Owain Evans, Andreas Stuhlmüller, John Salvatier, and Daniel Filan

Modeling Agents with Probabilistic Programs

This book describes and implements models of rational agents for (PO)MDPs and Reinforcement Learning. One motivation is to create richer models of human planning, which capture human biases and bounded rationality.

Agents are implemented as differentiable functional programs in a probabilistic programming language based on JavaScript. Agents plan by recursively simulating their future selves or by simulating their opponents in multi-agent games. Our agents and environments run directly in the browser and are easy to modify and extend.
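To give a flavor of what "recursively simulating their future selves" can look like, here is a minimal sketch in plain JavaScript (not WebPPL, and not the book's own code). It assumes a toy five-state line world with deterministic moves; the names `transition`, `utility`, `actionDist`, and `expectedUtility`, as well as the constants, are hypothetical and chosen only for illustration. The agent chooses actions by softmax over expected utility, and computes expected utility by simulating the choice distribution of its own future self.

```javascript
// Plain JavaScript sketch; every name and constant here is an illustrative assumption.
const ACTIONS = [-1, 0, 1]; // move left, stay, move right
const GOAL = 3;             // state that yields positive utility
const ALPHA = 2;            // softmax parameter: larger means closer to optimal choice

// Deterministic transition on a line of states 0..4 (moves are clipped at the ends).
function transition(state, action) {
  return Math.min(4, Math.max(0, state + action));
}

// Utility of arriving in a state: 1 at the goal, a small cost elsewhere.
function utility(state) {
  return state === GOAL ? 1 : -0.1;
}

// Softmax distribution over ACTIONS for an agent in `state` with `timeLeft` steps.
function actionDist(state, timeLeft) {
  const weights = ACTIONS.map(
    (a) => Math.exp(ALPHA * expectedUtility(state, a, timeLeft)));
  const total = weights.reduce((x, y) => x + y, 0);
  return weights.map((w) => w / total);
}

// Expected utility of taking `action` now and then acting as the
// simulated future self (with one fewer step left) would act.
function expectedUtility(state, action, timeLeft) {
  const next = transition(state, action);
  const u = utility(next);
  if (timeLeft <= 1) return u; // no future self left to simulate
  const probs = actionDist(next, timeLeft - 1); // simulate the future self
  const future = ACTIONS.reduce(
    (sum, a, i) => sum + probs[i] * expectedUtility(next, a, timeLeft - 1), 0);
  return u + future;
}

// Usage: choice probabilities for [left, stay, right] from state 2 with 3 steps left.
const probs = actionDist(2, 3);
console.log(probs);
```

The mutual recursion between `actionDist` and `expectedUtility` is the point of the sketch. It recomputes subproblems freely, which is fine at this size; a longer horizon would call for caching or dynamic programming.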

The book assumes basic programming experience but is otherwise self-contained. It includes short introductions to “planning as inference”, MDPs, POMDPs, inverse reinforcement learning, hyperbolic discounting, myopic planning, and multi-agent planning.

For more information about this project, contact Owain Evans.

Table of contents

  1. Introduction
    Motivating the problem of modeling human planning and inference using rich computational models.

  2. Probabilistic programming in WebPPL
    WebPPL is a functional subset of JavaScript with automatic Bayesian inference via MCMC and gradient-based variational inference.

  3. Agents as probabilistic programs
    One-shot decision problems, expected utility, softmax choice and Monty Hall.

    1. Sequential decision problems: MDPs
      Markov Decision Processes, efficient planning with dynamic programming.

    2. MDPs and Gridworld in WebPPL
      Noisy actions (softmax), stochastic transitions, policies, Q-values.

    3. Environments with hidden state: POMDPs
      Mathematical formalism for POMDPs, Bandit and Restaurant Choice examples.

    4. Reinforcement Learning to Learn MDPs
      RL for Bandits, Thompson Sampling for learning MDPs.

  4. Reasoning about agents
    Overview of Inverse Reinforcement Learning. Inferring utilities and beliefs from choices in Gridworld and Bandits.

  5. Cognitive biases and bounded rationality
    Softmax noise, limited memory, heuristics and biases, motivation from intractability of POMDPs.

    1. Time inconsistency I
      Exponential vs. hyperbolic discounting, Naive vs. Sophisticated planning.

    2. Time inconsistency II
      Formal model of time-inconsistent agent, Gridworld and Procrastination examples.

    3. Bounded Agents: Myopia for rewards and updates
      Heuristic POMDP algorithms that assume a short horizon.

    4. Joint inference of biases and preferences I
      Assuming agent optimality leads to mistakes in inference. Procrastination and Bandit Examples.

    5. Joint inference of biases and preferences II
      Explaining temptation and pre-commitment using softmax noise and hyperbolic discounting.

  6. Multi-agent models
    Schelling coordination games, tic-tac-toe, and a simple natural-language example.

  7. Quick-start guide to the webppl-agents library
    Create your own MDPs and POMDPs. Create gridworlds and k-armed bandits. Use agents from the library and create your own.


Please cite this book as:

Owain Evans, Andreas Stuhlmüller, John Salvatier, and Daniel Filan (electronic). Modeling Agents with Probabilistic Programs. Retrieved from [bibtex]

@misc{agentmodels,
  title = {{Modeling Agents with Probabilistic Programs}},
  author = {Evans, Owain and Stuhlm\"{u}ller, Andreas and Salvatier, John and Filan, Daniel},
  year = {2017},
  howpublished = {\url{}},
  note = {Accessed: }
}

Open source


We thank Noah Goodman for helpful discussions, all WebPPL contributors for their work, and Long Ouyang for webppl-viz. This work was supported by Future of Life Institute grant 2015-144846 and by the Future of Humanity Institute (Oxford).