Department of Mathematics,
University of California San Diego

****************************

Math 288 - Stochastic Systems Seminar

Angela Yu

UCSD

Three wrongs make a right: reward underestimation mitigates idiosyncrasies in human bandit behavior

Abstract:

Combining a multi-armed bandit task and Bayesian computational modeling, we find that humans systematically underestimate reward availability in the environment. This apparent pessimism turns out to be an optimism bias in disguise, and one that compensates for other idiosyncrasies in human learning and decision-making under uncertainty, such as a default tendency to assume non-stationarity in environmental statistics as well as the adoption of a simplistic decision policy. In particular, reward rate underestimation discourages the decision-maker from switching away from a "good" option, thus achieving near-optimal behavior (which never switches away after a win). Furthermore, we demonstrate that the Bayesian model that best predicts human behavior is equivalent to a particular class of reinforcement learning models, thus giving statistical, normative grounding to phenomenological models of human behavior.

Host: Ruth Williams

January 23, 2020

2:00 PM

AP&M 7218

****************************