Some Optimal Strategies for Bandit Problems with Beta Prior Distributions
Authors: Lin C-T. (1); Shiau C.J. (2)
Source: Annals of the Institute of Statistical Mathematics, Volume 52, Number 2, June 2000, pp. 397-405 (9 pages)
Publisher: Springer
Abstract:
A bandit problem with infinitely many Bernoulli arms is considered. The parameters of the Bernoulli arms are independent and identically distributed random variables drawn from a common beta(a, b) distribution. We investigate the k-failure strategy, which is a modification of Robbins's stay-with-a-winner/switch-on-a-loser strategy, and three other strategies proposed recently by Berry et al. (1997, Ann. Statist., 25, 2103-2116). We show that the k-failure strategy performs poorly when b is greater than 1, and that the best of the k-failure strategies is the 1-failure strategy when b is less than or equal to 1. Using the formulas derived by Berry et al. (1997), we obtain the asymptotic expected failure rates of these three strategies for beta prior distributions. Numerical estimates and simulations for a variety of beta prior distributions are presented to illustrate the performance of these strategies.
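As an illustration only, the kind of simulation mentioned in the abstract might be sketched as follows for the k-failure strategy. This is a minimal sketch, not the authors' code. It assumes that a k-failure strategy keeps pulling the current arm until that arm has produced k failures in total and then moves to a fresh arm; the abstract does not spell out this rule. The function name k_failure_failure_rate and the parameters n_pulls and seed are hypothetical.

```python
import random


def k_failure_failure_rate(a, b, k, n_pulls, seed=0):
    """Estimate the failure proportion of a k-failure strategy by simulation.

    Assumption (not stated in the abstract): an arm is kept until it has
    produced k failures in total, then it is discarded and a new arm, with
    success probability drawn from beta(a, b), is used instead.
    """
    rng = random.Random(seed)
    p = rng.betavariate(a, b)  # success probability of the current arm
    arm_failures = 0           # failures observed on the current arm
    total_failures = 0         # failures observed over all pulls

    for _ in range(n_pulls):
        if rng.random() < p:
            continue  # success: stay with the current arm
        total_failures += 1
        arm_failures += 1
        if arm_failures == k:
            # k-th failure on this arm: switch to a fresh arm
            p = rng.betavariate(a, b)
            arm_failures = 0

    return total_failures / n_pulls
```

Calling, for example, k_failure_failure_rate(1.0, 0.5, k, 100000) for k = 1, 2, 3 gives rough empirical failure proportions for a beta(1, 0.5) prior. These are finite-horizon estimates from a single simulated run and are not the asymptotic rates derived in the paper.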
Keywords: Bandit problems; sequential experimentation; dynamic allocation of Bernoulli processes; staying-with-a-winner; switching-on-a-loser; k-failure strategy; m-run strategy; non-recalling m-run strategy; N-learning strategy
Language: English
Document Type: Regular paper
Affiliations: 1: Department of Mathematics, Tamkang University, Tamsui, Taiwan 251, R.O.C. 2: Institute of Mathematical Statistics, National Chung-Cheng University, Chia-Yi, Taiwan 621, R.O.C.
Publication date: 2000-06-01