Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 26, 2013

A Book About Bandit Algorithms

Filed under: Algorithms,Computer Science,Programming — Patrick Durusau @ 9:50 am

A Book About Bandit Algorithms by Noel Welsh.

From the introduction:

…The basic bandit problem is this:

We are alone in a room full of slot machines (fruit machines, or one-armed bandits). At every tick of the clock we must choose a machine to play. When we pull the arm on a machine we get a reward – perhaps a river of coins tumbles from the machine, or perhaps nothing at all happens – but we don’t know what reward we will get till we try the arm. By pulling arms we can learn about the machines, and we must use this knowledge to choose the best arms to pull.

Though I’ve tried to make the problem concrete (and also show how the name multi-armed bandit arises) my description leaves many things undefined:

  • I haven’t said how the machines behave. Do they always reward at the same rate, or does the rate of reward change arbitrarily, or even capriciously?
  • Do we know anything about the machines before we start? Are some of them, for example, from the same manufacturer, so we might assume they behave similarly?
  • What exactly is the goal? Are we playing this game forever, and if so do we want to make the most money possible? Alternatively, we might want to find the best machine as quickly as possible, so we can go do something more interesting with our time instead.

All of these different variations are bandit problems that we will discuss. The essence of a bandit problem is meeting two criteria. Firstly, we only learn about actions when we take them. Thus bandits are partial information problems. We also require that our actions do not change the machines’ behaviour, though their behaviour may change for other reasons. This important restriction means we don’t have to consider the effect our past actions might have on the present, and makes bandit algorithms tractable to solve. If we drop this restriction we have a problem known as reinforcement learning.
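The loop described above — pick an arm, observe only that arm's reward, update your estimates — can be sketched with a simple epsilon-greedy strategy. This is a minimal illustration, not an algorithm taken from the book; the machine payout rates below are invented for the example.

```python
import random

def epsilon_greedy(true_rates, steps=10_000, epsilon=0.1, seed=42):
    """Play `steps` pulls: explore a random arm with probability
    epsilon, otherwise exploit the arm with the best estimate."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit
        # Partial information: we observe a reward only for the
        # arm we actually pulled, and pulling it does not change
        # the machine's behaviour (the rates stay fixed).
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean
        total += reward
    return values, total

# Hypothetical machines with payout rates 0.2, 0.5, and 0.8.
values, total = epsilon_greedy([0.2, 0.5, 0.8])
print(max(range(3), key=values.__getitem__))  # usually identifies arm 2
```

Because the rates here are stationary, this is a bandit problem rather than reinforcement learning: past pulls never alter what the machines will pay out next.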

Is merging a “partial information problem” (PIP)?

Or does that answer depend upon the complexity of the merging criteria?

If the status of merging as a PIP rests on the complexity of merging criteria, what is the boundary where merging becomes a PIP?

The book is a work in progress, so be sure to sign up for new content and assist Noel with comments.

I first saw this mentioned in a tweet by Lars Marius Garshol.
