Make money with the cloud: Regret, Greed & Multi-armed bandit

02/26/2012

Whether you are a student, professor, researcher, or entrepreneur, you want an edge. Defining greed and regret, normally thought of as emotions, in terms of advertising can generate money, make for an interesting senior project, motivate a research grant, or save money on advertising. A student or software engineer could build a system, host it in the cloud, and sell customized access: you take a cut for the access and computation, and maybe do a little consulting. Work from the beach, in Hawaii!

In economic game theory there is a scenario referred to as the Multi-armed bandit. The multi-armed bandit is defined as:

The multi-armed bandit problem derives its name from an imagined slot machine with k (≥ 2) arms. The i-th arm has a payoff probability p_i which is unknown. When arm i is pulled, the player wins a unit reward with payoff probability p_i. The objective is to construct N successive pulls of the slot machines to maximize the total expected reward. This gives rise to the familiar explore/exploit dilemma where on one hand one would like to gather information on the unknown payoff probabilities, while on the other hand one would like to sample arms with the best payoff probabilities.

Quote from: https://research.yahoo.com/files/sdm-bandit.pdf
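To make the explore/exploit dilemma in that definition concrete, here is a minimal Python sketch of the slot machine and one simple way to play it. The playing rule is epsilon-greedy, a common heuristic that I picked for illustration; it is not necessarily what any of the papers linked here use. The function name, the epsilon value, and the payoff probabilities are all assumptions for the example.

```python
import random

def epsilon_greedy(probs, n_pulls, epsilon=0.1, seed=0):
    """Play a k-armed Bernoulli bandit with an epsilon-greedy rule.

    probs    -- true payoff probability p_i of each arm (hidden from the player)
    n_pulls  -- N, the number of successive pulls
    epsilon  -- chance of exploring a random arm on any given pull (assumed value)
    Returns (total_reward, pulls_per_arm).
    """
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k   # how often each arm has been pulled
    wins = [0] * k    # unit rewards collected from each arm
    total = 0
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            # Explore: pick any arm at random to learn about it.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the best observed win rate.
            # Arms that were never pulled count as a win rate of 0.
            arm = max(range(k),
                      key=lambda i: wins[i] / pulls[i] if pulls[i] else 0.0)
        # Pulling arm i pays a unit reward with probability p_i.
        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total, pulls

if __name__ == "__main__":
    # Made-up example: two arms, one much better than the other.
    total, pulls = epsilon_greedy([0.1, 0.9], n_pulls=2000)
    print("total reward:", total, "pulls per arm:", pulls)
```

Raising epsilon spends more pulls gathering information; lowering it leans harder on whatever arm looks best so far. That trade-off is the dilemma the quote describes.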

Bottom line: How do you optimize the marketing of your phone apps? It would appear that if you could build a simple model of the multi-armed bandit, you would have a better way to forecast which of your apps are doing well and to bet on where to place your efforts.

Here is a paper that used a simulation of the multi-armed bandit but described the simulation only in formulas. If you have access to Matlab, you could use it to build the model yourself. You would then be able to tell whether your advertising campaigns are trending the way you want. Or you could just guess and likely get about the same outcome. Or would you? That is the purpose of the simulation: to check your ability to guess against a model.

https://www.aladdinproject.org/wp-content/uploads/2011/01/PavlidisTH_2007.pdf
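One way to score a guess against a model is regret: for each pull, the gap between the best arm's payoff probability and that of the arm you actually chose, summed over all pulls. The sketch below uses that standard expected-regret definition, not the formulas from the paper above. The function name and the click rates are made up for illustration.

```python
def expected_regret(probs, choices):
    """Expected regret of a sequence of arm choices.

    probs   -- assumed true payoff probability of each arm
    choices -- list of arm indices, one per pull
    Each pull costs the gap between the best arm's payoff
    probability and that of the arm actually chosen.
    """
    best = max(probs)
    return sum(best - probs[arm] for arm in choices)

# Hypothetical example: three ad placements with assumed click rates.
click_rates = [0.02, 0.05, 0.03]
my_guess = [0] * 50 + [2] * 50   # a hand-made plan: 50 pulls each on arms 0 and 2
always_best = [1] * 100          # what perfect hindsight would choose

print(round(expected_regret(click_rates, my_guess), 2))     # 2.5
print(round(expected_regret(click_rates, always_best), 2))  # 0.0
```

Here the hand-made plan gives up an expected 2.5 clicks over 100 pulls compared with always choosing the best placement. Feeding in the choices made by a simulated strategy instead of a guess gives a like-for-like comparison.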

Or MIT, with more formulas and proofs. It would be nice if someone used pseudo-code so I wouldn’t have to interpret the formulas.

https://www.mit.edu/~jnt/Papers/J125-09-greedy-bandits.pdf

Harvard weighs in with a simpler paper that defines its terms carefully (meaning that I can put them into a program, if I get the motivation):

https://www.eecs.harvard.edu/~parkes/cs286r/spring06/papers/fostervohra_ref99.pdf
