Learning from Machine Learning
Monte Carlo Tree Search, UCB1, and the case for optimism under uncertainty.
I love Jeremy Kun’s writing on Math ∩ Programming:
In life and in business there is no correct answer on what to do, partly because we just don’t have a good understanding of how the world works. In mathematics, we can craft settings with solid answers.
So what does math reveal about optimal decision-making under uncertainty?
When we have complete information (that is, each player knows the sequence, strategies, and payoffs throughout the game), we should set a threshold and take the first option that meets it.
When we lack complete information (that is, in the real world), nothing matters more than the interval: how long we have to benefit from the knowledge that exploration uncovers. Generally speaking, we should explore when we still have time to use what we learn, and exploit when we are ready to cash in.
Makes sense, right? If we’re moving to a new city and settling down for a while, we should explore the various parks, cafes, and neighborhoods it has to offer—since we’ll have plenty of time to reap the benefit of our favorite places. But if we’re just in town for a night, we should exploit what we already know (or can quickly glean from recommendations and crowdsourced reviews) because heck, it’s just one night.
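The interval effect can be made concrete with a toy calculation. All the numbers below are made-up assumptions for illustration: a known cafe that reliably pays off 0.6, and an unknown one with the same average payoff (0.9 or 0.3, equally likely). Exploring costs nothing in expectation on night one, but the longer we stay in town, the more the knowledge pays off:

```python
# A toy model of the explore/exploit "interval".
# Known cafe: pays 0.6 every visit.
# Unknown cafe: pays 0.9 or 0.3 with equal probability (same mean, 0.6).
KNOWN = 0.6
UNKNOWN_OUTCOMES = [0.9, 0.3]  # equally likely

def exploit_value(horizon):
    """Stick with the known cafe for every remaining visit."""
    return horizon * KNOWN

def explore_value(horizon):
    """Try the unknown cafe once, then pick the better option after."""
    if horizon == 0:
        return 0.0
    first = sum(UNKNOWN_OUTCOMES) / len(UNKNOWN_OUTCOMES)  # expected first visit
    best_after = sum(max(x, KNOWN) for x in UNKNOWN_OUTCOMES) / len(UNKNOWN_OUTCOMES)
    return first + (horizon - 1) * best_after

for nights in (1, 10, 100):
    print(nights, round(explore_value(nights) - exploit_value(nights), 2))
```

For one night, exploring and exploiting tie in expectation; the edge from exploring grows with every additional night we have to use what we learned.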
There are three main approaches to machine learning:
Supervised (prediction) = learn by examples with right/wrong answers
Unsupervised (pattern recognition) = learn by inputs (no labels) to form clusters
Reinforcement (optimization) = learn by trial/error to maximize cumulative reward
In reinforcement learning, the agent generates its training data by interacting with the world. The agent makes decisions, then learns the consequences of its decisions through trial and error, rather than being told the correct decision explicitly.
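Here is a minimal sketch of that loop, as a three-armed slot machine. The payout probabilities and the 10% exploration rate are assumptions for illustration, not anything from the post; the agent is never told which arm is best, only the rewards its own pulls generate:

```python
import random

# Reinforcement learning as trial and error: pull an arm, observe a
# reward, update an estimate. The true payout rates are hidden from
# the agent and are made up for this example.
random.seed(0)
TRUE_PAYOUT = [0.2, 0.5, 0.8]  # hidden from the agent

estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]

for step in range(1000):
    # Explore 10% of the time; otherwise exploit the best current estimate.
    if random.random() < 0.1:
        arm = random.randrange(3)
    else:
        arm = max(range(3), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < TRUE_PAYOUT[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean

print([round(e, 2) for e in estimates])
```

After a thousand pulls, the agent's estimates converge toward the hidden payout rates, and the best arm attracts most of its decisions.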
Life is reinforcement learning.1
Upper Confidence Bound (UCB1) supports the case for optimism in the face of uncertainty. If we are uncertain about an action, we should optimistically assume it is the correct one until the evidence says otherwise. Put another way, the grass isn’t always greener, but we should at least see for ourselves. Optimism is probabilistically rational.
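UCB1 encodes that optimism directly: each option is scored by its observed mean reward plus a bonus that is large when the option is under-explored (an option never tried scores infinity). A sketch on the same kind of three-armed bandit, with illustrative payout rates of my own choosing:

```python
import math
import random

# UCB1: score = observed mean + sqrt(2 * ln(total pulls) / pulls of this arm).
# The bonus shrinks as an arm is tried more, so uncertainty itself
# earns an arm extra attention. Payout rates are assumptions.
random.seed(1)
TRUE_PAYOUT = [0.2, 0.4, 0.8]

means = [0.0] * 3
counts = [0] * 3

def ucb1_score(arm, total_pulls):
    if counts[arm] == 0:
        return float("inf")  # never tried: maximally optimistic
    return means[arm] + math.sqrt(2 * math.log(total_pulls) / counts[arm])

for t in range(1, 2001):
    arm = max(range(3), key=lambda a: ucb1_score(a, t))
    reward = 1.0 if random.random() < TRUE_PAYOUT[arm] else 0.0
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]

print(counts)  # the best arm should attract the most pulls
```

Every arm gets a fair optimistic look early on, and the pulls concentrate on the best arm as the evidence accumulates. This same rule, applied recursively at each node of a game tree, is the heart of Monte Carlo Tree Search.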
Corollary: We should assume the best in others when they’re unknown to us.
There’s a final concept from regression modeling which AWS explains well:
Overfitting occurs when the model cannot generalize and fits too closely to the training dataset instead. Overfitting happens due to several reasons, such as: the training data size is too small and does not contain enough data samples to accurately represent all possible input data values.
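A small sketch of that failure mode, with data I've invented: the true relationship is linear (y = 2x), but given only five noisy samples, a degree-4 polynomial can thread through every training point perfectly and still fall apart on new inputs:

```python
import numpy as np

# Overfitting on a tiny dataset: the true relationship is y = 2x,
# but a flexible model memorizes the noise. All data are made up.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
noise = np.array([0.5, -0.4, 0.6, -0.5, 0.3])   # fixed "noise" for reproducibility
y_train = 2 * x_train + noise

simple = np.polyfit(x_train, y_train, deg=1)    # two parameters: generalizes
flexible = np.polyfit(x_train, y_train, deg=4)  # five parameters: memorizes

x_test = np.array([5.0, 6.0])                   # outside the training range
y_test = 2 * x_test

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

print("train error:", mse(simple, x_train, y_train), mse(flexible, x_train, y_train))
print("test error: ", mse(simple, x_test, y_test), mse(flexible, x_test, y_test))
```

The flexible model's training error is essentially zero and its test error is enormous; the simple line is "wrong" on the training points but close to right everywhere else.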
When we have high uncertainty and limited experiences, we should think less.
Go out, go out, and explore!