I recently rewatched the classic proto-hacker '80s film "WarGames," starring Matthew Broderick and Ally Sheedy. The famous final scene, in which the supercomputer Joshua blazes through game scenarios and finally concludes that "the only winning move is not to play," got me thinking. It inspired me to try my hand at writing a Python script that pits a computer against itself in a game of Tic-Tac-Toe, using the basic behavioral-science concepts of positive and negative reinforcement, to see if I could approximate the fundamental principle of machine learning. Once the script was up and running, I added the ability to dump the results of each game to an Excel spreadsheet, and then wrote another script using Matplotlib to visualize the results.
The hypothesis: after a certain number of games, a basic machine learning algorithm will move the program toward a state of optimal play, where the games register more draws than wins. Essentially, the program is rewarded for "good" moves that end in a win and penalized for "bad" moves that result in a loss. When both X and O are playing optimally, each side plays to prevent a loss, so every game should end in a draw.
Building the AI-Powered Tic-Tac-Toe Player
The foundation of the program is Q-learning, a reinforcement learning algorithm. The AI starts with no knowledge of the game and gradually improves by assigning values to moves based on rewards and penalties.
Here’s a quick breakdown of the main components:
- Q-learning: The AI maintains a dictionary of board states, updating its move choices based on rewards.
- State Representation: Each board is stored as a flattened tuple, allowing the AI to recognize patterns.
- Exploration vs. Exploitation: Initially, the AI picks random moves to explore; over time, it relies on learned strategies.
- Game Logic: The program enforces Tic-Tac-Toe rules and determines wins, losses, or draws.
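To make those components concrete, here's a minimal sketch of the Q-learning core. This isn't my script verbatim: the names and hyperparameter values (`ALPHA`, `GAMMA`, `EPSILON`) are illustrative assumptions, but the dictionary-of-states idea, the epsilon-greedy move selection, and the standard Q-learning update are the technique described above. The state is the flattened board tuple from the list.

```python
import random

ALPHA = 0.3    # learning rate: how strongly new information overwrites old values
GAMMA = 0.9    # discount factor: how much future rewards matter
EPSILON = 0.1  # exploration rate: chance of trying a random move

q_table = {}   # maps (state, move) -> learned value; starts empty (no knowledge)

def get_q(state, move):
    """Unseen (state, move) pairs default to 0.0."""
    return q_table.get((state, move), 0.0)

def choose_move(state, legal_moves):
    """Epsilon-greedy selection: usually exploit the best-known move,
    occasionally explore a random one."""
    if random.random() < EPSILON:
        return random.choice(legal_moves)
    return max(legal_moves, key=lambda m: get_q(state, m))

def update_q(state, move, reward, next_state, next_moves):
    """Q-learning update: nudge the value toward the reward plus the
    discounted best value available from the next state."""
    best_next = max((get_q(next_state, m) for m in next_moves), default=0.0)
    old = get_q(state, move)
    q_table[(state, move)] = old + ALPHA * (reward + GAMMA * best_next - old)
```

A fixed `EPSILON` works, but decaying it as training progresses matches the "random at first, learned strategies later" behavior in the list above.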
With this in place, I let the AI play itself continuously, logging the results to an Excel file.
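Here's a hypothetical version of that self-play loop, reusing `choose_move` and `update_q` from the sketch above. The reward values (win +1, loss -1, draw +0.5) and the end-of-game backward updates are my illustrative assumptions, not necessarily what the script used.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' for a completed line, 'Draw' for a full board, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return "Draw" if " " not in board else None

def play_one_game():
    """Self-play one game, remembering each side's (state, move) pairs
    so they can be rewarded or penalized once the outcome is known."""
    board = (" ",) * 9               # flattened-tuple state representation
    history = {"X": [], "O": []}
    player = "X"
    while True:
        legal = [i for i, cell in enumerate(board) if cell == " "]
        move = choose_move(board, legal)
        history[player].append((board, move))
        board = board[:move] + (player,) + board[move + 1:]
        result = winner(board)
        if result is not None:
            break
        player = "O" if player == "X" else "X"
    rewards = {"X": 0.5, "O": 0.5} if result == "Draw" else \
              {result: 1.0, ("O" if result == "X" else "X"): -1.0}
    # Walk each side's moves backwards: the final move is credited with the
    # terminal reward directly; earlier moves bootstrap from the value of
    # the same player's next decision state.
    for side, moves in history.items():
        next_state, next_legal = None, []
        for state, move in reversed(moves):
            if next_state is None:
                update_q(state, move, rewards[side], board, [])  # terminal: no next moves
            else:
                update_q(state, move, 0.0, next_state, next_legal)
            next_state = state
            next_legal = [i for i, c in enumerate(state) if c == " "]
    return result

# Run the full self-play session, collecting one record per game
results = []
for n in range(1, 15001):
    outcome = play_one_game()  # "X", "O", or "Draw"
    label = "Draw" if outcome == "Draw" else f"{outcome} wins"
    results.append({"Game": n, "Result": label})
```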
Tracking 15,000 Games in Excel
To measure progress, I recorded each game’s outcome in an Excel file using pandas and openpyxl. The data captured:
- Game number
- Result (X wins, O wins, or Draw)
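The logging step itself is small. A sketch, assuming the `results` list built in the loop above (the filename is hypothetical):

```python
import pandas as pd

# Build a DataFrame from the per-game records and write it out.
# pandas writes .xlsx files through the openpyxl engine.
df = pd.DataFrame(results)
df.to_excel("tictactoe_results.xlsx", index=False, engine="openpyxl")
```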
The AI played nonstop, generating a dataset of over 15,000 games. The expectation was clear: over time, the number of draws should increase as the AI moved toward optimal play.
Visualizing Learning with Matplotlib
Once the data was collected, I used Matplotlib to analyze how the AI improved. The key visualization was a graph of draw percentage over time, calculated as:
Draw Percentage = (Total Draws / Total Games) * 100
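Here's a sketch of the plotting script, assuming the spreadsheet and column names from above, and assuming the percentage is computed cumulatively so that each point reflects all games played up to that moment:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("tictactoe_results.xlsx")

# Cumulative draw percentage: (total draws so far / total games so far) * 100
is_draw = df["Result"] == "Draw"
games_so_far = np.arange(1, len(df) + 1)
draw_pct = is_draw.cumsum() / games_so_far * 100

plt.plot(df["Game"], draw_pct)
plt.xlabel("Games played")
plt.ylabel("Cumulative draw percentage")
plt.title("Draw Percentage Over Time")
plt.show()
```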
Plotting this gave a clear trajectory of improvement. Initially, wins and losses fluctuated wildly, but as the AI refined its strategy, the percentage of draws steadily climbed. By the end of the run, draws accounted for over 40% of games played, a strong indication that the AI was approaching optimal play.
Results: AI in Action
The final graph showed exactly what I hoped for: a sharp increase in draws, meaning the AI was learning to avoid losing. In Tic-Tac-Toe, perfect play from both sides always results in a draw, so the steadily climbing draw rate suggested the AI was on its way toward 100% draws and, eventually, solving the game.
Final Thoughts
This project was a cool dive into reinforcement learning, data tracking, and visualization. Watching the AI evolve from random flailing to calculated mastery was incredibly rewarding. If you're interested in AI, I highly recommend trying something similar; watching an algorithm teach itself is a fascinating experience. While 15,000 games of Tic-Tac-Toe seems like a LOT of games for a computer to need to teach itself such a simple game, a few tweaks to the code could accelerate the learning. I'll probably continue to tinker with this one for a bit, as it is surprisingly satisfying to watch the machine start to rack up draws against itself.