Checklist of files that are updated / not yet updated:
src/
    agents/
        agent.py
        human.py
        random.py
        mcts.py
        ppo.py
    network/
        network.py
    environments/
        game
        uttt_env
tests/
    test_game.py
    test_mcts.py
main_flask.py
mcts_train.py
ppo_train.py
main_play.py
Create an overview of what we have done, why, and in which order.
Example PPO:
Research done 15th and 16th Nov
Started code implementation 16th Nov
Issues getting PPO to run with the example we're using: the example assumes a continuous action space, while UTTT has a discrete one, so the policy needs a categorical distribution rather than a multivariate normal (16th and 17th). See the sketch below.
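A minimal sketch of that change, assuming PyTorch. The class name `PolicyNet`, the layer sizes, and the 81-dimensional observation are illustrative only, not our actual network in src/network/network.py; the point is that the policy head outputs logits for a `Categorical` distribution instead of the mean/covariance of a `MultivariateNormal`.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class PolicyNet(nn.Module):
    """Illustrative policy head for a discrete action space (81 UTTT moves)."""

    def __init__(self, obs_dim: int = 81, n_actions: int = 81):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),   # one logit per discrete move
        )

    def forward(self, obs: torch.Tensor) -> Categorical:
        logits = self.body(obs)
        # Categorical over moves replaces the MultivariateNormal of the
        # continuous-action example; log_prob/entropy plug into the PPO
        # loss exactly the same way.
        return Categorical(logits=logits)

# Usage: sample an action and keep its log-prob for the PPO ratio.
obs = torch.zeros(1, 81)
dist = PolicyNet()(obs)
action = dist.sample()
log_prob = dist.log_prob(action)
```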
Example MCTS:
We created the basic UTTT game
We built the MCTS structure
We built the MCTS class and the Node class, the two main ingredients of the MCTS code (see the sketch after this list)
We let MCTS play against a random agent
Given that MCTS itself is not a Reinforcement Learning algorithm (and learning about RL and building RL algorithms is one of our main goals for this project), we explored options to turn MCTS into a "kind-of" RL model.
We extended MCTS with a memory function: we can now give MCTS_agent_1 a memory from previous MCTS runs (i.e. from 1000 iterations across 100 plays)
MCTS_agent_1 (with memory) played against random_agent (100 wins, 0 draws, 0 losses)
We let MCTS_agent_1 play against MCTS_no_memory (50 wins, 11 draws, 39 losses)
Updated the values for wins and losses: before it was -1 (loss), 0 (draw), 1 (win); now it is 0 (loss), 0.5 (draw), 1 (win). This is best practice, since it lets us read the values as winning probabilities and rewards draws more effectively.
MCTS_agent_2 (with memory, with updated values) played against MCTS_no_mem (40 wins, 26 draws, 34 losses)
MCTS_agent_2 (with once updated memory, with updated values) played against MCTS_no_mem (48 wins, 24 draws, 28 losses)
MCTS_agent_2 (with twice-updated memory, with updated values) played against MCTS_no_mem (51 wins, 23 draws, 26 losses). On top of that, MCTS_agent_2 finally opens with (5,5) as its first move, and the first time we played against it ourselves, we had to fight to escape with a draw. Done with MCTS on the 15th.
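For reference, a minimal sketch of the pieces described above: the Node class with visit/value counts, UCT selection, backpropagation using the 0 / 0.5 / 1 results, and the "memory" idea of carrying the tree over between plays. All names here (`Node`, `backpropagate`, `reuse_subtree`) are illustrative, not the actual code in src/agents/mcts.py.

```python
import math

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state          # game position this node represents
        self.parent = parent
        self.move = move            # move that led to this node
        self.children = []
        self.visits = 0
        self.value = 0.0            # accumulated 0 / 0.5 / 1 results

    def uct(self, c: float = 1.4) -> float:
        # Standard UCT: exploitation (mean value) + exploration bonus.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def backpropagate(node: Node, result: float) -> None:
    # result is 1 (win), 0.5 (draw) or 0 (loss); flipping to 1 - result at
    # each level accounts for the alternating players.
    while node is not None:
        node.visits += 1
        node.value += result
        result = 1 - result
        node = node.parent

def reuse_subtree(old_root: Node, played_move, new_state) -> Node:
    # "Memory": if the reply was already explored, continue from that subtree
    # so statistics from earlier plays seed the next search; otherwise start
    # a fresh root for the new position.
    for child in old_root.children:
        if child.move == played_move:
            child.parent = None
            return child
    return Node(new_state)
```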
Example model selection:
We researched the following models and algorithms: ...
We selected the following models: MCTS, PPO, and as a possible 3rd option (if time allows): a combination of MCTS with AlphaZero
Why did we choose these models? MCTS: ... | PPO: ... | possibly MCTS + AlphaZero