
Documentation & Clean-Up of Project #34

Open · 15 of 19 tasks
PaulZbigniew opened this issue Nov 14, 2023 · 0 comments
Labels: documentation (Improvements or additions to documentation)


PaulZbigniew commented Nov 14, 2023

Checklist of files that are updated / not yet updated:

  • src/
    • agents/
      • agent.py
      • human.py
      • random.py
      • mcts.py
      • ppo.py
      • network/
        • network.py
    • environments/
      • game
      • uttt_env
    • tests/
      • test_game.py
      • test_mcts.py
    • main_flask.py
    • mcts_train.py
    • ppo_train.py
    • main_play.py

Create an overview of what we have done, why, and in which order.

Example PPO:

  1. Research done 15th and 16th Nov
  2. Started code implementation 16th Nov
  3. Issues getting PPO to run with the example we're using: continuous vs. discrete action space, which calls for a categorical distribution rather than a multivariate normal (16th and 17th); see the sketch after this list.
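
For reference, a minimal sketch of the distribution swap from step 3, assuming a PyTorch actor head; `PolicyNet`, the layer sizes, and the 81-action space (9×9 cells) are illustrative, not the actual code in ppo.py:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class PolicyNet(nn.Module):
    """Hypothetical actor head for UTTT's 81 discrete actions (9x9 cells)."""
    def __init__(self, obs_dim: int = 81, n_actions: int = 81):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.Tanh(),
            nn.Linear(128, n_actions),  # raw logits, one per cell
        )

    def forward(self, obs: torch.Tensor) -> Categorical:
        # Discrete action space: sample from a categorical distribution
        # over the moves instead of a MultivariateNormal over a
        # continuous action vector.
        return Categorical(logits=self.body(obs))

dist = PolicyNet()(torch.zeros(1, 81))
action = dist.sample()            # integer move index
log_prob = dist.log_prob(action)  # needed for the PPO probability ratio
```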

Example MCTS:

  1. We created the basic UTTT game
  2. We built the MCTS structure
  3. We built the MCTS class and the Node class, the two main ingredients of the MCTS code (see the first sketch after this list)
  4. We let MCTS play against a random agent
  5. Given that MCTS itself is not a reinforcement learning algorithm (and learning about RL and building RL algorithms is one of our main goals for this project), we explored options to turn MCTS into a "kind-of" RL model.
  6. We updated MCTS with a memory function: we are now able to give MCTS_agent_1 a memory from previous MCTS runs (e.g. with 1000 iterations in 100 plays); see the memory sketch after this list.
  7. MCTS_agent_1 (with memory) played against random_agent (100 wins, 0 draws, 0 losses)
  8. We let MCTS_agent_1 play against MCTS_no_memory (50 wins, 11 draws, 39 losses)
  9. Updated the values for wins and losses: before it was -1 (loss), 0 (draw), 1 (win); now it is 0 (loss), 0.5 (draw), 1 (win). This is best practice, as it lets us read the values as winning probabilities and rewards draws more effectively (see the remapping helper after this list).
  10. MCTS_agent_2 (with memory, with updated values) played against MCTS_no_mem (40 wins, 26 draws, 34 losses)
  11. MCTS_agent_2 (with once updated memory, with updated values) played against MCTS_no_mem (48 wins, 24 draws, 28 losses)
  12. MCTS_agent_2 (with twice updated memory, with updated values) played against MCTS_no_mem (51 wins, 23 draws, 26 losses). On top, MCTS_agent_2 (finally!) opens with 5,5 - and the first time we played against it ourselves, we had to fight our way to a draw. Done with MCTS on the 15th.
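
As a reference for steps 2-3, a minimal, self-contained sketch of the two ingredients, assuming a hypothetical game interface (`legal_moves`, `play`, `result` are illustrative names, not our actual environments/ API); the real mcts.py is more elaborate:

```python
import math
import random

# Assumed game interface (hypothetical, not our actual environments/ API):
#   game.legal_moves(state) -> list of legal moves
#   game.play(state, move)  -> next state
#   game.result(state)      -> None while running, else 0 / 0.5 / 1
#                              from the root player's perspective

class Node:
    """One game state in the search tree."""
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.untried = [], None
        self.visits, self.value = 0, 0.0

    def uct(self, c: float = 1.41) -> float:
        # Balance average reward (exploitation) against rarely
        # visited moves (exploration).
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

class MCTS:
    def __init__(self, game, iterations: int = 1000):
        self.game, self.iterations = game, iterations

    def best_move(self, state):
        root = Node(state)
        for _ in range(self.iterations):
            node = self._select(root)
            reward = self._rollout(node.state)
            self._backpropagate(node, reward)
        # Play the most-visited child, a common robust choice.
        return max(root.children, key=lambda n: n.visits).move

    def _select(self, node):
        # Descend by UCT until a node with untried moves is found,
        # then expand exactly one child.
        while True:
            if node.untried is None:
                node.untried = list(self.game.legal_moves(node.state))
            if node.untried:
                move = node.untried.pop()
                child = Node(self.game.play(node.state, move), node, move)
                node.children.append(child)
                return child
            if not node.children:  # terminal state
                return node
            node = max(node.children, key=lambda n: n.uct())

    def _rollout(self, state):
        # Random playout to the end of the game.
        while self.game.result(state) is None:
            moves = self.game.legal_moves(state)
            state = self.game.play(state, random.choice(moves))
        return self.game.result(state)

    def _backpropagate(self, node, reward):
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
```

For brevity the sketch scores every node from the root player's perspective; a proper two-player version flips the reward at alternating levels of the tree.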
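
Steps 6 and 9 are described only loosely above, so the following sketch shows one plausible shape for them; `save_memory`, `warm_start`, and `remap_reward` are hypothetical names, and the pickle-based persistence and hashable-state assumption may differ from what mcts.py actually does:

```python
import pickle

def remap_reward(old: int) -> float:
    # Step 9: map the old {-1, 0, 1} rewards to {0, 0.5, 1}, readable
    # as winning probabilities.
    return (old + 1) / 2

def save_memory(root, memory, path="mcts_memory.pkl"):
    # Fold the finished tree's (visits, value) statistics into the
    # memory dict, keyed by (assumed hashable) board state.
    stack = [root]
    while stack:
        node = stack.pop()
        visits, value = memory.get(node.state, (0, 0.0))
        memory[node.state] = (visits + node.visits, value + node.value)
        stack.extend(node.children)
    with open(path, "wb") as f:
        pickle.dump(memory, f)

def warm_start(node, memory):
    # Seed a fresh node with statistics from earlier runs, so the
    # memory agent does not search from scratch (step 6).
    if node.state in memory:
        node.visits, node.value = memory[node.state]
```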

Example model selection:

  • We researched the following models and algorithms: ...
  • We selected the following models: MCTS, PPO, and as a possible 3rd option (if time allows): a combination of MCTS with AlphaZero
  • Why did we choose these models? MCTS: ... | PPO: ... | possibly MCTS + AlphaZero
  • What are the differences between our models? Sheet
PaulZbigniew added the documentation label on Nov 14, 2023
KaufmannLukas changed the title from "Documentation of Project" to "Documentation & Clean-Up of Project" on Dec 1, 2023