This is the code for the project of CS489: Reinforcement Learning.
This project requires us to implement two kinds of model-free RL methods: value-based RL and policy-based RL. We choose RL algorithms to solve two benchmark suites, Atari games and MuJoCo robots, which are discrete-action control and continuous-action control tasks, respectively.
To install the dependencies, run:
$ pip install -r requirements.txt
Note that gym[atari], tb-nightly, future, and mujoco_py cannot be installed by the command above. For the first three, run:
$ pip install gym[atari]
$ pip install tb-nightly future
For mujoco_py, you need to obtain a MuJoCo license first.
After installing all dependencies, you can train a model like this:
$ python run.py --env_name BreakoutNoFrameskip-v4
Note that my code only supports the 7 environments listed in the results tables below!
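The internals of run.py are not shown here, but as an illustration of how the 7-environment restriction could be enforced at the command line, a minimal sketch (an assumption about the interface, not the repo's actual code) might look like:

```python
import argparse

# The 7 environments listed in the results tables below.
SUPPORTED_ENVS = [
    "BreakoutNoFrameskip-v4",
    "PongNoFrameskip-v4",
    "BoxingNoFrameskip-v4",
    "Hopper-v2",
    "Humanoid-v2",
    "HalfCheetah-v2",
    "Ant-v2",
]

parser = argparse.ArgumentParser(description="Train an RL agent")
# `choices` makes argparse reject any environment outside the supported list
parser.add_argument("--env_name", choices=SUPPORTED_ENVS, required=True,
                    help="one of the 7 supported environments")

args = parser.parse_args(["--env_name", "BreakoutNoFrameskip-v4"])
```

With `choices` set, an unsupported environment name fails fast with a usage message instead of crashing later during training.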
I also provide trained models for demos and testing. You can run a demo like this:
$ python test.py --env_name BreakoutNoFrameskip-v4 --num_episode 10
Note that when num_episode=1, the test renders to the screen, so make sure you have a graphical interface available.
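The render-only-when-num_episode=1 behavior can be sketched as a small test loop. This is a hypothetical illustration (the names `run_test`, `env`, and `policy` are placeholders, and the `step` return signature is simplified relative to the gym API):

```python
# Hypothetical sketch: render only for the single-episode demo case.
# `env` is any object with reset()/step()/render(); `policy` maps an
# observation to an action. step() here returns (obs, reward, done).
def run_test(env, policy, num_episode):
    render = (num_episode == 1)  # rendering requires a display
    returns = []
    for _ in range(num_episode):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            if render:
                env.render()
            obs, reward, done = env.step(policy(obs))
            total += reward
        returns.append(total)
    return returns
```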
For the Atari games, I chose DQN with several optimizations, such as loss-of-life episode stopping (which especially helps on Breakout) and frame skipping. For the MuJoCo robots, I first tried A3C and PPO (PPO2) but got poor results, so I switched to SAC (Soft Actor-Critic), which performed much better.
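The frame-skipping trick mentioned above can be sketched as a small wrapper: the agent's action is repeated for several frames and the rewards are summed, so the agent only decides every few frames. This is a minimal sketch, not this repo's actual wrapper; the env interface (reset/step returning (obs, reward, done)) is a simplified stand-in for the gym API, and loss-of-life stopping (treating a lost life as an episode end) would be a separate wrapper in the same style.

```python
# Minimal frame-skip wrapper sketch: repeat each action `skip` frames
# and accumulate the reward over the skipped frames.
class FrameSkip:
    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, done, obs = 0.0, False, None
        for _ in range(self.skip):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:  # stop repeating once the episode ends
                break
        return obs, total_reward, done
```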
The results are presented below. Note that, due to limited time and computing resources, some environments could have reached better results. For example, after 3M steps the Humanoid-v2 model was still improving, but I did not have time to continue, so I stopped at 3M steps. (Update: I have since trained Humanoid for 10M steps and it achieved a better result!)
Atari games (DQN):

Environment Name | Average Testing Score | Training Steps
---|---|---
BreakoutNoFrameskip-v4 | 416.4±38.6 | 10M
PongNoFrameskip-v4 | 20.7±0.5 | 10M
BoxingNoFrameskip-v4 | 96.3±3.1 | 10M
MuJoCo robots (SAC):

Environment Name | Average Testing Score | Training Steps
---|---|---
Hopper-v2 | 4132.8±30.9 | 3M
Humanoid-v2 | 7304.6±24.7 | 10M
HalfCheetah-v2 | 15875.6±36.4 | 3M
Ant-v2 | 6978.2±75.1 | 3M