Train Your First Agent

Training Environment

OpenRL is designed to be simple and easy to use. Here we take the CartPole environment as an example to demonstrate how to train an agent with OpenRL. Create a new file train_ppo.py and enter the following code:

# train_ppo.py
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent
env = make("CartPole-v1", env_num=9) # create environment, set environment parallelism to 9
net = Net(env) # create the neural network
agent = Agent(net) # initialize the trainer
# start training, set total number of training steps to 20000
agent.train(total_time_steps=20000)

Execute python train_ppo.py in the terminal to start training. On an ordinary laptop, training completes in only a few seconds.
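If you want to reuse the trained model later, OpenRL agents also provide save and load helpers. The following is a minimal sketch that assumes this save/load API and continues from the script above ("./ppo_agent/" is an arbitrary directory name):

# Sketch: persisting and restoring the trained agent (assumes OpenRL's agent.save/agent.load API).
agent.save("./ppo_agent/")  # write the trained weights to a directory

# Later, rebuild the agent and restore the weights before testing:
agent = Agent(Net(env))
agent.load("./ppo_agent/")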

Tip

OpenRL also provides a command-line tool that lets you complete agent training with a single command. Simply execute the following in the terminal:

openrl --mode train --env CartPole-v1

Test Environment

After the agent has completed training, we can use the agent.act() method to obtain actions. Add the following code to your train_ppo.py file to test the trained agent and visualize the results:

# train_ppo.py
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent
env = make("CartPole-v1", env_num=9) # create environment, set environment parallelism to 9
net = Net(env) # create neural network
agent = Agent(net) # initialize trainer
agent.train(total_time_steps=20000) # start training, set total number of training steps to 20000
# Create an environment for testing: interact with 9 parallel environments and render in "group_human" mode.
env = make("CartPole-v1", env_num=9, render_mode="group_human")
agent.set_env(env)  # set the environment the trained agent will interact with
obs, info = env.reset()  # initialize the environment and get the initial observations and info
while True:
    action, _ = agent.act(obs)  # predict the next action from the current observations
    obs, r, done, info = env.step(action)
    if any(done):  # stop once any of the parallel environments finishes an episode
        break
env.close()

Execute python train_ppo.py in your terminal window to start training and visualize test results.

The complete train_ppo.py code can also be downloaded from openrl/examples.

Demonstration:

[Animation: train_ppo_cartpole.gif]

Note

If you run the test code on a server, you cannot use a visualization interface. Instead, set render_mode to "group_rgb_array" and call env.render() after each step to get the environment image.
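For example, the test loop above can be adapted to save a GIF instead of opening a window. The snippet below is a minimal sketch: it assumes the trained agent from earlier, that env.render() returns an RGB frame in this mode, and that the imageio package is installed ("cartpole_test.gif" is an arbitrary file name):

# Sketch: headless testing on a server. Assumes `agent` is the trained agent from above,
# that env.render() returns an RGB array in "group_rgb_array" mode, and that imageio is installed.
import imageio

env = make("CartPole-v1", env_num=9, render_mode="group_rgb_array")
agent.set_env(env)
obs, info = env.reset()
frames = []
while True:
    action, _ = agent.act(obs)
    obs, r, done, info = env.step(action)
    frames.append(env.render())  # grab the current frame instead of drawing a window
    if any(done):
        break
env.close()
imageio.mimsave("cartpole_test.gif", frames)  # write the collected frames as a GIF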

In the following sections, we will use a more complex multi-agent reinforcement learning task (MPE) as an example to show how to set training hyperparameters, how to switch between parallel and serial modes, and how to use wandb.
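As a quick preview, hyperparameters in OpenRL are typically supplied through a config object passed to the network. The sketch below assumes the create_config_parser helper from openrl.configs.config, and "mpe_ppo.yaml" is a hypothetical config file name:

# Preview sketch: passing hyperparameters via a config object
# (assumes openrl.configs.config.create_config_parser; "mpe_ppo.yaml" is hypothetical).
from openrl.configs.config import create_config_parser

cfg_parser = create_config_parser()
cfg = cfg_parser.parse_args(["--config", "mpe_ppo.yaml"])
net = Net(env, cfg=cfg)  # the network picks up its hyperparameters from cfg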
