openrl.envs.wrappers package

Submodules

openrl.envs.wrappers.atari_wrappers module

class openrl.envs.wrappers.atari_wrappers.ClipRewardEnv(env, cfg=None)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

reward(reward): Bin reward to {+1, 0, -1} by its sign.
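The sign binning can be sketched in plain Python. This is a minimal illustration rather than the OpenRL implementation; the helper name `clip_reward` is hypothetical.

```python
def clip_reward(reward: float) -> float:
    """Map a raw reward to +1.0, 0.0, or -1.0 according to its sign."""
    # (reward > 0) and (reward < 0) are booleans; their difference is the sign.
    return float((reward > 0) - (reward < 0))


print([clip_reward(r) for r in (7.5, 0.0, -120.0)])  # [1.0, 0.0, -1.0]
```

Binning discards reward magnitude, so games with very different score scales produce rewards in the same range.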

class openrl.envs.wrappers.atari_wrappers.EpisodicLifeEnv(env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

reset(**kwargs): Reset only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind the scenes.

step(action): Uses the step() of the env that can be overwritten to change the returned data.
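The idea of treating each lost life as the end of an episode, while resetting the game only once all lives are gone, can be sketched as follows. This is an illustration under stated assumptions, not the OpenRL code: `ToyLivesEnv`, `EpisodicLifeSketch`, the `lives` attribute, and the use of action 0 as a no-op are all hypothetical.

```python
class ToyLivesEnv:
    """Hypothetical stand-in for an Atari game: 3 lives, action 1 loses one."""

    def __init__(self):
        self.lives = 0

    def reset(self):
        self.lives = 3
        return "start", {}

    def step(self, action):
        if action == 1:
            self.lives -= 1
        game_over = self.lives == 0
        return f"lives={self.lives}", 0.0, game_over, False, {}


class EpisodicLifeSketch:
    """Ends an episode on every life loss; truly resets only at game over."""

    def __init__(self, env):
        self.env = env
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.was_real_done = terminated or truncated
        if 0 < self.env.lives < self.lives:
            terminated = True  # a life was lost: report the end of an episode
        self.lives = self.env.lives
        return obs, reward, terminated, truncated, info

    def reset(self):
        if self.was_real_done:
            obs, info = self.env.reset()  # lives exhausted: real reset
        else:
            # Continue from the current state with one step
            # (action 0 is assumed to be a no-op here).
            obs, _, _, _, info = self.env.step(0)
        self.lives = self.env.lives
        return obs, info
```

With this sketch, calling reset() after a lost life continues the same game, and only the reset() that follows the final life starts a new one.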

class openrl.envs.wrappers.atari_wrappers.FireResetEnv(env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

reset(**kwargs): Uses the reset() of the env that can be overwritten to change the returned data.

class openrl.envs.wrappers.atari_wrappers.NoopResetEnv(env, noop_max=30)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

reset(**kwargs): Do no-op action for a number of steps in [1, noop_max].
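A rough sketch of a reset followed by a random number of no-op steps is shown below. It is not the OpenRL implementation: `CountingEnv`, `noop_reset`, and the choice of action 0 as the no-op are assumptions made for illustration.

```python
import random


class CountingEnv:
    """Hypothetical stand-in environment that counts steps since reset."""

    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.steps, {}

    def step(self, action):
        self.steps += 1
        return self.steps, 0.0, False, False, {}


def noop_reset(env, noop_max=30, noop_action=0, rng=random):
    """Reset env, then take between 1 and noop_max no-op steps."""
    obs, info = env.reset()
    # randint is inclusive on both ends, matching the range [1, noop_max].
    for _ in range(rng.randint(1, noop_max)):
        obs, _, terminated, truncated, info = env.step(noop_action)
        if terminated or truncated:
            obs, info = env.reset()
    return obs, info
```

Randomizing the number of initial no-ops varies the starting state across episodes.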

class openrl.envs.wrappers.atari_wrappers.WarpFrame(env, width=84, height=84)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

observation(frame)

Returns a modified observation.

Args: observation: The env observation
Returns: The modified observation

openrl.envs.wrappers.base_wrapper module

class openrl.envs.wrappers.base_wrapper.BaseObservationWrapper(env, cfg=None, reward_class=None)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

observation(observation: gymnasium.core.ObsType) → gymnasium.core.WrapperObsType

Returns a modified observation.

Args: observation: The env observation
Returns: The modified observation

reset(*, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) → Tuple[gymnasium.core.WrapperObsType, Dict[str, Any]]: Modifies the env after calling reset(), returning a modified observation using self.observation().

step(action: gymnasium.core.ActType) → Tuple[gymnasium.core.WrapperObsType, SupportsFloat, bool, bool, Dict[str, Any]]: Modifies the env after calling step() using self.observation() on the returned observations.
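The observation-wrapper pattern, where reset() and step() both route the observation through self.observation(), can be sketched without any dependencies. The classes `EchoEnv`, `ObservationWrapperSketch`, and `Doubled` are hypothetical and only mirror the method names documented above.

```python
class EchoEnv:
    """Hypothetical environment whose observation is a plain number."""

    def reset(self, *, seed=None, options=None):
        return 1.0, {}

    def step(self, action):
        return float(action), 0.0, False, False, {}


class ObservationWrapperSketch:
    """Applies self.observation() to every observation the env returns."""

    def __init__(self, env):
        self.env = env

    def observation(self, observation):
        raise NotImplementedError  # subclasses define the transformation

    def reset(self, *, seed=None, options=None):
        obs, info = self.env.reset(seed=seed, options=options)
        return self.observation(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self.observation(obs), reward, terminated, truncated, info


class Doubled(ObservationWrapperSketch):
    """Example subclass: doubles each observation."""

    def observation(self, observation):
        return 2 * observation
```

A subclass only has to override observation(); the reward, termination flags, and info pass through unchanged.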

class openrl.envs.wrappers.base_wrapper.BaseRewardWrapper(env, cfg=None)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

reward(reward: openrl.envs.wrappers.base_wrapper.ArrayType) → openrl.envs.wrappers.base_wrapper.ArrayType

Returns a modified environment reward.

Args: reward: The env step() reward
Returns: The modified reward

step(action: gymnasium.core.ActType) → Tuple[gymnasium.core.ObsType, SupportsFloat, bool, bool, Dict[str, Any]]: Modifies the env step() reward using self.reward().
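The reward-wrapper pattern is the same idea applied to the reward: step() passes the reward through self.reward() and leaves everything else untouched. The sketch below is illustrative only; `UnitRewardEnv`, `RewardWrapperSketch`, and `ScaledReward` are hypothetical names.

```python
class UnitRewardEnv:
    """Hypothetical environment that always returns a reward of 1.0."""

    def step(self, action):
        return "obs", 1.0, False, False, {}


class RewardWrapperSketch:
    """Applies self.reward() to the reward returned by the env's step()."""

    def __init__(self, env):
        self.env = env

    def reward(self, reward):
        raise NotImplementedError  # subclasses define the transformation

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, self.reward(reward), terminated, truncated, info


class ScaledReward(RewardWrapperSketch):
    """Example subclass: scales every reward by a constant factor."""

    def reward(self, reward):
        return 0.1 * reward
```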

class openrl.envs.wrappers.base_wrapper.BaseWrapper(env, cfg=None, reward_class=None)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

property agent_num

property env_name

property has_auto_reset

set_render_mode(render_mode: Union[None, str])

step(action): Uses the step() of the env that can be overwritten to change the returned data.

property use_monitor

openrl.envs.wrappers.extra_wrappers module

class openrl.envs.wrappers.extra_wrappers.AddStep(env: gymnasium.core.Env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

observation(observation)

Flattens an observation.

Args: observation: The observation to flatten
Returns: The flattened observation

reset(*, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) → Tuple[gymnasium.core.WrapperObsType, Dict[str, Any]]: Modifies the env after calling reset(), returning a modified observation using self.observation().

step(action: gymnasium.core.ActType) → Tuple[gymnasium.core.WrapperObsType, SupportsFloat, bool, bool, Dict[str, Any]]: Modifies the env after calling step() using self.observation() on the returned observations.

class openrl.envs.wrappers.extra_wrappers.AutoReset(env: gymnasium.core.Env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

property has_auto_reset

class openrl.envs.wrappers.extra_wrappers.ConvertEmptyBoxWrapper(env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

observation(observation)

Returns a modified observation.

Args: observation: The env observation
Returns: The modified observation

class openrl.envs.wrappers.extra_wrappers.DictWrapper(env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

observation(observation)

Returns a modified observation.

Args: observation: The env observation
Returns: The modified observation

class openrl.envs.wrappers.extra_wrappers.FlattenObservation(env: gymnasium.core.Env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

observation(observation)

Flattens an observation.

Args: observation: The observation to flatten
Returns: The flattened observation

class openrl.envs.wrappers.extra_wrappers.FrameSkip(env, num_frames: int = 8)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

step(action): Uses the step() of the env that can be overwritten to change the returned data.

class openrl.envs.wrappers.extra_wrappers.GIFWrapper(env, gif_path: str, fps: int = 30)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

close(): Closes the wrapper and env.

reset(**kwargs): Uses the reset() of the env that can be overwritten to change the returned data.

step(action): Uses the step() of the env that can be overwritten to change the returned data.

class openrl.envs.wrappers.extra_wrappers.MoveActionMask2InfoWrapper(env: gymnasium.core.Env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

reset(**kwargs): Uses the reset() of the env that can be overwritten to change the returned data.

step(action): Uses the step() of the env that can be overwritten to change the returned data.

class openrl.envs.wrappers.extra_wrappers.RecordReward(env, cfg=None, reward_class=None)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

property has_auto_reset

class openrl.envs.wrappers.extra_wrappers.RemoveTruncated(env: gymnasium.core.Env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

step(action)

Steps through the environment, returning 5 or 4 items depending on output_truncation_bool.

Args: action: action to step through the environment with
Returns: (observation, reward, terminated, truncated, info) or (observation, reward, done, info)
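The relationship between the 5-item and 4-item step results can be sketched as below. This is an illustration, not the OpenRL code: the helper `to_done_step_api` is hypothetical, and it assumes the usual convention that `done` is true when the episode was either terminated or truncated.

```python
def to_done_step_api(step_returns):
    """Collapse a 5-item step result into the older 4-item form."""
    if len(step_returns) == 4:
        return step_returns  # already (observation, reward, done, info)
    observation, reward, terminated, truncated, info = step_returns
    # Assumption: an episode counts as done if it terminated or was truncated.
    return observation, reward, terminated or truncated, info


print(to_done_step_api(("obs", 1.0, False, True, {})))  # ('obs', 1.0, True, {})
```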

class openrl.envs.wrappers.extra_wrappers.ZeroRewardWrapper(env, cfg=None)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

reward(reward: openrl.envs.wrappers.base_wrapper.ArrayType) → openrl.envs.wrappers.base_wrapper.ArrayType

Returns a modified environment reward.

Args: reward: The env step() reward
Returns: The modified reward

openrl.envs.wrappers.extra_wrappers.convert_to_done_step_api(step_returns, is_vector_env: bool = False)

openrl.envs.wrappers.extra_wrappers.step_api_compatibility(step_returns, output_truncation_bool: bool = True, is_vector_env: bool = False)

openrl.envs.wrappers.flatten module

openrl.envs.wrappers.flatten.flatten(space: gymnasium.spaces.space.Space[openrl.envs.wrappers.flatten.T], agent_num: int, x: openrl.envs.wrappers.flatten.T) → Union[numpy.ndarray[Any, numpy.dtype[Any]], Dict[str, Any], Tuple[Any, ...], gymnasium.spaces.graph.GraphInstance]

Flatten a data point from a space.

This is useful when e.g. points from spaces must be passed to a neural network, which only understands flat arrays of floats.

Args:

space: The space that x is flattened by
x: The value to flatten

Returns:

The flattened datapoint

For gymnasium.spaces.Box and gymnasium.spaces.MultiBinary, this is a flattened array

For gymnasium.spaces.Discrete and gymnasium.spaces.MultiDiscrete, this is a flattened one-hot array of the sample

For gymnasium.spaces.Tuple and gymnasium.spaces.Dict, this is a concatenated array of the flattened subspaces (does not support graph subspaces)

For graph spaces, returns GraphInstance where:

GraphInstance.nodes are n x k arrays

GraphInstance.edges are either m x k arrays or None

GraphInstance.edge_links are either m x 2 arrays or None

Raises:

NotImplementedError: If the space is not defined in gymnasium.spaces.

Example:

>>> from gymnasium.spaces import Box, Discrete, Tuple
>>> space = Box(0, 1, shape=(3, 5))
>>> flatten(space, space.sample()).shape
(15,)
>>> space = Discrete(4)
>>> flatten(space, 2)
array([0, 0, 1, 0])
>>> space = Tuple((Box(0, 1, shape=(2,)), Box(0, 1, shape=(3,)), Discrete(3)))
>>> example = ((.5, .25), (1., 0., .2), 1)
>>> flatten(space, example)
array([0.5 , 0.25, 1.  , 0.  , 0.2 , 0.  , 1.  , 0.  ])

openrl.envs.wrappers.image_wrappers module

class openrl.envs.wrappers.image_wrappers.TransposeImage(env=None, op=[2, 0, 1])

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

observation(ob)

Returns a modified observation.

Args: observation: The env observation
Returns: The modified observation
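To make the default op=[2, 0, 1] concrete, the sketch below shows how an axis permutation changes an observation's shape. It assumes that `op` is used as an axis order in the sense of numpy.transpose, which this page does not confirm; the helper `transposed_shape` is hypothetical.

```python
def transposed_shape(shape, op=(2, 0, 1)):
    """Return the shape after permuting axes: output axis i is input axis op[i]."""
    return tuple(shape[axis] for axis in op)


# A height x width x channels image becomes channels x height x width.
print(transposed_shape((84, 84, 3)))  # (3, 84, 84)
```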

openrl.envs.wrappers.mat_wrapper module

class openrl.envs.wrappers.mat_wrapper.MATWrapper(env: openrl.envs.vec_env.base_venv.BaseVecEnv)

Bases: openrl.envs.vec_env.wrappers.base_wrapper.VectorObservationWrapper

observation(observation)

Defines the observation transformation.

Args: observation (object): the observation from the environment
Returns: observation (object): the transformed observation

property observation_space: Returns the Env observation_space unless it is overwritten, in which case the wrapper's observation_space is used.

openrl.envs.wrappers.monitor module

class openrl.envs.wrappers.monitor.Monitor(env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

A monitor wrapper for Gym environments. It records the episode reward, length, time, and other data.

Parameters: env -- The environment

get_episode_lengths() → List[int]

Returns the number of timesteps of all the episodes.

get_episode_rewards() → List[float]

Returns the rewards of all the episodes.

get_episode_times() → List[float]

Returns the runtime in seconds of all the episodes.

get_total_steps() → int

Returns the total number of timesteps.

reset(**kwargs)

Calls the Gym environment reset.

Parameters: kwargs -- Extra keywords saved for the next episode, only if defined by reset_keywords
Returns: the first observation of the environment

step(action: Union[numpy.ndarray, int])

Step the environment with the given action.

Parameters: action -- the action
Returns: observation, reward, done, information; or observation, reward, terminal, truncated, information
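The kind of bookkeeping a monitor wrapper performs can be sketched as follows. This is a simplified illustration, not the OpenRL implementation: `ThreeStepEnv` and `MonitorSketch` are hypothetical, and only the 5-item step result is handled.

```python
import time


class ThreeStepEnv:
    """Hypothetical environment: reward 1.0 per step, ends after 3 steps."""

    def __init__(self):
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, False, {}


class MonitorSketch:
    """Records per-episode reward, length, and wall-clock time."""

    def __init__(self, env):
        self.env = env
        self.episode_rewards = []
        self.episode_lengths = []
        self.episode_times = []
        self.total_steps = 0
        self._rewards = []
        self._start = time.time()

    def reset(self, **kwargs):
        self._rewards = []  # start accumulating a new episode
        self._start = time.time()
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._rewards.append(float(reward))
        self.total_steps += 1
        if terminated or truncated:
            # Episode finished: store its summary statistics.
            self.episode_rewards.append(sum(self._rewards))
            self.episode_lengths.append(len(self._rewards))
            self.episode_times.append(time.time() - self._start)
        return obs, reward, terminated, truncated, info

    def get_episode_rewards(self):
        return self.episode_rewards

    def get_episode_lengths(self):
        return self.episode_lengths

    def get_episode_times(self):
        return self.episode_times

    def get_total_steps(self):
        return self.total_steps
```

After one full episode of `ThreeStepEnv`, the sketch reports one entry per list: a reward of 3.0, a length of 3, and the elapsed time in seconds.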

openrl.envs.wrappers.multiagent_wrapper module

class openrl.envs.wrappers.multiagent_wrapper.Single2MultiAgentWrapper(env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

property agent_num

reset(*, seed=None, options=None): Uses the reset() of the env that can be overwritten to change the returned data.

step(action): Uses the step() of the env that can be overwritten to change the returned data.

openrl.envs.wrappers.pettingzoo_wrappers module

openrl.envs.wrappers.util module

openrl.envs.wrappers.util.is_wrapped(env: gymnasium.core.Env, wrapper_class: Type[openrl.envs.wrappers.base_wrapper.BaseWrapper]) → bool

Check if a given environment has been wrapped with a given wrapper.

Parameters

env -- Environment to check
wrapper_class -- Wrapper class to look for

Returns

True if environment has been wrapped with wrapper_class.

openrl.envs.wrappers.util.nest_expand_dim(input: Any) → Any

openrl.envs.wrappers.util.unwrap_wrapper(env: gymnasium.core.Env, wrapper_class: Type[openrl.envs.wrappers.base_wrapper.BaseWrapper]) → Optional[openrl.envs.wrappers.base_wrapper.BaseWrapper]

Retrieve a BaseWrapper object by recursively searching.

Parameters

env -- Environment to unwrap
wrapper_class -- Wrapper to look for

Returns

Environment unwrapped until wrapper_class is reached, if it has been wrapped with it
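Searching a stack of wrappers for a given wrapper class can be sketched by walking the chain of `.env` attributes. The sketch below is not the OpenRL implementation; every class and function name in it is hypothetical, and it uses a loop in place of recursion.

```python
class InnerEnv:
    """Hypothetical innermost environment (not a wrapper)."""


class WrapperSketch:
    """Hypothetical wrapper base class that stores the wrapped env."""

    def __init__(self, env):
        self.env = env


class WrapperA(WrapperSketch):
    pass


class WrapperB(WrapperSketch):
    pass


class WrapperC(WrapperSketch):
    pass


def unwrap_wrapper_sketch(env, wrapper_class):
    """Return the first wrapper of type wrapper_class in the chain, or None."""
    while isinstance(env, WrapperSketch):
        if isinstance(env, wrapper_class):
            return env
        env = env.env  # move one layer inward
    return None


def is_wrapped_sketch(env, wrapper_class):
    """True if some layer of env is an instance of wrapper_class."""
    return unwrap_wrapper_sketch(env, wrapper_class) is not None


stacked = WrapperB(WrapperA(InnerEnv()))
print(is_wrapped_sketch(stacked, WrapperA))  # True
print(is_wrapped_sketch(stacked, WrapperC))  # False
```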

Module contents

class openrl.envs.wrappers.AutoReset(env: gymnasium.core.Env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

property has_auto_reset

class openrl.envs.wrappers.BaseObservationWrapper(env, cfg=None, reward_class=None)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

observation(observation: gymnasium.core.ObsType) → gymnasium.core.WrapperObsType

Returns a modified observation.

Args: observation: The env observation
Returns: The modified observation

reset(*, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) → Tuple[gymnasium.core.WrapperObsType, Dict[str, Any]]: Modifies the env after calling reset(), returning a modified observation using self.observation().

step(action: gymnasium.core.ActType) → Tuple[gymnasium.core.WrapperObsType, SupportsFloat, bool, bool, Dict[str, Any]]: Modifies the env after calling step() using self.observation() on the returned observations.

class openrl.envs.wrappers.BaseRewardWrapper(env, cfg=None)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

reward(reward: openrl.envs.wrappers.base_wrapper.ArrayType) → openrl.envs.wrappers.base_wrapper.ArrayType

Returns a modified environment reward.

Args: reward: The env step() reward
Returns: The modified reward

step(action: gymnasium.core.ActType) → Tuple[gymnasium.core.ObsType, SupportsFloat, bool, bool, Dict[str, Any]]: Modifies the env step() reward using self.reward().

class openrl.envs.wrappers.BaseWrapper(env, cfg=None, reward_class=None)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

property agent_num

property env_name

property has_auto_reset

set_render_mode(render_mode: Union[None, str])

step(action): Uses the step() of the env that can be overwritten to change the returned data.

property use_monitor

class openrl.envs.wrappers.DictWrapper(env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

observation(observation)

Returns a modified observation.

Args: observation: The env observation
Returns: The modified observation

class openrl.envs.wrappers.FlattenObservation(env: gymnasium.core.Env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

observation(observation)

Flattens an observation.

Args: observation: The observation to flatten
Returns: The flattened observation

class openrl.envs.wrappers.GIFWrapper(env, gif_path: str, fps: int = 30)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

close(): Closes the wrapper and env.

reset(**kwargs): Uses the reset() of the env that can be overwritten to change the returned data.

step(action): Uses the step() of the env that can be overwritten to change the returned data.

class openrl.envs.wrappers.MoveActionMask2InfoWrapper(env: gymnasium.core.Env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

reset(**kwargs): Uses the reset() of the env that can be overwritten to change the returned data.

step(action): Uses the step() of the env that can be overwritten to change the returned data.

class openrl.envs.wrappers.RemoveTruncated(env: gymnasium.core.Env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

step(action)

Steps through the environment, returning 5 or 4 items depending on output_truncation_bool.

Args: action: action to step through the environment with
Returns: (observation, reward, terminated, truncated, info) or (observation, reward, done, info)

class openrl.envs.wrappers.Single2MultiAgentWrapper(env)

Bases: gymnasium.core.Env[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType], Generic[gymnasium.core.WrapperObsType, gymnasium.core.WrapperActType, gymnasium.core.ObsType, gymnasium.core.ActType]

property agent_num

reset(*, seed=None, options=None): Uses the reset() of the env that can be overwritten to change the returned data.

step(action): Uses the step() of the env that can be overwritten to change the returned data.