openrl.utils package

Subpackages

openrl.utils.callbacks package

Submodules

openrl.utils.custom_data_structure module

class openrl.utils.custom_data_structure.ListDict[source]

Bases: collections.OrderedDict

append(key: str, value: Any)[source]

get_by_index(index)[source]
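The page lists the methods of ListDict without describing them. The sketch below is a hypothetical stand-in, not OpenRL's code: it assumes that append collects values in a per-key list and that get_by_index looks up the entry at a given insertion position. The class name ListDictSketch and both behaviors are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Any, List


class ListDictSketch(OrderedDict):
    """Hypothetical stand-in for ListDict; behavior is assumed, not taken from OpenRL."""

    def append(self, key: str, value: Any) -> None:
        # Assumption: each key maps to a list that grows with every append.
        self.setdefault(key, []).append(value)

    def get_by_index(self, index: int) -> List[Any]:
        # Assumption: the index refers to the insertion order of the keys.
        return list(self.values())[index]


data = ListDictSketch()
data.append("reward", 1.0)
data.append("reward", 0.5)
data.append("length", 20)
first_values = data.get_by_index(0)  # values stored under the first key
```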

openrl.utils.evaluation module

openrl.utils.evaluation.evaluate_policy(agent: openrl.utils.type_aliases.AgentActor, env: Union[gym.core.Env, openrl.envs.vec_env.base_venv.BaseVecEnv], n_eval_episodes: int = 10, deterministic: bool = True, render: bool = False, callback: Optional[Callable[[Dict[str, Any], Dict[str, Any]], None]] = None, reward_threshold: Optional[float] = None, return_episode_rewards: bool = False, warn: bool = True) → Union[Tuple[numpy.ndarray, numpy.ndarray], Tuple[float, float], Tuple[List[float], List[int]]][source]

Runs the policy for n_eval_episodes episodes and returns the average reward. If a vectorized environment is passed in, the episodes to evaluate are divided among its sub-environments. This static division of work is done to remove bias.
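The static division described above can be sketched as follows. The formula mirrors the one used by Stable-Baselines3's evaluate_policy, whose docstring this text closely matches; that OpenRL uses the same formula is an assumption, and split_episodes is a hypothetical helper name.

```python
from typing import List


def split_episodes(n_eval_episodes: int, n_envs: int) -> List[int]:
    """Assign each sub-environment a fixed episode quota summing to n_eval_episodes."""
    # Integer division with a per-environment offset spreads the remainder
    # over the last environments, so quotas differ by at most one.
    return [(n_eval_episodes + i) // n_envs for i in range(n_envs)]


targets = split_episodes(10, 4)  # [2, 2, 3, 3]
```

Fixing the quotas up front means that a sub-environment which happens to finish episodes quickly cannot contribute more than its share of the evaluated episodes.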

Note

If the environment has not been wrapped with a Monitor wrapper, rewards and episode lengths are counted as they appear in env.step calls. If the environment contains wrappers that modify rewards or episode lengths (e.g. reward scaling, early episode reset), these will affect the evaluation results as well. You can avoid this by wrapping the environment with a Monitor wrapper before anything else.

Parameters

agent -- The RL agent you want to evaluate. This can be any object that implements a predict method, such as an RL algorithm (BaseAlgorithm) or policy (BasePolicy).
env -- The gym environment or BaseVecEnv environment.
n_eval_episodes -- Number of episodes over which to evaluate the agent
deterministic -- Whether to use deterministic or stochastic actions
render -- Whether to render the environment or not
callback -- Callback function for additional checks, called after each step. It receives locals() and globals() as its arguments.
reward_threshold -- Minimum expected reward per episode; an error is raised if this performance is not met
return_episode_rewards -- If True, a list of rewards and episode lengths per episode will be returned instead of the mean.
warn -- If True (default), warns user about lack of a Monitor wrapper in the evaluation environment.
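A callback matching the documented signature might look like the sketch below. The keys available in the two dictionaries depend on the internals of evaluate_policy, which this page does not list, so the example only counts calls. The names step_counter and count_steps are hypothetical, and the final line simulates a single invocation with empty dictionaries rather than running a real evaluation.

```python
from typing import Any, Dict

step_counter = {"steps": 0}


def count_steps(locals_: Dict[str, Any], globals_: Dict[str, Any]) -> None:
    """Count how many times the callback is invoked (documented as once per step)."""
    step_counter["steps"] += 1


# Simulated call; in practice the function would be passed as callback=count_steps.
count_steps({}, {})
```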

Returns

Mean reward per episode and standard deviation of reward per episode. When return_episode_rewards is True, returns ([float], [int]) instead: the first list contains per-episode rewards and the second contains per-episode lengths (in number of steps).
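The two return shapes can be illustrated with a small helper. This is a sketch under assumptions: summarize is a hypothetical name, and the use of NumPy's population standard deviation is assumed rather than confirmed by this page.

```python
from typing import List, Tuple, Union

import numpy as np


def summarize(
    episode_rewards: List[float],
    episode_lengths: List[int],
    return_episode_rewards: bool = False,
) -> Union[Tuple[float, float], Tuple[List[float], List[int]]]:
    """Hypothetical helper mirroring the two documented return shapes."""
    if return_episode_rewards:
        # Raw per-episode data: rewards first, lengths second.
        return episode_rewards, episode_lengths
    # Aggregate statistics over all evaluated episodes.
    return float(np.mean(episode_rewards)), float(np.std(episode_rewards))


mean_reward, std_reward = summarize([1.0, 3.0], [10, 12])
rewards, lengths = summarize([1.0, 3.0], [10, 12], return_episode_rewards=True)
```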

openrl.utils.file_tool module

openrl.utils.file_tool.copy_files(source_files: Union[List[str], List[pathlib.Path]], target_dir: Union[str, pathlib.Path])[source]

openrl.utils.file_tool.link_files(source_files: Union[List[str], List[pathlib.Path]], target_dir: Union[str, pathlib.Path])[source]
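Neither function is described on this page. As a rough illustration of what helpers with these signatures could do, the sketch below copies or symlinks each source file into a target directory using only the standard library. The names copy_files_sketch and link_files_sketch, the creation of the target directory, and the choice of symbolic links are all assumptions, not OpenRL's documented behavior.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def copy_files_sketch(source_files: List[PathLike], target_dir: PathLike) -> None:
    """Copy every source file into target_dir, keeping the file names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # assumption: create if missing
    for src in source_files:
        shutil.copy(src, target / Path(src).name)


def link_files_sketch(source_files: List[PathLike], target_dir: PathLike) -> None:
    """Create a symbolic link in target_dir for every source file."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # assumption: create if missing
    for src in source_files:
        (target / Path(src).name).symlink_to(Path(src).resolve())


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "config.yaml"
    src.write_text("seed: 0\n")
    copy_files_sketch([src], Path(tmp) / "copied")
    link_files_sketch([str(src)], Path(tmp) / "linked")
    copied_text = (Path(tmp) / "copied" / "config.yaml").read_text()
    linked_text = (Path(tmp) / "linked" / "config.yaml").read_text()
```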

openrl.utils.logger module

class openrl.utils.logger.Logger(cfg, project_name: str = 'openrl', scenario_name: str = 'openrl', wandb_entity: str = 'openrl', exp_name: Optional[str] = None, log_path: Optional[str] = None, use_wandb: bool = False, use_tensorboard: bool = False, log_level: int = 10, log_to_terminal: bool = True)[source]

Bases: object

close()[source]

info(msg: str)[source]

log_info(infos: Dict[str, Any], step: int) → None[source]

log_learner_info(leaner_id: int, infos: Dict[str, Any], step: int) → None[source]
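The default log_level of 10 equals the numeric value of DEBUG in Python's standard logging module. Assuming Logger interprets log_level on that scale (this page does not say so), the standard constants could be used in place of raw integers:

```python
import logging

# Numeric values of the standard logging levels.
levels = {
    "DEBUG": logging.DEBUG,      # 10, matches the documented default for log_level
    "INFO": logging.INFO,        # 20
    "WARNING": logging.WARNING,  # 30
}

# Map the documented default back to its standard name.
default_level_name = logging.getLevelName(10)
```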

openrl.utils.type_aliases module

Common aliases for type hints

class openrl.utils.type_aliases.AgentActor(*args, **kwargs)[source]

Bases: Protocol

act(observation: Union[numpy.ndarray, Dict[str, numpy.ndarray]], deterministic: bool = False) → Tuple[numpy.ndarray, Optional[Tuple[numpy.ndarray, ...]]][source]

Get the policy action from an observation (and optional hidden state). Includes sugar-coating to handle different observations (e.g. normalizing images).

Parameters

observation -- the input observation
deterministic -- Whether to return deterministic actions.

Returns

the model's action and the next hidden state (used in recurrent policies)
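Because AgentActor is a Protocol, any object with a compatible act method should satisfy it structurally, without inheriting from it. The sketch below redeclares the documented signature locally so that it runs without OpenRL installed; AgentActorSketch, ConstantAgent, and run_once are hypothetical names, and the batched action shape is an assumption.

```python
from typing import Dict, Optional, Protocol, Tuple, Union

import numpy as np

Observation = Union[np.ndarray, Dict[str, np.ndarray]]


class AgentActorSketch(Protocol):
    """Local copy of the documented act signature, for illustration only."""

    def act(
        self, observation: Observation, deterministic: bool = False
    ) -> Tuple[np.ndarray, Optional[Tuple[np.ndarray, ...]]]:
        ...


class ConstantAgent:
    """Hypothetical agent: always picks action 0 and keeps no hidden state."""

    def act(self, observation: Observation, deterministic: bool = False):
        # Use the first array of a dict observation to infer the batch size.
        first = (
            next(iter(observation.values()))
            if isinstance(observation, dict)
            else observation
        )
        actions = np.zeros(len(first), dtype=np.int64)
        return actions, None  # None: no recurrent hidden state


def run_once(agent: AgentActorSketch, observation: Observation) -> np.ndarray:
    """Query the agent once and discard the hidden state."""
    actions, _hidden = agent.act(observation, deterministic=True)
    return actions


actions = run_once(ConstantAgent(), np.zeros((3, 4)))
```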

openrl.utils.util module

openrl.utils.util.check(input)[source]

openrl.utils.util.check_v2(input, use_half=False, tpdv=None)[source]

openrl.utils.util.get_system_info() → Dict[str, str][source]

Retrieve system and Python environment info for the current system.

Returns: Dictionary summarizing the version of each relevant package, and a formatted string.

openrl.utils.util.set_seed(seed)[source]
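As an illustration of the kind of data such a function gathers, the sketch below collects a few facts with the standard library and renders them as a dictionary plus a formatted string. The name system_info_sketch and the chosen keys are hypothetical; the real function presumably also reports the versions of relevant packages, which are omitted here.

```python
import platform
from typing import Dict, Tuple


def system_info_sketch() -> Tuple[Dict[str, str], str]:
    """Collect basic system facts and a human-readable summary of them."""
    info = {
        "OS": platform.platform(),
        "Python": platform.python_version(),
        "Machine": platform.machine(),
    }
    # One "key: value" line per entry.
    formatted = "\n".join(f"{key}: {value}" for key, value in info.items())
    return info, formatted


info, formatted = system_info_sketch()
```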
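set_seed is listed without a description. A seeding helper of this kind usually fixes the global random number generators so that runs are reproducible; the sketch below shows that idea for Python's random module and NumPy. Which generators the real function seeds is an assumption, and set_seed_sketch is a hypothetical name.

```python
import random

import numpy as np


def set_seed_sketch(seed: int) -> None:
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    # A PyTorch-based library would typically also call torch.manual_seed(seed);
    # it is left out so that the sketch runs without torch installed.


set_seed_sketch(0)
first = (random.random(), np.random.rand())
set_seed_sketch(0)
second = (random.random(), np.random.rand())  # same draws as `first`
```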