Utils#

Visualization#

class gym.utils.play.PlayableGame(env: Env, keys_to_action: Dict[Tuple[int, ...], int] | None = None, zoom: float | None = None)#

Wraps an environment so that it can be controlled with keyboard input.

__init__(env: Env, keys_to_action: Dict[Tuple[int, ...], int] | None = None, zoom: float | None = None)#

Wraps an environment with a dictionary that maps keyboard key combinations to actions, and an optional zoom factor for the rendered frames.

Parameters:
  • env – The environment to play

  • keys_to_action – Dictionary mapping tuples of key codes to action values

  • zoom – Optional factor by which to zoom the environment render

process_event(event: Event)#

Processes a PyGame event.

In particular, this function is used to keep track of which buttons are currently pressed and to exit the play() function when the PyGame window is closed.

Parameters:

event – The event to process

class gym.utils.play.PlayPlot(callback: callable, horizon_timesteps: int, plot_names: List[str])#

Provides a callback to create live plots of arbitrary metrics when using play().

This class is instantiated with a function that accepts information about a single environment transition:
  • obs_t: observation before performing action

  • obs_tp1: observation after performing action

  • action: action that was executed

  • rew: reward that was received

  • terminated: whether the environment is terminated or not

  • truncated: whether the environment is truncated or not

  • info: debug info

It should return a list of metrics that are computed from this data. For instance, the function may look like this:

>>> def compute_metrics(obs_t, obs_tp1, action, reward, terminated, truncated, info):
...     return [reward, info["cumulative_reward"], np.linalg.norm(action)]

PlayPlot provides the method callback(), which passes its arguments along to that function and uses the returned values to update live plots of the metrics.

Typically, this callback() will be used in conjunction with play() to see how the metrics evolve as you play:

>>> plotter = PlayPlot(compute_metrics, horizon_timesteps=200,
...                    plot_names=["Immediate Rew.", "Cumulative Rew.", "Action Magnitude"])
>>> play(your_env, callback=plotter.callback)

__init__(callback: callable, horizon_timesteps: int, plot_names: List[str])#

Constructor of PlayPlot.

The callback function passed to this constructor must return a list of metrics whose length equals len(plot_names).

Parameters:
  • callback – Function that computes metrics from environment transitions

  • horizon_timesteps – The time horizon used for the live plots

  • plot_names – List of plot titles

Raises:

DependencyNotInstalled – If matplotlib is not installed
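The earlier example reads info["cumulative_reward"], which an environment only provides if it chooses to. As a minimal sketch, assuming the environment does not supply that key, the metrics function can keep the running total itself. The class name RewardMetrics is hypothetical, and the transitions below are hand-written stand-ins for a real environment.

```python
class RewardMetrics:
    """Hypothetical metrics callable that tracks the return of the current episode."""

    def __init__(self):
        self.cumulative_reward = 0.0

    def __call__(self, obs_t, obs_tp1, action, rew, terminated, truncated, info):
        self.cumulative_reward += rew
        metrics = [rew, self.cumulative_reward]
        # Start a fresh total once the episode has ended.
        if terminated or truncated:
            self.cumulative_reward = 0.0
        return metrics


plot_names = ["Immediate Rew.", "Cumulative Rew."]
compute_metrics = RewardMetrics()

# Two hand-written transitions; the second one ends the episode.
first = compute_metrics(None, None, 0, 1.0, False, False, {})
second = compute_metrics(None, None, 0, 0.5, True, False, {})

# The returned list must have one entry per plot title.
assert len(first) == len(plot_names)
print(first, second)
```

An instance of such a class could then be passed as the callback argument of PlayPlot, with plot_names listing one title per returned metric.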

callback(obs_t: ObsType, obs_tp1: ObsType, action: ActType, rew: float, terminated: bool, truncated: bool, info: dict)#

Calls the metrics function passed to the constructor and adds the returned values to the live plots.

Parameters:
  • obs_t – The observation at time step t

  • obs_tp1 – The observation at time step t+1

  • action – The action

  • rew – The reward

  • terminated – Whether the environment has terminated

  • truncated – Whether the environment has been truncated

  • info – The information from the environment

gym.utils.play.display_arr(screen: Surface, arr: ndarray, video_size: Tuple[int, int], transpose: bool)#

Displays a numpy array on screen.

Parameters:
  • screen – The screen to show the array on

  • arr – The array to show

  • video_size – The video size of the screen

  • transpose – Whether to transpose the array before showing it on the screen

gym.utils.play.play(env: Env, transpose: bool | None = True, fps: int | None = None, zoom: float | None = None, callback: Callable | None = None, keys_to_action: Dict[Tuple[str | int] | str, ActType] | None = None, seed: int | None = None, noop: ActType = 0)#

Allows one to play the environment using the keyboard.

Example:

>>> import gym
>>> import numpy as np
>>> from gym.utils.play import play
>>> play(gym.make("CarRacing-v1", render_mode="rgb_array"), keys_to_action={
...                                                "w": np.array([0, 0.7, 0]),
...                                                "a": np.array([-1, 0, 0]),
...                                                "s": np.array([0, 0, 1]),
...                                                "d": np.array([1, 0, 0]),
...                                                "wa": np.array([-1, 0.7, 0]),
...                                                "dw": np.array([1, 0.7, 0]),
...                                                "ds": np.array([1, 0, 1]),
...                                                "as": np.array([-1, 0, 1]),
...                                               }, noop=np.array([0,0,0]))

The code above also works if the environment is wrapped, so it is particularly useful for verifying that frame-level preprocessing does not render the game unplayable.

If you wish to plot real-time statistics as you play, you can use gym.utils.play.PlayPlot. Here is sample code that plots the reward for the last 150 steps.

>>> from gym.utils.play import PlayPlot, play
>>> def callback(obs_t, obs_tp1, action, rew, terminated, truncated, info):
...     return [rew]
>>> plotter = PlayPlot(callback, 150, ["reward"])
>>> play(gym.make("ALE/AirRaid-v5", render_mode="rgb_array"), callback=plotter.callback)

Parameters:
  • env – Environment to use for playing.

  • transpose – If True, the observation output is transposed. Defaults to True.

  • fps – Maximum number of environment steps executed per second. If None (the default), env.metadata["render_fps"] is used, or 30 if the environment does not specify "render_fps".

  • zoom – Factor by which to zoom the observation. Should be a positive float.

  • callback – If a callback is provided, it will be executed after every step. It takes the following inputs:

    • obs_t: observation before performing action

    • obs_tp1: observation after performing action

    • action: action that was executed

    • rew: reward that was received

    • terminated: whether the environment is terminated or not

    • truncated: whether the environment is truncated or not

    • info: debug info

  • keys_to_action

    Mapping from keys pressed to action performed. Different formats are supported: key combinations can be expressed as a tuple of unicode code points of the keys, as a tuple of characters, or as a string where each character represents one key. For example, if pressing ‘w’ and space at the same time is supposed to trigger action number 2, then the keys_to_action dict could look like this:

    >>> {
    ...    # ...
    ...    (ord('w'), ord(' ')): 2
    ...    # ...
    ... }
    
    or like this:
    >>> {
    ...    # ...
    ...    ("w", " "): 2
    ...    # ...
    ... }
    
    or like this:
    >>> {
    ...    # ...
    ...    "w ": 2
    ...    # ...
    ... }
    

    If None, the default keys_to_action mapping for that environment is used, if the environment provides one.

  • seed – Random seed used when resetting the environment. If None, no seed is used.

  • noop – The action used when no key input has been entered, or the entered key combination is unknown.
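The three keys_to_action formats and the noop fallback described in the parameters above can be illustrated with a short sketch. It assumes each key combination is reduced to a sorted tuple of integer key codes before lookup; the helper names normalize_keys_to_action and select_action are hypothetical and are not part of gym.utils.play.

```python
def normalize_keys_to_action(keys_to_action):
    """Map each key combination to a sorted tuple of integer key codes."""
    normalized = {}
    for combination, action in keys_to_action.items():
        # Iterating a string yields its characters, so "w " and ("w", " ")
        # go through the same loop as (ord("w"), ord(" ")).
        codes = tuple(sorted(ord(k) if isinstance(k, str) else k for k in combination))
        normalized[codes] = action
    return normalized


def select_action(pressed_keys, normalized, noop=0):
    """Return the mapped action, or noop for an empty or unknown combination."""
    return normalized.get(tuple(sorted(pressed_keys)), noop)


by_code = normalize_keys_to_action({(ord("w"), ord(" ")): 2})
by_char = normalize_keys_to_action({("w", " "): 2})
by_string = normalize_keys_to_action({"w ": 2})

# All three spellings describe the same combination.
assert by_code == by_char == by_string

both_pressed = select_action({ord("w"), ord(" ")}, by_string)
nothing_pressed = select_action(set(), by_string)
unknown_pressed = select_action({ord("x")}, by_string)
```

Sorting the codes makes the lookup independent of the order in which the keys were pressed or listed.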

Save Rendering Videos#

gym.utils.save_video.capped_cubic_video_schedule(episode_id: int) bool#

The default episode trigger.

This function will trigger recordings at the episode indices 0, 1, 8, 27, …, \(k^3\), …, 729, 1000, 2000, 3000, …

Parameters:

episode_id – The episode number

Returns:

True if a recording should be triggered for this episode number, False otherwise
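The schedule described above (perfect cubes below 1000, then every 1000th episode) can be reproduced with a few lines of plain Python. This is a sketch of the stated behaviour under that reading, not a copy of the library source; the name cubic_schedule_sketch is hypothetical.

```python
def cubic_schedule_sketch(episode_id: int) -> bool:
    """Return True for perfect cubes below 1000 and for multiples of 1000 afterwards."""
    if episode_id < 1000:
        # Round the floating-point cube root, then verify with exact integer math.
        root = round(episode_id ** (1.0 / 3))
        return root ** 3 == episode_id
    return episode_id % 1000 == 0


triggered = [i for i in range(3001) if cubic_schedule_sketch(i)]
print(triggered)
```

The cap keeps recordings from becoming too sparse in long training runs, since the gaps between consecutive cubes grow quickly.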

gym.utils.save_video.save_video(frames: list, video_folder: str, episode_trigger: Callable[[int], bool] | None = None, step_trigger: Callable[[int], bool] | None = None, video_length: int | None = None, name_prefix: str = 'rl-video', episode_index: int = 0, step_starting_index: int = 0, **kwargs)#

Save videos from rendering frames.

This function extracts videos from a list of rendered frames covering one or more episodes.

Parameters:
  • frames (List[RenderFrame]) – A list of frames to compose the video.

  • video_folder (str) – The folder where the recordings will be stored

  • episode_trigger – Function that accepts an integer and returns True iff a recording should be started at this episode

  • step_trigger – Function that accepts an integer and returns True iff a recording should be started at this step

  • video_length (int) – The length of recorded episodes. If it isn’t specified, the entire episode is recorded. Otherwise, snippets of the specified length are captured.

  • name_prefix (str) – Will be prepended to the filename of the recordings.

  • episode_index (int) – The index of the current episode.

  • step_starting_index (int) – The step index of the first frame.

  • **kwargs – The kwargs that will be passed to moviepy’s ImageSequenceClip. You need to specify either fps or duration.

Example

>>> import gym
>>> from gym.utils.save_video import save_video
>>> env = gym.make("FrozenLake-v1", render_mode="rgb_array_list")
>>> env.reset()
>>> step_starting_index = 0
>>> episode_index = 0
>>> for step_index in range(199):
...    action = env.action_space.sample()
...    _, _, terminated, truncated, _ = env.step(action)
...    if terminated or truncated:
...       save_video(
...          env.render(),
...          "videos",
...          fps=env.metadata["render_fps"],
...          step_starting_index=step_starting_index,
...          episode_index=episode_index
...       )
...       step_starting_index = step_index + 1
...       episode_index += 1
...       env.reset()
>>> env.close()

Old to New Step API Compatibility#

gym.utils.step_api_compatibility.convert_to_terminated_truncated_step_api(step_returns: Tuple[ObsType | ndarray, float | ndarray, bool | ndarray, dict | list] | Tuple[ObsType | ndarray, float | ndarray, bool | ndarray, bool | ndarray, dict | list], is_vector_env=False) Tuple[ObsType | ndarray, float | ndarray, bool | ndarray, bool | ndarray, dict | list]#

Function to transform step returns to new step API irrespective of input API.

Parameters:
  • step_returns (tuple) – Items returned by step(). Can be (obs, rew, done, info) or (obs, rew, terminated, truncated, info)

  • is_vector_env (bool) – Whether the step_returns are from a vector environment

gym.utils.step_api_compatibility.convert_to_done_step_api(step_returns: Tuple[ObsType | ndarray, float | ndarray, bool | ndarray, bool | ndarray, dict | list] | Tuple[ObsType | ndarray, float | ndarray, bool | ndarray, dict | list], is_vector_env: bool = False) Tuple[ObsType | ndarray, float | ndarray, bool | ndarray, dict | list]#

Function to transform step returns to old step API irrespective of input API.

Parameters:
  • step_returns (tuple) – Items returned by step(). Can be (obs, rew, done, info) or (obs, rew, terminated, truncated, info)

  • is_vector_env (bool) – Whether the step_returns are from a vector environment

gym.utils.step_api_compatibility.step_api_compatibility(step_returns: Tuple[ObsType | ndarray, float | ndarray, bool | ndarray, bool | ndarray, dict | list] | Tuple[ObsType | ndarray, float | ndarray, bool | ndarray, dict | list], output_truncation_bool: bool = True, is_vector_env: bool = False) Tuple[ObsType | ndarray, float | ndarray, bool | ndarray, bool | ndarray, dict | list] | Tuple[ObsType | ndarray, float | ndarray, bool | ndarray, dict | list]#

Function to transform step returns to the API specified by output_truncation_bool.

The Done (old) step API refers to the step() method returning (observation, reward, done, info). The Terminated Truncated (new) step API refers to the step() method returning (observation, reward, terminated, truncated, info). Refer to the docs for details on the API change.

Parameters:
  • step_returns (tuple) – Items returned by step(). Can be (obs, rew, done, info) or (obs, rew, terminated, truncated, info)

  • output_truncation_bool (bool) – Whether the output should return two booleans (new API) or one (old) (True by default)

  • is_vector_env (bool) – Whether the step_returns are from a vector environment

Returns:

step_returns (tuple) – Depending on output_truncation_bool, either (obs, rew, done, info) or (obs, rew, terminated, truncated, info)

Examples

This function can be used to ensure compatibility between step interfaces that use conflicting APIs, e.g. if the env is written in the old API, a wrapper is written in the new API, and the final step output is desired in the old API.

>>> obs, rew, done, info = step_api_compatibility(env.step(action), output_truncation_bool=False)
>>> obs, rew, terminated, truncated, info = step_api_compatibility(env.step(action), output_truncation_bool=True)
>>> observations, rewards, dones, infos = step_api_compatibility(vec_env.step(action), output_truncation_bool=False, is_vector_env=True)
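To make the relationship between the two tuple shapes concrete, here is a simplified sketch for a single (non-vector) environment. It assumes that done is the logical OR of terminated and truncated, and that an old-API step marks a time-limit truncation with an info["TimeLimit.truncated"] flag. Both function names are hypothetical, and the library functions may handle more cases than this.

```python
def to_done_api_sketch(step_returns):
    """Collapse (obs, rew, terminated, truncated, info) into (obs, rew, done, info)."""
    if len(step_returns) == 4:
        return step_returns  # already in the old format
    obs, rew, terminated, truncated, info = step_returns
    return obs, rew, terminated or truncated, info


def to_terminated_truncated_api_sketch(step_returns):
    """Expand (obs, rew, done, info) into (obs, rew, terminated, truncated, info)."""
    if len(step_returns) == 5:
        return step_returns  # already in the new format
    obs, rew, done, info = step_returns
    # Assumption: the old API signals truncation through this info key.
    truncated = bool(info.get("TimeLimit.truncated", False))
    return obs, rew, done and not truncated, done and truncated, info


ended = to_terminated_truncated_api_sketch((0, 1.0, True, {}))
timed_out = to_terminated_truncated_api_sketch((0, 1.0, True, {"TimeLimit.truncated": True}))
collapsed = to_done_api_sketch((0, 1.0, False, True, {}))
```

Note that collapsing to a single done flag loses the distinction between termination and truncation unless it is carried in info.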

Seeding#

gym.utils.seeding.np_random(seed: int | None = None) Tuple[Generator, Any]#

Creates a random number generator from the seed and returns the Generator together with the seed that was used.

Parameters:

seed – The seed used to create the generator

Returns:

The generator and resulting seed

Raises:

Error – Seed must be a non-negative integer or omitted
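The contract above (a Generator plus the seed that was used, with invalid seeds rejected) can be approximated with NumPy alone. This is a sketch under assumptions: it uses numpy.random.SeedSequence and PCG64, raises ValueError as a stand-in for the library's own Error type, and the name np_random_sketch is hypothetical.

```python
import numpy as np


def np_random_sketch(seed=None):
    """Return (Generator, seed) for a non-negative integer seed or None."""
    if seed is not None and not (isinstance(seed, int) and seed >= 0):
        # Stand-in for the library's own exception type.
        raise ValueError(f"Seed must be a non-negative integer or omitted, not {seed!r}")
    seed_seq = np.random.SeedSequence(seed)
    rng = np.random.Generator(np.random.PCG64(seed_seq))
    # For seed=None, entropy holds the randomly drawn seed, so the run can be reproduced.
    return rng, seed_seq.entropy


rng_a, seed_a = np_random_sketch(42)
rng_b, seed_b = np_random_sketch(42)
draws_a = rng_a.integers(0, 100, size=5).tolist()
draws_b = rng_b.integers(0, 100, size=5).tolist()
```

Returning the seed matters mostly when none was supplied, because it is then the only record of how the generator was initialised.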

Environment Checking#

Invasive#

gym.utils.env_checker.data_equivalence(data_1, data_2) bool#

Assert equality between data 1 and 2, i.e. observations, actions, info.

Parameters:
  • data_1 – data structure 1

  • data_2 – data structure 2

Returns:

True if data_1 and data_2 are equivalent, False otherwise
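A recursive comparison of nested observation, action, or info structures might look like the following sketch. It covers only built-in containers and scalars; NumPy arrays, which real observations often contain, would need an array-aware comparison such as numpy.array_equal and are left out here. The name data_equivalence_sketch is hypothetical.

```python
def data_equivalence_sketch(data_1, data_2) -> bool:
    """Recursively compare two nested structures of dicts, tuples, lists and scalars."""
    if type(data_1) is not type(data_2):
        return False
    if isinstance(data_1, dict):
        # Same keys, and equivalent values under every key.
        return data_1.keys() == data_2.keys() and all(
            data_equivalence_sketch(data_1[k], data_2[k]) for k in data_1
        )
    if isinstance(data_1, (tuple, list)):
        # Same length, and equivalent items position by position.
        return len(data_1) == len(data_2) and all(
            data_equivalence_sketch(a, b) for a, b in zip(data_1, data_2)
        )
    return data_1 == data_2


same = data_equivalence_sketch({"pos": (1, [2, 3])}, {"pos": (1, [2, 3])})
different_value = data_equivalence_sketch({"pos": (1, [2, 3])}, {"pos": (1, [2, 4])})
different_type = data_equivalence_sketch((1, 2), [1, 2])
```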

gym.utils.env_checker.check_reset_seed(env: Env)#

Check that the environment can be reset with a seed.

Parameters:

env – The environment to check

Raises:

AssertionError – The environment cannot be reset with a random seed, even though seed or kwargs appear in the signature.
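The idea behind a seed check can be shown with a toy environment: resetting twice with the same seed should produce equivalent observations. This sketch illustrates that idea only and does not reproduce the library's actual checks; ToySeededEnv, ToyUnseededEnv, and reset_is_deterministic are hypothetical names.

```python
import random


class ToySeededEnv:
    """Toy environment whose reset() honours the seed argument."""

    def __init__(self):
        self._rng = random.Random()

    def reset(self, seed=None, options=None):
        if seed is not None:
            self._rng = random.Random(seed)
        return self._rng.random(), {}


class ToyUnseededEnv(ToySeededEnv):
    """Toy environment that accepts a seed but ignores it."""

    def reset(self, seed=None, options=None):
        return self._rng.random(), {}


def reset_is_deterministic(env, seed=123) -> bool:
    """Reset twice with the same seed and compare the initial observations."""
    obs_1, _ = env.reset(seed=seed)
    obs_2, _ = env.reset(seed=seed)
    return obs_1 == obs_2


seeded_ok = reset_is_deterministic(ToySeededEnv())
unseeded_ok = reset_is_deterministic(ToyUnseededEnv())
```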

gym.utils.env_checker.check_reset_options(env: Env)#

Check that the environment can be reset with options.

Parameters:

env – The environment to check

Raises:

AssertionError – The environment cannot be reset with options, even though options or kwargs appear in the signature.

gym.utils.env_checker.check_reset_return_info_deprecation(env: Env)#

Makes sure support for deprecated return_info argument is dropped.

Parameters:

env – The environment to check

Raises:

UserWarning

gym.utils.env_checker.check_seed_deprecation(env: Env)#

Makes sure support for deprecated function seed is dropped.

Parameters:

env – The environment to check

Raises:

UserWarning

gym.utils.env_checker.check_reset_return_type(env: Env)#

Checks that reset() correctly returns a tuple of the form (obs, info).

Parameters:

env – The environment to check

Raises:

AssertionError – If the value returned by reset() violates the expected (obs, info) form

gym.utils.env_checker.check_space_limit(space, space_type: str)#

Checks the limits of Box spaces only. This test runs only as part of check_env.

gym.utils.env_checker.check_env(env: Env, warn: bool | None = None, skip_render_check: bool = False)#

Check that an environment follows the Gym API.

This is an invasive function that calls the environment’s reset and step.

This is particularly useful when using a custom environment. Please take a look at https://www.gymlibrary.dev/content/environment_creation/ for more information about the API.

Parameters:
  • env – The Gym environment that will be checked

  • warn – Ignored

  • skip_render_check – Whether to skip the checks for the render method. False by default; setting it to True is useful for CI.

Passive#

gym.utils.passive_env_checker.check_space(space: Space, space_type: str, check_box_space_fn: Callable[[Box], None])#

A passive check of an environment space (the kind of space is named by space_type) that should not affect the environment.

gym.utils.passive_env_checker.check_obs(obs, observation_space: Space, method_name: str)#

Check that the observation returned by the environment corresponds to the declared observation space.

Parameters:
  • obs – The observation to check

  • observation_space – The observation space of the observation

  • method_name – The method name that generated the observation

gym.utils.passive_env_checker.env_reset_passive_checker(env, **kwargs)#

A passive check of the Env.reset function that inspects the returned reset information and returns the data unchanged.

gym.utils.passive_env_checker.env_step_passive_checker(env, action)#

A passive check of the environment step that inspects the returned data and then returns it unchanged.
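A passive check in the sense used here inspects what the environment returns without altering it. The sketch below shows that pattern on a toy environment. It is a simplified illustration using plain assertions, not the library's actual set of checks, and the names step_passive_check_sketch, ToyEnv, and OldApiEnv are hypothetical.

```python
def step_passive_check_sketch(env, action):
    """Call env.step, validate the shape of the result, and return it untouched."""
    result = env.step(action)
    assert isinstance(result, tuple), f"step() must return a tuple, got {type(result)}"
    assert len(result) == 5, f"expected 5 step values, got {len(result)}"
    _, _, terminated, truncated, info = result
    assert isinstance(terminated, bool), "terminated must be a bool"
    assert isinstance(truncated, bool), "truncated must be a bool"
    assert isinstance(info, dict), "info must be a dict"
    # The caller receives exactly what the environment produced.
    return result


class ToyEnv:
    def step(self, action):
        return action, 1.0, False, False, {}


class OldApiEnv:
    def step(self, action):
        return action, 1.0, False, {}


passed_through = step_passive_check_sketch(ToyEnv(), 3)
```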

gym.utils.passive_env_checker.env_render_passive_checker(env, *args, **kwargs)#

A passive check of Env.render that verifies the render modes and render FPS are declared in the environment's metadata.