This environment is part of the Box2D environments. Please read that page first for general information.
Observation High: [1.5 1.5 5. 5. 3.14 5. 1. 1.]
Observation Low: [-1.5 -1.5 -5. -5. -3.14 -5. -0. -0.]
This environment is a classic rocket trajectory optimization problem. According to Pontryagin’s maximum principle, it is optimal to fire the engine at full throttle or turn it off. This is the reason why this environment has discrete actions: engine on or off.
There are two environment versions: discrete or continuous. The landing pad is always at coordinates (0,0). The coordinates are the first two numbers in the state vector. Landing outside of the landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt.
To see a heuristic landing, run:
There are four discrete actions available: do nothing, fire left orientation engine, fire main engine, fire right orientation engine.
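The four actions can be given readable names. The sketch below assumes the integer indices follow the order in which the actions are listed above (0 for "do nothing" through 3 for "fire right orientation engine"); `LanderAction` and `describe` are hypothetical helpers for illustration, not part of the environment.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical names for the four discrete actions.

    Assumption: indices follow the order in which the actions are listed.
    """

    NOOP = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


def describe(action: int) -> str:
    """Return a readable label for a discrete action index."""
    return LanderAction(action).name.lower().replace("_", " ")
```

Passing an index outside 0 to 3 raises `ValueError`, which makes a malformed action visible early.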
The state is an 8-dimensional vector: the coordinates of the lander in x and y, its linear velocities in x and y, its angle, its angular velocity, and two booleans that represent whether each leg is in contact with the ground or not.
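To make the layout of the state vector concrete, here is a minimal sketch that unpacks an observation into named fields. `LanderState` and `unpack_state` are hypothetical names, and the field order is an assumption that simply follows the description above; the two leg flags are called `leg1_contact` and `leg2_contact` because the text does not say which leg comes first.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    """Hypothetical named view of the 8-dimensional observation."""

    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    leg1_contact: bool
    leg2_contact: bool


def unpack_state(obs: Sequence[float]) -> LanderState:
    """Split an 8-element observation into named components."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    x, y, vx, vy, angle, angular_velocity, leg1, leg2 = obs
    return LanderState(
        float(x), float(y), float(vx), float(vy),
        float(angle), float(angular_velocity),
        bool(leg1), bool(leg2),  # leg flags arrive as 0.0 or 1.0
    )
```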
Reward for moving from the top of the screen to the landing pad and coming to rest is about 100-140 points. If the lander moves away from the landing pad, it loses reward. If the lander crashes, it receives an additional -100 points. If it comes to rest, it receives an additional +100 points. Each leg with ground contact is +10 points. Firing the main engine is -0.3 points each frame. Firing the side engine is -0.03 points each frame. The environment is considered solved at 200 points.
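The per-frame engine costs and the terminal bonuses above can be tallied with a few lines of arithmetic. This is only a sketch of those two reward terms, not the environment's full reward function: it ignores the distance-based shaping and leg-contact terms, and it assumes the discrete action indices follow the listed order (1 and 3 for the side engines, 2 for the main engine). `fuel_penalty` and `terminal_bonus` are hypothetical helper names.

```python
from typing import Iterable

MAIN_ENGINE_COST = 0.3   # points lost per frame of main-engine firing
SIDE_ENGINE_COST = 0.03  # points lost per frame of side-engine firing


def fuel_penalty(actions: Iterable[int]) -> float:
    """Total (negative) reward from engine use over a sequence of actions.

    Assumption: action 2 fires the main engine, actions 1 and 3 fire a side engine.
    """
    actions = list(actions)
    main_frames = sum(1 for a in actions if a == 2)
    side_frames = sum(1 for a in actions if a in (1, 3))
    return -(MAIN_ENGINE_COST * main_frames + SIDE_ENGINE_COST * side_frames)


def terminal_bonus(crashed: bool, at_rest: bool) -> float:
    """Additional reward at the end of an episode: -100 for a crash, +100 for coming to rest."""
    if crashed:
        return -100.0
    if at_rest:
        return 100.0
    return 0.0
```

For example, two frames of main engine and two frames of side engine cost 0.66 points in total.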
The lander starts at the top center of the viewport with a random initial force applied to its center of mass.
The episode finishes if:
the lander crashes (the lander body gets in contact with the moon);
the lander gets outside of the viewport (the x coordinate is greater than 1);
the lander is not awake. From the Box2D docs, a body which is not awake is a body which doesn’t move and doesn’t collide with any other body:
When Box2D determines that a body (or group of bodies) has come to rest, the body enters a sleep state which has very little CPU overhead. If a body is awake and collides with a sleeping body, then the sleeping body wakes up. Bodies will also wake up if a joint or contact attached to them is destroyed.
To use the continuous environment, specify the continuous=True argument as shown below:
import gym

env = gym.make(
    "LunarLander-v2",
    continuous=True,        # default: False (discrete actions)
    gravity=-10.0,          # default
    enable_wind=False,      # default
    wind_power=15.0,        # default
    turbulence_power=1.5,   # default
)
If continuous=True is passed, continuous actions (corresponding to the throttle of the engines) will be used and the action space will be Box(-1, +1, (2,), dtype=np.float32).
The first coordinate of an action determines the throttle of the main engine, while the second
coordinate specifies the throttle of the lateral boosters.
Given an action np.array([main, lateral]), the main engine will be turned off completely if main < 0, and the throttle scales affinely from 50% to 100% for 0 <= main <= 1 (in particular, the main engine doesn't work with less than 50% power). Similarly, if -0.5 < lateral < 0.5, the lateral boosters will not fire at all. If lateral < -0.5, the left booster will fire, and if lateral > 0.5, the right booster will fire. Again, the throttle scales affinely from 50% to 100% between -1 and -0.5 (and 0.5 and 1, respectively).
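The mapping from a continuous action to engine throttles can be sketched as a small pure function. This is an illustration of the rules described above, not the environment's own code: `engine_throttles` is a hypothetical helper, inputs are clipped to [-1, 1] as an assumption, and a lateral value of exactly -0.5 or 0.5 (which the description leaves unspecified) is treated here as "not firing".

```python
from typing import Tuple


def engine_throttles(main: float, lateral: float) -> Tuple[float, float, str]:
    """Return (main_throttle, lateral_throttle, booster_side).

    Throttles are fractions of full power in [0, 1]; booster_side is
    "left", "right", or "none".
    """
    # Assumption: out-of-range inputs are clipped to the action space bounds.
    main = max(-1.0, min(1.0, main))
    lateral = max(-1.0, min(1.0, lateral))

    # Main engine: off below 0, then affine from 50% (main=0) to 100% (main=1).
    main_throttle = 0.0 if main < 0 else 0.5 + 0.5 * main

    # Lateral boosters: off inside the dead zone, then affine from
    # 50% (|lateral|=0.5) to 100% (|lateral|=1), i.e. throttle = |lateral|.
    if abs(lateral) > 0.5:
        lateral_throttle = abs(lateral)
        side = "left" if lateral < 0 else "right"
    else:
        lateral_throttle = 0.0
        side = "none"

    return main_throttle, lateral_throttle, side
```

Note that the affine lateral map reduces to the absolute value of `lateral`, since it sends 0.5 to 50% and 1 to 100%.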
gravity dictates the gravitational constant; it is bounded to be within 0 and -12.
If enable_wind=True is passed, there will be wind effects applied to the lander. The wind is generated using the function tanh(sin(2 k (t+C)) + sin(pi k (t+C))), where k is set to 0.01 and C is sampled randomly between -9999 and 9999.
wind_power dictates the maximum magnitude of linear wind applied to the craft. The recommended value for
wind_power is between 0.0 and 20.0.
turbulence_power dictates the maximum magnitude of rotational wind applied to the craft. The recommended value for
turbulence_power is between 0.0 and 2.0.
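The wind function described above is easy to evaluate directly. The sketch below computes tanh(sin(2 k (t+C)) + sin(pi k (t+C))) and scales it by wind_power, which is consistent with wind_power being the maximum magnitude because tanh is bounded by 1. The names `wind_signal` and `linear_wind` are hypothetical, and sampling C as an integer is an assumption; the description only says it is sampled randomly between -9999 and 9999.

```python
import math
import random

K = 0.01  # frequency constant k from the description


def wind_signal(t: float, offset: float, k: float = K) -> float:
    """Evaluate tanh(sin(2 k (t + C)) + sin(pi k (t + C))), a value in [-1, 1]."""
    phase = k * (t + offset)
    return math.tanh(math.sin(2.0 * phase) + math.sin(math.pi * phase))


def linear_wind(t: float, offset: float, wind_power: float = 15.0) -> float:
    """Scale the wind signal so its magnitude never exceeds wind_power."""
    return wind_signal(t, offset) * wind_power


# Assumption: C is drawn once per episode as an integer offset.
offset = random.randint(-9999, 9999)
samples = [linear_wind(t, offset) for t in range(5)]
```

Because the two sine terms have incommensurate frequencies (2k and pi k), the signal does not repeat exactly, which gives a smooth but non-periodic gust pattern.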
v2: Count energy spent. In v0.24, added turbulence with the wind_power and turbulence_power parameters.
v1: Leg contact with the ground added to the state vector; contact with the ground gives +10 reward points, and -10 if contact is then lost; reward renormalized to 200; harder initial random push.
v0: Initial version
Created by Oleg Klimov