This environment is part of the Toy Text environments. Please read that page first for general information.
This is a simple implementation of the Gridworld Cliff reinforcement learning task.
Adapted from Example 6.6 (page 106) of Reinforcement Learning: An Introduction by Sutton and Barto.
With inspiration from: [https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py](https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py)
The board is a 4x12 matrix, with (using NumPy matrix indexing):
[3, 0] as the start at bottom-left
[3, 11] as the goal at bottom-right
[3, 1..10] as the cliff at bottom-center
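The layout above can be sketched as a NumPy boolean mask. This is only an illustration of the indexing convention; the names `N_ROWS`, `N_COLS`, `cliff`, `START`, and `GOAL` are hypothetical and not taken from the environment's source.

```python
import numpy as np

N_ROWS, N_COLS = 4, 12  # board shape described above

# Boolean mask of cliff cells: bottom row (index 3), columns 1 through 10.
cliff = np.zeros((N_ROWS, N_COLS), dtype=bool)
cliff[3, 1:11] = True  # slice end is exclusive, so this covers columns 1..10

START = (3, 0)   # bottom-left
GOAL = (3, 11)   # bottom-right
```

With row 0 at the top, the three upper rows are entirely safe and only the ten cells between the start and the goal are marked.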
If the agent steps on the cliff, it returns to the start. An episode terminates when the agent reaches the goal.
There are 4 discrete deterministic actions:
0: move up
1: move right
2: move down
3: move left
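One way to read the action table is as (row, column) offsets under NumPy matrix indexing, where row 0 is the top of the board. The sketch below is an assumption-laden illustration: `ACTION_DELTAS` and `move` are hypothetical names, and clamping at the board edges (so that a move off the grid leaves the agent in place) is assumed rather than stated in the text above.

```python
# (row, col) offsets for each action id; row 0 is the top of the board.
ACTION_DELTAS = {
    0: (-1, 0),  # up
    1: (0, 1),   # right
    2: (1, 0),   # down
    3: (0, -1),  # left
}


def move(position, action, n_rows=4, n_cols=12):
    """Return the new (row, col) after one action.

    Assumption: coordinates are clamped to the board, so moving
    into a wall leaves the agent where it was.
    """
    row, col = position
    d_row, d_col = ACTION_DELTAS[action]
    new_row = min(max(row + d_row, 0), n_rows - 1)
    new_col = min(max(col + d_col, 0), n_cols - 1)
    return new_row, new_col
```

For example, `move((3, 0), 0)` gives `(2, 0)`, while `move((3, 0), 3)` stays at `(3, 0)` under the clamping assumption.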
There are 3x12 + 1 = 37 possible states. The agent can never occupy a cliff cell (stepping onto one sends it back to the start), nor the goal (reaching it ends the episode). That leaves every position in the first 3 rows plus the bottom-left cell. The observation is simply the current position encoded as a flattened index.
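A minimal sketch of the flattened encoding, assuming row-major (C-order) flattening of the 4x12 board, which is NumPy's default. The helper names `to_state` and `to_position` are hypothetical.

```python
import numpy as np

SHAPE = (4, 12)


def to_state(row, col):
    """Encode a (row, col) position as a flat index (row * 12 + col)."""
    return int(np.ravel_multi_index((row, col), SHAPE))


def to_position(state):
    """Decode a flat index back into a (row, col) position."""
    row, col = np.unravel_index(state, SHAPE)
    return int(row), int(col)


start_state = to_state(3, 0)   # 36
goal_state = to_state(3, 11)   # 47

# States the agent can occupy: the first three rows plus the start cell.
reachable = [to_state(r, c) for r in range(3) for c in range(12)] + [start_state]
```

Here `len(reachable)` is 37, matching the 3x12 + 1 count: flat indices 0 through 35 for the upper rows, plus 36 for the start.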
Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward.
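To make the reward accounting concrete, here is a small self-contained sketch of one transition and the return of a fixed action sequence. It is not the environment's implementation. It assumes that moves are clamped at the board edges and that the -100 replaces the per-step -1 on a cliff step rather than adding to it. `step` and `episode_return` are hypothetical helpers.

```python
def step(position, action):
    """One transition: returns (new_position, reward, terminated)."""
    deltas = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}
    d_row, d_col = deltas[action]
    row = min(max(position[0] + d_row, 0), 3)    # assumed: clamp at edges
    col = min(max(position[1] + d_col, 0), 11)
    if row == 3 and 1 <= col <= 10:
        # Stepped into the cliff: back to the start, episode continues.
        return (3, 0), -100, False
    return (row, col), -1, (row, col) == (3, 11)


def episode_return(actions, start=(3, 0)):
    """Sum of rewards for a fixed action sequence, stopping at the goal."""
    position, total = start, 0
    for action in actions:
        position, reward, terminated = step(position, action)
        total += reward
        if terminated:
            break
    return total


# Shortest safe route: up once, right 11 times, down once (13 steps).
edge_path = [0] + [1] * 11 + [2]
```

Under these assumptions `episode_return(edge_path)` is -13, while a single step right from the start, `episode_return([1])`, costs -100 and leaves the agent back at the start.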
v0: Initial version release