# It is a line of uniquely appearing states with a goal at both ends. discount: 1.0 values: reward states: 4 actions: left right observations: nothing T: left 1.0 0.0 0.0 0.0 0.8 0.1 0.1 0.0 0.0 0.8 0.1 0.1 0.0 0.0 0.0 1.0 T: right 1.0 0.0 0.0 0.0 0.1 0.1 0.8 0.0 0.0 0.1 0.1 0.8 0.0 0.0 0.0 1.0 O: * : * : * 1.0 R: left : 1 : 0 : * 1.0 R: right : 2 : 3 : * 1.0