# Inspired by Sondik's infinite-horizon paper
#

discount: 1
values: reward
states: interested bored
actions: tv radio nothing
observations: want-to-go dont-want-to-go

# transition probabilities

T: tv : interested
0.9 0.1

T: tv : bored
0.6 0.4

T: radio : interested
0.8 0.2

T: radio : bored
0.3 0.7

T: nothing : interested
0.5 0.5

T: nothing : bored
0.1 0.9

# observations

O: tv : interested
0.8 0.2

O: tv : bored
0.7 0.3

O: radio : interested
0.7 0.3

O: radio : bored
0.4 0.6

O: nothing : interested
0.9 0.1

O: nothing : bored
0.1 0.9

# cost structure

R: tv : * : * : * -10
R: radio : 1 : * : * -4
R: nothing : * : * : * 0