We investigate the consequences of assuming preferences are based upon partial return when they actually arise from regret. We argue that the learned function is an approximation of the optimal advantage ...