Learning Optimal Advantage from Preferences and Mistaking it for Reward

We investigate the consequences of assuming preferences are based upon partial return when they actually arise from regret. We argue that the learned function is an approximation of the optimal advantage ...
