Using computational models, the researchers studied how the brain’s reward-learning system functions in those with depression ...
Thus, dopamine neurons code errors in the prediction of both the occurrence and the time of rewards. In this respect, their responses resemble the teaching signals that have been employed in ...
力大砖飞,简洁优雅。 我觉得最大的价值是证明了:基于一个很强的模型(deepseekv3-base),用最简单的rule-based reward来做rl,经过大量训练(8k steps * bs 512/1024),也能达到目前reasoning model的sota。
A brain signal that lights up when we anticipate rewards may hold the secret to helping people overcome depression, and Virginia Tech researchers are working to unlock its potential.
New research highlights how the brain's reward-learning system can guide personalized treatments for depression.