Suggested further readings
Overview
Sutton, R. S., and Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Links to neuroscience
Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science 275(5306): 1593-1599. doi: 10.1126/science.275.5306.1593 (preprint: cs.utexas.edu/~dana/Reward.pdf).
Daw, N. D., Niv, Y., and Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience 8(12): 1704-1711. doi: 10.1038/nn1560.
Dayan, P., and Niv, Y. (2008). Reinforcement learning: the good, the bad and the ugly. Current Opinion in Neurobiology 18(2): 185-196. doi: 10.1016/j.conb.2008.08.003.
Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., … and Botvinick, M. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience 21(6): 860-868. doi: 10.1038/s41593-018-0147-8 (preprint: bioRxiv doi: 10.1101/295964).
Mattar, M. G., and Daw, N. D. (2018). Prioritized memory access explains planning and hippocampal replay. Nature Neuroscience 21(11): 1609-1617. doi: 10.1038/s41593-018-0232-z (postprint: europepmc.org/articles/pmc6203620).
State of the art
Dabney, W., Kurth-Nelson, Z., Uchida, N., Starkweather, C. K., Hassabis, D., Munos, R., and Botvinick, M. (2020). A distributional code for value in dopamine-based reinforcement learning. Nature 577(7792): 671-675. doi: 10.1038/s41586-019-1924-6 (postprint: europepmc.org/articles/pmc7476215).
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature 518(7540): 529-533. doi: 10.1038/nature14236.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., … and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature 529(7587): 484-489. doi: 10.1038/nature16961.