Tampere University of Technology

TUTCRIS Research Portal

Projections for Approximate Policy Iteration Algorithms

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Standard

Projections for Approximate Policy Iteration Algorithms. / Akrour, Riad; Pajarinen, Joni; Neumann, Gerhard; Peters, Jan.

36th International Conference on Machine Learning, ICML 2019. PMLR, 2019. p. 267-276.


Harvard

Akrour, R, Pajarinen, J, Neumann, G & Peters, J 2019, Projections for Approximate Policy Iteration Algorithms. in 36th International Conference on Machine Learning, ICML 2019. PMLR, pp. 267-276, International Conference on Machine Learning, Long Beach, United States, 9/06/19.

APA

Akrour, R., Pajarinen, J., Neumann, G., & Peters, J. (2019). Projections for Approximate Policy Iteration Algorithms. In 36th International Conference on Machine Learning, ICML 2019 (pp. 267-276). PMLR.

Vancouver

Akrour R, Pajarinen J, Neumann G, Peters J. Projections for Approximate Policy Iteration Algorithms. In 36th International Conference on Machine Learning, ICML 2019. PMLR. 2019. p. 267-276

Author

Akrour, Riad ; Pajarinen, Joni ; Neumann, Gerhard ; Peters, Jan. / Projections for Approximate Policy Iteration Algorithms. 36th International Conference on Machine Learning, ICML 2019. PMLR, 2019. pp. 267-276

BibTeX

@inproceedings{df86223456014e499de0db3b3bfc4233,
title = "Projections for Approximate Policy Iteration Algorithms",
abstract = "Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded using a function approximator; this class has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring an increase of the policy return during the policy update often requires constraining the change in the action distribution. Several approximations exist in the literature to solve this constrained policy update problem. In this paper, we propose to improve on such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.",
author = "Riad Akrour and Joni Pajarinen and Gerhard Neumann and Jan Peters",
year = "2019",
month = "6",
language = "English",
pages = "267--276",
booktitle = "36th International Conference on Machine Learning, ICML 2019",
publisher = "PMLR",

}

RIS (suitable for import to EndNote)

TY - GEN

T1 - Projections for Approximate Policy Iteration Algorithms

AU - Akrour, Riad

AU - Pajarinen, Joni

AU - Neumann, Gerhard

AU - Peters, Jan

PY - 2019/6

Y1 - 2019/6

N2 - Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded using a function approximator; this class has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring an increase of the policy return during the policy update often requires constraining the change in the action distribution. Several approximations exist in the literature to solve this constrained policy update problem. In this paper, we propose to improve on such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.

AB - Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded using a function approximator; this class has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring an increase of the policy return during the policy update often requires constraining the change in the action distribution. Several approximations exist in the literature to solve this constrained policy update problem. In this paper, we propose to improve on such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one, which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.

M3 - Conference contribution

SP - 267

EP - 276

BT - 36th International Conference on Machine Learning, ICML 2019

PB - PMLR

ER -
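
Illustration of the idea in the abstract

The abstract describes turning a constrained policy update into an unconstrained one by composing the objective with a projection, and then applying standard gradient descent. The following is a minimal, hypothetical NumPy sketch of that general pattern for a one-dimensional Gaussian policy under an entropy lower bound; it is not the authors' implementation, and the surrogate objective, the bound value, and the clipping-style projection are assumptions made only for illustration (the paper itself derives projections for entropy and KL constraints).

import numpy as np

MIN_ENTROPY = 0.5  # assumed entropy lower bound for this toy constraint
# 1-D Gaussian entropy: H = log_std + 0.5*log(2*pi*e), so the bound is a floor on log_std.
LOG_STD_FLOOR = MIN_ENTROPY - 0.5 * np.log(2 * np.pi * np.e)

def project(params):
    # Map unconstrained (mean, log_std) to a policy that satisfies the entropy bound by construction.
    mean, log_std = params
    return mean, max(log_std, LOG_STD_FLOOR)

def surrogate(params):
    # Toy surrogate return: prefers a mean near 1.5 and a small standard deviation,
    # so the unconstrained optimum would violate the entropy bound.
    mean, log_std = project(params)
    return -(mean - 1.5) ** 2 - np.exp(2.0 * log_std)

def grad(f, x, eps=1e-5):
    # Central finite differences; enough for this illustration.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2.0 * eps)
    return g

params = np.array([0.0, 0.0])  # unconstrained mean and log_std
for _ in range(500):
    params = params + 0.05 * grad(surrogate, params)  # plain gradient ascent, no constrained solver

mean, log_std = project(params)
print("mean = %.3f, std = %.3f, entropy = %.3f (bound %.3f)"
      % (mean, np.exp(log_std), log_std + 0.5 * np.log(2 * np.pi * np.e), MIN_ENTROPY))

Once the unconstrained log_std crosses the floor its gradient vanishes and the projected policy sits on the constraint boundary, so the reported entropy equals the bound while the mean converges freely. This composition-with-a-projection pattern is what lets ordinary gradient descent stand in for a constrained optimizer in the policy update step.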