Publication - Dr Tim Kovacs

    Variance-based Learning Classifier System without Convergence of Reward Estimation

    Citation

    Tatsumi, T, Komine, T, Nakata, M, Sato, H, Kovacs, TMD & Takadama, K, 2016, ‘Variance-based Learning Classifier System without Convergence of Reward Estimation’. in: GECCO '16 Companion: Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion. Association for Computing Machinery (ACM), New York, NY, USA, pp. 67-68

    Abstract

    The Learning Classifier System (LCS) is an evolutionary machine learning method that combines reinforcement learning with a genetic algorithm. An important feature of LCS is that it can acquire generalized rules that match multiple states using the # (don't-care) symbol. Among LCSs, the Accuracy-based LCS (XCS) [4] can acquire "accurate" generalized rules by reducing the difference between the predicted reward and the acquired reward, but XCS struggles to estimate this difference correctly in noisy environments. To address this issue, our previous research proposed XCS-SAC (XCS with Self-adaptive Accuracy Criterion) for noisy environments. Since the estimated standard deviation of the rewards of inaccurate rules is larger than that of accurate ones, the fitness of rules in XCS-SAC is calculated according to the estimated standard deviation of the rewards.
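The abstract does not give XCS-SAC's fitness formula, but the idea it describes (fitness derived from the estimated standard deviation of a rule's rewards) can be illustrated with a minimal sketch. The `RuleEstimate` class, the Welford-style running statistics, and the inverse-deviation `fitness` function below are illustrative assumptions, not the paper's actual equations.

```python
import math

class RuleEstimate:
    """Running reward statistics for one classifier rule (Welford's algorithm)."""
    def __init__(self):
        self.n = 0        # number of rewards observed
        self.mean = 0.0   # running mean of rewards
        self.m2 = 0.0     # running sum of squared deviations

    def update(self, reward):
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)

    def std(self):
        # Population standard deviation of the observed rewards
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

def fitness(rule, eps=1e-6):
    # Hypothetical mapping: a larger estimated reward deviation
    # (typical of inaccurate, over-general rules) yields lower fitness.
    return 1.0 / (rule.std() + eps)
```

Under this sketch, a rule that always receives the same reward gets a small `std()` and hence high fitness, while an over-general rule whose reward varies across the states it matches is penalized.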

    However, XCS-SAC must wait until the estimated standard deviation has converged for all state-action pairs. This paper notes that the average value of the rewards is distributed around the true value. To overcome this problem, this paper proposes XCS without Convergence of Reward Estimation (XCS-CRE), which can determine the accuracy of rules according to the distribution range of the average reward of the matched state-action pair.
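The abstract does not specify how XCS-CRE computes the distribution range, so the following is only a plausible illustration of the stated idea: the sample mean of a state-action pair's rewards scatters around the true value with spread std/sqrt(n), so a rule's prediction can be judged against that range before the deviation estimate itself has converged. The function names and the width factor `k` are assumptions.

```python
import math

def running_stats(rewards):
    """Mean and sample standard deviation of a reward sequence."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / max(n - 1, 1)
    return mean, math.sqrt(var)

def is_accurate(rule_prediction, pair_rewards, k=2.0):
    # The sample mean of the pair's rewards is distributed around the
    # true value with spread std/sqrt(n) (the standard error). A rule is
    # judged accurate if its prediction falls inside that range, without
    # waiting for the deviation estimate to converge.
    mean, std = running_stats(pair_rewards)
    half_width = k * std / math.sqrt(len(pair_rewards))
    return abs(rule_prediction - mean) <= half_width
```

For example, a rule predicting 1000 for a pair whose noisy rewards cluster tightly around 1000 would be accepted, while a rule predicting 500 for the same pair would be rejected.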
