TY - GEN
T1 - MOOR
T2 - 24th International Congress on Modelling and Simulation, MODSIM 2021
AU - Ju, Jun
AU - Kurniawati, Hanna
AU - Kroese, Dirk
AU - Ye, Nan
N1 - Publisher Copyright:
© 2021 Proceedings of the International Congress on Modelling and Simulation, MODSIM. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Fisheries play multi-faceted roles in our society, economy, and environment, and management decisions often involve competing driving forces. The need to account for multiple and possibly conflicting objectives makes sustainable fishery management a highly challenging task. This is further compounded by the large amount of uncertainty present in the problem: in particular, our knowledge of the fishery system is limited, and the state of the fishery system is not directly observable. Partially Observable Markov Decision Processes (POMDPs) - a general, principled framework for sequential decision making in partially observable environments - are well-suited for sustainable fishery management: they can account for the long-term effects of actions, and they can conveniently take uncertainties into account. A few recent works have explored the potential of using POMDPs for sustainable fishery management. In this paper, we leverage recent advances in two sub-fields of machine learning, namely deep learning and reinforcement learning, to develop a novel approach to sustainable fishery management using POMDPs. We first propose an offline reinforcement learning approach for sustainable fishery management. While typical reinforcement learning approaches learn an optimal policy by directly interacting with the environment, offline reinforcement learning approaches learn an optimal policy using a dataset of past interactions with the environment. The use of past data instead of direct interventions is a highly desirable feature for fishery management - this has also been exploited in the management strategy evaluation literature. We believe this perspective will allow us to tap into recent advances in offline reinforcement learning. Our second contribution is a new algorithm, MOOR, which stands for MOdel-based Offline Reinforcement learning algorithm for sustainable fishery management.
MOOR first learns a POMDP fishery dynamics model from catch and effort data, and then solves the POMDP using a state-of-the-art solver. In the model learning step, we view the POMDP fishery dynamics model as a recurrent neural network (RNN), and leverage RNN learning techniques to learn the model. This presents some new challenges, but we show that these can be overcome with a few tricks to yield a very effective learning algorithm. Finally, MOOR demonstrates strong performance in preliminary simulation studies. The learned models are generally very similar to the true models. In addition, the management policies obtained using the learned models perform similarly to the optimal management policies for the true models. Whereas previous POMDP studies for fishery management evaluate policy performance in the learned model, we evaluate the policy in the true model; our results thus suggest that it is possible to develop a POMDP approach that is robust against mild model learning error. Moreover, although this paper focuses on fisheries applications, the approach is general enough for other problems where the dynamics are nonlinear, though further research is needed to understand the effectiveness and efficiency of the method on other domains. Our source code will be made available after the publication of this work.
AB - Fisheries play multi-faceted roles in our society, economy, and environment, and management decisions often involve competing driving forces. The need to account for multiple and possibly conflicting objectives makes sustainable fishery management a highly challenging task. This is further compounded by the large amount of uncertainty present in the problem: in particular, our knowledge of the fishery system is limited, and the state of the fishery system is not directly observable. Partially Observable Markov Decision Processes (POMDPs) - a general, principled framework for sequential decision making in partially observable environments - are well-suited for sustainable fishery management: they can account for the long-term effects of actions, and they can conveniently take uncertainties into account. A few recent works have explored the potential of using POMDPs for sustainable fishery management. In this paper, we leverage recent advances in two sub-fields of machine learning, namely deep learning and reinforcement learning, to develop a novel approach to sustainable fishery management using POMDPs. We first propose an offline reinforcement learning approach for sustainable fishery management. While typical reinforcement learning approaches learn an optimal policy by directly interacting with the environment, offline reinforcement learning approaches learn an optimal policy using a dataset of past interactions with the environment. The use of past data instead of direct interventions is a highly desirable feature for fishery management - this has also been exploited in the management strategy evaluation literature. We believe this perspective will allow us to tap into recent advances in offline reinforcement learning. Our second contribution is a new algorithm, MOOR, which stands for MOdel-based Offline Reinforcement learning algorithm for sustainable fishery management.
MOOR first learns a POMDP fishery dynamics model from catch and effort data, and then solves the POMDP using a state-of-the-art solver. In the model learning step, we view the POMDP fishery dynamics model as a recurrent neural network (RNN), and leverage RNN learning techniques to learn the model. This presents some new challenges, but we show that these can be overcome with a few tricks to yield a very effective learning algorithm. Finally, MOOR demonstrates strong performance in preliminary simulation studies. The learned models are generally very similar to the true models. In addition, the management policies obtained using the learned models perform similarly to the optimal management policies for the true models. Whereas previous POMDP studies for fishery management evaluate policy performance in the learned model, we evaluate the policy in the true model; our results thus suggest that it is possible to develop a POMDP approach that is robust against mild model learning error. Moreover, although this paper focuses on fisheries applications, the approach is general enough for other problems where the dynamics are nonlinear, though further research is needed to understand the effectiveness and efficiency of the method on other domains. Our source code will be made available after the publication of this work.
KW - Offline reinforcement learning
KW - POMDPs
KW - fishery management
UR - http://www.scopus.com/inward/record.url?scp=85161496347&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the International Congress on Modelling and Simulation, MODSIM
SP - 771
EP - 777
BT - Proceedings of the 24th International Congress on Modelling and Simulation, MODSIM 2021
A2 - Vervoort, R. Willem
A2 - Voinov, A. Alexey
A2 - Evans, Jason P.
A2 - Marshall, Lucy
PB - Modelling and Simulation Society of Australia and New Zealand Inc (MSSANZ)
Y2 - 5 December 2021 through 10 December 2021
ER -