TY - GEN
T1 - Some experiments in automated identification of Australian plants using convolutional neural networks
AU - Boston, Tony
AU - Van Dijk, Albert
N1 - Publisher Copyright:
Copyright © 2019 The Modelling and Simulation Society of Australia and New Zealand Inc. All rights reserved.
PY - 2019
Y1 - 2019
N2 - Accurate plant identification is a skill that generally requires considerable knowledge and advanced training. However, plant identification is useful to a broad range of people within society, from conservationists and farmers to citizen scientists. Access to accurate, widely available knowledge about the identity and distribution of living species is critical for biodiversity conservation and sustainable development. Automated plant identification has undergone major advances since 2012 with the application of convolutional neural networks (CNNs) from the emerging field of deep learning. This branch of machine learning has shown remarkable accuracy in image classification and visual object recognition when applied to still images through competitions such as the ImageNet Large Scale Visual Recognition Challenge. This research project used transfer learning to fine-tune pre-trained deep learning CNNs originally developed for the ImageNet challenge, such as Inception and ResNet, which are publicly available through TensorFlow Hub. The models were applied to the automated identification of plant images extracted from the Australian National Botanic Gardens' Australian Plant Image Index and validated using additional images from the Atlas of Living Australia (ALA) and other Internet sources. Model performance was compared across three datasets: whole-plant images (9,612 images of 392 species, with at least 20 images per species), images of flowers (3,384 images of 271 species, with at least 10 images per species) and scanning electron microscopy images of liverwort spores from Fossombronia spp. (322 images of 12 species, with at least 10 images per species). To reduce the risk of overfitting and extend the training dataset, data augmentation techniques such as scaling and reflection were tested to identify a high-performing method; augmentation also improved overall model performance. The best-performing model for both the All-plants (80.6% accuracy) and Flower (88.4% accuracy) datasets was Inception-V3 pre-trained on the iNaturalist dataset of plants and animals. For the Fossombronia spp. dataset, the best-performing model (81.2% accuracy) was ResNet-V2-50, the 50-layer implementation of ResNet-V2, pre-trained on ImageNet 2012. The best-performing flower identification model also showed some proficiency in identifying the genus of an unknown species where the genus, but not the species, was represented in the dataset, with a Top-5 accuracy of 66%. The best Flower model was further tested on 1,000 images (20 images each of 50 randomly selected species) downloaded from the Atlas of Living Australia and the Internet, producing a Top-1 accuracy of 85.9%. Questions that remain to be addressed include further testing of data augmentation approaches and more comprehensive analysis to exclude overfitting. An interesting future extension of this study would be to train the best-performing model on a larger dataset of Australian plant images, which could be used to help scientists and the general public identify unknown species through image upload via a website or phone app.
KW - Convolutional neural networks
KW - Deep learning
KW - Plant identification
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85086461023&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 23rd International Congress on Modelling and Simulation - Supporting Evidence-Based Decision Making: The Role of Modelling and Simulation, MODSIM 2019
SP - 15
EP - 21
BT - 23rd International Congress on Modelling and Simulation - Supporting Evidence-Based Decision Making
A2 - Elsawah, S.
PB - Modelling and Simulation Society of Australia and New Zealand Inc (MSSANZ)
T2 - 23rd International Congress on Modelling and Simulation - Supporting Evidence-Based Decision Making: The Role of Modelling and Simulation, MODSIM 2019
Y2 - 1 December 2019 through 6 December 2019
ER -