Classification of Soybean [Glycine max (L.) Merr.] Seed Based on Deep Learning Using the YOLOv5 Model
Plant Breed. Biotech. 2022;10:75-80
Published online March 1, 2022
© 2022 Korean Society of Breeding Science.

Yu-Hyeon Park1, Tae-Hwan Jun1,2*

1Department of Plant Bioscience, Pusan National University, Miryang 50463, Korea
2Life and Industry Convergence Research Institute, Pusan National University, Miryang 50463, Korea
Corresponding author: Tae-Hwan Jun, thjun76@pusan.ac.kr, Tel: +82-55-350-5507, Fax: +82-55-350-5509
Received February 10, 2022; Revised February 17, 2022; Accepted February 17, 2022.
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
From an agricultural point of view, deep learning models can be used in a variety of ways to study the agronomic properties of soybean. Object detection can be performed on image or video data of soybean phenotypic traits. In this project, phenotype analysis of soybean seed was conducted with artificial intelligence (AI) based on the YOLOv5 model. In the model summary, the numbers of layers and parameters were 243 and 7,020,913, respectively. The mean average precision (mAP)@[0.5:0.95] was 0.835, 0.739, and 0.785 for the three classes: Daewonkong (DW), with yellow seed coat color, had the highest value, and the landrace with black seed coat color (NG2) had the lowest. In the confusion matrix used to assess prediction performance, each class, namely DW, NG2, and the inbred line with green seed coat color (NGT), was dominated by true positives (TP), with the predicted class matching the input class.
Keywords : Soybean, Seed, Classification, Object detection, Deep learning, Agronomic trait
INTRODUCTION

From an agricultural point of view, deep learning models can be used in a variety of ways to study the agronomic properties of soybean (Glycine max (L.) Merr.). Examining agronomic traits by eye is labor-intensive and time-consuming for farm managers. It is therefore desirable to use labor and time efficiently by adopting deep learning-based digitization and automation. Object detection can be performed on image or video data of soybean phenotypic traits (Rangarajan et al. 2018; Keya et al. 2020; Zhong et al. 2020). In addition, object detection can be used to predict plant disease in agricultural management. In a soybean seed counting study, a two-column convolutional neural network (TCNN) based deep learning model was applied: the seeds inside soybean pods were labeled, and the number of seeds in the pods shown in each image was output (Li et al. 2019). Machine vision research on corn seeds was implemented with OpenCV 4.4.5 and a support vector machine (SVM) in a Visual Studio C++ environment, and examined various phenotypes observed in 14 types of corn seed samples (Kiratiratanapruk et al. 2011). A CNN-based object detection study for a broad-spectrum seed population was performed using the Canadian seed data set and the Cagliari seed data set, with artificial intelligence (AI) trained on seeds of different shapes and colors (Loddo et al. 2021). Studies interpreting the phenotype data of various crops with CNNs have been actively conducted across the world. Analyzing soybean phenotypes with a CNN would therefore be useful, because the phenotype can be quantified in a data table and visualized.

Phenotype data from soybean images taken in the field can be used to predict quantity through an object counting method using numpy 1.16.0 in Python (Li et al. 2019). Representatively, the “You Only Look Once” (YOLO) model (Redmon et al. 2016), a single-stage detector that serves as an alternative to two-stage detectors such as Faster Region-based CNN (Faster R-CNN), has various uses in phenotype analysis in its recent versions (Gerovichev et al. 2021). In this project, phenotype analysis of soybean seed was conducted with AI based on the YOLOv5 model. The YOLOv5 model was trained on annotations and image data for each seed type, yielding an AI model that classifies the phenotype of sample seeds. The model achieved high prediction performance, detecting the seeds that matched each annotated phenotype among seeds scattered on paper.
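As a minimal sketch of the object counting idea mentioned above, the snippet below tallies detections per class. The cited work used NumPy; the Python standard library is used here only to keep the sketch dependency-free. The class order and the list of detected class indices are hypothetical stand-ins for a detector's output, not data from this study.

```python
from collections import Counter

# Hypothetical class order; the three seed classes of this study are used as labels.
CLASS_NAMES = ["DW", "NG2", "NGT"]


def count_per_class(class_ids, class_names=CLASS_NAMES):
    """Return a dict mapping each class name to its number of detections."""
    tally = Counter(class_ids)
    # Classes with no detections are reported with a count of zero.
    return {name: tally.get(i, 0) for i, name in enumerate(class_names)}


# Illustrative detector output: one class index per detected seed.
example_ids = [0, 2, 1, 0, 0, 2]
summary = count_per_class(example_ids)
print(summary)
```

In practice the class indices would come from the detections that a trained model returns for one image.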

This study can serve as a cornerstone for future attempts at agronomic phenotype analysis. Our results can provide valuable agricultural information to farmers through phenotype analysis, and deep learning models can help farm managers cultivate crops efficiently.

MATERIALS AND METHODS

Soybean seed classification model

Daewonkong (DW) with yellow seed coat color, a landrace with black seed coat color (NG2), and an inbred line with green seed coat color (NGT) were used as soybean seed samples. The seeds are similar in size but differ in color. Images of the samples were taken with a GoPro HERO8 Black camera in a studio without outdoor light, and the images were used as the input data set for training the YOLOv5 model. The data set consisted of 200 images for each of DW, NG2, and NGT and was divided into a training set, a validation set, and a test set. Object detection was then performed on test images in which DW, NG2, and NGT seeds were mixed on paper, collected with 3, 5, and 7 repetitions, respectively. The YOLOv5 model was implemented in the Google Colaboratory (Colab) environment provided by Google. Training was carried out with a batch size of 16 and 500 epochs.
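The division into training, validation, and test sets can be sketched as follows. The split ratios are not reported above, so the 70/20/10 fractions, the random seed, and the file names are placeholders chosen only for illustration.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists.

    train_frac and val_frac are assumed values; the remainder becomes the test set.
    """
    if not 0 < train_frac + val_frac <= 1:
        raise ValueError("train_frac + val_frac must be in (0, 1]")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 200 hypothetical image file names per class, split class by class so that
# every subset keeps the same class balance.
splits = {}
for cls in ("DW", "NG2", "NGT"):
    files = [f"{cls}_{i:03d}.jpg" for i in range(200)]
    splits[cls] = split_dataset(files)

for cls, (train, val, test) in splits.items():
    print(cls, len(train), len(val), len(test))
```

Splitting each class separately is a design choice that keeps the three seed types equally represented in every subset.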

Programs for annotation processing

The programs used in the annotation process were LabelImg 1.8.1 and RoboFlow. LabelImg is a graphical image annotation tool written in Python that uses Qt for its graphical interface. In a Windows environment it is compatible with Python 3.0 or higher, and it can be downloaded from Python Notebook. LabelImg can also be built on Linux, Ubuntu, and Mac, which require at least Python 2.6 and PyQt 4.8. After a Rect box is drawn around an object on the background and a class is assigned, the annotation data set is automatically created in the designated directory (Vogel et al. 2017). RoboFlow is a platform that provides data sets in a format suitable for processing a few dozen sample images. Annotation is done on the RoboFlow site by drawing a Rect box on each sample image, without a separate download. Because RoboFlow provides annotations in various formats together with an augmentation function, it helps to prevent overfitting during model training and to reduce running time (Oh et al. 2009).
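To illustrate what a Rect box annotation becomes in the YOLO text label format (one line per object: class index, then box center and size normalized by the image dimensions), a minimal conversion sketch follows. The function names, the pixel coordinates, and the image size are hypothetical and are not taken from LabelImg or RoboFlow.

```python
def rect_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space rectangle to normalized (x_center, y_center, w, h)."""
    if not (0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h):
        raise ValueError("box must lie inside the image and have positive area")
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box, img_size):
    """Format one object as a YOLO label line."""
    xc, yc, w, h = rect_to_yolo(*box, *img_size)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Hypothetical example: class 0 with a 200 x 200 pixel box in a 1000 x 800 image.
line = yolo_label_line(0, (100, 200, 300, 400), (1000, 800))
print(line)
```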

RESULTS AND DISCUSSION

Running time for soybean seed classification

Soybean classification based on the YOLOv5 model was implemented in Google Colab. In the model summary, the numbers of layers and parameters were 243 and 7,020,913, respectively. Training took 2.7 s per iteration, with 15 iterations per epoch. A total of 343 epochs were run to generate the model weights, and the running time was 3 hours 51 minutes 31 seconds. Because the loss score did not change significantly between epochs 243 and 343, early stopping was triggered and training terminated automatically at epoch 343. At epoch 0, train/box_loss, train/obj_loss, and train/cls_loss were 0.1258, 0.0162, and 0.0466, respectively. At epoch 343, when training was completed, they were 0.0254, 0.0072, and 0.0014.
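The reported running time can be checked from the figures above, and the stopping behavior can be illustrated with a generic patience-based rule. This is a sketch under stated assumptions, not the YOLOv5 implementation: the `EarlyStopper` class, the patience of 100 epochs (chosen to match the plateau from epoch 243 to 343), and the simulated loss curve are all hypothetical.

```python
def format_duration(total_seconds):
    """Format a duration in seconds as hours, minutes, and whole seconds."""
    seconds = int(total_seconds)  # drop the fractional second
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours} h {minutes} min {secs} s"


# Values reported in the text.
SECONDS_PER_ITERATION = 2.7
ITERATIONS_PER_EPOCH = 15
EPOCHS_RUN = 343
total_seconds = SECONDS_PER_ITERATION * ITERATIONS_PER_EPOCH * EPOCHS_RUN
print(format_duration(total_seconds))


class EarlyStopper:
    """Stop when the loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=100, min_delta=0.0):
        self.patience = patience      # assumed value, not taken from the paper
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = 0

    def should_stop(self, epoch, loss):
        if loss < self.best - self.min_delta:
            self.best = loss
            self.best_epoch = epoch
        return epoch - self.best_epoch >= self.patience


def simulated_loss(epoch, plateau_start=243):
    """Hypothetical loss: improves until plateau_start, then stays flat."""
    return 1.0 / (min(epoch, plateau_start) + 1)


stopper = EarlyStopper(patience=100)
stopped_at = None
for epoch in range(500):  # 500 is the epoch budget given in the methods
    if stopper.should_stop(epoch, simulated_loss(epoch)):
        stopped_at = epoch
        break
print(stopped_at)
```

With these assumptions the product 2.7 s x 15 x 343 gives 13,891.5 s, which agrees with the reported 3 hours 51 minutes 31 seconds, and the simulated run stops at epoch 343.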

Soybean seed classification with each value

As training progressed, the loss curves decreased, indicating a falling error rate and improving model performance. Val/box_loss, val/obj_loss, and val/cls_loss were 0.0888, 0.0203, and 0.0296 at epoch 0, and 0.0187, 0.0083, and 0.0005 at epoch 343, when training was completed. The validation loss thus decreased gradually. Metrics/precision and metrics/recall were both 0.0000 at epoch 0 and reached 0.9990 and 1.0000, respectively, at epoch 343. Precision and recall increased as training progressed, indicating that the model's predictions on the validation set are reliable.
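For reference, precision and recall follow directly from the counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below uses hypothetical counts, not counts from this study, chosen only to reproduce values of the same magnitude as those reported above.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall); a ratio with an empty denominator is 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Untrained model (hypothetical): nothing detected, so both metrics are 0.
print(precision_recall(tp=0, fp=0, fn=50))

# Trained model (hypothetical): one spurious detection, no missed seeds.
p, r = precision_recall(tp=999, fp=1, fn=0)
print(round(p, 4), round(r, 4))
```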

After training was completed, the mean average precision (mAP) at an intersection over union (IoU) threshold of 0.5 (mAP@0.5) was 0.9950, and mAP@[0.5:0.95] was 0.7861. Because of early stopping, logging appears to have ended while mAP was still rising. It is therefore necessary to increase the size of the training data set through augmentation. For all classes, precision was 1.00 at a confidence of 0.872, and recall was 0.00 at a confidence of 1.00. The F1 score was 1.00 at a confidence of 0.862, and the precision/recall value was 0.995 for all classes (Fig. 1, 2). For seed classification, the precision of DW, NG2, and NGT was 0.999, 0.997, and 1, respectively, which is high for all three seed types. The mAP@[0.5:0.95] was 0.835, 0.739, and 0.785 for DW, NG2, and NGT, respectively; DW had the highest value and NG2 the lowest (Supplementary Fig. S1).
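The quantities in this paragraph can be related with a short sketch. It assumes the usual conventions: mAP@[0.5:0.95] averages the average precision over intersection over union (IoU) thresholds from 0.5 to 0.95 in steps of 0.05, the all-class value is the mean of the per-class values, and F1 is the harmonic mean of precision and recall. The helper names and the example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The ten IoU thresholds behind mAP@[0.5:0.95].
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Per-class mAP@[0.5:0.95] values reported in the text.
per_class_map = {"DW": 0.835, "NG2": 0.739, "NGT": 0.785}
mean_map = sum(per_class_map.values()) / len(per_class_map)

print(IOU_THRESHOLDS)
print(round(mean_map, 4))
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(f1_score(1.0, 1.0))
```

The mean of the three rounded per-class values is about 0.7863, close to the reported 0.7861; the small gap is presumably due to rounding of the per-class figures.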

Figure 1. Time series curves of logged loss and mean average precision (mAP) during training. The loss, a performance indicator of the model, decreases, and the mAP score increases.
Figure 2. Seed classification result curves plotted in TensorBoard. The relationships among precision, recall, confidence, F1 score, and mean average precision (mAP) for the classification of three soybean seed types with different seed coat colors (Daewonkong with yellow color, DW; landrace with black color, NG2; inbred line with green color, NGT) are calculated and visualized using curves and areas. The F1 score is the harmonic mean of precision and recall.

Confusion matrix with each class

The predicted and true values were compared in a confusion matrix to measure prediction performance. For each of DW, NG2, and NGT, the true positive (TP) cell dominated, meaning that the predicted class matched the input class. For NGT, the matrix showed an entry for background false positives (FP), that is, background predicted as NGT, but its value was negligible. Overall, the model did not confuse the background with soybean seeds, and DW, NG2, and NGT showed no confusion between true and predicted values (Fig. 3, 4, Supplementary Fig. S2).
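A confusion matrix of this kind, including a background class for detections that match no annotated seed, can be sketched as follows. The label lists are hypothetical examples, and placing predicted classes on the rows and true classes on the columns is a choice made for this sketch.

```python
LABELS = ["DW", "NG2", "NGT", "background"]


def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Count (predicted, true) pairs; rows are predicted, columns are true."""
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(true_labels, predicted_labels):
        matrix[index[pred]][index[true]] += 1
    return matrix


# Hypothetical example: five seeds classified correctly, plus one background
# region wrongly detected as NGT (a background false positive).
true_labels = ["DW", "DW", "NG2", "NGT", "NGT", "background"]
pred_labels = ["DW", "DW", "NG2", "NGT", "NGT", "NGT"]

matrix = confusion_matrix(true_labels, pred_labels)
for name, row in zip(LABELS, matrix):
    print(f"{name:>10}: {row}")
```

Correct predictions fall on the diagonal, and the single off-diagonal count in the NGT row under the background column corresponds to the kind of background FP entry described above.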

Figure 3. Heat map of the confusion matrix for each class. To measure prediction performance, the predicted and true values are compared in the confusion matrix. FP stands for false positive. Daewonkong (DW) with yellow seed coat color, a landrace with black seed coat color (NG2), and an inbred line with green seed coat color (NGT) were used as soybean seed samples.
Figure 4. Feature map of seed classification with mixed seeds. Three soybean seed types with different seed coat colors (Daewonkong with yellow color, DW; landrace with black color, NG2; inbred line with green color, NGT) were mixed on the background and classified using the seed classification model.

Phenotypic analysis of crop units through AI calculations

In general, phenotypic traits of plants have been investigated by visual inspection (Bortnem et al. 2003). As deep learning was introduced as a method of phenotypic analysis, quantitative visualization became possible for various agronomic traits. Deep learning-based phenotypic analysis has been applied to crops in various ways, such as insect pest detection (Oh et al. 2009), growth prediction (Hong et al. 2020), and classification of leaf diseases caused by fungi, bacteria, and viruses (Hassan et al. 2021). Object recognition for fruit (Koirala et al. 2020) and investigation of plant distribution through satellite imaging (Engen et al. 2021) have also been introduced. However, classification of soybean seeds has not yet been reported. Crop seeds show very different phenotypes depending on the species. Soybean seeds vary widely in volume, color, height, and width depending on the variety. By applying deep learning to the phenotypic traits of soybean seeds, variations that are difficult to identify with the human eye can be analyzed by AI. Phenotypic analysis at the level of the crop unit can suggest new approaches to studying crop development and growth.

Supplemental Materials
pbb-10-1-75-supple.pdf
ACKNOWLEDGEMENTS

This work was supported by a 2-year Research Grant of Pusan National University.

References
  1. Alexander G, Achiad S, Vlad W, Avi BM, Tamar K, Chen K. 2021. High throughput data acquisition and deep learning for insect ecoinformatics. Front. Ecol. Evol. 9: 309.
  2. Bortnem R, Boe A. 2003. Color index for red clover seed. Crop Sci. 43(6): 2279-2283.
  3. Engen M, Sando E, Sjolander BLO, Arenberg S, Gupta RM, Goodwin M. 2021. Farm-Scale Crop Yield Prediction from Multi-Temporal Data Using Deep Hybrid Neural Networks. Agronomy 11(12): 2576.
  4. Hassan S, M, Jasinski M, Leonowicz Z, Jasinska E, Maji AK. 2021. Plant Disease Identification Using Shallow Convolutional Neural Network. Agronomy 11(12): 2388.
  5. Hong SJ, Kim SY, Kim EC, Lee CH, Lee JS, Lee DS, et al. 2020. Moth detection from pheromone trap images using deep learning object detectors. Agriculture 10(5): 170.
  6. Keya, M, Majumdar B, Islam MS. 2020. A robust deep learning segmentation and identification approach of different bangladeshi plant seeds using CNN. 2020 11th ICCCNT. IEEE. pp.1-6.
  7. Kiratiratanapruk, Kantip, Wasin S. 2011. Color and texture for corn seed classification by machine vision. 2011 ISPACS. IEEE. pp.1-5.
  8. Koirala A, Walsh KB, Wang Z, Anderson N. 2020. Deep learning for mango (Mangifera indica) panicle stage classification. Agronomy 10(1): 143.
  9. Loddo, Andrea, Cecilia DR. 2021. On the efficacy of handcrafted and deep features for seed image classification. J. Imaging 7(9): 171.
  10. Oh YJ, Cho SK, Kim KH, Paik CH, Cho YK, Kim HS, et al. 2009. Responses of Growth Characteristics of Soybean [Glycine max (L.) Merr.] Cultivars to Riptortus clavatus Thunberg (Hemiptera: Alydidae). Korean J. Breed. Sci. 41(4): 488-495.
  11. Rangarajan, Aravind K, Raja P, Aniirudh R. 2018. Tomato crop disease classification using pre-trained deep learning algorithm. Procedia Comput. Sci. 133: 1040-1047.
  12. Redmon J, Divvala S, Girshick R, Farhadi A. 2016. You only look once: Unified, real-time object detection. 2016 CVPR. IEEE. pp.779-788.
  13. Vogel P, Klooster T, Andrikopoulos V, Lungu M. 2017. A low-effort analytics platform for visualizing evolving Flask-based Python web services. 2017 VISSOFT. IEEE. pp.109-113.
  14. Yue L, Jingdun J, Li Z, Abdul MK, Shi S, Wanlin G, et al. 2019. Soybean seed counting based on pod image using two-column convolution neural network. IEEE Access 7: 64177-64185.
  15. Zhong, Yong, Zhao M. 2020. Research on deep learning in apple leaf disease recognition. Comput. Electron. Agric. 168: 105146.


March 2022, 10 (1)