
From an agricultural point of view, deep learning models can be used in a variety of ways to study the agricultural properties of soybean (
The phenotype data of soybean images taken in the field can be used to predict seed quantity through an object-counting method using NumPy 1.16.0 in Python (Li
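The counting step mentioned above can be sketched with NumPy as follows. This is a minimal illustration, not the cited pipeline: the class indices and labels are assumptions, standing in for whatever the detector emits.

```python
import numpy as np

# Hypothetical detector output: one class index per detected seed
# (0 = DW, 1 = NG2, 2 = NGT); these labels are illustrative only.
detected_classes = np.array([0, 2, 1, 0, 0, 2, 1, 1, 1])

# np.bincount tallies how many detections fall in each class
counts = np.bincount(detected_classes, minlength=3)
class_names = ["DW", "NG2", "NGT"]
per_class = dict(zip(class_names, counts.tolist()))
print(per_class)  # {'DW': 3, 'NG2': 4, 'NGT': 2}
```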
This study will serve as a cornerstone for future phenotype analysis in agronomy. Our results can provide valuable agricultural information to farmers through phenotype analysis, and deep learning models can help farm managers cultivate crops efficiently.
Daewonkong (DW) with a yellow seed coat, a landrace with a black seed coat (NG2), and an inbred line with a green seed coat (NGT) were used as the soybean seed samples. The seeds are similar in size but differ in seed coat color. Images of the samples were taken with a GoPro Hero 8 Black camera in a studio without outdoor light, and each image was used as input data for training the YOLOv5 model. The data set for DW, NG2, and NGT consisted of 200 images per class and was divided into a training set, a validation set, and a test set. Object detection was performed on a test set of multiple objects combining DW, NG2, and NGT on paper, collected with 3, 5, and 7 replicates, respectively. The YOLOv5 model was implemented in the Google Colaboratory (Colab) environment serviced by Google. Training was carried out with a batch size of 16 and 500 epochs.
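The split into training, validation, and test sets described above can be sketched as follows. The 70/20/10 ratio and the file-naming scheme are assumptions for illustration; the text does not state the exact proportions.

```python
import random

def split_dataset(filenames, train_frac=0.7, val_frac=0.2, seed=42):
    """Shuffle image filenames and split them into train/val/test lists.
    The 70/20/10 ratio is an assumption; the study does not state it."""
    rng = random.Random(seed)
    files = list(filenames)
    rng.shuffle(files)
    n = len(files)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:])

# 200 images per class, as in the study (hypothetical filenames)
images = [f"DW_{i:03d}.jpg" for i in range(200)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 140 40 20
```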
The programs used in the annotation process are LabelImg 1.8.1 and Roboflow. LabelImg is a graphical image annotation tool: a Python-based program that uses Qt for its graphical interface. It is compatible with Python 3.0 or higher in a Windows environment, and it can be downloaded from a Python notebook. LabelImg can also be built on Linux, Ubuntu, and macOS, which require at least Python 2.6 and run on PyQt 4.8. After drawing a rectangular box around an object on the background and assigning a class, the annotation dataset is automatically created in the designated directory (Vogel
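In LabelImg's YOLO export, each annotation file contains one line per box: a class index followed by the normalized box center, width, and height. A minimal parser for that format (the sample line below is illustrative, not from the study's data):

```python
def parse_yolo_line(line):
    """Parse one line of a YOLO-format annotation file:
    '<class> <x_center> <y_center> <width> <height>', with all
    coordinates normalized to [0, 1] relative to the image size."""
    parts = line.split()
    cls = int(parts[0])
    x, y, w, h = map(float, parts[1:])
    return cls, (x, y, w, h)

# Illustrative annotation line for one seed bounding box
cls, box = parse_yolo_line("0 0.512 0.433 0.120 0.115")
print(cls, box)  # 0 (0.512, 0.433, 0.12, 0.115)
```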
Soybean classification based on the YOLOv5 model was implemented in Google Colab. In the model summary, the numbers of layers and parameters were 243 and 7,020,913, respectively. Each iteration took 2.7 s, with 15 iterations per epoch. A total of 343 epochs were performed to generate the model weights, and the running time was 3 h 51 min 31 s. Since there was no significant change in the loss score from epoch 243 to epoch 343, an early-stopping process was triggered and training was automatically terminated at epoch 343. At the start of training (epoch 0), train/box_loss, train/obj_loss, and train/cls_loss were calculated as 0.1258, 0.0162, and 0.0466, respectively. After training was completed at epoch 343, they were 0.0254, 0.0072, and 0.0014.
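The early-stopping behavior described above can be sketched as a patience counter: training ends once a fixed number of epochs pass without improvement on the best score so far. The patience of 100 epochs is an assumption, chosen because it is consistent with the best score plateauing around epoch 243 and training stopping at epoch 343.

```python
def early_stop_epoch(losses, patience=100):
    """Return the epoch at which training stops: the run ends once
    `patience` epochs pass without improving on the best loss so far.
    The patience value of 100 is an assumption, not taken from the study."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, best_epoch = loss, epoch
        if epoch - best_epoch >= patience:
            return epoch
    return len(losses) - 1

# Synthetic loss curve: improves until epoch 243, then plateaus
losses = [1.0 / (e + 1) for e in range(244)] + [1.0 / 244] * 300
print(early_stop_epoch(losses))  # 343
```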
As training progressed, the loss curves decreased, the error rate fell, and the model performance improved. val/box_loss, val/obj_loss, and val/cls_loss were calculated as 0.0888, 0.0203, and 0.0296 at epoch 0, respectively, and as 0.0187, 0.0083, and 0.0005 at epoch 343 after training was completed. A gradual decrease in the loss curve was also recorded on the validation set. metrics/precision and metrics/recall were 0.0000 and 0.0000 at epoch 0 and reached 0.9990 and 1.0000, respectively, at epoch 343. Precision and recall increased as training progressed, indicating that the results are highly reliable.
After training was completed, the mean average precision at IoU 0.5 (mAP@0.5) was 0.9950, and mAP@[0.5:0.95] was 0.7861. Because early stopping intervened, recording seems to have stopped while the mAP curve was still rising. Thus, the data set used for model training should be enlarged through data augmentation. For all classes, precision was 1.00 at a confidence of 0.872, and recall was 0.00 at a confidence of 1.00. The F1 score reached 1.00 at a confidence of 0.862, and the precision/recall was calculated as 0.995 for all classes (Fig. 1, 2). For seed classification, the precision of DW, NG2, and NGT was 0.999, 0.997, and 1.000, respectively; all three classes show high values. mAP@[0.5:0.95] was recorded as 0.835, 0.739, and 0.785, respectively, with DW the highest and NG2 the lowest (Supplementary Fig. S1).
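The F1 score mentioned above is the harmonic mean of precision and recall. As a quick check against the reported final metrics (precision 0.9990, recall 1.0000 at epoch 343):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Final metrics reported at epoch 343: precision 0.9990, recall 1.0000
print(round(f1_score(0.9990, 1.0000), 4))  # 0.9995
```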
Comparing the predicted and true values in the confusion matrix to measure prediction performance, each class of DW, NG2, and NGT showed a strong true-positive (TP) signal on the diagonal of the matrix, i.e., the predicted class matched the input class. In the case of NGT, a small association with background false positives (FP) was predicted, but this background FP is regarded as negligible because the corresponding coefficient is insignificant. There was no confusion between the background and the soybean seeds, and none of the classes DW, NG2, and NGT showed confusion between the true and predicted values (Fig. 3, 4, Supplementary Fig. S2).
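A confusion matrix like the one discussed above is built by tallying (true, predicted) label pairs; the diagonal holds the true positives. The sample labels below are illustrative only, not the study's data.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes; entry
    [i, j] counts samples of true class i predicted as class j."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

# Illustrative labels: 0 = DW, 1 = NG2, 2 = NGT
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2]  # every prediction correct in this toy sample
m = confusion_matrix(y_true, y_pred, 3)
print(m.diagonal().sum() / m.sum())  # 1.0 -> perfect agreement
```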
In general, phenotypic traits of plants have been investigated by visual inspection (Bortnem
This work was supported by a 2-year Research Grant of Pusan National University.