Effectiveness of machine learning and deep learning models at county-level soybean yield forecasting

  • Nizom Farmonov, Department of Geoinformatics, Physical and Environmental Geography, University of Szeged, Szeged, Hungary
  • Khilola Amankulova, Department of Geoinformatics, Physical and Environmental Geography, University of Szeged, Szeged, Hungary
  • Shahid Nawaz Khan, Geospatial Sciences Center of Excellence, Department of Geography and Geospatial Sciences, South Dakota State University, Brookings, USA
  • Mokhigul Abdurakhimova, Department of State Cadastre, Tashkent Institute of Irrigation and Agricultural Mechanization Engineers, National Research University, Tashkent, Uzbekistan
  • József Szatmári, Department of Geoinformatics, Physical and Environmental Geography, University of Szeged, Szeged, Hungary
  • Tukhtaeva Khabiba, Department of Hydrology and Ecology, “TIIAME” NRU Bukhara Institute of Natural Resources Management, Bukhara, Uzbekistan
  • Radjabova Makhliyo, Department of Hydrology and Ecology, “TIIAME” NRU Bukhara Institute of Natural Resources Management, Bukhara, Uzbekistan
  • Meiliyeva Khodicha, Department of Land Resources, Cadastre and Geoinformatics, Karshi Institute of Irrigation and Agrotechnology, “TIIAME” National Research University, Karshi, Uzbekistan
  • László Mucsi, Department of Geoinformatics, Physical and Environmental Geography, University of Szeged, Szeged, Hungary
Keywords: agriculture, remote sensing, farmers, random forest, soybean, machine learning

Abstract

Crop yield forecasting is critical in modern agriculture to ensure food security, economic stability, and effective resource management. The main goal of this study was to combine historical multisource satellite and environmental datasets with a deep learning (DL) model for soybean yield forecasting in the United States’ Corn Belt. Moderate Resolution Imaging Spectroradiometer (MODIS) products were aggregated at the county level. The Cropland Data Layer (CDL) in Google Earth Engine (GEE) was used to mask the data so that only soybean pixels were selected. Several machine learning (ML) models were trained on 5 years of data from 2012 to 2016: random forest (RF), least absolute shrinkage and selection operator (LASSO) regression, extreme gradient boosting (XGBoost), and decision tree regression (DTR), as well as a DL-based one-dimensional convolutional neural network (1D-CNN). The best model was determined by comparing the models' performance at forecasting soybean yield in 2017–2021 at the county scale. Among the ML models, RF performed best, with the lowest RMSE of 0.342 t/ha, followed by XGBoost (0.373 t/ha), DTR (0.437 t/ha), and LASSO regression (0.452 t/ha). However, the 1D-CNN model showed the highest forecasting accuracy for the 2018 growing season, with an RMSE of 0.280 t/ha. The developed 1D-CNN model has great potential for crop yield forecasting because it effectively captures temporal dependencies and extracts meaningful input features from sequential data.
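
As a rough illustration of the evaluation protocol summarized in the abstract, the sketch below splits county-year records by year and scores a forecast with RMSE in t/ha. The record layout, the `split_by_year` and `rmse` helpers, the mean-yield baseline, and all yield values are hypothetical placeholders chosen for illustration; they are not data, code, or models from the study.

```python
import math

# Hypothetical county-year records: (county_id, year, observed_yield_t_ha).
# Values are made up for illustration only.
records = [
    ("A", 2012, 2.9), ("A", 2016, 3.4), ("A", 2018, 3.6),
    ("B", 2012, 2.7), ("B", 2016, 3.1), ("B", 2018, 3.3),
]

def split_by_year(rows, train_years, test_years):
    """Split records so that no test year leaks into training."""
    train = [r for r in rows if r[1] in train_years]
    test = [r for r in rows if r[1] in test_years]
    return train, test

def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (here t/ha)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Train on 2012-2016 and hold out a later season, as in the abstract.
train, test = split_by_year(records, range(2012, 2017), {2018})

# Stand-in "model": predict the mean training yield for every test county.
# Any of the regressors named in the abstract would replace this baseline.
baseline = sum(r[2] for r in train) / len(train)
observed = [r[2] for r in test]
predicted = [baseline] * len(test)
print(f"baseline RMSE: {rmse(observed, predicted):.3f} t/ha")
```

Splitting by year rather than at random keeps the held-out season unseen during training, which is what makes the reported errors forecasts rather than in-sample fits.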

Published
2024-01-12
How to Cite
Farmonov, N., Amankulova, K., Khan, S. N., Abdurakhimova, M., Szatmári, J., Khabiba, T., Makhliyo, R., Khodicha, M., & Mucsi, L. (2024). Effectiveness of machine learning and deep learning models at county-level soybean yield forecasting. Hungarian Geographical Bulletin, 72(4), 383-398. https://doi.org/10.15201/hungeobull.72.4.4
Section
Articles