Challenges and opportunities in crop simulation modelling under seasonal and projected climate change scenarios for crop production in South Africa

A broad range of crop models with varying input data demands is being used for several purposes, such as identifying adaptation strategies to counter climate change impacts on future crop production, supporting management decisions, and informing adaptation policies. A constant challenge to crop simulation, especially for projections of future crop performance and impact studies under varied conditions, is the unavailability of reliable historical data for model calibration. In some cases, the available input data are not of the quantity and quality needed to drive most crop models. Even when a suitable crop simulation model is selected, data limitations hamper its effective use for projections. To date, no review has examined the factors inhibiting the effective use of crop simulation models, or complementary sources of input data, in South Africa. This review examined the barriers to crop simulation and the sources from which input data for crop models can be obtained, and proposes a framework for collecting input data. Results showed that barriers to effective simulation exist because, in most instances, input data such as climate, soil, farm management practices, and cultivar characteristics were incomplete, poor in quality, and not easily accessible or usable. We advocate a hybrid approach to obtaining input data for model calibration and validation. Depending on the intended outputs and end use of model results, recommended methods include remote sensing, field and greenhouse experiments, secondary data, and engagement with farmers to model actual on-farm conditions. Employing more than one method of collecting input data can thus reduce the challenges crop modellers face because of data unavailability. The future of modelling depends on the quality and availability of input data, the readiness of modellers to cooperate on modularity and standardization, and the ability of potential user groups to communicate.


Open Access
Agriculture & Food Security
*Correspondence: kprissy@gmail.com
1 Department of Geography and Environmental Studies, University of Limpopo, Polokwane, South Africa
Full list of author information is available at the end of the article

Introduction
According to United Nations (UN) (2019) projections, the population of South Africa is expected to grow to about 68 million by 2035 and 75 million by 2050. The South African population increased between 2002 and 2017, and the estimated overall growth rate rose from around 1.17% for 2002-2003 to 1.61% for 2016-2017 [171]; it was 1.28% for 2019-2020 [180]. The implication of this growth is that food production in South Africa will have to increase to meet and sustain the demands of a rapidly growing population. The question that arises is how South Africa will meet the increasing demand for agricultural products, given that the country already faces challenges due to increased pressure on land use, water scarcity, and other natural resources. To complicate matters further, it has to do so while depending on the same natural resources [49].
Similarly, climate change has been cited as a key concern within South Africa because of the significant threat it poses to the country's water resources, food security, ecosystem services, and biodiversity [196]. Due to global emissions of greenhouse gases (IPCC, 2013), climate change affects areas worldwide, with places such as South Africa experiencing increasing temperatures and decreasing rainfall [24,111]. Average annual temperatures over South Africa have increased by at least 1.5 times the observed global average of 0.65 °C over the past five decades, and extreme rainfall events have increased in frequency [196], trends that are projected to continue. According to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR5), under Representative Concentration Pathway (RCP) 4.5, near-surface mean temperature in South Africa is projected to rise by 1-1.5 °C on the coast and by around 3 °C inland for the period 2081-2100 relative to the reference period 1986-2005. Under RCP 8.5, warming of 3-6 °C relative to 1986-2005 is projected by 2081-2100 in the interior of South Africa, with less certainty about both the direction and the magnitude of precipitation changes.
The Long-Term Adaptation Scenarios Flagship Research Programme (LTAS) (Department of Environmental Affairs (DEA) 2013) describes South Africa's future climate up to 2050 and beyond using four fundamental climate scenarios with different degrees of change and likelihood that capture the impacts of global mitigation over time. These scenarios are: (i) warmer (< 3 °C above 1961-2000) and wetter, with substantially greater frequency of extreme rainfall events; (ii) warmer (< 3 °C above 1961-2000) and drier, with an increase in the frequency of drought events and somewhat greater frequency of extreme rainfall events; (iii) hotter (> 3 °C above 1961-2000) and wetter, with substantially greater frequency of extreme rainfall events; and (iv) hotter (> 3 °C above 1961-2000) and drier, with a substantial increase in the frequency of drought events and greater frequency of extreme rainfall events. A higher frequency of flooding and drought extremes is projected in both wetter and drier futures, with the range of extremes worsening significantly under unconstrained emissions scenarios (DEA 2013). This will negatively affect food production and cropland productivity [24], seriously threatening South Africa's food production and the sustainability of its food production systems. Understanding and predicting crop production outcomes under various climate scenarios and farm management practices geared towards adaptation and sustainability is therefore essential. At the same time, the information needed for agricultural decision-making at all levels, from farm management to adaptation strategies and relief schemes, is increasing, and a method of supplying such information within relatively short time frames is needed.
Traditional agronomic research, such as field experiments, has long been used as a reliable source of information for establishing causal relationships between agricultural land management patterns and real-world, observed measurements [50,99]. However, such methods are becoming insufficient to keep pace with the increasing demand for data to guide policies and decision-making. Traditional agronomic experiments also often present results from trials conducted at a single point in time and place, producing season-specific and site-specific results [132]. In addition, these trials are usually labour intensive, time consuming, and expensive [34], which delays the acquisition of information. It is also hardly possible to run trials over several years and at multiple sites to derive season- and site-specific recommendations for a wide range of parameters [74].
Furthermore, these trials may not provide sufficient data in space and time to identify appropriate and effective management practices [81,84]. Because of these shortcomings, there is an urgent need for tools and methods through which new data and research findings can be obtained quickly and easily, and results made rapidly available to end-users in the various sectors that depend on them for decision-making. By fast-tracking the study of plant-environment processes, such tools make the intended results available sooner and reduce reliance on on-farm experiments alone for data. Hence the need for the development and efficient use of tools such as crop simulation models to project food crop cultivation under various scenarios and time scales. It must be emphasized that crop models cannot replace field trials. On the contrary, there is a growing awareness, especially in the crop modelling community, that good field trials, although scarce, are crucial to improving and testing crop models [156]. With this in mind, we look at how crop models can fill the gaps left by traditional field experiments.
Crop simulation modelling offers an opportunity to explore cultivar potential in new areas before establishing expensive and time-consuming field experiments [12]. Lengthy and costly agronomic and modelling field trials with a large number of treatments can be pre-evaluated by running simulated experiments in minutes on a desktop or laptop computer [172]. Crop models have been shown to be vital tools for decision-making and for assessing the impacts of climate change/variability and management practices on the productivity and environmental performance of alternative cropping systems, thereby promoting better and more sustainable agriculture [4,88,190]. They are a quicker and less expensive way of investigating the effects of agricultural land management practices on crop yields and the environment [34] and of identifying the optimum level of management for attaining economically efficient yields [191]. Crop simulation models can also be used as decision support systems to assess the risk and economic impacts of management strategies in agriculture. According to Zhao [195], the modelling approach can provide reasonably reliable results for developing agricultural land management strategies if the models are calibrated and validated using reliable observed field data. For example, crop models have been applied to refine management practices, such as fertilizer application and water use, at farm and plot scales [92], and to test the effectiveness of alternative agricultural land management practices under varying climate change scenarios [34]. Crop modelling is therefore critical to developing, implementing, and maintaining food security and related policy in South Africa.
Because each model contains modules built for specific crops, crop models can assimilate the understanding of crop physiology gathered from many years of laboratory and field experiments and provide an effective means of investigating crop responses to climate change and alternative management scenarios [7]; this makes them essential for projections.
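To make "the optimum level of management for attaining economically efficient yields" concrete, the sketch below assumes a quadratic yield response to nitrogen, Y = a + bN + cN² with c < 0, such as might be fitted to simulated or observed trial yields. Profit is maximized where the marginal yield dY/dN = b + 2cN equals the price ratio r (cost of one kilogram of N divided by the price of one kilogram of grain), giving N* = (r - b)/(2c). The quadratic form, the function names, the coefficients, and the prices are illustrative assumptions, not results from the studies cited here.

```python
def agronomic_optimum(b: float, c: float) -> float:
    """N rate (kg/ha) that maximizes yield for Y = a + b*N + c*N**2 (c < 0)."""
    if c >= 0:
        raise ValueError("a concave response (c < 0) is required")
    return -b / (2.0 * c)


def economic_optimum(b: float, c: float, price_ratio: float) -> float:
    """N rate (kg/ha) at which the marginal yield b + 2cN equals price_ratio.

    price_ratio = (cost per kg N) / (price per kg grain), i.e. the kg of
    grain needed to pay for one kg of N.
    """
    if c >= 0:
        raise ValueError("a concave response (c < 0) is required")
    n_star = (price_ratio - b) / (2.0 * c)
    # If even the first kg of N does not pay for itself, apply none.
    return max(n_star, 0.0)


# Hypothetical response fitted to simulated yields (kg grain/ha vs kg N/ha).
b, c = 30.0, -0.1
print(round(agronomic_optimum(b, c), 1))       # 150.0 kg N/ha
print(round(economic_optimum(b, c, 6.0), 1))   # 120.0 kg N/ha
```

The gap between the two rates is the point: the yield-maximizing rate is not the most profitable one once input costs are counted, which is the kind of trade-off a calibrated model can explore across seasons.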
Even though estimates of the effects of climate change on crops vary, the projected negative impacts are expected to affect the basic food basket (wheat, rice, maize, and grain legumes) as well as significant cash crops (e.g. sugarcane, coffee, and cocoa) at moderate or low (≤ +3 °C) levels of warming [141] if no adaptation actions are taken [29,135,136]. Evidence from regional and local studies and from global meta-analyses of modelling studies indicates that adaptation strategies are critical for countering adverse effects, and capitalizing on positive effects, that may arise from climate change [29,35]. Adaptation strategies are probably the only means by which food availability and stability can be maintained or increased to meet future food security needs. Recent model-based global estimates show that even incremental adaptation strategies could result in mean yield increases of ~7% at any level of warming [29,135,136]. This suggests that substantial opportunities may exist if more significant (i.e. systemic and transformational) changes in cropping systems are implemented [141], changes that can be explored through simulation with crop models.
In arid and semi-arid regions such as those in South Africa, where growth conditions are unfavourable and water is a limiting factor, often coupled with low soil fertility or poor agronomic practices, the use of crop growth models is still a challenge [198]. Gaiser et al. [62] make a similar point, noting that such challenges will be felt in tropical Africa and Latin America. This limitation can arise from inadequate input data. Data needed for crop model calibration and simulation include climatic data, such as precipitation, maximum and minimum temperature, solar radiation, and relative humidity. Soil data on physical and chemical properties (bulk density, cation exchange capacity, texture, and electrical conductivity), location data (site, altitude, weather station, latitude, and longitude), and crop management data (cultivar genetic coefficients, irrigation, fertilizer type and amounts, row spacing, planting date, planting depth, plant population, tillage operations and dates, weed control, and leaf area index (LAI)) are also needed. Moreover, crop experiments at experimental stations are rarely designed for crop model set-up and use, and the resulting data might not be in the format a model requires.
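A practical first step is to audit which of these inputs actually exist for a site before choosing a model. The sketch below encodes the categories listed above as a checklist and reports the gaps. The field names and function names are hypothetical and do not correspond to any particular model's file format (DSSAT, APSIM, and AquaCrop each use their own).

```python
# Hypothetical field names, grouped by the input categories listed above.
REQUIRED_INPUTS = {
    "climate": ["rainfall", "tmax", "tmin", "solar_radiation", "relative_humidity"],
    "soil": ["bulk_density", "cation_exchange_capacity", "texture", "electrical_conductivity"],
    "location": ["site", "altitude", "weather_station", "latitude", "longitude"],
    "management": [
        "cultivar_coefficients", "irrigation", "fertilizer", "row_spacing",
        "planting_date", "planting_depth", "plant_population", "tillage",
        "weed_control", "leaf_area_index",
    ],
}


def audit_inputs(available: dict) -> dict:
    """Return, per category, the required fields that are absent or None."""
    report = {}
    for category, fields in REQUIRED_INPUTS.items():
        supplied = available.get(category, {})
        report[category] = [f for f in fields if supplied.get(f) is None]
    return report


def completeness(report: dict) -> float:
    """Fraction of required fields that were supplied (0.0 to 1.0)."""
    total = sum(len(fields) for fields in REQUIRED_INPUTS.values())
    missing = sum(len(gaps) for gaps in report.values())
    return (total - missing) / total


# Example: weather and location data exist, there is no soil survey, and the
# trial logbook only gives a planting date.
site = {
    "climate": {k: "daily series" for k in REQUIRED_INPUTS["climate"][:4]},
    "location": {k: "recorded" for k in REQUIRED_INPUTS["location"]},
    "management": {"planting_date": "2019-11-20"},
}
report = audit_inputs(site)
print(report["climate"])              # ['relative_humidity']
print(round(completeness(report), 2))  # 0.42
```

A report like this makes the data gap explicit per category, which is the information needed to decide which complementary data source to pursue.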
Furthermore, Motha [122] stated that it is common to have sufficient data on aboveground biomass in cropping systems but inadequate data on soil characterization and root growth. Faulty instruments, incomplete logbook entries, and the absence of climate stations in some areas (e.g. [33]) can also affect data quality and availability. Consequently, input data such as climate records, soil physical and chemical properties and other soil characteristics, crops and cultivar types, and agronomic and management practices are often discontinuous in time, unavailable at the required scale, or not in a crop model format. These challenges only increase when a study is upscaled beyond the experimental field to the district, regional, or provincial scale. To overcome this challenge and yield meaningful results, Folberth et al. [60] suggested that, when large-scale crop growth models are applied in low-yield regions such as South Africa, crop growth parameters should first be calibrated to local conditions, or parameters from local field studies should be used. Models can therefore help land managers and policymakers identify management options for maximizing sustainability goals across space and time only if the needed soil, management, and climate data are available [81,84].
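Calibrating a parameter to local conditions can, in its simplest form, mean estimating a cultivar's thermal-time requirement to flowering from local observations. The sketch below assumes daily growing degree days computed as max(0, (Tmax + Tmin)/2 - Tbase) with a base temperature of 10 °C (a value often used for maize, taken here as an assumption), averages the thermal time accumulated up to observed flowering across seasons, and uses the result to predict flowering in another season. Real crop models use more elaborate temperature response functions, and the function names here are hypothetical.

```python
from statistics import mean


def daily_gdd(tmax, tmin, tbase=10.0):
    """Growing degree days for one day (simple averaging method)."""
    return max(0.0, (tmax + tmin) / 2.0 - tbase)


def thermal_time_to_event(temps, event_day, tbase=10.0):
    """Sum GDD from planting (index 0) up to and including event_day."""
    return sum(daily_gdd(tx, tn, tbase) for tx, tn in temps[: event_day + 1])


def calibrate_requirement(seasons, tbase=10.0):
    """Mean thermal time to the event over seasons of (temps, observed_day)."""
    return mean(thermal_time_to_event(t, d, tbase) for t, d in seasons)


def predict_event_day(temps, requirement, tbase=10.0):
    """First day index on which accumulated GDD reaches the requirement."""
    total = 0.0
    for day, (tx, tn) in enumerate(temps):
        total += daily_gdd(tx, tn, tbase)
        if total >= requirement:
            return day
    return None  # requirement not reached within the weather record


# Two hypothetical local seasons: daily (Tmax, Tmin) and observed flowering day.
season_a = ([(30.0, 20.0)] * 120, 59)
season_b = ([(28.0, 18.0)] * 120, 69)
requirement = calibrate_requirement([season_a, season_b])
print(requirement)                                            # 905.0
print(predict_event_day([(30.0, 20.0)] * 120, requirement))   # 60
```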
The very nature of both climate change and agrarian systems is complex [58]. Insufficient input data in a given area therefore pose a significant challenge to the accuracy and reliability of model outputs. Models such as DSSAT and APSIM, which are very robust tools for crop production projections, need complete climatic and phenological data to be effective.
In South Africa, however, trials tend to focus on attainable yields, with less attention paid to other data that are vital to these models. Hence, there is a shortage of information, especially on planting dates, dates of emergence, flowering, and maturity, and on biomass and grain yields. Most of the available data report the percentage of plants that have reached a milestone in crop phenology rather than the date on which it was reached. Such data cannot be fed directly into a model because they do not meet the required input format, which forces the modeller to make assumptions that can be biased. There is a need to integrate methods on which such assumptions can be based, with minimal room for error.
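One way to ground such an assumption in the data themselves is to interpolate between field visits. If a trial records the percentage of plants that have reached a stage on successive visit dates, the date on which a threshold was crossed can be estimated by linear interpolation between the two visits that bracket it. The sketch below assumes a 50% threshold (a common convention, used here as an assumption); the function name is hypothetical, and linear interpolation is only one possible choice.

```python
from datetime import date, timedelta


def date_at_threshold(observations, threshold=50.0):
    """Estimate the date on which `threshold` % of plants reached a stage.

    observations: iterable of (visit_date, percent_of_plants_at_stage).
    Returns None when no pair of consecutive visits brackets the threshold,
    e.g. when the stage was already past the threshold at the first visit.
    """
    obs = sorted(observations)
    for (d0, p0), (d1, p1) in zip(obs, obs[1:]):
        if p0 < threshold <= p1:
            fraction = (threshold - p0) / (p1 - p0)
            offset = round(fraction * (d1 - d0).days)
            return d0 + timedelta(days=offset)
    return None


visits = [
    (date(2020, 1, 10), 20.0),
    (date(2020, 1, 15), 40.0),
    (date(2020, 1, 23), 80.0),
]
print(date_at_threshold(visits))  # 2020-01-17 (estimated 50% flowering)
```

Returning None rather than the first visit date when the threshold was already passed avoids silently biasing the estimate; the modeller then knows the date is only bounded, not estimated.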
Using crop simulation models to assess agronomic practices and yield changes under varying climates and management regimes is of particular importance to farmers, especially those in the dryland systems (summer rainfall areas) of South Africa. Farmers in the summer rainfall areas have been cited as vulnerable to climate change impacts because they usually operate under suboptimal conditions (e.g. [111]). More reliable estimates of season-to-season variation and of future climate change are therefore essential in areas, such as the summer rainfall regions, that depend on rain-fed agriculture.
Carrying out simulations effortlessly on different agricultural and food systems would require all the input data sets needed to conduct studies that evaluate outcomes and trade-offs among alternative farm management practices, technologies, policies, or climate scenarios. This ideal scenario does not exist in South Africa. But where are we currently compared with this ideal situation? There is little information on the barriers to crop simulation modelling in South Africa arising from data limitations, or on alternative sources of input data for models. Understanding the barriers to using crop models in South Africa vis-à-vis the availability of input data could provide the basis for an integrated strategy that enhances the use of crop models for agricultural production adaptation strategies. Furthermore, as stated by Jones et al. [81,84], sufficient data on the biophysical, environmental, and socio-economic conditions of each farm, or for a range of farm typologies, are generally not available in these regions. Although some data, such as climate and soil data, are available, they are generally neither organized nor sufficiently site specific for agricultural systems models to access them readily for the analysis of specific farms [81,84]. Although research has shown that some of the analyses needed to advise a farmer can be made, the availability of input data for agricultural systems models remains a major limitation [81,84].
A potential pitfall in using crop models is that users may not familiarize themselves with a model's intended use and limitations before using it and may be unaware of the uncertainty associated with the results they incorporate into decision-making processes [188]. While studies have explored the strengths and weaknesses of these models ([41,145,165]; Willcock et al. 2016), the number of studies seeking to recommend, explore, and validate sources of input data remains limited (e.g. [108,160]). Such studies are vital in providing user communities with the information required to choose the most appropriate tools for their particular situation, use them correctly, and understand the associated uncertainties (Willcock et al. 2016). They can also provide valuable information on potential data sources for parameterizing models and focus data acquisition by revealing which parameters have the most influence on model accuracy [144]. Hoffmann et al. [74] and Mathobo et al. [115], for example, used crop models in South Africa but did not highlight the challenges of running crop simulation models (CSMs) when input data are unavailable, or the opportunities for using other sources of input data. Chisanga et al. [33], on their part, highlighted challenges due to data unavailable from experiments and, regarding unavailable climate data, cited Motha's [122] recommendations on obtaining climate data from weather generators.
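Weather generators commonly model daily rainfall in two parts: whether a day is wet, via a first-order Markov chain conditioned on whether the previous day was wet, and how much falls on a wet day, often drawn from a gamma distribution (the structure used by Richardson's WGEN). The sketch below fits the two transition probabilities from an observed series and generates a synthetic series. The 0.1 mm wet-day threshold, the gamma parameters, the sample data, and the function names are illustrative assumptions; a practical generator would fit parameters by month and would also generate temperature and radiation.

```python
import random


def fit_transition_probs(rain_mm, wet_threshold=0.1):
    """Return (P(wet | previous day dry), P(wet | previous day wet))."""
    wet = [r >= wet_threshold for r in rain_mm]
    after_dry = [today for yesterday, today in zip(wet, wet[1:]) if not yesterday]
    after_wet = [today for yesterday, today in zip(wet, wet[1:]) if yesterday]
    p_wd = sum(after_dry) / len(after_dry) if after_dry else 0.0
    p_ww = sum(after_wet) / len(after_wet) if after_wet else 0.0
    return p_wd, p_ww


def generate_rainfall(n_days, p_wd, p_ww, shape, scale, seed=None):
    """Synthetic daily rainfall (mm): Markov-chain occurrence, gamma amounts."""
    rng = random.Random(seed)
    wet = False  # assume the series starts after a dry day
    series = []
    for _ in range(n_days):
        wet = rng.random() < (p_ww if wet else p_wd)
        # gammavariate(alpha, beta): shape alpha, scale beta, mean alpha * beta.
        series.append(rng.gammavariate(shape, scale) if wet else 0.0)
    return series


observed = [0.0, 0.0, 5.2, 3.1, 0.0, 0.0, 0.0, 2.4]
p_wd, p_ww = fit_transition_probs(observed)
print(p_wd, p_ww)  # 0.4 0.5
synthetic = generate_rainfall(90, p_wd, p_ww, shape=0.8, scale=10.0, seed=1)
```

Seeding the generator makes a synthetic season reproducible, which matters when the same weather must drive several management scenarios being compared.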
The purpose of this paper is to address the question of where South Africa stands in terms of data availability for crop modelling by identifying the challenges and opportunities involved in using crop simulation models in South Africa, given their input data requirements. We aim to provide a robust approach to obtaining input data for crop models in the face of limited data, so that various phenomena can be simulated effortlessly and the potential implications of climate change for South African crops can be estimated. Furthermore, the study aims for data acquired through the proposed methodology to meet the standard for minimum input data for models. Input data obtained should therefore be at the appropriate spatial scale, relevant to the cropping system being explored, agronomically suitable, and apt for the proper calibration of the intended crop models. As a critical first step towards an integrated approach to obtaining input data for crop models, we describe various types of crop simulation models and the risks of their use ("Components of crop simulation models" section), the minimum data requirements for running crop models ("Model calibration - minimum data requirements versus a full set of data" section), and the challenges involved in running crop models in South Africa ("Challenges in using crop simulation models in South Africa" section). By combining our empirical knowledge of data acquisition sources with the identified gaps in model applications, we propose a conceptual model of possible sources of input data for crop models ("Towards an integrated method in obtaining suitable input data for a crop simulation model" section) and show how they can be incorporated. We conclude with a recommendation for a forward-looking assessment of how various data sources can be better used to improve crop simulation. The key methods identified in our review are presented in Fig. 1.

Literature review process
This research follows a critical review process. According to Grant and Booth [66], a critical review aims to demonstrate that the writer has extensively researched the literature and critically evaluated its quality. It goes beyond a mere description of identified articles and includes a degree of analysis and conceptual innovation typically manifest in a hypothesis or a model. The resultant model may constitute an interpretation of the existing data. A critical review provides an opportunity to 'take stock' and evaluate what is of value from the previous body of work. As such, it may provide a 'launch pad' for a new phase of conceptual development and subsequent 'testing' [66]. However, critical reviews do not typically demonstrate the systematicity of other more structured approaches to the literature. While there is considerable value in identifying all the available literature on a topic under review, there is no formal requirement to present the search, synthesis, and analysis methods explicitly [66]. The emphasis is on the conceptual contribution of each item of included literature, not on formal quality assessment. Such a review does serve to aggregate the literature on a topic and the resulting product is the starting point for further evaluation [66].
The review was carried out in four phases: design, conduct, data abstraction and analysis, and structuring and writing of the review (Fig. 1). The process started with a search of various databases using keywords and phrases. The criteria were that articles should be written in English, be published by January 2012, and report input data sources, challenges in simulation, validation, and simulation results from crop model simulations, especially in South Africa. The aim of the review guided these inclusion criteria. The results of the searches were sorted by relevance after scanning article abstracts. Abstracted data took the form of descriptive information, such as authors, year of publication, topic, or type of study, or of effects and findings. Validity and reliability were ensured through the checklist given in Table 1. In total, 108 studies contained material considered suitable for this review.

Fig. 1 Steps in the review process

Components of crop simulation models
Crop models are essentially collections of mathematical equations that represent the various processes occurring within the plant and the interactions between the plant and its environment [161], and they have become an indispensable tool for estimating future impacts of climate change on crop yield [59]. Crop simulation models use quantitative descriptions (as model input data) of ecophysiological processes to predict plant growth and development in relation to factors influencing production, such as environmental conditions and crop management practices [73]. Crop models are generally designed around four key components (Fig. 2).
To run a crop model and conduct a simulation, a set of input data, sometimes referred to as a 'Minimum Data Set', is required [78]. These data are needed for model evaluation and model application, and sometimes for model development and improvement. They include site-specific weather data for the duration of the growing season (preferably for the complete year); soil surface characteristics and soil profile data; and crop management information from the experiment conducted for model calibration. Observational data are also needed, including at a minimum two key phenological phases (i.e. flowering or anthesis, and physiological or harvest maturity), yield, and yield components [77,78]. This is similar to the proposal of Craufurd et al. [38] that data should cover plant development, carbon capture, water capture, and nitrogen and phosphorus capture. Plant development, or phenology, matters because it determines the timing and duration of key developmental events, notably flowering (anthesis), and thus provides the framework within which the capture and use of carbon, water, and nutrients occur. The minimum required weather data are the latitude and longitude of the weather station; daily incoming solar radiation (MJ/m²/day); maximum and minimum daily air temperature (°C); and daily total rainfall (mm). Dry and wet bulb temperatures and wind speed, which allow evapotranspiration to be simulated with more robust methods, can also be added [77,78]. Weather records for evaluation must, at minimum, cover the duration of the experiment and should preferably begin a few weeks before planting and continue for a few weeks after harvest so that 'what-if' analyses can be performed [77,78]. Soil data include upper and lower horizon depths (cm), percentage sand, silt, and clay, bulk density, organic carbon, pH in water, aluminum saturation, and root abundance information [77,78].
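The minimum weather requirements above lend themselves to an automatic check before a record is used. The sketch below verifies that the station has a valid latitude and longitude, that every day from a buffer before planting to a buffer after harvest is present with the four minimum daily variables, and that values are physically plausible. The dictionary keys, the function name, and the 21-day buffer (standing in for "a few weeks") are assumptions for illustration, not any model's required format.

```python
from datetime import date, timedelta

# Minimum daily variables named in the text (hypothetical keys):
# solar radiation (MJ/m2/day), max/min air temperature (degC), rainfall (mm).
REQUIRED_DAILY = ("srad", "tmax", "tmin", "rain")


def check_weather_record(station, records, planting, harvest, buffer_days=21):
    """Return a list of problems; an empty list means the record passes.

    station: dict with "latitude" and "longitude" in decimal degrees.
    records: dict mapping datetime.date -> dict of daily values.
    buffer_days: assumed length of "a few weeks" before planting/after harvest.
    """
    problems = []
    lat, lon = station.get("latitude"), station.get("longitude")
    if lat is None or lon is None or not (-90 <= lat <= 90 and -180 <= lon <= 180):
        problems.append("station: missing or invalid latitude/longitude")

    day = planting - timedelta(days=buffer_days)
    end = harvest + timedelta(days=buffer_days)
    while day <= end:
        values = records.get(day)
        if values is None:
            problems.append(f"{day}: missing day")
        else:
            absent = [v for v in REQUIRED_DAILY if values.get(v) is None]
            if absent:
                problems.append(f"{day}: missing {', '.join(absent)}")
            elif values["tmax"] < values["tmin"]:
                problems.append(f"{day}: tmax below tmin")
            elif values["rain"] < 0 or values["srad"] < 0:
                problems.append(f"{day}: negative rainfall or radiation")
        day += timedelta(days=1)
    return problems


# Hypothetical season with one logbook gap.
planting, harvest = date(2020, 11, 15), date(2021, 4, 1)
records = {}
day = planting - timedelta(days=21)
while day <= harvest + timedelta(days=21):
    records[day] = {"srad": 22.0, "tmax": 31.0, "tmin": 17.0, "rain": 0.0}
    day += timedelta(days=1)
records[date(2021, 1, 5)]["tmin"] = None
station = {"latitude": -23.9, "longitude": 29.4}
print(check_weather_record(station, records, planting, harvest))
# ['2021-01-05: missing tmin']
```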
Management data include planting dates and the dates when soil conditions were measured before planting, planting density, row spacing, planting depth, crop variety, irrigation, and fertilizer practices. These data are needed for both model evaluation and strategy analysis [77,78]. According to Hoogenboom et al. [77,78], in addition to site, soil, and weather data, experimental data should include observed data, such as crop growth data and soil water and fertility measurements, because these are needed for model evaluation.

Phase 1: Design
In relationship to the overall research field of crop simulation and data requirement, is this literature review needed and does it make a substantial, practical, or theoretical contribution?
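The observed crop growth, phenology, yield, soil water, and fertility data needed for model evaluation are typically compared with simulated values using goodness-of-fit statistics. The sketch below computes three that are widely used in crop model evaluation: root mean square error (RMSE), RMSE normalized by the observed mean (nRMSE, %), and Willmott's index of agreement, d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)². Which statistics and acceptance thresholds to use is the modeller's choice and is not prescribed by the sources cited here; the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean


def rmse(simulated, observed):
    """Root mean square error, in the units of the variable."""
    return sqrt(mean((p - o) ** 2 for p, o in zip(simulated, observed)))


def nrmse(simulated, observed):
    """RMSE as a percentage of the observed mean."""
    return 100.0 * rmse(simulated, observed) / mean(observed)


def willmott_d(simulated, observed):
    """Willmott's index of agreement (1 = perfect agreement, 0 = none)."""
    o_bar = mean(observed)
    num = sum((p - o) ** 2 for p, o in zip(simulated, observed))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
              for p, o in zip(simulated, observed))
    if den == 0:
        return 1.0  # every value equals the observed mean: perfect agreement
    return 1.0 - num / den


# Hypothetical grain yields (t/ha): simulated vs observed in two seasons.
sim, obs = [2.0, 4.0], [1.0, 3.0]
print(rmse(sim, obs), nrmse(sim, obs), round(willmott_d(sim, obs), 2))
# 1.0 50.0 0.8
```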
It is worth noting that this principle of a minimum input data set holds across a wide variety of models and crops. These crop- and site-specific details should be calibrated [47] so that model simulation runs are set to local conditions. Even though some parameters are considered conservative and seldom need to be adjusted during calibration, others should be calibrated against useful field data [184]. The models also include empirically derived parameters that distinguish the performance of different varieties within each crop module, and these should be set for the specific crop. Model approaches used in simulation include empirical models (e.g. [64,72]), regional suitability models (e.g. [109,194]), biophysical models (e.g. [34]), meta-models (e.g. [9,147,176]), and decision models [192]. Empirical crop models use empirical time series and/or panel data sets of spatial and temporal variation in yield and climate variables to estimate climate-yield relationships [75]. According to Lobell et al. [100,101], the empirical approach is mostly applied in agricultural climate impact assessments. It is advantageous because yield response functions can be fitted to the available data even if these data are scarce or available only in aggregated form, such as monthly climate data. It can also be applied analytically to identify the region-specific main climatic drivers of yield and yield changes [75]. Predictions of validated models are considered valid within the range of data used to fit the empirical models. However, their ability to predict correctly beyond observed conditions may be limited, because causal relationships hypothesized from observed data may not represent process relationships beyond the observations [75].
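A minimal empirical model of the kind described above is an ordinary least-squares fit of yield against a single seasonal climate variable. The sketch below fits yield to growing-season rainfall and, reflecting the caveat about predictions beyond observed conditions, refuses to predict outside the rainfall range used for fitting. The data and class name are hypothetical; published empirical models typically use several climate variables and panel-data techniques.

```python
from statistics import mean


class EmpiricalYieldModel:
    """Yield = intercept + slope * seasonal rainfall, fitted by least squares."""

    def __init__(self, rainfall_mm, yield_t_ha):
        mx, my = mean(rainfall_mm), mean(yield_t_ha)
        sxx = sum((x - mx) ** 2 for x in rainfall_mm)
        sxy = sum((x - mx) * (y - my) for x, y in zip(rainfall_mm, yield_t_ha))
        if sxx == 0:
            raise ValueError("rainfall must vary across seasons to fit a slope")
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        # Remember the fitted range: the relationship is only trusted inside it.
        self.min_x, self.max_x = min(rainfall_mm), max(rainfall_mm)

    def predict(self, rainfall_mm):
        if not self.min_x <= rainfall_mm <= self.max_x:
            raise ValueError("outside the fitted rainfall range; extrapolation refused")
        return self.intercept + self.slope * rainfall_mm


# Hypothetical district series: seasonal rainfall (mm) and maize yield (t/ha).
model = EmpiricalYieldModel([300, 400, 500, 600], [2.0, 3.0, 4.0, 5.0])
print(round(model.predict(450), 2))  # 3.5
```

Raising an error rather than extrapolating is a deliberately conservative design choice: under projected climates outside the historical range, a process-based model is the more defensible tool.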
Regional suitability models, on the other hand, are usually applied to quantify biophysical land use potential under current and future climatic conditions at a regional scale [22,134]. This approach quantifies the land potential for particular crops and highlights regions of increasing or decreasing suitability for distinct land use types, as well as shifts in cultivation zones. It integrates information collected from different sources to give a broad basis for multi-criteria decision-making for meta-models. This model is beneficial because its reduced run time makes it feasible to apply in explorative analyses that evaluate many alternative scenarios, and because of its integrative capacity [75]. Decision models, also termed bio-economic models, are usually based on coupling a process-based biophysical model or an empirical production model with an economic farm optimization model [75]. Biophysical models simulate biophysical processes, such as plant growth, nutrient and carbon dynamics, water cycling, and flood inundation, based on a mechanistic process understanding that is mathematically formalized [75]. Given that these models integrate various biophysical processes, they provide
Several models, both static and dynamic, are still being developed and used to simulate agricultural processes. These models operate at scales ranging from the individual plant through the crop to the field [52]. They range from models that focus mostly on representing crop growth processes with different degrees of complexity, to models that simulate water limitations to potential crop yields, crop responses to the dynamics of soil water, nutrients, and soil carbon, and the integrated effects of climate and management (e.g. [65,142]).
For example, models such as the Agricultural Production Systems Simulator (APSIM) [76,91]; the Cropping Systems Simulation Model (CropSyst) [173]; the Decision Support System for Agrotechnology Transfer (DSSAT) [83], whose crop models include CROPGRO for major grain legumes, CERES for cereal crops, and SUBSTOR for crops with belowground storage organs; CROPWAT/AquaCrop [61]; the Simulateur multidisciplinaire pour les Cultures Standard (STICS) [18,21]; and the Environmental Policy Integrated Climate (EPIC) model [189] simulate biophysical processes at the plot level. In contrast, models such as LPJmL, SWAT/SWIM, and MIKE are spatially distributed models, usually applied at a regional or even global scale.
Some of the widely used models include APSIM, CropSyst, EPIC, STICS, the System Approach to Land Use Sustainability (SALUS) [10,51], AquaCrop, the Crop Estimation through Resource and Environment Synthesis (CERES) models [152] (see [89]), and the Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) model, which was developed to enable decision-makers to assess trade-offs among ecosystem services and to estimate changes in biodiversity under different demographic, land use, and climate scenarios [164]. These models have been used in various studies to help farmers worldwide make management decisions, such as sowing time, plant population density, and irrigation regime (timing, frequency), under many conditions. In South Africa, the models applied widely in agricultural management include the Agricultural Catchments Research Unit (ACRU) model [95], DSSAT, the Soil Water Balance (SWB) model [5], EPIC, APSIM, and AquaCrop. However, some of these are hydrological models that have been adapted for agricultural water management and lack robust crop growth and fertilizer management components [34].
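The regional suitability approach described earlier in this section can be illustrated with a simple scoring scheme, assumed here for illustration: each climate variable receives a score that is 0 outside absolute limits, 1 within an optimal range, and linear in between, and overall suitability is the lowest (most limiting) score. The function names and thresholds below are hypothetical placeholders, not calibrated crop requirements.

```python
def trapezoid(x, abs_min, opt_min, opt_max, abs_max):
    """Score 0 outside absolute limits, 1 in the optimal range, linear between."""
    if x <= abs_min or x >= abs_max:
        return 0.0
    if opt_min <= x <= opt_max:
        return 1.0
    if x < opt_min:
        return (x - abs_min) / (opt_min - abs_min)
    return (abs_max - x) / (abs_max - opt_max)


def suitability(climate, requirements):
    """Overall score = most limiting factor (minimum of the factor scores)."""
    return min(trapezoid(climate[name], *limits) for name, limits in requirements.items())


# Hypothetical requirements: (absolute min, optimal min, optimal max, absolute max).
REQUIREMENTS = {
    "mean_temp_c": (10.0, 18.0, 32.0, 40.0),
    "season_rain_mm": (300.0, 500.0, 1200.0, 1800.0),
}

baseline = {"mean_temp_c": 25.0, "season_rain_mm": 600.0}
warmer_drier = {"mean_temp_c": 36.0, "season_rain_mm": 400.0}
print(suitability(baseline, REQUIREMENTS))      # 1.0
print(suitability(warmer_drier, REQUIREMENTS))  # 0.5
```

Applying the same scores to gridded baseline and projected climates is what produces the maps of increasing or decreasing suitability and shifting cultivation zones that such models report.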
Most studies in South Africa have used the AquaCrop model, possibly because it is particularly suited to conditions where water is the key limiting factor in crop production, as in most areas of South Africa. AquaCrop is a water productivity model that simulates biomass production from the amount of water transpired by the green canopy cover. Canopy cover development, which drives biomass production, is based on thermal time. Water stress reduces transpiration, and the crop water productivity parameter, a measure of water use efficiency, converts transpired water into biomass. Like most crop models, AquaCrop is also suited to analysing climate change impacts on crop productivity, water requirements, and water consumption. It allows crop responses to be assessed under different climate change scenarios in terms of altered water and temperature regimes and elevated atmospheric CO2 concentration. AquaCrop has been used in many studies (e.g. [1,6,14,67,90,119,121,124,131,133,143,175,193]) to assess the yield response to water stress of crops such as sugar beet (Beta vulgaris), wheat (Triticum spp.), barley (Hordeum vulgare), potato (Solanum tuberosum), maize (Zea mays), sunflower (Helianthus annuus), oats (Avena sativa), cabbage (Brassica oleracea), sorghum (Sorghum bicolor), saffron (Crocus sativus), and tomato (Solanum lycopersicum). These studies showed that AquaCrop could simulate canopy cover for maize, cabbage, and potato, but not under water-stressed conditions. In South Africa, studies such as Bello and Walker [17], Nyathi et al. [128], Hadebe et al. [68], Walker et al. [184], Mabhaudhi et al. [104,105], Chibarabada et al. [31], and Mbangiwa et al. [114] calibrated and validated AquaCrop for vegetables and field crops such as amaranthus (Amaranthus cruentus L. ex Arusha), pearl millet (Pennisetum glaucum), soya beans (Glycine max), spider flower, Swiss chard, sorghum, taro (Colocasia esculenta (L.) Schott.), Bambara groundnut (Vigna subterranea (L.) Verdc.) landraces, and groundnut (Arachis hypogaea L.) (Additional file 1: Appendix 1). The model was calibrated successfully and simulated crop responses to water well, although shortcomings arose where it underestimated canopy cover in some of the crops.
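The core AquaCrop relationship normalizes daily transpiration by reference evapotranspiration, B = WP* × Σ(Tr/ETo), and derives yield as Y = HI × B. The Python sketch below illustrates only that relationship, under simplifying assumptions: it omits AquaCrop's CO2 and soil fertility adjustments, crop-specific adjustments of water productivity during yield formation, and stress-dependent adjustment of the harvest index. The WP* value and daily series are hypothetical, not calibrated parameters.

```python
def aquacrop_biomass(wp_star: float, transpiration_mm: list[float], eto_mm: list[float]) -> float:
    """Cumulative biomass (g/m2) following B = WP* x sum(Tr_i / ETo_i).

    wp_star: normalized water productivity (g/m2), assumed already adjusted for CO2.
    transpiration_mm, eto_mm: daily crop transpiration and reference ET (mm/day).
    """
    if len(transpiration_mm) != len(eto_mm):
        raise ValueError("daily series must have the same length")
    return wp_star * sum(tr / eto for tr, eto in zip(transpiration_mm, eto_mm) if eto > 0)


def yield_t_per_ha(biomass_g_m2: float, harvest_index: float) -> float:
    """Yield = HI x B, converted from g/m2 to t/ha (1 g/m2 = 0.01 t/ha)."""
    return harvest_index * biomass_g_m2 * 0.01


WP_STAR = 30.0                  # illustrative value only
eto = [4.0, 5.0, 5.0]
well_watered = [2.0, 3.0, 4.0]
stressed = [1.0, 1.5, 2.0]      # transpiration halved by water stress

b_ww = aquacrop_biomass(WP_STAR, well_watered, eto)   # about 30 x (0.5 + 0.6 + 0.8)
b_st = aquacrop_biomass(WP_STAR, stressed, eto)
print(b_ww, b_st, yield_t_per_ha(b_ww, 0.5))
```

Because biomass is proportional to normalized transpiration, halving transpiration in this toy example halves biomass, which is the mechanism through which water stress reduces simulated yield.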
Choruma et al. [34] calibrated the EPIC model for maize yield simulation under South African conditions using limited field-scale data (maize grain yield only), and validated it using independent maize yield data from another site.
Models such as DSSAT and APSIM are based on ecological principles and simulate crop development and growth as a function of weather conditions, soil properties, and management practices (through simulated water and nutrient limitations to plant growth). APSIM is structured around plant, soil, and management modules, covering a diverse range of crops, pastures, and trees; soil processes including the water balance, N and P transformations, soil pH, and erosion; and a full range of management controls. APSIM arose from the need for tools that accurately predict crop production in relation to climate, genotype, soil, and management factors while addressing long-term resource management issues. In APSIM, high-order processes, such as crop production and the soil water balance, are represented as modules that communicate with each other only through a central control unit [161]. APSIM has been used worldwide to develop interventions aimed at improving farming systems under a wide range of management systems and conditions [185]. It has been used extensively in Africa, for example, in Zimbabwe, to assess the impacts of maize-mucuna rotations on maize production and on soil water and nutrient dynamics [113], and the effects of climate change on maize production systems [159]. APSIM has been extensively tested in Australia, including south-western Australia, which is often highlighted as having significant similarities with southern Africa; this may account for its use in several modelling studies, such as ongoing modelling efforts in southern Africa [16,112]. Masikati et al. [113] calibrated the model and used it with confidence to conduct an ex ante analysis of alternative management strategies aimed at improving system productivity.
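The idea that modules relate to each other only through a central control unit can be illustrated with a minimal, hypothetical Python sketch. It is not APSIM code and does not reproduce APSIM's engine or API; the class names, variable names, and values are invented. Each module reads and publishes shared variables through the engine, so a soil or crop module could be replaced without editing the other.

```python
class ControlUnit:
    """Hypothetical central engine: modules exchange data only through it."""

    def __init__(self) -> None:
        self.modules: list = []
        self.state: dict[str, float] = {}

    def register(self, module) -> None:
        self.modules.append(module)

    def publish(self, name: str, value: float) -> None:
        self.state[name] = value

    def get(self, name: str, default: float = 0.0) -> float:
        return self.state.get(name, default)

    def run(self, days: int) -> None:
        for day in range(days):
            for module in self.modules:     # fixed call order each day
                module.on_day(day, self)


class SoilWaterModule:
    def __init__(self, initial_mm: float, rain_mm: list[float]) -> None:
        self.water = initial_mm
        self.rain = rain_mm

    def on_day(self, day: int, engine: ControlUnit) -> None:
        # Reads the crop's previous-day uptake from the engine, never from the crop object.
        uptake = engine.get("crop_uptake_mm")
        self.water = max(0.0, self.water + self.rain[day] - uptake)
        engine.publish("soil_water_mm", self.water)


class CropModule:
    def __init__(self, demand_mm: float, wue_g_per_mm: float) -> None:
        self.demand = demand_mm
        self.wue = wue_g_per_mm
        self.biomass = 0.0

    def on_day(self, day: int, engine: ControlUnit) -> None:
        uptake = min(self.demand, engine.get("soil_water_mm"))
        self.biomass += uptake * self.wue
        engine.publish("crop_uptake_mm", uptake)
        engine.publish("biomass_g_m2", self.biomass)


engine = ControlUnit()
engine.register(SoilWaterModule(initial_mm=10.0, rain_mm=[0.0, 0.0, 5.0]))
engine.register(CropModule(demand_mm=4.0, wue_g_per_mm=2.0))
engine.run(days=3)
print(engine.get("biomass_g_m2"), engine.get("soil_water_mm"))
```

In this simplified design the soil module subtracts the crop's uptake one day late; a real engine has to define such ordering rules explicitly, which is part of what makes moving modules between frameworks difficult.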
The DSSAT Cropping System Model (CSM) incorporates genetic, physiological, phenological, and management-based functions for growth, development, and yield. It uses the growing degree-day concept to capture temperature effects and can simulate increased plant growth due to CO2 fertilization. The model runs on a daily time step, which allows it to respond to weather extremes. Climate is represented by daily rainfall, minimum and maximum temperature, and solar radiation, which are used to calculate potential (reference) evapotranspiration and the CO2 feedback on transpiration. These are the essential input variables expected to change under future climate. Studies in South Africa that have used this model include those of Jones et al. [85], Estes et al. [57], Schulze and Durand [161], and Schulze et al. [162].
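The growing degree-day concept on a daily time step can be sketched as follows. This is a minimal illustration under stated assumptions (simple averaging method, a single base temperature, no upper cutoff, hypothetical temperatures and thermal-time target); crop models, including those in DSSAT, typically use more detailed temperature-response functions, so this is not DSSAT code.

```python
def daily_gdd(tmin: float, tmax: float, t_base: float) -> float:
    """Growing degree days for one day (simple averaging method, no upper cutoff)."""
    return max(0.0, (tmin + tmax) / 2.0 - t_base)


def day_threshold_reached(tmin_series: list[float], tmax_series: list[float],
                          t_base: float, target_gdd: float):
    """Return (day index, cumulative GDD) when the target thermal time is first reached, else None."""
    total = 0.0
    for day, (tmin, tmax) in enumerate(zip(tmin_series, tmax_series)):
        total += daily_gdd(tmin, tmax, t_base)
        if total >= target_gdd:
            return day, total
    return None


tmin = [12.0, 14.0, 8.0, 15.0]
tmax = [28.0, 30.0, 20.0, 31.0]
# Base temperature of 10 degC and a 30 degree-day target are illustrative values only.
print(day_threshold_reached(tmin, tmax, t_base=10.0, target_gdd=30.0))
```

Adding 2 °C to every day of this toy series brings the threshold forward from day index 3 to day index 2, a simple example of how warmer scenarios shorten simulated phenological phases.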
Both APSIM and DSSAT have been used to assess and analyse the agronomic performance of various systems and to compare simulated yields of crops grown under different tillage practices and management at specific sites across diverse edaphic and climatic conditions in South Africa (e.g. [37,126]).
Most models do not simulate the impacts of pests and diseases unless they are coupled with externally supplied time-series damage data or with pest models, as in DSSAT CSM. Some models, such as APSIM, can simulate intercropping [76]. Unfortunately, modules from one crop model are generally not compatible with other models. For example, APSIM's intercropping capabilities are deeply embedded in its system architecture and cannot simply be moved to other models, such as DSSAT CSM. Moving the pest and disease damage modules from DSSAT CSM to APSIM is possible but requires coding module 'wrappers' to handle inter-model communication, a non-trivial task [81,84]. This lack of interoperability hinders some model users.
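The kind of translation work a module wrapper performs can be illustrated with a toy Python example. All class names are hypothetical and do not correspond to DSSAT or APSIM code: the "donor" pest module expects leaf area index (m² leaf per m² ground), while the "host" model stores leaf area per plant in cm². Real wrappers must also reconcile time steps, call order, and many more state variables.

```python
class PestDamageModule:
    """Hypothetical donor module: removes a fixed fraction of LAI per day."""

    def __init__(self, daily_loss_fraction: float) -> None:
        self.daily_loss_fraction = daily_loss_fraction

    def apply(self, lai: float) -> float:
        return lai * (1.0 - self.daily_loss_fraction)


class HostCrop:
    """Hypothetical host model storing leaf area per plant (cm2) rather than LAI."""

    def __init__(self, leaf_area_cm2_per_plant: float, plants_per_m2: float) -> None:
        self.leaf_area_cm2_per_plant = leaf_area_cm2_per_plant
        self.plants_per_m2 = plants_per_m2


class PestDamageWrapper:
    """Translates host state into the donor module's inputs and writes the result back."""

    CM2_PER_M2 = 10_000.0

    def __init__(self, module: PestDamageModule, host: HostCrop) -> None:
        self.module = module
        self.host = host

    def step(self) -> None:
        # Host units -> donor units: cm2 leaf per plant x plants per m2 -> m2 leaf per m2 ground.
        lai = self.host.leaf_area_cm2_per_plant * self.host.plants_per_m2 / self.CM2_PER_M2
        damaged_lai = self.module.apply(lai)
        # Donor units -> host units.
        self.host.leaf_area_cm2_per_plant = damaged_lai * self.CM2_PER_M2 / self.host.plants_per_m2


host = HostCrop(leaf_area_cm2_per_plant=500.0, plants_per_m2=8.0)
PestDamageWrapper(PestDamageModule(daily_loss_fraction=0.1), host).step()
print(host.leaf_area_cm2_per_plant)
```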
As shown above, crop models have been used and tested over time for different crops and growing conditions [197], and have been used to simulate crop responses to future climate change in South Africa (e.g. [97,107,196] and supplementary material). These models have been subjected to varying degrees of evaluation using agronomic trial data [7,11], and many individual model components (e.g. water balance, photosynthesis response) have often been assessed independently, increasing confidence in the models' capability to simulate crop responses under varying environmental conditions, including climate change [141]. However, Knox et al. [94], Thornton et al. [177], and Zinyengere et al. [199] noted that most studies apply crop models with confidence on the basis of generalized validation procedures carried out where the models had been applied successfully, which may not be in similar environments. During calibration, default parameters are adjusted to reflect local crop cultivars and site conditions. The calibration process uses measured data sets to simulate the various plant growth processes and conditions, including the dates of successive phenological events, such as emergence, anthesis, and maturity. Aspects such as maximum leaf area index (LAI), the LAI pattern, above-ground canopy weight and its development pattern, and, where such data are available, partitioning to leaf, stem, and panicles, as well as grain yield and its components, are also examined. The model's output variables are validated by comparing the simulated values with field experimental data. Statistical measures such as the coefficient of determination (R²), the slope and intercept of the linear regression, the correlation coefficient (r) between observed and simulated values, absolute error, relative error, root mean square error (RMSE), and relative RMSE are used to evaluate model performance.
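The agreement statistics listed above can be computed with a short, dependency-free Python function. This is a sketch, not the procedure of any cited study: it regresses simulated on observed values and takes R² as r squared, both conventions that vary between studies, and the yield values are hypothetical.

```python
import math


def evaluate(observed: list[float], simulated: list[float]) -> dict[str, float]:
    """Common agreement statistics between observed and simulated values.

    Regression is of simulated on observed; R2 is taken as r squared.
    Observed values are assumed non-zero (needed for relative errors).
    """
    n = len(observed)
    if n != len(simulated) or n < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((s - mean_s) ** 2 for s in simulated)
    sxy = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    errors = [s - o for o, s in zip(observed, simulated)]
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    return {
        "slope": slope,
        "intercept": mean_s - slope * mean_o,
        "r": r,
        "r2": r ** 2,
        "mae": sum(abs(e) for e in errors) / n,
        "mean_relative_error_pct": 100.0 * sum(e / o for e, o in zip(errors, observed)) / n,
        "rmse": rmse,
        "rrmse_pct": 100.0 * rmse / mean_o,
    }


# Hypothetical grain yields (t/ha) for three seasons.
stats = evaluate(observed=[2.0, 4.0, 6.0], simulated=[2.5, 3.5, 6.5])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Reporting several of these measures together is more informative than a correlation alone: here the slope is close to 1, yet the relative RMSE of 12.5% still quantifies the disagreement.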
To quantify the effects of calibration on simulated crop yield and to understand the uncertainties associated with calibration procedures, differences between model outputs from simulations at different steps are analysed using the mean, the coefficient of variation (CV; the ratio of the standard deviation to the mean, expressed as a percentage), and the estimated linear trend in simulated yields [190]. Although validation procedures were followed in most of the studies cited above, they tended to be highly generalized, usually a correlation between observed and modelled yields with little analysis of the level of agreement between the two, which may depend on the evaluation criteria chosen for the model used. We argue that the validity of crop simulation models under climate change conditions has not been adequately established. Moreover, point-based validation at experimental sites, as cited by Chipanshi et al. [32], is treated as accurate even though such generalized validation does not account for spatial variation in the modelled climatic, soil, and management conditions [62]. The input data determine both how easily these simulations can be run and the quality of their results.
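Comparing simulation steps by mean, CV, and linear trend can be sketched as below. The yields are hypothetical, the CV uses the sample standard deviation (an assumption, since conventions differ), and the trend is an ordinary least-squares slope against season index.

```python
import statistics


def cv_percent(values: list[float]) -> float:
    """Coefficient of variation: sample standard deviation / mean, in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def linear_trend(values: list[float]) -> float:
    """Least-squares slope of values against their index (units per season)."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


def compare_steps(runs: dict[str, list[float]]) -> dict[str, tuple[float, float, float]]:
    """Mean, CV (%) and trend for each simulation step (e.g. default vs calibrated)."""
    return {step: (statistics.mean(y), cv_percent(y), linear_trend(y)) for step, y in runs.items()}


# Hypothetical simulated yields (t/ha) over four seasons at two calibration steps.
runs = {
    "default_parameters": [3.0, 3.2, 2.8, 3.4],
    "calibrated_parameters": [4.0, 4.3, 3.7, 4.4],
}
for step, (mean, cv, trend) in compare_steps(runs).items():
    print(f"{step}: mean={mean:.2f} t/ha, CV={cv:.1f}%, trend={trend:+.3f} t/ha per season")
```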
Models such as DSSAT and APSIM have been used successfully in many studies in developed countries. However, their application in South Africa and other African countries has been limited, primarily because their complexity and input requirements often make them difficult for researchers in developing countries to run (e.g. [33,37,81,84,104]). Furthermore, crop yield estimates have been shown to vary substantially in accuracy and robustness [130,157], partly because many crop growth models (CGMs) require extensive input data that hamper their applicability outside research conditions. For example, the World Food Studies (WOFOST) model [182] requires about 40 parameters to characterize the crop under evaluation [93,179]. To strike an optimal balance between simplicity, accuracy, and robustness, the Food and Agriculture Organization (FAO) developed the AquaCrop model [172], which uses relatively few parameters and input variables compared with earlier crop simulation models (CSMs) such as APSIM and DSSAT. AquaCrop's simplicity and lower input requirements relative to other crop models [183], together with its ease of calibration, make it an attractive choice for simulation studies.
We share the view of Oteng-Darko et al. [129] that an ultimate crop model would physically and physiologically define all relationships between the variables it represents and would reproduce real-world behaviour universally. However, such a model cannot be developed, because the biological system is too complex and many of the processes involved are not fully understood [80]. Furthermore, even if an ideal crop model could be produced, collecting highly precise system parameters and input data for the crop environment would be a formidable task [129]. The level of detail in a crop model is therefore intricately linked to its end use and the precision required. Even when a model is chosen judiciously, its limitations and the challenges of simulation and data acquisition must be borne in mind so that modelling studies are put in proper perspective and applied successfully.

Model calibration: minimum data requirements versus a full set of data
Most crop growth models require a substantial amount of input data, which limits their usefulness for research [89] and other decision-making. Monteith and Moss [120] held that every crop growth model requires, at a minimum, information on crop management, soil, weather, temperature, phasic development, and growing degree days. Ideally, a model would have at least information on soil composition, weather, and management practices, but more often than not these data are inaccessible [89] or unavailable. Hunt and Boote [79] proposed a list of the minimum input data they consider necessary for operating crop growth models at a given location (Table 2). These requirements, though reasonable, may prove challenging for slope and aspect, which are not easily calculated and are usually not recorded. A thoroughly calibrated and evaluated model can be used to conduct research effectively, ultimately saving time and money and contributing significantly to sustainable agriculture that meets the world's food needs [129]. Hadebe et al. [68] concurred, noting in their study on sorghum that sorghum genotypes differ significantly in growth and development characteristics from those in the model's default file. Calibration with minimal input data therefore potentially compromises the prediction of crop yield, and the same applies to other crops whose growth parameters differ from the defaults in model modules.

Challenges in using crop simulation models in South Africa
Models provide opportunities for realistic simulations of a given environment under a range of management practices, and for testing the interactions between crops and the biophysical environment. The mechanics of simulation may appear straightforward: input data are entered into the model, the model is run, and the results are compared with those of other simulations for validation. In practice, however, the modelling process involves numerous issues of data availability and quality, and of scaling from global climate change data to the plot scale at which these models typically operate. The unsatisfactory performance of models, especially at the regional scale, has been attributed to inappropriate consideration of the factors and processes determining yield variability (e.g. [89,104,105,129,146]) and/or to the aggregation of input data, which may not consistently reproduce spatial variability.

Lack of available data in crop model input format
As noted by Lüke and Hack [102], model performance in South Africa is limited by the quality of input data.
Most simulation models require reliable and complete farm management, meteorological, and crop phenological data. Unfortunately, this information is not always available and, where available, is often incomplete. In South Africa, for example, although the weather station network has expanded rapidly in the past decade, significant gaps remain, especially for solar radiation and rainfall [167]. The challenges of using crop models in the region are illustrated by a few studies. For example, Zinyengere et al. [198] assessed the use of crop models, centred on DSSAT, in the drylands of southern Africa. The study focused on three southern African countries (Lesotho, Swaziland, and Malawi) and on maize, sorghum, and groundnut. Model inputs included crop management practices, such as planting dates, planting densities, and fertilizer application amounts and timing, as well as data on crop growth and phenology, all obtained from experimental trials reported in the literature. Point-scale validation was based on one or two sites per district where trials had been carried out for only 2 or 3 years, whereas district-scale validation covered six to fourteen cropping seasons. The simulations were run with limited input data on the spatial variation in climate, soils, and management practices. Nevertheless, their results showed that, despite the limits posed by insufficient input data, the model's usefulness for capturing crop yields at the study locations was satisfactorily demonstrated. The question that therefore arises is to what extent a 'satisfactory' model can meet the need to develop and implement policies that address the risk and vulnerability of agricultural systems in marginal production areas, and adaptation strategies to curb the effects of climate change on food-producing systems.
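Before running a model, gaps in station records of the kind described above can be screened systematically. The Python sketch below is a hypothetical illustration: the required variables loosely follow the SRAD/TMAX/TMIN/RAIN columns of DSSAT-style weather files, and the records are invented. It reports the percentage of days with data for each variable and lists the missing dates, treating a wholly absent day as missing for every variable.

```python
from datetime import date, timedelta
from typing import Optional

# Daily variables assumed necessary (loosely following DSSAT-style weather files).
REQUIRED = ("srad", "tmax", "tmin", "rain")


def completeness(records: dict[date, dict[str, Optional[float]]], start: date, end: date):
    """Percent of days with a value for each required variable, plus the missing dates."""
    n_days = (end - start).days + 1
    missing: dict[str, list[date]] = {v: [] for v in REQUIRED}
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        record = records.get(day, {})          # a wholly absent day counts as missing
        for var in REQUIRED:
            if record.get(var) is None:
                missing[var].append(day)
    percent = {v: 100.0 * (n_days - len(missing[v])) / n_days for v in REQUIRED}
    return percent, missing


records = {
    date(2020, 1, 1): {"srad": 25.1, "tmax": 31.0, "tmin": 17.2, "rain": 0.0},
    date(2020, 1, 2): {"srad": None, "tmax": 29.4, "tmin": 16.8, "rain": 12.5},
    # 3 January 2020 is absent from this hypothetical station record.
}
percent, missing = completeness(records, date(2020, 1, 1), date(2020, 1, 3))
print(percent)
print(missing["srad"])
```

How the flagged gaps are then filled (for example, from a nearby station, a gridded product, or a weather generator) is a separate modelling decision that should be documented.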
The study by Zinyengere et al. [198] showed that even with limited input data, DSSAT produced good estimates of mean crop yields in all study locations, both for location-specific experimental trials and at district scale. The results further indicated that the model could give good results in impact studies focusing on crops' long-term responses to climate. However, although the model captured mean yields and long-term average impacts well under a wide range of conditions despite data limitations, this might not hold where yields are obtained under extreme climatic conditions for specific crops and locations. They therefore concluded that, in this instance, DSSAT did not adequately capture yield variation due to extreme climate conditions, particularly because the required inputs were insufficient, so model parameterization and calibration had to be based on average yields. This is consistent with Raes et al. [140], who stated that running AquaCrop at time scales other than daily yields less reliable results.
A similar study by Gaiser et al. [62], validating the EPIC model under western African conditions, also found limited data to be a problem for crop modelling. They tested EPIC with a mix of secondary input data obtained from an experimental station, an on-farm research field, and farmland in tropical humid to semi-humid zones. Although the results captured the sensitivity of rice to seasonal rainfall, the model's robustness under severe water stress was limited. Furthermore, with multiple-year calibration for variables such as plant biomass, leaf area index, and yield, the uncertainty in model prediction and validation was related mostly to poor input data quality (affecting the estimation of the impact of drought spells on grain yield) [62]. Although the models employed in these studies project the response of crops to a changing climate to an extent, their results draw on the normal range of climate variability. Of concern is the frequency of extreme climatic events projected for South Africa (e.g. [56,196]), which these models cannot adequately capture because of data limitations. Crop simulation models therefore require detailed soil input parameters (soil depth, chemical composition, and physical characteristics) associated with processes that limit crop growth under water scarcity or enhance yields under wet conditions, such as water retention characteristics, organic matter, and nitrogen accumulation, together with climate data such as temperature, rainfall, and potential evapotranspiration. According to Zinyengere et al. [198], more comprehensive details of agronomic and management practices, especially fertilizer application and sowing times, which considerably affect crop yields under extreme climatic conditions, would help improve model performance.
The general absence of a spatial component in crop growth models is also considered a serious shortcoming [36], especially for yield estimation at regional scales in areas such as South Africa. Determining model inputs for the required spatial and temporal dimensions is burdensome, since the necessary assumption of spatial homogeneity often leads to errors in the estimated outputs [98]. There is considerable uncertainty about the spatial distribution of farm management practices and of soil and weather conditions [69].
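Why assuming spatial homogeneity can bias outputs is easiest to see with a model that responds non-linearly to an input. The Python sketch below uses a deliberately simple, hypothetical piecewise-linear yield response to seasonal rainfall (all parameters invented) and compares running the model once on averaged input with averaging per-field model runs.

```python
def yield_response(rain_mm: float) -> float:
    """Hypothetical piecewise-linear yield response (t/ha) to seasonal rainfall."""
    threshold_mm = 250.0      # assumed rainfall below which no grain is produced
    slope_t_per_mm = 0.01     # assumed yield gain per mm above the threshold
    max_yield = 3.0           # assumed ceiling set by other limiting factors
    return min(max_yield, max(0.0, slope_t_per_mm * (rain_mm - threshold_mm)))


field_rain = [300.0, 700.0]   # two contrasting fields in one hypothetical grid cell

# Option 1: average the input first, then run the model once (spatial homogeneity).
yield_of_mean_input = yield_response(sum(field_rain) / len(field_rain))
# Option 2: run the model per field, then average the outputs.
mean_of_field_yields = sum(yield_response(r) for r in field_rain) / len(field_rain)

print(f"model run on mean rainfall: {yield_of_mean_input:.2f} t/ha")
print(f"mean of per-field runs:     {mean_of_field_yields:.2f} t/ha")
```

With these hypothetical numbers the first approach gives 2.5 t/ha and the second 1.75 t/ha: because the response is capped at the wetter field, averaging the inputs first overestimates the cell's mean yield.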

Poor access to necessary data for model calibration and validation
According to the Organisation for Economic Co-operation and Development (OECD) and other international organizations, data have become key infrastructure for 21st-century knowledge societies and economies. Data are a capital good that can, and need to, be used across countries and societies for a theoretically unlimited range of purposes, so broad access to data is crucial. However, this is not the case in all disciplines and countries, especially for research conducted in middle- and low-income countries, where a culture of data sharing is only beginning to gain traction [3]. According to Alter and Vardigan [3], most authors acknowledge the potential for exploitation of local populations and other forms of harm to research participants, including loss of privacy, as well as issues around informed consent, including the rights of research subjects and potential benefits to the local community. Other barriers cited include the time and effort needed to make data ready for sharing and the lack of validation and recognition for researchers and research teams for that effort.
As in other research disciplines, a significant hindrance for crop modellers is the lack of unrestricted access to historical data. Modellers often have to obtain historical climate data and crop management data from private companies and individuals, who, as the owners of these data, are not obliged to share them with scientists. Whatever the motive, data withholding has negative consequences. For example, a study by Campbell and Blumenthal [25] found that 28% of those surveyed were unable to replicate research because another scientist refused to share data, 24% experienced significant delays in publishing, and 21% abandoned a research interest altogether.
In South Africa, there is a notable lack of legislation requiring public access to privately owned data, a lack of sustainable funding mechanisms for the long-term collection and curation of important classes of data, and technical difficulties in managing and sharing data. A study by Koopman and De Jager [96] at the University of Cape Town (UCT) showed that although past research had generated digital data in many different formats, these data were reused and shared only within controlled groups of collaborating researchers; very few researchers were willing to allow free use of data sets under their control. Data ownership was therefore a significant limiting factor for data sharing. Ownership may rest with the funder, the institution, the research unit, the supervisor, the student, or a combination of these, which further complicates data sharing because it is not entirely clear who has rights to the data, let alone who may share them. Koopman and De Jager [96] reported that 'when asked if their data should be made available for future research, 88% of researchers responded positively'. However, respondents attached cascading conditions to this response; for example, some were willing to share their data only after publication and only if the data generator was offered co-authorship [96]. It was further noted that the data that respondents were willing to share were data that already had an open mandate.
They also found analogue data sets that were nominally 'available' but mostly invisible and unavailable because of logistical issues, such as a lack of description and archiving. If an institution such as UCT faces such challenges in making data available to other researchers, similar problems are likely at other institutions. A question worth contemplating is whether maximizing rapid data sharing could help solve, or significantly mitigate, global challenges such as natural hazards, food protection, and climate change. Until data ownership is resolved, data sharing will remain a barrier to data availability for research in South Africa and elsewhere.

Difficulties in using data from climate change models and scenarios
Modelled climate data are available in South Africa from the University of Cape Town's Climate System Analysis Group (CSAG) and the Council for Scientific and Industrial Research (CSIR). However, as Ziervogel et al. [196] note, despite this relative abundance of locally developed climate scenarios and the existence of climate expertise, only a limited number of climate change impact studies have used both statistically and dynamically downscaled data. Furthermore, impact studies draw variously on CSAG and CSIR scenarios, GCM outputs from the international Coupled Model Intercomparison Project (CMIP) archives, and RCM downscaled products from global centres. As a consequence of this 'pick and mix' approach to climate scenarios, it is difficult to compare and synthesize the results of different impact studies [196].
As noted in [27], climate model outputs are not primarily maps and lack geographic features in the way we are accustomed to reading them. Instead, they provide information whose spatial-scale applicability depends on the climate itself and is usually larger than the grid cell [70]. Crop modelling studies therefore either use the grid on which the input climate simulations were generated or downscale those data to a more relevant spatial scale. A wide range of downscaling methods exists, each with its own strengths and weaknesses. Downscaled climate model grids are one source of climate data for modellers, but they may not be the best source when field-scale models are used. Some crop models have spatial-scale issues and require climate data at a time step for which regional-scale information is difficult to obtain. Likewise, most downscaled climate data do not account for microclimate. In South Africa, point-scale climate data are missing for some locations, which can make studies in those areas difficult.
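One of the simplest ways to bring climate model projections down to a station is the 'delta change' (change-factor) method, named here as an illustrative technique rather than the method used by CSAG, CSIR, or the cited studies. The Python sketch below shifts observed temperatures by the model's projected change and scales observed rainfall by its projected ratio; all values are hypothetical. Because it preserves the observed day-to-day sequence, this method does not represent changes in variability, extremes, or microclimate.

```python
def delta_change_temperature(observed: list[float], gcm_baseline_mean: float,
                             gcm_future_mean: float) -> list[float]:
    """Shift observed station temperatures by the GCM's projected change (additive)."""
    delta = gcm_future_mean - gcm_baseline_mean
    return [t + delta for t in observed]


def delta_change_rainfall(observed: list[float], gcm_baseline_mean: float,
                          gcm_future_mean: float) -> list[float]:
    """Scale observed station rainfall by the GCM's projected relative change (multiplicative)."""
    if gcm_baseline_mean <= 0:
        raise ValueError("baseline mean rainfall must be positive for a ratio change")
    ratio = gcm_future_mean / gcm_baseline_mean
    return [r * ratio for r in observed]


# Hypothetical daily station record and grid-cell monthly means (degC, mm/day).
station_tmax = [25.0, 27.0, 30.0]
station_rain = [0.0, 10.0, 5.0]
future_tmax = delta_change_temperature(station_tmax, gcm_baseline_mean=26.0, gcm_future_mean=28.0)
future_rain = delta_change_rainfall(station_rain, gcm_baseline_mean=3.0, gcm_future_mean=2.7)
print(future_tmax, future_rain)
```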
Climate data are sometimes obtained in formats that need further processing and conversion before they can be used in models. Formats such as NetCDF pose a problem for modellers who cannot write code to read them, because handling them requires programming expertise. Data may therefore be available yet of no use to the intended user because of a lack of expertise.
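Much of the processing mentioned above is selecting a grid cell and converting units. The Python sketch below assumes the arrays have already been read from the NetCDF file (for example with the netCDF4 or xarray packages, not shown) into nested lists indexed [time][lat][lon], and that the variables follow CMIP-style conventions: 'tas' in kelvin and 'pr' as a flux in kg m⁻² s⁻¹. These conventions should be checked against each file's metadata; the grid and values are hypothetical.

```python
SECONDS_PER_DAY = 86_400.0
KELVIN_OFFSET = 273.15


def nearest_index(coords: list[float], target: float) -> int:
    """Index of the grid coordinate closest to the target coordinate."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


def site_series(lats, lons, tas_k, pr_flux, site_lat, site_lon):
    """Daily (tmean in degC, rain in mm/day) for the grid cell nearest a site.

    tas_k and pr_flux are nested lists indexed [time][lat][lon]. Longitudes are
    assumed to use the same convention (e.g. -180..180 or 0..360) as site_lon.
    """
    i = nearest_index(lats, site_lat)
    j = nearest_index(lons, site_lon)
    tmean_c = [day[i][j] - KELVIN_OFFSET for day in tas_k]        # K -> degC
    rain_mm = [day[i][j] * SECONDS_PER_DAY for day in pr_flux]    # kg m-2 s-1 -> mm/day
    return tmean_c, rain_mm


lats = [-34.0, -33.5, -33.0]
lons = [18.0, 18.5, 19.0]
tas_k = [[[295.15] * 3 for _ in lats], [[297.15] * 3 for _ in lats]]   # two days
pr_flux = [[[3.0e-5] * 3 for _ in lats], [[0.0] * 3 for _ in lats]]
tmean_c, rain_mm = site_series(lats, lons, tas_k, pr_flux, site_lat=-33.9, site_lon=18.4)
print(tmean_c, rain_mm)
```

The precipitation conversion works because 1 kg of water spread over 1 m² is a 1 mm depth, so a flux in kg m⁻² s⁻¹ multiplied by the seconds in a day gives mm per day.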

Complexities in methodologies used for crop simulation
The various choices modellers make when calibrating, running, and evaluating models introduce some limitations. Justification of these modelling choices is often missing from crop-climate studies [186], making it difficult to compare studies directly. According to Challinor et al. [28], an overly complex model needs more parameters than can be constrained by observations, increasing the risk of reproducing observations without correctly representing the processes involved. This is particularly true where studies are conducted in areas with limited resources. Additionally, some parameters are not directly observable and must be inferred during calibration. This increases the risk of over-tuning, where the right answer is obtained for the wrong reason because of an excess of tunable parameters that cannot be related directly to observations [28]. Judicious model choice and calibration are therefore crucial, as is evaluation of historical performance (Easterling et al. 1996), if simulations are to be consistently correct. A precise, critical assessment of methodologies and model projections can support the identification of consensus views.

Towards an integrated method for obtaining suitable input data for crop simulation models
As discussed above, several challenges are involved in using crop growth models. Bhatia [19] cites further limitations: the cost (time, money, and resources) of obtaining the input data needed to run the model, insufficient or missing spatial information in some cases, and the quality of the model input data once obtained. It is therefore essential to formulate the steps needed to collect input data for crop simulation models. Notably, the ease with which such data can be obtained will depend on the user's expertise and familiarity with the proposed data collection methods. Data collection methods involve qualitative and quantitative approaches, using both primary and secondary data sources, and range from grey literature review to field trials, controlled environment (greenhouse) experiments, and remote sensing (Fig. 3).

Literature review as a source of input data
Tingem et al. [178] held that crop model calibration can be performed using 'loose' parameterization, reasoning that if a crop model performs satisfactorily with limited parameterization, its performance could be even better where adequate data are available. Donatelli et al. [44] assessed the impact of climate change scenarios on agriculture across the 27 Member States of the European Union (EU27), focusing on the periods 2020 and 2030 against a baseline centred on the year 2000. Yields of wheat, rapeseed, and sunflower were simulated with the CropSyst model under potential and water-limited conditions. According to them, model calibration can be done by adjusting a parameter within a reasonable range of variation suggested by experiments, expert opinion, or background knowledge. Following this reasoning, Donatelli et al. [46] calibrated a few crop input parameters based on simulated growth characteristics, minimizing the differences between actual yields (as reported in the literature for crops grown under well-managed conditions) and simulated yields. Other crop-specific input parameters required by the model, such as thermal accumulation (degree days), emergence, days to onset of flowering, days to start of grain filling, days to physiological maturity, base temperature (Tb), cutoff temperature (Tcutoff), phenological sensitivity to water stress, photoperiod, and growth rate, were extracted from the literature (e.g. [15,45,179]).
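Adjusting a parameter within a literature-guided range so as to minimize differences between simulated and reported yields can be sketched as a bounded grid search. The yield model below is a deliberately trivial, hypothetical stand-in (radiation use efficiency × intercepted radiation × harvest index), not CropSyst, and the parameter range and site data are invented for illustration.

```python
def simulated_yield(rue_g_per_mj: float, intercepted_mj_m2: float, harvest_index: float = 0.5) -> float:
    """Toy yield model (g/m2): radiation use efficiency x intercepted radiation x HI."""
    return rue_g_per_mj * intercepted_mj_m2 * harvest_index


def calibrate_within_range(sites: list[tuple[float, float]], lower: float, upper: float,
                           step: float = 0.01) -> float:
    """Grid-search one parameter inside a literature-derived range [lower, upper],
    minimizing squared differences between simulated and reported yields.

    `sites` holds (intercepted radiation, reported yield) pairs.
    """
    n_steps = round((upper - lower) / step)
    candidates = [lower + k * step for k in range(n_steps + 1)]

    def sse(rue: float) -> float:
        return sum((simulated_yield(rue, rad) - reported) ** 2 for rad, reported in sites)

    return min(candidates, key=sse)


# Hypothetical sites: (intercepted radiation MJ/m2, reported well-managed yield g/m2).
sites = [(800.0, 560.0), (1000.0, 700.0)]
best = calibrate_within_range(sites, lower=1.0, upper=2.0)
print(f"calibrated RUE: {best:.2f} g/MJ")
```

With these data the search returns an RUE of about 1.40 g/MJ. If the reported yields implied a value outside the range, the search would stop at the nearest bound, which is the purpose of constraining calibration with literature values.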
Similarly, Zinyengere et al. [198] obtained crop and management input data (e.g. planting dates, planting densities, and fertilizer application amounts and timing) for maize, sorghum, and groundnut from reported experimental trials and relevant literature. Their DSSAT parameterization was based on typical values from the literature and default values from the model user manual, consistent with the approach of Donatelli et al. [44].
Soil information can likewise be obtained from the literature. Authors such as Ritchie et al. [151], Gijsman et al. [63], Batjes [13], and Romero et al. [154] have proposed different approaches to obtaining adequate soil data for crop yield simulations. In South Africa, the University of KwaZulu-Natal (UKZN) holds a spatial database of soils and model-ready daily weather data for 1950-1999 for 5000 quinary catchments, which is a reliable source of inputs for some models.

Remote sensing as a source of input data
Data from satellite remote sensing (RS) offer significant benefits for assessing agricultural yield and production during cropping seasons because of their spatial, temporal, and spectral resolutions, availability, and affordability [8]. With high temporal and spatial resolution, near-real-time crop production estimates can be obtained over large areas with sufficient lead time. Because these data are low-cost, RS can address the issue of cost and provide a cheaper alternative for natural and agricultural resource surveys. Furthermore, RS has been shown to reduce, to some degree, the uncertainty in spatial information on the crop parameters used in crop modelling [89]. Timely and correct information on crop phenological stages is critical for crop simulation models, and crop modellers will benefit from up-to-date, accurate information on crop status at the (sub-)plot or farm scale. This is particularly so because RS data are available at time scales ranging from hourly to daily or weekly [127], which is invaluable when using crop models. RS's ability to generate information in both spatial and temporal domains makes it crucial for successful analysis, prediction, validation [110], and projection. So far, RS applications and dynamic simulation models have played significant but different (and mostly separate) roles in generating such information [82]. Combining the two has been explored in several studies [36,139,181], but these approaches were aimed at using quantitative RS estimates of biomass, leaf area index, and canopy nitrogen to reconstruct crop growth curves for calibrating dynamic simulation models at the field scale.
Another, more direct technique for integrating RS observations into crop growth simulation models has been shown by Boegh et al. [20] and Jongschaap [87]. Earlier, Wiegand et al. [187] and Richardson et al. [150] proposed the use of RS as a means of improving crop model accuracy. The authors suggested that spectrally derived Leaf Area Index (LAI) data can be used either as a direct input into a physiological crop model or as an independent check for model validation. Studies have shown that RS can provide important information on agronomic environments at various scales: leaves, plants, sub-fields, fields, regions, and even the globe. Efforts have been made to derive useful information from RS images to support various data needs and activities. These include data on leaf and plant biochemical composition, plant (health) status, crop (health) status, and regional and global estimates of vegetation cover (including arable crops), used to improve farm management and to support local-, regional-, or higher-scale policymakers.
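To illustrate how spectrally derived LAI might be obtained, the sketch below inverts a semi-empirical exponential NDVI-LAI relationship of a form often used in the RS literature. The three parameter values (bare-soil NDVI, dense-canopy NDVI, and the coefficient `k`) are hypothetical placeholders; they would have to be fitted against field LAI measurements for each crop and sensor before the estimates could serve as model inputs or validation data.

```python
import math

# Hypothetical parameter values for illustration only; fit them locally.
NDVI_SOIL = 0.15  # NDVI of bare soil (LAI = 0)
NDVI_INF = 0.90   # asymptotic NDVI of a dense canopy
K_COEF = 0.6      # rate at which NDVI approaches NDVI_INF as LAI grows


def lai_from_ndvi(ndvi, ndvi_soil=NDVI_SOIL, ndvi_inf=NDVI_INF, k=K_COEF, lai_cap=8.0):
    """Invert NDVI = ndvi_inf - (ndvi_inf - ndvi_soil) * exp(-k * LAI) for LAI."""
    if ndvi <= ndvi_soil:
        return 0.0
    if ndvi >= ndvi_inf:
        return lai_cap  # the curve saturates, so LAI is unbounded here; cap it
    ratio = (ndvi_inf - ndvi) / (ndvi_inf - ndvi_soil)
    return min(lai_cap, -math.log(ratio) / k)


# Usage: LAI estimates for a few NDVI values (e.g. successive image dates)
print([round(lai_from_ndvi(v), 2) for v in (0.10, 0.45, 0.70, 0.88)])
```

Because NDVI saturates over dense canopies, small NDVI errors near the upper asymptote translate into large LAI errors, which is why the estimate is capped.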
The combination of RS and crop simulation can be synergistic in several ways. The main interest here is how this combination can enhance model calibration and provide the data needed for the effective use of a crop simulation model. In this context, Maas [103] initially proposed different approaches for combining a crop model with RS observations (radiometric or satellite data). These approaches were later revised by Delecolle et al. [40] and Moulin et al. [123], whose classification comprises five methods of integrating RS data into crop models: (a) the direct use in the model of a driving variable estimated from RS data; (b) the updating of a state variable of the model, such as RS-derived LAI (the 'forcing' strategy); (c) the re-initialization of the model, which involves adjusting an initial condition to obtain a simulation in agreement with the RS-derived observations; (d) the re-calibration of the model, which adjusts model parameters to obtain a simulation in agreement with the RS-derived observations (also called the 're-parameterization' strategy); and (e) the corrective method, which establishes a relationship between the error in some intermediate variable estimated from RS measurements and the error in the final yield. This relationship may apply to a case in which the final yield is unknown [39].
Drawing on the studies of Maas [103], Delecolle et al. [40], and Moulin et al. [123], Jongschaap [87] simplified their classification into two approaches through which remotely sensed data could facilitate the use of crop simulation models. First, remotely sensed data could provide estimates of the intrinsic values needed to set up the crop simulation environment, such as crop classification and emergence, flowering, and harvest dates. Second, RS can estimate the values of biophysical variables that can be used to drive the simulation model during run-time ('run-time calibration'). Jongschaap [87] used RS observations of model variables (leaf area index and canopy nitrogen) for run-time calibration by resetting the simulated value to the value estimated from RS data. This approach resulted in more accurate predictions of the dynamics of the crop-soil system, including variables that were not directly adjusted. A more innovative and useful combination of RS and simulation modelling integrates knowledge of lower-scale processes in the crop and soil systems. The relevance of remotely sensed data depends on the needs of the crop simulation model, the availability of other required data, and the user's know-how; the sensor's spectral and spatial resolution will therefore play a significant role in the final choice of remotely sensed data. Phenological events, such as emergence, flowering, and maturity (followed by crop harvest), are difficult to predict and are generally not represented accurately enough in simulation models [137]. In South Africa, records of farm management practices are often lacking for farm experiments and especially among smallholder farmers, who generally do not keep such records. Hence, the available records do not necessarily provide the accurate input data required by crop simulation models.
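A minimal sketch of run-time calibration, corresponding to the 'forcing' strategy, is shown below. It assumes a toy logistic LAI model (not any of the models cited, and with hypothetical parameter values): whenever an RS-derived LAI estimate is available for a given day, the simulated state is reset to that estimate before the model continues from it.

```python
def simulate_lai(days, lai0=0.1, r=0.12, lai_max=5.0, rs_obs=None):
    """Toy logistic LAI model; rs_obs maps day index -> RS-estimated LAI.

    On days with an RS estimate, the simulated state is replaced by the
    observation ('run-time calibration' / forcing) before growth continues.
    """
    rs_obs = rs_obs or {}
    lai = lai0
    trajectory = []
    for day in range(days):
        if day in rs_obs:
            lai = rs_obs[day]  # reset the state variable to the RS value
        trajectory.append(lai)
        lai += r * lai * (1.0 - lai / lai_max)  # daily logistic increment
    return trajectory


free_run = simulate_lai(40)
forced_run = simulate_lai(40, rs_obs={10: 2.0, 25: 3.8})
print(round(free_run[-1], 2), round(forced_run[-1], 2))
```

All later days of the forced run follow from the reset state, which is how an update of one variable can also change variables that were not directly adjusted.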
The timing of these phenological events has a substantial impact on crop performance and yield, both in reality and in simulation models. RS information allows the timing of these events to be identified and used to adjust simulation models. Satellite imagery has likewise been used elsewhere to reset model simulations with measured data for crop cover and growth status (e.g. [48,87]).
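One simple way to extract event timing from RS, sketched below under stated assumptions, is a threshold method on a cloud-free NDVI time series: green-up is taken as the first date on which NDVI exceeds a fixed fraction of the seasonal amplitude, and end of season as the last date after the peak still above that level. The 20% fraction is an illustrative choice, and peak greenness is only a rough proxy for flowering; relating these RS dates to emergence, flowering, or maturity would require local field observations.

```python
def phenology_from_ndvi(days, ndvi, frac=0.2):
    """Threshold-based key dates from a cloud-free NDVI series (sketch).

    days: observation dates (e.g. day of season); ndvi: matching values.
    """
    lo, hi = min(ndvi), max(ndvi)
    threshold = lo + frac * (hi - lo)
    i_peak = ndvi.index(hi)
    # green-up: first observation up to the peak that reaches the threshold
    i_up = next(i for i in range(i_peak + 1) if ndvi[i] >= threshold)
    # end of season: last observation from the peak onward above the threshold
    i_end = max(i for i in range(i_peak, len(ndvi)) if ndvi[i] >= threshold)
    return {"green_up": days[i_up], "peak": days[i_peak], "end": days[i_end]}


# Hypothetical 16-day composite series for one field
days = [0, 16, 32, 48, 64, 80, 96, 112, 128]
ndvi = [0.15, 0.18, 0.35, 0.60, 0.80, 0.75, 0.50, 0.25, 0.16]
print(phenology_from_ndvi(days, ndvi))
```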
Time series of biophysical characteristics estimated from RS can be used for model calibration. A study by Clevers et al. [36] showed the advantages of using RS: SPOT data were used to calibrate a wheat growth model under Mediterranean conditions by estimating leaf area indices and using them as calibration sets. Jongschaap and Schouten [86] successfully applied model calibration by estimating regional sowing, emergence, flowering, and harvest dates for wheat. Simulation models are often validated against RS estimates of biophysical variables, such as biomass production on a regional scale [170].
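Calibration against an RS time series can be viewed as the 're-parameterization' strategy: a model parameter is adjusted until the simulation agrees with the RS estimates. The sketch assumes a toy logistic LAI model and a simple grid search over one hypothetical growth-rate parameter `r`; real applications would calibrate several parameters of a full crop model, usually with a proper optimizer.

```python
def logistic_lai(days, r, lai0=0.1, lai_max=5.0):
    """Toy logistic LAI trajectory (one value per day)."""
    lai, out = lai0, []
    for _ in range(days):
        out.append(lai)
        lai += r * lai * (1.0 - lai / lai_max)
    return out


def recalibrate_r(rs_obs, days, candidates):
    """Return the candidate r minimising squared error against RS LAI.

    rs_obs maps day index -> RS-estimated LAI on that day.
    """
    def sse(r):
        sim = logistic_lai(days, r)
        return sum((sim[d] - obs) ** 2 for d, obs in rs_obs.items())
    return min(candidates, key=sse)


# Usage with synthetic "observations" generated from r = 0.15
truth = logistic_lai(60, 0.15)
rs_obs = {d: truth[d] for d in (20, 30, 40, 50)}
candidates = [round(0.05 + 0.01 * i, 2) for i in range(21)]
print(recalibrate_r(rs_obs, 60, candidates))
```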
However, Dadhwal [39] pointed out that the driving variables of crop simulation models, such as daily weather inputs of maximum and minimum temperature, solar radiation, relative humidity, and wind speed, can be compromised when derived from RS. One reason, as cited by Moulin et al. [123], is the effect of cloud cover on sensors and platforms, which can limit the availability of RS-derived parameters. This shortcoming can be overcome by using other data acquisition methods to supply the required data if image correction fails. The advantage of a clear RS image is that it quantifies crop parameters over larger areas with less labour- and material-intensive effort than field research. While crop models can provide a continuous estimate of crop growth over time, RS provides a multispectral assessment of instantaneous crop conditions within a given area [40]. The combination of RS and crop models can therefore only benefit crop modellers, and crop production and resource management in South Africa could benefit significantly if RS were combined with crop modelling.

Models as input data sources
Some crop simulation models can themselves provide input data for use in other models. Most crop simulation models (CSMs) come with various built-in modules. Models such as APSIM, DSSAT, and AquaCrop have built-in modules with soil profiles or generic soils suited to particular crops. From the summary descriptions in these built-in modules (for example, the soil modules), soil characteristics such as the water-holding capacity, the lower limit, and the drained upper limit, which are essential parameters for running crop models, can be obtained. Other soil parameters, including soil albedo, soil water drainage, water-holding capacity, nutrient content, texture, and particle size, can be obtained from soil databases, such as those of ISRIC or FAO, in addition to production site information.
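As an example of using these soil parameters, the plant-available water of a profile can be computed layer by layer as the difference between the drained upper limit (DUL) and the lower limit (LL), multiplied by layer thickness. The sketch assumes volumetric water contents (m3 m-3) and thicknesses in millimetres; the layer values in the usage example are hypothetical.

```python
def plant_available_water(layers):
    """Profile plant-available water (mm) from (thickness_mm, LL, DUL) tuples.

    LL and DUL are volumetric water contents; each layer contributes
    (DUL - LL) * thickness.
    """
    total = 0.0
    for thickness_mm, ll, dul in layers:
        if not 0.0 <= ll < dul <= 1.0:
            raise ValueError(f"inconsistent limits: LL={ll}, DUL={dul}")
        total += (dul - ll) * thickness_mm
    return total


# Hypothetical three-layer profile (0-150, 150-300, 300-600 mm)
profile = [(150, 0.10, 0.25), (150, 0.12, 0.28), (300, 0.15, 0.30)]
print(round(plant_available_water(profile), 1))
```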
With regard to climate data in South Africa, both statistical [70,71] and dynamical [53,54] downscaling on multi-decadal time scales have been carried out for many years at local universities and institutions, such as the University of Cape Town and the Council for Scientific and Industrial Research (CSIR) [97]. The CSIR focuses on global and regional modelling for seasonal forecasts and decadal to centennial projections, and on coupling to land surface dynamics [196]. The CSIR applies the conformal-cubic atmospheric model (CCAM) [116][117][118] at a regional scale. The model has been applied to study southern African atmospheric dynamics over a wide range of time scales, from daily weather variability to multi-decadal variations and change [55]. It has realistically represented the strong temperature increases observed over southern Africa during the past five decades and projects significant warming during the twenty-first century [53]. The University of Cape Town's Climate System Analysis Group (CSAG) and Department of Oceanography, on the other hand, carry out global and regional atmospheric, ocean, and coupled modelling, with a focus on ocean-atmosphere process studies, seasonal forecasting, and climate change projections (DEA, 2011). With a slightly different focus, CSAG has a long history of statistical downscaling using neural net approaches [70] but has also produced a limited set of scenarios with regional climate models (RCMs), most notably MM5 and the Weather Research and Forecasting model (WRF) [174]. The advantage of this in-country modelling expertise is that the statistical and dynamical tools used to produce climate change projections have undergone substantial evaluation, tuning, and development, and will continue to be refined to meet the country's needs.

Software as data sources
Various software tools can be used to compute parameters missing from input data. Downscaled climate data provide rainfall and temperature but often lack solar radiation and evapotranspiration. Software such as the FAO ETo calculator can be used to derive these other climate variables as needed. For example, alongside temperature and rainfall projections, future daily solar radiation can be estimated from the minimum and maximum temperatures, Julian day, latitude, altitude, and an empirical parameter described by Allen et al. [2]. The FAO ETo calculator can then be used to calculate reference evapotranspiration (ETo) from the projected temperature data. Depending on the model's needs, various parameters can be derived from the available variables and used to drive models.
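The temperature-based estimate of solar radiation described above can be sketched as follows, assuming the FAO-56 formulation: extraterrestrial radiation Ra is computed from latitude and day of year, Rs = kRs·√(Tmax − Tmin)·Ra, and Rs is capped at the clear-sky value (0.75 + 2×10⁻⁵·altitude)·Ra. The empirical coefficient kRs is about 0.16 for interior and 0.19 for coastal locations; the site values in the usage line are hypothetical, and this is not the FAO ETo calculator's own code.

```python
import math

GSC = 0.0820  # solar constant, MJ m-2 min-1


def extraterrestrial_radiation(lat_deg, doy):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1), FAO-56 method."""
    phi = math.radians(lat_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)       # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)  # solar declination (rad)
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))  # clamp for polar latitudes
    ws = math.acos(x)                                          # sunset hour angle (rad)
    return (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws))


def solar_radiation_hargreaves(tmax, tmin, lat_deg, doy, altitude_m, krs=0.16):
    """Rs = krs * sqrt(Tmax - Tmin) * Ra, limited to clear-sky radiation Rso."""
    ra = extraterrestrial_radiation(lat_deg, doy)
    rs = krs * math.sqrt(max(tmax - tmin, 0.0)) * ra
    rso = (0.75 + 2e-5 * altitude_m) * ra
    return min(rs, rso)


# Hypothetical inland site at 25.7 deg S, 1300 m, mid-January
print(round(solar_radiation_hargreaves(31.0, 15.0, -25.7, 15, 1300), 1))
```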
Although measured data are always preferable to propagated or derived weather data, daily values of the variables other than Tmax, Tmin, and precipitation that are required for crop modelling (i.e. solar radiation and vapour pressure) can be estimated with a reasonable degree of accuracy in the absence of measured data, using temperature data or data retrieved from other sources. An exception is wind speed, which cannot readily be estimated from other variables; a default world average value of 2 m s−1 is therefore typically used to estimate ETo when measured wind speed data are not available [2]. Solar radiation, in contrast, can be calculated using equations that rely on sunshine hours (e.g. the Angstrom formula) or temperature (e.g. the Hargreaves formula) [2]. Vapour pressure is typically derived from relative humidity or dew point temperature measurements; in the absence of measured data, it can be estimated from measured Tmin by assuming that the dew point temperature is close to the daily Tmin [2]. In all cases, it is desirable to validate these approaches locally using good-quality observed data from a representative subset of years and locations in the region of interest.
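The two fallbacks described above can be sketched as follows: actual vapour pressure is taken as the saturation vapour pressure at Tmin (that is, Tdew ≈ Tmin), and missing wind speed defaults to 2 m s−1, while a measured value is adjusted to 2 m height with the FAO-56 logarithmic wind profile. The optional `dew_offset` argument is our own hypothetical addition for drier sites where Tdew may fall below Tmin; its default of 0 reproduces the assumption in the text.

```python
import math


def sat_vapour_pressure(t_c):
    """Saturation vapour pressure e0(T) in kPa (FAO-56 form)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def vapour_pressure_from_tmin(tmin_c, dew_offset=0.0):
    """Actual vapour pressure ea (kPa), assuming Tdew = Tmin - dew_offset.

    dew_offset is a hypothetical local adjustment; 0 means Tdew = Tmin.
    """
    return sat_vapour_pressure(tmin_c - dew_offset)


def wind_speed_2m(u_measured=None, height_m=2.0, default=2.0):
    """Wind speed at 2 m (m s-1); falls back to 2 m s-1 when missing."""
    if u_measured is None:
        return default
    # FAO-56 logarithmic profile adjustment from measurement height to 2 m
    return u_measured * 4.87 / math.log(67.8 * height_m - 5.42)


print(round(vapour_pressure_from_tmin(12.0), 3), wind_speed_2m(None),
      round(wind_speed_2m(3.0, height_m=10.0), 2))
```

As the text stresses, both shortcuts should be checked against good-quality observations from the region before being relied on.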

Experimental trials
Where time and resources are available, experimental trials have been conducted and used to validate crop models in various locations in southern Africa (e.g. [106,166]). These trials are valuable because they provide data for parameterizing and validating crop models. Details on the phenology and genotype of each crop can be obtained from seed pamphlets. Apart from experimental trials, information on crop management can be obtained from expert agronomists, DAFF pamphlets on crop production, and the farmers themselves. The rationale is that every experiment or field trial should be designed to collect data as if they were meant for entry into a crop simulation model.
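To make "data collection as if it were meant for entry into a crop simulation model" concrete, a trial could keep a minimum record per plot and season and flag gaps before the season closes. The field names below are hypothetical and only indicative of the site, management, phenology, and yield information discussed in this section; they do not follow ICASA or any particular model's input format.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class TrialRecord:
    """Hypothetical minimum record for one plot-season (illustrative names)."""
    site_lat: Optional[float] = None
    site_lon: Optional[float] = None
    cultivar: Optional[str] = None
    planting_date: Optional[date] = None
    plant_density_per_m2: Optional[float] = None
    n_fertilizer_kg_ha: Optional[float] = None
    irrigated: Optional[bool] = None
    emergence_date: Optional[date] = None
    flowering_date: Optional[date] = None
    maturity_date: Optional[date] = None
    grain_yield_kg_ha: Optional[float] = None

    def missing_fields(self):
        """Names of fields that have not been recorded yet (still None)."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


record = TrialRecord(site_lat=-29.6, site_lon=30.4, cultivar="hypothetical-A",
                     planting_date=date(2020, 11, 15))
print(record.missing_fields())
```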

Participatory stakeholder approaches and key informants as data sources
Participatory stakeholder approaches to modelling have been shown to bring benefits such as improved contextual calibration, greater decision-making relevance, and subsequent trust in, and action on, the evidence base produced by the research [30,138]. This is further supported by the Agricultural Model Intercomparison and Improvement Project (AgMIP), which has explicitly recognized the need for modelling communities to engage with stakeholders throughout the modelling process [155].
Studies in which farmers, especially smallholder farmers, have been involved in modelling actual on-farm conditions and yields in South Africa are scarce. Where such studies have taken place, their results are invaluable for modelling, because researchers usually do not have access to data on the farm management practices of most smallholder farmers. As suggested by Snapp et al. [168], research approaches need to be built in which a 'quality farmer-researcher partnerships approach is employed to make technology testing more realistic'. If this relationship is successfully established, then, according to Ncube et al. [125], smallholder farmers will most likely accept the results and recommendations of a research study in which they were engaged throughout. Ncube et al. [125] further presented the results of 3-year participatory research on improving soil fertility. The participatory approach was used to develop strategies for improving maize yield under farmer conditions in semi-arid environments and to assess farmer participation dynamics and how fully engaging farmers could assist in developing soil fertility management. However, given the site and season specificity of on-farm experimentation, the interpretation and extrapolation of results remain an issue.
Several studies have used simulation modelling as an analytical tool in participatory research, especially in fertility management (e.g. [43,153]), including in smallholder farming systems in Africa [42,163]. Carberry et al. [26] reported using a simulation model with farmers and researchers to explore the climatic risks associated with various crop management technologies and as an aid to designing farmer experimentation. These studies conclude that where farmers participate in the actual experiments, conditions for knowledge building are created. Farmers themselves become involved in collecting data to improve the management of their fields, and these data can make a significant contribution as input data for various models. There will then be no need for long field trials, since the data are already available.

Increasing transparency and inter-comparability in modelling
Comparability across model simulations is only possible when standard methods or protocols are used. Inter-comparability of studies needs to go beyond the choice and performance of models: frameworks and assumptions need to be clearly stated. Studies might be assessed against a set of criteria, forming an evidence base of data sets and model parameterizations with which to evaluate their potential impact across temporal or spatial scales. For example, Ruiz-Ramos et al. [158] used an ex-post plausibility check in ensemble wheat modelling, which goes some way towards increasing robustness. Platforms such as AgMIP and the Consultative Group on International Agricultural Research (CGIAR) seek to harness data to enhance the impact of international agricultural research. One of AgMIP's goals is to compare different modelling approaches directly. It adopted the International Consortium for Agricultural Systems Applications (ICASA) data standard as its storage format, and AgMIP members developed translators and tools to convert data between this format and the formats of various crop models [135,136]. The standard describes agronomic and crop management data in detail.
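At their core, such translators map variable names and units from a common format to each model's format and back. The sketch below shows that idea with two hypothetical field maps; the variable names are invented for illustration and are neither the ICASA vocabulary nor the AgMIP translators. Fields without a mapping are reported rather than silently dropped.

```python
def _identity(value):
    return value


# Hypothetical maps: common field -> (model-specific field, converter).
TO_MODEL_A = {
    "planting_date": ("sow_date", _identity),
    "plant_density_per_m2": ("plants_per_ha", lambda x: x * 10_000),
    "n_fertilizer_kg_ha": ("n_rate_kg_ha", _identity),
}
FROM_MODEL_A = {
    "sow_date": ("planting_date", _identity),
    "plants_per_ha": ("plant_density_per_m2", lambda x: x / 10_000),
    "n_rate_kg_ha": ("n_fertilizer_kg_ha", _identity),
}


def translate(record, field_map):
    """Rename and convert fields; return (translated, unmapped_keys)."""
    translated, unmapped = {}, []
    for key, value in record.items():
        if key in field_map:
            new_key, convert = field_map[key]
            translated[new_key] = convert(value)
        else:
            unmapped.append(key)  # report instead of silently dropping
    return translated, unmapped


common = {"planting_date": "2020-11-15", "plant_density_per_m2": 4.5,
          "n_fertilizer_kg_ha": 60.0, "notes": "hand weeded"}
model_a, skipped = translate(common, TO_MODEL_A)
round_trip, _ = translate(model_a, FROM_MODEL_A)
print(model_a, skipped)
```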
Similarly, CGIAR has GARDIAN (Global Agriculture Research Data Innovation and Acceleration Network), which enables the discovery of agricultural data sets and publications across the CGIAR system and beyond. As advocated by Reynolds et al. [149], a global 'science commons' attitude should be encouraged, with funding bodies facilitating the timely sharing of data by investing more explicitly in research publication on condition of public access [148]. This could also result in more thorough reporting of experimental treatments and conditions by researchers, as well as greater availability of data sets that are usually not written up [149].

Geographic Information System (GIS)
Burrough and McDonnell [23] defined GIS as a set of tools for collecting, storing, retrieving at will, transforming, and displaying spatial data from the real world for a particular set of purposes. Most global databases of future climate projections and soil information come in GIS format; examples include FAO soil maps for sub-Saharan Africa and the WorldClim database of world climate, among many others. GIS provides a framework in which such data can be queried and the desired information extracted through a series of GIS operations for use as inputs to crop simulation models.
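Extracting a value for a site from a gridded GIS layer (for example, a soil property or climate raster) amounts to converting the site's coordinates into a row and column of the grid. The sketch assumes a north-up raster described by a GDAL-style affine geotransform `(x_origin, pixel_width, 0, y_origin, 0, -pixel_height)` and a small hypothetical grid already held in memory; in practice, libraries such as GDAL or rasterio would handle file reading and reprojection.

```python
def cell_index(x, y, geotransform):
    """Row and column of the cell containing (x, y) in a north-up raster."""
    x0, dx, rx, y0, ry, dy = geotransform
    if rx != 0 or ry != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    col = int((x - x0) // dx)
    row = int((y - y0) // dy)  # dy is negative for north-up rasters
    return row, col


def sample(grid, x, y, geotransform, nodata=None):
    """Value of the grid cell under (x, y); None where the cell is nodata."""
    row, col = cell_index(x, y, geotransform)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point lies outside the raster")
    value = grid[row][col]
    return None if value == nodata else value


# Hypothetical 4 x 4 grid at 0.5 degree resolution, top-left corner (28 E, 22 S)
grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
gt = (28.0, 0.5, 0.0, -22.0, 0.0, -0.5)
print(sample(grid, 29.2, -22.6, gt))
```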

Conclusion and recommendation
Producing locally relevant, climate-informed results from crop simulation models across time frames ranging from seasonal forecasts to future climate change is complex. Establishing resilient and sustainable agricultural systems in the face of climate change requires effective adaptation measures, which in turn require a cross-scale and cross-disciplinary approach to developing and implementing adaptation strategies. Agricultural adaptation measures, such as livelihood changes and farm management practices, should be appropriate to, and validated against, local socio-cultural and agro-ecological conditions. The unavailability of input data for crop models is a barrier to advances in crop research in South Africa. It not only limits the type of research that can be carried out but also limits research on most crops, including cereals, oilseeds, pastures, and forage. The availability of appropriate input data for crop models could provide valuable information on how and where crops should be planted and reduce the margin of error in agronomic and field management practices. As a research tool, the proper development and application of various data collection methods can show how gaps in data acquisition could be filled, thereby enabling more efficient and targeted research planning. Results from simulations that use appropriate climate, agronomic, soil, and crop physiological data can support extrapolation to alternative cropping cycles and locations, thus permitting the quantification of temporal and spatial variability.
Most models are virtually untested or poorly tested because data are not available, or not available in the proper format [142]; hence, their usefulness is unproven. Sharing the view of Rauff and Bello [142], it can be said that 'it is easy to formulate crop models than to validate them', especially in areas such as South Africa. This situation is a cause for concern to researchers and many agronomists, and many researchers are reluctant to engage in crop simulation research because of the complexity of the models, the lack of quality data, the lack of model testing, and the inevitable inaccuracies when such testing is done.
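When observed data are available, model testing commonly reports statistics such as the root mean square error (RMSE), RMSE normalized by the observed mean, and Willmott's index of agreement (d). The sketch below computes these for paired observed and simulated values; which statistics and acceptance thresholds to use remains a choice for each study, and the example values are hypothetical.

```python
import math


def validation_stats(observed, simulated):
    """RMSE, normalised RMSE (% of observed mean) and Willmott's d."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("need paired, non-empty series")
    n = len(observed)
    o_mean = sum(observed) / n
    sq_err = sum((s - o) ** 2 for o, s in zip(observed, simulated))
    rmse = math.sqrt(sq_err / n)
    nrmse = 100.0 * rmse / o_mean
    # Willmott (1981): d = 1 - sum(P-O)^2 / sum(|P-Obar| + |O-Obar|)^2
    potential = sum((abs(s - o_mean) + abs(o - o_mean)) ** 2
                    for o, s in zip(observed, simulated))
    d = 1.0 - sq_err / potential if potential > 0 else float("nan")
    return {"rmse": rmse, "nrmse_pct": nrmse, "d": d}


# Hypothetical observed vs simulated yields (t/ha)
print(validation_stats([2.0, 4.0, 6.0], [3.0, 4.0, 5.0]))
```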
However, all is not lost, as this study presents a practical approach to obtaining specific parameters for crop models. By using currently available data sets from various sources, models can be calibrated to capture the cultivar characteristics, management practices, and environmental conditions prevalent at a specific site. Because it proposes several sources of input data, this approach can readily be extended to various crop models. Obtaining the needed input data may facilitate model calibration and allow effort to be focused on targeted simulations rather than on calibration procedures. Region-specific phenology parameters are often easy to estimate, given weather data and crop calendar information. In contrast, information on management- and yield-related parameters and correction factors, such as planting density, fertilizer application, weeding, and water management techniques, and on their regional details is usually scarce, and literature studies can provide only loose parameter estimates.
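For instance, a thermal-time requirement can be estimated by accumulating growing degree-days from the planting date to an observed flowering or maturity date taken from the crop calendar. The sketch uses the simple daily-mean method; the 10 °C base temperature is an assumed value often used for maize and should be checked for the crop and cultivar concerned, as should the optional upper cutoff.

```python
def thermal_time(tmax, tmin, t_base=10.0, t_cutoff=None):
    """Cumulative growing degree-days (deg C d) by the daily mean method.

    tmax, tmin: daily series from planting to the observed phenological date.
    t_cutoff: optional cap on the daily mean temperature (assumption).
    """
    total = 0.0
    for hi, lo in zip(tmax, tmin):
        t_mean = (hi + lo) / 2.0
        if t_cutoff is not None:
            t_mean = min(t_mean, t_cutoff)
        total += max(0.0, t_mean - t_base)
    return total


# Hypothetical three days of weather after planting
print(thermal_time([30.0, 25.0, 12.0], [14.0, 9.0, 4.0]))
```

Repeating this over several seasons or sites with recorded flowering dates gives a region-specific estimate of the thermal time to flowering.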
This study highlights the promise of using various data sources to calibrate crop models, thereby facilitating calibration, validation, and simulation with crop models. It is therefore opportune to test and summarize the approach for multiple vegetables, field and forage crops, orchards, and vineyards. More research should also be undertaken to ease the exchange of parameters, so that parameters for one crop model can be uploaded to databases and then downloaded and reformatted for use in another model. Addressing these research issues will ultimately help address problem areas related to several climate and hydrological modelling issues discussed above, including parameter estimation, the temporal and