A scalable deep learning framework for evapotranspiration estimation using open-access climate services and remote sensing

Abstract

To overcome the installation and maintenance constraints of physical ground-based weather stations, this study proposes a deep learning (DL) framework for estimating reference evapotranspiration (ET₀) from open-access climate services and remote sensing (RS) data. Several deep neural network (DNN) architectures, namely multilayer perceptrons (MLPs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs), are evaluated and benchmarked against traditional machine learning (ML) models. Experiments on three agricultural plots in southeastern Spain, representing contrasting semi-arid meteorological conditions, show that RNNs achieve the best performance, with a coefficient of determination of R² = 0.92. Model interpretability was addressed with SHapley Additive exPlanations (SHAP) analysis, which confirmed the biophysical consistency of the predictions and identified land surface temperature as the primary driver of the model's estimates. A key contribution is the demonstration that infrastructure-free models trained solely on open-access satellite and climate data can match or even surpass conventional meteorology-based methods. Because the framework relies on location-agnostic, open-access inputs, it offers a potentially scalable solution for ET₀ prediction beyond the study region. Moreover, the integration of crop coefficients enables accurate forecasting of daily irrigation demand. Overall, the proposed methodology illustrates the feasibility of artificial intelligence-driven irrigation management, highlights its potential applicability across diverse climates, and points to its capacity to advance sustainable water use in agriculture.

Keywords

evapotranspiration estimation; deep learning; remote sensing; irrigation management; sustainable agriculture

Introduction

Global food security and sustainable water management are among the most pressing challenges of the 21st century (FAO 2021; Varzakas and Smaoui 2024). With the global population projected to approach ten billion and the effects of climate change accelerating, pressure on water resources has reached a critical level. Prolonged droughts, extreme heatwaves, and erratic rainfall are transforming agricultural landscapes, threatening food production and the stability of rural economies. Agriculture, which accounts for approximately 70% of global freshwater withdrawals, is at the epicentre of this climate crisis and faces significant challenges unless innovative solutions are adopted.

In this context, precision agriculture (PA) has emerged as a key strategy for mitigating these effects, and reference evapotranspiration (ET₀) plays a central role in its effective implementation (Gyarmati and Mizik 2020). ET₀ represents the rate of water loss through evaporation from the soil and transpiration from a hypothetical reference surface: a healthy, actively growing, well-watered grass crop with a uniform height of 12 cm (SIAR 2012). Because this reference is standardized, ET₀ provides a consistent measure of the climatic demand for water, independent of crop type, growth stage, or soil conditions. Its accurate estimation is vital for optimizing irrigation schedules, predicting crop water demand, and ensuring water efficiency in increasingly vulnerable agricultural systems (Youssef et al., 2024).

However, ET₀ estimation faces a considerable obstacle: traditional methods such as the Penman–Monteith equation (Allen et al., 1998) rely on a comprehensive set of meteorological variables (solar radiation, wind speed, temperature, and relative humidity) obtained from ground-based weather stations. Although these stations are accurate, they are costly to install and maintain, and their distribution is often sparse and uneven, creating critical data gaps in arid, rural, and hard-to-reach regions (Yamaç, 2021). This limitation compromises the ability of farmers and resource managers to make timely, informed decisions, which translates into inefficient water use, soil degradation, and reduced resilience to water scarcity. A scalable, low-cost solution that does not depend on physical infrastructure is therefore essential for sustainable water management at a global scale (Vaz et al., 2023).

To better contextualize our approach, it is essential to review the current landscape of computational models for ET₀ estimation, particularly focusing on the trade-off between model complexity and data requirements.

To address these challenges, several authors have explored different computational approaches. Over the past few decades, the rapid expansion of artificial intelligence (AI) in PA has transformed how farmers address key challenges such as crop monitoring, pest control, and yield prediction. According to a literature survey, this technology helps farmers make better-informed decisions for healthier crops (Sharma and Tripathi, 2021). The versatility of AI models is particularly valuable because they can be adapted to specific applications in diverse agricultural environments, from pest management to irrigation (Sharma, 2021).

The integration of the Internet of Things (IoT) and Edge Computing has transformed real-time greenhouse monitoring by enabling on-site data processing and predictive analytics (Misra et al., 2022; Rayhana et al., 2020). AI-driven models, in combination with Big Data technologies, facilitate the complex analysis of greenhouse thermodynamics. This improves the prediction of temperature, humidity, and other key environmental factors (Escamilla-García et al., 2020).

In agricultural water management, accurate ET₀ estimation is essential for optimizing irrigation efficiency. Traditionally, ET₀ has been estimated with physically based methods such as the FAO-56 (FAO Irrigation and Drainage Paper No. 56) Penman–Monteith equation, which requires a complete set of meteorological data. To overcome these data requirements, machine learning (ML) models emerged as a viable alternative, showing promising results with fewer inputs. Early work by Yamaç (2021) and Vaz et al. (2022) demonstrated that models such as Support Vector Machine (SVM) and Random Forest (RF) could achieve high accuracy with a reduced meteorological dataset, validating the effectiveness of AI in data-scarce environments. Furthermore, recent comparative studies have highlighted the potential of tree-based regression models such as the M5 Tree, which provides a transparent, rule-based structure for ET₀ estimation and often yields results competitive with those of classical Artificial Neural Networks (ANNs) (Dadrasajirlou et al., 2022).

Building on these neural architectures, deep learning (DL) has gained prominence in this field owing to its superior capacity to model complex, non-linear relationships within large datasets. For example, Ravindran et al. (2021) showed that a Deep Neural Network (DNN) with multiple dense layers outperformed traditional ML models such as decision trees (DT) and SVM, yielding higher accuracy metrics. Hybrid approaches that combine neural networks with metaheuristic optimization have also been proposed; for instance, an ANN optimized with Grey Wolf Optimization (ANN-GWO) reached a coefficient of determination of up to R² = 0.99 in the test phase (Khairan et al., 2023).

Metaheuristic optimization has likewise been coupled with other ML regressors to improve performance and feature selection. For example, Ikram et al. (2023) showed that a hybrid Support Vector Regression (SVR) model combining Particle Swarm Optimization and GWO reduced the Root Mean Square Error (RMSE) of the standalone SVR model by up to 32% at one of the studied stations. Similarly, with a focus on feature selection, Zhao et al. (2022) coupled an extreme learning machine (ELM) with two metaheuristic algorithms, the Sparrow Search Algorithm (SSA) and Golden Eagle Optimization (GEO); the resulting SSA-ELM and GEO-ELM models achieved a coefficient of determination of R² = 0.96. More recently, the XGBoost regression model has been used to optimize irrigation management in controlled environments such as greenhouses (Ge et al., 2022).

Although previous studies have used XGBoost for its predictive accuracy, model interpretability has received little attention in infrastructure-free contexts. This study addresses that gap by coupling XGBoost with SHapley Additive exPlanations (SHAP) to ensure that the estimated ET₀ values are not only accurate but also biophysically consistent.

Unlike existing hybrid models, which prioritize architectural complexity to gain marginal precision from local data, the novelty of this study lies in validating a scalable data-fusion framework. This approach addresses the data-scarcity problem in regions where maintaining physical infrastructure is unfeasible. Benchmarking convolutional neural networks (CNNs) and recurrent neural networks (RNNs) on open-access inputs shows that the spatio-temporal modelling capabilities of DL are sufficient to match the performance of models that traditionally require on-site instrumentation.

Despite these significant advances, most models still rely on data from private weather stations or have been validated over limited geographical scopes, which compromises their scalability and accessibility. This is a critical gap in the literature. Our study addresses it by proposing and validating a methodology that integrates multiple sources of open-access data and by systematically comparing the capacity of RNNs and CNNs to process time-series and spatial data, respectively. The result is a practical and replicable framework for accurate ET₀ estimation that demonstrates the ability of DL architectures to outperform ML techniques in a real-world application context.

To this end, our research proposes an innovative DL-based solution that integrates open-access data from climate services and remote sensing (RS). The primary objectives of this study are:

To develop a DL methodology for ET₀ estimation that outperforms traditional ML models.

To validate the feasibility of a scalable approach that does not require costly weather station infrastructure.

To demonstrate the capacity of these models to support PA in diverse agroecological regions.

To establish a replicable framework for integrating DL-based ET₀ estimation into future AI-driven Decision Support Systems, enhancing the potential for data-informed irrigation strategies.

This work contributes to agricultural sustainability by offering a high-precision tool for water management. The remainder of this paper is organized as follows. The ‘Material and methods’ section details the proposed methodology, including data acquisition and model development. The ‘Results and discussion’ section presents the experimental results and a discussion of the findings. Finally, the ‘Conclusions and future work’ section concludes with the key findings and future research directions.

2 Material and methods

2.1 Reference evapotranspiration (ET₀) calculation

The reference evapotranspiration (ET₀) for each study plot was calculated using the Penman–Monteith combination method (Allen et al., 1998), widely recognized as the standard approach because of its robustness and applicability under diverse climatic conditions. The calculations were based on high-resolution, real-time meteorological data from the network of automated weather stations of the Murcia Agricultural Information System (Sistema de Información Agraria de Murcia, SIAM), operated by the Murcia Institute of Agricultural and Food Research and Development (Instituto Murciano de Investigación y Desarrollo Agrario y Alimentario, IMIDA). This high-quality, standardized dataset, computed according to FAO-56 guidelines (Katerji and Rana, 2014), serves as the ground truth for training and validating the DL models.

The full formula used is:

$$ET_{0} = \frac{0.408\,\Delta\,(R_{n} - G) + \gamma\,\frac{900}{T + 273}\,U_{2}\,(e_{a} - e_{d})}{\Delta + \gamma\,(1 + 0.34\,U_{2})} \tag{1}$$

where the variables are:

$E T_{0}$ : Reference evapotranspiration $(mm \cdot d^{- 1})$

$R_{n}$ : Net radiation at the crop surface $(MJ \cdot m^{- 2} \cdot d^{- 1})$

$G$ : Soil heat flux $(MJ \cdot m^{- 2} \cdot d^{- 1})$ , considered negligible for daily periods

$T$ : Mean daily air temperature (°C)

$U_{2}$ : Wind speed measured at 2 m height $(m \cdot s^{- 1})$

$(e_{a} - e_{d})$ : Saturation vapour pressure deficit $(kPa)$

$Δ$ : Slope of the vapour pressure curve $(kPa \cdot {}^{\circ}C^{-1})$

$γ$ : Psychrometric constant $(kPa \cdot {}^{\circ}C^{-1})$

$900$ : Conversion factor

All the meteorological components required by the Penman–Monteith equation were calculated from the SIAM-IMIDA data. The net radiation ($R_{n}$) was determined as the difference between the net shortwave radiation ($R_{ns}$) and the net longwave radiation ($R_{nl}$), that is, $R_{n} = R_{ns} - R_{nl}$. The term $R_{nl}$ was derived from the maximum and minimum air temperatures, the actual vapour pressure, and the solar radiation.
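For illustration, the net radiation term can be computed with the standard FAO-56 expressions (Allen et al., 1998). The Python sketch below is a minimal example, not the SIAM-IMIDA processing code: it assumes an albedo of 0.23, the FAO-56 clear-sky approximation $R_{so} = (0.75 + 2 \times 10^{-5} z) R_{a}$ with station altitude $z$ in metres, and the FAO-56 formula for extraterrestrial radiation $R_{a}$, which additionally requires the latitude and the day of year. The function names are hypothetical.

```python
import math

SIGMA = 4.903e-9  # Stefan-Boltzmann constant (MJ K^-4 m^-2 d^-1)
ALBEDO = 0.23     # albedo of the hypothetical grass reference surface
GSC = 0.0820      # solar constant (MJ m^-2 min^-1)


def extraterrestrial_radiation(lat_deg: float, day_of_year: int) -> float:
    """Daily extraterrestrial radiation Ra (MJ m^-2 d^-1) for a latitude in degrees."""
    phi = math.radians(lat_deg)
    angle = 2 * math.pi * day_of_year / 365
    dr = 1 + 0.033 * math.cos(angle)                 # inverse relative Earth-Sun distance
    decl = 0.409 * math.sin(angle - 1.39)            # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(decl))  # sunset hour angle (rad)
    return (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws)
    )


def net_radiation(rs: float, tmax: float, tmin: float, ed: float,
                  ra: float, altitude: float = 0.0) -> float:
    """Net radiation Rn = Rns - Rnl (MJ m^-2 d^-1).

    rs: measured solar radiation (MJ m^-2 d^-1); tmax, tmin: air temperature (degC);
    ed: actual vapour pressure (kPa), as in Eq. (1); ra: extraterrestrial radiation.
    """
    rns = (1 - ALBEDO) * rs                  # net shortwave radiation
    rso = (0.75 + 2e-5 * altitude) * ra      # clear-sky solar radiation
    rs_ratio = min(rs / rso, 1.0)            # relative shortwave radiation, capped at 1
    tk4_mean = ((tmax + 273.16) ** 4 + (tmin + 273.16) ** 4) / 2
    rnl = (SIGMA * tk4_mean
           * (0.34 - 0.14 * math.sqrt(ed))   # humidity correction
           * (1.35 * rs_ratio - 0.35))       # cloudiness correction
    return rns - rnl
```

For example, `extraterrestrial_radiation(-20.0, 246)` evaluates to approximately 32.2 MJ·m⁻²·d⁻¹ (20 °S in early September); for the study plots, the latitude would be about 38 °N.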

The remaining components were calculated as follows:

The slope of the vapour pressure curve $(Δ)$ was computed as a function of the air temperature.

The psychrometric constant $(γ)$ was estimated from atmospheric pressure, which in turn was derived from the station's altitude.

The saturation vapour pressure deficit $(e_{a} - e_{d})$ was determined from the saturation vapour pressure $(e_{a})$ (calculated using maximum and minimum temperatures) and the actual vapour pressure $(e_{d})$ (obtained from the relative humidity and the saturation vapour pressure at the mean air temperature).
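The component calculations above and Eq. (1) can be combined into a single function. The following is a minimal Python sketch under stated assumptions: it uses the standard FAO-56 expressions for saturation vapour pressure, the slope $Δ$, atmospheric pressure as a function of altitude, and $γ$; it keeps this section's notation ($e_{a}$ saturation and $e_{d}$ actual vapour pressure, with $e_{d}$ derived from mean relative humidity and the saturation vapour pressure at mean temperature); and it takes $R_{n}$ as an input. The function names are hypothetical, and this is not the SIAM-IMIDA implementation.

```python
import math


def sat_vapour_pressure(t: float) -> float:
    """Saturation vapour pressure e°(T) in kPa for an air temperature t in degC."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def fao56_et0(tmax: float, tmin: float, rh_mean: float, u2: float,
              rn: float, altitude: float, g: float = 0.0) -> float:
    """Daily reference evapotranspiration (mm d^-1) following Eq. (1).

    Notation follows the text: e_a is the saturation vapour pressure and
    e_d the actual vapour pressure. G defaults to 0 (negligible for daily steps).
    """
    t_mean = (tmax + tmin) / 2
    # Slope of the saturation vapour pressure curve at the mean temperature (kPa/degC)
    delta = 4098 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Atmospheric pressure from station altitude (kPa) and psychrometric constant (kPa/degC)
    pressure = 101.3 * ((293 - 0.0065 * altitude) / 293) ** 5.26
    gamma = 0.665e-3 * pressure
    # e_a: mean of the saturation vapour pressures at Tmax and Tmin
    e_a = (sat_vapour_pressure(tmax) + sat_vapour_pressure(tmin)) / 2
    # e_d: relative humidity times the saturation vapour pressure at Tmean
    e_d = rh_mean / 100 * sat_vapour_pressure(t_mean)
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900 / (t_mean + 273) * u2 * (e_a - e_d))
    denominator = delta + gamma * (1 + 0.34 * u2)
    return numerator / denominator
```

With $T_{max} = 21.5$ °C, $T_{min} = 12.3$ °C, a mean relative humidity of 73.5%, $U_{2} = 2.078$ m·s⁻¹, $R_{n} = 13.28$ MJ·m⁻²·d⁻¹, and an altitude of 100 m, the function returns approximately 3.87 mm·d⁻¹.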

This rigorous methodological approach provides an accurate and reliable baseline for ET₀ estimation, which is fundamental for validating the performance of the proposed DL models under the diverse climatic conditions of the study plots (SIAM and IMIDA, 2010).

Use case setting

Figure 1 shows the three agricultural plots selected for this study. All are situated in the Region of Murcia (southeastern Spain), an area characterized by a semi-arid Mediterranean climate and chronic water scarcity.

Figure 1.

Geographic distribution of the study plots, focusing on the three selected locations: plot 1 (Estación Sericícola, labelled 1), plot 2 (Finca Torreblanca, labelled 5), and plot 3 (Finca Hacienda Nueva, labelled 2).

The plots were selected to capture spatial heterogeneity in microclimatic conditions, thereby enabling robust evaluation of climate-resilient irrigation and evapotranspiration modelling strategies.

The selection of southeastern Spain as the study area provides a rigorous testing ground because of its high water stress and climatic variability. Although this study focuses on this specific region, the methodology is designed to scale to other geographical areas because its predictors come exclusively from global climate services and satellite products, bypassing the need for local instrumentation.

Plot 1 (P1) is located at Estación Sericícola (La Alberca; 37.9385 °N, 1.1345 °W). It exhibits a warm, humid Mediterranean climate with frequent summer heatwaves exceeding 37 °C.

Plot 2 (P2) is located at Finca Torreblanca (Torre Pacheco; 37.7751 °N, 0.8974 °W), near the Mar Menor coastal lagoon (Murcia, Spain). Its climate is similar to that of P1, with high summer temperatures and mild winters.

Plot 3 (P3) is located at Finca Hacienda Nueva (Cehegín; 38.1120 °N, 1.6810 °W), representing an inland, cooler setting with occasional frost events and winter minima approaching −10 °C.

The meteorological baseline for these plots was obtained from the SIAM-IMIDA network (IMIDA, n.d.), which provides the standardized, quality-controlled variables used to compute ET₀.

Data collection

Accurate estimation of ET₀ in the studied semi-arid agricultural systems requires the integration of heterogeneous data sources capturing atmospheric, soil, and vegetation dynamics at appropriate spatial and temporal scales. To this end, a multi-source dataset was assembled for the three experimental plots (P1, P2, and P3). The data collection strategy was designed to combine ground-based measurements, gridded meteorological products, and satellite-derived variables, providing a comprehensive and temporally consistent representation of local hydroclimatic conditions. This integrative approach enables robust training and validation of data-driven models. It also supports irrigation management under variable climate scenarios. The main data sources incorporated in this study include the following:

In-situ data from SIAM-IMIDA: This dataset provides the ground-truth reference (daily ET₀) for model calibration and evaluation, as previously established. The high-quality ground observations of the SIAM-IMIDA network include daily records of air temperature, relative humidity, wind speed, precipitation, and solar radiation, from which ET₀ is computed in accordance with FAO-56 guidelines.

Meteorological data from OpenWeather: Gridded reanalysis and forecast products from a globally recognized climate service, supplying daily records of air temperature, humidity, wind speed, solar radiation, and computed ET₀. These data, excluding ET₀, are used as independent predictors in the modelling framework and allow evaluation of model performance under non-local input conditions.

RS data from Copernicus Land Monitoring Service (CLMS): Satellite-derived variables from Sentinel missions, including land surface temperature (LST) and Surface Soil Moisture (SSM), which complement ground observations and provide spatially explicit information on land surface energy and water fluxes.

RS data from Sentinel Hub: Access to harmonized multi-spectral imagery for the derivation of vegetation indices such as normalized difference vegetation index (NDVI) and normalized difference water index (NDWI), enabling characterization of crop vigour, canopy development, and water stress conditions throughout the study period.

Table 1 shows the four data sources considered in this study, covering the 2021–2024 period. CLMS provides 1931 daily LST records (aggregated across the three plots; see Table 4), forming a consistent baseline for hydrological characterization. Although SSM data were also available from this source, the variable was discarded because too few values were available. Sentinel contributes 578 observations acquired every 2–5 days, comprising 19 spectral variables that describe vegetation phenology and water dynamics at plot scale. OpenWeather offers 1048 daily meteorological records containing precipitation, temperature, humidity, and wind speed, capturing the atmospheric drivers of evapotranspiration. Finally, SIAM-IMIDA supplies 2123 daily ET₀ values (aggregated across the three plots; see Table 2), which constitute the ground-truth benchmark for model calibration and validation.

Table 1.

Descriptive synthesis of the data sources.

Source	Records	Variables	Temporal frequency	Start date (DD-MM-YYYY)	End date (DD-MM-YYYY)	Description
CLMS	1931	1	Daily	03-01-2022	01-01-2024	Temperature
Sentinel	578	19	Every 2-5 days	01-11-2021	05-03-2024	Vegetation and water
OpenWeather	1048	8	Daily	10-01-2022	07-03-2024	Climate
SIAM-IMIDA	2123	1	Daily	10-01-2022	20-01-2024	ET₀ (Ground Truth)

SIAM: Sistema de Información Agraria de Murcia; IMIDA: Instituto Murciano de Investigación y Desarrollo Agrario y Alimentario; CLMS: Copernicus land monitoring service.
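Table 1 combines daily sources with Sentinel observations acquired every 2–5 days, and the section does not state how the series were aligned in time. The Python sketch below shows one plausible way to harmonize one plot's data with pandas, assuming that each source is indexed by date, that the sparse Sentinel variables are linearly interpolated in time onto the daily grid, and that only days with complete predictors and target are kept. The function name and the `max_gap_days` parameter are hypothetical.

```python
import pandas as pd


def harmonize_daily(daily_frames, sparse_frame, target, max_gap_days=5):
    """Align one plot's multi-source data on a common daily DatetimeIndex.

    daily_frames: list of daily DataFrames (e.g., OpenWeather variables, CLMS LST).
    sparse_frame: DataFrame observed every few days (e.g., Sentinel indices).
    target: daily Series of SIAM-IMIDA ET0 (ground truth).
    """
    days = pd.date_range(target.index.min(), target.index.max(), freq="D")
    # Interpolate the sparse satellite series in time onto the daily grid, filling at
    # most max_gap_days consecutive missing days and only between two observations
    sparse_daily = (
        sparse_frame.reindex(sparse_frame.index.union(days))
        .interpolate(method="time", limit=max_gap_days, limit_area="inside")
        .reindex(days)
    )
    columns = [f.reindex(days) for f in daily_frames]
    columns += [sparse_daily, target.reindex(days).rename("et0")]
    merged = pd.concat(columns, axis=1)
    # Keep only the days on which every predictor and the target are available
    return merged.dropna()
```

Restricting interpolation to interior gaps avoids extrapolating vegetation indices beyond the first and last satellite acquisitions.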

Building on this description, the statistical properties of the most relevant variables are subsequently examined, with a focus on ET₀, to characterize the climatic conditions of the three experimental plots and assess their suitability for model training and evaluation.

In situ meteorological data (SIAM-IMIDA) (ET₀ ground truth): The daily ET₀ values from the SIAM-IMIDA network (computed following FAO-56 guidelines) and the associated meteorological variables form the ground-truth baseline for the study. Table 2 summarizes the distribution of daily ET₀ across the three experimental plots. Mean daily values range between 2.94 and 3.53 $mm \cdot d^{- 1}$, and the standard deviations are similar across plots (1.80–1.93 $mm \cdot d^{- 1}$), indicating comparable climatic forcing. The aggregated dataset yields an overall mean of 3.21 $mm \cdot d^{- 1}$, with daily values spanning 0.36 to 7.60 $mm \cdot d^{- 1}$; this wide spectrum of evaporative demand supports robust calibration conditions for model development.

Table 2.

Statistical summary of daily ET₀ [mm · d⁻¹] for the different plots.

	P1	P2	P3
Count	761	705	657
Mean	3.13	3.53	2.94
Std	1.93	1.80	1.82
Min	0.37	0.63	0.36
25%	1.33	1.86	1.30
50%	2.96	3.31	2.49
75%	4.99	5.19	4.72
Max	6.63	7.60	6.43

Climate service data (OpenWeather): OpenWeather provides spatially gridded daily meteorological variables that complement the in-situ network. The descriptive statistics in Table 3 reveal a large seasonal amplitude, with minimum temperatures as low as −11.54 °C and maximum temperatures exceeding 45 °C. Relative humidity ranges from 4% to 100%, and wind speed reaches 14.8 m/s, capturing the full range of atmospheric conditions affecting the study sites. This variability enhances the representativeness of the dataset and supports generalizable modelling across diverse weather scenarios.

Table 3.

Descriptive statistics of OpenWeather meteorological variables for aggregated plots.

	Temperature (°C)		Wind Speed (m/s)	Humidity (%)		Cloud Cover (%)	Dew Point (°C)	Solar Radiation (MJ·m⁻²·d⁻¹)
	Min	Max	Mean	Min	Max	Mean	Mean	Mean
Count	1048	1048	1048	1048	1048	1048	1048	1048
Mean	10.99	23.87	2.16	42.84	86.35	36.61	9.88	16.59
Std	6.37	7.02	0.93	14.77	10.41	28.48	5.72	6.35
Min	−11.54	2.69	0.53	4.00	37.00	0.00	−13.85	3.70
25%	5.97	18.11	1.55	32.00	81.00	11.62	5.78	10.50
50%	10.73	23.46	1.95	41.00	89.00	31.62	9.73	16.60
75%	16.55	29.83	2.50	51.00	94.00	57.50	14.46	22.20
Max	26.55	45.03	14.80	95.00	100.00	100.00	23.75	31.60

RS data: Satellite-derived variables complement the ground and climate service data by capturing spatial heterogeneity. CLMS products provide LST, showing mean values around 18 °C and a wide dynamic range from −2.71 to 38.62 °C (see Table 4). This variability is critical for quantifying surface energy balance dynamics.

Table 4.

Statistical summary of the land surface temperature (LST) variable for the different plots.

	P1	P2	P3
Count	647	648	636
Mean	18.63	18.98	17.14
Std	8.21	8.11	8.64
Min	−2.07	0.09	−2.71
25%	12.11	11.89	9.78
50%	18.70	18.50	17.31
75%	25.43	26.05	24.40
Max	36.69	38.62	35.54

Vegetation indices derived from Sentinel Hub imagery, including the NDVI, NDWI, Normalized Difference Moisture Index (NDMI), Green Normalized Difference Vegetation Index (GNDVI), Enhanced Vegetation Index (EVI), and Soil Adjusted Vegetation Index (SAVI), further characterize crop development and canopy moisture status. As summarized in Tables 5 and 6, NDVI values indicate moderate vegetation cover (mean ≈ 0.24), whereas negative NDWI values and low NDMI suggest periods of water stress or limited canopy moisture, both of which are relevant indicators for irrigation scheduling models.

Table 5.

Statistical summary of sentinel variables for aggregated plots (part 1).

	NDVI			NDWI			NDMI			GNDVI
	Min	Mean	Max	Min	Mean	Max	Min	Mean	Max	Min	Mean	Max
Count	578	578	578	578	578	578	578	578	578	578	578	578
Mean	0.14	0.24	0.36	−0.41	−0.31	−0.21	0.01	0.05	0.11	0.22	0.31	0.41
Std	0.09	0.15	0.22	0.23	0.18	0.13	0.17	0.16	0.15	0.13	0.18	0.23
Min	−0.26	−0.09	−0.08	−1.00	−0.93	−0.60	−0.23	−0.18	−0.15	−0.13	−0.13	−0.12
25%	0.09	0.14	0.20	−0.58	−0.44	−0.30	−0.12	−0.05	0.02	0.12	0.17	0.21
50%	0.15	0.26	0.38	−0.49	−0.38	−0.26	−0.03	0.01	0.09	0.26	0.38	0.49
75%	0.20	0.35	0.54	−0.21	−0.17	−0.12	0.06	0.11	0.16	0.30	0.44	0.58
Max	0.40	0.77	0.99	0.12	0.13	0.13	0.76	0.77	0.77	0.60	0.93	1.00

NDVI: normalized difference vegetation index; NDWI: normalized difference water index; NDMI: normalized difference moisture index; GNDVI: green normalized difference vegetation index.

Table 6.

Statistical summary of sentinel variables for aggregated plots (part 2).

	EVI			EVI2			SAVI
	Min	Mean	Max	Min	Mean	Max	Min	Mean	Max
Count	578	578	578	578	578	578	578	578	578
Mean	−1.94			0.12	0.19	0.27	0.11	0.18	0.26
Std	35.33			0.07	0.11	0.16	0.07	0.11	0.16
Min	−840.00	−24.81	−0.92	−0.32	−0.17	−0.15	−0.27	−0.12	−0.11
25%	0.10	0.14	0.21	0.08	1.11	0.15	0.07	0.10	0.14
50%	0.13	0.22	0.34	0.12	0.18	0.27	0.12	0.18	0.27
75%	0.17	0.28	0.43	0.16	0.27	0.42	0.15	0.26	0.40
Max	0.60			0.37	0.43	0.65	0.34	0.41	0.64

*Invalid values (e.g., NaN or infinite results arising during data processing) were removed. EVI: enhanced vegetation index; EVI2: two-band enhanced vegetation index; SAVI: soil adjusted vegetation index.

Data processing

Data cleaning and sanitization: Preliminary exploratory analysis revealed the presence of extreme values in multiple variables across the CLMS, OpenWeather, SIAM-IMIDA, and Sentinel datasets. These anomalies, visualized through box-and-whisker plots, were likely due to sensor errors, cloud contamination in RS products, or sporadic extreme events. To mitigate their impact on model performance, a systematic data cleaning procedure was implemented. Figures 2 and 3 show the distribution of key variables before and after preprocessing, for ground-based and RS data, respectively. Panels (a) correspond to the raw data distributions, whereas panels (b) show the adjusted distributions after cleaning.

Figure 2.

Boxplots of variables from CLMS, OpenWeather, and SIAM-IMIDA data sources before and after outlier removal. (a) Before preprocessing (raw data). (b) After preprocessing (cleaned data). SIAM: Sistema de Información Agraria de Murcia; IMIDA: Instituto Murciano de Investigación y Desarrollo Agrario y Alimentario; CLMS: Copernicus land monitoring service.

Figure 3.

Boxplots of Sentinel-derived variables before and after outlier removal. (a) Before preprocessing (raw data). (b) After preprocessing (cleaned data).

The outlier removal strategy is based on percentile thresholds: values below the 10th percentile and above the 90th percentile of each variable were excluded. This aggressive filtering was adopted deliberately to counteract the high level of noise and potential contamination inherent in integrating open-access RS and climate service data. Compared with the more conventional Interquartile Range (IQR) rule, it provides a stricter safeguard against unrepresentative values, at the cost of trimming the tails of every distribution, while preserving the core of each distribution. The cleaning centred the variable distributions within typical ranges and reduced noise, which improved both the interpretability of the data and the reliability of the models trained in subsequent stages. The IQR of each variable is now more compact, reflecting representative values without distortion from extreme outliers. This facilitates pattern recognition, such as the moderate dispersion observed in indices like mean NDVI (ndvi_mean) and mean NDWI (ndwi_mean) compared with the lower variability of minimum SAVI (savi_min) and minimum NDVI (ndvi_min), and leaves the data better suited for precise analyses and comparisons among indices.
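The percentile rule can be contrasted with the conventional IQR rule in a few lines of pandas. This is a minimal sketch that assumes the thresholds are computed independently for each variable and that out-of-range values are masked as missing rather than deleting entire records; the section does not specify either detail, and the function names are hypothetical.

```python
import pandas as pd


def percentile_filter(df: pd.DataFrame, lower: float = 0.10,
                      upper: float = 0.90) -> pd.DataFrame:
    """Mask values outside each column's [lower, upper] quantile range as NaN."""
    q_low = df.quantile(lower)
    q_high = df.quantile(upper)
    within = df.ge(q_low, axis=1) & df.le(q_high, axis=1)
    return df.where(within)


def iqr_filter(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Conventional alternative: mask values more than k * IQR beyond the quartiles."""
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    within = df.ge(q1 - k * iqr, axis=1) & df.le(q3 + k * iqr, axis=1)
    return df.where(within)
```

One consequence worth noting is that percentile trimming removes roughly 20% of the values of every variable whether or not they are anomalous, whereas the IQR rule (here with the customary factor of 1.5) removes only values far from the quartiles.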

Once the datasets had been cleaned and harmonized, we proceeded to evaluate multicollinearity to reduce redundancies and improve model stability, as described below.

Feature engineering: Another important aspect to consider is multicollinearity among variables. A correlation analysis was conducted and its results were visualized as a heatmap, following these steps:

Data preparation: Using the cleaned dataset obtained in the data cleaning step described above, ensuring its integrity before the correlation analysis.

Correlation matrix calculation: Applying Pearson's correlation coefficient to quantify relationships among variables.

Heatmap visualization: Generating a heatmap in which each cell represents the correlation strength between two variables. Darker tones, from red (strong positive correlation) to blue (strong negative correlation), indicate strong linear relationships, whereas lighter tones reflect weak or negligible relationships (see Figure 4).

Figure 4.

Correlation matrix for all variables.

Key observations include strong correlations among vegetation indices such as NDVI, EVI, and SAVI, which is expected because all of them measure vegetation health and density. NDMI, which reflects soil and vegetation moisture, is correlated with both vegetation indices and climatic variables, highlighting its role in capturing broader environmental factors. Among the climatic variables, maximum temperature and solar radiation show a strong positive correlation, as higher solar radiation is generally associated with higher temperatures. Conversely, minimum humidity is negatively correlated with maximum temperature, indicating that drier conditions typically occur under high temperatures.

ET₀, the target variable, shows a strong positive correlation with maximum temperature and solar radiation, consistent with the expectation that warmer, sunnier days increase water demand. Negative correlations with minimum and maximum humidity support the inverse relationship between atmospheric moisture and ET₀. A moderate positive correlation with wind speed is also observed, as stronger winds enhance evaporation by removing the humid air layer near the surface.

To mitigate multicollinearity and improve model stability, a correlation-based feature selection procedure was applied. First, the Pearson correlation matrix was computed for all numerical variables to quantify pairwise linear associations. Variables exhibiting an absolute correlation coefficient greater than 0.9 were considered highly collinear. For each pair of strongly correlated variables, the most representative one was retained based on its domain relevance and interpretability, while the redundant counterpart was discarded. This process reduced feature redundancy and minimized the risk of instability in model parameter estimation.

The selection process prioritized variables based on their relevance and interpretability. For instance, given the strong correlation among NDVI, EVI, and SAVI, maximum NDVI (ndvi_max) and minimum SAVI (savi_min) were retained to capture both the peak vegetation status and a more stable baseline measurement without the redundancy of using all three highly correlated indices. Similarly, variables directly involved in the calculation of ET₀, such as temperature, solar radiation, humidity, and wind speed, were prioritized, ensuring that the model retained the most critical physical drivers.
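The correlation-based selection can be sketched as a greedy filter in pandas. The domain-driven choice of which variable to keep in each correlated pair is represented here by a hypothetical `priority` list ordered from most to least preferred; this encoding and the function name are assumptions for illustration. The signed Pearson matrix returned by `DataFrame.corr` corresponds to what the heatmaps in Figures 4 and 5 display; the filter uses its absolute values.

```python
import pandas as pd


def drop_collinear(df: pd.DataFrame, priority: list,
                   threshold: float = 0.9) -> pd.DataFrame:
    """Greedy multicollinearity filter driven by a preference order.

    priority: column names ordered from most to least preferred (domain relevance).
    A column is kept only if |r| <= threshold with every column already kept.
    """
    corr = df[priority].corr(method="pearson").abs()
    kept = []
    for col in priority:
        # Skip col if it is highly collinear with any higher-priority variable
        if all(corr.loc[col, other] <= threshold for other in kept):
            kept.append(col)
    return df[kept]
```

Because the filter is greedy, the outcome depends on the order of `priority`: placing a variable earlier guarantees it survives, which mirrors the text's preference for indices and drivers with a clear physical meaning.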

Figure 5 presents the updated correlation matrix after multicollinearity reduction.

Figure 5.

Correlation matrix after multicollinearity reduction.

The resulting dataset retained key indicators:

Climatic: Mean land surface temperature (lst_mean), solar radiation (solar_radiation_mean), humidity (humidity_min, humidity_max), wind speed (wind_speed_mean), and cloud cover (clouds_mean).

Vegetative: Representative indices such as maximum NDVI (ndvi\_max), minimum SAVI (savi\_min), and mean NDMI (ndmi\_mean).

ET₀: Primary target variable.

ET₀ analysis: The temporal evolution of ET₀ exhibits a well-defined seasonal cycle (see Figure 6), with values increasing sharply during spring, peaking between July and August, and subsequently declining toward the winter minimums observed in December–January. Daily ET₀ ranges from approximately 1 $\mathrm{mm \cdot d^{-1}}$ in winter to over 6 $\mathrm{mm \cdot d^{-1}}$ at summer peaks, closely mirroring annual variations in temperature, solar radiation, and atmospheric demand. Interannual variability is also evident, with the summer of 2023 displaying slightly elevated peak values that may be associated with anomalously warm or dry conditions in the region. Superimposed on these seasonal patterns are short-term fluctuations driven by synoptic-scale weather events, including episodes of cloud cover, precipitation, and strong winds, which introduce substantial noise into the raw series.

Figure 6.

ET₀ time series from January 2022 to March 2024 showing the seasonal cycle and interannual variability.

To improve the interpretability of the signal, a three-step preprocessing procedure was applied. First, abrupt and unrealistic changes exceeding 50% relative to the preceding day were flagged as outliers and removed, as these values were inconsistent with the expected seasonal evolution and likely originated from measurement artefacts. Missing values resulting from this filtering were then linearly interpolated to ensure temporal continuity while avoiding bias in subsequent modelling. Finally, a 7-day moving average was applied to smooth high-frequency noise, attenuating short-term variability while preserving the underlying seasonal cycle.

The effect of this procedure is shown in Figure 7, which compares the raw and pre-processed series. The smoothed curve highlights the principal seasonal dynamics and enables a clearer visualization of anomalous periods, such as extreme heat events, without the distraction of erratic day-to-day fluctuations. This preprocessing step thus provides a cleaner and more robust representation of ET₀, improving its suitability for downstream modelling and climatological interpretation.

Figure 7.

Comparison of raw ET₀ data (dashed line) and pre-processed series after outlier removal, interpolation, and 7-day smoothing (solid line).
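The three-step procedure can be sketched with pandas as follows. This is a minimal sketch under stated assumptions: the 50% rule is read literally as a change relative to the previous day's value, and the 7-day window is assumed to be centred, since the text does not say whether it is centred or trailing. The series below is a made-up example.

```python
import pandas as pd


def preprocess_et0(et0: pd.Series, max_rel_change: float = 0.5, window: int = 7) -> pd.Series:
    """Outlier removal, linear interpolation and moving-average smoothing."""
    prev = et0.shift(1)
    # Step 1: flag values that change by more than 50 % relative to the preceding day.
    # The day right after a spike can also exceed the threshold; step 2 refills it.
    rel_change = (et0 - prev).abs() / prev.abs()
    cleaned = et0.mask(rel_change > max_rel_change)
    # Step 2: linearly interpolate the removed values to restore temporal continuity.
    cleaned = cleaned.interpolate(method="linear", limit_direction="both")
    # Step 3: 7-day moving average (a centred window is an assumption).
    return cleaned.rolling(window=window, center=True, min_periods=1).mean()


# Hypothetical daily ET0 series (mm/day) with one artificial spike on day 4.
idx = pd.date_range("2022-01-10", periods=10, freq="D")
raw = pd.Series([2.0, 2.1, 2.2, 6.5, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8], index=idx)
smoothed = preprocess_et0(raw)
print(smoothed.round(2))
```

Here both the spike and the following day are masked, and interpolation between the neighbouring valid values removes the artefact before smoothing.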

ML and DL models

To evaluate the predictive capacity of data-driven approaches for ET₀ estimation, a set of ML and DL models was implemented and compared. Classical ML models were used as a performance baseline, while DL architectures were explored for their ability to capture non-linear relationships and temporal dependencies in input variables (Goodfellow et al., 2016; Hastie et al., 2009). This comparative approach enables a comprehensive assessment of the trade-off between model complexity and predictive performance.

ML models: Four classical ML algorithms were selected to provide interpretable performance baselines against which the added value of the more complex DL architectures could be benchmarked.

Linear regression (LR) was used as the simplest reference model, assuming a linear relationship between predictors and ET₀. The model can be expressed as:

Y = \beta_{0} + \sum_{j = 1}^{p} \beta_{j} X_{j} + \epsilon,

(2)

where $Y$ denotes ET₀, $X_{j}$ ($j = 1, \dots, p$) are the $p$ predictor variables, $\beta_{0}$ and $\beta_{j}$ are the intercept and regression coefficients, respectively, and $\epsilon$ represents the error term. Despite its simplicity, LR provides valuable insights into the direct, additive effects of the predictor variables.

Decision trees (DTs) were employed to model non-linear interactions through a hierarchical structure of binary splits. Their interpretability allows the most influential predictors to be identified; because they are prone to overfitting, pruning strategies were applied to control tree complexity (Quinlan, 1986).

Random forest regressors (RFR) extend DTs by building an ensemble of trees on bootstrap samples with random feature selection at each split, thereby reducing variance and improving generalization performance (Breiman, 2001). This model is particularly robust to noise and non-linear interactions in climatic data.

K-nearest neighbours (KNN) regressors were included as a non-parametric alternative capable of capturing localized relationships. Each prediction is obtained by averaging the ET₀ values of the $k$ historical samples whose weather patterns are most similar to the input, which makes KNN particularly sensitive to the neighbourhood size and to feature scaling (Fix and Hodges Jr, 1952).
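The four baselines can be instantiated with scikit-learn roughly as below. This is a sketch only: the text does not report the library or hyperparameters, so every value shown (tree depth, `ccp_alpha` for cost-complexity pruning, number of trees, `k`) is an assumption, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def build_baselines(random_state: int = 0) -> dict:
    """Classical baselines; every hyperparameter value here is an illustrative assumption."""
    return {
        "LR": LinearRegression(),
        # Depth cap plus cost-complexity pruning (ccp_alpha) to limit overfitting.
        "DT": DecisionTreeRegressor(max_depth=6, ccp_alpha=0.01, random_state=random_state),
        # Bootstrap samples (default) + a random feature subset at each split (max_features).
        "RFR": RandomForestRegressor(n_estimators=200, max_features=0.5, random_state=random_state),
        # KNN relies on distances, so features are standardized before the neighbour search.
        "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    }


# Hypothetical synthetic data standing in for the predictor table and ET0.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = 3.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.1 * rng.normal(size=150)

preds = {name: model.fit(X, y).predict(X) for name, model in build_baselines().items()}
```

Wrapping KNN in a scaling pipeline reflects the sensitivity to feature scaling noted above: without it, predictors with large numeric ranges would dominate the distance computation.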

DL models: These models allow a systematic comparison between interpretable, low-complexity approaches and highly flexible architectures capable of learning hierarchical and temporal representations from the data. The following models were implemented.

Multilayer perceptrons (MLPs) were used as the baseline neural architecture, consisting of fully connected layers with non-linear activation functions. MLPs can approximate arbitrary non-linear functions and thus provide a benchmark for evaluating whether additional temporal modelling improves performance (Rumelhart et al., 1986).

CNNs were adapted to handle univariate and multivariate input time series as one-dimensional sequences. Their convolutional filters learn local temporal patterns, such as short-term fluctuations or multi-day trends, which can significantly influence daily ET₀ (LeCun et al., 2002).

RNNs, including their gated variants, were employed to model long-term dependencies by maintaining a memory state across sequential inputs. This capability is particularly relevant for processes such as evapotranspiration, which are influenced by multi-day accumulations of solar radiation, humidity, and soil moisture (Elman, 1990).

Metrics

To comprehensively evaluate model performance, four complementary metrics were used (Hastie et al., 2009; James et al., 2013). These metrics capture different aspects of predictive accuracy and allow a robust comparison between models.

• Coefficient of determination (R²): This metric measures the proportion of variance in the observed data explained by the model. A value of 1 indicates a perfect fit, while 0 indicates that the model performs no better than predicting the mean of the observations; negative values occur when it performs worse than that.

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(3)

where $y_{i}$ is the observed value, ${\hat{y}}_{i}$ is the predicted value, $\bar{y}$ is the mean of the observed values, and $n$ is the number of observations.

Although widely used, a high R² value does not guarantee that the model has captured the true underlying relationship, especially for non-linear models. It should be interpreted alongside other metrics to assess a model's ability to generalize to new data.

• Mean squared error (MSE): MSE quantifies the average squared difference between observed and predicted values. It penalizes larger errors more heavily, making it sensitive to significant prediction errors, and is a common choice for a loss function during model training.

MSE = \frac{1}{n} \sum_{i = 1}^{n} (y_{i} - {\hat{y}}_{i})^{2}

(4)

• Root mean squared error (RMSE): RMSE is the square root of the MSE. It provides a measure of the typical magnitude of the error in the same units as the target variable (ET₀). Like MSE, it is sensitive to outliers because residuals are squared before averaging.

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(5)

• Mean absolute error (MAE): MAE computes the average absolute difference between predictions and actual values. Unlike MSE and RMSE, it does not disproportionately penalize large errors, making it more robust to outliers. Its value is in the same units as the target variable (ET₀), offering a more intuitive measure of the average error.

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |

(6)


The combined use of R², MSE, RMSE, and MAE provides a balanced assessment of model performance: R² evaluates overall explanatory power, MSE and RMSE highlight large deviations, and MAE delivers an intuitive measure of typical predictive error.
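Equations (3) to (6) translate directly into a few lines of NumPy. The sketch below is illustrative; the four ET₀ values are made up so that the results can be checked by hand.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """R², MSE, RMSE and MAE as defined in Eqs. (3)-(6)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    mse = ss_res / y_true.size
    return {
        "R2": 1.0 - ss_res / ss_tot,       # Eq. (3)
        "MSE": mse,                        # Eq. (4)
        "RMSE": np.sqrt(mse),              # Eq. (5)
        "MAE": np.mean(np.abs(resid)),     # Eq. (6)
    }


# Hand-checkable example with made-up ET0 values (mm/day).
m = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.0, 9.0])
print(m)  # R2 = 0.925, MSE = 0.375, RMSE ≈ 0.612, MAE = 0.5
```

The single 1.0 mm/day miss contributes two thirds of the squared error but only half of the absolute error, which illustrates why MSE and RMSE react more strongly to large deviations than MAE.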

Experimental design

A series of experiments was conducted to systematically assess the performance of the ML and DL models in predicting ET₀, using the metrics described in the Metrics section. To obtain robust performance estimates, all experiments were carried out using 10-fold cross-validation, which reduces the dependence of the results on a single train/test split, helps detect overfitting, and accounts for data variability.
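A 10-fold evaluation of one model can be sketched with scikit-learn's `KFold` and `cross_validate`. This is a sketch under assumptions: the text does not state whether folds were shuffled, so `shuffle` is exposed as a parameter, and RMSE is reported here as the mean of per-fold RMSE values. The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate


def cv_scores(model, X, y, n_splits: int = 10, shuffle: bool = True, seed: int = 0) -> dict:
    """Mean R², MSE, RMSE and MAE over k folds."""
    cv = KFold(n_splits=n_splits, shuffle=shuffle, random_state=seed if shuffle else None)
    res = cross_validate(
        model, X, y, cv=cv,
        scoring=("r2", "neg_mean_squared_error", "neg_mean_absolute_error"),
    )
    # scikit-learn maximizes scores, so error metrics come back negated.
    fold_mse = -res["test_neg_mean_squared_error"]
    return {
        "R2": res["test_r2"].mean(),
        "MSE": fold_mse.mean(),
        "RMSE": np.sqrt(fold_mse).mean(),  # mean of per-fold RMSE
        "MAE": -res["test_neg_mean_absolute_error"].mean(),
    }


# Hypothetical synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.2 * rng.normal(size=200)
scores = cv_scores(RandomForestRegressor(n_estimators=100, random_state=0), X, y)
print({k: round(v, 3) for k, v in scores.items()})
```

With daily series, shuffled folds place neighbouring days in both training and test sets, which can make scores optimistic; `shuffle=False` yields contiguous temporal blocks instead.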

The experimental protocol was structured in three sequential stages. First, plot-specific modelling was performed, in which models were trained and validated independently for each experimental plot to establish a performance baseline under homogeneous conditions. Second, a generic modelling phase was executed, pooling data from all plots and applying feature selection and multicollinearity reduction techniques to evaluate the effect of input dimensionality on model generalization. Finally, a model optimization phase was conducted, focusing on hyperparameter tuning and architectural refinement of the best-performing models to maximize predictive performance.

First experiment: Plot-specific modelling and outlier impact. The first experimental stage focused on plot-specific modelling to establish a baseline for predictive performance and assess model generalizability across different data sources and geographical locations. For each of the three agricultural plots, two independent datasets were used: one derived from satellite RS products and the other from OpenWeather meteorological records. This setup allowed the models to be evaluated on both high-resolution geospatial data and gridded climate service data. A descriptive summary of the data used in this experiment is provided in Table 7.

Table 7.

Descriptive summary of the study data for experiment 1.

Source	Plot	Values	Temporal Frequency	Start Date	End Date
Satellite	1	128	Every 5 days	05-01-2022	31-12-2023
Satellite	2	130	Every 5 days	05-01-2022	31-12-2023
Satellite	3	174	Every 2-3 days	03-01-2022	19-06-2023
OpenWeather	1/2	524	Daily	10-01-2022	07-03-2024
OpenWeather	3	524	Daily	10-01-2022	20-01-2024

Three baseline DL architectures were implemented to model the relationship between the input variables and ET₀: an MLP, a 1D-CNN, and a SimpleRNN.

The MLP consisted of three fully connected layers with 64, 32, and 1 units, respectively. ReLU activation functions were applied to the hidden layers, and the model was optimized using Adam (learning rate = 0.001) with MSE as the loss function.

The 1D-CNN architecture used a convolutional layer with 256 filters and a kernel size of 2, followed by ReLU activation, flattening, and two dense layers (64 units and a single output neuron). This architecture was designed to extract short-term temporal features from the input series.
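The data flow through this baseline 1D-CNN can be traced with a NumPy forward pass. This sketch uses untrained random weights and only illustrates tensor shapes and operations. Assumptions: an input window of 5 time steps by 1 feature (borrowed from the CNN_3 input shape reported for Experiment 3), 'valid' padding, and ReLU on the 64-unit dense layer, none of which is stated for the baseline.

```python
import numpy as np


def conv1d_valid(x, w, b):
    """1D cross-correlation with 'valid' padding.
    x: (T, C_in), w: (k, C_in, C_out), b: (C_out,) -> (T - k + 1, C_out)."""
    k = w.shape[0]
    windows = [np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1]))
               for t in range(x.shape[0] - k + 1)]
    return np.stack(windows) + b


def relu(z):
    return np.maximum(z, 0.0)


def cnn_forward(x, p):
    h = relu(conv1d_valid(x, p["conv_w"], p["conv_b"]))  # (T - 1, 256) for kernel size 2
    h = h.reshape(-1)                                      # flatten
    h = relu(h @ p["d1_w"] + p["d1_b"])                    # dense, 64 units (ReLU assumed)
    return h @ p["out_w"] + p["out_b"]                     # single linear output neuron


# Assumed input window: 5 time steps x 1 feature (untrained random weights).
T, C_IN, FILTERS, K = 5, 1, 256, 2
rng = np.random.default_rng(0)
params = {
    "conv_w": rng.normal(0.0, 0.1, (K, C_IN, FILTERS)), "conv_b": np.zeros(FILTERS),
    "d1_w": rng.normal(0.0, 0.05, ((T - K + 1) * FILTERS, 64)), "d1_b": np.zeros(64),
    "out_w": rng.normal(0.0, 0.1, (64, 1)), "out_b": np.zeros(1),
}
x = rng.normal(size=(T, C_IN))
print(conv1d_valid(x, params["conv_w"], params["conv_b"]).shape)  # (4, 256)
print(cnn_forward(x, params).shape)  # (1,)
```

With a kernel size of 2, each filter responds to a pair of consecutive time steps, which is how the layer picks up the short-term temporal features mentioned above.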

The SimpleRNN model comprised three stacked recurrent layers with output sizes of 5, 10, and 20, respectively, followed by a dense output layer. This configuration was chosen to capture sequential dependencies and memory effects in the input time series, which are relevant for evapotranspiration modelling.
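The stacked recurrent structure can likewise be sketched as a NumPy forward pass with untrained weights. The recurrence is the standard Elman form with tanh activation (the default activation of the Keras `SimpleRNN` layer, whose name the text uses). The input window of 7 steps by 8 predictors is hypothetical, and each lower layer is assumed to pass its full state sequence to the next one.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_rnn_layer(n_in, n_units):
    return {
        "W": rng.normal(0.0, 0.3, (n_in, n_units)),     # input-to-hidden weights
        "U": rng.normal(0.0, 0.3, (n_units, n_units)),  # recurrent (memory) weights
        "b": np.zeros(n_units),
    }


def simple_rnn(x_seq, layer):
    """Elman recurrence h_t = tanh(x_t W + h_{t-1} U + b); returns all states (T, units)."""
    h = np.zeros(layer["U"].shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(x_t @ layer["W"] + h @ layer["U"] + layer["b"])
        states.append(h)
    return np.stack(states)


def stacked_rnn_forward(x_seq, layers, w_out, b_out):
    h_seq = x_seq
    for layer in layers:
        h_seq = simple_rnn(h_seq, layer)  # each layer reads the full state sequence below it
    return h_seq[-1] @ w_out + b_out      # dense output from the top layer's last state


N_FEATURES, WINDOW = 8, 7   # hypothetical input window: 7 steps x 8 predictors
SIZES = [5, 10, 20]         # baseline; Experiment 3 uses [5, 10, 20, 20, 10]
layers, n_in = [], N_FEATURES
for units in SIZES:
    layers.append(init_rnn_layer(n_in, units))
    n_in = units
w_out, b_out = rng.normal(0.0, 0.1, (SIZES[-1], 1)), np.zeros(1)

x_seq = rng.normal(size=(WINDOW, N_FEATURES))
print(stacked_rnn_forward(x_seq, layers, w_out, b_out).shape)  # (1,)
```

The matrix `U` carries the hidden state from one time step to the next; this is the memory mechanism that distinguishes the RNN from the fixed-width convolution windows of the 1D-CNN.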

Second experiment: Generic modelling with multicollinearity mitigation. Building on Experiment 1, the second experimental stage aimed to build a generalized modelling framework by consolidating the satellite-derived data from all plots into a single dataset and applying the feature reduction procedure described in the ‘Data processing’ section. This approach was designed to evaluate whether a single model could predict ET₀ across heterogeneous locations without plot-specific retraining, thus improving scalability and generalization. Three modelling scenarios were explored. In the first scenario, all available variables were retained after standard preprocessing and outlier removal. The second scenario applied statistical filtering, discarding variables with a p-value greater than 0.02 in an ordinary least squares (OLS) regression analysis. In the third scenario, multicollinearity was further mitigated by computing Pearson correlation coefficients among predictors and retaining only the most representative feature from each highly correlated group. This sequential feature reduction aimed to balance predictive performance with model parsimony and numerical stability.

OLS regression was employed to quantify variable importance and assess model diagnostics under each scenario. The initial model (Scenario 1) achieved an R² of 0.80 but exhibited severe multicollinearity and autocorrelation. Removing non-significant variables (Scenario 2) slightly reduced R² to 0.78, while worsening multicollinearity due to the disproportionate influence of a smaller set of correlated predictors. The final approach (Scenario 3), which included multicollinearity elimination, yielded a robust yet slightly lower R² of 0.77 and markedly improved numerical stability, as evidenced by a reduced condition number of 918. This trade-off was considered acceptable as it produced a more reliable model for downstream prediction tasks. The consolidated dataset used in this experiment is summarized in Table 8, while the neural network architectures remained identical to those implemented in the first experiment.

Table 8.

Descriptive summary of the consolidated satellite dataset used in experiment 2.

Source	Values	Temporal Frequency	Start Date	End Date
Satellite	402	Every 2–5 days	20-01-2022	19-06-2023
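The OLS screening of Scenario 2, together with the condition-number diagnostic, can be sketched with NumPy and SciPy. This is a generic textbook OLS implementation, not the authors' software. The predictor `noise_var` and all data are hypothetical, while the 0.02 threshold follows the text.

```python
import numpy as np
from scipy import stats


def ols_screen(X, y, names, p_max=0.02):
    """Fit OLS with intercept; return p-values, R², condition number and kept names."""
    Xc = np.column_stack([np.ones(len(y)), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    n, k = Xc.shape
    sigma2 = resid @ resid / (n - k)                    # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xc.T @ Xc)))
    t_stat = beta / se
    p_values = 2.0 * stats.t.sf(np.abs(t_stat), df=n - k)   # two-sided t-test per coefficient
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    cond = np.linalg.cond(Xc)                           # largest / smallest singular value
    pvals = dict(zip(names, p_values[1:]))              # skip the intercept
    kept = [name for name in names if pvals[name] <= p_max]
    return pvals, r2, cond, kept


# Hypothetical data: two informative predictors and one irrelevant one.
rng = np.random.default_rng(1)
n = 300
lst = rng.normal(25.0, 5.0, n)
rad = rng.normal(20.0, 4.0, n)
noise = rng.normal(0.0, 1.0, n)
y = 0.15 * lst + 0.10 * rad + rng.normal(0.0, 0.3, n)
X = np.column_stack([lst, rad, noise])
names = ["lst_mean", "solar_radiation_mean", "noise_var"]

pvals, r2, cond, kept = ols_screen(X, y, names)
print(kept, round(r2, 3), round(cond, 1))
```

The condition number depends on the scale of the predictors as well as on their collinearity, so values are only comparable between scenarios computed on the same (unscaled or scaled) inputs.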

Third experiment: Optimization of CNN and RNN architectures. The third experimental stage focused on optimizing the baseline CNN and RNN architectures to assess the effect of network depth, architectural complexity, and regularization on predictive performance. The consolidated satellite dataset from Experiment 2 was reused to ensure comparability. The optimization process explored modifications such as stacking additional convolutional or recurrent layers, adjusting the number of filters and units, and incorporating dropout and pooling layers to reduce overfitting and improve generalization.

Three CNN variants and one extended RNN model were evaluated. The baseline CNN_1 replicated the configuration used in previous experiments, consisting of a single 1D convolutional layer with 256 filters and kernel size of 2, ReLU activation, and two fully connected layers with 64 and 1 units, respectively. The second variant, CNN_2, introduced regularization through a dropout layer and increased depth by stacking two convolutional layers (128 and 64 filters), followed by max pooling, flattening, and two dense layers (32 units and one output). The most complex configuration, CNN_3, comprised three sequential convolutional layers (64 filters each) operating on an input shape of (5, 1), followed by a global max pooling layer, a dense layer with 64 units, dropout regularization, and a final output neuron.

For recurrent modelling, an extended RNN architecture was implemented that expands on the baseline SimpleRNN by stacking five recurrent layers whose hidden sizes first increase and then decrease (5, 10, 20, 20, and 10 units). This deeper recurrent network was designed to capture richer temporal dependencies and multi-scale patterns within the time series before passing activations to the final dense output layer.

This optimization phase enabled a systematic exploration of how network depth, feature extraction capacity, and regularization strategies influence ET₀ prediction performance, ultimately identifying architectures that best balance model expressiveness and generalization capability.

Model interpretability: SHAP analysis

To address the need for model transparency and identify the primary drivers of ET₀ estimation, an interpretability analysis was performed using SHAP. While DL models like RNNs and CNNs often operate as ‘black boxes’, SHAP values provide a mathematically consistent way to attribute the contribution of each input feature to the final prediction. This analysis was conducted using XGBoost as a surrogate model due to its high performance in complex predictive tasks (Chen and Guestrin, 2016), its proven efficiency in meteorological applications, and its computational compatibility with the SHAP framework (specifically through TreeSHAP) (Lundberg and Lee, 2017). This step makes it possible to check whether the model's predictive logic is consistent with established thermodynamic and aerodynamic principles of evapotranspiration.
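SHAP values are Shapley values from cooperative game theory. The brute-force sketch below shows what they measure. It is not TreeSHAP and not the authors' pipeline: it enumerates every feature coalition (exponential cost, so it is only practical for a handful of features) and uses a simplified value function in which "absent" features are set to a single baseline vector. TreeSHAP instead uses expectations over the training data, so its numbers would differ. The toy model is hypothetical.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x by enumerating all feature coalitions.
    'Absent' features are set to a single baseline vector (a simplification)."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    p = len(x)
    phi = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for size in range(p):
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for subset in itertools.combinations(others, size):
                z = baseline.copy()
                for i in subset:
                    z[i] = x[i]               # coalition members take their observed value
                f_without = f(z)
                z[j] = x[j]                   # add feature j to the coalition
                phi[j] += weight * (f(z) - f_without)
    return phi


# Toy model with an interaction term (hypothetical, not the paper's surrogate).
def model(z):
    return 2.0 * z[0] + z[1] * z[2]


baseline = np.zeros(3)
samples = np.array([[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]])
phis = np.array([exact_shapley(model, s, baseline) for s in samples])
# Local accuracy: attributions sum to f(x) - f(baseline) for every sample.
print(phis[0], phis[0].sum(), model(samples[0]) - model(baseline))
# Global importance as the mean absolute SHAP value per feature.
print(np.mean(np.abs(phis), axis=0))
```

The interaction term is split equally between the two features involved, and averaging absolute attributions over samples gives the kind of global importance score (mean |SHAP|) reported for lst\_mean in the Results.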

Results and discussion

First experiment: Plot-specific modelling and outlier impact

The first experimental stage aimed to establish a performance baseline by training and evaluating models on plot-specific datasets derived from satellite and OpenWeather data. This experiment was conducted in two phases: initially using the raw datasets, and subsequently with outliers removed to quantify their effect on predictive accuracy. As shown in Tables 9 and 10, outlier removal improved the performance of most models in all three plots, generally yielding higher R² values and lower MSE. The CNN consistently outperformed the other models on the cleaned satellite data, achieving the highest R² (0.93) and the lowest MSE (0.21) in Plot 2. On the cleaned data, both CNN and RNN architectures outperformed the classical ML models (DT, KNN, LR, RFR) in every plot in terms of R² and MSE, supporting the value of DL for capturing nonlinear relationships and temporal dependencies in ET₀ dynamics.

Table 9.

Experiment 1 results for raw satellite data across the three plots.

	P1				P2				P3
	MSE	RMSE	MAE	R²	MSE	RMSE	MAE	R²	MSE	RMSE	MAE	R²
DT	0.78	0.88	0.48	0.77	0.66	0.81	0.47	0.53	0.66	0.81	0.47	0.78
KNN	0.79	0.89	0.64	0.77	0.71	0.84	0.64	0.77	0.71	0.84	0.64	0.77
LR	0.83	0.91	0.71	0.76	0.64	0.80	0.63	0.79	0.64	0.80	0.63	0.79
RFR	1.31	1.15	0.80	0.62	0.52	0.72	0.53	0.83	0.70	0.84	0.53	0.83
MLP	0.62	0.79	0.56	0.81	0.68	0.82	0.64	0.77	0.52	0.72	0.64	0.77
CNN	0.80	0.89	0.51	0.77	0.39	0.62	0.44	0.88	0.35	0.59	0.42	0.89
RNN	0.50	0.71	0.49	0.86	0.49	0.70	0.51	0.84	0.52	0.72	0.50	0.81

MLP: multilayer perceptron; CNN: convolutional neural network; RNN: recurrent neural network; DT: decision tree; RMSE: root mean square error; LR: linear regression; RFR: random forest regressor; KNN: K-nearest neighbour; MSE: mean squared error; MAE: mean absolute error; R²: coefficient of determination.

Table 10.

Experiment 1 results for satellite data after outlier removal.

	P1				P2				P3
	MSE	RMSE	MAE	R²	MSE	RMSE	MAE	R²	MSE	RMSE	MAE	R²
DT	0.56	0.75	0.40	0.84	0.46	0.68	0.38	0.82	0.54	0.73	0.39	0.80
KNN	0.70	0.84	0.62	0.79	0.63	0.79	0.61	0.78	0.60	0.77	0.56	0.79
LR	0.57	0.75	0.57	0.82	0.46	0.68	0.54	0.84	0.63	0.79	0.60	0.78
RFR	0.60	0.77	0.56	0.81	0.53	0.73	0.55	0.81	0.48	0.69	0.47	0.83
MLP	0.69	0.83	0.66	0.80	0.76	0.87	0.68	0.73	0.61	0.78	0.60	0.78
CNN	0.33	0.57	0.37	0.91	0.21	0.46	0.33	0.93	0.37	0.61	0.38	0.86
RNN	0.41	0.64	0.48	0.88	0.39	0.62	0.46	0.87	0.44	0.66	0.47	0.84

MLP: multilayer perceptron; CNN: convolutional neural network; RNN: recurrent neural network; RFR: random forest regressor; DT: decision tree; RMSE: root mean square error; LR: linear regression; KNN: K-nearest neighbour; MSE: mean squared error; MAE: mean absolute error; R²: coefficient of determination.

When using OpenWeather data, models achieved slightly higher R² values overall (see Table 11), benefiting from the larger sample size and daily temporal resolution. However, the performance gap relative to the satellite-based models was marginal, and given the broader accessibility and reproducibility of open satellite data, subsequent experiments focused exclusively on the satellite-derived dataset.

Table 11.

Experiment 1 results for OpenWeather data.

	P1/2				P3
	MSE	RMSE	MAE	R²	MSE	RMSE	MAE	R²
DT	0.33	0.57	0.41	0.91	0.40	0.63	0.43	0.86
KNN	0.43	0.66	0.50	0.88	0.41	0.64	0.49	0.86
LR	0.19	0.44	0.34	0.95	0.24	0.49	0.37	0.92
RFR	0.21	0.46	0.35	0.94	0.25	0.50	0.37	0.91
MLP	0.21	0.46	0.34	0.94	0.24	0.49	0.38	0.92
CNN	0.16	0.40	0.29	0.95	0.21	0.46	0.35	0.93
RNN	0.17	0.41	0.31	0.95	0.22	0.47	0.35	0.92

Second experiment: Generic modelling with multicollinearity mitigation

The second experimental stage aimed to construct a generalized modelling framework by pooling satellite data from all plots into a single consolidated dataset. This experiment was designed to evaluate the impact of preprocessing and feature selection on model performance, with the goal of producing a robust model applicable across heterogeneous locations without plot-specific retraining. Three modelling scenarios were compared. In Scenario 1, all available variables were retained following outlier cleaning. Scenario 2 employed statistical filtering by discarding variables with p-values greater than 0.02 as determined by an OLS regression analysis. Scenario 3 extended this approach by also mitigating multicollinearity, removing redundant predictors identified through pairwise Pearson correlations with $|r| > 0.9$ and retaining only the most representative features. This stepwise process progressively improved model interpretability and numerical stability, enabling a direct comparison of predictive performance under varying input dimensionality.

OLS regression was used not only to assess variable significance but also to diagnose model quality, under the assumptions of linearity, independence, homoscedasticity, and normality of residuals. The initial model (Scenario 1) achieved an R² of 0.80 but suffered from strong multicollinearity and autocorrelation effects, as indicated by an elevated condition number. Removing non-significant variables (Scenario 2) slightly reduced R² to 0.78 and, paradoxically, exacerbated multicollinearity by increasing the relative influence of correlated predictors. The final scenario (Scenario 3) resulted in a slightly lower R² of 0.77 but markedly reduced the condition number to 918, indicating improved numerical stability. This trade-off was considered favourable, as the resulting model offered a more interpretable and statistically sound baseline.

Tables 12, 13, and 14 summarize model performance under the three scenarios. In all three scenarios, the best-performing model was a DL architecture (CNN in Scenarios 1 and 3, RNN in Scenario 2), although RFR remained competitive in Scenarios 2 and 3 (R² = 0.83 and 0.82). Multicollinearity mitigation led to only a moderate decrease in DL performance (CNN R² from 0.88 to 0.83; RNN R² from 0.86 to 0.82), suggesting that these architectures retain most of their predictive capacity with a more parsimonious, better-conditioned set of inputs.

Table 12.

Experiment 2 results after outlier cleaning (scenario 1).

	MSE	RMSE	MAE	R²
DT	0.63	0.79	0.46	0.80
KNN	0.62	0.79	0.59	0.80
LR	0.72	0.85	0.66	0.77
RFR	0.48	0.69	0.49	0.85
MLP	0.77	0.88	0.70	0.75
CNN	0.38	0.62	0.43	0.88
RNN	0.43	0.66	0.47	0.86

Table 13.

Experiment 2 results after outlier cleaning and removal of non-significant variables (scenario 2).

	MSE	RMSE	MAE	R²
DT	0.75	0.87	0.51	0.76
KNN	0.65	0.81	0.60	0.79
LR	0.69	0.83	0.65	0.78
RFR	0.53	0.73	0.53	0.83
MLP	0.75	0.87	0.68	0.76
CNN	0.55	0.74	0.58	0.82
RNN	0.50	0.71	0.53	0.84

Table 14.

Experiment 2 results after outlier cleaning, variable filtering, and multicollinearity mitigation (scenario 3).

	MSE	RMSE	MAE	R²
DT	0.81	0.90	0.55	0.74
KNN	0.67	0.82	0.62	0.78
LR	0.73	0.85	0.69	0.76
RFR	0.80	0.89	0.55	0.82
MLP	0.55	0.74	0.71	0.74
CNN	0.53	0.73	0.55	0.83
RNN	0.54	0.73	0.56	0.82

Third experiment: Optimization of CNN and RNN architectures

The third experimental stage focused on refining the baseline DL architectures to maximize predictive performance. Specifically, we investigated the effect of architectural modifications (increasing network depth, adding convolutional layers, and applying regularization strategies such as dropout) on the ability of CNN and RNN models to capture the temporal dynamics of ET₀. All models were trained and evaluated on the consolidated satellite dataset from Experiment 2 to ensure comparability. Table 15 summarizes the results of this optimization study. Among the evaluated models, the optimized RNN achieved the best performance, with an R² of 0.92, an MSE of 0.23, and an MAE of 0.32. These results confirm the ability of recurrent architectures to model sequential dependencies in agroclimatic data, outperforming all CNN configurations and the RFR. Notably, CNN_3, which incorporated multiple convolutional layers and dropout regularization, exhibited a marked improvement over the baseline CNN_1 (R² of 0.87 versus 0.76), reinforcing the benefit of deeper architectures for feature extraction.

Table 15.

Experiment 3 results after CNN/RNN architecture optimization.

	MSE	RMSE	MAE	R²
RFR	0.60	0.78	0.56	0.81
CNN_1	0.70	0.84	0.66	0.76
CNN_2	0.42	0.65	0.47	0.87
CNN_3	0.39	0.63	0.43	0.87
RNN	0.23	0.48	0.32	0.92

CNN: convolutional neural network; RNN: recurrent neural network; RFR: random forest regressor; RMSE: root mean square error; MSE: mean squared error; MAE: mean absolute error; R²: coefficient of determination.

Figures 8, 9, and 10 show these findings. The parity plot for the RNN (see Figure 8) shows predictions closely aligned with the 1:1 reference line, indicating minimal systematic bias. The histogram of residuals (see Figure 9) displays a sharp concentration of errors around zero, highlighting the model's high overall accuracy. Finally, the time series example from a cross-validation fold (see Figure 10) demonstrates that the RNN effectively captures intra-seasonal patterns and peak evapotranspiration events, though slight under- and over-predictions remain visible at extreme values, suggesting potential areas for further refinement.

Figure 8.

Parity plot comparing predicted and observed ET₀ values for the optimized recurrent neural network (RNN).

Figure 9.

Histogram of residuals for the optimized recurrent neural network (RNN) model, showing error distribution concentrated near zero.

Figure 10.

Example of predicted versus observed ET₀ values in one cross-validation iteration.

These results confirm that optimizing DL architectures, particularly through deeper recurrent networks, yields substantial gains in predictive accuracy. The strong performance of the RNN highlights its potential as a core modelling approach for sequential agroclimatic prediction tasks and for supporting operational decision-making in precision irrigation management.

Interpretability analysis and biophysical consistency

The exceptional predictive performance of the RNN model (R² = 0.92) is rooted in its ability to internalize complex land-atmosphere interactions. To examine this, a SHAP analysis was conducted to bridge the gap between the network's ‘black box’ nature and the underlying physical processes. As illustrated in Figure 11, the model exhibits a clear hierarchical structure dominated by LST (lst\_mean). Acting as the primary forcing variable, LST yields a mean absolute SHAP value of 1.33, substantially outweighing the secondary spectral predictors.

Figure 11.

SHAP interpretation of the RNN model. (a) Global feature importance highlighting the dominance of lst\_mean. (b) Summary plot showing the directional impact of biophysical predictors. The R² = 0.92 is supported by the strong alignment of thermal (LST) and vegetation (EVI2/NDVI) drivers with established ecosystem dynamics. RNN: recurrent neural network; SHAP: SHapley Additive exPlanations; NDVI: normalized difference vegetation index; EVI: enhanced vegetation index; LST: land surface temperature.

The SHAP summary plot reveals a robust biophysical alignment that supports the model's high accuracy. The positive association between elevated LST values (red points) and increased model output confirms that the RNN correctly identifies thermal energy as the primary driver. Interestingly, the model captures divergent behaviours among vegetation indices: while evi2\_mean and evi\_mean contribute positively, high values of ndvi\_mean push the output downward. This suggests that the RNN is not merely over-fitting spectral greenness; rather, it is likely accounting for the ‘saturation effect’ of NDVI or the evaporative cooling feedback loops typically associated with dense canopy cover.

Furthermore, the inclusion of the moisture index (ndmi\_mean) provides a critical baseline for water availability, allowing the model to distinguish between energy-limited and water-limited regimes. The convergence of these factors (thermal forcing, structural vegetation dynamics, and moisture constraints) within the RNN's temporal learning framework helps explain its superior generalization capability. These findings suggest that the R² of 0.92 is not a product of spurious correlation but reflects the model's capacity to mirror the non-linear physical principles governing the ecosystem.

Discussion

This study provides robust evidence supporting the use of DL methodologies for ET₀ estimation based solely on open-access data. Our results demonstrate that RNNs deliver superior predictive performance, achieving a coefficient of determination of R² = 0.92, outperforming CNNs, MLPs, and classical ML models. The advantage of RNNs can be attributed to their architectural capacity to model temporal dependencies, enabling them to capture lagged interactions and cumulative effects of climatic variables that drive evapotranspiration processes. Preserving temporal context is crucial for accurately characterizing intra- and inter-seasonal dynamics. This result also aligns with findings reported in previous studies on sequential environmental modelling.

The high predictive accuracy (R² = 0.92) is underpinned by the model's biophysical consistency. By prioritizing LST as the primary forcing variable, the RNN aligns with energy-balance principles, where thermal dynamics drive the modelled processes. Furthermore, the divergent response between NDVI and EVI2 in the SHAP analysis is scientifically significant: it indicates that the RNN has learned to mitigate the saturation effects typical of NDVI in dense vegetation. By counter-balancing these indices, the architecture achieves a more robust generalization, supporting the view that its high performance results from capturing complex land-surface interactions rather than from simple numerical overfitting.

The superior performance of the optimized RNN over the CNNs can be explained by the temporal nature of ET₀ dynamics. Evapotranspiration is strongly influenced by multi-day accumulations of radiation, humidity, and soil moisture. While 1D-CNNs capture local temporal windows, they lack the recurrent memory state maintained by RNNs. This may explain why the optimized RNN achieved the highest accuracy in Experiment 3, even though neither baseline architecture consistently outperformed the other in Experiments 1 and 2.

An important methodological contribution of this work lies in the systematic treatment of input features. Through OLS regression analysis and correlation-based feature reduction, we eliminated non-significant and highly collinear predictors, improving model interpretability and computational efficiency while largely maintaining predictive accuracy. This stepwise feature selection strategy not only reduced redundancy but also mitigated numerical instabilities, as indicated by the reduced condition number in Experiment 2. These improvements were particularly relevant for linear models, but they also provided cleaner, better-conditioned inputs for the DL architectures, which may support their generalization ability.

While recent advances in DL have introduced hybrid models and metaheuristic-optimized frameworks (such as long short-term memory (LSTM) networks or ANN-GWO) that achieve high precision, these typically depend on high-quality data from local physical meteorological stations. Our study diverges from this trend by focusing on an ‘infrastructure-free’ paradigm. We explore how DL architectures can compensate for the lack of in-situ sensors by effectively fusing heterogeneous, open-access data from satellite imagery and global climate services.

Despite these advances in hybrid and metaheuristic-optimized modelling, infrastructure-free approaches to ET₀ estimation remain scarce. The main contribution of this research is therefore the demonstration of an infrastructure-independent, reproducible approach for ET₀ estimation. By relying exclusively on RS products and publicly available climate service data, our models match or exceed the accuracy of approaches based on traditional meteorological station networks and, in some cases, surpass hybrid models that combine ground-based and RS inputs. This finding is particularly significant for regions with limited meteorological infrastructure, where installing and maintaining stations is logistically and financially challenging. The proposed framework thus offers a scalable, cost-effective solution for operational water resource management.

Beyond methodological advances, the implications for agricultural practice are considerable. Accurate and spatially explicit ET₀ predictions enable improved irrigation scheduling, promote efficient water allocation, and contribute to sustainable agricultural production under increasing climate variability. In this context, our results support the integration of DL models, particularly RNN-based architectures, into digital agriculture platforms and decision-support systems, fostering data-driven strategies for enhancing water-use efficiency and climate resilience in irrigated systems.

It is worth noting that classical models such as LR and RFR remain valuable in contexts with limited datasets, where training speed, computational simplicity, or interpretability are priorities. In operational settings with scarce data or resource constraints, these models can still provide actionable insights, albeit with lower accuracy than DL approaches.

While simpler models such as the M5 model tree have shown competitive performance in specific case studies owing to their transparent structure and lower computational requirements (Dadrasajirlou et al., 2022), they may lack the capacity to fully capture the complex, non-linear, multidimensional spatio-temporal relationships inherent in satellite-derived data. Our findings suggest that, for an infrastructure-free framework relying on RS, the superior feature-extraction capabilities of DL architectures (RNNs and CNNs) provide a meaningful advantage over these simpler tree-based models.

Conclusions and future work

This research clarifies the trade-off between model complexity and data accessibility. We have shown that the inherent ability of RNNs to handle long-term dependencies in climatic time series enables ET₀ estimation that is independent of local infrastructure, representing a significant shift from traditional hybrid models that are constrained to sensor-dense environments.

Building on this premise, our study establishes a robust and reproducible DL framework for estimating ET₀ exclusively from open-access RS and climate service data. Among the evaluated approaches, RNNs consistently yielded the best performance, achieving an R² of 0.92, confirming their ability to capture the temporal dependencies inherent in agroclimatic time series.
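
For reference, R² here denotes the coefficient of determination, R² = 1 − SS_res/SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the observed mean. A minimal Python sketch with hypothetical ET₀ values (mm/day) follows; it implements the standard definition, which libraries such as scikit-learn (`sklearn.metrics.r2_score`) also compute, but it is not necessarily the implementation used in this study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when observations have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily ET0 observations and model estimates (mm/day)
observed = [2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [2.5, 3.0, 3.5, 5.0, 6.0]
print(round(r_squared(observed, predicted), 3))
```

For these values SS_res = 0.5 and SS_tot = 10, so the function returns 0.95. R² computed this way can be negative for a poor model and is not, in general, equal to the squared Pearson correlation between observations and predictions.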

This high predictive accuracy is further supported by the SHAP interpretability analysis, which indicates that the model's internal logic is consistent with biophysical principles. By identifying LST (lst_mean) as the primary driving force and effectively integrating vegetation and moisture dynamics, the RNN exhibits physically consistent behaviour. This suggests that the model is not merely a ‘black-box’ numerical fit but a meaningful representation of the energy-balance processes governing evapotranspiration.
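
SHAP attributions approximate Shapley values from cooperative game theory (Lundberg and Lee, 2017). For a model with only a few inputs, they can be computed exactly by enumerating feature coalitions and replacing absent features with a baseline value. The Python sketch below does this for a hypothetical three-input surrogate whose variable names merely echo the discussion; it is not the trained RNN, and the baseline-replacement scheme is one of several possible value functions.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                subset = set(s)
                total += weight * (value(subset | {i}) - value(subset))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical surrogate: an LST-like input dominates, plus one interaction term."""
    lst, ndvi, moisture = z
    return 0.8 * lst + 0.2 * ndvi - 0.1 * moisture + 0.05 * lst * ndvi


x = [1.0, 0.5, 0.4]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 4) for p in phi])
```

The attributions are 0.8125, 0.1125, and −0.04: each linear term receives its own contribution, and the 0.025 interaction between the first two inputs is split equally between them. They sum exactly to f(x) − f(baseline) = 0.885, the efficiency property that SHAP preserves. Because exact enumeration scales as 2ⁿ in the number of features, the SHAP library relies on model-specific or sampling-based approximations for larger models.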

Importantly, this level of accuracy was obtained without relying on conventional ground-based meteorological stations, validating the feasibility of an infrastructure-independent approach to ET₀ modelling. A key methodological contribution of this study lies in the rigorous feature selection process based on OLS regression and multicollinearity reduction. This procedure not only improved model interpretability and computational efficiency but also enhanced numerical stability, providing a robust baseline for evaluating more complex models. Collectively, these contributions advance the state of the art by delivering a scalable, cost-effective solution for water resource monitoring that is particularly relevant for regions with limited or absent meteorological infrastructure.

The broader implications of this work are significant. The proposed methodology:

Facilitates more informed irrigation scheduling,

Improves water-use efficiency, and

Supports sustainable agricultural production in the face of increasing climate variability.

By leveraging open data, the approach democratizes access to advanced decision-support tools, enabling equitable and data-driven water management at regional and global scales.

Building upon these findings, future research will pursue several avenues. First, we plan to expand the dataset to include additional agroecological zones, climatic regimes, and crop types worldwide, thereby enhancing model generalizability and enabling robust performance across diverse production systems. Second, we will investigate the integration of additional explanatory variables, such as high-resolution soil moisture measurements, soil physical and chemical characteristics, and crop-specific coefficients, to extend the modelling framework from ET₀ to actual crop evapotranspiration (ETc). Finally, we will explore more advanced neural architectures, including LSTM networks and Transformer-based models, to better capture long-range dependencies and improve forecasting accuracy. The ultimate objective is to transition from ET₀ estimation to predictive modelling capable of supporting proactive irrigation management and adaptive strategies for climate-resilient agriculture.
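
As a sketch of the step from ET₀ to crop water demand, the FAO-56 single crop coefficient approach (Allen et al., 1998) gives crop evapotranspiration under standard conditions as ETc = Kc × ET₀. The Python code below applies it to a hypothetical three-day ET₀ forecast. The Kc value of 1.05 and the rule that subtracts effective rainfall are illustrative assumptions, and the sketch ignores the soil water balance, irrigation efficiency, and the water stress coefficient Ks that FAO-56 uses to adjust ETc under non-standard conditions.

```python
def crop_et(et0_mm: float, kc: float) -> float:
    """Crop evapotranspiration under standard conditions, ETc = Kc * ET0 (mm/day)."""
    if et0_mm < 0 or kc < 0:
        raise ValueError("ET0 and Kc must be non-negative")
    return kc * et0_mm


def net_irrigation_need(et0_mm: float, kc: float, effective_rain_mm: float = 0.0) -> float:
    """Hypothetical simplification: daily net need = max(ETc - effective rain, 0)."""
    return max(crop_et(et0_mm, kc) - effective_rain_mm, 0.0)


# Hypothetical 3-day ET0 forecast (mm/day), rainfall (mm), and a mid-season Kc
forecast_et0 = [5.2, 4.8, 6.0]
rain = [0.0, 3.0, 0.0]
KC_MID = 1.05  # illustrative value only; real Kc depends on crop and growth stage

needs = [net_irrigation_need(e, KC_MID, r) for e, r in zip(forecast_et0, rain)]
print([round(v, 2) for v in needs])
```

For the values shown, the net daily needs are approximately 5.46, 2.04, and 6.30 mm.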

Overall, the proposed framework represents a step toward the democratization of PA. By eliminating the dependence on costly in-situ station maintenance, it provides a scalable tool for sustainable water management in regions where meteorological data scarcity previously hindered the adoption of advanced irrigation strategies.

Footnotes

ORCID iDs

Virginia C Sánchez

Martín González

Julio Fernández-Pedauyé

Carlos T Calafate

José M Cecilia

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially funded by the R&D project MORELLINO (CIPROM/2023/29) through the ‘Direcció General de Ciència i Investigació’ (Generalitat Valenciana, Spain); by the European Union NextGenerationEU/PRTR and project AM-DS (INREED/2024/1) under the Recovery, Transformation and Resilience Plan (GVANEXT, Generalitat Valenciana); and by the European Union under the European Regional Development Fund (ERDF) Programme Comunitat Valenciana 2021–2027, through the IVACE + i Innovation calls for ‘Strategic Cooperation Projects’ (Ref. INNEST/2025/486). This research is also part of the R&D project CPP2021-008722, funded by MCIN/AEI (10.13039/501100011033) and the European Union through NextGenerationEU/PRTR. Additionally, it was supported by the Universitat Politècnica de València (UPV) through the Predoctoral Research Programme (Subprogramme 1, Grant PAID-01–24) funded by the Vice-Rectorate for Research.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Allen, Pereira, Raes, et al. (1998) Crop evapotranspiration - Guidelines for computing crop water requirements. Number 56 in FAO Irrigation and Drainage Paper, FAO - Food and Agriculture Organization of the United Nations, Rome.

Breiman (2001) Random forests. Machine Learning 45: 5–32.

Chen, Guestrin (2016) XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. p.785–794. doi:10.1145/2939672.2939785.

Dadrasajirlou, Ghazvinian, Heddam, et al. (2022) Reference evapotranspiration estimation using ANN, LSSVM, and M5 tree models (case study: of Babolsar and Ramsar regions, Iran). Journal of Soft Computing in Civil Engineering 6: 101–118.

Elman (1990) Finding structure in time. Cognitive Science 14: 179–211.

Escamilla-García, Soto-Zarazúa, Toledano-Ayala, et al. (2020) Applications of artificial neural networks in greenhouse technology and overview for smart agriculture development. Applied Sciences 10: 3835.

FAO (2021) The state of the world's land and water resources for food and agriculture - Systems at breaking point. Synthesis report 2021. Rome, Italy: Food and Agriculture Organization of the United Nations. doi: https://doi.org/10.4060/cb7654en.

Fix, Hodges Jr (1952) Discriminatory analysis-nonparametric discrimination: Small sample performance. Technical Report. University of California, Berkeley. (Accessed 16-09-2025).

Zhao, et al. (2022) Prediction of greenhouse tomato crop evapotranspiration using XGBoost machine learning model. Plants 11: 1923.

Goodfellow, Bengio, Courville (2016) Deep learning. Cambridge, MA, USA: MIT Press.

Gyarmati, Mizik (2020) The present and future of the precision agriculture. In: 2020 IEEE 15th International Conference of System of Systems Engineering (SoSE). p.593–596. doi:10.1109/SoSE50414.2020.9130481.

Hastie, Tibshirani, Friedman (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY, USA: Springer.

Ikram RMA, Mostafa, Chen, et al. (2023) Advanced hybrid metaheuristic machine learning models application for reference crop evapotranspiration prediction. Agronomy 13: 98.

IMIDA (n.d.) Sistema de Información Agrario de Murcia (SIAM). Web platform. URL: http://siam.imida.es/. (Accessed 20-03-2026).

James, Witten, Hastie, et al. (2013) An introduction to statistical learning: with applications in R. 103. New York, NY, USA: Springer.

Katerji, Rana (2014) FAO-56 methodology for determining water requirement of irrigated crops: Critical examination of the concepts, alternative proposals and validation in Mediterranean region. Theoretical and Applied Climatology 116: 515–536.

Khairan, Zubaidi, Muhsen, et al. (2023) Parameter optimisation-based hybrid reference evapotranspiration prediction models: A systematic review of current implementations and future research directions. Atmosphere 14: 77.

LeCun, Bottou, Bengio, et al. (2002) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86: 2278–2324.

Lundberg, Lee (2017) A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30: 4765–4774.

Misra, Dixit, Al-Mallahi, et al. (2022) IoT, big data, and artificial intelligence in agriculture and food industry. IEEE Internet of Things Journal 9: 6305–6324.

Quinlan (1986) Induction of decision trees. Machine Learning 1: 81–106.

Ravindran, Bhaskaran SKM, Ambat SKN (2021) A deep neural network architecture to model reference evapotranspiration using a single input meteorological parameter. Environmental Processes 8: 1567–1599.

Rayhana, Xiao, Liu (2020) Internet of things empowered smart greenhouse farming. IEEE Journal of Radio Frequency Identification 4: 195–211.

Rumelhart, Hinton, Williams (1986) Learning representations by back-propagating errors. Nature 323: 533–536.

Sharma, Tripathi (2021) Artificial intelligence in agriculture: A literature survey. International Journal of All Research Education and Scientific Methods 9: 510–513.

Sharma (2021) Artificial intelligence in agriculture: A review. In: 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS). p.937–942. doi:10.1109/ICICCS51141.2021.9432187.

SIAR (2012) Cálculo de ETo: Método de Penman–Monteith. Technical Report. Sistema de Información Agroclimática para el Regadío (SIAR). URL: https://www.mapa.gob.es/dam/mapa/contenido/desarrollo-rural/temas/gestion-sostenible-de-regadios/anteriorregadio_sep_22/siar-2011/documentos/calculo-et0dic12.pdf. (Accessed 16-09-2025).

Sistema de Información Agraria de Murcia (SIAM), Instituto Murciano de Investigación y Desarrollo Agrario y Alimentario (IMIDA), 2010. Estimación de la Evapotranspiración de Referencia. Ecuación de Penman–Monteith como método de estimación estándar de la ET0. Technical Report. IMIDA. (Accessed 16-09-2025).

Varzakas, Smaoui (2024) Global food security and sustainability issues: The road to 2030 from nutrition and sustainable healthy diets to food systems change. Foods 13: 963.

Vaz, Schütz, Guerrero, et al. (2022) Hybrid neural network based models for evapotranspiration prediction over limited weather parameters. IEEE Access 11: 963–976.

Vaz, Schütz, Guerrero, et al. (2023) Impact of employing weather forecast data as input to the estimation of evapotranspiration by deep neural network models. In: International Conference on Environment Sciences and Renewable Energy. Singapore: Springer. p.51–66.

Yamaç (2021) Artificial intelligence methods reliably predict crop evapotranspiration with different combinations of meteorological data for sugar beet in a semiarid area. Agricultural Water Management 254: 106968.

Youssef, Peters, El-Shirbeny, et al. (2024) Enhancing irrigation water management based on ETo prediction using machine learning to mitigate climate change. Cogent Food & Agriculture 10: 2348697.

Zhao, et al. (2022) Prediction model for daily reference crop evapotranspiration based on hybrid algorithm in semi-arid regions of China. Atmosphere 13: 922.


2.1 Reference evapotranspiration (ET₀) calculation