UCL Department of Geography



Crop yield estimation

Keywords: crop yield, winter wheat, Earth Observation, data assimilation, Sentinels, Copernicus, Gaofen, ESA, EU, NCEO, UCL, Assimila, CAAS, CAU, BNU, Peking University

Prof Philip Lewis, UCL and NCEO


email: p.lewis

February 2020

Regional crop monitoring and assessment with quantitative remote sensing and data assimilation




Crop yield estimation explainer

Here, we explain and illustrate the ideas behind our crop yield estimation system. This work builds on core support from NERC NCEO, ESA and the EU Horizon 2020 MULTIPLY programme, and is directly funded by the Newton Fund through STFC and NSFC.

The work arises from a collaboration between UCL/NCEO, Assimila, CAAS, CAU, BNU and Peking University.

Some of the team, on fieldwork (photo from a UKRI Twitter post).


Earth Observation data

We have designed the monitoring system to be robust to sampling opportunities: it can make use of any satellite data at suitable wavelengths and spatial resolutions. This includes optical data from the US Landsat missions and from the Chinese Gaofen satellites, but the backbone is provided by the operational EU/ESA Sentinel-2 satellites. These are part of the EU-funded Copernicus programme and use multiple satellites to provide global observations at better than weekly frequency, at spatial resolutions of 10-20 m. Such optical data (i.e. from around 400-2500 nm wavelength) are ideal for agricultural monitoring over much of the world, covering both large farms and smallholders.



Radiative transfer models

Optical sensors on these satellites do not directly measure crop state. Instead, they measure sunlight reflected from the Earth's surface and transmitted through the atmosphere; when clouds 'get in the way', they measure scattering from the clouds rather than from the land surface.

To monitor the land surface, we must clear the imagery of artefacts such as clouds, cloud shadows and dead pixels, and then estimate the land surface reflectance from our measurements of 'top of atmosphere' radiance. We then want to make sure that we are looking at the image pixels we are interested in (crop pixels, in this case).

We next wish to interpret the land surface reflectance in terms of the biophysical (structural and biochemical) properties that control the crop reflectance. These can be broadly characterised as the amount of vegetation (given as the Leaf Area Index, LAI), the leaf properties (typically chlorophyll, dry matter and water content) and the soil properties.

To deal with atmospheric effects and interpret the surface reflectance, we can use specialised radiative transfer models that tell us how varying these properties leads to variation in spectral reflectance. That is tricky enough, but harder still in many ways is 'inverting' such a model, to give the mapping from spectral reflectance to e.g. vegetation properties. In recent years, fast approaches to this inversion have been developed using machine learning.
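The 'simulate then invert' idea can be sketched as follows. Here a toy two-band Beer's-law canopy model stands in for ProSAIL, and a nearest-neighbour lookup over a simulated database stands in for the trained neural network; all parameter values are illustrative, not those of the operational system.

```python
import numpy as np

# Toy two-band forward model: canopy reflectance as a Beer's-law mix of
# a soil signature and a leaf signature, controlled by LAI. This is a
# stand-in for ProSAIL, which models the full 400-2500 nm spectrum.
def forward_model(lai, soil=np.array([0.15, 0.20]),
                  leaf=np.array([0.05, 0.45]), k=0.6):
    gap = np.exp(-k * lai)                # canopy gap fraction
    return soil * gap + leaf * (1.0 - gap)

# 'Invert' the model: simulate a dense database of (LAI, reflectance)
# pairs, then map a measured reflectance to the LAI of its nearest
# simulated spectrum. The operational system trains a neural network on
# such simulations instead, but the principle is the same.
lai_grid = np.linspace(0.0, 6.0, 601)
database = np.array([forward_model(l) for l in lai_grid])

def invert(reflectance):
    dist = np.linalg.norm(database - reflectance, axis=1)
    return lai_grid[np.argmin(dist)]

# A noisy 'measurement' of a canopy with true LAI = 3.0
measured = forward_model(3.0) + np.random.default_rng(0).normal(0, 0.002, 2)
print(round(float(invert(measured)), 1))
```

The nearest-neighbour search makes the ill-posedness visible: several quite different LAI values can produce similar spectra, which is why the uncertainty bars on the retrieved LAI matter.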

The animation below illustrates the results of such inference over an agricultural field in the UK.

  • The top panel shows LAI over the (winter wheat) growing season. The red dots show LAI estimates for dates when Sentinel-2 satellite observations are available, and the red bar provides an indication of uncertainty. 'NN' here means (Artificial) Neural Network, a machine learning approach used to estimate LAI (and other properties) from spectral reflectance.
  • The second panel shows the spectral reflectance measured by the Sentinel-2 MSI instrument as blue squares, over the range 400-1000 nm. The red dotted line provides a form of quality test on the radiative transfer inference: it shows the spectral reflectance modelled by the radiative transfer model (ProSAIL here). A strong mismatch means the inference may be less reliable.
  • The lower panel shows images over the field. From left to right, these show: LAI; chlorophyll concentration; brown pigment concentration (related to senescence); and a 'true colour' (RGB) image of the area. Each pixel represents 10 m in this mapping.




Microwave data

Optical data sources such as those above allow estimates of crop biophysical state to be inferred from the spectral reflectance, as we have seen. But a lot of the time clouds get in the way of seeing the land surface from space, so we also make use of observations from the microwave region: here, Sentinel-1 data. The C-band instrument on Sentinel-1 sends down a radar pulse and measures how much radiation is scattered back to the sensor. Special processing synthesises a long antenna, allowing high resolution observations to be made. The backscatter data are sensitive to vegetation amount, moisture and soil properties, but not to (most) cloud. A backscatter ratio from two polarisations is found to provide quite a stable signal for tracking vegetation amount. So, even though the data are noisier and of lower information content than optical data, they fulfil the vital role of observing the surface in the presence of cloud.
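The polarisation ratio can be sketched as below. The backscatter values are hypothetical illustrations of a seasonal pattern, not real Sentinel-1 data; note that because backscatter is conventionally expressed in decibels (a logarithmic unit), the 'ratio' becomes a subtraction.

```python
import numpy as np

# Hypothetical Sentinel-1 backscatter time series (dB) over one season.
# VH (cross-polarised) rises with vegetation volume scattering; VV is
# more soil-dominated. Values are illustrative only.
days  = np.array([0, 30, 60, 90, 120, 150])
vv_db = np.array([-12.0, -11.5, -10.8, -10.2, -10.5, -11.8])
vh_db = np.array([-20.0, -18.0, -15.5, -14.0, -14.8, -18.5])

# Cross-ratio in dB space: a subtraction, because dB is logarithmic.
# The ratio cancels much of the soil-moisture variation common to both
# polarisations, leaving a comparatively stable vegetation signal.
ratio_db = vh_db - vv_db
print(ratio_db)
```

In this sketch the ratio peaks around day 90, tracking the vegetation maximum even though both individual channels are also responding to soil moisture.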

The figure below shows observations over an agricultural site with a maize crop in central Ghana.

  • The upper panel shows LAI inferred from Sentinel-2 optical data. The red bar provides an indication of uncertainty. 'NN' here means (Artificial) Neural Network, a machine learning approach used to estimate LAI (and other properties) from spectral reflectance.
  • The central panel shows the measured spectral reflectance over the range 400-2500 nm for Sentinel-2 observations. Note the complete lack of observations in July-September due to high cloud cover.
  • The lower panel shows the more regular and reliable microwave polarisation ratio from Sentinel-1. Even though these data have less information than the Sentinel-2 data, they allow us to track the crop biomass over the whole season, at high temporal frequency.


Sentinel 1 and 2 data

Crop models


Mechanistic models of crop development, such as WOFOST, allow us to link relatively simple parameterisations to crop growth, phenology and yield, driven by weather data.

crop model
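A minimal sketch of how such a weather-driven crop model works is given below. This is not WOFOST, just a toy model in the same spirit: development is paced by accumulated thermal time, and daily biomass growth is proportional to intercepted light via a radiation-use efficiency. All parameter values are illustrative placeholders.

```python
import numpy as np

# Toy WOFOST-style crop model: thermal-time development plus
# light-interception growth. All parameters are illustrative.
def run_crop_model(daily_mean_temp, t_base=0.0, rue=3.0, par=8.0,
                   k=0.6, lai_max=5.0, gdd_maturity=1800.0):
    gdd, lai, biomass = 0.0, 0.1, 0.0
    for temp in daily_mean_temp:
        gdd += max(temp - t_base, 0.0)           # growing degree-days
        if gdd >= gdd_maturity:                  # crop mature: stop growing
            break
        fpar = 1.0 - np.exp(-k * lai)            # fraction of light intercepted
        growth = rue * par * fpar                # g/m2/day via radiation-use efficiency
        biomass += growth
        lai = min(lai + 0.01 * growth, lai_max)  # crude allocation to leaf area
    yield_est = 0.45 * biomass                   # harvest index -> grain yield
    return biomass, yield_est

temps = np.full(200, 15.0)                       # a constant 15 C toy season
biomass, y = run_crop_model(temps)
print(round(biomass), round(y))
```

Even this toy version shows the key property the text relies on: the simulated trajectory, and hence the yield, depends entirely on the chosen parameter values and the weather inputs.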


If we had a well-calibrated model and weather data expressing the conditions local to a particular part of a particular field, we might expect the model to perform well, allowing good tracking of crop status and any stresses, and good predictions of crop yield. Model calibration, however, requires quite a large set of agronomic data, so these models are only ever calibrated in a broad sense, for a particular crop with a particular set of practices.

A regionally-calibrated agronomic model with somewhat localised weather data is of great use in management and planning for farmers, regional and national authorities, as well as insurers etc. But it doesn't directly relate to what is happening in a particular area of a particular field. Even though the simulations are tied to the weather in the year simulated, and so show the right broad seasonal effects, different local conditions could give rise to a range of different scenarios of LAI development. This is illustrated in the figure below, which shows an ensemble of plausible LAI trajectories, each of which corresponds to a slightly different set of model parameters and/or weather conditions likely to be representative of those found on the ground.

plausible lai
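Such an ensemble can be sketched as follows. A simple double-logistic LAI curve stands in here for a full crop-model run, and the parameter distributions are illustrative assumptions, not the calibrated values used in the real system.

```python
import numpy as np

# Ensemble of plausible LAI trajectories: each member draws slightly
# perturbed parameters of a double-logistic seasonal LAI curve.
rng = np.random.default_rng(42)
t = np.arange(0, 240)                            # day of season

def lai_curve(t, lai_max, t_up, t_down, r_up=0.08, r_down=0.06):
    green_up = 1.0 / (1.0 + np.exp(-r_up * (t - t_up)))      # spring growth
    senesce  = 1.0 / (1.0 + np.exp(-r_down * (t - t_down)))  # late-season decline
    return lai_max * (green_up - senesce)

n_ens = 50
ensemble = np.array([
    lai_curve(t,
              lai_max=rng.normal(4.5, 0.5),      # peak LAI varies per member
              t_up=rng.normal(90, 8),            # green-up timing
              t_down=rng.normal(190, 8))         # senescence timing
    for _ in range(n_ens)])

# The spread across members at each date is the prior uncertainty that
# satellite observations will later narrow down.
spread = ensemble.std(axis=0)
print(ensemble.shape, round(float(spread.max()), 2))
```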

Most often in the past, crop model calibration has been used to produce a single set of model parameters: the parameters that 'best' represent the range of agronomic data used in the calibration (typically with variations over space and time). That concept of calibration ignores the fact that, taking into account the uncertainties in the data and model, we should more properly represent the calibration result as a set of statistical distributions of model parameters: some representation of their probability distribution functions (PDFs). By taking uncertainty into account in this way, we can provide better calibrations of the crop models, and gain the ability to estimate the uncertainty of the model outputs.

The figure below shows a PDF of yield for winter wheat in Hengshui, in the North China Plain. Note that the distribution is quite wide, meaning that quite a range of different yields might arise in this area. We are still able to quantify the 'most likely' yield from the peak of the PDF, but it doesn't tell us about any particular field.

regional model yield PDF
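Reading the 'most likely' yield off the peak of such a PDF can be sketched as below, using a synthetic ensemble of yields (the distribution parameters are illustrative, not the Hengshui calibration).

```python
import numpy as np

# Synthetic ensemble of regional yields (kg/ha), standing in for the
# output of many calibrated crop-model runs.
rng = np.random.default_rng(1)
yields = rng.normal(6000, 800, 5000)

# Build a histogram-based density and take its peak as the mode
# ('most likely' yield). The spread of the histogram is the regional
# uncertainty the text describes.
counts, edges = np.histogram(yields, bins=60, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
most_likely = centres[np.argmax(counts)]
print(round(float(most_likely), -2))
```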

Data Assimilation


We can provide some improvement on using regional models, with associated calibrated PDFs, by using coarse spatial resolution data from Earth Observation, and we have learned how to use data assimilation techniques to combine model and observational information. But the agricultural landscape in many countries varies at a finer scale than such data resolve, so complicated 'scaling effects' must be taken into account when trying to use e.g. 1 km observations over fields that are smaller than this. The advantage of the coarse resolution data has been its high temporal frequency, typically giving observations every day or so.

We have used this combination of data and models to improve regional yield estimates in China by providing information through the Government CHARMS system. In addition, when we have only a partial set of observations and some predictions of the likely weather, we can make predictions of crop yield whilst the crop is still in the ground. As time advances and we get more information, we can refine and improve these predictions. We have also provided this sort of information for regional government in China. An example is given below.


Thanks to Copernicus, we now have the satellite observations at the right spatial and temporal frequency for agricultural monitoring.

The figure below illustrates how the regionally-calibrated crop models, driven by somewhat coarse resolution weather data, provide a set of probable crop states (LAI in this case). The satellite data then provide spatially-localised 'measurements' of actual crop state (LAI here). The combination of the data and model information then refines the model calibration, and gives spatially-localised estimates of yield along with a probability distribution function (PDF) expressing the uncertainty of the yield estimate. The data-model 'merging' is done using data assimilation.

flow chart
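A minimal sketch of this data-model merging is given below, using a particle-filter style re-weighting of ensemble members against a single satellite LAI 'measurement'. The ensemble, the LAI-yield relationship and the observation values are all illustrative assumptions; the operational system assimilates many observations through the season.

```python
import numpy as np

# Prior ensemble: each member is one crop-model run, summarised here by
# its LAI at the observation date and its end-of-season yield (kg/ha).
rng = np.random.default_rng(7)
n = 500
model_lai = rng.normal(3.0, 0.8, n)
model_yield = 5500 + 900 * (model_lai - 3.0) + rng.normal(0, 150, n)

# One Sentinel-2 LAI 'measurement' and its uncertainty (toy values).
obs_lai, obs_sigma = 3.6, 0.3

# Bayesian update: Gaussian likelihood of each member given the
# observation, normalised to importance weights.
w = np.exp(-0.5 * ((model_lai - obs_lai) / obs_sigma) ** 2)
w /= w.sum()

prior_mean, prior_std = model_yield.mean(), model_yield.std()
post_mean = np.sum(w * model_yield)
post_std = np.sqrt(np.sum(w * (model_yield - post_mean) ** 2))

# The observation both moves the yield estimate and shrinks its spread.
print(round(float(prior_mean)), round(float(post_mean)),
      round(float(prior_std)), round(float(post_std)))
```

Members whose modelled LAI disagrees with the satellite measurement get down-weighted, which is exactly the 'move the mean, shrink the uncertainty' behaviour seen in the yield PDFs.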


The figure below shows the impact of the data assimilation on the crop yield PDF: the assimilation has both moved the mean and shrunk the uncertainty.

yield posterior

The use of the system in forecasting is shown below. The left panel shows plausible LAI trajectories over time; the red dots show Sentinel-2 LAI estimates (with uncertainty bars). The right panel shows the PDF of yield. Early in the season, the yield could take on a wide range of values. As we get more observations, the ensemble of possible states shrinks, and the uncertainty in the yield prediction shrinks with it. This would be further improved if observed weather were used to replace expected weather. Around a month before harvest, the yield estimate is quite stable, but even before then we are limiting the range of expected yield considerably.




There are various outputs of the system, but the one of most general interest to users is a spatial map of yield (at 10 m or so spatial resolution).

We can visualise this at a range of spatial scales to show the level of detail obtained. The images below show estimated yield (kg/ha) of winter wheat in 2017 over Hengshui in the North China Plain. We can visualise the data at different spatial resolutions. At the broad scale, we see a North-West to South-East gradient in crop productivity. This is driven by weather and soil. This regional information is needed mainly for reporting and planning.

Hengshui mean yield

As we zoom in, we can see the full resolution of the Sentinel data coming into play. We can see individual fields, some performing better than others due to very local conditions and farmer decisions on variety, fertiliser inputs, irrigation etc. We can also see within-field variations from these data, showing areas of higher or lower productivity. This is the sort of detailed information needed for local management and insurance.

Hengshui zoom


We have developed a capability to produce similar data in almost real time using our models on the Google Earth Engine.


a. National scale

b. County scale

c. Farm scale

Because we are using data assimilation, we can also estimate the uncertainty associated with our predictions.

The figures below show the mean predicted yield for Hengshui 2017, and the (per pixel) uncertainty in yield. The uncertainty here is given as the standard error, a measure of the width of the PDF.

mean yield

Mean yield over Hengshui (kg/ha)

yield std

Standard error of yield over Hengshui (kg/ha)

It is interesting to see that there are 'stripes' in the uncertainty image. These result from the number of samples of LAI available in different parts of the image, from different satellite orbits: in essence, where there are fewer (or poorer) samples, the uncertainty is higher.
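The sample-count effect can be sketched with a standard 1/sqrt(n) scaling, assuming independent retrievals of equal quality (the per-retrieval uncertainty and sample counts below are toy values).

```python
import numpy as np

# Standard error of a mean estimate falls as 1/sqrt(n) with the number
# of usable observations. Pixels covered by overlapping Sentinel orbits
# get more samples, hence the 'stripes' of lower uncertainty.
sigma_single = 0.8                      # toy uncertainty of one LAI retrieval
n_samples = np.array([5, 10, 20])      # e.g. single vs overlapping orbit coverage
std_err = sigma_single / np.sqrt(n_samples)
print(np.round(std_err, 3))            # prints [0.358 0.253 0.179]
```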
