Data assimilation is a method for describing the system state as accurately as possible, constrained by observations, by using all the available information and by taking into account the observation and model errors. We developed a framework for Observing System Simulation Experiments (OSSEs) based on the ensemble square root Kalman filter (EnSRF) technique. The framework can separately assimilate two data sets of chlorophyll a retrieved from Environmental Satellite 1 (HJ-1) and from the moderate resolution imaging spectro-radiometer (MODIS) onboard the Terra platform. We assumed that one of the retrieved results was the proxy “truth value” and that the other contained errors. Using the EnSRF technique combined with a three-dimensional numerical model of wind-driven circulation and pollutant transport in a large lake, we investigated the potential impact of the locations of simulated observation stations in Taihu Lake (China) on the performance of data assimilation. In addition, the effectiveness of this method for evaluating and predicting the concentration of chlorophyll a was validated. The results showed that the location of the simulated observation stations influenced both the accuracy of the evaluation and forecast results and the performance of data assimilation. We also discuss the impact of assimilation time and background error on the results. This study demonstrated that this method of data assimilation is effective for evaluating and predicting the concentration of chlorophyll a in highly turbid case 2 waters.
With the development of monitoring techniques, water environment parameters can be observed in different ways. For example, some parameters, e.g. chlorophyll concentration and suspended matter concentration, can be retrieved from satellite imagery, and many parameters can be obtained from buoy observations. Because they observe continuously, buoy monitoring systems have attracted the attention of Chinese environmental monitoring centers and researchers in recent years. More than 20 buoy monitoring systems have been placed in Taihu Lake, the third largest lake in China and a eutrophic one. With these buoys, real-time water quality parameters can be downloaded online, so continuous observation data can be accumulated. This is significant for long-term water quality monitoring and data analysis. However, the data obtained by buoys represent only the conditions at the buoy stations. To obtain the spatial distribution of parameters from buoy observations, water dynamic models should be developed to simulate the diffusion and transport of water constituents (Lv et al., 2010; Zhang et al., 2004; Baptista et al., 2005). These models can describe the water body from a macroscopic perspective and produce continuous, real-time simulated results, but they are subject to the influences of initial conditions, boundary conditions and other uncertainties (Wang et al., 2014; Deremble et al., 2011; Huang et al., 2010). Remote sensing using satellite imagery is one of the effective ways to retrieve water constituent concentrations from a macroscopic perspective. Satellite sensors that provide data in visible and near-infrared wavelengths can be used to estimate water constituent concentrations based on their optical properties. Remote sensing for water quality monitoring can capture the large-scale spatial distribution of pollutants and has the advantages of low cost and wide coverage.
However, the quality of satellite images is often affected by temporal resolution, spatial resolution, cloud cover and other factors, all of which affect the accuracy of remote sensing retrievals.
Given the considerable amount of conventionally collected data already available, the key problem is how to use these data effectively. Data assimilation is one possible way to solve this problem. Its main purpose is to combine different observational data with theoretical model results, taking advantage of the different data sources, to obtain results close to the actual state. Data assimilation combined with a dynamic model has been widely used in meteorology, oceanography and other fields (Kamachi and O’Brien, 1995; Zhang et al., 2003; Supharatid, 2008; Natvik and Evensen, 2003; Miesch et al., 2003; Gu et al., 2009; Leisenring and Moradkhani, 2012). The ensemble Kalman filter (EnKF) technique for data assimilation was first introduced by Evensen (1994). It is a “forward” technique (Anderson et al., 2000), which directly incorporates the data into a prognostic model simulation. It has gained enormous popularity in recent years because of its simple formulation and relative ease of implementation compared with “inverse” techniques such as the four-dimensional variational data assimilation (4DVAR) method. Furthermore, its computational requirements are comparable to those of sophisticated methods such as the 4DVAR method (Le Dimet and Talagrand, 1986; Talagrand and Courtier, 1987) and the representer method (Bennett, 1992). In the standard EnKF formulation, observations are treated as random variables to which perturbations are added (Burgers et al., 1998; Houtekamer and Mitchell, 1998, 2001; Evensen, 2003), so that the analysis error covariance is consistent with that of the traditional Kalman filter.
Recently, several deterministic methods have been developed to avoid the sampling errors associated with the use of perturbed observations. These methods include the ensemble square root filter (EnSRF; Whitaker and Hamill, 2002; Tippett et al., 2003), the ensemble adjustment Kalman filter (EAKF; Anderson, 2001), and the ensemble transform Kalman filter (ETKF; Bishop et al., 2001), all of which belong to the broader class of square root filters (Tippett et al., 2003).
In this article, an Observing System Simulation Experiment (OSSE) system based on the EnSRF (Whitaker and Hamill, 2002) is established by coupling it with a water dynamic model. In this simulation experiment, we assimilated two data sets of chlorophyll a concentration (Cchla) derived from Environmental Satellite 1 (HJ-1) and the moderate resolution imaging spectro-radiometer (MODIS). Our purposes are: (1) to evaluate the effectiveness of the data assimilation method for evaluating and forecasting Cchla; and (2) to evaluate the influence of observation locations on the results of data assimilation and, on that basis, to make suggestions for buoy deployment.
Data and methods
Taihu Lake, located between 30°56′–31°33′N and 119°53′–120°36′E, is the third largest freshwater lake in China. It has an area of 2,338 km² and a mean depth of 1.9 m. A flat bottom and shallow water are notable features of Taihu Lake, which has 48 islands and 72 peaks. Since the 1980s, rapid industrialization and urbanization in the surrounding areas, together with the widespread use of fertilizers in rural areas, have resulted in an enormous amount of wastewater and sewage being discharged into the lake without treatment. Consequently, eutrophication has become a serious environmental problem that threatens the normal functioning of lake life. The environmental importance of regularly monitoring the biophysical status of highly turbid case 2 waters has aroused widespread interest. Chlorophyll a, a photosynthetically active pigment in phytoplankton, is a key indicator of the biophysical status of a water body. Timely estimation of Cchla is therefore important for predicting water quality.
Many rivers flow into and out of Taihu Lake, including more than 25 inflow rivers. The flow into Taihu Lake is small because the river courses are short, and the mean flow rate is 14.7 m³·s⁻¹.
Observing System Simulation Experiments
Observing system simulation experiments are typically designed to use data assimilation ideas to investigate the potential impacts of prospective observing systems. They also can be used to investigate current data assimilation systems by testing the impact of new observations on them. Therefore, the chlorophyll a (chla) data assimilation system can be tested and the buoy locations in Taihu Lake can be designed through the OSSEs.
The methodology of an OSSE consists of (Lahoz et al., 2010):
Nature Run. It generates the reference state for the entire OSSE period and provides the proxy “truth.” Observations are simulated from it, and subsequent OSSE assimilation experiments are verified against it;
Control Run. A control run (or experiment) includes all the data representing the current operational observational data stream. In this experiment, the realistic errors of the existing observing systems are considered;
Assimilation Run. It produces results that are intended to be close to the actual state. Because the nature run provides the proxy “truth,” the assimilation run results will approach the nature run results if the data assimilation system works well;
A comparison between the nature, control and assimilation runs.
The framework of the OSSE is given in Figure S1 (available in the online supplementary information). Let CR denote the control run (results), AR denote the assimilation run (results) and NR denote the nature run (results).
Construction of initial value field for Observing System Simulation Experiments test
The Cchla data derived from MODIS and HJ-1 were defined as dataset-1 and dataset-2, respectively, and formed the initial value field. Both the MODIS and HJ-1B CCD2 data were acquired on 21 April 2009. The MODIS imagery used in this research has a spatial resolution of 1 km and a temporal resolution of 1 d.
The HJ-1 mission, launched by China in September 2008, includes two optical satellites (HJ-1A and HJ-1B) with CCD cameras onboard. The performance parameters are shown in Table S1 (Wang et al., 2010). The 2 d temporal resolution provides high-frequency image data for real-time monitoring and long time series. This advantage offers a new opportunity to establish local retrieval models of water quality parameters for different water regions of China.
Our study area is Taihu Lake, and the distribution of in situ samples is shown in Figure S2. The in situ data, including above-water remote sensing reflectance and chlorophyll concentration, were used to establish separate chla retrieval models for the MODIS and HJ-1 data. Remote sensing reflectance was measured in Taihu Lake on 21 April 2009 at 26 sample stations (Figure S2), from 9:00 to 16:00 h local time. The above-water remote sensing reflectance measurements were taken from a boat using an ASD handheld spectroradiometer, which has a spectral range of 350–1050 nm at an increment of 1 nm, following Le et al. (2009).
Water samples (2.5 l volume) were collected at a depth of 0.5 m below the surface immediately after reflectance measurement. Then, the samples were stored in a cooler with ice in the dark, and taken back to the laboratory to analyze the Cchla at the end of the day. The Cchla was measured according to the lake investigation criteria in China (Huang, 1999). Pigment samples were extracted in hot (80°C) 90% ethanol, and chla concentration was quantified fluorometrically (Welschmeyer, 1994).
Pre-processing of the image data
The image data were pre-processed in four steps: radiometric correction (conversion of DN values to radiance), geometric rectification, atmospheric correction and water body extraction.
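For the atmospheric correction, the top-of-atmosphere reflectance was first computed with the standard relation $R_{TOA} = \pi L_{TOA} / (F_0 \cos\theta_0)$, where RTOA is the top-of-atmosphere reflectance; LTOA is the measured radiance in mW m−2 sr; F0 is the extraterrestrial solar irradiance corrected for each day of the year; and θ0 is the solar zenith angle.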
Secondly, field measurements were considered the best available approximation of the water’s optical properties, and the sensors should be capable of reproducing the values observed in the field. Any deviation from the field measurement was treated as an error, of which atmospheric interference was considered the major part. On this basis, the atmospheric contribution at each wavelength was taken to be RTOA(λ) minus Rfield(λ), where Rfield(λ) is the quasi-contemporaneous in situ above-water remote sensing reflectance at the in situ sample station. Quasi-coincident satellite-derived RTOA and field values Rfield are referred to as matchups. A matchup is considered valid if at least one field measurement Rfield is available within a 2 h time interval centered on the satellite overpass time. The overpass times of the two platforms, HJ-1B and MODIS, were 11:17 and 11:25, respectively. Of the in situ sample stations, sample No. 10 was recorded at 11:35, the collection time closest to the satellite overpass times. Therefore, we used the remote sensing reflectance of this sample as Rfield. The in situ ASD reflectance was resampled to 12 bands (band 8 to band 19) and 4 bands (band 1 to band 4) using the spectral response function (SRF) method for comparison with the MODIS and HJ results, respectively.
Lastly, we assumed that the atmosphere was homogeneous throughout the study area. Then, atmospheric contribution was applied to the entire Taihu Lake, to obtain the ground remote sensing reflectance.
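The correction chain described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the processing code used in this study: the function names are hypothetical, the TOA reflectance uses the standard relation π·LTOA/(F0·cos θ0), the SRF resampling is written as a response-weighted average of the ASD spectrum, and the atmospheric contribution is a single per-band offset taken from one matchup station and applied to the whole image.

```python
import numpy as np

def toa_reflectance(l_toa, f0, solar_zenith_deg):
    """Standard TOA reflectance: pi * L_TOA / (F0 * cos(theta0))."""
    cos_theta = np.cos(np.deg2rad(solar_zenith_deg))
    return np.pi * np.asarray(l_toa, dtype=float) / (f0 * cos_theta)

def _trapezoid(y, x):
    """Trapezoidal integral of y over x (written out to avoid version-specific NumPy names)."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def band_average(wavelength_nm, reflectance, srf):
    """Resample a hyperspectral spectrum to one sensor band using its spectral response function."""
    wl = np.asarray(wavelength_nm, dtype=float)
    refl = np.asarray(reflectance, dtype=float)
    weights = np.asarray(srf, dtype=float)
    return _trapezoid(refl * weights, wl) / _trapezoid(weights, wl)

def atmospheric_offset(r_toa_station, r_field_station):
    """Atmospheric contribution per band at the matchup station: R_TOA - R_field."""
    return np.asarray(r_toa_station, dtype=float) - np.asarray(r_field_station, dtype=float)

def correct_image(r_toa_image, offset):
    """Subtract the station offset everywhere, assuming a homogeneous atmosphere."""
    return np.asarray(r_toa_image, dtype=float) - offset
```

Because the offset comes from a single station, any spatial variation in aerosol load across the lake is ignored; this is exactly the homogeneity assumption stated above.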
To evaluate the accuracy of the atmospheric correction, a match-up data set was assembled from in situ measurements taken within ±2 h of the satellite overpass. A total of 5 match-up stations were obtained. These five sample stations were numbered 8 to 13. Figure S3 shows that the MODIS and HJ spectra after atmospheric correction are consistent with the in situ measured reflectance (the MODIS remote sensing reflectance in bands 12 to 16 was discarded because the DN values in these bands were negative). The root mean square error (RMSE) between the in situ reflectance and the satellite results after atmospheric correction was calculated and is listed in Table S2, which shows that the MODIS and HJ remote sensing reflectance values were close to the in situ measurements. These results suggest that the atmospheric correction procedure described above is feasible.
The field distribution of chlorophyll a concentration
Figure S4 shows a large difference in Cchla between the two kinds of images near the lake boundary, whereas the difference in the center of the lake is relatively small. In the center of Taihu Lake, the difference in Cchla is within ±5 μg·L⁻¹. Near the boundary of Taihu Lake, by contrast, the difference in Cchla is large. This might be caused by the lower spatial resolution of the MODIS image and by mixed water-land pixels at the boundary of Taihu Lake.
In the OSSE, the Cchla retrieved from HJ-1B was treated as dataset-2 and assumed to be the actual “truth”, based on the retrieval model and its results (Table S3). The Cchla retrieved from MODIS was set as dataset-1.
Water dynamic model
In the governing equations of the hydrodynamic model, x, y and z are the Cartesian coordinates oriented eastwards, northwards and upwards, respectively; u, v and w are the velocity components in the horizontal x, y and vertical z directions; η is the water elevation; t is the time; g is the gravitational acceleration; ρ is the fluid density and ρ0 is the reference density; f is the Coriolis parameter; Fx and Fy are the horizontal diffusion terms for momentum; and vT is the eddy viscosity coefficient in the vertical direction.
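For orientation, a standard hydrostatic, Boussinesq form of the three-dimensional shallow-water equations that is consistent with the variables listed above is given below. It is a sketch of the usual textbook form, not necessarily the exact system solved in this study. The symbols η (water elevation), ρ and ρ0 (fluid and reference density), ν_T (the vertical eddy viscosity, written vT in the text) and h (still-water depth, not defined in the text) are notation assumptions, and D/Dt denotes the material derivative ∂/∂t + u∂/∂x + v∂/∂y + w∂/∂z.

```latex
\begin{aligned}
&\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z} = 0, \\
&\frac{\partial \eta}{\partial t} + \frac{\partial}{\partial x}\int_{-h}^{\eta} u\,dz
  + \frac{\partial}{\partial y}\int_{-h}^{\eta} v\,dz = 0, \\
&\frac{Du}{Dt} = fv - g\frac{\partial \eta}{\partial x}
  - \frac{g}{\rho_0}\int_{z}^{\eta}\frac{\partial \rho}{\partial x}\,dz'
  + \frac{\partial}{\partial z}\left(\nu_T \frac{\partial u}{\partial z}\right) + F_x, \\
&\frac{Dv}{Dt} = -fu - g\frac{\partial \eta}{\partial y}
  - \frac{g}{\rho_0}\int_{z}^{\eta}\frac{\partial \rho}{\partial y}\,dz'
  + \frac{\partial}{\partial z}\left(\nu_T \frac{\partial v}{\partial z}\right) + F_y.
\end{aligned}
```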
In the transport equation for the constituent concentration, K is the attenuation coefficient and C0 is the input rate from the environment. The domain is discretized into a series of layers in the vertical direction and into triangular elements in the horizontal direction. The whole domain is thus divided into a number of prisms, with 2,813 nodes and 5,135 triangles in the horizontal direction. The average distance between nodes is about 1 km. The numerical algorithm for this water dynamic model is described in Zhang et al. (2009) and Zhang and Song (2010). The water dynamic model was written in Fortran.
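The roles of K and C0 can be illustrated with a one-dimensional Python sketch of a transport step. It is a toy under stated assumptions, not the three-dimensional unstructured-grid Fortran model: the equation is taken to be ∂C/∂t + u ∂C/∂x = D ∂²C/∂x² − K·C + C0, the sign convention for K and C0 is assumed, the grid is uniform and periodic, and the function name `step_transport` and the explicit upwind scheme are hypothetical choices.

```python
import numpy as np

def step_transport(c, u, diff, k_decay, c0, dx, dt):
    """Advance the concentration c by one explicit time step on a periodic 1-D grid.

    u: advection velocity, diff: diffusion coefficient,
    k_decay: attenuation coefficient K, c0: input rate from the environment.
    """
    c = np.asarray(c, dtype=float)
    # First-order upwind advection: take the gradient from the upstream side.
    if u >= 0:
        adv = u * (c - np.roll(c, 1)) / dx
    else:
        adv = u * (np.roll(c, -1) - c) / dx
    # Centered second difference for diffusion.
    lap = (np.roll(c, -1) - 2.0 * c + np.roll(c, 1)) / dx**2
    # Attenuation removes constituent in proportion to c; c0 adds it at a fixed rate.
    return c + dt * (-adv + diff * lap - k_decay * c + c0)
```

With this convention a uniform field is steady when K·C = C0, which is a convenient sanity check for any chosen pair of coefficients. The explicit scheme is only stable for sufficiently small time steps.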
An EnSRF algorithm following Whitaker and Hamill (2002) was used in this study. The specific algorithm is as follows:
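(1) Error evaluation. The observation error covariance matrix Rn and the model error covariance matrix Qn are evaluated at every step. The initial background value $x_0^b$ and the background covariance matrix $P_0^b$ are known. The ensemble members are $x_0^{(i)}$, $i = 1, \dots, K$, which satisfy $x_0^{(i)} = x_0^b + \varepsilon^{(i)}$, $\varepsilon^{(i)} \sim N(0, P_0^b)$, where $N(\mu, \Sigma)$ is the normal distribution with mean $\mu$ and covariance matrix $\Sigma$, and K is the ensemble size.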
(5) Going back to (2).
The EnSRF algorithm was implemented in MATLAB and coupled with the water dynamic model through a MATLAB program.
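The analysis step of a serial EnSRF in the sense of Whitaker and Hamill (2002) can be sketched in Python as follows. This is a minimal illustration, not the MATLAB code used in this study. The function name `ensrf_update` is hypothetical, each observation is assumed to measure one state element directly (so the observation operator reduces to selecting a row), and observation errors are assumed to be uncorrelated so that observations can be processed one at a time.

```python
import numpy as np

def ensrf_update(ensemble, obs, obs_var, obs_index):
    """Serial EnSRF analysis.

    ensemble : array (n_state, n_members), background ensemble
    obs      : observed values, one per observation
    obs_var  : observation error variances, one per observation
    obs_index: index of the state element each observation measures (assumed direct)
    """
    ens = np.array(ensemble, dtype=float)
    n_members = ens.shape[1]
    for y, r, j in zip(obs, obs_var, obs_index):
        mean = ens.mean(axis=1, keepdims=True)
        pert = ens - mean
        hx_pert = pert[j]                           # ensemble perturbations in observation space
        hx_mean = mean[j, 0]
        hpht = hx_pert @ hx_pert / (n_members - 1)  # background variance at the observation
        pht = pert @ hx_pert / (n_members - 1)      # covariance of every state element with it
        gain = pht / (hpht + r)                     # Kalman gain, applied to the ensemble mean
        # Reduced gain factor for the perturbations, so that no perturbed observations are needed.
        alpha = 1.0 / (1.0 + np.sqrt(r / (hpht + r)))
        mean = mean + gain[:, None] * (y - hx_mean)
        pert = pert - alpha * np.outer(gain, hx_pert)
        ens = mean + pert
    return ens
```

The mean is updated with the full Kalman gain while the perturbations use the reduced gain, which is what lets the analysis ensemble variance match the Kalman filter value without perturbing the observations. Covariance localization and inflation, which are often needed with small ensembles, are omitted here.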
Results and discussion
Figure S4 shows that the results retrieved from MODIS and HJ-1B differ between the boundary and the center of Taihu Lake. This may lead to different assimilation results depending on whether the buoys are placed near the boundary or in the center of the lake. We therefore examined the effectiveness of data assimilation with buoys placed in a northern bay and in the center of Taihu Lake. Meiliang Bay and the center of Taihu Lake were chosen as the areas for placing simulated buoys, in order to assess the influence of observation locations on the results of data assimilation. Here, we considered only the horizontal variation of chla concentration.
Case A: In Meiliang Bay
The framework was tested by assimilating the data of 10 virtual buoys in Meiliang Bay (Figure S5). All these buoys were placed in areas where the retrieved chla data differed strongly.
The Cchla retrieved from HJ-1B was used to initialize the water dynamic model for a 12 h prediction, which served as the “truth” and was regarded as the NR. Similarly, the 12 h prediction initialized with the Cchla retrieved from MODIS was regarded as the CR. The water dynamic model results were output every hour. The observation at each buoy is the NR value simulated at the buoy position. In the OSSE, the simulated buoy observations were assimilated every 1 h, using the MODIS-retrieved data as the background, and the assimilation lasted for 6 h in total. The AR analysis after the 6th hour was used as the initial value for the prediction stage from the 7th to the 12th hour. The assimilation-prediction results are shown in Figure S6. The slope of the linear regression equation for AR in Figure S6 is closer to 1 than that for CR. We therefore conclude that the AR results are closer to the NR results than the CR results are; that is, after 6 h of assimilation, the difference from the NR results is reduced.
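The experiment design described above (hourly assimilation for 6 h followed by a free forecast to 12 h) can be sketched in Python as a generic OSSE driver. This is an outline under stated assumptions rather than the coupled MATLAB and Fortran system: `run_osse` and `direct_insertion` are hypothetical names, `model` is any function that advances a state by 1 h, and direct insertion of the observed values is used here only as a simple stand-in for the EnSRF analysis.

```python
import numpy as np

def run_osse(model, assimilate, truth0, background0, obs_idx, n_assim=6, n_forecast=6):
    """Return hourly NR, CR and AR states, each of shape (n_assim + n_forecast + 1, n_state)."""
    nr = [np.array(truth0, dtype=float)]       # nature run, started from the proxy truth
    cr = [np.array(background0, dtype=float)]  # control run, no assimilation
    ar = [np.array(background0, dtype=float)]  # assimilation run, same start as CR
    for hour in range(1, n_assim + n_forecast + 1):
        nr.append(model(nr[-1]))
        cr.append(model(cr[-1]))
        state = model(ar[-1])
        if hour <= n_assim:
            obs = nr[-1][obs_idx]              # virtual buoy observations sampled from NR
            state = assimilate(state, obs, obs_idx)
        ar.append(state)                       # after n_assim hours this is a free forecast
    return np.array(nr), np.array(cr), np.array(ar)

def direct_insertion(forecast, obs, obs_idx):
    """Stand-in analysis step: overwrite the observed cells with the observations."""
    out = forecast.copy()
    out[obs_idx] = obs
    return out
```

For example, `run_osse(lambda x: 0.9 * x, direct_insertion, [10.0, 20.0], [14.0, 26.0], [0])` yields an AR that matches the NR at the observed cell and the CR at the unobserved one. With direct insertion, unobserved cells improve only through the model dynamics, whereas an ensemble filter also spreads information through the ensemble covariances.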
The RMSE and ARE were then calculated separately for the assimilation period (1–6 h) and the prediction period (7–12 h). The AR and CR results are compared with the NR results in Figure S7. Among the assimilation-prediction results at the ten virtual buoys, buoys 7 and 10 perform best, whereas buoy 8 performs worst. The accuracy improved by 93% and 93.1% at buoys 7 and 10, respectively, but declined by 14% at buoy 8. Averaged over the ten buoys, the accuracy improved by 65% relative to CR. During the prediction stage, the AR results are better than the CR results because, after the assimilation, the initial value over the whole of Taihu Lake has moved closer to the “truth value” given by the NR results. This indicates that the evaluation and prediction of chla concentration in Meiliang Bay based on the EnSRF algorithm achieved good results. The absolute differences in Cchla between AR and NR after the 12 h assimilation-prediction are shown in Figure S8. The AR results in Meiliang Bay, Gonghu Bay and East Taihu Lake are close to the NR results, which represent the “truth.” In the center of Taihu Lake, some areas have a higher Cchla than the “truth,” while other areas have a chla concentration close to the “truth.” After assimilation, Cchla in the central part of the lake is not smoothly distributed and shows occasional abrupt changes. Possible reasons are: (1) the effect of the assimilation on the central part had not stabilized within the six assimilation hours; and (2) there is a spurious correlation between Meiliang Bay and the center of the lake, which may be due to shortcomings of the observation operator designed for this experiment.
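The error statistics used for this comparison can be sketched in Python as follows. The definitions are assumptions made for illustration, since the exact formulas are not given here: ARE is taken to be the average relative error with respect to the NR value, and the accuracy improvement is taken to be the relative reduction of the error of AR compared with CR. The function names are hypothetical.

```python
import numpy as np

def rmse(estimate, truth):
    """Root mean square error of an estimate against the NR ('truth') series."""
    e = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(e**2)))

def are(estimate, truth):
    """Average relative error, assumed here to be mean(|estimate - truth| / |truth|)."""
    est = np.asarray(estimate, dtype=float)
    tru = np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(est - tru) / np.abs(tru)))

def improvement_percent(err_cr, err_ar):
    """Relative error reduction of AR versus CR, in percent (negative if AR is worse)."""
    return 100.0 * (err_cr - err_ar) / err_cr
```

Applied to the hourly series at one buoy, `improvement_percent(rmse(cr, nr), rmse(ar, nr))` gives a positive value where assimilation helps and a negative value where the accuracy declines, as at buoy 8.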
Case B: In the central part of Taihu Lake
The framework was tested by assimilating the data of 20 virtual buoys in the center of Taihu Lake (Figure S5). All these buoys were placed in the area where the retrieved chla data differ relatively little. The average absolute difference between the NR and CR results at these twenty buoys is 3.5 μg·L⁻¹.
In this case, all conditions in the OSSE were the same as in case A, except for the locations of the buoys. The assimilation-prediction results are shown in Figure S9, which shows that the AR results are closer to the 1:1 line. The RMSE and ARE results are shown in Figure S10. Among the assimilation-prediction results at the twenty virtual buoys, buoys 18 and 19 perform best, whereas buoy 13 performs worst. The accuracy improved by 96.7% and 91.4% at buoys 18 and 19, respectively, but by only 27.9% at buoy 13. Averaged over the twenty buoys, the accuracy improved by 57% relative to CR. By this measure, the assimilation is less effective than in the Meiliang Bay case. Over the whole lake, however, the distribution of Cchla is continuous and consistent. During the prediction stage after the assimilation, the AR results are better than the CR results. The evaluation and prediction of Cchla in the center of Taihu Lake based on the EnSRF algorithm thus achieved relatively good results. The absolute differences in Cchla between AR and NR after the 12 h assimilation-prediction are shown in Figure S11. The AR results in the central part of Taihu Lake are close to the NR results, which are considered the “truth.” In most areas of Zhushan Bay, the concentration of chla is higher than the “truth.” Over the whole lake, however, the assimilation results broadly reflect the true situation.
In this article, a framework for OSSEs based on the EnSRF has been established and used to evaluate and predict the concentration of chla. The accuracy assessment indicates that the algorithm is effective for evaluating and predicting the concentration of chla and is conducive to real-time evaluation of water quality. If this method is applied to other water quality parameters, such as total phosphorus, total nitrogen and suspended particulate matter, a more comprehensive evaluation could be obtained. From the above discussion, we conclude that: (1) when the buoys are deployed in the center of the lake, relatively good results are achieved and the distribution of Cchla is continuous, so the central part is preferable when real buoys are deployed in the future; and (2) the EnSRF is an effective method for evaluating and predicting Cchla. The results and discussion of the OSSEs show that real buoy observations can be combined with data assimilation to evaluate and predict chla concentration. In the next stage, we will therefore assimilate real buoy measurements as the observation data for actual evaluation and prediction of the chla concentration.
We would like to express our deepest thanks to anonymous reviewers for their useful comments and suggestions.
Supplemental data for this article can be accessed on the publisher’s website.
This research was supported by the National Natural Science Foundation grant of China (No. 41271343), Key Fundamental Research Projects of Natural Science in Universities affiliated with the Jiangsu Province (No. 11KJA170003), Scientific Research Foundation of Creative Plan for graduate students of Jiangsu province (No. CXZZ12_0397), and Special Project of High-Resolution (E0203/1112).