Long-term datasets of nutrient levels provide a unique opportunity to study the changes that occur in lake ecosystems subjected to multiple man-induced and natural pressures. Unfortunately, the relevant dataset of thousands of records for Lake Peipsi (Estonia/Russia) is rather fragmentary and therefore difficult to analyse. Focusing on the total phosphorus concentration (TP), which is believed to be the main reason for degradation of the shallow lowland lake, we tailored, using SAS/STAT software, a special mixed regression model having 61 linear parameters and a repeated measures approach with spatial covariance structure. The model describes the dependence of TP in the surface water on year number (1985–2010), on the day of year and on geographical coordinates. Based on Akaike's criterion, the present model is superior to the analogous model, where the prediction errors (residuals) are supposed to be independent. Another advantage of the present model is that it distinguishes two groups of factors related to the variation in TP residuals. Factors of the first group affect TP synchronously within the larger areas, while this synchrony is weakening down to about 75 km or even more as has been shown by the SAS VARIOGRAM procedure. Factors in the other group can be associated with sampling instability and random errors in laboratory analyses. The influence of both factor groups on TP is approximately equal, with the related variance components on a log2-scale being 0.12 and 0.15.
Waterbodies all over the world are threatened by the excessive nutrient loading of eutrophication (Håkanson, 1995; Moss, 2007). Phosphorus (P) has been recognized as the most critical limiting nutrient for lake productivity (Wetzel, 2001). The P concentration in a lake is determined by the P concentration of the inflowing water, the residence time and the internal P dynamics of the lake (Schippers et al., 2006). P availability and the rate at which it is cycled determine primarily the growth of algae in freshwater systems (Burley et al., 2001; Håkanson et al., 2003; Williams, 2006). A thorough understanding of the P cycle is needed both to understand the structure and functioning of aquatic ecosystems and to preserve the quality of our aquatic resources (Istvánovics, 2008).
In Lake Peipsi (Estonia/Russia) eutrophication has led to the undesirable growth of algae, massive blooms of cyanobacteria accompanied by oxygen depletion during the night and fish kills, low water transparency, and siltation of the lake bottom (Kangur and Möls, 2008). Although long-term datasets are valuable resources for detecting and explaining complex changes in natural environments (Jackson and Füreder, 2006; Benson et al., 2006), their direct use for drawing solid scientific conclusions on the ecosystem functioning of Lake Peipsi is complicated due to the level of fragmentation of the data. Our recent modelling work has resulted in a predictive tool useful for water quality indicators in the lake (see Möls  for details). In essence, values contained in the database predicted from the year, the day of the year and geographical coordinates have been successfully used in a number of studies: polarity in the distribution of nutrients in Lake Peipsi (Kangur and Möls, 2008), seasonal changes in nutrient concentrations in the water of Lake Peipsi (Haldna et al., 2008), comparisons associated with long-term changes in hydrochemical and -hydrobiological parameters in Lakes Peipsi and IJsselmeer (Lammens et al., 2007). However, a better understanding of the eutrophication problem and more effective controls of nutrient enrichment in the lake have heightened the need for data of even higher quality.
In the present study we will examine whether using covariance structures provided recently by SAS software is able to improve the quality of the predictions of total phosphorus concentration (TP) from the year, the day of the year and geographical coordinates. Furthermore, we will try to give a deeper insight into the variance structure of the measurements and to explain the origin of the various variance components.
Material and Methods
Lake Peipsi is a large (3,555 km2) shallow (mean depth 7.1 m) lowland lake lying on the border of Estonia and Russia (Figure 1). The catchment area of Lake Peipsi (47,800 km2, including lake surface) extends from 56°08′ to 59°13′ N and from 25°36′ to 30°16′ E (Jaani, 2001). The lake extends in the meridianal direction more than 150 km and consists of three different parts; i.e. Lake Peipsi sensu stricto (s.s.), Lake Lämmijärv and Lake Pihkva, with differences of a morphometric nature, although they form a single waterbody (Jaani, 2001). The majority of nutrients are carried into the lake by the rivers Velikaya and Emajõgi (Figure 1).
During the study period 1985–2010 the number of sampling sites has varied considerably. We have no observations from the Russian part of the lake between 1993 and 2000 because we could not cross the border for sampling. However, Estonian-Russian joint expeditions to the whole lake have been arranged since 2001. Seasonal (monthly) water samples for TP analysis were obtained from the surface layer (0.1–1.0 m) with a Ruttner sampler. TP was determined in the water samples by the ammonium molybdate spectrometric method (EVS-ES 1189). Chemical analyses were performed in the Institute of Zoology and Botany of the Estonian University of Life Sciences, and since 1992 at Tartu Environmental Researches Ltd., Estonia. The laboratories applied identical methods.
In the present study we have used 1281 surface water TP measurements from Lake Peipsi s.s., 206 measurements from Lake Pihkva and 252 measurements from Lake Lämmijärv. TP in the surface water over the whole lake varied during the study period (1985–2010) between 8 and 210 mg P m-3.
In order to estimate any correlation between the measurements which can be considered as simultaneous but occurring in different parts of the lake, we clustered TP measurements into ‘expeditions’, joining together measurements carried out during relatively short time intervals. Of all 1739 data rows in the TP dataset, 165 such expeditions were formed. The mean duration of an expedition was 1.6 days (maximum 7 days); the number of sampling sites visited during an expedition varied from 2 to 31 (with an average of 11 sites). The time interval between expeditions was in 80% of the cases greater than 20 days, mostly about 29 days, justifying the grouping of observations into expeditions. The TP was originally fixed in mg m-3. Distribution of this measure is not normal, but its logarithm is fairly normally distributed (Möls, 2005). In the following work we have used the logarithmic TP, still denoting it the TP.
To separate the variability related to long-term changes, yearly seasons and geographical differences between sampling sites in the lake from the overall TP variability, we used a large regression model, containing 61 carefully selected terms, e.g. 49 interaction terms optimally presenting these factors (Table 1). Hereinafter we refer to such mixed models as P61, irrespective of the shape of the covariance structure of residuals. In former applications of P61, and also in other analogous general linear models (Starast et al., 2001; Möls, 2005), it was assumed that residuals have a joint normal distribution and are mutually independent. In those cases, the covariance matrix of residuals was assumed to be diagonal and called trivial. In the present study, we tested exponential, Gaussian and power covariance structures in addition to trivial one. To compare the quality of models with different covariance structures we used Akaike's information criterion (AIC) as a measure of the goodness of fit of the statistical model calculated in the MIXED procedure.
To select for residuals, a nontrivial covariance structure, which more correctly characterizes the statistical dependence between residuals, we additionally used the SAS procedure VARIOGRAM (SAS, 2008).
By using the SAS MIXED procedure, we had the opportunity to split the total residual variance into components - the natural variance (σ2) and the variance of measurement error (τ2). The covariance between two observations can be calculated as σij = σ2ρ(θ, dij), where ρ(θ, dij) is the covariance function (Littell et al., 2006). We selected ρ(θ, dij) = exp(–dij/θ), where θ is parameter estimated from data, and dij is the Euclidean distance between sampling sites i and j. If dij = 0, then the samples are taken in the same place, ρ(θ, dij) = 1, and the covariance between TP residuals is maximal.
Results and Discussion
Lake models that predict P concentrations have provided important information which has enabled successful restoration of many eutrophic lakes during the last few decades (Bryhn and Håkanson, 2007). However, in order for the model to perform well, it is essential that the environmental data are of high quality (Bennion and Smith, 2000). Nevertheless, several studies have shown that a linear mixed model P61 works sufficiently well, as has also been shown with the sparse long-term TP data series available for Lake Peipsi (Starast et al., 2001; Möls, 2005; Lammens et al., 2007; Kangur and Möls, 2008).
In the present article, we changed the trivial covariance structure used in the former versions of P61, using the special exponential spatial structure. Change of covariance structure improved the model goodness of fit in the sense of Akaike's informational criterion: AIC decreased from 2231 in the case of the trivial structure to 1753 in the case of the spatial structure. Therefore, using the spatial covariance structure in the mixed linear models of lakes such as Lake Peipsi having a large surface area is justified.
According to the results obtained from the mixed model P61 in our study, the residuals, which correspond to spatially close measurements during an expedition, are correlated (Figure 2). The covariance decreases, when the distance between sampling sites increases. In Lake Peipsi, the decrease can be followed until a distance of about 75 km, because there are too few sampling sites that are more distant. The minimal covariance between TP residuals in Lake Peipsi is about 0.02–0.03. Both components of the total residual variance – natural variance and the variance associated with measurement errors – as estimated using the SAS MIXED procedure were virtually equal, being 0.15 and 0.12, respectively.
The natural component of variance, the σ2, can result from simultaneous episodic processes having a direct influence on TP within an area of tens of kilometres, or even over the whole lake. For example, resuspension tends to take place only during short periods of high longitudinal winds (Ekholm et al., 1997). P release may markedly increase if the P in bottom sediments is translocated to the water column by wind-induced resuspension (Søndergaard et al., 1992). As the loading of nutrients from the drainage basin determines substantially the TP status of lakes, episodic changes in external nutrient supply can immediately affect TP concentrations. Salvia-Castelvi et al. (2005) have demonstrated that TP loads are completely determined by high discharges and, more precisely, by high discharges in the rising phase of storms, and this represents only a few days of a year. Among factors influencing the TP indirectly, in the short-term scale, are episodic algal blooms. Fast sediment-water exchange and the rapid turnover of P in water may support the enhanced growth of phytoplankton during periods of low external loading (Wetzel, 2001; Istvánovics, 2008). Many algae, when provided with a sufficient supply of P, can absorb the nutrient in quantities far in excess of their actual needs (Istvánovics, 2008). Assimilation of surplus amount of P by algae can provide sufficient P to maintain algal growth even at times when the external concentration of P may be very low. Several algae and cyanobacteria found in lakes exhibit vertical migrations on a daily basis from nutrient-rich sediments to the euphotic trophogenic zones. The cyanobacteria Gloeotrichia echinulata, which has been found to take up P in the sediments and transport it to the euphotic trophogenic zone, where this stored P could support three to four doublings (Istvánovics, 2008), is abundant in Lake Peipsi. These episodic algal blooms can increase the TP measured in the unfiltered water samples. Thus, fluctuations in environmental conditions such as solar radiation regulating phytoplankton growth could explain the residual variance. Although the interpretation of the information provided by the model is of rather hypothetical character and needs further investigations in field, it is still very useful to water managers.
Factors behind the other variance component, the τ2, influence TP randomly and independently each time TP is measured. This variance could emanate from random errors in sampling and laboratory procedures, as well as any small-scale heterogeneity of the water surface layers where our samples were taken.
Using spatial covariance structures in mixed linear models improves, in the sense of Akaike's criterion, the quality of general linear models used for TP predictions in Lake Peipsi. It also helps to achieve a deeper insight into the variance structure of TP measurements, by estimating the proportions of different variance components in the TP variability. In particular, the influence of non-local unidentified factors reported for TP values in Lake Peipsi are approximately at the same level as random measurement errors.
Funding for this research was provided by the target-financed projects SF0170006s08 and SF0180026s09 of the Estonian Ministry of Education and Research, by the Estonian Science Foundation grant 7643 and by the Estonian Centre of Excellence in Genomics. We are indebted to Dr. Märt Möls for useful suggestions and comments on the theoretical aspects of statistical methods used in this article. Additionally, we gratefully acknowledge Dr. Renee Miller (UK) for language editing.