In river ecosystems, mesohabitat characteristics (i.e. pool, run, riffle, rapids, etc.) act as proximate variables for fish species occurrence. Fish occurrence and mesohabitat data are very often collected independently and for different purposes, which makes it challenging to characterize fish species distribution patterns at the mesohabitat scale. The present article describes a quantitative assessment of fish occurrence in relation to mesohabitat using secondary data. The middle stretch of the Narmada River in India was selected for the study. Geographic information system tools were used to integrate the species and mesohabitat data. Nonmetric multidimensional scaling, cluster analysis and analysis of similarity were used for similarity analysis. A logistic regression model was applied for model-based inference on the family-mesohabitat relationship. Two separate mesohabitat types, viz., Pool-Run and Run-Riffle, were characterized by the fish species occurrence pattern. The dissimilarity of fish species composition between Pool-Run and Run-Riffle was statistically significant (p-value < 0.05). The family-mesohabitat model predicted that the occurrence probability of a fish species was 14.49 times higher in the Pool-Run than in the Run-Riffle. The predictive accuracy of the model was 69.8%.
Habitat changes caused by anthropogenic stress have endangered various fish species, and many more are feared to be affected by habitat alteration of freshwater resources (Postel et al., 1996). A clear understanding of the relationship between a fish species or family and its environment is of paramount importance for prudent management of riverine ecosystems. Rivers and streams may be regarded as a hierarchical system of patches that differ in age, size and environmental conditions. This concept explains the occurrence of habitat features at both large and small scales (Clarke et al., 2003). Fishes also respond to system processes over the whole range of scales. They exhibit short-term local activities (e.g. daily feeding) involving small-scale movements between habitats, as well as long-term broad-scale seasonal migrations within and between catchments (Durance et al., 2006). The largest habitat scale is the macro-scale, which incorporates the dynamics of the entire catchment. The microhabitat is the smallest scale and describes the immediate area in which a fish is located at any particular time (Cowx et al., 2004). The intermediate scale is the meso-scale, which depicts instream units of uniform flow and substrate type (Tickner et al., 2000; O’Neill and Abrahams, 1984). Meso-scale units can also be defined by relating riverbed morphology to channel hydraulics (Thomson et al., 2001) and are then termed hydromorphologic units (HMU). HMU size and location very often correspond to those of mesohabitats, suggesting that mesohabitat and HMU are synonymous (Parasiewicz, 2001). Mesohabitat is usually classified into three major categories, namely pool, run and riffle, which are further subdivided into 11 sub-categories (Krstolic et al., 2006). The mesohabitat scale is becoming increasingly popular because of its use in environmental flow studies (Parasiewicz, 2007) and because of the importance of mesohabitat characteristics as proximate variables for many aquatic organisms, including fish.
Numerous studies have successfully related mesohabitat characteristics to fish assemblages. Fish community structure could be distinguished among mesohabitat types by feeding guild during the summer low-flow period (Schwartz and Herricks, 2008). Because fish are naturally mobile, this scale may provide more realistic insight into their selection of physical habitat (Lobb and Orth, 1991; Parasiewicz, 2001; Mouton et al., 2011). Parasiewicz (2007) described mesohabitats as fish species and life-stage specific areas where the configuration of hydraulic patterns, together with attributes that provide shelter, creates favourable or unfavourable conditions for survival and development. Lowe et al. (2006) provided evidence that high heterogeneity of mesohabitat types strongly influences the presence or absence of some fish species in a river reach. This implies that fish distribution is also driven by a habitat area wider than the mesohabitat (Dunbar, 2008).
None of the aforesaid studies used secondary data. Using secondary data to develop predictive models is very common in terrestrial ecology although it is rare in case of fish species (a few exceptions are Oberdorff et al., 1995; Fu et al., 2004; Bhatt et al., 2012), particularly for studies on fish species occurrence on mesohabitat scale.
Numerous studies on biodiversity and habitat characteristics have been conducted in the Narmada river basin over a 50-year period (Karamchandani et al., 1967; Dubey, 1993; Rao et al., 1991). Under a comprehensive resource development exercise, a series of dams has been constructed on the river Narmada. Consequently, the river has practically been transformed into small, medium and large patches of diverse aquatic habitat types, each with unique features. Biodiversity analysis in relation to habitat parameters thus becomes important even for small stretches of the river. In this direction, a few studies have been undertaken on fish species distribution, catch composition and mesohabitat characterisation (Vyas et al. 2007, 2009; Vyas, 2009). However, none of those studies characterized fish species distribution at the mesohabitat scale. The management recommendations so far have generally been based on either qualitative analysis or the professional knowledge of the investigators. Further, the information on fish species distribution and habitat characteristics has often been collected in a haphazard way with multiple objectives. Using such existing data, collected for different purposes, to generate useful information through quantitative analysis is therefore inherently challenging. In this article an attempt has been made to exploit those existing data for an integrated quantitative assessment of fish species diversity in relation to mesohabitat characteristics.
Materials and methods
The study area, i.e. the middle stretch of River Narmada, starting from Shahganj to Bandua (Figure 1), spanning between (77° 35′ E, 22° 52′ N) and (77° 50′ E, 22° 50′ N), covering 37 km distance along the river, has been considered for the present investigation. It is chosen because of availability of relevant secondary data for that stretch of the Narmada.
The mesohabitat and fish data
Fish presence-absence data in the study area was obtained from Vyas et al. (2009). The presence-absence data was recorded on the basis of fishes captured by cast net, monofilament gill net of varying mesh size at nine sampling sites, as well as, from the catch of nearby landing sites during December 2005 to 2006 (Vyas et al., 2009). A total of 47 fish species belonging to 29 genera, 15 families and 6 orders were recorded in the mentioned study (Vyas et al., 2009). GIS dataset of mesohabitat characteristics of the same study area was obtained from Vyas et al. (2007). In the study area, the locations of six riffles, five pools and three runs were identified (Figure 1).
The challenging task here is to synchronize the mesohabitat and fish data over location and time. We accomplished this synchronization using GIS tools. An IRS LISS-III image from the sampling year (i.e. 2006) was geo-referenced. Thereafter, the GIS dataset on mesohabitat locations and boundaries (Vyas et al., 2007) and the locations of the sampling sites were overlaid on the same image. Visual inspection of the image confirmed the mesohabitat types delineated by Vyas et al. (2007). The fish species presence-absence data were recorded from the sampling sites as well as from nearby landing sites (Vyas et al., 2009). A species recorded as present at a site may therefore have been encountered at the sampling site, in its peripheral area, or both, so fish species presence is also influenced by the habitat characteristics of the peripheral area. In other words, the existing fish species data were already spatially aggregated by the way in which species occurrence was recorded. We therefore followed the spatial aggregation method (Vadas and Orth, 1997) to create pooled habitat characteristics for each site. A major portion (82%) of the stretch was covered by Run, with discrete occurrences of Pool and Riffle (Figure 1). Hence Run is considered to be the common habitat effect on fish species occurrence in the study area.
Examining the image, we create pooled (or aggregated) mesohabitat category, as shown below, which is a combination of mesohabitat types of the sampling site and peripheral area.
Site similarity in terms of fish species assemblage pattern was examined using hierarchical cluster analysis. The fish species distribution in relation to mesohabitat was analysed using non-metric multidimensional scaling (NMDS). NMDS was preferred to canonical correspondence analysis (CCA) because (a) it is not derived from any assumed model of species response (unimodal) to a gradient and (b) the focus here is more on identifying the factors that characterize species composition than on positioning species along a gradient. Analysis of similarity (ANOSIM) was employed to test whether fish species distribution differed significantly between mesohabitats. The test of significance (Clarke, 1993) was implemented with 2000 permutations. The standard Bray-Curtis distance measure was used for all the aforementioned analyses.
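The Bray-Curtis distance and the ANOSIM permutation test described above can be illustrated with a minimal, standard-library sketch. The presence-absence matrix, the group labels and all function names below are hypothetical and serve only to show the mechanics; this is not the software or the data used in the study.

```python
import random
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two site vectors.
    For 0/1 presence-absence data this equals the Sorensen dissimilarity."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(ranks, pairs, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of site pairs."""
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_diff = sum(between) / len(between) - sum(within) / len(within)
    return mean_diff / (len(pairs) / 2)


def anosim(sites, groups, permutations=2000, seed=0):
    """Observed R and a permutation p-value obtained by shuffling group labels."""
    pairs = list(combinations(range(len(sites)), 2))
    ranks = average_ranks([bray_curtis(sites[i], sites[j]) for i, j in pairs])
    observed = anosim_r(ranks, pairs, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(ranks, pairs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical presence-absence data: rows are sites, columns are species.
toy_sites = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
toy_groups = ["Pool-Run"] * 3 + ["Run-Riffle"] * 3
r_stat, p_value = anosim(toy_sites, toy_groups)
```

In this toy example every between-group dissimilarity exceeds every within-group dissimilarity, so R reaches its maximum of 1. With only six sites the number of distinct label arrangements is small, which limits how small the permutation p-value can become.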
The family-mesohabitat relationship was modelled by logistic regression of the form $\log\left(\frac{p}{1-p}\right) = b_0 + \sum_i b_i x_i$, where $p$ is the probability of presence of a fish species, $b_0$ is the intercept and $x_i$ is the i-th explanatory variable with corresponding regression coefficient $b_i$. In the present set-up, two variables, mesohabitat type and taxonomic family, were incorporated in the model. The model was fitted by maximum likelihood (McCullagh and Nelder, 1989) under a binomial error structure.
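As a rough illustration of maximum likelihood fitting under a binomial error structure, the sketch below maximises the log-likelihood by plain gradient ascent. It is an assumption-laden toy: the data are invented, only a single habitat indicator is used (the family variable is omitted for brevity), and gradient ascent stands in for whatever fitting routine the statistical software actually applied.

```python
import math


def sigmoid(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, y, lr=0.5, iters=5000):
    """Maximise the binomial log-likelihood by gradient ascent.
    rows: list of feature lists (no intercept column); y: list of 0/1."""
    n_feat = len(rows[0])
    beta = [0.0] * (n_feat + 1)  # beta[0] is the intercept b0
    n = len(rows)
    for _ in range(iters):
        grad = [0.0] * (n_feat + 1)
        for x, yi in zip(rows, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            err = yi - p  # derivative of the log-likelihood w.r.t. the logit
            grad[0] += err
            for k, v in enumerate(x):
                grad[k + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical data: x = 0 for the reference habitat (Pool-Run), x = 1 for Run-Riffle.
# Invented counts: 6 presences in 10 Pool-Run records, 2 presences in 10 Run-Riffle records.
toy_rows = [[0]] * 10 + [[1]] * 10
toy_y = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8
b0, b_rr = fit_logistic(toy_rows, toy_y)
```

With a single binary predictor the maximum likelihood estimates have a closed form (the intercept is the logit of the presence rate in the reference habitat), which gives a convenient check on the iterative fit. A negative habitat coefficient means lower odds of presence in Run-Riffle than in the Pool-Run reference.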
The model was evaluated on the basis of the confusion matrix. Following Jiménez-Valverde and Lobo (2007), we considered both cut-off dependent and cut-off independent measures of model performance. The cut-off dependent measure is based on specificity (true negative rate), sensitivity (true positive rate) and the Minimized Difference Threshold (MDT). The cut-off independent measure is based on the receiver operating characteristic (ROC) curve. Overall performance of the model was evaluated by the area under the ROC curve (AUC). Stability of prediction was judged with respect to the bootstrapped confidence intervals.
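The evaluation measures named above can be sketched in a few lines. The observations and predicted probabilities below are hypothetical, the function names are illustrative, and the MDT is taken here as the cut-off that minimises the absolute difference between sensitivity and specificity, searched over the observed scores only.

```python
def sens_spec(y_true, scores, cutoff):
    """Sensitivity and specificity when score >= cutoff is predicted as presence."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def mdt_cutoff(y_true, scores):
    """Cut-off among the observed scores minimising |sensitivity - specificity|."""
    def gap(cutoff):
        sens, spec = sens_spec(y_true, scores, cutoff)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


def auc(y_true, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    presence scores higher than a randomly chosen absence (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observed presences (1) / absences (0) and predicted probabilities.
toy_y = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

best_cutoff = mdt_cutoff(toy_y, toy_scores)
sensitivity, specificity = sens_spec(toy_y, toy_scores, best_cutoff)
toy_auc = auc(toy_y, toy_scores)
```

Because the AUC is computed from pairwise comparisons of scores, it does not depend on any particular cut-off, which is why it serves as the cut-off independent measure.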
Results from the cluster analysis are shown in Figure 2. The sites can be classified into four distinct groups at a similarity level of 35% in fish species distribution. Site 5 forms a single-member group, with a fish species distribution different from that of the remaining sites. Sites 3 and 8 have a similar community structure, which differs from that of the group containing Sites 4 and 9. The maximum similarity of fish species distribution is observed in the group containing Sites 1, 2, 6 and 7. These groupings are compared with those of the ordination analysis as described below.
Fish species habitat gradient analysis
In NMDS, the number of dimensions is chosen by weighing the ‘stress value’ against the ‘number of dimensions’. For the two-dimensional ordination, a stress value of 7.95% and a corresponding accuracy of 99.7% were obtained. Clarke and Warwick (2001) suggested that a stress value below 10% indicates a good ordination for ecological data, with little possibility of misleading interpretation. Solutions with three or more dimensions would therefore add little information about the overall structure, and we based our interpretation on the two-dimensional NMDS solution.
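For reference, the stress of an NMDS solution is commonly defined by Kruskal's stress formula 1. The text does not state which stress definition the software used, so the expression below is an assumption about the usual convention rather than a statement of the authors' computation. Here $d_{ij}$ is the distance between sites $i$ and $j$ in the ordination and $\hat{d}_{ij}$ is the corresponding disparity obtained by monotone regression on the Bray-Curtis dissimilarities.

```latex
\[
S = \sqrt{\frac{\sum_{i<j}\left(d_{ij} - \hat{d}_{ij}\right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
\]
```

Under this definition, a reported stress of 7.95% corresponds to $S \approx 0.0795$, which is below the 10% guideline cited above.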
The tri-plot obtained from NMDS is depicted in Figure 3a. The separation of sites with respect to fish species distribution is similar to that obtained in the hierarchical cluster analysis. Sites 1, 2 and 6 have a substrate of soil, clay and sand. Sites 7 and 8 are dominated by a sandy bed. Sites 3, 4 and 9 comprise hard rock, cobble, boulder and sand, which are not conducive to fish habitat. These different substrate types influence the fish species distribution, and the groups of sites reflect this variation. Two hulls were formed by joining the vertices of the sites corresponding to Pool-Run and Run-Riffle according to the GIS classification. These were overlaid on the NMDS plot for comparison. Except for Site 7, the GIS-based classification was almost the same as the previous classification. Hence the sites could be comfortably grouped into two major habitat types. This grouping paved the way for meaningful interpretation of the species-mesohabitat relationship.
NMDS1 separates Pool-Run and Run-Riffle habitat types according to fish species distribution. In general, species distribution in Pool-Run is more (76.60%) than that in Run-Riffle (23.40%). However, some species are positioned around the center of the tri-plot, indicating the same species may be encountered in all the habitat types, viz., pool, run and riffle. Such phenomenon is possibly because the stretch under study has been dominated by run with some patches of riffle and pool (Figure 1). The fish species Notopterus notopterus (1), Amblypharyngodon mola (2), Oxygaster bacaila (7), Oxygaster clupeides (9), Puntius sarana (12), Labeo gonius (18), Osteobrama cotio (23), Ompok bimaculatus (26), Mystus bleekeri (29), Mystus tengra (31) and Nandus nandus (41) are more inclined towards Run-Riffle than Pool-Run.
The ANOSIM results (Figure 3b) show that the dissimilarity of species composition between Pool-Run and Run-Riffle was statistically significant (p-value < 0.05). Further, the dissimilarity within Pool-Run was less than that within Run-Riffle (Figure 3b). The extreme point in the Pool-Run habitat was possibly due to the low availability of species at Site 5. The high variation in community structure within Run-Riffle is perhaps due to the sampling difficulty in that habitat, especially in riffles.
ANOSIM provides evidence only that species composition differs significantly between Pool-Run and Run-Riffle; it does not characterize the family-mesohabitat relationship. This was accomplished with the family-mesohabitat model, which applies logistic regression to the presence-absence data. Analysis of deviance under logistic regression showed that mesohabitat type had a significant effect (p-value < 0.05) on the occurrence probability of fish species. The family-level effect and the interaction between family and habitat were not statistically significant, with p-values of 0.06 and 0.07, respectively.
In the logistic regression analysis, Pool-Run mesohabitat and Ambassidae family were considered as references for habitat and family respectively. The detailed results on specific effects (only for statistically significant) are shown in Table 1. The negative sign of RR indicates that chance of presence of fish species in Pool-Run is more than that in Run-Riffle.
On computing the relative occurrence probability of fish species in Pool-Run and Run-Riffle, the chance of encountering a fish species in Pool-Run was found to be 14.49 times higher than in Run-Riffle. This finding is important for understanding fish species occurrence in relation to mesohabitat. Stretches of a river system in hilly regions are most often associated with alternating shallow rapids and deep pools, and thus form a sequence of pool, run and riffle combinations. Pool was believed to be an important breeding ground for fish (Sjorslev, 2000), and run, enriched with clay and organic matter in the study area, was considered to hold a good mixture of aquatic life. These two river morphological factors lend ecological plausibility to the higher occurrence probability of fish species in Pool-Run than in Run-Riffle. Table 1 shows that most of the families have almost similar occurrence probabilities, except for the family Cyprinidae. The occurrence probability of Cyprinidae is relatively higher than that of other families, a pattern that is very common in Indian river systems. Only the family Bagridae showed a significant association with the Run-Riffle mesohabitat. More specifically, there is a positive association in occurrence probability between Bagridae and the reference family (i.e. Ambassidae) in the Run-Riffle mesohabitat.
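The link between a negative Run-Riffle coefficient and a ratio such as 14.49 can be sketched as follows. This assumes that the reported figure is an odds ratio obtained by exponentiating the logistic regression coefficient; the text does not state this explicitly, and the coefficient used below is back-calculated from the reported ratio for illustration rather than taken from Table 1.

```python
import math


def odds_ratio_vs_reference(coef):
    """Odds of presence at a non-reference level relative to the reference level."""
    return math.exp(coef)


# Hypothetical coefficient for Run-Riffle (reference: Pool-Run), back-calculated
# from the reported ratio of 14.49; it is NOT a value read from Table 1.
b_run_riffle = -math.log(14.49)

# Odds of presence in Run-Riffle relative to Pool-Run (below 1 for a negative coefficient).
run_riffle_vs_pool_run = odds_ratio_vs_reference(b_run_riffle)

# Inverting gives the odds of presence in Pool-Run relative to Run-Riffle.
pool_run_vs_run_riffle = 1.0 / run_riffle_vs_pool_run

print(round(b_run_riffle, 2), round(pool_run_vs_run_riffle, 2))
```

Under this reading, a coefficient of roughly -2.67 for Run-Riffle would correspond to the reported 14.49-fold difference. An odds ratio is not the same as a ratio of probabilities, so the two should be distinguished when the result is interpreted.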
The cut-off based performance measure of the model (Figure 4a) produced the optimal classification at a presence probability threshold of 0.25, with an overall accuracy of 64%. The accuracies of predicting fish species presence and absence were 61.5% and 66.2%, respectively. Similar results were obtained with the cut-off independent performance measure based on ROC (Figure 4b). The 95% confidence interval was computed on the basis of 2000 simulated bootstrap samples. The compact confidence interval indicated a stable estimate of the ROC curve over the entire range of specificity, and the AUC was estimated at 69.8% (Figure 4b). According to Manel et al. (2001), this level of predictive accuracy is moderate and useful for ecological application.
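A percentile bootstrap of the kind described above can be sketched as follows. For brevity the sketch bootstraps the AUC rather than the whole ROC curve, and the observations, predicted probabilities and function names are hypothetical.

```python
import random


def auc(y_true, scores):
    """Probability that a random presence outscores a random absence (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=1):
    """Percentile confidence interval for the AUC from resampled records."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        # Skip resamples containing only presences or only absences,
        # because the AUC is undefined for them.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical observed presences (1) / absences (0) and predicted probabilities.
toy_y = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
ci_low, ci_high = bootstrap_auc_ci(toy_y, toy_scores)
```

A narrow interval suggests that the estimate is stable under resampling; with a data set as small as this toy example the interval will typically be wide.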
A comprehensive quantitative assessment of fish biodiversity at the mesohabitat scale has been carried out in the present study. The approach adopted here moves from human recognition of riverine hydraulics and subsequent classification to a data-driven, objective assessment framework. The significant contribution of the work is that the research opportunities offered by existing data have been fully exploited. In the earlier study of the same area, the preferential habitat was determined on the basis of local fishermen’s or surveyors’ knowledge (Vyas et al., 2009). On the other hand, a detailed mesohabitat description (Vyas et al., 2007) was given without explaining the fish species-mesohabitat relationship. The proposed integrated assessment framework completely eliminates subjective bias and derives model-based inferences for characterizing fish species assemblages at the mesohabitat scale more efficiently, without incurring any further survey cost. Even though the data had been collected in a haphazard way, the approach produced results similar to those derived from extensive primary survey data. For example, earlier studies on peninsular rivers in India (Arunachalam, 2000) reported that Cyprinidae were mostly structured around pool-type habitat. Similar results for the Pool-Run sequence were obtained from the present quantitative assessment.
Moreover, estimate of relative occurrence of fish species (14.49 times more in Pool-Run than Run-Riffle) was reported for the first time in the local scale. Assignment of distinct boundaries among habitats corresponding to fish species distribution patterns is difficult because there are basically few examples in nature where habitat boundaries have strict demarcation. There is usually presence of gradients of intermediate habitat and the spatial scale of study may influence the conclusions of stream community structure (McAuliffe, 1984; Pringle et al., 1988; Kerans et al., 1992). The aggregated mesohabitat type has eliminated the edge effect to some extent between run, pool and riffle in structuring fish assemblage.
Though the different analytical tools confirmed the species-habitat relationship, the proposed mesohabitat-family model underperformed in terms of predictive accuracy. There are several reasons for this. Firstly, the secondary data were relatively small in size for evaluating predictive accuracy. Secondly, species detection probability is highly influenced by sampling effort, which was unknown in the present study. These are the limitations of using existing data. Thirdly, the model ignores the effect of other environmental factors such as hydrochemistry and substrate type, which are also important in structuring fish species assemblages. Considering the trade-off between precision and cost in using secondary data, the model satisfactorily characterizes the family-mesohabitat relationship. We restricted ourselves to a family-level model because the information on fish species presence-absence was available for only a single instance at each site. As such, there is immense scope for further studies with refinements. Heterogeneity due to sampling effort can be taken into account under a Generalized Linear Mixed Model if information on repeated sampling at each site is available. Simultaneous water and fish sampling at a large number of sites of different mesohabitat types will enrich the fish (abundance or occurrence), hydrochemistry and mesohabitat data. Advanced modeling approaches (e.g. Random Forest, Boosted Regression Trees, Multivariate Adaptive Regression etc.) that exploit such rich data can then be applied, which may lead to new findings with higher predictive accuracy.
The authors sincerely acknowledge Mr. Anirban Goswami and Amitangshu Sil, SRF, CIFRI for assisting in statistical computing.