Statistical and Visual Comparison of Water Quality Changes Caused by a Large River Restoration Project

Abstract

This article presents a statistical and visual comparison of water quality changes caused by a large river restoration project. Because water quality data are often non-normally distributed with high seasonal variation, statistical methods should be selected according to the data characteristics to support sound scientific decision-making. In this study, normality was first tested using the Shapiro–Wilk test, and two statistical comparison tests were then performed: the paired T-test and the sign-test. Seasonality was accounted for by comparing monthly data pairs. In addition, a diagonal pair comparison plot was proposed as a visual comparison method. This plot is a graphical data display in which monthly paired water quality values are represented by X–Y coordinates. It was concluded that this series of statistical and visual methods is suitable for comparing non-normally distributed water quality data with high seasonal variation.

Introduction

A large river restoration project was conducted in South Korea from 2010 to 2011. The main objectives of this project were flood control and water resource enhancement to counteract climate change in the Korean peninsula. The Korean peninsula is located in the Asian monsoon region with heavy rainfall in the summer and a long drought period, and is thus classified as an extreme risk area in the climate change vulnerability index (IPCC, 2013). The project included enlargement of the river channel capacity and construction of multipurpose weirs, for which a large amount of benthic sediment and a portion of the riverside floodplain were dredged. Besides river channel modification, the project was also intended to improve water quality by reducing the pollution load from the watershed and to promote the recreation and tourism value of the river by landscaping the riparian zone.

River restoration projects can have a positive or negative influence on water quality. The reduction of the pollutant load from the watershed and the sediment dredging may enhance the water quality of the river (Zhang et al., 2014a, 2014b). The increased water volume provided by the enlarged river channel may dilute the concentration of pollutants loaded from point and nonpoint sources. However, the installation of river channel weirs may increase the river depth and create a stagnant water body, which may have unexpected effects on water quality (Lee and Park, 2013; Cha et al., 2016). A stagnant water body enhances the sedimentation of suspended solids (SS), together with associated trace pollutants, and the precipitation of calcium carbonate, resulting in clean and soft water (Bainbridge et al., 2012; Schaffelke et al., 2012; Jung et al., 2015). However, the increased river depth prevents oxygen penetration to the bottom of the river, which reduces the self-purification capacity of the river (Homoky et al., 2012; Kim, 2018).

Water quality impacts caused by restoration projects attract a great deal of attention, such that an accurate assessment has become a nationwide issue in Korea. The most reliable scientific method is the statistical comparison of water quality data observed in the river before and after the restoration project (Ruiz-Jean and Mitchell Aide, 2005; Woolsey et al., 2007; Hirsch et al., 2010; Sprague et al., 2011; Wan et al., 2014; Hirsch et al., 2015; Hickman and Hirsch, 2017). Several statistical methods are available for data comparison. In natural rivers, however, most water quality data are non-normally distributed with high seasonal variation (Helsel and Hirsch, 1992; Yue and Pilon, 2004; Hirsch et al., 2010). Therefore, only a few selected methods can be used to compare water quality data collected from natural rivers (Boyer et al., 1999; Zipper et al., 2002; Lee et al., 2010; Kroon et al., 2012; Naddeo et al., 2013). In addition, a simple and clear visual comparison of water quality is necessary to show the results of the restoration project to people who do not have much statistical knowledge.

This article presents a statistical approach to the assessment of water quality data collected from a natural river system, where high seasonal variations and non-normal distributions were observed. A simple and clear visual comparison method is also proposed, with the results of water quality changes caused by a river restoration project.

Materials and Methods

Study area and water quality data

A restoration project was carried out in the Geum River from 2010 to 2011. The Geum River is one of four major river systems in South Korea and plays an important role as a water resource for agriculture, industry, and municipalities in the mid-west area. The river basin is located in the mid-west (126°40′8″ to 128°3′25″E, 35°34′42″ to 37°3′7″N) of the Korean Peninsula. Its basin area and river length are 9,915 km² and 398 km, respectively. The river has more than 20 tributaries (Fig. 1). The mean annual precipitation is 1,374 mm (2005–2014), and the monthly precipitation is highly variable by season: more than 60% of the total rainfall occurs during the wet summer monsoon in the middle of the year, whereas the other seasons are dry and the winter is cold (Lee et al., 2012, 2015). The river benefits considerably from flow duration control by two upstream dams (Ahn et al., 2014). Total water use, which consists of water abstraction from the river and reservoirs, is equivalent to 2,522 × 10⁶ m³.

FIG. 1.

Study area map with water quality monitoring stations and constructed weirs.

Heavy restorations were carried out in the downstream area below the Daecheong Reservoir. A large amount of benthic sediment and a large portion of the riverside floodplain were dredged, and three multipurpose weirs were constructed. In the upstream area, only very light restorations, such as shore protection efforts and riverside maintenance, were carried out. Benthic sediment dredging and weir construction were not conducted in the upstream area (Table 1).

Table 1.

Restoration Intensity and Sampling Stations

	Upstream	Downstream
Restoration	Light restoration: Shore protection, riverside maintenance	Heavy restoration: Dredging benthic sediment and floodplain, three weirs construction
Stations	M1∼M7	M8∼M19

Water quality data were obtained from the national water quality monitoring stations operated by the National Institute of Environmental Research (NIER), the Korea Ministry of Environment (http://water.nier.go.kr). One hundred twenty-nine monitoring stations (31 on the main stream and 98 on tributaries) are located in the Geum River basin. Each monitoring station measures 19 water quality indicators on a monthly basis, including pH, temperature, conductivity, dissolved oxygen, 5-day biochemical oxygen demand (BOD), chemical oxygen demand (COD), SS, total nitrogen, total phosphorus (TP), and chlorophyll a (Chl-a).

Water quality data compared in this study were collected from 19 stations (M1∼M19), which are located in the restoration project section of the main stream (Fig. 1). Among the 19 stations, 12 stations (M8∼M19) were located in the downstream area and 7 stations (M1∼M7) were located in the upstream area. In this study, four major water quality indicators were compared: BOD, COD, TP, and Chl-a. The data measured in 2012 and 2013 (after the project) were compared with the data measured in 2009 (before the project). To account for the restoration intensity, the comparison of water quality data was carried out separately for the upstream and downstream stations.

Methods

Since water quality in the Korean peninsula varies strongly by season, seasonality should be accounted for in the statistical and visual methods. Thus, water quality data measured on a similar Julian day or in the same month or season need to be compared, such as January data with January data and February data with February data. In addition to seasonality, normality is also important in the statistical comparison of water quality. If the data are normally distributed, parametric methods can be applied. If not, nonparametric methods are more appropriate (Charles and Terry, 1992). To select the proper statistical method, a normality test should therefore be performed first.
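The month-by-month pairing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the record layout `(station, year, month, value)` and the function name `pair_by_month` are hypothetical, and the values are invented.

```python
from collections import defaultdict

def pair_by_month(records, before_year, after_year):
    """Pair observations from the same station and month across two years.

    `records` is an iterable of (station, year, month, value) tuples; this
    layout is a hypothetical one chosen for the sketch.
    Returns (before, after) pairs ordered by station and month.
    """
    by_key = defaultdict(dict)
    for station, year, month, value in records:
        by_key[(station, month)][year] = value
    pairs = []
    for key in sorted(by_key):
        years = by_key[key]
        # Keep only station-months that were observed in both years.
        if before_year in years and after_year in years:
            pairs.append((years[before_year], years[after_year]))
    return pairs

# Illustrative values only (not data from the study).
records = [
    ("M8", 2009, 1, 3.1), ("M8", 2012, 1, 2.0),
    ("M8", 2009, 7, 4.4), ("M8", 2012, 7, 2.9),
    ("M9", 2009, 1, 3.6),  # no 2012 partner, so this record is dropped
]
pairs = pair_by_month(records, 2009, 2012)
print(pairs)
```

Pairing on the (station, month) key means that a January value is only ever compared with a January value from the same station, which is how seasonality is kept out of the before/after differences.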

The flowchart of the statistical and visual methods is presented in Fig. 2. As shown in this figure, two comparisons were conducted independently. In the statistical comparison, the parametric (paired T-test) and the nonparametric (sign-test) tests were performed after the normality test (Shapiro–Wilk test). Generally, the parametric test is more powerful than the nonparametric test for population estimation. Another advantage of the parametric test is that the degree of changes can be calculated. However, the parametric test is very sensitive to outliers and often produces incorrect estimation with non-normal distribution data (Hamed, 2008). Therefore, for the natural river data, a nonparametric test is more suitable. In this study, all three tests (normality, parametric, and nonparametric) were performed together such that they would complement each other.

FIG. 2.

Flowchart of the statistical and visual comparison methods.

Statistical comparison

Normality test

A normality test is a fundamental step in water quality statistics. Approximately 40 numerical tests of normality are available, such as Pearson's chi-squared test, the chi-square goodness-of-fit test, the Anderson–Darling test, the Lilliefors test, the Shapiro–Wilk test, and the Kolmogorov–Smirnov test (Dufour et al., 1998; Razali and Wah, 2011; Lee et al., 2014). In this study, the Shapiro–Wilk test was performed using the SPSS 21.0 package, and the result was checked graphically using histograms.

The Shapiro and Wilk (1965) test is often used for a sample size of less than 50. It was the first test able to detect departures from normality due to skewness or kurtosis. The Shapiro–Wilk test modified by Royston (1982a, 1982b, 1995) is available for a sample size between 3 and 5,000. The Shapiro–Wilk test statistic W is given as follows:

$$ W = \frac{\left( \sum_{i=1}^{n} a_i x_{(i)} \right)^2}{\sum_{i=1}^{n} \left( x_i - \overline{x} \right)^2} \tag{1} $$

where $x_{(i)}$ is the $i$th order statistic (i.e., the $i$th smallest number in the sample), $\overline{x}$ is the sample mean, and the constants $a_i$ are given as follows:

$$ (a_1, \ldots, a_n) = \frac{m^T V^{-1}}{\left( m^T V^{-1} V^{-1} m \right)^{1/2}} \tag{2} $$

where $m = (m_1, \ldots, m_n)^T$ are the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and V is the covariance matrix of these order statistics. The value of W is not significant if the distribution of the variables does not differ from a normal distribution. When the sample is perfectly normally distributed, the W value is equal to one. If the probability value (p-value) of the W statistic is less than 0.05, the hypothesis of normality is rejected and the data are considered to be non-normally distributed.
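As a small worked illustration (not taken from the study), consider n = 3, the smallest sample size supported by the test. Because the coefficients $a_i$ are antisymmetric and normalized to unit length, they reduce to $a = (-1/\sqrt{2},\ 0,\ 1/\sqrt{2})$ in this case. For the hypothetical ordered sample (1, 2, 6), whose mean is 3, the statistic is:

```latex
W = \frac{\left[ (6 - 1)/\sqrt{2} \right]^2}{(1-3)^2 + (2-3)^2 + (6-3)^2}
  = \frac{12.5}{14} \approx 0.893
```

An evenly spaced sample such as (1, 2, 3) gives W = 2/2 = 1, whereas the right-skewed sample above gives a smaller value. For realistic sample sizes the coefficients and p-values are obtained from statistical software rather than by hand.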

Sign-test

A sign-test is a nonparametric test used to determine whether paired observations differ when the data have seasonal variation and non-normal distribution. The sign-test is based on the positive or negative signs of the differences between paired observations $(x, y)$ and ignores the magnitude of the difference. The sign-test is particularly useful when the comparison of observations is not numeric, but can only be expressed as $x > y$, $x = y$, or $x < y$. The sign-test statistic S is the number of positive signs among the differences before and after the project. The null hypothesis is then formulated as follows:

$$ H_0: P[+] = P[-] = 0.5 \tag{3} $$

Under $H_0$, the statistic S is the number of successes in n trials and therefore has a binomial distribution with p = 0.5. For large samples, the standardized statistic Z, which is approximately distributed as N(0, 1), is given as follows:

$$ Z = \frac{S - n/2}{\sqrt{n/4}} \tag{4} $$
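The large-sample sign-test of Eq. (4) can be sketched with the Python standard library. The function names are hypothetical, and two choices are assumptions of this sketch rather than statements from the text: tied pairs are dropped from n, and an optional continuity correction is offered. Positive differences are taken as before > after.

```python
import math

def sign_test_from_counts(plus, minus, continuity=False):
    """Large-sample sign test from the counts of positive/negative differences.

    Returns (S, n, Z, p): S = plus, n = plus + minus (ties excluded),
    Z per the normal approximation, p = two-sided p-value under N(0, 1).
    """
    n = plus + minus
    if n == 0:
        raise ValueError("all pairs are tied")
    dev = plus - n / 2
    if continuity:
        # Assumed optional correction: shrink |S - n/2| by 0.5, never below 0.
        dev = math.copysign(max(abs(dev) - 0.5, 0.0), dev)
    z = dev / math.sqrt(n / 4)
    p = math.erfc(abs(z) / math.sqrt(2))
    return plus, n, z, p

def sign_test(before, after, continuity=False):
    """Count the signs of (before - after) and run the test."""
    plus = sum(1 for x, y in zip(before, after) if x > y)
    minus = sum(1 for x, y in zip(before, after) if x < y)
    return sign_test_from_counts(plus, minus, continuity)

# Counts reported in the article for upstream BOD, 2009 vs. 2013:
# 35 points above the diagonal (after > before), 9 on it, 40 below it.
_, n, z_plain, _ = sign_test_from_counts(40, 35)
_, _, z_corr, p_corr = sign_test_from_counts(40, 35, continuity=True)
print(n, round(z_plain, 2), round(z_corr, 2), round(p_corr, 2))
```

With these counts, Eq. (4) as written gives Z ≈ 0.58, while the continuity-corrected variant gives Z ≈ 0.46 and p ≈ 0.64, which agrees with the upstream BOD row of Table 4. This suggests, but does not establish, that the statistical package applied such a correction.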

Paired T-test

The paired T-test is a parametric test used to compare two population means when the data are normally distributed, with sample means $\overline{X}$ and $\overline{Y}$. It is assumed that each pair of observations is independent of the other pairs. The paired T-test statistic t is given as follows:

$$ t = \frac{\overline{X} - \overline{Y}}{S_d / \sqrt{n}} \tag{5} $$

$$ S_d = \sqrt{\frac{n \sum d_i^2 - \left( \sum d_i \right)^2}{n(n-1)}} \tag{6} $$

where $d_i = x_i - y_i$ is the difference of the paired data $(x_i, y_i)$ for i = 1, …, n, and $S_d$ is the standard deviation of the differences.
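A minimal sketch of the paired T-test statistic follows, using only the Python standard library. The function names and the sample values are hypothetical. The p-value is omitted because the standard library has no Student t distribution; it would be looked up for n − 1 degrees of freedom.

```python
import math
import statistics

def sd_from_sums(d):
    """Sample standard deviation of d written with sums of d and d squared."""
    n = len(d)
    return math.sqrt((n * sum(v * v for v in d) - sum(d) ** 2) / (n * (n - 1)))

def paired_t(before, after):
    """Return (t, degrees of freedom) for paired samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    d = [x - y for x, y in zip(before, after)]
    n = len(d)
    if n < 2:
        raise ValueError("at least two pairs are required")
    s_d = statistics.stdev(d)  # sample SD with an (n - 1) denominator
    # The mean of the differences equals mean(before) - mean(after).
    t = statistics.mean(d) / (s_d / math.sqrt(n))
    return t, n - 1

# Illustrative values only (not data from the study).
before = [3.0, 4.0, 5.0, 6.0]
after = [2.0, 3.5, 4.0, 4.5]
t, dof = paired_t(before, after)
print(round(t, 3), dof)  # 4.899 3
```

The helper `sd_from_sums` is included only to show that the sum-of-squares form and the library's `statistics.stdev` give the same value for the differences.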

Visual comparison

A diagonal pair comparison plot was proposed in this study as a simple and clear visual comparison of water quality. This method is a visual data display in an X–Y graph, as shown in Fig. 3, where monthly paired water quality data are plotted with the value before the project on the X axis and the value after the project on the Y axis. A data point falling above, below, or on the diagonal line indicates degradation, improvement, or no change, respectively. If more data points occupy the upper triangular area of the graph, the water quality has been degraded by the restoration. In the same way, improvement is indicated by more data points in the lower triangular area of the graph. The numbers of data points above, below, and on the line are recorded in the upper right corner of the graph, because they are critical to the sign-test. The percentage degree of change ($\Delta$) can be calculated as follows:

$$ \Delta = \frac{\overline{X} - \overline{Y}}{\overline{X}} \cdot 100 \tag{7} $$

FIG. 3.

Schematics of proposed diagonal pair comparison plot (N_u: number in upper diagonal zone; N_l: number in lower diagonal zone; and N_o: number on-line).

In Eq. (7), $\overline{X}$ and $\overline{Y}$ are the coordinates of the arithmetic mean point of the X and Y values in the graph, respectively. Since the arithmetic mean is a reliable measure only for normally distributed samples, the degree of change is acceptable only when the sign-test and the paired T-test give the same result.
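The annotations of the diagonal pair comparison plot (the three counts and the mean point) can be computed as sketched below. The function name and the returned dictionary keys are hypothetical; `n_upper`, `n_lower`, and `n_on` correspond to N_u, N_l, and N_o in Fig. 3. Drawing the plot itself is left to any plotting library.

```python
def diagonal_counts(before, after):
    """Summarize paired data for a before (X) versus after (Y) plot.

    Counts the points above, below, and on the 1:1 diagonal and returns the
    arithmetic mean point (mean of X, mean of Y).
    """
    pairs = list(zip(before, after))
    if not pairs:
        raise ValueError("no paired data")
    n_upper = sum(1 for x, y in pairs if y > x)  # after > before: degraded
    n_lower = sum(1 for x, y in pairs if y < x)  # after < before: improved
    n_on = len(pairs) - n_upper - n_lower        # after == before: no change
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    return {"n_upper": n_upper, "n_lower": n_lower, "n_on": n_on,
            "mean_point": (mean_x, mean_y)}

# Illustrative values only (not data from the study).
summary = diagonal_counts([2.0, 3.0, 4.0, 5.0], [1.5, 3.0, 4.5, 3.0])
print(summary)
```

The exact equality used for the on-line count is a simplification; with measured data reported to a fixed number of decimals this is usually adequate, but a tolerance could be substituted.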

Results and Discussion

Statistical comparison

Normality test

Results of the normality tests are presented in Table 2. Twenty-four data sets were analyzed with the Shapiro–Wilk test. Each upstream data set comprised 84 samples (7 stations × 12 months) and each downstream data set comprised 144 samples (12 stations × 12 months), measured in 2009 (before the restoration), 2012 (after the restoration), and 2013 (after the restoration, for confirmation) for four water quality parameters: BOD, COD, TP, and Chl-a. Each data set was tested at a significance level of α = 0.05 (95% confidence level). If the p-value was smaller than α, the null hypothesis of normality was rejected and the data were regarded as non-normally distributed.

Table 2.

Results of Normality Test

	Upstream			Downstream
Parameter	W	p	Normality	W	p	Normality
BOD
2009	0.943	0.001	Non-normal	0.969	0.003	Non-normal
2012	0.972	0.063	Normal	0.968	0.002	Non-normal
2013	0.898	0.000	Non-normal	0.982	0.058	Normal
COD
2009	0.846	0.000	Non-normal	0.989	0.284	Normal
2012	0.908	0.000	Non-normal	0.987	0.194	Normal
2013	0.911	0.000	Non-normal	0.978	0.019	Non-normal
TP
2009	0.540	0.000	Non-normal	0.965	0.001	Non-normal
2012	0.778	0.000	Non-normal	0.779	0.000	Non-normal
2013	0.685	0.000	Non-normal	0.980	0.031	Non-normal
Chl-a
2009	0.713	0.000	Non-normal	0.893	0.000	Non-normal
2012	0.748	0.000	Non-normal	0.840	0.000	Non-normal
2013	0.684	0.000	Non-normal	0.944	0.000	Non-normal

BOD, biochemical oxygen demand; COD, chemical oxygen demand; TP, total phosphorus; Chl-a, chlorophyll a.

As expected, most data sets were found to be non-normally distributed, except for upstream BOD in 2012 and downstream BOD in 2013, COD in 2009, and COD in 2012. The p-values of these four data sets exceeded the given α-level (0.05), so the null hypothesis of normality could not be rejected, as shown in Table 2. Even though these four data sets were consistent with a normal distribution, the parametric test (the paired T-test) is strictly applicable to only one comparison, that between the downstream COD data sets of 2009 and 2012, because every other comparison involves at least one non-normal data set. From the normality test, it was therefore concluded that the nonparametric method (sign-test) should be used for water quality comparisons.
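The selection rule can be checked directly against the p-values in Table 2. The sketch below, with hypothetical names, lists the comparisons against 2009 in which both data sets pass the normality test at α = 0.05 and are therefore candidates for the paired T-test.

```python
ALPHA = 0.05

# Shapiro-Wilk p-values transcribed from Table 2: {(zone, parameter): {year: p}}
P_VALUES = {
    ("Upstream", "BOD"): {2009: 0.001, 2012: 0.063, 2013: 0.000},
    ("Upstream", "COD"): {2009: 0.000, 2012: 0.000, 2013: 0.000},
    ("Upstream", "TP"): {2009: 0.000, 2012: 0.000, 2013: 0.000},
    ("Upstream", "Chl-a"): {2009: 0.000, 2012: 0.000, 2013: 0.000},
    ("Downstream", "BOD"): {2009: 0.003, 2012: 0.002, 2013: 0.058},
    ("Downstream", "COD"): {2009: 0.284, 2012: 0.194, 2013: 0.019},
    ("Downstream", "TP"): {2009: 0.001, 2012: 0.000, 2013: 0.031},
    ("Downstream", "Chl-a"): {2009: 0.000, 2012: 0.000, 2013: 0.000},
}

def both_normal(p_by_year, year_a, year_b, alpha=ALPHA):
    """True when normality is not rejected for either year."""
    return p_by_year[year_a] >= alpha and p_by_year[year_b] >= alpha

# Number of individual data sets for which normality is not rejected.
normal_sets = sum(p >= ALPHA for years in P_VALUES.values() for p in years.values())

# Comparisons against 2009 in which both data sets are normal.
eligible = [
    (zone, parameter, after_year)
    for (zone, parameter), p_by_year in P_VALUES.items()
    for after_year in (2012, 2013)
    if both_normal(p_by_year, 2009, after_year)
]
print(normal_sets, eligible)
```

Four of the 24 data sets pass the test, but only the downstream COD comparison of 2009 vs. 2012 has both members normal, which matches the statement in the text.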

To identify the reason for the non-normal distribution, the frequency histogram of each data set was drawn and is shown in Figs. 4 and 5, which include upstream and downstream data, respectively. From these figures, it can be recognized that most histograms have tails extending to the right with several outliers; such distributions are called right-skewed. Right-skewed distributions are well known to be typical of water quality data. Although the BOD data of 2012 shown in Fig. 4 and of 2013 shown in Fig. 5 were judged normal by the Shapiro–Wilk test, their frequency distributions do not appear visually symmetrical. Only two data sets (CODs of 2009 and 2012 shown in Fig. 5) show visually symmetrical distribution without outliers. From the 24 histograms shown in Figs. 4 and 5, it was concluded that the non-normal distribution of most data sets is caused by the right-skewed frequency with outliers.

FIG. 4.

Frequency histogram of water qualities with respect to data normality in upstream stations.

FIG. 5.

Frequency histogram of water qualities with respect to data normality in downstream stations.

Comparison test

Data of 2009 (before the project) were first compared with those of 2012 (after the project). Spatially, the statistical comparison was carried out by separating the upstream and downstream data, depending on the project intensity (heavy and light restoration). To confirm the first statistical results, the data of 2009 were also compared with those of 2013.

The normality test showed that most of the data sets were non-normally distributed. Therefore, the nonparametric sign-test provides the more reliable judgment. However, the sign-test can only determine whether or not the water quality changed; the degree of water quality change cannot be calculated from it. To compensate for this weakness of the sign-test, the paired T-test was also applied.

Table 3 shows the results of the water quality comparison between 2009 and 2012. According to the sign-test, all of the water quality parameters (BOD, COD, TP, and Chl-a) were improved in the downstream stations. In the upstream stations, BOD, COD, and Chl-a were not changed, but TP was degraded. According to the paired T-test, all of the parameters (BOD, COD, TP, and Chl-a) were likewise improved in the downstream stations. In the upstream stations, BOD and Chl-a were improved, but COD and TP had not changed. To summarize, the sign-test and the paired T-test gave the same results for all water quality parameters in the downstream stations. In the upstream stations, however, the two tests gave different results for all water quality parameters except COD. In these cases, it can be inferred that the sign-test results provide the more accurate estimate, because at least one data set in each of these comparisons was non-normally distributed.

Table 3.

Results of Water Quality Comparison (2009 vs. 2012)

	Sign-test			Paired T-test
Category	Z	p	Result	t	p	Result
BOD
Upstream	1.66	0.10	No change	2.98	0.00	Improved
Downstream	8.94	0.00	Improved	10.61	0.00	Improved
COD
Upstream	1.45	0.15	No change	1.43	0.16	No change
Downstream	7.08	0.00	Improved	10.61	0.00	Improved
TP
Upstream	2.25	0.02	Degraded	−0.14	0.89	No change
Downstream	8.09	0.00	Improved	10.73	0.00	Improved
Chl-a
Upstream	0.00	1.00	No change	2.20	0.03	Improved
Downstream	7.58	0.00	Improved	8.67	0.00	Improved

Table 4 shows the results of the water quality comparison between 2009 and 2013. According to the sign-test, all of the water quality parameters (BOD, COD, TP, and Chl-a) were improved in the downstream stations. In the upstream stations, BOD, TP, and Chl-a had not changed, but COD was improved. According to the paired T-test, all of the parameters (BOD, COD, TP, and Chl-a) were likewise improved in the downstream stations. In the upstream stations, BOD, TP, and Chl-a had not changed, but COD was improved. In short, the two tests gave the same results for all water quality parameters in the upstream stations as well as in the downstream stations.

Table 4.

Results of Water Quality Comparison for Confirmation (2009 vs. 2013)

	Sign-test			Paired T-test
Category	Z	p	Result	t	p	Result
BOD
Upstream	0.46	0.64	No change	0.06	0.95	No change
Downstream	9.33	0.00	Improved	10.77	0.00	Improved
COD
Upstream	2.42	0.02	Improved	4.05	0.00	Improved
Downstream	9.58	0.00	Improved	13.13	0.00	Improved
TP
Upstream	0.55	0.58	No change	−0.53	0.60	No change
Downstream	10.37	0.00	Improved	17.72	0.00	Improved
Chl-a
Upstream	1.11	0.27	No change	1.44	0.16	No change
Downstream	5.02	0.00	Improved	6.95	0.00	Improved

When these results are compared with the first statistical results presented in Table 3, it is seen that very similar results were obtained in the confirmation tests (2009 data vs. 2013 data). From the comparison tests, it was concluded that all of the water quality parameters distinctly improved in the downstream stations after the restoration. In the upstream stations, however, no consistent statistically discernible changes were observed; most water quality parameters were shown to be unchanged in both tests.

Degrees of improvement were estimated from the arithmetic means used in the paired T-test and are presented in Table 5. Since most of the data sets were found to be non-normally distributed, these estimated degrees are only useful if the sign-test and the paired T-test show the same results. In this study, the results of the sign-test and paired T-test are mostly the same, so the quantitative improvement was evaluated by the arithmetic mean. In the downstream stations, BOD, COD, TP, and Chl-a were improved by 36.3%, 22.3%, 44.4%, and 57.7%, respectively, for 2012, and by 38.0%, 26.8%, 58.2%, and 47.6%, respectively, for 2013, as shown in Table 5. It can be seen that the heavy restoration project was more successful in TP and Chl-a enhancement than in BOD and COD enhancement. In the upstream stations, however, the light restoration did not result in consistent, discernible improvement in any of the parameters. In conclusion, all water quality parameters were significantly improved by the heavy restorations in the downstream area.

Table 5.

Degree of Water Quality Improvement After Restoration Project

	2012–2009			2013–2009
Category	Mean (2009)	Mean (2012)	Degree of improvement (%)	Mean (2009)	Mean (2013)	Degree of improvement (%)
BOD
Upstream	0.88	0.78	11.4	0.88	0.88	No change
Downstream	3.50	2.23	36.3	3.50	2.17	38.0
COD
Upstream	3.90	3.74	No change	3.90	3.48	10.8
Downstream	8.00	6.22	22.3	8.00	5.86	26.8
TP
Upstream	0.025	0.025	No change	0.025	0.027	No change
Downstream	0.153	0.085	44.4	0.153	0.064	58.2
Chl-a
Upstream	4.01	3.05	23.9	4.01	3.43	No change
Downstream	55.81	23.60	57.7	55.81	29.23	47.6
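The downstream degrees of improvement in Table 5 can be recomputed from the tabulated means as a consistency check. The sketch below uses hypothetical names; the means and the reported percentages are transcribed from Table 5, and agreement is checked only to within the rounding of the published values.

```python
def degree_of_change(mean_before, mean_after):
    """Percentage change relative to the 'before' mean (positive = decrease)."""
    return (mean_before - mean_after) / mean_before * 100

# Downstream rows of Table 5: (mean 2009, mean 2012, mean 2013)
MEANS = {
    "BOD": (3.50, 2.23, 2.17),
    "COD": (8.00, 6.22, 5.86),
    "TP": (0.153, 0.085, 0.064),
    "Chl-a": (55.81, 23.60, 29.23),
}
# Degrees of improvement (%) reported in Table 5: (2012, 2013)
REPORTED = {
    "BOD": (36.3, 38.0),
    "COD": (22.3, 26.8),
    "TP": (44.4, 58.2),
    "Chl-a": (57.7, 47.6),
}

TOLERANCE = 0.06  # allows for rounding to one decimal place

def max_discrepancy():
    """Largest absolute gap between recomputed and reported percentages."""
    worst = 0.0
    for name, (m09, m12, m13) in MEANS.items():
        r12, r13 = REPORTED[name]
        worst = max(worst,
                    abs(degree_of_change(m09, m12) - r12),
                    abs(degree_of_change(m09, m13) - r13))
    return worst

print(max_discrepancy() <= TOLERANCE)
```

For example, downstream BOD for 2012 gives (3.50 − 2.23) / 3.50 × 100 ≈ 36.3%, in line with the table.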

Visual comparison

Figure 6 presents the diagonal pair comparison plots of BOD and COD in upstream and downstream stations. Monthly paired data points are represented by X–Y coordinates. In the upstream plots, the data points are scattered around the diagonal line and the number of data points is similar in the upper and lower triangular zones. At the calculated mean point, the X-value is almost equal to the Y-value. From the upstream plots, it can be seen that neither BOD nor COD was clearly improved or degraded by the restoration. In the downstream plots, however, most data points are located in the lower triangular zone. At the calculated mean point, the X-value is much larger than the Y-value. From the downstream plots, it can be seen that both BOD and COD were significantly improved by the restoration. The water quality changes can also be inferred from the numbers of data points above, below, and on the line, presented in the upper right corner box.

FIG. 6.

Diagonal pair comparison plot of BOD and COD (2009 vs. 2012) (Gray circle: mean). BOD, biochemical oxygen demand; COD, chemical oxygen demand.

Figure 7 presents the diagonal pair comparison plots of TP and Chl-a in upstream and downstream stations. In the upstream plot of TP, the number of data points in the upper diagonal zone is considerably greater than that in the lower diagonal zone, but the X-value of the mean point is almost equal to the Y-value. This plot explains why different results were obtained from the sign-test and the paired T-test. In the upstream plot of Chl-a, the number of data points in the upper diagonal zone is equal to that in the lower diagonal zone, but the X-value of the mean point is much larger than the Y-value. In this case, no change was shown by the sign-test, even though the paired T-test showed a 23.9% improvement in Chl-a (Table 5). In the downstream plots of TP and Chl-a, most data points are located in the lower triangular zone. At the calculated mean point, the X-value is much larger than the Y-value. From these plots, it can be seen that both TP and Chl-a were significantly improved. The restoration project improved TP and Chl-a by 44.4% and 57.7%, respectively. The improvement can also be inferred from the numbers of data points presented in the upper right corner box.

FIG. 7.

Diagonal pair comparison plot of TP and Chl-a data (2009 vs. 2012). Chl-a, chlorophyll a; TP, total phosphorus.

Confirmation test results for BOD and COD are presented in Fig. 8. In the upstream BOD plot, 35, 9, and 40 data points lie above, on, and below the diagonal line, respectively, and at the calculated mean point the X-value is equal to the Y-value. These results agree with the sign-test and the paired T-test, neither of which showed a change. In the upstream COD plot, however, the number of data points in the lower diagonal zone is greater than that in the upper diagonal zone, and the X-value of the mean point is larger than the Y-value. Accordingly, both the sign-test and the paired T-test showed that COD improved. In the downstream plots, most data points are located in the lower triangular zone, and the X-value of the calculated mean point is much larger than the Y-value. It can therefore be confirmed that both BOD and COD were significantly improved by the restoration at the downstream stations.

FIG. 8.

Diagonal pair comparison plot of BOD and COD data (2009 vs. 2013).

Figure 9 presents the confirmation test plots of TP and Chl-a. In the upstream plot of TP, the number of data points in the upper diagonal zone is greater than that in the lower diagonal zone, but the X-value of the mean point is almost equal to the Y-value. In the upstream plot of Chl-a, the number of data points in the upper diagonal zone is greater than that in the lower diagonal zone, but the X-value of the mean point is larger than the Y-value. These plots are consistent with the sign-test and paired T-test results, which showed no changes in TP and Chl-a. It should be noted that the data points of TP and Chl-a are widely scattered in the plots, with high standard deviation. In the downstream plots of TP and Chl-a, most data points are located in the lower triangular zone, and at the calculated mean point the X-value is much larger than the Y-value. These plots indicate that both TP and Chl-a were significantly improved: the restoration project improved TP and Chl-a by 58.2% and 47.6%, respectively.

FIG. 9.

Diagonal pair comparison plot of TP and Chl-a data (2009 vs. 2013).

Conclusions

To assess the water quality changes caused by the river restoration project, a statistical and visual comparison study was performed. Since water quality data in natural rivers often follow a non-normal distribution with high seasonal variations, appropriate methods should be selected according to data characteristics for accurate judgment.

The normality test showed that most data sets were non-normally distributed, as expected. To identify the reason for the non-normal distribution, a frequency histogram of each data set was drawn. The histograms demonstrated that the non-normality was caused by right-skewed frequencies with outliers. For the water quality comparison, the nonparametric sign-test was applied. Since the sign-test cannot provide the degree of change, the parametric paired T-test was also applied. The improvement or degradation of water quality was determined by the sign-test, and the degree of change was computed from the arithmetic means only when the sign-test and the paired T-test gave the same result. A diagonal pair comparison plot was proposed as a simple and clear visual comparison of water quality. This plot is a visual data display in an X–Y graph, where monthly paired water quality data are represented by X–Y coordinates.
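The decision rule summarized above can be written as a short sketch. It is not the authors' implementation: the function name, the significance level of 0.05, the percentage formula based on the two arithmetic means, and the assumption that a lower concentration means improvement are all illustrative choices, and the p-values are taken as inputs computed elsewhere.

```python
from statistics import mean


def assess_change(sign_p, t_p, before, after, alpha=0.05):
    """Combine the sign-test and paired T-test outcomes for one parameter.

    sign_p, t_p: p-values computed elsewhere (inputs, not computed here).
    alpha: assumed significance level.
    Assumption: lower values after restoration mean improvement.
    """
    sign_significant = sign_p < alpha
    t_significant = t_p < alpha
    if sign_significant != t_significant:
        # The tests disagree, so no degree of change is reported.
        return {"verdict": "tests disagree", "percent_change": None}
    if not sign_significant:
        return {"verdict": "no change", "percent_change": None}
    # Direction from the counts of monthly pairs, as in the sign-test.
    lower_after = sum(1 for b, a in zip(before, after) if a < b)
    higher_after = sum(1 for b, a in zip(before, after) if a > b)
    verdict = "improved" if lower_after > higher_after else "degraded"
    # Degree of change from the arithmetic means (assumed formula).
    x_mean, y_mean = mean(before), mean(after)
    percent = (x_mean - y_mean) / x_mean * 100
    return {"verdict": verdict, "percent_change": percent}


# Hypothetical values and placeholder p-values; not results from the study.
before = [4.0, 5.0, 6.0, 5.0]
after = [2.0, 3.0, 3.0, 2.0]
agree = assess_change(0.01, 0.005, before, after)
disagree = assess_change(0.60, 0.02, before, after)
print(agree, disagree)
```

Returning no percentage when the tests disagree mirrors the rule that the degree of change is reported only when both tests give the same result.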

From the statistical comparison, it was concluded that all water quality parameters distinctly improved at the downstream stations after the restoration. At the upstream stations, however, no statistically discernible changes were observed, and most water quality parameters were shown to be unchanged in both tests. The comparison plots likewise show that all water quality parameters were significantly improved at the downstream stations, but that no discernible changes occurred at the upstream stations. The degree of change can be calculated from the X and Y values of the mean point in the plots. The water quality changes can also be inferred from the numbers of data points above, below, and on the line, which are given in the box in the upper right corner of each plot; comparing the numbers of data points in the upper and lower diagonal zones makes the changes clearly visible. The series of statistical and visual methods presented in this article is suggested for the comparison of non-normally distributed water quality data with high seasonal variations.

In this study, we conducted a visual comparison of water quality data using the diagonal pair comparison plot, as well as a statistical comparison of non-normally distributed data collected from a natural river. The results indicate that the restoration work carried out upstream and downstream had different impacts, positive or negative, on water quality in the study area.

Acknowledgments

This study was partially supported by the Ewha Womans University Research Grant of 2017. The authors would like to thank the anonymous peer reviewers for improving the quality of this article.

Author Disclosure Statement

No competing financial interests exist.
