MethCP: Differentially Methylated Region Detection with Change Point Models

Abstract

Whole-genome bisulfite sequencing (WGBS) provides a precise measure of methylation across the genome, yet presents a challenge in identifying differentially methylated regions (DMRs) between different conditions. Many methods have been developed, which focus primarily on the setting of two-group comparison. We develop a DMR detecting method MethCP for WGBS data, which is applicable for a wide range of experimental designs beyond the two-group comparisons, such as time-course data. MethCP identifies DMRs based on change point detection, which naturally segments the genome and provides region-level differential analysis. For simple two-group comparison, we show that our method outperforms developed methods in accurately detecting the complete DMR on a simulated data set and an Arabidopsis data set. Moreover, we show that MethCP is capable of detecting wide regions with small effect sizes, which can be common in some settings, but existing techniques are poor in detecting such DMRs. We also demonstrate the use of MethCP for time-course data on another data set after methylation throughout seed germination in Arabidopsis.

1. Introduction

DNA methylation is an important epigenetic mechanism for regulation of gene expression. Methylation is a process by which methyl groups are added to DNA cytosine (C) molecules. The methylation of promoter sites, in particular, is negatively correlated with gene expression while methylation in gene bodies is positively correlated with gene expression. Whole-genome bisulfite sequencing (WGBS) allows for precise measurement of DNA methylation across the genome. In brief, when DNA is treated with bisulfite, the unmethylated cytosines are converted to uracil (U), leaving methylated cytosines unchanged. Sequencing of bisulfite-treated DNA and mapping of the sequenced reads to a reference genome then provide a quantification of the level of methylation at each cytosine. Methylation occurs in three different contexts: CG, CHG, and CHH (where H corresponds to A, T, or C). We just refer to methylation of individual cytosine nucleotides.

In the analysis of BS-Seq data, a common interest is to identify regions of the genome where methylation patterns differ across populations of interest. Such regions are called differentially methylated regions (DMRs). Identifying DMRs are generally considered preferable than detection of individually differentially methylated cytosines (DMCs) from both statistical and biological perspectives (Teschendorff and Relton, 2018). DNA methylation shows strong local patterns, and it is believed that region-level differences are more biologically important. Because of the low coverage and the fact that nearby cytosines usually have similar levels, combining them into regions substantially improves statistical power and lowers the false discovery rate. For the downstream analysis, reporting regions also reduce the redundancy.

A number of methods have been developed to identify regions from BS-Seq data that show differential methylation between two groups of samples [see (Shafi et al., 2018) for a detailed review]. One common strategy is to perform a test at every cytosine that appropriately accounts for the proportions and then use these significant results to determine the DMRs. For example, methylKit (Akalin et al., 2012) performs either a logistic regression test or Fisher's exact test per-cytosine; RADMeth (Dolzhenko and Smith, 2014) uses a β-binomial regression, and a log-likelihood ratio test; DSS (Feng et al., 2014; Wu et al., 2015; Park and Wu, 2016) uses a Bayesian hierarchical model with β-binomial distribution to model the proportions and tests for per-cytosine significance with a Wald test. Other methods use the local dependency between neighboring cytosines to improve their per-cytosine test. BSmooth (Hansen et al., 2012) and BiSeq (Hebestreit et al., 2013) both use local likelihood regression to estimate an underlying methylation curve, and then test for differences in the smoothed methylation ratios between populations. HMM-DM (Yu and Sun, 2016a,b) and HMM-Fisher (Sun and Yu, 2016; Yu and Sun, 2016a) both use Hidden Markov Models along the genome to account for the dependency between cytosines. For many of these methods, the region is often either predefined or determined by merging adjacent DMCs based on specific criteria such as distance.

Another approach is to directly segment the methylation levels to find DMRs. The method metilene (Jühling et al., 2016) uses a modified circular binary segmentation (CBS) algorithm with statistics based on the mean differences in methylation ratios between two groups. The segments are tested for significance using Kolmogorov–Smirnov or Mann–Whitney U tests until the test results do not improve or the number of cytosines is too small.

We, too, propose a segmentation approach, MethCP, for finding DMRs from BS-Seq data. MethCP uses as input the results of a per-cytosine test statistic, like one of the methods already described, uses this input to segment the genome into regions, and then identifies which of those regions are DMRs. Our method, therefore, takes into account the coverages and biological variance between samples. Furthermore, all of the previously mentioned methods, including existing segmentation methods, are developed for simple two-group comparisons, and are not straightforward to extend to more general experimental designs. MethCP, on the other hand, can be used in a wide variety of experimental designs.

We show through simulations that our method more accurately identifies regions differentially methylated between groups, as compared with competing methods. We illustrate the performance of MethCP on experimental data and show that its behavior on experimental data mirrors that of the simulations. We further demonstrate the flexibility of MethCP for use beyond the two-group setting by applying it to a time-course study.

2. Methods

MethCP assumes as input the results of a per-cytosine test of significance, such as those mentioned previously in the introduction. The main steps of MethCP are to (1) segment the test statistics into regions of similar values and then (2) assign a p-value per region as to whether the region is a DMR.

Let T_k, $k = 1, \dots, K$ , be per-cytosine statistics for each of K cytosines, ordered by the location of the cytosines. We assume for now that the test statistics are independent (asymptotically) normally distributed, such as z-statistics or Wald statistics for testing equality of a proportion between two populations, and in Section 2.1 we extend this approach for other test statistics. We segment the T_k into regions of similar levels of significance based on the CBS algorithm of Olshen et al. (2004), which was originally developed for segmentation of DNA copy number data. Note that MethCP applies the segmentation to test-statistics T_k, which is a summary per-cytosine as to the difference of interest across the samples, so that it finds regions of similar population significance.

In brief, binary segmentation methods involve testing over all of the possible breakpoints (cytosines) for whether there is a change in the mean of T at location $i \in [K]$ ; in the case of genomic data, the segmentation is applied per chromosome. The CBS algorithm performs a binary segmentation and adapts the algorithm so as to view the data from a chromosome as if it lies on a circle, segmenting the circle into two arcs. The segmentation procedure of CBS is then as follows: for each possible arc defined by $1 \leq i < j \leq K$ , the likelihood ratio test statistic Z_ij is calculated by comparing the mean value of T_k found in the arc from i + 1 to j with that found in the remaining circle. To find a significant breakpoint, CBS determines whether the statistic $Z = {max}_{1 \leq i < j \leq K} | Z_{i j} |$ is significantly larger than 0. If so, this implies a detection that the arc $(i + 1, j)$ has a significantly different mean than the remaining arc and the two arcs are declared to be separate segments. The procedure is then applied recursively on each resulting segment until no more significant segments are detected.

The number of computations required for the segmentation is $O (K^{2})$ . However, due to the uneven distribution of methylation cytosine across the genome, “gaps” where nearby methylation cytosines are far from each other and almost uncorrelated naturally presegment the genome. Like other methods (Jühling et al., 2016), MethCP can optionally presegment the genome and apply the algorithm separately to these highly separated segments. This reduces the computation to $O (K M)$ , where $M ≪ K$ is the maximum number of cytosines in these segments.

The underlying purpose of the segmentation step is to segment the differential region from the undifferentiated regions. After the completion of the segmentation, it remains to determine which of these segmented regions correspond to significant DMRs, rather than their surrounding undifferentiated regions. To classify these regions, we use meta-analysis principles to aggregate the per-cytosine statistics and obtain one single statistic per region, to which we apply significance tests.

We assume that for each cytosine, the calculated statistic T_k can be written as $T_{k} = \frac{e_{k}}{\hat{σ} (e_{k})}$ , where e_k is the effect size that has approximate normal distribution with estimated variance ${\hat{σ}}^{2} (e_{k})$ . For a region i with a set of cytosines S_i, the weighted effect size is given by $e_{i}^{*} = \frac{\sum_{k \in S_{i}} w_{k} e_{k}}{\sum_{k \in S_{i}} w_{k}},$ (1)

where the weights w_k signify the contribution of cytosine k. Typically in meta-analysis applications, w_k is set to be $\hat{σ} (e_{k})^{- 1}$ (Borenstein et al., 2009). In the case of WGBS, assuming that appropriate methods that account for the variability in the counts are used to calculate T_k, $\hat{σ} (e_{k})^{- 1}$ will be closely related to the coverage of the cytosine, which we designate as C_k. Alternatively, for example, when $\hat{σ} (e_{k})$ is not available, we can use $w_{k} = C_{k}$ , explicitly giving larger weights for high coverage cytosines.

A test statistic for a region found after segmentation is, therefore, calculated by $T_{i}^{*} = \frac{e_{i}^{*}}{\hat{σ} (e_{i}^{*})}$ , where $\hat{σ} (e_{i}^{*})$ is the estimated variance of $e_{i}^{*}$ . Based on our Gaussian distribution assumptions on the individual e_k, we call the region significant if $| T_{i}^{*} | > z_{α ∕ 2}$ , where α is the significance level.

In the standard meta-analysis, the individual T_k is often assumed to be independent so that the estimated variance of $e_{i}^{*}$ is given by ${\hat{σ}}^{2} (e_{i}^{*}) = \frac{\sum_{k \in S_{i}} w_{k}^{2} {\hat{σ}}^{2} (e_{k})}{{(\sum_{k \in S_{i}} w_{k})}^{2}} .$ (2)

In the setting of methylation analysis, we have noted that the individual loci statistics are not independent. Even so, we show through simulation study (Section 3.1) that in using this estimate of ${\hat{σ}}^{2} (e_{i}^{*})$ , we still control the false discovery rate per region.

2.1. Generalizing beyond z-statistics

The mentioned approach relies on input statistics that are Gaussian. This can be limiting, since methods often produce other types of statistics, such as Fisher's exact test implemented by methylKit and log-likelihood ratio test from RADMeth. For this reason, we give a further adaptation in MethCP so as to be applicable for any cytosine-based parametric statistics that result in valid p-values. Let $p_{1}, p_{2}, \dots, p_{K}$ be the p-values indexed by the location of cytosines. For segmenting the genome into regions, we use the standard transform of the p-values to Z-scores, $z_{k} = s g n (e_{k}) Φ^{- 1} (1 - p_{k} ∕ 2),$ (3)

where Φ is the cumulative distribution function of standard Gaussian. MethCP then performs CBS on the z_i's to segment the genome.

Region-level statistics can be obtained by aggregating p-values using Fisher's combined probability test (Fisher, 1934) or Stouffer's weighted Z-method (Stouffer et al., 1949; Whitlock, 2005). Namely, for a region i with a set of cytosines S_i, let $T_{i}^{F i s h e r} = - 2 \sum_{k \in S_{i}} log p_{k},$ (4)

T_{i}^{S t o u f f e r} = \frac{\sum_{k \in S_{i}} w_{k} Φ^{- 1} (1 - p_{k})}{\sqrt{\sum_{k \in S_{i}} w_{k}^{2}}},

(5)

where w_k can be chosen to be constant or given by coverage C_k. We test $T_{i}^{F i s h e r}$ against $χ_{2 | r |}^{2}$ , and $T_{i}^{S t o u f f e r}$ against standard Gaussian for significance.

2.2. Quantifying region alignment

To quantify the performance of the methods or similarity of DMR sets detected by different methods, we need to define some measures for whether a region was successfully detected. One simple solution is just to calculate measures of specificity and sensitivity based on the percentage of whether individual cytosines were correctly called to be in a DMR. However, since our goal is to detect regions, this is unsatisfactory. Thus, we extend the specificity and sensitivity to region detection problem. Our framework of evaluation is closely related to supervised measures such as directional Hamming distance and segmentation covering in the image segmentation literature (Huang and Dom, 1995; Pont-Tuset and Marques, 2016). Specifically, to determine whether a detected region is considered a true or false positive (FP), we set a parameter $α \in (0, 1]$ that is the percentage of overlap required to be considered as having successfully detected a region. We then calculate true positive rates (TPRs) and false positive rates (FPRs) that vary with α.

Furthermore, we see that some DMR methods are biased toward longer or shorter regions (Section 3), which can make comparing methods difficult. To account for different biases of regions found (in the following, we refer to number of cytosines in a region as the length of the region), we calculate the percentage overlap between a detected region and a true region using three different denominators: that of the detected region, that of the true region, and that of the union of the detected and true regions. The three measures can be interpreted as the local measures of precision, recall, and Jaccard index. The first two allowed us to distinguish as to whether methods detected a high percentage of the true regions, versus if a high proportion of the detected regions were truly DMRs. The local Jaccard index allows us to measure the similarity between detected and true regions symmetrically.

We demonstrate our definitions using the local measure of precision—that is, the overlap is determined by the proportion of the detected region that intersects the truth. Denote the detected and true regions set as $ℛ^{d}$ and $S^{t}$ , respectively. To determine whether a detected region $R_{i}^{d} \in ℛ^{d}$ was a true positive (TP), we calculate the TP indicator for $R_{i}^{d}$ : ${T P}_{i}^{d} (α) = I \{{max}_{S_{j}^{t} \in S^{t}} \frac{| R_{i}^{d} \cap S_{j}^{t} |}{| R_{i}^{d} |} \geq α\},$ (6)

where $| R_{i}^{d} |$ is the length of the detected region $R_{i}^{d}$ and ${max}_{S_{j}^{t} \in S^{t}} | R_{i}^{d} \cap S_{j}^{t} |$ is the maximum overlapping cytosines of $R_{i}^{d}$ with a true region. Note that we take the maximum over all true regions to account for the fact that a detected region may overlap multiple true regions (and vice versa). From the ${T P}_{i}^{d}$ definitions, we calculate the total TP as a function of α: $T P^{d} (α) = \sum_{R_{i}^{d} \in ℛ^{d}} {T P}_{i}^{d} (α) .$

The mentioned formulas for TP can be extended to local measure of recall or Jaccard index by adjusting the denominator in Equation (6), from $| R_{i}^{d} |$ to $| R_{i}^{t} |$ and $| R_{i}^{d} \cup S_{j}^{t} |$ , respectively, for calculating overlap. Similarly, we can calculate the FP, false negative (FN), and true negatives (TNs). Please refer to Appendix A for detailed formulations.

3. Results

3.1. Simulation study

In this section, we applied our method as well as five representative methods (Shafi et al., 2018) BSmooth, HMM-Fisher, DSS, methylKit, and metilene on simulated data with two population groups. Our method, MethCP, was run using the statistics of both DSS and methylKit as input (hereinafter referred to as MethCP-DSS and MethCP-methylKit). We evaluated their performances with the measures described in Section 2.2. The data simulation procedure and details of applying these methods can be found in “B” in Appendix.

3.1.1. A simulation for comparing two population groups

Figure 1 shows the summary of the length of the DMRs detected by the six methods using the default significance level (or test statistic thresholds); we also show the distribution of the lengths of true regions. MethCP and metilene give the closest length distribution to that of the true regions. Although we shortened the smoothing window compared with the default, BSmooth and DSS detect much larger regions. In contrast, HMM-Fisher and methylKit both detect small fragmented regions.

FIG. 1.

Boxplot of number of CpGs in the DMRs. Number of CpGs in the true DMRs and in the DMRs detected by the seven methods compared for the simulated data. DMRs, differentially methylated regions.

To evaluate the accuracy of the methods on the simulated data, we plot the receiver operating characteristic (ROC) curve for both the local precision (Fig. 2a) and the local recall (Fig. 2b). The local precision requires that a large percentage of the detected region overlaps a true DMR (easier for shorter detected regions and conservative methods), whereas the local recall requires that a large percentage of the true region be overlapped by a detected region (easier for longer detected regions).

FIG. 2.

ROC curve: method comparison on the simulated data. FPR versus TPR as we change the parameter α for (a) local precision measure and (b) local recall measure. The solid lines indicate significance levels <0.05 for HMM-Fisher, DSS, metilene, MethCP-DSS, and MethCP-methylKit, statistics threshold >4 for BSmooth (the author recommendation is 4.6), and q-values cutoffs <0.05 for methylKit. Thus, in real applications, we only focus on the regions of solid curves. For the completeness of the graph, we extend the curve to larger significance levels. methylKit uses an FDR correction procedure and only reports q-values. Their FPR is reasonably controlled especially when we use the detected length as the denominator of our measure ( $F P R^{d}$ ). FDR, false discovery rate; FPR, false positive rate; TPR, true positive rates.

MethCP-DSS and MethCP-methylKit detect highly similar regions despite using different test statistics as input. For small α, MethCP and metilene achieve the highest TPR across all of the methods. And both $T P R^{d}$ and $T P R^{t}$ are close to 1, which suggests that most true and detected regions match in pairs, with slight disagreement in the region border. This is not the case for the other four methods, which for a given FPR are usually strong in either $T P R^{d}$ or $T P R^{t}$ , but not both, which is evidence that either a proportion of the ground truth regions are not detected ( $T P R^{d}$ high but $T P R^{t}$ low), or a proportion of detected regions are not overlapping the ground truth ( $T P R^{t}$ high but $T P R^{d}$ low). DSS and BSmooth behave similarly in that $T P R^{t}$ varies little with α while $T P R^{d}$ decreases dramatically with the increase of α. This is an indication that both methods detect larger regions than the truth, which is shown in Figure 1. Despite detecting regions wider than the true regions, BSmooth misses at least 20% of the true regions, as indicated by the values of $T P R^{t}$ , whereas DSS misses a smaller proportion.

HMM-Fisher and methylKit exhibit the opposite behavior, calling fewer and smaller regions significant, but as a result not obtaining good coverage of the true regions. The DMRs identified by these two methods are generally a subset of the true regions as indicated by the high values of $T P R^{d}$ regardless of the significance level α. However, they also miss a good number of regions as shown in their lower $T P R^{t}$ (Fig. 2b). Both metilene and MethCP rely on segmentation procedures, and metilene achieves better performance than the other competing methods, with high levels of both $F P R^{d}$ and $F P R^{t}$ , although metilene still gives smaller TPR than MethCP.

However, in addition to assessing their overall sensitivity and specificity, we can consider whether the methods actually control the FPR at the desired level, and here we can see an even stronger difference between metilene and MethCP. In particular, for Figure 2a and b, the sold portions of the precision–recall curves indicate where the cutoff was due to the p-value cutoff being <0.05, meaning that control of the FPR at level 0.05 would be indicated by solid portion of the curve not extending beyond the true FPR of 0.05 given in the x-axis. For metilene, however, the solid curve continues beyond true FPR of 0.1 (see “Unzoomed” metilene plot), indicating that metilene is not accurately controlling the FPR. MethCP, in contrast, does not have this property, and is quite conservative, despite having a higher TPR than the other methods.

3.1.2. A simulation for small effect size regions

Recent studies (Eichten and Springer, 2015; Leenen et al., 2016; Breton et al., 2017) have shown that small-magnitude effect sizes are functionally important for DNA methylation and are associated with specific phenotypes. Furthermore, in plants, there exist other contexts of methylation other than CpG pairing (CHG and CHH methylation), where the baseline levels can be much lower, and hence their changes are also much smaller in scale, usually <10%. However, related discoveries have been hampered by a lack of DMR-calling tool addressing this issue (Eichten and Springer, 2015; Breton et al., 2017). We now show through a simulation study that MethCP is the only method of those we consider that is capable of accurately detecting regions with small (<10%) changes in DNA methylation. In fact, none of the DMR detection methods we consider here, other than metilene, even identify any regions with methylation differences <10%, so we focus our comparison only on metilene. We simulated data sets with DMRs that have 2.5%, 5%, 10%, and 20% changes in methylation, and their lengths vary from 25 methylation cytosines to 400 cytosines. As shown in Figure 3, MethCP achieves a high TPR while keeping the FPR low. MethCP consistently has higher TPR and lower FPR throughout the range of values.

FIG. 3.

A comparison between metilene and MethCP for the small effect size simulation. Upper panel shows $T P R^{J}$ and the lower panel shows $F P R^{J}$ (α = 0.5). On the x-axis are the simulated methylation differences between control and treatment groups. The five columns show simulated DMR length from short DMRs (25 cytosines) to large DMRs (400 cytosines).

3.2. Arabidopsis data set

To further illustrate the performance of MethCP on real data sets, we consider their performance on actual WGBS data set. We do not have WGBS with actual ground truth (i.e., knowledge of where DMR regions are and are not located), so we use a data set with a relatively large number of replicates that allow us to use permutation methods to give some idea of the performance of the tests; we combine this with a consideration of whether the important features seen in our simulation results are mimicked in real data. We use a WGBS on Arabidopsis Thaliana [(Coleman-Derr and Zilberman, 2012), Gene Expression Omnibus (GEO) accession number GSE39045] that contained six biological replicates each of a wild-type control line and an H2A.Z mutant line.

Figure 4a shows the boxplot of the size of the DMRs detected (i.e., number of CpGs) under significance level 10⁻² and FDR-corrected level 10⁻². We see that the sizes of the regions are mostly fairly consistent across the different significance levels, and that the relative sizes between methods resemble that of the simulated data. Namely, DSS and BSmooth detect large regions, HMM-Fisher and methylKit identified small regions, and MethCP-DSS and MethCP-methylKit identified highly similar regions despite different test statistics they use. metilene, stands out as sensitive to the significance level, with smaller required significance levels resulting in more small regions being reported (and HMM-Fisher finds no DMRs at more stringent significance levels).

FIG. 4.

(a, b) Summary of DMRs detected for Arabidopsis data set. (a) Boxplot of the number of CpGs in the DMRs. (b) Number of DMRs detected by different methods. We summarize the results under significance level 10⁻² and FDR-corrected level 10⁻². BSmooth and HMM-Fisher are colored because we use the author-recommended test statistic cutoff (4.6) for BSmooth and significance level 0.05 for HMM-Fisher on three plots (small significance level for HMM-Fisher returns no DMR). (c–f) Permutation results for Arabidopsis data set. (c) Permutation 1, number of DMRs detected. (d) Permutation 2, number of DMRs detected. (e) Permutation 1, the proportion of significant CpGs. (f) Permutation 2, the proportion of significant CpGs. Methods other than BSmooth uses significance level 0.01, and BSmooth uses the author-recommended test statistic cutoff (4.6).

To compare the FPs produced by six methods, we create a null data set, that is, one with no region of cytosines different between the two groups, by randomly assigning the six controls into two groups of three. Individual differences between the mean methylation levels, combined with the fact that nearby sytosines have similar methylation levels in a sample, meaning that by chance there are regions of cytosines that are different between the two random groups. Therefore, we further permute the data in two different ways to create null data sets that have no DMRs between the two groups. The first method permutes the sequencing counts (methylated and unmethylated count pairs) across the samples for each CpG position (Permutation 1). The second method permutes the cytosine positions but keeps the sample labels, thus breaking any residual spatial signal between neighboring CpGs (Permutation 2). Figure 4 shows the number of DMRs and proportion of CpGs detected by each method. In both permutations, MethCP has a small number of DMRs relative to the other methods. Indeed for the second permutation, MethCP detects 0 DMRs. For the first permutation, only HMM-Fisher finds fewer, but HMM-Fisher also appears to have less power in detecting the real differences, Figure 4b. If we further consider how well MethCP assigns individual cytosines correctly to DMRs (i.e., its ability to detect DMCs), then MethCP also results in fewer false assignments of individual cytosines than the other methods, which is surprising since it is not a DMC method.

We would note that despite detecting fewer FP DMRs in our permutation analysis, MethCP still remains competitive, as compared with the other methods, in terms of the number of DMRs it finds on the real data (Fig. 4b), indicating that it is not suffering from a lack of power. Both DSS and methylKit find more regions, but we see from our permutation analysis that DSS and methylKit tend to find many more FPs than MethCP. Furthermore, the regions found by methylKit are small (Fig. 4a), suggesting that like in the simulations methylKit may be missing or fragmenting large parts of the true DMRs.

3.3. Time-course data set

A strength of MethCP is that it is flexible for a wide variety of experimental designs, because it only requires an appropriate per-cytosine test statistic, which can be calculated by standard tests on a per-cytosine basis. We demonstrate this utility on a seed germination data set from A. thaliana ((Kawakatsu et al., 2017), GEO accession number GSE94712). The data are generated from tracking over time germinating seeds after the dry seeds are given water. Two experimental conditions are considered: wild-type plants (Col-0) and ros1, dml2, and dml3 triple demethylase mutants plants (rdd). ROS1, DML2, and DML3 are closely related DNA demethylation enzymes that mainly act in vegetative tissues. Two replicates were collected at each of 0–4 days after introduction of water (DAI), resulting in six time points, including the dry seed.

The original authors did not conduct a DMR analysis between these groups. Rather they did a two-group test of the overall methylation levels by grouping together the summarized methylation ratios and testing for the difference in distributions at each time point. They saw a modest overall increase between the two groups at each time point. We perform a DMR analysis with MethCP by fitting per CpG a linear model on the arcsine-transformed methylation ratios and choosing the statistic T_k to be the difference between the time coefficient of Col-0 and rdd groups. Appendix Figure A1 shows the histogram of DMR length. Unlike the length distribution given in Section 3.2, we detect some very large DMRs with small changes of methylation over time. Our ability to pick up both small and long regions gives us the ability to see multiple effects. We see the overall general increase, reported by the authors, represented by the long regions of small increase (Fig. 5, regions in black). But we also detect some smaller regions with an opposite pattern of decrease in methylation in the mutant samples (Fig. 5, regions in gray).

FIG. 5.

Heatmap of DMRs for the seed germination data set. Heatmap of average methylation levels in the DMRs detected. Here we show the results for CpG context. For each condition and each DMR, we subtract the ratios by the average dry seed methylation ratios, so that the heatmap better shows the changes with time. We annotate the DMRs with >200 CpGs as large regions.

4. Conclusion

We proposed a method MethCP for identifying DMRs. We presented the results of MethCP-methylKit and MethCP-DSS on simulated and real data sets. And we showed that MethCP gives better accuracy and a lower number of FPs than existing methods. We also show that MethCP is the only method that detects large DMRs with small effect sizes, which is prevalent in DMR methylation data. Other than comparing two groups, we presented an example of a time-course study with MethCP. Our framework is in principle flexible for general experimental design, assuming that an appropriate single-cytosine test statistic can be calculated. Thus the method can be expanded immediately to more complicated situations, such as comparing multiple groups or measurements of methylation status over time or developmental progression.

Software Availability

The package MethCP has been submitted to Bioconductor, and is currently available at https://github.com/boyinggong/MethCP

Footnotes

Author Disclosure Statement

The authors declare they have no conflicting financial interests.

Funding Information

This study was supported, in part, by a DOE BER grant, DE-SC0014081.

Appendix

References

Akalin

, Kormaksson

, Li

, et al. 2012. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 13, R87.

Borenstein

, Hedges

L.V.

, Higgins

, et al. 2011. Introduction to Meta-analysis. John Wiley & Sons, Chichester, England.

Breton

C.V.

, Marsit

C.J.

, Faustman

, et al. 2017. Small-magnitude effect sizes in epigenetic end points are important in children's environmental health studies: The Children's Environmental Health and Disease Prevention Research Centera's Epigenetics Working Group. Environ. Health Persp. 125, 511.

Coleman-Derr

, and Zilberman

2012. Deposition of histone variant H2A. Z within gene bodies regulates responsive genes. PLoS Genet. 8, e1002988.

Cruickshanks

H.A.

, McBryan

, Nelson

D.M.

, et al. 2013. Senescent cells harbour features of the cancer epigenome. Nat. Cell Biol. 15, 1495.

Dolzhenko

, and Smith

A.D.

2014. Using beta-binomial regression for high-precision differential methylation analysis in multifactor whole-genome bisulfite sequencing experiments. BMC Bioinformatics, 15, 215.

Eichten

S.R.

, and Springer

N.M.

2015. Minimal evidence for consistent changes in maize DNA methylation patterns following environmental stress. Front. Plant Sci. 6, 308.

Feng

, Conneely

K.N.

, and Wu

2014. A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data. Nucleic Acids Res. 42, e69.

Fisher

R.A.

1992. Statistical Methods for Research Workers. In Breakthroughs in Statistics, pp. 66–70. Springer, New York, NY.

10.

Hansen

K.D.

, Langmead

, and Irizarry

R.A.

2012. BSmooth: From whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. 13, R83.

11.

Hebestreit

, Dugas

, and Klein

H.-U.

2013. Detection of significantly differentially methylated regions in targeted bisulfite sequencing data. Bioinformatics, 29, 1647–1653.

12.

Huang

, and Dom

1995. Quantitative methods of evaluating image segmentation. In: Proceedings of the International Conference on Image Processing, 1995, vol. 3. IEEE, Washington, DC. pp. 53–56.

13.

Jühling

, Kretzmer

, Bernhart

S.H.

, et al. 2016. metilene: Fast and sensitive calling of differentially methylated regions from bisulfite sequencing data. Genome Res. 26, 256–262.

14.

Kawakatsu

, Nery

J.R.

, Castanon

, et al. 2017. Dynamic DNA methylation reconfiguration during seed development and germination. Genome Biol. 18, 171.

15.

Leenen

F.A.

, Muller

C.P.

, and Turner

J.D.

2016. DNA methylation: Conducting the orchestra from exposure to phenotype?. Clin. Epigenetics, 8, 92.

16.

Olshen

A.B.

, Venkatraman

, Lucito

, et al. 2004. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5, 557–572.

17.

Park

, and Wu

2016. Differential methylation analysis for BS-seq data under general experimental design. Bioinformatics, 32, 1446–1453.

18.

Pont-Tuset

, and Marques

2016. Supervised evaluation of image segmentation and object proposal techniques. IEEE Trans. Pattern Anal. Mach. Intell. 38, 1465–1478.

19.

Shafi

, Mitrea

, Nguyen

, et al. 2018. A survey of the approaches for identifying differential methylation using bisulfite sequencing data. Brief. Bioinform, 19, 737–753.

20.

Stouffer

S.A.

, Suchman

E.A.

, DeVinney

L.C.

, et al. 1949. The American Soldier: Adjustment During Army Life. (Studies in Social Psychology in World War II), vol. 1. Princeton, NJ: Princeton University Press.

21.

Sun

, and Yu

2016. HMM-Fisher: Identifying differential methylation using a hidden Markov model and Fisher's exact test. Stat. Appl. Genet. Mol. Biol. 15, 55–67.

22.

Teschendorff

A.E.

, and Relton

C.L.

2018. Statistical and integrative system-level analysis of DNA methylation data. Nat. Rev. Genet. 19, 129.

23.

Whitlock

M.C.

2005. Combining probability from independent tests: The weighted Z-method is superior to Fisher's approach. J. Evolution. Biol. 18, 1368–1373.

24.

, Xu

, Feng

, et al. 2015. Detection of differentially methylated regions from whole-genome bisulfite sequencing data without replicates. Nucleic Acids Res. 43, e141.

25.

, and Sun

2016a. Comparing five statistical methods of differential methylation identification using bisulfite sequencing data. Stat. Appl. Genet. Mol. Biol. 15, 173–191.

26.

, and Sun

2016b. HMM-DM: Identifying differentially methylated regions using a hidden Markov model. Stat. Appl. Genet. Mol. Biol. 15, 69–81.