In this vignette, we’ll first explore the BSseq class of objects for storing bisulfite sequencing data in R. Then we’ll go through a reanalysis of a subset of the data from (Ford et al. 2017), which concluded that there was not an appreciable reduction in transcription following forced genomewide promoter methylation. The full reanalysis is detailed in (Korthauer and Irizarry 2018). The analysis will make use of the dmrseq Bioconductor package (Korthauer et al. 2018).

1 Download Ford et al. Data

In the interest of computational efficiency, for this exercise we’ll start with loading a BSseq object and only work with chromosome 20. We’ll also look only at the control and induced samples (ignoring the withdrawn samples).

Note: the dmrseq function in the DMR Analysis section will still take some time to run only on a single chromosome, since the p-value calculation involves permutation. On your laptop, you’ll want to go get a hot drink while it runs. On a cluster, you can use multiple cores to obtain a faster runtime.

How this data was processed: The original WGBS methylation counts for all cytosines are provided on GEO (accession number GSE102395). The CpG counts were extracted from these files and the data was read into R and bundled as a BSseq object. Metadata obtained from the getGEO() function and parsed from the filenames was added to the pData slot. You can find the code used to carry out these steps in this Rmd.

dblink <- "https://www.dropbox.com/s/heipl6nsy90oekq/bsseq_chr20.rds?dl=1"
dir.create("../data")
bsfile <- "../data/bsseq_chr20.rds"
if (!file.exists(bsfile))
  download.file(dblink, dest=bsfile)

Read in chromosome 20 BSseq object and check out the metadata in the pData slot.

library(bsseq)
bs <- readRDS("../data/bsseq_chr20.rds")
bs

## An object of type 'BSseq' with
##   711200 methylation loci
##   6 samples
## has not been smoothed
## All assays are in-memory

colnames(pData(bs))

##  [1] "title"                  "geo_accession"         
##  [3] "status"                 "submission_date"       
##  [5] "last_update_date"       "type"                  
##  [7] "channel_count"          "source_name_ch1"       
##  [9] "organism_ch1"           "characteristics_ch1"   
## [11] "characteristics_ch1.1"  "characteristics_ch1.2" 
## [13] "characteristics_ch1.3"  "treatment_protocol_ch1"
## [15] "condition"              "Sample"

2 `BSseq` Object Anatomy

Here we’ll go over basic operations on objects of the class BSseq.

class(bs)

## [1] "BSseq"
## attr(,"package")
## [1] "bsseq"

Analogous to SummarizedExperiment and SingleCellExperiment, BSseq is a class for housing bisulfite sequencing data along with its metadata. The main components of the class are:

M: matrix of methylated counts (dimension loci \(\times\) samples)
Cov: matrix of coverage counts (dimension loci \(\times\) samples)
chr: vector of chromosome names (length loci)
pos: vector of basepair positions (length loci)
pData: data frame of sample metadata (dimension samples \(\times\) any number of attributes)
sampleNames: vector of unique sample identifiers (length samples)

These are used to construct a new object. For example:

M <- matrix(0:8, 3, 3)
Cov <- matrix(1:9, 3, 3)
chr <- c("chr1", "chr2", "chr1")
pos <- c(1,2,3)
sampleNames = c("A","B", "C")
pData <- data.frame(condition = 1:3,
                    row.names = sampleNames)
testBS <- BSseq(M = M, Cov = Cov,
      chr = chr, pos = pos,
      sampleNames = sampleNames,
      pData=pData)

They can all also be accessed via the following getter methods:

getCoverage(): access M and Cov count matrices (use type="M" and type="Cov", respectively)
seqnames(): access chromosome information (stored in an Rle object; use as.character to coerce to character vector)
start(): access basepair position information
sampleNames(): access vector of sample names
pData(): access metadata data frame

# get M count
getCoverage(testBS, type="M")

##      A B C
## [1,] 0 3 6
## [2,] 1 4 7
## [3,] 2 5 8

# get Coverage count
getCoverage(testBS, type="Cov")

##      A B C
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9

# get chr
seqnames(testBS)

## factor-Rle of length 3 with 3 runs
##   Lengths:    1    1    1
##   Values : chr1 chr2 chr1
## Levels(2): chr1 chr2

# get position
start(testBS)

## [1] 1 2 3

# get sample names
sampleNames(testBS)

## [1] "A" "B" "C"

# get sample metadata
pData(testBS)

## DataFrame with 3 rows and 1 column
##   condition
##   <integer>
## A         1
## B         2
## C         3

Note that you can subset the object by loci (rows) and/or samples (columns), just as you would a SummarizedExperiment object, and all the existing slots will be carry through the subsetting accordingly.

testBS[2,]

## An object of type 'BSseq' with
##   1 methylation loci
##   3 samples
## has not been smoothed
## All assays are in-memory

2.1 Exercise

Extract the methylation counts matrix from the bs object. Notice what class it is in.

getCoverage(bs, type="M")

## <711200 x 6> DelayedMatrix object of type "double":
##           MCF7_emptyVector_doxInduced_rep1_WGBS ...
##      [1,]                                     0   .
##      [2,]                                     6   .
##      [3,]                                     5   .
##      [4,]                                     0   .
##      [5,]                                     1   .
##       ...                                     .   .
## [711196,]                                     0   .
## [711197,]                                     2   .
## [711198,]                                     3   .
## [711199,]                                     6   .
## [711200,]                                    14   .
##                  MCF7_ZF_DNMT3A_noDox_rep1_WGBS
##      [1,]                                     0
##      [2,]                                     2
##      [3,]                                     2
##      [4,]                                     0
##      [5,]                                     0
##       ...                                     .
## [711196,]                                     0
## [711197,]                                     0
## [711198,]                                     0
## [711199,]                                     0
## [711200,]                                     1

class(getCoverage(bs, type="M"))

## [1] "DelayedMatrix"
## attr(,"package")
## [1] "DelayedArray"

Using the pData() function, tabulate how many samples are in each condition (stored in the condition column).

table(pData(bs)$condition)

## 
##    Control Methylated 
##          3          3

2.2 Test for DMC

Here we’ll carry out a simple test for differentially methylated cytosine at a specific locus using beta-binomial regression. Note that this cytosine was cherry-picked specifically for illustrative purposes (it was already discovered to be differentially methylated by region level analysis).

ix <- 533680
df <- data.frame(m = getCoverage(bs, type = "M")[ix,],
                 cov = getCoverage(bs, type = "Cov")[ix,],
                 prob = getMeth(bs, type = "raw")[ix,],
                 treatment = pData(bs)$condition)
df

##    m cov       prob  treatment
## 1  2  52 0.03846154    Control
## 2  0  32 0.00000000    Control
## 3 46  72 0.63888889 Methylated
## 4  8  24 0.33333333 Methylated
## 5  6  16 0.37500000 Methylated
## 6  0   7 0.00000000    Control

library(aod)
fit <- betabin(cbind(m, cov-m) ~ treatment, ~ treatment, data=df)
summary(fit)

## Beta-binomial model
## -------------------
## betabin(formula = cbind(m, cov - m) ~ treatment, random = ~treatment, 
##     data = df)
## 
## Convergence was obtained after 997 iterations.
## 
## Fixed-effect coefficients:
##                       Estimate Std. Error    z value Pr(> |z|)
## (Intercept)         -3.794e+00  7.145e-01 -5.310e+00 1.097e-07
## treatmentMethylated  3.698e+00  7.965e-01  4.642e+00 3.443e-06
## 
## Overdispersion coefficients:
##                          Estimate Std. Error   z value   Pr(> z)
## phi.treatmentControl    1.516e-06  2.000e-13 0.000e+00 1.000e+00
## phi.treatmentMethylated 5.478e-02  6.089e-02 8.996e-01 1.842e-01
## 
## Log-likelihood statistics
##    Log-lik      nbpar    df res.   Deviance        AIC       AICc 
## -1.103e+01          4          2  8.108e+00  3.007e+01  7.007e+01

2.3 Exercise

What is the estimated difference in methylation levels between the methylated and control samples at this locus?

# controls
exp(coef(fit)[1]) /  (1+exp(coef(fit)[1]))

## (Intercept) 
##  0.02201179

# treated
exp(sum(coef(fit))) /  (1+exp(sum(coef(fit))))

## [1] 0.4760134

2.4 Exploratory analysis

Here will look at some sample similarity metrics among the samples to verify that control samples are most similar to controls and ZF dox samples are more similar to other ZF dox samples (this is indeed the case from the correlation matrix and clustering dendrogram below).

library(DelayedMatrixStats)
library(ggplot2)
library(dendextend)
library(dplyr)
library(tidyr)
cov.mat <- getCoverage(bs, type="Cov")
filter <- pmax( 1*(rowSums2(cov.mat[,pData(bs)$condition == "Control"]) >= 5),
                1*(rowSums2(cov.mat[,pData(bs)$condition == "Methylated"]) >= 5))
filter <- which(filter > 0)
bs.filt <- bs[-filter,]
rm(cov.mat)
cormat <- round(cor(as.matrix(getMeth(bs.filt, type="raw")),
                    use = "pairwise.complete.obs", 
                    method = "spearman"),2)
rownames(cormat) <- colnames(cormat) <- labs <- paste0(pData(bs)$condition, 
                                               pData(bs)$dox, 
                                               "_Sample", 1:ncol(cormat))
cormat <- data.frame(cormat) %>%
  mutate(Sample1 = labs) %>%
  gather("Sample2", "Correlation", 1:ncol(cormat))
cormat$Sample1 <- factor(cormat$Sample1) 
cormat$Sample2 <- factor(cormat$Sample2)
ggplot(data = cormat, aes(x=Sample1, y=Sample2, fill=Correlation)) + 
  geom_tile() +
  scale_fill_gradient(low="white", high="red") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# dendrogram
d <- hclust(dist(t(as.matrix(as.data.frame(getMeth(bs.filt, type="raw")))),
                 method="euclidean"))
d$labels <- pData(bs)$condition
dend <- as.dendrogram(d)
labels_colors(dend) <-as.numeric(pData(bs)$condition)[order.dendrogram(dend)]
plot(dend)

rm(bs.filt)

Plots of coverage by group and sample:

library(dmrseq)
plotEmpiricalDistribution(bs, 
                          bySample = TRUE,
                          testCovariate = "condition",
                          type = "Cov") +
  guides(linetype=FALSE)

Plots of beta values by group and sample:

plotEmpiricalDistribution(bs, 
                          bySample = TRUE,
                          testCovariate = "condition",
                          adj = 3) +
  guides(linetype=FALSE)

3 DMR Analysis

We will use dmrseq to identify Differentially Methylated Regions (DMRs).

First we will filter the loci by coverage (remove those with zero coverage in at least one condition).

cov.mat <- getCoverage(bs, type="Cov")
filter <- pmax( 1*(rowSums2(cov.mat[,pData(bs)$condition == "Control"]) == 0),
                1*(rowSums2(cov.mat[,pData(bs)$condition == "Methylated"]) == 0))
filter <- which(filter > 0)
bs <- bs[-filter,]
rm(cov.mat)

This removed 3788 out of 707412 loci ( 0.54%). Now we’ll run dmrseq.

library(BiocParallel)
register(MulticoreParam(1))
set.seed(399)
resfile <- file.path("../data", "regions.rds")
if (!file.exists(resfile)){
  regions <- dmrseq(bs, testCovariate = "condition",
                    bpSpan = 500,
                    maxGap = 500,
                    maxPerms = 10)
  saveRDS(regions, file=resfile)
}else{
  regions <- readRDS(resfile)
}
sum(regions$qval < 0.1)

## [1] 639

regions$meanDiff <- meanDiff(bs, dmrs=regions, testCovariate="condition")

We’ll add the raw mean methylation differences to the region summaries and plot the top 10 DMRs.

regions$meanDiff <- meanDiff(bs, dmrs=regions, testCovariate="condition")
plotDMRs(bs, regions=regions[1:10,], testCovariate="condition")

3.1 Exercise

Plot a histogram of the sizes (width in basepairs) of the regions that are significant at the 0.05 level (by qvalue).

hist(width(regions[regions$qval < 0.05,]))

4 DE Analysis

Here we carry out a differential expression analysis using the corresponding RNAseq data from (Ford et al. 2017).

First, we download the expression count matrix.

# expression counts
library(R.utils)
file <- file.path("../data", "GSE102395_MCF7_ZF_DNMT3A_countstable.txt")
if (!file.exists(file)){
  download.file(url = "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE102nnn/GSE102395/suppl/GSE102395%5FMCF7%5FZF%5FDNMT3A%5Fcountstable%2Etxt%2Egz",
              destfile = paste0(file, ".gz"),
              mode="wb")
  gunzip(paste0(file, ".gz"))
}

First we need to run the differential expression analysis.

library(data.table)
exp <- fread(file.path("../data/GSE102395_MCF7_ZF_DNMT3A_countstable.txt"))

We need to remove two duplicate genes. Examining the names of these genes, it appears it may stem from a classic Microsoft Excel conversion error (Ziemann, Eren, and El-Osta 2016). We also move the gene names to row names, and remove rows that have only zero values.

# There are duplicate genes -- looks like a classic excel conversion error -- remove these
dup <- which(table(exp$gene) > 1)
names(dup)

## [1] "Mar-01" "Mar-02"

exp <- exp %>% dplyr::filter(!(gene %in% names(dup)))
# move genes to rownames  
rownames(exp) <- exp$gene
exp <- as.matrix(exp %>% dplyr::select(-gene))
# remove genes that are all zero
allZero <- rowSums(exp==0)==ncol(exp)
exp <- exp[!allZero,]

We construct the metadata to put in the colData of the DESeq2 dataset object.

# strings to match the two comparison samples - control and ZF + DOX
ctrl <- "MCF7_emptyVector|MCF7_ZF_DNMT3A_noDOX_rep"
trt <- "MCF7_ZF_DNMT3A_DOX"
# set up colData
coldata <- data.frame(sample=colnames(exp)) %>%
  mutate(condition = ifelse(grepl(ctrl, sample), "Control",
                     ifelse(grepl(trt, sample), "Methylated", "Other")))

Here we construct the DESeq2 dataset.

library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = exp,
                              colData = coldata,
                              design = ~ condition)

Since we expect (and do indeed observe - see (Korthauer and Irizarry 2018)) global shifts in the expected direction (i.e. lower expression for the methylated condition), we will not perform standard normalization using all genes. Instead, we’ll compute size factors only on a set of control genes that were at least 10kb away from a ZF binding site or putative DMR (by dmrseq, with FDR level 0.25). To find these genes, we’ll use the ZF ChIP-seq binding site data from the Supplementary table S1.

library(annotatr)
if (!file.exists(file.path("../data/170506-1.txt"))){
  download.file(url = "https://www.biorxiv.org/highwire/filestream/57972/field_highwire_adjunct_files/0/170506-1.txt",
              destfile =  file.path("../data/170506-1.txt"))
}
ts1 <- fread(file.path("../data/170506-1.txt"), header=FALSE, skip=1)  #ZF Binding sites
colnames(ts1) <- c("chr", "start", "end", "closest_gene_promoter", 
                   "distance_to_nearest_promoter",
                   "promoter_classification",
                   "CpGisland_classification", 
                   "genebody_classification",
                   "enhancer_classification")
annot = build_annotations(genome = 'hg19', annotations = 'hg19_genes_promoters')
ol <- distanceToNearest(regions[regions$qval < 0.25,], annot)
ol2 <- findOverlaps(regions[regions$qval < 0.25,], annot)
dmr_genes <- unique(c(annot$symbol[ol@to[mcols(ol)$distance <= 10000]],
                      annot$symbol[ol2@to]))
dmr_genes <- dmr_genes[!is.na(dmr_genes)]
zf_genes <- (ts1 %>%
  dplyr::filter(distance_to_nearest_promoter <= 10000))$closest_gene_promoter
exp <- data.frame(counts(dds)) %>% 
  mutate(gene = rownames(dds)) %>%
  gather(sample, count, 1:ncol(dds)) %>% 
  mutate(condition = ifelse(grepl(ctrl, sample), "Control", "Methylated")) %>%
  mutate(dox = ifelse(grepl("noDOX", sample), "No Dox", 
                      ifelse(grepl("DOXremoval", sample), "Dox withdrawal", 
                             "Dox")))
norm_genes <- unique((exp %>%
  dplyr::filter(!(gene %in% unique(c(dmr_genes, zf_genes)))))$gene)
str(norm_genes)

##  chr [1:13527] "A1BG" "A1BG-AS1" "A1CF" "A2M" "A2M-AS1" "A2ML1" "A2MP1" ...

We only compute sizefactors on the 13527 control genes. We also filter out very lowly expressed genes (total counts in all samples together fewer than 10 or counts of zero in more than half of the samples), and include only genes on chromosome 20. We’ll use the 8 no dox control versus the 4 dox methylated group for the DE comparison.

dds <- dds[, dds$condition == "Methylated" | 
             (dds$condition == "Control" & grepl("noDOX", dds$sample)) ]
dds <- dds[rowSums(counts(dds)) >= 10,]
dds <- dds[rowSums(counts(dds)==0) <= 0.5,]
dds <- estimateSizeFactors(dds, 
                           controlGenes = rownames(dds) %in% norm_genes)
table(dds$condition)

## 
##    Control Methylated 
##          8          7

rowData(dds)$control_expr <- rowMeans(counts(dds, normalize = TRUE)[,dds$condition == "Control"])
library(EnsDb.Hsapiens.v86)
location <- mapIds(EnsDb.Hsapiens.v86, keys=rownames(dds), 
    column="SEQNAME", keytype="SYMBOL")
rowData(dds)$chr <- location
dds <- dds[rowData(dds)$chr == "20" & !is.na(rowData(dds)$chr),]

Finally, we actually run DEseq2.

dds <- DESeq(dds)
res <- results(dds)
sum(res$padj < 0.05, na.rm = TRUE)

## [1] 102

res$control_expr <- rowData(dds)$control_expr
dds <- dds[!is.na(res$padj)]
res <- res %>% na.omit()

5 Methylation vs Expression

Here we’ll create several figures exploring the relationship between methylation and expression, using effect size (methylation proportion difference) as the statistic of interest to rank DMRs.

5.1 Associate dmrseq DMRs with genes

Next we need to associate each DMR with a gene by checking overlap with promoters. We’ll use the promoter annotation from annotatr.

regions.sig <- regions[regions$qval < 0.1,]
ol <- distanceToNearest(regions.sig, annot)
dmrs <- regions.sig[ol@from[mcols(ol)$distance <= 2000],]
dmrs$gene <- annot$symbol[ol@to[mcols(ol)$distance <= 2000]]
dmrs$prom <- annot[ol@to[mcols(ol)$distance <= 2000],]
dmrs <- dmrs[!is.na(dmrs$gene),]

Now that we have a set of DMR-Gene associations, we’ll add the the DE information to the DMR data frame.

# add DE results
baseMeanPerLvl <- sapply( levels(dds$condition), 
                          function(lvl) 
                            rowMeans(counts(dds,normalized=TRUE)[,dds$condition == lvl] ))
res <- cbind(res, baseMeanPerLvl)
x <- match(dmrs$gene, rownames(res)) 
dmrs <- dmrs[!is.na(x),]
x <- x[!is.na(x)]
res.dmrs <- res[x,]
dmrs <- cbind(DataFrame(dmrs), res.dmrs)

There are 70 DE genes (at DESeq2 FDR < 0.05) with dmrseq FDR < 0.05 that are located within 2kb of a gene promoter.

5.2 Scatterplot of methylation versus expression

Now we create the scatterplot of methylation versus expression for dmrs.

dmrsa <- data.frame(dmrs@listData) %>% 
  dplyr::filter(control_expr > 50) %>%
  mutate(sig = padj < 0.05,
         FC = 2^log2FoldChange,
         delta_mC = meanDiff) %>%
  mutate(ptcol = ifelse(qval > 0.05, "Not significant (DMR)", 
                    ifelse(sig, "DMR and DE", "Not significant (DE)"))) %>%
  dplyr::filter(!is.na(log2FoldChange) & !is.na(sig) & !is.na(delta_mC)) 
ggplot(dmrsa, aes(x = delta_mC, y = log2FoldChange)) +
  geom_hline(yintercept=0, col="black") +
  geom_hline(yintercept=0, col="white", linetype="dashed") +
  geom_point(size=0.5, alpha=0.75, aes(color = sig)) + 
  theme_bw() + 
  xlab(expression(paste(Delta, "mCG in DMR"))) +
  ylab("log2 fold change mRNA abundance") +
  scale_color_manual(values=c("black", "red")) +
  geom_smooth(method = "loess", span = 1) +
  labs(color="Differentially\n Expressed")  +
  ggtitle("dmrseq DMRs (chr20)") +
  geom_vline(xintercept=0.30, 
             linetype="dashed", color="grey20")

From this figure we see that more genes with high increased methylation percentage (above 30%) are transcriptionally repressed. Specifically, 55 genes with 30% methylation increase showed decreased expression (62.5%, odds of 1.67). Considering only DE at FDR 0.05, 20 genes with 30% methylation increase showed decreased expression (69%, odds of 2.22)

5.3 Exercise

What is the pearson correlation between absolute methylation difference and expression foldchange?

cor(dmrsa$delta_mC, dmrsa$log2FoldChange)

## [1] -0.1996709

5.4 Scatterplot with dmrseq statistic

Here we’ll create a scatterplot exploring the relationship between methylation and expression using the dmrseq statistic as the statistic of interest to rank DMRs. This should be more robust than ordering regions simply on their average methylation difference.

ggplot(dmrsa, aes(x = stat, y = log2FoldChange)) +
  geom_hline(yintercept=0, col="black") +
  geom_hline(yintercept=0, col="white", linetype="dashed") +
  geom_point(size=0.5, alpha=0.75, aes(color = ptcol)) + 
  theme_bw() + 
  xlab("Region test statistic") +
  ylab("log2 fold change mRNA abundance") +
  scale_color_manual(values=c("red", "black", "grey")) +
  geom_smooth(method = "loess", span = 1) +
  labs(color="Significance")  +
  ggtitle("dmrseq DMRs (chr20)") +
  geom_vline(xintercept=min(dmrsa$stat[dmrsa$qval<0.05]), 
             linetype="dashed", color="grey20")

We can see that the relationship between increased methylation and decreased expression is even clearer using the dmrseq region statistic.

5.5 Exercise

What is the pearson correlation between methylation statistic and expression fold change?

cor(dmrsa$stat, dmrsa$log2FoldChange)

## [1] -0.2364986

6 Session Information

sessionInfo()

## R version 3.6.0 (2025-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] EnsDb.Hsapiens.v86_2.99.0              
##  [2] ensembldb_2.9.2                        
##  [3] AnnotationFilter_1.9.0                 
##  [4] org.Hs.eg.db_3.8.2                     
##  [5] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
##  [6] GenomicFeatures_1.37.3                 
##  [7] AnnotationDbi_1.47.0                   
##  [8] annotatr_1.11.1                        
##  [9] DESeq2_1.25.5                          
## [10] data.table_1.12.2                      
## [11] dmrseq_1.5.12                          
## [12] tidyr_0.8.3                            
## [13] dplyr_0.8.3                            
## [14] dendextend_1.12.0                      
## [15] ggplot2_3.2.0                          
## [16] DelayedMatrixStats_1.7.1               
## [17] aod_1.3.1                              
## [18] bsseq_1.21.1                           
## [19] SummarizedExperiment_1.15.5            
## [20] DelayedArray_0.11.2                    
## [21] BiocParallel_1.19.0                    
## [22] matrixStats_0.54.0                     
## [23] Biobase_2.45.0                         
## [24] GenomicRanges_1.37.14                  
## [25] GenomeInfoDb_1.21.1                    
## [26] IRanges_2.19.10                        
## [27] S4Vectors_0.23.17                      
## [28] BiocGenerics_0.31.4                    
## [29] BiocStyle_2.13.2                       
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.4               Hmisc_4.2-0                  
##   [3] AnnotationHub_2.17.3          BiocFileCache_1.9.1          
##   [5] plyr_1.8.4                    lazyeval_0.2.2               
##   [7] splines_3.6.0                 digest_0.6.19                
##   [9] foreach_1.4.4                 htmltools_0.3.6              
##  [11] viridis_0.5.1                 checkmate_1.9.3              
##  [13] magrittr_1.5                  memoise_1.1.0                
##  [15] BSgenome_1.53.0               cluster_2.0.8                
##  [17] limma_3.41.6                  annotate_1.63.0              
##  [19] Biostrings_2.53.0             readr_1.3.1                  
##  [21] R.utils_2.9.0                 askpass_1.1                  
##  [23] prettyunits_1.0.2             colorspace_1.4-1             
##  [25] blob_1.1.1                    rappdirs_0.3.1               
##  [27] xfun_0.8                      crayon_1.3.4                 
##  [29] RCurl_1.95-4.12               genefilter_1.67.1            
##  [31] survival_2.44-1.1             iterators_1.0.10             
##  [33] glue_1.3.1                    registry_0.5-1               
##  [35] gtable_0.3.0                  zlibbioc_1.31.0              
##  [37] XVector_0.25.0                Rhdf5lib_1.7.2               
##  [39] HDF5Array_1.13.2              scales_1.0.0                 
##  [41] DBI_1.0.0                     rngtools_1.4                 
##  [43] bibtex_0.4.2                  Rcpp_1.0.1                   
##  [45] htmlTable_1.13.1              viridisLite_0.3.0            
##  [47] xtable_1.8-4                  progress_1.2.2               
##  [49] bumphunter_1.27.0             foreign_0.8-71               
##  [51] bit_1.1-14                    Formula_1.2-3                
##  [53] htmlwidgets_1.3               httr_1.4.0                   
##  [55] RColorBrewer_1.1-2            acepack_1.4.1                
##  [57] pkgconfig_2.0.2               XML_3.98-1.20                
##  [59] R.methodsS3_1.7.1             nnet_7.3-12                  
##  [61] dbplyr_1.4.2                  locfit_1.5-9.1               
##  [63] tidyselect_0.2.5              labeling_0.3                 
##  [65] rlang_0.4.0                   reshape2_1.4.3               
##  [67] later_0.8.0                   munsell_0.5.0                
##  [69] tools_3.6.0                   RSQLite_2.1.1                
##  [71] evaluate_0.14                 stringr_1.4.0                
##  [73] yaml_2.2.0                    outliers_0.14                
##  [75] knitr_1.23                    bit64_0.9-7                  
##  [77] purrr_0.3.2                   nlme_3.1-139                 
##  [79] doRNG_1.7.1                   mime_0.7                     
##  [81] R.oo_1.22.0                   biomaRt_2.41.5               
##  [83] rstudioapi_0.10               compiler_3.6.0               
##  [85] curl_3.3                      interactiveDisplayBase_1.23.0
##  [87] geneplotter_1.63.0            tibble_2.1.3                 
##  [89] stringi_1.4.3                 lattice_0.20-38              
##  [91] ProtGenerics_1.17.2           Matrix_1.2-17                
##  [93] permute_0.9-5                 pillar_1.4.2                 
##  [95] BiocManager_1.30.4            bitops_1.0-6                 
##  [97] httpuv_1.5.1                  rtracklayer_1.45.1           
##  [99] R6_2.4.0                      latticeExtra_0.6-28          
## [101] bookdown_0.11                 promises_1.0.1               
## [103] gridExtra_2.3                 codetools_0.2-16             
## [105] gtools_3.8.1                  assertthat_0.2.1             
## [107] rhdf5_2.29.0                  openssl_1.4                  
## [109] pkgmaker_0.27                 withr_2.1.2                  
## [111] regioneR_1.17.3               GenomicAlignments_1.21.4     
## [113] Rsamtools_2.1.2               GenomeInfoDbData_1.2.1       
## [115] hms_0.4.2                     rpart_4.1-15                 
## [117] grid_3.6.0                    rmarkdown_1.13               
## [119] shiny_1.3.2                   base64enc_0.1-3

References

Ford, Ethan Edward, Matthew R Grimmer, Sabine Stolzenburg, Ozren Bogdanovic, Alex de Mendoza, Peggy J Farnham, Pilar Blancafort, and Ryan Lister. 2017. “Frequent Lack of Repressive Capacity of Promoter Dna Methylation Identified Through Genome-Wide Epigenomic Manipulation.” bioRxiv. Cold Spring Harbor Laboratory, 170506. https://doi.org/https://doi.org/10.1101/170506.

Korthauer, Keegan, Sutirtha Chakraborty, Yuval Benjamini, and Rafael A Irizarry. 2018. “Detection and Accurate False Discovery Rate Control of Differentially Methylated Regions from Whole Genome Bisulfite Sequencing.” Biostatistics 20 (3). Oxford University Press: 367–83. https://doi.org/https://doi.org/10.1093/biostatistics/kxy007.

Korthauer, Keegan, and Rafael A Irizarry. 2018. “Genome-Wide Repressive Capacity of Promoter Dna Methylation Is Revealed Through Epigenomic Manipulation.” BioRxiv. Cold Spring Harbor Laboratory, 381145. https://doi.org/https://doi.org/10.1101/381145.

Ziemann, Mark, Yotam Eren, and Assam El-Osta. 2016. “Gene Name Errors Are Widespread in the Scientific Literature.” Genome Biology 17 (1). BioMed Central: 177. https://doi.org/https://doi.org/10.1186/s13059-016-1044-7.

Exploring the regulatory capacity of DNA methylation

July 10, 2025

1 Download Ford et al. Data

2 `BSseq` Object Anatomy

2.1 Exercise

2.2 Test for DMC

2.3 Exercise

2.4 Exploratory analysis

3 DMR Analysis

3.1 Exercise

4 DE Analysis

5 Methylation vs Expression

5.1 Associate dmrseq DMRs with genes

5.2 Scatterplot of methylation versus expression

5.3 Exercise

5.4 Scatterplot with dmrseq statistic

5.5 Exercise

6 Session Information

References

Exploring the regulatory capacity of DNA methylation

July 10, 2025

1 Download Ford et al. Data

2 BSseq Object Anatomy

2.1 Exercise

2.2 Test for DMC

2.3 Exercise

2.4 Exploratory analysis

3 DMR Analysis

3.1 Exercise

4 DE Analysis

5 Methylation vs Expression

5.1 Associate dmrseq DMRs with genes

5.2 Scatterplot of methylation versus expression

5.3 Exercise

5.4 Scatterplot with dmrseq statistic

5.5 Exercise

6 Session Information

References

2 `BSseq` Object Anatomy