pseudogp
fits probabilistic pseudotime trajectories to two-dimensional reduced-dimension representations of genomic data using Bayesian Gaussian Process Latent Variable Models. Under the hood it uses Stan for posterior inference, and provides a few useful functions for plotting the resulting traces (notably posteriorCurvePlot
and posteriorBoxPlot
).
Let’s first trying to fit a probabilistic trajectory to a PCA representation of the Trapnell et al. (2014) dataset. This is included in the package under monocle_pca
:
library(pseudogp)
library(ggplot2)
data(monocle_le)
ggplot(data.frame(monocle_le)) + geom_point(aes(x = X1, y = X2)) + theme_bw()
Then we can do a simple fit using the fitPseudotime
function provided by pseudogp
:
le_fit <- fitPseudotime(monocle_le, smoothing_alpha = 30, smoothing_beta = 6, iter = 1000, chains = 1)
However, this takes a long time for a vignette, so we can use the le_fit
example bundled with the package as our reference:
data(le_fit)
We can plot the posterior curve in the reduced space:
posteriorCurvePlot(monocle_le, le_fit)
we can plot individual traces by setting posterior_mean = FALSE
:
posteriorCurvePlot(monocle_le, le_fit, posterior_mean = FALSE)
and we can visualise the uncertainty using a boxplot
posteriorBoxplot(le_fit)
stanfit
objectAll the posterior samples of pseudotime and kernel parameters are contained in the object returned by rstan::stan
. We can use rstan
’s built in plotting functions to examine the traces of the kernel parameters too. For example, we can look at a boxplot of the lambda parameters or plot the trace of the sigma (noise) parameters:
plot(le_fit, pars="lambda")
rstan::traceplot(le_fit, pars="sigma")
We can also extract the pseudotime traces from the object using rstan:extract
:
pst <- extract(le_fit, pars="t")$t
print(str(pst))
## num [1:500, 1:155] 0.1106 0.1466 0.09 0.0906 0.0164 ...
## - attr(*, "dimnames")=List of 2
## ..$ iterations: NULL
## ..$ : NULL
## NULL
We can then use ggplot2
to plot posterior distributions to get a handle on the uncertainty:
set.seed(1L)
to_sample <- sample(nrow(monocle_le), 4)
traces <- data.frame(pst[,to_sample])
names(traces) <- paste0("Cell",1:4)
traces_melted <- reshape2::melt(traces, variable.name="Cell", value.name="Pseudotime")
pstplt <- ggplot(traces_melted) + geom_density(aes(x = Pseudotime, fill = Cell), alpha = 0.5) +
theme_bw()
pstplt
We can also find the MAP pseudotime estimates using the posterior.mode
function from the MCMCglmm
package:
library(MCMCglmm)
library(coda)
tmcmc <- mcmc(traces)
tmap <- posterior.mode(tmcmc)
for(i in 1:length(tmap)) {
pstplt <- pstplt + geom_vline(xintercept = tmap[i], linetype = 2)
}
pstplt
Gaussian Process Latent Variable Models are brilliant, but not magic. There needs to be some structure in the representation you choose to get a consistent curve fit. Let’s look at an example of a representation with ‘structure’ and one without.
x <- runif(100, -1, 1)
y_structured <- rnorm(100, x^2, sd = 0.1)
y_unstructured <- rnorm(100, x^2, sd = 1)
dfplt <- data.frame(x, y_structured, y_unstructured)
dfmelt <- reshape2::melt(dfplt, id.vars = "x", value.name = "y")
ggplot(dfmelt) + geom_point(aes(x=x, y=y)) + facet_wrap(~variable) + theme_bw()
This method will consistently fit the structured data, but not the unstructured. To test for ‘consistency’, call fitPseudotime
with multiple chains (by using the chains
parameter) and then plot the posterior curves (posteriorCurvePlot
automatically handles multiple chains and colours each differently). If the curves from each chain line up then you have a consistent fit. If not, look to a different representation of your data or select genes to encourage structure.
## first let's load the data
data(monocle_le)
data(monocle_pca)
data(monocle_tsne)
X <- data.frame(rbind(monocle_le, monocle_pca, monocle_tsne))
names(X) <- c("x1", "x2")
X$cell <- rep(1:nrow(monocle_le), times = 3)
X$representation <- rep(c("LE", "PCA", "tSNE"), each = nrow(monocle_le))
ggplot(X) + geom_point(aes(x = x1, y = x2)) + facet_wrap( ~ representation) + theme_bw()
Now we can prepare data and call the stan fit. This is left unevaluated here due to the significant runtime. However, all plotting functions can be called as before.
data <- list(LE = monocle_le, PCA = monocle_pca, tSNE = monocle_tsne)
fit <- fitPseudotime(data, chains = 2, iter = 1000, smoothing_alpha = 12, smoothing_beta = 3)
devtools::session_info()
## setting value
## version R version 3.2.3 (2015-12-10)
## system x86_64, darwin13.4.0
## ui X11
## language (EN)
## collate en_GB.UTF-8
## tz Europe/London
## date 2016-04-06
##
## package * version date source
## ape * 3.4 2015-11-29 CRAN (R 3.2.3)
## assertthat 0.1 2013-12-06 CRAN (R 3.2.0)
## coda * 0.18-1 2015-10-16 CRAN (R 3.2.0)
## codetools 0.2-14 2015-07-15 CRAN (R 3.2.3)
## colorspace 1.2-6 2015-03-11 CRAN (R 3.2.0)
## corpcor 1.6.8 2015-07-08 CRAN (R 3.2.0)
## cowplot 0.6.1.9999 2016-03-16 Github (wilkelab/cowplot@1f87149)
## cubature 1.1-2 2013-02-25 CRAN (R 3.2.0)
## DBI 0.3.1 2014-09-24 CRAN (R 3.2.0)
## devtools 1.9.1 2015-09-11 CRAN (R 3.2.0)
## digest 0.6.9 2016-01-08 CRAN (R 3.2.3)
## dplyr 0.4.3 2015-09-01 CRAN (R 3.2.0)
## evaluate 0.8 2015-09-18 CRAN (R 3.2.0)
## formatR 1.2.1 2015-09-18 CRAN (R 3.2.0)
## ggplot2 * 2.1.0 2016-03-01 CRAN (R 3.2.4)
## gridExtra 2.2.1 2016-02-29 CRAN (R 3.2.4)
## gtable 0.2.0 2016-02-26 CRAN (R 3.2.3)
## htmltools 0.3 2015-12-29 CRAN (R 3.2.3)
## inline 0.3.14 2015-04-13 CRAN (R 3.2.0)
## knitr 1.12.3 2016-01-22 CRAN (R 3.2.3)
## labeling 0.3 2014-08-23 CRAN (R 3.2.0)
## lattice 0.20-33 2015-07-14 CRAN (R 3.2.3)
## lazyeval 0.1.10 2015-01-02 CRAN (R 3.2.0)
## magrittr 1.5 2014-11-22 CRAN (R 3.2.0)
## Matrix * 1.2-3 2015-11-28 CRAN (R 3.2.3)
## matrixStats * 0.50.1 2015-12-15 CRAN (R 3.2.3)
## MCMCglmm * 2.22.1 2016-01-30 CRAN (R 3.2.3)
## memoise 0.2.1 2014-04-22 CRAN (R 3.2.0)
## munsell 0.4.3 2016-02-13 CRAN (R 3.2.3)
## nlme 3.1-122 2015-08-19 CRAN (R 3.2.3)
## plyr 1.8.3 2015-06-12 CRAN (R 3.2.0)
## princurve * 1.1-12 2013-04-25 CRAN (R 3.2.0)
## pseudogp * 0.1 2016-04-06 local (/@b476b43)
## R6 2.1.2 2016-01-26 CRAN (R 3.2.3)
## RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.2.0)
## Rcpp * 0.12.4 2016-03-26 CRAN (R 3.2.4)
## reshape2 1.4.1 2014-12-06 CRAN (R 3.2.0)
## rmarkdown 0.9.5 2016-02-22 CRAN (R 3.2.3)
## rstan * 2.9.0-3 2016-02-12 CRAN (R 3.2.3)
## scales 0.4.0 2016-02-26 CRAN (R 3.2.3)
## stringi 1.0-1 2015-10-22 CRAN (R 3.2.0)
## stringr 1.0.0 2015-04-30 CRAN (R 3.2.0)
## tensorA 0.36 2010-12-01 CRAN (R 3.2.0)
## yaml 2.1.13 2014-06-12 CRAN (R 3.2.0)