Overview of this tutorial

After going through this tutorial you should be able to:

Read 10X single-cell RNA-seq data into SingleCellExperiment objects
Know how to navigate SingleCellExperiment objects
Basic QC of single-cell RNA-seq data using scater
Create low dimensional plots (PCA, UMAP, tSNE)
Assign cells to known cell types using cellassign

Resources

This document compiled: http://kieranrcampbell.github.io/r-workshop-march-2019

Required packages

suppressPackageStartupMessages({
  library(scater) # BioConductor
  library(SingleCellExperiment) # BioConductor
  library(DropletUtils) # BioConductor
  library(tidyverse) # CRAN
  library(here) # CRAN
  library(DT) # CRAN
  library(pheatmap) # CRAN
})

## Warning: package 'tibble' was built under R version 3.5.2

## Warning: package 'purrr' was built under R version 3.5.2

## Warning: package 'pheatmap' was built under R version 3.5.2

knitr::opts_chunk$set(echo = TRUE,
                      cache = TRUE)

If you need to install any of these, you can do so by:

install.packages(c("tidyverse", "here", "DT", "pheatmap", "BiocManager"))

BiocManager::install(c("scater", "SingleCellExperiment", "DropletUtils"))

It is highly recommended you do so before the tutorial as this can take significant time!

Installation of cellassign

To install cellassign, we need to install Google’s Tensorflow framework. If this doesn’t work in the tutorial - don’t worry, you won’t need it for 80% of what we cover.

install.packages("tensorflow", repos = "http://cran.us.r-project.org")

## 
## The downloaded binary packages are in
##  /var/folders/wh/6v_w5j853g11zfhlb7f2rcvw0000gn/T//RtmpbGDyNc/downloaded_packages

tensorflow::install_tensorflow()

## Creating virtualenv for TensorFlow at  /Users/kierancampbell/.virtualenvs/r-tensorflow 
## Installing TensorFlow ...
## 
## Installation complete.

install.packages("devtools", repos = "http://cran.us.r-project.org") # If not already installed

## 
##   There is a binary version available but the source version is
##   later:
##          binary source needs_compilation
## devtools 1.13.6  2.0.1             FALSE

## installing the source package 'devtools'

devtools::install_github("Irrationone/cellassign")

## Skipping install of 'cellassign' from a github remote, the SHA1 (3ce59232) has not changed since last install.
##   Use `force = TRUE` to force installation

Read in 10X scRNA-seq data

We’re going to use some re-processed data from Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations (patient 1 specifically). If you git cloned this repository (http://github.com/kieranrcampbell/https://github.com/kieranrcampbell/r-workshop-march-2019) this can be found in the directory data/outs/filtered_gene_bc_matrices/GRCh38:

data_dir <- here("data", "outs", "filtered_gene_bc_matrices", "GRCh38")

print(data_dir)

## [1] "/Users/kierancampbell/ubc/r-workshop-march-2019/data/outs/filtered_gene_bc_matrices/GRCh38"

print(dir(data_dir))

## [1] "barcodes.tsv" "genes.tsv"    "matrix.mtx"

10X data typically has a barcodes file (indicating cell barcodes), a genes file (indicating a mapping between genes and indices) and the actual expression data as raw counts in matrix.mtx.

We can read these in to a SingleCellExperiment object using the read10xCounts function

sce <- read10xCounts(data_dir)

sce

## class: SingleCellExperiment 
## dim: 33694 1767 
## metadata(0):
## assays(1): counts
## rownames(33694): ENSG00000243485 ENSG00000237613 ...
##   ENSG00000277475 ENSG00000268674
## rowData names(2): ID Symbol
## colnames: NULL
## colData names(2): Sample Barcode
## reducedDimNames(0):
## spikeNames(0):

A brief tour of the `SingleCellExperiment` package

Overall idea: think of your count matrix where the columns are cells and the rows are genes.

So when you see things like rowData think geneData, and colData think cellData!

Data dimensions

Getting data dimensions

nrow(sce) # Number of rows = genes

## [1] 33694

ncol(sce) # Number of columns = cells

## [1] 1767

print(sce)

## class: SingleCellExperiment 
## dim: 33694 1767 
## metadata(0):
## assays(1): counts
## rownames(33694): ENSG00000243485 ENSG00000237613 ...
##   ENSG00000277475 ENSG00000268674
## rowData names(2): ID Symbol
## colnames: NULL
## colData names(2): Sample Barcode
## reducedDimNames(0):
## spikeNames(0):

Subsetting

sce[, c(1,3,5)] # Subset to cells 1, 3, 5

## class: SingleCellExperiment 
## dim: 33694 3 
## metadata(0):
## assays(1): counts
## rownames(33694): ENSG00000243485 ENSG00000237613 ...
##   ENSG00000277475 ENSG00000268674
## rowData names(2): ID Symbol
## colnames: NULL
## colData names(2): Sample Barcode
## reducedDimNames(0):
## spikeNames(0):

sce[c(2,4,6), ] # Subset to genes 2, 4, 6

## class: SingleCellExperiment 
## dim: 3 1767 
## metadata(0):
## assays(1): counts
## rownames(3): ENSG00000237613 ENSG00000238009 ENSG00000239906
## rowData names(2): ID Symbol
## colnames: NULL
## colData names(2): Sample Barcode
## reducedDimNames(0):
## spikeNames(0):

Feature and cell metadata

rownames = gene names

head(rownames(sce))

## [1] "ENSG00000243485" "ENSG00000237613" "ENSG00000186092" "ENSG00000238009"
## [5] "ENSG00000239945" "ENSG00000239906"

colnames = cell names = barcodes (sometimes)

head(colnames(sce))

## NULL

Column data (cell specific metadata)

head(colData(sce))

## DataFrame with 6 rows and 2 columns
##                                                                                       Sample
##                                                                                  <character>
## 1 /Users/kierancampbell/ubc/r-workshop-march-2019/data/outs/filtered_gene_bc_matrices/GRCh38
## 2 /Users/kierancampbell/ubc/r-workshop-march-2019/data/outs/filtered_gene_bc_matrices/GRCh38
## 3 /Users/kierancampbell/ubc/r-workshop-march-2019/data/outs/filtered_gene_bc_matrices/GRCh38
## 4 /Users/kierancampbell/ubc/r-workshop-march-2019/data/outs/filtered_gene_bc_matrices/GRCh38
## 5 /Users/kierancampbell/ubc/r-workshop-march-2019/data/outs/filtered_gene_bc_matrices/GRCh38
## 6 /Users/kierancampbell/ubc/r-workshop-march-2019/data/outs/filtered_gene_bc_matrices/GRCh38
##              Barcode
##          <character>
## 1 AAACCTGCAGTAAGCG-1
## 2 AAACCTGGTCCGTTAA-1
## 3 AAACCTGGTTCTGTTT-1
## 4 AAACCTGTCCTCATTA-1
## 5 AAACGGGAGTAGGCCA-1
## 6 AAACGGGAGTGTACTC-1

I might want to set the column names to the barcode:

colnames(sce) <- colData(sce)$Barcode

head(colnames(sce))

## [1] "AAACCTGCAGTAAGCG-1" "AAACCTGGTCCGTTAA-1" "AAACCTGGTTCTGTTT-1"
## [4] "AAACCTGTCCTCATTA-1" "AAACGGGAGTAGGCCA-1" "AAACGGGAGTGTACTC-1"

And I can subset on barcode:

sce[, "AAACCTGCAGTAAGCG-1"]

## class: SingleCellExperiment 
## dim: 33694 1 
## metadata(0):
## assays(1): counts
## rownames(33694): ENSG00000243485 ENSG00000237613 ...
##   ENSG00000277475 ENSG00000268674
## rowData names(2): ID Symbol
## colnames(1): AAACCTGCAGTAAGCG-1
## colData names(2): Sample Barcode
## reducedDimNames(0):
## spikeNames(0):

Row data (gene specific metadata)

head(rowData(sce))

## DataFrame with 6 rows and 2 columns
##                              ID        Symbol
##                     <character>   <character>
## ENSG00000243485 ENSG00000243485  RP11-34P13.3
## ENSG00000237613 ENSG00000237613       FAM138A
## ENSG00000186092 ENSG00000186092         OR4F5
## ENSG00000238009 ENSG00000238009  RP11-34P13.7
## ENSG00000239945 ENSG00000239945  RP11-34P13.8
## ENSG00000239906 ENSG00000239906 RP11-34P13.14

reducedDims is where our PCA,UMAP,tSNE representations will live - but we haven’t made them yet

reducedDims(sce)

## List of length 0
## names(0):

sizeFactors is where our cell size factors will live - but we haven’t calculated them yet

head(sizeFactors(sce))

## NULL

The ability to have multiple assays is one of the unique advantages of the SingleCellExperiment approach - I can carry around counts, logcounts, and any other weird data transformation I might like. Right now we only have raw counts, because that’s what we’ve read in:

names(assays(sce))

## [1] "counts"

assay(sce, "counts")
counts(sce)

I can make my own:

assay(sce, "my_super_strange_assay") <- sin(as.matrix(counts(sce)))

names(assays(sce))

## [1] "counts"                 "my_super_strange_assay"

class(assay(sce, "my_super_strange_assay"))

## [1] "matrix"

Note the beauty of SingleCellExperiments is that subsetting is consistent: if I want to subset only some cells and genes:

sce_subset <- sce[c(1,3,5), c(2,4,6,8)]

Then everything else is subset too!

print(dim(counts(sce_subset)))

## [1] 3 4

print(length(sizeFactors(sce_subset)))

## [1] 0

print(dim(rowData(sce_subset)))

## [1] 3 2

So the approach may seem like a lot of work up front to just carry around a count matrix and some metadata, but this sort of consistent subsetting makes it much harder (but still not impossible) to introduce bugs into your analysis.

Quality control of scRNA-seq data

Getting started

First we do some key things to our data:

Get some extra gene data, including the chromosome name
Compute the size factors
Compute normalized log counts

rowData(sce)$ensembl_gene_id <- rownames(sce)
  
sce <- getBMFeatureAnnos(sce, 
                         filters = "ensembl_gene_id",
                         attributes = c("ensembl_gene_id", "hgnc_symbol", "entrezgene",
                                        "start_position", "end_position", "chromosome_name"),
                         dataset = "hsapiens_gene_ensembl")

## Batch submitting query [>-------------------------] 3% eta: 1m Batch
## submitting query [>-------------------------] 4% eta: 1m Batch submitting
## query [=>------------------------] 6% eta: 50s Batch submitting
## query [=>------------------------] 7% eta: 45s Batch submitting
## query [=>------------------------] 9% eta: 42s Batch submitting
## query [==>-----------------------] 10% eta: 39s Batch submitting
## query [==>-----------------------] 12% eta: 37s Batch submitting
## query [==>-----------------------] 13% eta: 35s Batch submitting
## query [===>----------------------] 15% eta: 34s Batch submitting
## query [===>----------------------] 16% eta: 33s Batch submitting
## query [====>---------------------] 18% eta: 32s Batch submitting
## query [====>---------------------] 19% eta: 31s Batch submitting
## query [====>---------------------] 21% eta: 30s Batch submitting
## query [=====>--------------------] 22% eta: 33s Batch submitting
## query [=====>--------------------] 24% eta: 32s Batch submitting
## query [=====>--------------------] 25% eta: 31s Batch submitting
## query [======>-------------------] 26% eta: 31s Batch submitting
## query [======>-------------------] 28% eta: 30s Batch submitting
## query [=======>------------------] 29% eta: 30s Batch submitting
## query [=======>------------------] 31% eta: 29s Batch submitting
## query [=======>------------------] 32% eta: 28s Batch submitting
## query [========>-----------------] 34% eta: 27s Batch submitting
## query [========>-----------------] 35% eta: 27s Batch submitting
## query [=========>----------------] 37% eta: 26s Batch submitting
## query [=========>----------------] 38% eta: 25s Batch submitting
## query [=========>----------------] 40% eta: 24s Batch submitting
## query [==========>---------------] 41% eta: 23s Batch submitting
## query [==========>---------------] 43% eta: 23s Batch submitting
## query [==========>---------------] 44% eta: 22s Batch submitting
## query [===========>--------------] 46% eta: 22s Batch submitting
## query [===========>--------------] 47% eta: 21s Batch submitting
## query [============>-------------] 49% eta: 20s Batch submitting
## query [============>-------------] 50% eta: 20s Batch submitting
## query [============>-------------] 51% eta: 19s Batch submitting
## query [=============>------------] 53% eta: 18s Batch submitting
## query [=============>------------] 54% eta: 18s Batch submitting
## query [==============>-----------] 56% eta: 17s Batch submitting
## query [==============>-----------] 57% eta: 16s Batch submitting
## query [==============>-----------] 59% eta: 16s Batch submitting
## query [===============>----------] 60% eta: 15s Batch submitting
## query [===============>----------] 62% eta: 15s Batch submitting
## query [===============>----------] 63% eta: 14s Batch submitting
## query [================>---------] 65% eta: 13s Batch submitting
## query [================>---------] 66% eta: 13s Batch submitting
## query [=================>--------] 68% eta: 12s Batch submitting
## query [=================>--------] 69% eta: 12s Batch submitting
## query [=================>--------] 71% eta: 11s Batch submitting
## query [==================>-------] 72% eta: 10s Batch submitting
## query [==================>-------] 74% eta: 10s Batch submitting
## query [===================>------] 75% eta: 9s Batch submitting
## query [===================>------] 76% eta: 9s Batch submitting
## query [===================>------] 78% eta: 8s Batch submitting
## query [====================>-----] 79% eta: 8s Batch submitting
## query [====================>-----] 81% eta: 7s Batch submitting
## query [====================>-----] 82% eta: 7s Batch submitting
## query [=====================>----] 84% eta: 6s Batch submitting
## query [=====================>----] 85% eta: 5s Batch submitting
## query [======================>---] 87% eta: 5s Batch submitting
## query [======================>---] 88% eta: 4s Batch submitting
## query [======================>---] 90% eta: 4s Batch submitting
## query [=======================>--] 91% eta: 3s Batch submitting
## query [=======================>--] 93% eta: 3s Batch submitting
## query [=======================>--] 94% eta: 2s Batch submitting
## query [========================>-] 96% eta: 2s Batch submitting
## query [========================>-] 97% eta: 1s Batch submitting
## query [=========================>] 99% eta: 1s Batch submitting query
## [==========================] 100% eta: 0s

# Calculate size factors
sce <- scran::computeSumFactors(sce, BPPARAM = MulticoreParam(10))

## Warning in FUN(...): encountered negative size factor estimates

# Compute log normal expression values
sce <- normalize(sce)

names(assays(sce))

## [1] "counts"                 "my_super_strange_assay"
## [3] "logcounts"

head(sizeFactors(sce))

## [1] 0.033215558 0.071659554 0.009291482 3.408452925 2.399221881 0.106118551

sce <- runPCA(sce)
sce <- runUMAP(sce)

reducedDims(sce)

## List of length 2
## names(2): PCA UMAP

Just give me my PCA!

head(reducedDims(sce)[['PCA']])

##                          PC1       PC2
## AAACCTGCAGTAAGCG-1 -6.822310 -1.921039
## AAACCTGGTCCGTTAA-1 -3.898654 -2.770704
## AAACCTGGTTCTGTTT-1 -7.841220 -1.896710
## AAACCTGTCCTCATTA-1  7.392807  2.167728
## AAACGGGAGTAGGCCA-1 12.452015 -4.042698
## AAACGGGAGTGTACTC-1 -4.069171 -1.635552

I like to add the symbols to the rownames:

rownames(sce) <- paste0(rowData(sce)$Symbol, "_", rownames(sce))
head(rownames(sce))

## [1] "RP11-34P13.3_ENSG00000243485"  "FAM138A_ENSG00000237613"      
## [3] "OR4F5_ENSG00000186092"         "RP11-34P13.7_ENSG00000238009" 
## [5] "RP11-34P13.8_ENSG00000239945"  "RP11-34P13.14_ENSG00000239906"

We next need to work out which genes are mitochondrial and ribosomal as these work well for QC:

# Get Mitochondrial genes for QC:
mt_genes <- which(rowData(sce)$chromosome_name == "MT")
ribo_genes <- grepl("^RP[LS]", rowData(sce)$Symbol)
feature_ctrls <- list(mito = rownames(sce)[mt_genes],
                      ribo = rownames(sce)[ribo_genes])

lapply(feature_ctrls, head)

## $mito
## [1] "MT-ND1_ENSG00000198888"  "MT-ND2_ENSG00000198763" 
## [3] "MT-CO1_ENSG00000198804"  "MT-CO2_ENSG00000198712" 
## [5] "MT-ATP8_ENSG00000228253" "MT-ATP6_ENSG00000198899"
## 
## $ribo
## [1] "RPL22_ENSG00000116251"   "RPL11_ENSG00000142676"  
## [3] "RPS6KA1_ENSG00000117676" "RPS8_ENSG00000142937"   
## [5] "RPL5_ENSG00000122406"    "RPS27_ENSG00000177954"

And call the calcualteQCMetrics function in scater:

sce <- calculateQCMetrics(sce, feature_controls = feature_ctrls)

datatable(head(as.data.frame(colData(sce))))

What to look for

My personal favourite plot to QC scRNA-seq data:

plotColData(sce, x = "total_features_by_counts", y = "pct_counts_mito")

Typically retain cells that have < 10% mitochondrial transcripts and > 1000 features, but this is dataset dependent - for example, tumour cells typically have higher metabolic burden, leading to higher % mitochondrial (we typically use 20% as a filter then).

plotPCA(sce, colour_by = "pct_counts_mito")

## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.

For this going to use a simple threshold of 10% mitochondrial. Importantly, we re-compute the QC metrics, size factors and normalization

mito_thresh <- 10

sce_qc <- sce[, sce$pct_counts_mito < mito_thresh]

sce_qc <- scran::computeSumFactors(sce_qc, BPPARAM = MulticoreParam(10))

sce_qc <- normalize(sce_qc)

sce_qc <- calculateQCMetrics(sce_qc, feature_controls = feature_ctrls)

sce_qc <- runPCA(sce_qc)
sce_qc <- runUMAP(sce_qc)

# plotPCA(sce_qc, colour_by = "total_features_by_counts")
plotUMAP(sce_qc, colour_by = "pct_counts_mito")

## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.

plotScater(sce)

plotScater(sce_qc)

plotHighestExprs(sce_qc)

Some useful plotting functions in Scater

Reduced dimension plots

I can call

plotPCA(sce, colour_by = "x")
plotUMAP(sce, colour_by = "x")
plotTSNE(sce, colour_by = "x")

where x is:

Any column of colData(sce) (= the cell specific data) to colour by metadata
Any gene name in rownames(sce) to colour by expression

plotPCA(sce, colour_by = "SAA1_ENSG00000173432")

## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.

Additional plots

plotColData(sce_qc, x = "total_counts", y = "pct_counts_mito")

plotExpression(sce_qc, 
               x = "total_counts", 
               features = "GAPDH_ENSG00000111640")

Using CellAssign to assign cells to known types

CellAssign is our new method to assign cells to known cell types. It relies on assuming cell types over-express their own markers, e.g. an epithelial tumour cell should overexpress EPCAM relative to other cell types.

library(cellassign)

In this example, the data we have just performed QC and exploratory analysis of is liver cells, that we expect to contain a certain number of cell types. To begin, we specify a list, where each item corresponds to a set of marker genes for a given cell type:

liver_marker_list <- list(
        Hepatocytes = c("ALB", "HAMP", "ARG1", "PCK1", "AFP", "BCHE"), 
        LSECs = c("CALCRL", "FCGR2B", "VWF"),
        Cholangiocytes = c("KRT19", "EPCAM", "CLDN4", "CLDN10", "SOX9", "MMP7", "CXCL1", "CFTR", "TFF2", "KRT7", "CD24"), 
        `Hepatic Stellate Cells` = c("ACTA2", "COL1A1", "COL1A2", "COL3A1", "DCN", "MYL9"),
        Macrophages = c("CD68", "MARCO", "FCGR3A", "LYZ", "PTPRC"),
        `ab T cells` = c("CD2", "CD3D", "TRAC", "IL32", "CD3E", "PTPRC"),
        `gd T cells` = c("NKG7", "FCGR3A", "HOPX", "GNLY", "KLRF1", "CMC1", "CCL3", "PTPRC"),
        `NK cells` = c("GZMK", "KLRF1", "CCL3", "CMC1", "NKG7", "PTPRC"),
        `Plasma cells` = c("CD27", "IGHG1", "CD79A", "IGHG2", "PTPRC", "IGKC"),
        `Mature B cells` = c("MS4A1", "LTB", "CD52", "IGHD", "CD79A", "PTPRC", "IGKC"),
        `Erythroid cells` = c("HBB", "SLC25A37", "CA1", "ALAS2")
)

To begin, we use cellassign’s marker_list_to_mat function to convert this into a (binary) cell type by marker matrix:

mgi <- marker_list_to_mat(liver_marker_list, include_other = FALSE)

pheatmap(mgi)

We then make sure all of these markers exist in our SingleCellExperiment:

marker_in_sce <- match(rownames(mgi), rowData(sce_qc)$Symbol)
stopifnot(all(!is.na(marker_in_sce)))

And finally we subset sce to just the marker genes:

sce_marker <- sce_qc[marker_in_sce, ]

stopifnot(all.equal(rownames(mgi), rowData(sce_marker)$Symbol))

We then call cellassign passing in the SingleCellExperiment, marker info, the size factors we’ve calculated, as well as various other options:

counts(sce_marker) <- as.matrix(counts(sce_marker))

print(dim(sce_marker))

## [1]  56 331

print(dim(mgi))

## [1] 56 11

fit <- cellassign(
  exprs_obj = sce_marker,
  marker_gene_info = mgi,
  s = sizeFactors(sce_qc),
  shrinkage = TRUE,
  max_iter_adam = 50,
  min_delta = 2,
  verbose = TRUE
)

## 50   L old: -49198.13364057; L new: -12797.9958710856; Difference (%): 0.739868264829217
## 50   L old: -12797.9958710856; L new: -12240.3944291952; Difference (%): 0.0435694344260704
## 50   L old: -12240.3944291952; L new: -11957.9641964594; Difference (%): 0.0230736218811856
## 50   L old: -11957.9641964594; L new: -11788.814823868; Difference (%): 0.014145331915394
## 50   L old: -11788.814823868; L new: -11721.4394635294; Difference (%): 0.00571519371075677
## 50   L old: -11721.4394635294; L new: -11697.3827048211; Difference (%): 0.0020523723884898
## 50   L old: -11697.3827048211; L new: -11687.9651451312; Difference (%): 0.000805099732778434
## 50   L old: -11687.9651451312; L new: -11681.5548969882; Difference (%): 0.000548448601906885
## 50   L old: -11681.5548969882; L new: -11676.25864483; Difference (%): 0.000453385889548398
## 50   L old: -11676.25864483; L new: -11671.4496068477; Difference (%): 0.000411864633057001
## 50   L old: -11671.4496068477; L new: -11666.892848577; Difference (%): 0.000390419221623123
## 50   L old: -11666.892848577; L new: -11662.6015473256; Difference (%): 0.000367818690637439
## 50   L old: -11662.6015473256; L new: -11658.0910340668; Difference (%): 0.000386750180952966
## 50   L old: -11658.0910340668; L new: -11653.7533749585; Difference (%): 0.000372072845859803
## 50   L old: -11653.7533749585; L new: -11649.4509262191; Difference (%): 0.000369189959747849
## 50   L old: -11649.4509262191; L new: -11645.0699987986; Difference (%): 0.00037606299629168
## 50   L old: -11645.0699987986; L new: -11640.7297622289; Difference (%): 0.000372710217296541
## 50   L old: -11640.7297622289; L new: -11636.2901717828; Difference (%): 0.000381384203296699
## 50   L old: -11636.2901717828; L new: -11631.8513559524; Difference (%): 0.000381463143740781
## 50   L old: -11631.8513559524; L new: -11627.3750041921; Difference (%): 0.000384835708719969

fit

## A cellassign fit for 331 cells, 56 genes, 11 cell types with 0 covariates
##            To access MLE cell types, call x$cell_type
##            To access MLE parameter estimates, call x$mle_params

Add the cell types to the sce:

sce_qc$cell_type <- fit$cell_type

plotUMAP(sce_qc, colour_by = "cell_type")

## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.

acol <- data.frame(`cellassign cell type` = sce_qc$cell_type)
rownames(acol) <- colnames(sce_qc)

pheatmap(as.matrix(logcounts(sce_marker)),
         annotation_col = acol)

pheatmap(fit$mle_params$gamma)

Using R & Bioconductor to assign cell types to single-cell RNA-seq data

Kieran R Campbell (kicampbell@bccrc.ca)

21 March, 2019

Overview of this tutorial

Resources

Required packages

Installation of cellassign

Read in 10X scRNA-seq data

A brief tour of the `SingleCellExperiment` package

Data dimensions

Subsetting

Feature and cell metadata

Quality control of scRNA-seq data

Getting started

What to look for

Some useful plotting functions in Scater

Reduced dimension plots

Additional plots

Using CellAssign to assign cells to known types

Using R & Bioconductor to assign cell types to single-cell RNA-seq data

Kieran R Campbell (kicampbell@bccrc.ca)

21 March, 2019

Overview of this tutorial

Resources

Required packages

Installation of cellassign

Read in 10X scRNA-seq data

A brief tour of the SingleCellExperiment package

Data dimensions

Subsetting

Feature and cell metadata

Quality control of scRNA-seq data

Getting started

What to look for

Some useful plotting functions in Scater

Reduced dimension plots

Additional plots

Using CellAssign to assign cells to known types

A brief tour of the `SingleCellExperiment` package