The Parse Whole Transcriptome Kit shows very low levels of ambient RNA in samples compared to existing single cell solutions. Read on to understand why this matters and how our workflow prevents detection of ambient RNA.
What is ambient RNA?
Ambient RNA refers to RNA molecules that leave one cell or nucleus and become incorrectly barcoded, so that they appear to be a different cell's molecules.
The most likely cause of ambient RNA is sample preparation. During sample preparation, a subset of your cells will naturally break open and release their RNA molecules into the suspension. For cell lines, the number of cells that burst is typically much lower. However, for tissues that are being dissociated into single cells or single nuclei, the subset of cells that burst can be quite high, depending on the dissociation method you have selected to arrive at your single cell or nuclei suspension.
Can ambient RNA affect my downstream data analysis?
Yes, ambient RNA can reduce how accurately cells cluster and may prevent you from answering your biological question. If we compare two datasets with identical numbers of genes and unique molecules per cell, the dataset with more ambient RNA will have more difficulty resolving subpopulations of cells that are distinguished by a small subset of genes. This is the case in many biological applications: dissecting different T cell populations in PBMC samples and defining which transcription factors drive development in subsets of neurons in the brain are two examples.
It is worth pointing out that ambient RNA is significantly worse than simply not seeing a particular gene (also known as "dropout"), because ambient RNA actually assigns a gene to the wrong cell. For instance, if you are looking at a CD8+ T cell in a dataset with a high amount of ambient RNA, you might see RNA transcripts corresponding to both CD8 and CD4 in the cell, which would lead to confusion in clustering and cell type assignment.
Why does Parse have less ambient RNA than other technologies?
As mentioned earlier, ambient RNA is often a result of cell or nuclei dissociation, when cells break open or become "leaky" and release their RNA molecules into the single cell or nuclei suspension. In other scRNA-seq technologies, these events may lead to high amounts of ambient RNA because the free-floating molecules in solution are encapsulated alongside intact cells in the microwells or droplets and are subsequently barcoded with the same cell barcode.
Parse's workflow overcomes these challenges through two mechanisms. First, because a combination of barcodes assigns cell identity to every transcript, a free-floating ambient RNA molecule would have to travel the exact same path as a specific cell through four rounds of split-pool barcoding in order to be incorrectly assigned to that cell. The chance of this is 1 in 3,538,944. As a consequence, most ambient RNA molecules will receive a barcode combination that is unrelated to any barcode combination that a cell received. Second, to make Illumina sequencing more efficient, the workflow also includes a wash step that physically removes free-floating molecules from barcoded cells before they are lysed and processed for sequencing, further reducing the chance that any ambient RNA will be incorrectly grouped with a cell in downstream analysis.
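To make the 1 in 3,538,944 figure concrete, here is a minimal Python sketch of the arithmetic behind combinatorial barcoding. The number of barcode options at each of the four rounds is not stated above; the values 48, 96, 96 and 8 are an assumption, chosen only because their product equals the stated total. The names `ASSUMED_OPTIONS_PER_ROUND`, `barcode_combinations` and `match_probability` are hypothetical and used for illustration only.

```python
from fractions import Fraction
from math import prod

# ASSUMPTION: barcode options at each of the four rounds. These values are
# not given in the text; they were picked because their product matches the
# stated total of 3,538,944 combinations.
ASSUMED_OPTIONS_PER_ROUND = [48, 96, 96, 8]


def barcode_combinations(options_per_round):
    """Total number of distinct barcode paths through all rounds."""
    return prod(options_per_round)


def match_probability(options_per_round):
    """Chance that a randomly distributed free molecule follows the exact
    same path as one specific cell, assuming each option in every round
    is equally likely."""
    return Fraction(1, barcode_combinations(options_per_round))


total = barcode_combinations(ASSUMED_OPTIONS_PER_ROUND)
p = match_probability(ASSUMED_OPTIONS_PER_ROUND)
print(f"{total:,} possible barcode combinations")
print(f"chance of matching one specific cell: 1 in {p.denominator:,}")
```

The sketch treats each round as an independent, uniform choice, which is a simplification. It only illustrates why a free-floating molecule is unlikely to end up with the same barcode combination as any given cell.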