Differential expression analysis allows you to determine which genes are expressed at different levels between experimental groups. Differentially expressed genes can then be used in pathway analysis to offer insight into the biological processes affected by the condition of interest.
Using Trailmaker™, you can find the differentially expressed genes between two groups of cells, where each group must have at least 3 cells. You can compare cell sets within a sample/group, which allows you to find marker genes that distinguish clusters from one another. Alternatively, you can compare a selected cell set between samples or groups to find genes that are differentially expressed in a particular cell type of interest between two experimental groups. This article provides guidance on performing differential expression analysis between samples or groups.
Performing 1 sample vs 1 sample differential expression comparisons in Trailmaker
When you perform a differential expression comparison between samples or groups, Trailmaker uses a pseudobulk limma-voom workflow, a well-established method for bulk RNA-seq analysis that has been adapted for scRNA-seq. This method allows users to identify differentially expressed genes by grouping cells based on conditions (e.g., disease vs. control) and treating those groups as independent samples.
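To make the pseudobulk idea concrete, here is a minimal Python sketch that sums raw counts across all cells belonging to the same sample, giving one count profile per sample. The function name `pseudobulk`, the data layout, and the toy values are assumptions made for illustration; this is not Trailmaker's actual implementation, which runs the limma-voom workflow.

```python
from collections import defaultdict

def pseudobulk(cell_counts, cell_to_sample):
    """Sum per-cell gene counts into one count profile per sample.

    cell_counts: dict mapping cell ID -> dict of gene -> raw count
    cell_to_sample: dict mapping cell ID -> sample name
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell_id, counts in cell_counts.items():
        sample = cell_to_sample[cell_id]
        for gene, count in counts.items():
            totals[sample][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {sample: dict(genes) for sample, genes in totals.items()}

# Toy data (hypothetical): four cells from two samples
cell_counts = {
    "cell1": {"GeneA": 3, "GeneB": 0},
    "cell2": {"GeneA": 1, "GeneB": 2},
    "cell3": {"GeneA": 0, "GeneB": 5},
    "cell4": {"GeneA": 2, "GeneB": 4},
}
cell_to_sample = {
    "cell1": "control", "cell2": "control",
    "cell3": "disease", "cell4": "disease",
}

bulk = pseudobulk(cell_counts, cell_to_sample)
print(bulk)
```

After aggregation, each sample contributes a single observation per gene, which is why the number of samples, not the number of cells, determines how much replication the comparison has.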
However, when you compare 1 sample against 1 other sample in Trailmaker, a warning message is shown.
The differential expression comparison can still be made and the results will return the LogFC difference but not the adjusted p-value. Let’s explore why this happens and why increasing the number of samples in your differential expression comparison is important.
Why adjusted p-values are not calculated in 1-vs-1 sample comparisons in Trailmaker
The reasoning behind not calculating a p-value in a 1-vs-1 sample comparison is that it is not statistically sound to do so. In these cases, the differential expression calculation can still be performed and the LogFC will be returned in the differential expression results, which will give a sense of the magnitude of the differences between your comparison samples.
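As a rough illustration of what a LogFC value represents, the sketch below computes a per-gene log2 fold change between two pseudobulk samples after scaling each to counts per million. The helper names, the toy counts, and the small offsets (0.5 on the count, 1 on the library size, used here so that zero counts stay finite) are assumptions for illustration, not a description of Trailmaker's internal calculation.

```python
import math

def log2_cpm(count, library_size, prior=0.5):
    """log2 counts-per-million with a small offset so zero counts stay finite."""
    return math.log2((count + prior) / (library_size + 1) * 1e6)

def log_fc(counts_a, counts_b):
    """Per-gene log2 fold change of sample B relative to sample A."""
    lib_a = sum(counts_a.values())
    lib_b = sum(counts_b.values())
    return {
        gene: log2_cpm(counts_b[gene], lib_b) - log2_cpm(counts_a[gene], lib_a)
        for gene in counts_a
    }

# Toy pseudobulk counts (hypothetical) for one control and one disease sample
control = {"GeneA": 400, "GeneB": 200, "GeneC": 400}
disease = {"GeneA": 100, "GeneB": 500, "GeneC": 400}

fold_changes = log_fc(control, disease)
for gene, value in fold_changes.items():
    print(f"{gene}: logFC = {value:+.2f}")
```

A negative value means lower expression in the second sample, a positive value means higher expression, and a value near zero means little change. Note that the calculation needs only one value per sample, which is why it remains available in a 1-vs-1 comparison, but it says nothing about how much that value would vary between replicates.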
In differential expression analysis, p-values are designed to help us assess the significance of observed differences. They tell us how likely it would be to see a difference in gene expression at least as large as the one observed if there were no true difference between the two groups. To reliably calculate this, we need multiple replicates within each group to estimate the natural variation in the data.
When only two samples are compared, one per group, no residual degrees of freedom remain for estimating variation within each group. In other words, there is no room to understand whether the differences are due to biological variability or just random noise. This makes p-value calculations unreliable, as there is no way to discern whether the difference is real or just coincidental.
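The degrees-of-freedom argument can be checked with simple arithmetic. In a two-group comparison, the residual degrees of freedom are the total number of samples minus the two group means that have to be estimated. The sketch below uses a hypothetical helper, `residual_df`, and also shows that a variance cannot be computed from a single observation.

```python
import statistics

def residual_df(n_group_a, n_group_b):
    """Residual degrees of freedom for a two-group comparison:
    total samples minus the two group means that must be estimated."""
    return (n_group_a + n_group_b) - 2

for design in [(1, 1), (1, 2), (2, 2), (3, 3)]:
    print(f"{design[0]} vs {design[1]}: residual df = {residual_df(*design)}")

# A single observation gives no variance estimate at all
try:
    statistics.variance([7.2])
except statistics.StatisticsError as err:
    print("Cannot estimate variance from one sample:", err)
```

A 1-vs-1 design leaves zero residual degrees of freedom, whereas adding even one more sample to either group leaves at least one.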
Moreover, without enough replicates, the statistical power of the test is severely compromised. Statistical power is the ability of a test to detect a true effect if one exists. In a 1-vs-1 comparison, there is a much higher chance of either missing a true biological signal (false negatives) or mistakenly identifying noise as a real signal (false positives).
This unreliability is why Trailmaker doesn’t calculate p-values for 1-vs-1 comparisons.
Should I do anything about the warning message?
The way to address this warning and the lack of an adjusted p-value is to increase the number of samples in your differential expression comparison. Instead, compare groups that contain multiple samples, so that there are at least 3 samples in total (such as 1 vs 2, or 2 vs 2).