How to reproduce a published trajectory analysis plot with Trailmaker – Support Suite - Parse Biosciences

The Plots and Tables module in Trailmaker™ lets you perform trajectory analysis on your scRNA-seq dataset. But what exactly is a trajectory analysis?

It’s all about determining the pattern of a dynamic biological process that cells go through or, in other words, a "trajectory" of gene expression changes. The cells are then ordered according to their progression through that process, so that each one is placed at its proper position along the trajectory.

But how can we quantitatively measure this progression? Well, we can do it using the so-called pseudotime and the Monocle algorithm [1]. Pseudotime is defined as “an abstract unit of progress: simply the distance between a cell and the start of the trajectory, measured along the shortest path.”
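To make the quoted definition concrete, here is a minimal sketch that computes pseudotime as the shortest-path distance from one or more root nodes on a small weighted graph. It is a simplified illustration of the definition only, not Monocle's or Trailmaker's implementation; the function name `pseudotime` and the toy graph are assumptions made for this example.

```python
import heapq


def pseudotime(graph, roots):
    """Shortest-path distance from the nearest root to every reachable node.

    graph: dict mapping node -> list of (neighbour, edge_length) pairs.
    roots: nodes where the trajectory starts (pseudotime 0).
    """
    dist = {root: 0.0 for root in roots}
    queue = [(0.0, root) for root in dist]
    heapq.heapify(queue)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for neighbour, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


# Hypothetical toy trajectory: a - b - c, plus a longer direct edge a - c.
toy_graph = {
    "a": [("b", 1.0), ("c", 5.0)],
    "b": [("a", 1.0), ("c", 2.0)],
    "c": [("a", 5.0), ("b", 2.0)],
}
toy_pseudotime = pseudotime(toy_graph, roots=["a"])
print(toy_pseudotime)
```

In this toy graph, node "c" receives a pseudotime of 3.0 rather than 5.0, because the path through "b" is shorter than the direct edge.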

To show how Trailmaker can be used to perform a trajectory analysis in a very easy and reliable way, we used Trailmaker, by Parse Biosciences, to reproduce findings from a paper by Shi et al. that was recently published in Science [2].

“Mouse and human share conserved transcriptional programs for interneuron development” by Shi et al. (2021)

The authors investigated the transcriptional trajectories of cells in the developing human ganglionic eminences to assess conservation in the genetic programs controlling the development of GABAergic neurons in mice and humans. There are a lot of interesting findings in this paper, but for the purpose of reproducing a trajectory analysis plot using the Trailmaker tool, we will focus only on a few of these findings.

In the first part of the paper, the authors describe how they identified 10 different cell types within 56,412 cells coming from 9 samples of dissected ganglionic eminences across gestational weeks (GW) 9-18. This is the UMAP from figure 1B of the paper that represents the cell types that they identified:

Figure 1B.png

The authors annotated these cells manually, using the expression of some known marker genes, as shown in figure 1C from the paper:

Figure 1C.png

Next, they decided to focus on the ganglionic eminence (GE) progenitors cluster with the goal of investigating 2 types of progenitors inside this cluster: radial glial cells (RGCs), which have neural stem cell characteristics, and intermediate progenitor cells (IPCs), which derive from RGCs and are committed to the neuronal lineage. Here you can see a new UMAP from figure 2A that includes only the RGCs and IPCs identified within the GE progenitors:

Figure 2A.png

Once again, they exploited a list of established markers to identify these two cell types of interest, as seen in figure 2A:

Figure 2A-2.JPG

And here comes the interesting part! They performed a trajectory analysis using Monocle3 to visualize the developmental trajectories of RGCs and IPCs using pseudotime alignment. This nice plot from figure 2A shows a trajectory that starts from the less differentiated RGCs and follows a path that ends at the more differentiated IPCs:

Figure 2A-3.png

Sure, this is not the most exciting finding in this study: the authors simply confirmed that some cells in this experiment are differentiating from a more stem-like state to an intermediate progenitor state. But it is a clean example for our purpose. Remember: here we want to learn how to perform trajectory analysis using Trailmaker.

So, let’s jump to the fun part. What did we do to reproduce these findings?

Reproducing the analysis using Trailmaker

Demultiplexing and uploading the data to Trailmaker

First, we downloaded the data from GEO [3]. Specifically, we downloaded the file named “GSE135827_GE_mat_raw_count_with_week_info.txt.gz”, which is a single raw count matrix containing all the samples. This means that, as a first step, we needed to demultiplex the data to obtain a separate matrix for each sample, and then convert the raw counts to a format that is accepted by Trailmaker. We did this by following an adapted version of this tutorial.

To skip this step, demultiplexed data can instead be downloaded from here.

Note that the downloaded dataset included an extra sample containing only 46 cells. Considering that this sample was not reported in the paper, and it included a very low number of cells, we decided to exclude this sample from further analysis.
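The demultiplexing and sample-exclusion steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the tutorial's code: it assumes that each cell identifier starts with its sample label followed by an underscore, which may not match the layout of the real file, and all function names and toy values are hypothetical.

```python
from collections import defaultdict


def sample_from_cell_id(cell_id):
    # ASSUMPTION: the sample label is the text before the first underscore.
    return cell_id.split("_", 1)[0]


def split_by_sample(cell_ids, sample_of=sample_from_cell_id):
    """Group the column indices of a combined count matrix by sample label."""
    columns = defaultdict(list)
    for index, cell_id in enumerate(cell_ids):
        columns[sample_of(cell_id)].append(index)
    return dict(columns)


def drop_small_samples(columns_by_sample, min_cells):
    """Keep only samples that contain at least min_cells cells."""
    return {
        sample: cols
        for sample, cols in columns_by_sample.items()
        if len(cols) >= min_cells
    }


def submatrix(rows, column_indices):
    """Extract the selected columns from a genes x cells matrix (list of rows)."""
    return [[row[i] for i in column_indices] for row in rows]


# Hypothetical toy data: 2 genes x 5 cells from 3 samples.
toy_cell_ids = ["GW9_c1", "GW9_c2", "GW12_c1", "GW12_c2", "extra_c1"]
toy_counts = [
    [0, 3, 1, 0, 2],  # gene 1
    [5, 0, 0, 2, 1],  # gene 2
]
kept = drop_small_samples(split_by_sample(toy_cell_ids), min_cells=2)
per_sample = {sample: submatrix(toy_counts, cols) for sample, cols in kept.items()}
print(per_sample)
```

Here the toy "extra" sample is dropped because it has a single cell. The `min_cells` value is arbitrary for the toy data; in our analysis the excluded sample was the one containing 46 cells.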

Processing the data

After uploading our samples to the Insights module of Trailmaker (https://app.trailmaker.parsebiosciences.com/), processing them was as easy as clicking the “Process project” button. After a few minutes, we had our analysis ready to be explored.

Before proceeding further, we changed the integration method in step 6 of the Data Processing tab from the default, Harmony, to the one used in the paper, fastMNN. We also modified the cluster resolution in step 7 of Data Processing. This value was not reported in the paper but, luckily, trying different values in Trailmaker is very easy: it’s just a matter of moving a slider. In the end, we chose a resolution of 0.1 because it produced a number of clusters closest to that reported in the paper.

Annotating cell types

Having our data ready to be explored, we aimed to first identify the same cell types that the authors identified in the published paper. To do this, we looked at the expression of some key genes reported in the paper. Specifically, we used MKI67 to identify GE progenitors, LHX6 for MGE, NR2F1 for CGE, MEIS2 for LGE, TCF7L2 for thalamic neurons, and NEUROD2 for pallial cells.
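As a rough illustration of this marker-based annotation, the sketch below labels each cluster with the cell type whose marker gene has the highest mean expression in that cluster. The gene-to-cell-type mapping comes from the list above; the decision rule, the function name, and the expression values are assumptions for this example. In practice we made the call by looking at the expression plots, not by applying an automatic rule.

```python
# Marker genes and the cell types they identify, as listed in the text.
MARKER_GENES = {
    "MKI67": "GE progenitors",
    "LHX6": "MGE",
    "NR2F1": "CGE",
    "MEIS2": "LGE",
    "TCF7L2": "Thalamic neurons",
    "NEUROD2": "Pallial cells",
}


def annotate_clusters(mean_expression):
    """Label each cluster by its most highly expressed marker gene.

    mean_expression: dict mapping cluster name -> {gene: mean expression}.
    Clusters without any of the marker genes are labelled "unannotated".
    """
    labels = {}
    for cluster, gene_means in mean_expression.items():
        known = {g: v for g, v in gene_means.items() if g in MARKER_GENES}
        if not known:
            labels[cluster] = "unannotated"
            continue
        best_gene = max(known, key=known.get)
        labels[cluster] = MARKER_GENES[best_gene]
    return labels


# Hypothetical mean expression values, for illustration only.
toy_means = {
    "cluster_0": {"MKI67": 2.1, "LHX6": 0.2},
    "cluster_1": {"LHX6": 1.8, "MEIS2": 0.4},
    "cluster_2": {"ACTB": 3.0},
}
toy_labels = annotate_clusters(toy_means)
print(toy_labels)
```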

Plotting the expression of these genes on the UMAP produced by Trailmaker was very straightforward. We searched for the genes in the gene list present in the Data Exploration module, and then clicked on the little eye icon next to the gene name to view it on the UMAP embedding plot. Then, to export these plots, we did the same using the Continuous Embedding module in Plots and Tables.

Here are the results:

UMAP of LHX6 gene expression UMAP of MEIS2 expression

UMAP of MKI67 gene expression UMAP of NEUROD2 gene expression

UMAP of NR2F1 gene expression UMAP of TCF7L2 gene expression

Using the expression of these marker genes, we identified the same 6 main clusters reported in the paper. We didn’t annotate 3 minor clusters (microglia, OPCs, and endothelial cells) because we didn’t find strong marker genes reported in the paper for those clusters. We didn’t investigate further because they’re very small and not relevant for the following trajectory analysis.

Here is our annotated UMAP produced with Trailmaker (A) compared to the UMAP from the paper (B):

The shapes are slightly different, and that’s perfectly acceptable considering that UMAP is a stochastic algorithm, which means it makes use of randomness. However, we can observe that the relative positions of the clusters are concordant between the 2 plots. There is a big grey cluster (GE progenitors) which is connected to 3 main clusters (MGE, CGE, and LGE), then a green cluster (thalamic neurons) which is partially connected to them, and finally a separate red cluster which includes pallial cells.

The table below shows the number of cells in each cluster. The counts reported by Shi et al. and those obtained in Trailmaker are broadly comparable, with the largest differences in the CGE and LGE clusters:

Cell type	Shi et al. (cells)	Trailmaker (cells)
GE progenitors	12,118	11,197
CGE	7,196	9,256
LGE	11,866	8,962
MGE	15,472	13,551
Thalamic neurons	4,430	5,133
Pallial cells	4,881	4,389
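To quantify how close the two columns are, the short sketch below computes the relative difference for each cluster, using the cell counts from the table above. The helper name `percent_difference` is our own choice for this example.

```python
# Cell counts from the table above: (Shi et al., Trailmaker).
CELL_COUNTS = {
    "GE progenitors": (12118, 11197),
    "CGE": (7196, 9256),
    "LGE": (11866, 8962),
    "MGE": (15472, 13551),
    "Thalamic neurons": (4430, 5133),
    "Pallial cells": (4881, 4389),
}


def percent_difference(reference, observed):
    """Signed difference of observed relative to reference, in percent."""
    return 100.0 * (observed - reference) / reference


for cell_type, (paper, trailmaker) in CELL_COUNTS.items():
    diff = percent_difference(paper, trailmaker)
    print(f"{cell_type:<18}{paper:>8}{trailmaker:>8}{diff:>+8.1f}%")
```

A positive value means that Trailmaker assigned more cells to that cluster than the paper reported, and a negative value means fewer.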

Extracting GE progenitors

Satisfied that we had effectively reproduced the annotated UMAP in Trailmaker, we moved on to focus on the GE progenitors cluster, as the authors did in the paper.

To do so, we selected the GE progenitors cluster and clicked on the "Create subset" button in the Cell Sets and Metadata panel within Data Exploration. This created a new project containing only the cells of the selected cluster.

Create subset.png

Create new project.png

Identifying RGCs and IPCs

To reproduce the findings from the paper, we wanted to identify IPCs and RGCs within the UMAP of this new subsetted project in Trailmaker. For this, we chose to focus on the 2 genes from the paper that identify these cell types most clearly: DLX2 for IPCs and VIM for RGCs. This is what we found:

IPCs and RGCs.JPG

From these plots it seems like IPCs are in the upper part of the UMAP, while RGCs are in the lower part. In Trailmaker, you can easily create custom clusters based on gene expression. Exploiting this feature, we created 2 custom clusters according to the expression of these genes, and this is what we obtained:

IPCs and RGCs_2.JPG
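Conceptually, creating a custom cluster from gene expression amounts to selecting the cells whose expression of a marker gene exceeds a cut-off. The sketch below shows that idea only; it is not how Trailmaker implements the feature, and the function name, the threshold, and the expression values are all hypothetical.

```python
def cluster_by_marker(expression, marker, threshold):
    """Return the IDs of cells whose expression of `marker` exceeds `threshold`.

    expression: dict mapping cell ID -> {gene: expression value}.
    Cells with no value for the marker are treated as not expressing it.
    """
    return {
        cell
        for cell, genes in expression.items()
        if genes.get(marker, 0.0) > threshold
    }


# Hypothetical expression values, for illustration only.
toy_expression = {
    "cell_1": {"DLX2": 2.4, "VIM": 0.1},
    "cell_2": {"DLX2": 0.0, "VIM": 3.2},
    "cell_3": {"DLX2": 1.9, "VIM": 0.3},
    "cell_4": {"DLX2": 0.2, "VIM": 2.7},
}
ipcs = cluster_by_marker(toy_expression, "DLX2", threshold=1.0)  # DLX2-high
rgcs = cluster_by_marker(toy_expression, "VIM", threshold=1.0)   # VIM-high
print(sorted(ipcs), sorted(rgcs))
```

With real data, a cell can exceed both cut-offs or neither, so the two sets are not guaranteed to be disjoint or to cover every cell.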

Performing trajectory analysis

And here we are at the final step. We aimed to reproduce the trajectory analysis plot from the paper. Based on the findings reported in the paper, we would expect to see a trajectory that starts from the less differentiated RGCs and follows a path that ends at the more differentiated IPCs.

To do this, we used the Trajectory Analysis plot in the Plots and Tables module of Trailmaker. It took just two steps:

Select the root nodes (where we want the trajectory to start): we chose the nodes corresponding to the RGCs;
Click on the “Calculate” button.

And here it is, our trajectory plot coloured by pseudotime (the root nodes are highlighted in red):

Trajectory plot.png

Even though the shape of the UMAP looks different from the plot reported in the paper (which is to be expected), we can see the pseudotime as a color gradient that goes from RGCs to IPCs.

This is exactly what we expected to see!

Now you know how to perform a nice trajectory analysis with Trailmaker. You can go and try it yourself!

References

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4122333/

[2] https://pubmed.ncbi.nlm.nih.gov/34882453/

[3] https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135827