Trailmaker™, our new user-friendly data analysis platform, guides you through your Evercode Whole Transcriptome data analysis journey from FASTQ files to publication-ready figures. The platform supports FASTQ file processing in the Pipeline module and downstream analysis and visualization in the Insights module, enabling scientists without coding or bioinformatics expertise to run the end-to-end data analysis workflow.
Watch this 13-minute video to learn more about the key features of Trailmaker and how to get started with your analysis. Full details of Trailmaker features are provided in our user guide.
Trailmaker is available for free to all Parse customers. Access Trailmaker here to start your analysis today: https://app.trailmaker.parsebiosciences.com/
Video Transcript
[SLIDE 1]
At Parse Biosciences, we are really excited to introduce Trailmaker, our NEW user-friendly platform that guides you through your Evercode Whole Transcriptome data analysis journey.
[SLIDE 2]
Trailmaker’s user-friendly interface allows you to analyze and explore your dataset without writing a single line of code. Our end-to-end workflow takes you from FASTQ files to publication-ready figures in just a few simple clicks.
[SLIDE 3]
The flexibility of multiple data entry and exit points, together with data sharing capabilities, leads you and your collaborators to biological insight faster. And it’s available for free to all Parse customers!
[SLIDE 4]
To get started, create an account by visiting: https://app.trailmaker.parsebiosciences.com/
The sign-up process takes just a few minutes, and you can immediately begin uploading your data or exploring one of the example datasets.
For more information and support, visit our website or email us at support@parsebiosciences.com
[TRAILMAKER DEMO PIPELINE MODULE]
In this demo, we’re going to take a walk through the end-to-end journey from FASTQ files to figures using Trailmaker.
When you log into Trailmaker, you’ll see on the left side navigation controls that there are two main modules in the platform. For FASTQ file processing, you should start with the Pipeline module. If you already have count matrices that are output from the Parse pipeline or other FASTQ processing pipelines, you should navigate to the Insights module for downstream analysis and visualization.
Let’s start by processing our FASTQ files using the Pipeline module. For this, you’ll need your sample loading table, FASTQ files and knowledge of which species you used in your experiment.
Our wizard guides you through the steps required to create and name a new pipeline run, and to specify the kit type, chemistry version and number of sublibraries that you will be processing. Next, you upload your sample loading table by dragging and dropping it here. You can check the details of your sample loading table by clicking ‘view sample names’. In the next step, select the reference genome from the drop-down menu. If the genome you require is not available, please contact us at support@parsebiosciences.com. In the final step of the wizard, the FASTQ files are uploaded. You can choose to upload your files by dragging and dropping them into the box, or you can use the command line upload function.
For command line upload, you should first download the Parse-upload Python script, then generate your token and copy the command shown on screen. After pasting the command into your console, you need to specify the path to your FASTQ files and to the downloaded Python script. When you run the command, you will need to confirm that the files are correct before the upload begins. Upload progress is shown both on the console and within Trailmaker.
Note that uploading FASTQ files from a WT Mini or WT standard kit can take several hours, and for WT Mega kits can take a day or more. Don’t worry if your internet connection is interrupted during the upload process - the files will continue to upload when your connection resumes.
Once FASTQ file upload is complete and all other information has been input, you can begin to process your data by clicking “Run the pipeline”. Your pipeline run is launched in the first few minutes. Then, the progress of your pipeline run is shown, together with the current logs. The duration of your pipeline run depends on the number of cells in your experiment as well as the sequencing depth. As rough estimates, a WT Mini kit pipeline run might take 6-8 hours, with a WT kit pipeline taking 12-24 hours, and a WT Mega kit taking a day or more.
Whilst your pipeline is running, you can navigate away from Trailmaker and shut down your computer - the pipeline will continue to run. You can choose to receive an email notification when your run is finished.
Successful pipeline runs will display the reports in the Pipeline Outputs tab for you to explore. The all_samples report is shown by default, and you can choose to view individual sample reports using the dropdown menu.
- In the interactive barcode rank plot, you’re looking for a clearly defined ‘knee’ with the threshold in the steepest part of the drop. This threshold is dynamically set and is likely to be different for different samples.
- The QC metrics include the estimated number of cells as well as the median number of genes and transcripts per cell. These metrics can be compared across samples and in the context of published data or your previous experimental results.
The plate heatmaps underneath the plots display transcripts and cells per well and are useful for catching pipetting and plate loading errors. Ideally, you'd like to see a homogeneous distribution across the plates with no streaks or outliers.
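To make the idea of a barcode rank plot concrete, here is a minimal Python sketch that sorts per-barcode transcript counts, ranks them, and looks for the steepest drop on a log-log scale. It is an illustration only: the function names and toy counts are hypothetical, and the steepest-drop heuristic is a crude stand-in, not the method Trailmaker uses to set its threshold.

```python
import math

def barcode_rank_curve(counts):
    """Return (rank, count) pairs sorted by descending transcript count.

    Barcodes with zero counts are dropped so the log scale is defined.
    """
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    return list(enumerate(ordered, start=1))

def steepest_drop_rank(curve):
    """Rank after which the log-log curve falls most steeply (a crude knee proxy)."""
    best_rank, best_slope = None, 0.0
    for (r1, c1), (r2, c2) in zip(curve, curve[1:]):
        slope = (math.log10(c2) - math.log10(c1)) / (math.log10(r2) - math.log10(r1))
        if slope < best_slope:
            best_rank, best_slope = r1, slope
    return best_rank

# Toy data (hypothetical): five cell-like barcodes followed by background barcodes.
toy_counts = [5200, 4800, 4500, 4100, 3900, 40, 35, 30, 22, 15, 0]
curve = barcode_rank_curve(toy_counts)
knee = steepest_drop_rank(curve)
print(f"Steepest drop after rank {knee}")
```

In this toy example the drop falls between the fifth and sixth barcodes, which is the pattern a clearly defined knee produces in the plot.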
Pipeline outputs are available to download.
- The count matrices can be found in the combined output option, which can be useful if you choose to perform downstream analysis outside of Trailmaker.
- The Reports option includes the HTML reports, log files, and additional QC metrics in a CSV file.
- Note that downloading all files might take a long time for large datasets.
Failed pipeline runs give the option to download the logs for troubleshooting purposes. Contact us for support if you encounter a failed pipeline run.
The outputs of successful pipeline runs are automatically sent to the Insights module for downstream analysis and visualization. Simply click the “Go to Insights downstream analysis” button to navigate to the Data Exploration tab of the Insights module where you can begin to deep dive into your dataset.
[TRAILMAKER DEMO INSIGHTS MODULE]
We’ll come back to Data Exploration later, but let’s take a step back to look at how you would upload data to the Insights module if you already have output files from the Parse pipeline or another FASTQ processing pipeline, or if you would like to explore one of the example datasets from our datasets repository.
From the Insights module “Create new project” button, you can navigate to our datasets repository where we have a range of published datasets available, including some examples from the technical reports on the Parse Biosciences website. Simply click “Explore” to start exploring these example datasets.
Alternatively, you can create a new Project to upload your own data. To upload the count matrices that are output from the Parse pipeline, select the Parse Evercode WT option from the drop-down menu. When selecting this option, we recommend uploading the unfiltered matrices from the “DGE_unfiltered” folder. Alternatively, you can choose to upload count matrices from other technologies including 10x Chromium and BD Rhapsody, or to upload a Seurat object. Full instructions for each data type are provided on screen.
For multi-sample experiments, we recommend adding metadata in order to assign samples to groups for downstream plotting or computing differentially expressed genes. You can do this manually or by bulk upload of a TSV file.
From this page you can also share Projects with colleagues, download your processed data files, and create clones of your analysis.
When your data files and metadata are uploaded, click “Process project” to continue. Projects that were generated from a successful Pipeline run in the Trailmaker Pipeline module are processed automatically.
[TRAILMAKER DEMO INSIGHTS MODULE - DATA PROCESSING]
The Insights Data Processing module is designed to remove the ‘black box’ around data analysis and make data clean-up transparent and accessible.
The 7-step data processing pipeline automatically applies default filtering parameters, which are calculated from your individual sample data, and visualization settings that are gold standard in the field.
Information tooltips are available at each step to explain the methods used and to provide relevant references.
In the first 5 filtering steps, the samples are listed vertically and the statistics table underneath each sample plot shows the number and proportion of cells that are being filtered out of each sample at each step.
The first two steps remove background noise from the dataset. Dead and dying cells are filtered out in step 3 based on their mitochondrial content, with poor quality cells removed in step 4, and doublets and multiplets removed in step 5. In all steps, the automatic settings can be changed using the settings controls. If you change a setting, you will be prompted to re-run the 7-step data processing pipeline.
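As a rough illustration of what filtering on mitochondrial content means, the Python sketch below computes the fraction of each cell's counts that come from mitochondrial genes and keeps cells under a cutoff. The `MT-` gene prefix (human gene naming), the 0.2 cutoff, and the function names are all assumptions made for this example; Trailmaker calculates its own default from your sample data.

```python
def mito_fraction(cell_counts, mito_prefix="MT-"):
    """Fraction of a cell's counts assigned to genes with the given prefix."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in cell_counts.items()
               if gene.upper().startswith(mito_prefix))
    return mito / total

def filter_by_mito(cells, max_fraction):
    """Keep only cells whose mitochondrial fraction is at or below the cutoff."""
    return {cell_id: counts for cell_id, counts in cells.items()
            if mito_fraction(counts) <= max_fraction}

# Toy cells (hypothetical counts per gene).
toy_cells = {
    "cell_a": {"MT-CO1": 5, "ACTB": 60, "CD14": 35},     # 5% mitochondrial
    "cell_b": {"MT-CO1": 40, "MT-ND1": 20, "ACTB": 40},  # 60% mitochondrial
}
kept = filter_by_mito(toy_cells, max_fraction=0.2)  # assumed cutoff for illustration
print(sorted(kept))
```

Raising or lowering `max_fraction` here plays the same role as adjusting the setting in step 3: a stricter cutoff removes more cells.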
In step 6, samples are integrated to remove batch effects. Trailmaker applies the Harmony method by default, though you can choose to select another option. Integration can be visually checked using the embedding and frequency plots.
Finally in step 7, you can select to view a UMAP or t-SNE embedding plot, and show Leiden or Louvain clustering. Clustering resolution can be easily changed to a level that makes biological sense for your dataset using the slider.
For visualizing the quality of your samples, various quality control metrics can be plotted on the violin plot which can help to identify outliers.
Now you’re ready to deep dive into your dataset using the Data Exploration tab!
[TRAILMAKER DEMO INSIGHTS MODULE - DATA EXPLORATION]
The Data Exploration tab allows you to identify and annotate the cell types within your dataset and query your biological questions using gene expression and differential expression analysis.
On the top left you have the UMAP or t-SNE embedding plot showing clusters that you configured in the last step of Data Processing. The embedding plot can be colored by sample or metadata using the middle tile.
On the right side, you have a full list of all genes in your dataset, ordered by dispersion, which is a measure of variability in expression across the cells. Use the eye icon to color the embedding with a gene. Clicking a gene name takes you to GeneCards where more information about expression and function is available. It’s easy to search for a gene of interest in the gene list.
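To give a feel for ordering genes by dispersion, the sketch below ranks genes by their variance-to-mean ratio across cells. This is one common way to express variability and is used here as an assumption; the transcript does not state the exact formula behind Trailmaker's gene list, and the gene names and values are made up.

```python
from statistics import mean, pvariance

def dispersion(values):
    """Variance-to-mean ratio of one gene's expression across cells."""
    m = mean(values)
    return pvariance(values) / m if m > 0 else 0.0

# Toy expression values per gene across four cells (hypothetical).
toy_expression = {
    "GENE_A": [0, 0, 10, 10],  # strongly variable
    "GENE_B": [5, 5, 5, 5],    # constant
    "GENE_C": [4, 6, 4, 6],    # mildly variable
}
ranked = sorted(toy_expression,
                key=lambda gene: dispersion(toy_expression[gene]),
                reverse=True)
print(ranked)
```

All three toy genes have the same mean expression, so the ranking is driven entirely by how much each gene varies from cell to cell.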
Underneath, you have an interactive heatmap showing marker genes for your clusters. For manual cluster identification, you can zoom into the heatmap or use the differential expression tool to generate a full list of marker genes for each cluster.
Custom cell sets can be created using the lasso tool on the embedding. Alternatively, custom cell sets can be generated based on the expression of one or more genes in the gene list using the ‘CellSet’ button.
All cell sets can easily be re-named and re-coloured.
Automatic cluster annotation can be performed using the scType method, by selecting the relevant tissue type and species.
Once you have identified your cell type of interest, differential expression can be calculated across samples or metadata groups. In this example, we’re comparing classical monocytes between males and females.
The resulting list of differentially expressed genes is ordered by descending log fold change. The genes can be viewed on the heatmap by clicking ‘heatmap’ and ‘overwrite’. The heatmap can be customized and re-ordered using the settings menu in order to present the differences in an effective manner.
The differentially expressed gene list can also be filtered, for example to show only the most significantly different genes, and then sent for pathway analysis. This feature uses external pathway analysis databases PantherDB and Enrichr.
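The ordering and filtering described above can be sketched in a few lines of Python. The record layout, the placeholder gene names, the toy values, and the 0.05 adjusted p-value cutoff are assumptions for illustration, not Trailmaker's export format or defaults.

```python
from typing import NamedTuple

class DEResult(NamedTuple):
    gene: str
    log_fc: float  # log fold change between the two groups
    adj_p: float   # adjusted p-value

def order_by_log_fc(results):
    """Order results by descending log fold change."""
    return sorted(results, key=lambda r: r.log_fc, reverse=True)

def most_significant(results, max_adj_p=0.05):
    """Keep results at or below the adjusted p-value cutoff, preserving order."""
    return [r for r in results if r.adj_p <= max_adj_p]

# Toy results (hypothetical genes and values).
toy_results = [
    DEResult("GENE_A", 0.4, 0.20),
    DEResult("GENE_B", 2.1, 0.001),
    DEResult("GENE_C", -1.3, 0.01),
]
ordered = order_by_log_fc(toy_results)
shortlist = [r.gene for r in most_significant(ordered)]
print([r.gene for r in ordered], shortlist)
```

A shortlist like this is the kind of gene list you would then send on for pathway analysis.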
If you would like to subset a particular cell set of interest to a new analysis for more granular exploration, you can do this by checking the box next to the relevant cell set and clicking the ‘subset to a new project’ button.
After fully exploring gene expression and differential expression in your cells and groups of interest, you can plot and investigate your findings further in the Plots and Tables tab of the Insights module.
[TRAILMAKER DEMO INSIGHTS MODULE - PLOTS AND TABLES]
A range of plot types are available in the Plots and Tables tab to visualize your data in different ways. Each plot can be fully customized to suit your design preferences, and exported as high resolution images for publication.
Let’s take a look at the dot plot as an example.
In the dot plot, you can paste in or search for custom genes you would like to plot or use the marker gene option. The data displayed in the plot can be changed using the ‘select data’ dropdown menus, such as to show cell sets, samples or metadata. All aspects of the plot view can be customized, such as the dimensions and colors. Once you’re happy with the plot view, it can be exported as a high resolution image for publication. The data can also be exported as a CSV file.
Trailmaker has a range of plots and tables available for customization and download.
- The frequency plot can be plotted as proportion or count to visualize cluster composition across samples or metadata groups.
- Gene expression can be plotted using embedding plots, heatmaps and violin plots. Customization options for the violin plot include showing or hiding the cell markers, and legend placement.
- The volcano plot can showcase differential expression between cell sets or groups of interest. Gene labels can also be annotated in the volcano plot.
- Trajectory analysis plots show cell trajectories and pseudotime calculations.
- Normalized expression matrices and differential expression gene lists can also be exported from the Plots and Tables tab.
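To show the difference between the count and proportion views of the frequency plot mentioned above, here is a small Python sketch that tallies cells per cluster within each sample and then converts the tallies to proportions. The input layout (a list of sample and cluster pairs) and all names are hypothetical.

```python
from collections import Counter

def cluster_counts(cells):
    """Count cells per cluster within each sample.

    cells: iterable of (sample, cluster) pairs, one pair per cell.
    """
    counts = {}
    for sample, cluster in cells:
        counts.setdefault(sample, Counter())[cluster] += 1
    return counts

def cluster_proportions(counts):
    """Convert per-sample cluster counts to proportions that sum to 1 per sample."""
    return {
        sample: {cluster: n / sum(per_cluster.values())
                 for cluster, n in per_cluster.items()}
        for sample, per_cluster in counts.items()
    }

# Toy cells (hypothetical sample and cluster labels).
toy_cells = [
    ("sample_1", "cluster_0"), ("sample_1", "cluster_0"),
    ("sample_1", "cluster_0"), ("sample_1", "cluster_1"),
    ("sample_2", "cluster_0"), ("sample_2", "cluster_1"),
]
counts = cluster_counts(toy_cells)
proportions = cluster_proportions(counts)
print(counts, proportions)
```

Proportions make samples with different cell numbers comparable, while counts show the absolute size of each cluster.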
If you need help using Trailmaker, use the ‘Support’ button to access the user guide or our dedicated single cell RNA-seq data analysis course. You can also contact our team to report issues, request new features or for 1-to-1 support.
It really is as easy as that! Discover your single cell journey with Trailmaker.