Overview
This article outlines the data analysis options available for processing FASTQ files using Parse Biosciences’ pipeline and for downstream data analysis and visualization using interactive or code-based methods. Note that BCL-to-FASTQ conversion is not covered in this article; see “Generating fastq files with bcl2fastq” for assistance generating FASTQ files before proceeding with data analysis.
FASTQ file processing with the Parse Biosciences Pipeline (FASTQ files to count matrices)
Parse Biosciences offers a computational pipeline that processes your FASTQ files, handling essential tasks such as barcode correction, read alignment, read deduplication, and transcript quantification. The quantified transcripts are then used to generate a cell-by-gene count matrix for downstream analyses. The pipeline is free for all Parse customers, and there are two options for running it:
1. Trailmaker™
Trailmaker, our user-friendly data analysis platform, enables scientists without coding or bioinformatics expertise to access and run the end-to-end data analysis workflow. The platform includes the Pipeline module which processes FASTQ files and the Insights module for downstream analysis. Trailmaker is available for free to all Parse customers. Access Trailmaker here to start your analysis today: https://app.trailmaker.parsebiosciences.com/
Note: When creating an account, we highly recommend using the same email address that you currently use to log in to the Support Suite.
Figure 1. Trailmaker Pipeline module workflow.
Additional resources: An in-depth demo of Trailmaker offers the opportunity to understand its features and capabilities. Further support for using Trailmaker is available in the user guide. Note that the Trailmaker Pipeline module currently only supports Evercode™ Whole Transcriptome data, with support for other Parse kits including Evercode BCR, Evercode TCR, Gene Select and CRISPR Detect coming soon.
2. Download and install the Parse Pipeline
As an alternative to running the pipeline in Trailmaker, the pipeline can be run on a Linux operating system, either locally (assuming the hardware requirements are met) or on a server. This option offers the flexibility of installing and running the pipeline in an environment of your choice and is best suited to users who are comfortable with Linux command-line tools and system configuration. It is compatible with data from our Evercode Whole Transcriptome, Evercode TCR, Evercode BCR, Gene Select, and CRISPR Detect kits. Our support suite provides detailed instructions on how to download, install, and run the pipeline. If you are a current customer and do not have access to the support suite, please fill out your contact information in the Pipeline Download Request Form to request access.
Downstream data analysis and visualization (count matrices to figures)
The outputs from the Parse Pipeline, whether obtained through the Trailmaker Pipeline module or the Linux command-line installation, can be used for downstream analysis with various tools. We recommend using the unfiltered cell-by-gene count matrices, located in the ‘DGE_unfiltered’ folders, for downstream analysis, exploration, and plotting. Options for downstream analysis include:
1. Parse’s specialized, user-friendly data analysis platform, Trailmaker. In Trailmaker, the pipeline output files are automatically sent from the Pipeline module to the Insights module for downstream analysis. The platform guides you step by step through filtering, quality control checks and data cleanup, integration of multi-sample datasets, interactive data exploration, plot customization, and figure generation. Create an account and explore one of the demo datasets or start analyzing your own data at https://app.trailmaker.parsebiosciences.com/
Note: When creating an account, we highly recommend using the same email address that you currently use to log in to the Support Suite.
Figure 2. Trailmaker Insights module workflow for downstream analysis and visualization.
Figure 3. Trailmaker Data Exploration tab for investigating clustering, gene expression and differential expression within your dataset. The screenshot shows the 4-sample PBMC dataset generated with Evercode WT technology from this technical report, which can be accessed via Trailmaker’s datasets repository.
2. Code-based methods such as Seurat or Scanpy. This option offers the most flexibility and customization for data analysis and is popular among customers with bioinformatics training. Full tutorials outlining how to run a downstream analysis using the R package Seurat and the Python package Scanpy are available on the support suite.
3. Import into third-party tools, such as Cellxgene or BioTuring BBrowserX. Trailmaker has multiple data export points, including the unfiltered count matrices and the Seurat object, that enable integration with third-party tools.
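As a rough illustration of what the code-based option involves, the Python sketch below shows the kind of first step a downstream analysis typically performs on an unfiltered count matrix: summing transcripts and counting detected genes per cell, then keeping cells above simple thresholds. It is not the Parse tutorial workflow. It assumes the matrix is stored in Matrix Market coordinate format with cells as rows and genes as columns, the file path shown in the comment is hypothetical, and the thresholds are arbitrary placeholders. In practice, Seurat or Scanpy handle these steps, and the tutorials on the support suite remain the reference.

```python
import io
from collections import defaultdict


def read_mtx_cell_totals(handle):
    """Summarize a Matrix Market coordinate file per row (assumed: row = cell).

    Returns (total counts per cell, genes detected per cell, (n_rows, n_cols)).
    """
    totals = defaultdict(float)
    genes = defaultdict(int)
    shape = None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and header/comment lines
        parts = line.split()
        if shape is None:
            # First non-comment line: number of rows, columns, non-zero entries
            shape = (int(parts[0]), int(parts[1]))
            continue
        row, value = int(parts[0]), float(parts[2])
        if value > 0:
            totals[row] += value
            genes[row] += 1
    return dict(totals), dict(genes), shape


def filter_cells(totals, genes, min_counts, min_genes):
    """Return sorted row indices of cells passing both thresholds."""
    return sorted(
        row for row in totals
        if totals[row] >= min_counts and genes.get(row, 0) >= min_genes
    )


# Toy 3-cell x 4-gene matrix standing in for a real file. With real data you
# might instead open a file such as "DGE_unfiltered/count_matrix.mtx"
# (hypothetical path and file name; check your own pipeline output).
TOY_MTX = """%%MatrixMarket matrix coordinate integer general
3 4 6
1 1 5
1 2 3
1 4 1
2 3 1
3 1 2
3 2 2
"""

totals, genes, shape = read_mtx_cell_totals(io.StringIO(TOY_MTX))
kept = filter_cells(totals, genes, min_counts=4, min_genes=2)  # placeholder thresholds
print("matrix shape:", shape)
print("cells kept:", kept)
```

Starting from the unfiltered matrix, as recommended above, leaves the choice of thresholds to you rather than relying on a cutoff applied earlier.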
For further support with your analysis, contact us at: support@parsebiosciences.com.
Access our free single cell RNA-seq data analysis course at: https://courses.trailmaker.parsebiosciences.com/courses/mastering-scrna-seq-with-trailmaker.