Guided walkthrough: Insights module set-up – Support Suite - Parse Biosciences

The Insights module of Trailmaker™ is where processed data files generated after FASTQ file processing (such as count matrices and Seurat objects) are uploaded for downstream analysis and visualization using either Seurat or Scanpy workflows. This module offers advanced filtering and data cleanup, integration of multi-sample datasets, customization of data visualization and clustering, cell set annotation, differential expression and pathway analysis, and plot customization for the generation of publication-ready figures.

To get started with the Insights module, select one of the following options to navigate to the relevant section of this document:

1. Explore a public dataset from the Trailmaker dataset repository

2. Use Insights module with Parse Biosciences data

3. Use Insights module with any non-Parse data type

Insights module set-up: the Trailmaker dataset repository

Want to explore Trailmaker features and functionality fast, without having to locate and upload your own dataset? Grab a publicly available dataset from the dataset repository!

The dataset repository can be accessed directly via this link:

https://app.trailmaker.parsebiosciences.com/repository

The dataset repository contains ~50 publicly available datasets, totalling >6 million cells. Some specific datasets to draw your attention to are:

The Parse Biosciences Evercode™ v3 human immune cells (PBMCs) dataset that was used in the Trailmaker webinar demo is top of the list in the dataset repository.
The dataset from the recent Comparison of Evercode™ WT v3 and Chromium™ GEM-X Single Cell 3’ Kit v4 in Mouse Brain Nuclei tech note is second on the list in the repository.

All datasets in the repository are available to explore for free. Simply click ‘Explore’ on a dataset of your choice to start exploring the dataset in Trailmaker’s Insights module. Further instructions are available in the article How to explore demo datasets from Trailmaker’s dataset repository.

Insights module set-up with Parse Biosciences data

Before input to the Insights module, FASTQ files from Parse Biosciences kits have to be processed using the Parse Pipeline. The Parse Pipeline is free for all Parse customers and there are two options for running it:

1. In Trailmaker’s Pipeline module

2. By downloading and installing the Pipeline.

The article How Do I Analyze my Parse Biosciences Data? can guide you towards selecting the best option for processing your raw data.

1. From Pipeline runs in Trailmaker

If you run the pipeline in Trailmaker’s Pipeline module, the unfiltered count matrices are automatically sent to the Insights module for downstream analysis. To access the Insights module Project, click ‘Go to Insights downstream analysis’ on the Pipeline outputs page.

If you have multiple Runs within Trailmaker’s Pipeline module that you want to combine into a single downstream analysis, you can do this by downloading the unfiltered count matrices from the Pipeline Outputs page of each relevant Run, and uploading those files to a new Project in the Insights module.

Note that the Insights module currently only supports the downstream analysis and visualization of WT count matrices. Therefore, for immune profiling (TCR or BCR) runs, only the WT outputs from paired runs are sent to the Insights module for downstream analysis. Downstream analysis of BCR or TCR data should be performed in Python or R using third-party tools.

2. From a local installation run of the Parse Pipeline

If you download, install and run the Pipeline locally from the Linux command line, you will need to upload the count matrices to Trailmaker’s Insights module manually for downstream analysis. The same applies if you have been given Pipeline output files by your core facility or a collaborator.

We recommend uploading the unfiltered count matrices, located in the ‘DGE_unfiltered’ folders, to a new Project in Trailmaker’s Insights module. The user guide provides full instructions on how to upload count matrices output by the Parse Pipeline to Trailmaker. The simplest way is to drag and drop the entire pipeline output folder into the data upload modal in Trailmaker.
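
If you are unsure where the unfiltered matrices sit inside a pipeline output folder, a short script can list them before you upload. The Python sketch below simply searches for directories named ‘DGE_unfiltered’ and prints the files each one contains. The function name and the example path are hypothetical, and the sketch makes no assumption about the file names inside those folders.

```python
from pathlib import Path


def find_unfiltered_folders(pipeline_output: Path) -> list[Path]:
    """Return every 'DGE_unfiltered' directory found under pipeline_output."""
    if not pipeline_output.is_dir():
        return []
    return sorted(
        p for p in pipeline_output.rglob("DGE_unfiltered") if p.is_dir()
    )


if __name__ == "__main__":
    # Hypothetical path: replace with your own pipeline output folder.
    output_dir = Path("parse_pipeline_output")
    for folder in find_unfiltered_folders(output_dir):
        files = sorted(f.name for f in folder.iterdir() if f.is_file())
        print(f"{folder}: {files}")
```

This is only a local sanity check; the upload itself still happens through the data upload modal.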

For Parse Biosciences projects, you also need to specify the kit type on the Project Details page.

Regardless of the mechanism of data input to the Insights module, it is important to add metadata to multi-sample Projects in order to assign samples to groups. For example, samples within a dataset could be assigned as “control” and “treated”; or “healthy” and “disease”. Assigning metadata allows the comparison of groups to determine differentially expressed genes (e.g. to calculate differentially expressed genes in a cluster of interest comparing two groups) and visualization of groups (e.g. a dot plot showing the expression of multiple genes of interest across two or more groups) further downstream in the platform. Samples can be assigned to multiple metadata groups. It’s best to add all metadata before processing your Project, as later addition of metadata requires data processing to re-run.
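
Because adding metadata later forces data processing to re-run, it can be worth drafting the sample-to-group assignments before entering them in Trailmaker. The Python sketch below is one way to do that offline. The sample names and the track names (`treatment`, `status`) are hypothetical, and nothing here interacts with Trailmaker itself.

```python
from collections import defaultdict

# Hypothetical samples and metadata tracks, for illustration only.
metadata_plan = {
    "sample_1": {"treatment": "control", "status": "healthy"},
    "sample_2": {"treatment": "control", "status": "disease"},
    "sample_3": {"treatment": "treated", "status": "healthy"},
    "sample_4": {"treatment": "treated", "status": "disease"},
}


def groups_by_track(plan: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Invert the plan: for each metadata track, list the samples in each group."""
    groups: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for sample, tracks in plan.items():
        for track, value in tracks.items():
            groups[track][value].append(sample)
    return {track: dict(values) for track, values in groups.items()}


def samples_missing_tracks(plan: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Report samples that lack a value for any track used elsewhere in the plan."""
    all_tracks = {track for tracks in plan.values() for track in tracks}
    missing = {}
    for sample, tracks in plan.items():
        gaps = sorted(all_tracks - set(tracks))
        if gaps:
            missing[sample] = gaps
    return missing


if __name__ == "__main__":
    for track, groups in groups_by_track(metadata_plan).items():
        print(track, groups)
    print("Incomplete samples:", samples_missing_tracks(metadata_plan))
```

Reviewing the inverted view makes it easy to spot a group with no samples, or a sample left without an assignment, before the Project is processed.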

With data uploaded and metadata added, your dataset is ready to be processed. By clicking ‘Process project’, your Project will soon be ready to explore in the Data Processing, Data Exploration, and Plots and Tables pages within the Insights module.

Insights module set-up with non-Parse data formats

The user guide provides full details of the file formats that are supported by Trailmaker’s Insights module, together with instructions on the data upload process. Briefly, in addition to Parse Biosciences data, the Insights module supports:

Count matrices from 10x Genomics™ Chromium™ technology that have been generated using Cell Ranger™. You should have 3 data files per sample: barcodes.tsv, features.tsv or genes.tsv, and matrix.mtx.
Data generated using BD Rhapsody™ in the expression_data.st file format.
Seurat v4 objects in the .rds format.
H5 files in the matrix.h5 file format, such as those output from Cell Ranger.
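
The three-file requirement for Cell Ranger count matrices listed above can be checked locally before upload. The Python sketch below reports which of the expected files are missing from a sample folder. The function names and the example folder are hypothetical. Treating gzip-compressed copies (for example barcodes.tsv.gz) as present is an assumption of this sketch, so check the user guide for what Trailmaker actually accepts.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED = ("barcodes.tsv", "matrix.mtx")
FEATURE_ALTERNATIVES = ("features.tsv", "genes.tsv")  # either one is enough


def _has(present: set[str], name: str, allow_gzip: bool) -> bool:
    # Assumption: a ".gz" copy counts as the file being present.
    return name in present or (allow_gzip and f"{name}.gz" in present)


def missing_10x_files(sample_dir: Path, allow_gzip: bool = True) -> list[str]:
    """Return the expected files that are absent from sample_dir."""
    present = {f.name for f in sample_dir.iterdir() if f.is_file()}
    missing = [n for n in REQUIRED if not _has(present, n, allow_gzip)]
    if not any(_has(present, n, allow_gzip) for n in FEATURE_ALTERNATIVES):
        missing.append("features.tsv or genes.tsv")
    return missing


if __name__ == "__main__":
    # Hypothetical folder: one sub-folder per sample.
    root = Path("cellranger_samples")
    if root.is_dir():
        for sample in sorted(p for p in root.iterdir() if p.is_dir()):
            gaps = missing_10x_files(sample)
            print(sample.name, "OK" if not gaps else f"missing: {gaps}")
```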

Note that h5ad files and count matrices in CSV/TSV format (i.e. 1 file per sample or dataset) are not supported by Trailmaker. Further guidance on how to convert these file types to a Trailmaker-compatible format is provided in the user guide. Bulk sequencing data is also not supported by Trailmaker.
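
As an illustration of what such a conversion involves, the Python sketch below rewrites a single CSV count matrix as the three-file layout described above (barcodes.tsv, features.tsv and matrix.mtx). It rests on several assumptions that are not taken from the Trailmaker documentation: the CSV has genes as rows and cell barcodes as columns, with a header row and a first column of gene names; the gene name is reused as both feature ID and feature name; and the output is written uncompressed. The function name and paths are hypothetical, and the user guide remains the authoritative source for the supported conversion route.

```python
import csv
from pathlib import Path


def csv_to_10x_triplet(csv_path: Path, out_dir: Path) -> None:
    """Write barcodes.tsv, features.tsv and matrix.mtx from a genes-by-cells CSV."""
    out_dir.mkdir(parents=True, exist_ok=True)
    genes: list[str] = []
    entries: list[tuple[int, int, int]] = []  # (gene row, cell column, count), 1-based

    with csv_path.open(newline="") as handle:
        reader = csv.reader(handle)
        barcodes = next(reader)[1:]  # header: first cell labels the gene column
        for row in reader:
            if not row:
                continue  # skip blank lines
            genes.append(row[0])
            for col_idx, value in enumerate(row[1:], start=1):
                count = int(float(value))
                if count != 0:  # Matrix Market stores non-zero entries only
                    entries.append((len(genes), col_idx, count))

    (out_dir / "barcodes.tsv").write_text("".join(f"{b}\n" for b in barcodes))
    # Assumed three-column layout: feature ID, feature name, feature type.
    (out_dir / "features.tsv").write_text(
        "".join(f"{g}\t{g}\tGene Expression\n" for g in genes)
    )
    with (out_dir / "matrix.mtx").open("w") as mtx:
        mtx.write("%%MatrixMarket matrix coordinate integer general\n")
        mtx.write(f"{len(genes)} {len(barcodes)} {len(entries)}\n")
        for gene_row, cell_col, count in entries:
            mtx.write(f"{gene_row} {cell_col} {count}\n")


if __name__ == "__main__":
    # Hypothetical input and output locations.
    source = Path("sample_1_counts.csv")
    if source.is_file():
        csv_to_10x_triplet(source, Path("converted") / "sample_1")
```

If your CSV stores cells as rows instead, transpose it first; otherwise barcodes and genes will be swapped in the output.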

To upload data to the Insights module, start by creating and naming a new Project.

Then select ‘Add data’ and choose the relevant technology or data format from the dropdown menu.

Full instructions on the file formats and required folder structure are provided in the data upload modal and in the user guide.

Make sure to add metadata to multi-sample Projects in order to assign samples to groups. For example, samples within a dataset could be assigned as “control” and “treated”; or “healthy” and “disease”. Assigning metadata allows the comparison of groups to determine differentially expressed genes (e.g. to calculate differentially expressed genes in a cluster of interest comparing two groups) and visualization of groups (e.g. a dot plot showing the expression of multiple genes of interest across two or more groups) further downstream in the platform. Samples can be assigned to multiple metadata groups. It’s best to add all metadata before processing your Project, as later addition of metadata requires data processing to re-run.
