cellxgene (pronounced "cell-by-gene") is an easy-to-use single-cell data exploration tool that lets you manually define clusters and run differential expression from an interactive web page. The only input cellxgene requires is a single h5ad file, a compressed object containing all the data needed to analyze a single-cell data set. With the addition of a single parameter, the Parse Biosciences pipeline can produce an h5ad file ready to use with cellxgene.
Requirements
The instructions for installing cellxgene are for local (i.e. desktop or laptop) computers running macOS or Linux with a Chromium-based browser installed (e.g. Google Chrome, Opera). Note, however, that the h5ad file is generated wherever split-pipe is installed. Windows users will have to launch cellxgene from a WSL (Windows Subsystem for Linux) session; more information on how to set up WSL can be found on Microsoft's support site. The only other requirement for cellxgene is a Python installation (version 3.6 or greater).
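Before installing anything, you can confirm that your local Python meets the version requirement above. This is a minimal sketch. It assumes the interpreter is called `python3`, which is common on macOS, Linux and WSL but may differ on your system. The function name `python_ok` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if python3 exists and is version 3.6 or greater.
python_ok() {
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found" >&2
    return 1
  fi
  # sys.version_info compares like a tuple, so e.g. (3, 10) >= (3, 6) is true.
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)'
}

if python_ok; then
  echo "Python $(python3 -c 'import platform; print(platform.python_version())') is new enough"
else
  echo "Please install Python 3.6 or greater" >&2
fi
```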
Generating the h5ad file
To generate an h5ad file using split-pipe, we'll first create a parameter file containing the necessary argument. You can create this file in any directory, but we recommend creating it in your project directory to keep things organized.
echo "ana_save_anndata True" > parfile.txt
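split-pipe should be given the full path of parfile.txt, so it can help to print that path right after creating the file. This is a small sketch that assumes you run it from your project directory. `PARFILE` is just an illustrative variable name, not something split-pipe reads.

```shell
#!/usr/bin/env bash
# Create the parameter file in the current (project) directory.
echo "ana_save_anndata True" > parfile.txt

# Build the full path from the current directory, for use with --parfile.
PARFILE="$(pwd)/parfile.txt"
echo "Pass this to split-pipe: --parfile $PARFILE"

# Sanity check: the file should contain exactly this one setting.
if grep -qx "ana_save_anndata True" "$PARFILE"; then
  echo "parfile.txt looks correct"
fi
```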
Next, we'll pass the parameter file to split-pipe using the full path of parfile.txt. For example, if I created the parfile in my home directory, my argument would be --parfile ~/parfile.txt. If you've already run the pipeline, you can start from the analysis step so you don't have to repeat computationally intensive steps (e.g. alignment):
split-pipe --mode ana \
--genome_dir /newvolume/genomes/hg38/ \
--output_dir /pipeline-out/ \
--parfile /newvolume/myproject/parfile.txt \
--sample all-well A1:D12
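If you rerun the analysis step often, you can wrap the command above in a shell function that rejects a relative --parfile path before split-pipe starts. This is a sketch, not part of split-pipe: the function name `run_split_pipe_ana` is hypothetical. The sample name and well range (`all-well A1:D12`) are copied from the example above, so change them to match your experiment.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the analysis-only split-pipe command.
# Arguments: genome directory, output directory, full path to parfile.txt.
run_split_pipe_ana() {
  local genome_dir="$1" output_dir="$2" parfile="$3"

  # Refuse relative paths such as "parfile.txt"; a full path is expected.
  case "$parfile" in
    /*) ;;
    *) echo "Give the full path to the parfile, not: $parfile" >&2; return 2 ;;
  esac
  if [ ! -f "$parfile" ]; then
    echo "Parameter file not found: $parfile" >&2
    return 1
  fi

  # Same options as the example command; sample name and wells are from that example.
  split-pipe --mode ana \
    --genome_dir "$genome_dir" \
    --output_dir "$output_dir" \
    --parfile "$parfile" \
    --sample all-well A1:D12
}
```

Usage, mirroring the example: `run_split_pipe_ana /newvolume/genomes/hg38/ /pipeline-out/ /newvolume/myproject/parfile.txt`.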
If you are starting from raw fastq files, you can pass the parameter file with the same --parfile argument (again using the full path to parfile.txt) to split-pipe --mode all.
h5ad file location
For each sample, the h5ad file is written to the sample's DGE_filtered folder as anndata.h5ad. In this example, "all-well" is the only sample:
├── all-well
│   ├── DGE_filtered
│   │   ├── DGE.mtx
│   │   ├── anndata.h5ad
│   │   ├── cell_metadata.csv
│   │   └── genes.csv
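When an experiment has several samples, you can list every per-sample h5ad file in one go. This is a minimal sketch. It assumes the `<output_dir>/<sample>/DGE_filtered/anndata.h5ad` layout shown in the tree above, and the function name `list_h5ad_files` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the anndata.h5ad path for every sample, sorted.
# Assumes files sit exactly three levels down: <sample>/DGE_filtered/anndata.h5ad.
list_h5ad_files() {
  local output_dir="${1%/}"   # drop a trailing slash so printed paths are clean
  find "$output_dir" -mindepth 3 -maxdepth 3 \
    -path '*/DGE_filtered/anndata.h5ad' -type f | sort
}
```

For example, `list_h5ad_files /pipeline-out/` would print one line per sample that has an h5ad file.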
If your pipeline output folder is on a remote computer, you can transfer the h5ad file to your local machine with the scp command. Here's an example scp command for AWS EC2 users:
scp -i "~/my-key.pem" \
ubuntu@18.236.74.168:/newvolume/analysis/all-well/DGE_filtered/anndata.h5ad \
/home/user/
Here my-key.pem is your AWS key, ubuntu is your user name, and 18.236.74.168 is the public (not private) IP address shown on your EC2 instance summary page. The public IP address changes every time you stop and start your AWS instance, so check it before each transfer. Also note that /home/user/ in the last argument is the destination folder on your local machine.
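With several samples, you can fetch each h5ad file by sample name instead of typing the full remote path every time. This is a sketch with hypothetical helper names (`h5ad_path`, `fetch_h5ad`). It assumes the `<output_dir>/<sample>/DGE_filtered/anndata.h5ad` layout from the example. It also saves each file locally as `<sample>.h5ad` so that copies from different samples don't overwrite one another, which is a choice made for this sketch, not something split-pipe does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the h5ad path for one sample on the remote machine.
h5ad_path() {
  local output_dir="${1%/}" sample="$2"
  printf '%s/%s/DGE_filtered/anndata.h5ad\n' "$output_dir" "$sample"
}

# Hypothetical helper: copy one sample's h5ad file from an EC2 instance.
# Arguments: key file, user name, public IP, remote output dir, sample, local folder.
fetch_h5ad() {
  local key="$1" user="$2" host="$3" output_dir="$4" sample="$5" dest="$6"
  local remote
  remote="$(h5ad_path "$output_dir" "$sample")"
  # Save as <sample>.h5ad so files from different samples don't collide.
  scp -i "$key" "${user}@${host}:${remote}" "${dest%/}/${sample}.h5ad"
}
```

With the values from the example, this would be `fetch_h5ad ~/my-key.pem ubuntu 18.236.74.168 /newvolume/analysis all-well /home/user/`.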
Installing and running cellxgene
Assuming Python is already installed on your local computer, cellxgene can be installed with pip:
pip install cellxgene
Next, launch a cellxgene session from the folder containing anndata.h5ad (or give the file's full path) using the following command:
cellxgene launch anndata.h5ad
Finally, cellxgene should print a message prompting you to open the link in a browser:
[cellxgene] Launching! Please go to http://localhost:5005 in your browser.
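If the launch fails, the cause is often a wrong file path or a missing cellxgene install. A small wrapper can check both before calling `cellxgene launch`. This is a sketch: the function name `launch_h5ad` is hypothetical, and it uses only the `cellxgene launch <file>` form shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: check the inputs, then start cellxgene on an h5ad file.
launch_h5ad() {
  local h5ad="${1:-anndata.h5ad}"   # default to the file name split-pipe writes
  if [ ! -f "$h5ad" ]; then
    echo "No such file: $h5ad" >&2
    return 1
  fi
  if ! command -v cellxgene >/dev/null 2>&1; then
    echo "cellxgene not found; install it with: pip install cellxgene" >&2
    return 127
  fi
  cellxgene launch "$h5ad"
}
```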
That's it! Further information on cellxgene can be found on the documentation homepage and GitHub.