This article contains instructions for running bcl2fastq, along with example sample sheets for dual-indexed and single-indexed libraries.
Installing bcl2fastq on Ubuntu 20.04 LTS
Install alien so that the RPM installer can be used on Ubuntu:
sudo apt-get install alien
Download the Linux RPM for bcl2fastq (v2.20) from Illumina:
https://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html
Unzip the downloaded archive to extract the RPM installer:
unzip bcl2fastq2-v2-20-0-linux-x86-64.zip
Install bcl2fastq using alien:
sudo alien -i bcl2fastq2-v2.20.0.422-Linux-x86_64.rpm
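After the install step, it can be worth confirming that the binary is reachable from your shell. The following is a minimal sketch assuming a POSIX shell; `have_tool` is a hypothetical helper name used only for illustration and is not part of bcl2fastq.

```shell
# have_tool NAME: succeed if NAME resolves to a command on PATH.
# (hypothetical helper, for illustration only)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool bcl2fastq; then
    # Print the installed version; this installer should report v2.20.
    bcl2fastq --version
else
    echo "bcl2fastq not found on PATH; re-check the alien install step" >&2
fi
```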
Running bcl2fastq to get fastq files
First, download the entire run directory from your Illumina sequencer or from BaseSpace (in this example, the directory is called 210331_NB923494_0012_ADK4NFCDA1):
> ls 210331_NB923494_0012_ADK4NFCDA1
Alignment_1 InterOp RTARead3Complete.txt
CompletedJobInfo.xml Logs Recipe
Config QueuedForAnalysis.txt RunCompletionStatus.xml
CopyComplete.txt RTAComplete.txt RunInfo.xml
Data RTAConfiguration.xml RunParameters.xml
GenerateFASTQRunStatistics.xml RTALogs SoftwareVersionsFile.csv
Images RTARead1Complete.txt
InstrumentAnalyticsLogs RTARead2Complete.txt
Example sample sheet for unique dual indexed (UDI) libraries
Make a SampleSheet.csv file by filling out the template below. A CSV version of the example is attached at the bottom of this article. You will need to specify the number of cycles for Read 1 and Read 2. Depending on the kit used to prepare the libraries, Read 2 must be at least 86 cycles (Evercode WT v2) or at least 58 cycles (Evercode WT v3) to detect all barcodes. This example includes sublibrary index IDs 1-8 from the UDI plate-WT, which were used for WT library preparation. Please refer to the user manual for the list of index sequences to use for demultiplexing.
Important: Do not include an adapter sequence under [Settings]. Doing so causes bcl2fastq to trim reads unnecessarily, which can remove barcode sequences from Read 2.
Note: For the i5 index, some sequencing instruments require the reverse complement of the sequence in the sample sheet (as shown in the example below) rather than the forward sequence. Enter the orientation that matches the sequencing instrument you are using.
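If you need to convert an i5 index between orientations, a one-line shell helper is usually enough. This is a sketch only: `revcomp` is a hypothetical helper name, and it assumes the `rev` and `tr` utilities that ship with most Linux distributions.

```shell
# revcomp SEQ: print the reverse complement of a DNA sequence.
# (hypothetical helper; N is kept as N, case is preserved)
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTNacgtn' 'TGCANtgcan'
}

revcomp ATGTGAAG   # prints CTTCACAT
```

Applied to the first index2 value in the template, this prints the opposite orientation of that index. Which orientation your instrument expects still needs to be confirmed with your sequencing provider.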
[Header]
Local Run Manager Analysis Id,<fill>
Experiment Name,<fill>
Date,<fill>
Module,GenerateFASTQ - 2.0.1
Workflow,GenerateFASTQ
Assay,Nextera
Description,<fill>
Chemistry,Default
[Reads]
<read1 length>
<read2 length>
[Settings]
[Data]
Sample_ID,Sample_Name,Description,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project
<fill>,<fill>,,,CAGATCAC,,ATGTGAAG,
<fill>,<fill>,,,ACTGATAG,,GTCCAACC,
<fill>,<fill>,,,GATCAGTC,,AGAGTCAA,
<fill>,<fill>,,,CTTGTAAT,,AGTTGGCT,
<fill>,<fill>,,,AGTCAAGA,,ATAAGGCG,
<fill>,<fill>,,,CCGTCCTA,,CCGTACAG,
<fill>,<fill>,,,GTAGAGTA,,CATTCATG,
<fill>,<fill>,,,GTCCGCCT,,AGATACGG,
Note: Some sample sheets may have different terminology to specify the index reads. Typically, “Read 1 Index” refers to the i7 index and “Read 2 Index” refers to the i5 index. If you have questions on where to input read length or index sequences, please consult with your sequencing provider.
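Because two samples that share the same index combination cannot be told apart during demultiplexing, it may help to check the [Data] section for duplicates before running bcl2fastq. The sketch below assumes the column names `index` and `index2` used in the templates in this article; `check_unique_indexes` is a hypothetical helper name.

```shell
# check_unique_indexes SAMPLESHEET: fail if two [Data] rows share the same
# index (and index2, when that column exists) combination.
# (hypothetical helper; assumes a comma-separated sheet as in this article)
check_unique_indexes() {
    awk -F',' '
        { sub(/\r$/, "") }                       # tolerate Windows line endings
        /^\[/ { in_data = ($1 == "[Data]"); header_seen = 0; next }
        in_data && !header_seen {                # first [Data] line is the column header
            for (i = 1; i <= NF; i++) col[$i] = i
            header_seen = 1
            if (!("index" in col)) { print "no index column in [Data]"; bad = 1; exit }
            next
        }
        in_data && NF > 1 {
            key = $(col["index"]) "+" (("index2" in col) ? $(col["index2"]) : "")
            if (key in seen) {
                printf "duplicate index %s on lines %d and %d\n", key, seen[key], NR
                bad = 1
            } else {
                seen[key] = NR
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `check_unique_indexes SampleSheet.csv && echo "indexes are unique"` prints the confirmation only when no combination is repeated.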
Then add your SampleSheet.csv to the top level of the sequencing directory.
> ls 210331_NB923494_0012_ADK4NFCDA1
Alignment_1 InterOp RTARead3Complete.txt
CompletedJobInfo.xml Logs Recipe
Config QueuedForAnalysis.txt RunCompletionStatus.xml
CopyComplete.txt RTAComplete.txt RunInfo.xml
Data RTAConfiguration.xml RunParameters.xml
GenerateFASTQRunStatistics.xml RTALogs SampleSheet.csv
Images RTARead1Complete.txt SoftwareVersionsFile.csv
InstrumentAnalyticsLogs RTARead2Complete.txt
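Before starting the conversion, a quick pre-flight check can catch a misplaced sample sheet or an incomplete download. The following sketch checks only for SampleSheet.csv, RunInfo.xml, and the Data/Intensities/BaseCalls folder used as input in this article; `check_run_dir` is a hypothetical helper name, and the list of paths is an assumption about what is worth checking, not an official requirement list.

```shell
# check_run_dir RUN_DIR: report any expected path missing from the run folder.
# (hypothetical helper; the path list is an illustrative assumption)
check_run_dir() {
    missing=0
    for path in SampleSheet.csv RunInfo.xml Data/Intensities/BaseCalls; do
        if [ ! -e "$1/$path" ]; then
            echo "missing: $1/$path" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_run_dir 210331_NB923494_0012_ADK4NFCDA1 && echo "run folder looks complete"`.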
Then run bcl2fastq. In the following example, we use 32 processing threads (-p 32) and write the fastq files to a folder called "fastq_files" inside the run directory. The --no-lane-splitting parameter can be convenient because it writes all reads with a given index to the same fastq files instead of splitting them by lane.
bcl2fastq -i 210331_NB923494_0012_ADK4NFCDA1/Data/Intensities/BaseCalls/ -p 32 --output-dir 210331_NB923494_0012_ADK4NFCDA1/fastq_files --no-lane-splitting
Here are the resulting fastq files after bcl2fastq completes:
> ls 210331_NB923494_0012_ADK4NFCDA1/fastq_files
Reports s3_S3_R2_001.fastq.gz s8_S8_R1_001.fastq.gz
Stats s4_S4_R1_001.fastq.gz s8_S8_R2_001.fastq.gz
Undetermined_S0_R1_001.fastq.gz s4_S4_R2_001.fastq.gz
Undetermined_S0_R2_001.fastq.gz s5_S5_R1_001.fastq.gz
s1_S1_R1_001.fastq.gz s5_S5_R2_001.fastq.gz
s1_S1_R2_001.fastq.gz s6_S6_R1_001.fastq.gz
s2_S2_R1_001.fastq.gz s6_S6_R2_001.fastq.gz
s2_S2_R2_001.fastq.gz s7_S7_R1_001.fastq.gz
s3_S3_R1_001.fastq.gz s7_S7_R2_001.fastq.gz
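A simple way to sanity-check the output is to confirm that every Read 1 file has a matching Read 2 file. This sketch assumes the `_R1_001.fastq.gz` / `_R2_001.fastq.gz` naming shown in the listing above; `check_fastq_pairs` is a hypothetical helper name.

```shell
# check_fastq_pairs DIR: verify every *_R1_001.fastq.gz has a matching R2 file.
# (hypothetical helper; relies on the file naming shown in this article)
check_fastq_pairs() {
    status=0
    for r1 in "$1"/*_R1_001.fastq.gz; do
        # An unexpanded glob means the folder holds no R1 files at all.
        [ -e "$r1" ] || { echo "no R1 files found in $1" >&2; return 1; }
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_fastq_pairs 210331_NB923494_0012_ADK4NFCDA1/fastq_files && echo "all pairs present"`.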
The demultiplexed fastq files are now ready to be processed. See Running the Pipeline (Current Version) for more information.
Appendix: Example sample sheet for single indexed libraries
Make a SampleSheet.csv file by filling out the template below. A CSV version of the example is attached at the bottom of this article. You will need to specify the number of cycles for Read 1 and Read 2 (e.g., for Evercode WT v2, Read 2 must be at least 86 cycles to detect all barcodes). This example includes the sequences for eight single indices from an Evercode WT kit. Please refer to the user manual for the list of index sequences to use for demultiplexing.
Important: Do not include an adapter sequence under [Settings]. Doing so causes bcl2fastq to trim reads unnecessarily, which can remove barcode sequences from Read 2.
[Header]
Local Run Manager Analysis Id,<fill>
Experiment Name,<fill>
Date,<fill>
Module,GenerateFASTQ - 2.0.1
Workflow,GenerateFASTQ
Assay,Nextera
Description,<fill>
Chemistry,Default
[Reads]
<read1 length>
<read2 length>
[Settings]
[Data]
Sample_ID,Sample_Name,Description,index,I7_Index_ID,Sample_Project
<fill>,<fill>,,CAGATC,CAGATC,
<fill>,<fill>,,ACTTGA,ACTTGA,
<fill>,<fill>,,GATCAG,GATCAG,
<fill>,<fill>,,TAGCTT,TAGCTT,
<fill>,<fill>,,ATGTCA,ATGTCA,
<fill>,<fill>,,CTTGTA,CTTGTA,
<fill>,<fill>,,AGTCAA,AGTCAA,
<fill>,<fill>,,AGTTCC,AGTTCC,
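For either template, the Read 2 length entered under [Reads] can be compared against the minimum stated for your kit (86 cycles for Evercode WT v2, 58 cycles for Evercode WT v3). The sketch below assumes the [Reads] section lists Read 1 on its first line and Read 2 on its second, as in the templates in this article; `read2_cycles` and `check_read2` are hypothetical helper names.

```shell
# read2_cycles SAMPLESHEET: print the second entry of the [Reads] section.
# (hypothetical helper; assumes Read 1 then Read 2, one value per line)
read2_cycles() {
    awk -F',' '
        { sub(/\r$/, "") }                       # tolerate Windows line endings
        /^\[/ { in_reads = ($1 == "[Reads]"); n = 0; next }
        in_reads && $1 != "" { if (++n == 2) { print $1; exit } }
    ' "$1"
}

# check_read2 SAMPLESHEET MIN_CYCLES: succeed if Read 2 is at least MIN_CYCLES.
check_read2() {
    cycles=$(read2_cycles "$1")
    # A missing or non-numeric value (e.g. an unfilled placeholder) fails the check.
    [ -n "$cycles" ] && [ "$cycles" -ge "$2" ] 2>/dev/null
}
```

For example, `check_read2 SampleSheet.csv 86 || echo "Read 2 is shorter than 86 cycles"`.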