The nf-core/rnaseq pipeline is an automated workflow for analysing RNA sequencing data. It takes raw sequencing files, performs quality control, maps reads to a reference genome, and quantifies gene and transcript expression.
You can find the official usage documentation here: https://nf-co.re/rnaseq/3.26.0/docs/usage/
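Create a samplesheet (for example samplesheet.csv) that links the FASTQ files to samples. Each row describes one sample: fastq_1 and fastq_2 are the full paths to the read 1 and read 2 files (leave fastq_2 empty for single-end data), and strandedness is one of forward, reverse, unstranded or auto, where auto lets the pipeline infer it from the data. Follow this structure: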
sample,fastq_1,fastq_2,strandedness
AB2_neg1,/data/ukdri/BUR/BUR_MM_1/raw/fastq/AB2_neg1_S1_L001_R1.fastq.gz,/data/ukdri/BUR/BUR_MM_1/raw/fastq/AB2_neg1_S1_L001_R2.fastq.gz,auto
AB2_pos1,/data/ukdri/BUR/BUR_MM_1/raw/fastq/AB2_pos1_S1_L001_R1.fastq.gz,/data/ukdri/BUR/BUR_MM_1/raw/fastq/AB2_pos1_S1_L001_R2.fastq.gz,auto
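Writing the samplesheet by hand is error-prone when there are many samples. The sketch below builds one from a folder of paired-end files. It assumes Illumina-style file names like the ones above (<sample>_S<n>_L<lane>_R1.fastq.gz plus the matching _R2 file) and sets strandedness to auto for every row; make_samplesheet is a hypothetical helper, not part of the pipeline or the job template.

```shell
# Hypothetical helper, not part of nf-core/rnaseq or the job template.
# Assumes paired-end files named <sample>_S<n>_L<lane>_R1.fastq.gz / _R2.fastq.gz.
# The body runs in a subshell ( ... ), so its variables do not leak into your shell.
make_samplesheet() (
    fastq_dir=$1   # folder containing the FASTQ files (use a full path)
    out=$2         # samplesheet to write, e.g. samplesheet.csv
    echo "sample,fastq_1,fastq_2,strandedness" > "$out"
    for r1 in "$fastq_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue   # no match: the pattern stays unexpanded, so skip it
        r2=${r1%_R1.fastq.gz}_R2.fastq.gz
        if [ ! -e "$r2" ]; then
            echo "no R2 file found for $r1" >&2
            exit 1
        fi
        # strip the _S<n>_L<lane>_R1.fastq.gz suffix to get the sample name
        sample=$(basename "$r1" | sed 's/_S[0-9][0-9]*_L[0-9][0-9]*_R1\.fastq\.gz$//')
        echo "$sample,$r1,$r2,auto" >> "$out"
    done
)

# Example, using the FASTQ folder from the samplesheet above:
# make_samplesheet /data/ukdri/BUR/BUR_MM_1/raw/fastq /nfsdata/$USER/samplesheet.csv
```

If a sample was sequenced on several lanes, it gets one row per lane with the same sample name; the nf-core/rnaseq usage documentation describes rows that share a sample name as being concatenated before analysis.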
We provide a job template script for submitting the pipeline as a Slurm job:
/nfsdata/scripts/job_scripts/run_nfcore_rnaseq.sh
To use it, copy the script to your dataset-specific project folder and adjust the input files and parameters as needed.
As input, the pipeline needs:
samplesheet: full path to the samplesheet, which specifies which FASTQ files belong to which sample (see above)
resdir: path to the directory where results will be stored; the pipeline output is written to a subfolder called out
# CREATE AND CHANGE PATH TO SAMPLESHEET
samplesheet=/nfsdata/${USER}/PATH_TO_SAMPLE_SHEET
# CHANGE RESULTS_DIR to your folder on /data
resdir=/data/${USER}/RESULTS_DIR
outdir=$resdir/out
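Always use full (absolute) file paths, both in the job script and in the samplesheet, so that the files are found regardless of the directory the job runs from.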
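A quick check before submitting can catch relative or mistyped paths. This is a minimal sketch with a hypothetical function name, check_samplesheet; it only inspects the fastq_1 and fastq_2 columns and assumes the samplesheet has no quoted fields and no commas inside paths.

```shell
# Hypothetical pre-flight check: every fastq_1/fastq_2 entry must be a full
# path (starting with /) to an existing file. Runs in a subshell ( ... ).
check_samplesheet() (
    sheet=$1
    [ -f "$sheet" ] || { echo "samplesheet not found: $sheet" >&2; exit 1; }
    # drop Windows line endings, skip the header, keep columns 2 and 3,
    # and put one path per line
    tr -d '\r' < "$sheet" | tail -n +2 | cut -d, -f2,3 | tr ',' '\n' |
    while IFS= read -r fq; do
        [ -n "$fq" ] || continue   # empty fastq_2 means a single-end sample
        case $fq in
            /*) ;;
            *) echo "not a full path: $fq" >&2; exit 1 ;;
        esac
        [ -f "$fq" ] || { echo "file not found: $fq" >&2; exit 1; }
    done
)

# Example, run on the login node before submitting:
# check_samplesheet /nfsdata/$USER/PATH_TO_SAMPLE_SHEET && sbatch run_nfcore_rnaseq.sh
```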
By default, the job template is configured to use the mouse mm39 genome assembly and the Ensembl release 115 gene annotation. To use a different assembly or annotation, change these variables:
# CHANGE GENOME AND ANNOTATION IF NEEDED
gtf=/nfsdata/genome/ensembl/release-115/GRCm39/chrMus_musculus.GRCm39.115.chr.gtf.gz
genome_fasta=/nfsdata/genome/ucsc/mm39/mm39.fa.gz
# aligner options: star_salmon/star_rsem/hisat2
aligner=star_rsem
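Instead of editing your copy of the job template by hand, variable lines like the ones above can also be changed from the command line, which makes the settings used for each project easy to record. set_script_var is a hypothetical helper; it assumes each variable is assigned at the start of its own line (name=value, as above) and that the new value contains no | or & characters, which sed would interpret.

```shell
# Hypothetical helper: replace the line "<name>=..." in a copy of the job script.
# Assumes the variable is assigned at the start of its own line and that the
# value contains no '|' or '&' (both are special in the sed replacement below).
set_script_var() (
    script=$1
    name=$2
    value=$3
    if ! grep -q "^${name}=" "$script"; then
        echo "no line starting with ${name}= in $script" >&2
        exit 1
    fi
    tmp=$(mktemp)
    # '|' as the sed delimiter means slashes in paths need no escaping
    sed "s|^${name}=.*|${name}=${value}|" "$script" > "$tmp" &&
        cat "$tmp" > "$script"   # overwrite in place, keeping the file's permissions
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Example: switch the aligner in your copy of the template
# set_script_var run_nfcore_rnaseq.sh aligner star_salmon
```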
Genome assembly files and gene annotations are stored here:
/nfsdata/genome/ucsc/
/nfsdata/genome/ensembl/
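If you switch to another assembly or annotation from these folders, the sequence names in the FASTA and the GTF must match (UCSC files name chromosomes like chr1, Ensembl files like 1). The sketch below, with the hypothetical function name gtf_seqnames_missing_from_fasta, lists the names used in the GTF that do not appear in the FASTA. It reads the whole FASTA, so expect it to take a while on a full genome.

```shell
# Hypothetical check: print sequence names used in the GTF but missing from the
# FASTA, and exit non-zero if there are any. With GNU gzip, -cdf also passes
# uncompressed files through unchanged.
gtf_seqnames_missing_from_fasta() (
    fasta=$1
    gtf=$2
    tmp=$(mktemp -d)
    # FASTA headers look like ">chr1 optional description": keep the first word
    gzip -cdf "$fasta" | grep '^>' | sed -e 's/^>//' -e 's/[[:space:]].*//' |
        LC_ALL=C sort -u > "$tmp/fasta_names"
    # column 1 of every non-comment, non-empty GTF line is the sequence name
    gzip -cdf "$gtf" | grep -v -e '^#' -e '^$' | cut -f1 |
        LC_ALL=C sort -u > "$tmp/gtf_names"
    # comm -13 keeps the lines that occur only in the second file (the GTF)
    LC_ALL=C comm -13 "$tmp/fasta_names" "$tmp/gtf_names" > "$tmp/missing"
    cat "$tmp/missing"
    status=0
    [ -s "$tmp/missing" ] && status=1
    rm -rf "$tmp"
    exit "$status"
)

# Example, with the variables from the job template:
# gtf_seqnames_missing_from_fasta "$genome_fasta" "$gtf"
```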
We recommend STAR for read mapping and RSEM for quantification (aligner=star_rsem), which is the setting in the job template.
Submit the job script to run the pipeline:
sbatch run_nfcore_rnaseq.sh
The pipeline writes its results into subfolders of the output directory (outdir):
fastqc/: quality control reports for the raw reads
star_rsem/: aligned reads (BAM files) and gene- and transcript-level quantification files
multiqc/: HTML report summarising all QC metrics
Count matrix files (from RSEM):
out/star_rsem/rsem.merged.gene_counts.tsv: estimated read counts per gene (RSEM expected counts, which can be non-integer)
out/star_rsem/rsem.merged.gene_tpm.tsv: gene expression normalised as transcripts per million (TPM)
out/star_rsem/rsem.merged.transcript_counts.tsv: estimated read counts per transcript
out/star_rsem/rsem.merged.transcript_tpm.tsv: transcript expression normalised as transcripts per million (TPM)
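A quick sanity check on the count matrices is to sum each sample column: the gene_counts sums show how many reads were assigned per sample, and the TPM sums should each be close to 1,000,000, since TPM values are scaled to that total. column_sums is a hypothetical helper; it assumes the first two columns hold identifiers (for example gene_id and transcript_id(s)) and every further column is one sample, so check the header of your file first.

```shell
# Hypothetical helper: sum every sample column of a merged RSEM table.
# Assumes columns 1-2 are identifiers and every further column is one sample.
column_sums() {
    awk -F '\t' '
        NR == 1 { n = NF; for (i = 3; i <= NF; i++) name[i] = $i; next }
        { for (i = 3; i <= NF; i++) sum[i] += $i }
        END { for (i = 3; i <= n; i++) printf "%s\t%.2f\n", name[i], sum[i] }
    ' "$1"
}

# Examples, with outdir as set in the job template:
# column_sums "$outdir/star_rsem/rsem.merged.gene_counts.tsv"   # reads assigned per sample
# column_sums "$outdir/star_rsem/rsem.merged.gene_tpm.tsv"      # each sum close to 1000000
```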