Information about how to access and use the ADDI Microsoft Azure Cloud High Performance Computing (HPC) cluster.
We are very thankful to The Alzheimer's Disease Data Initiative (ADDI), which kindly provides us with the cloud compute resources.
NOTE: This is not a normal HPC environment but an Azure cloud Slurm cluster with a minimal software configuration. The cluster does not support the installation of custom software or conda environments. All software should be run using Apptainer files. The purpose of this cluster is to run Nextflow pipelines for data analysis, which the Core Informatics team is setting up.
NOTE: The cluster is not a trusted research environment (TRE). Given that it is running on Microsoft Azure Cloud, there are no strict security mechanisms in place of the kind that might be required by the General Data Protection Regulation (GDPR) for human data. Please ensure that you are not storing any sensitive human patient data on the ADDI Azure Cloud HPC.
The HPC cluster can only be accessed via SSH using a command-line shell. If you are a Windows user, you can use PuTTY as an SSH client. Please follow these steps to gain access to the cluster:
Contact us about creating your cluster user account at the following email: ukdri-informatics {at} ucl.ac.uk. You will receive your Azure Cloud cluster username and password.
If you have not already done so, create an SSH key for your computer or laptop. See: ssh.com for Linux and macOS or Microsoft SSH instructions for Windows.
Upload your public SSH key(s) in the Azure Cloud web interface using your username and password. We will send you the address for the Azure Cloud web interface.
NOTE: the web address is not registered with an official DNS body, so you will most likely have to allow a security exception in your web browser.
After logging in, select My Profile:

Then Edit Profile and paste your public SSH key(s) into SSH Public Key:

For Linux and macOS, you can find your public SSH key in ~/.ssh/. For example, ~/.ssh/id_rsa.pub or ~/.ssh/id_ed25519.pub.
Click Save.
We also advise you to change your password. You will only need your password to add SSH keys for additional devices if necessary.
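As a minimal sketch of the key step above on Linux or macOS, the helper below creates an Ed25519 key pair only if none exists yet and prints the public key so that you can paste it into SSH Public Key. The function name ensure_ssh_key, the default key path, and the key comment are choices made for this illustration, not requirements of the cluster. ssh-keygen will ask you for a passphrase interactively.

```shell
# Sketch only: ensure_ssh_key, the default path and the "addi-hpc" comment
# are illustrative assumptions, not part of the cluster setup.
# Usage: ensure_ssh_key [private_key_path]
ensure_ssh_key() {
  local key_file="${1:-$HOME/.ssh/id_ed25519}"
  local key_dir
  key_dir="$(dirname "$key_file")"

  # Keep the key directory private to your user.
  mkdir -p "$key_dir" && chmod 700 "$key_dir"

  # Never overwrite an existing key; generate one only if it is missing.
  if [ ! -f "$key_file" ]; then
    ssh-keygen -t ed25519 -f "$key_file" -C "addi-hpc" || return 1
  fi

  # Print the public half; this is the text to paste into the web interface.
  cat "$key_file.pub"
}
```

Calling ensure_ssh_key without arguments uses ~/.ssh/id_ed25519. Only the .pub file should ever be uploaded; the private key stays on your computer.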
After uploading your public SSH key, you can log into the cluster using its public IP address. The IP address will change with each restart of the cluster because it is an Azure cloud cluster. We will send you the IP address and an update whenever it changes. For example, if your username is ukdritestuser, use the following command to log into the cluster:
ssh ukdritestuser@XXX.XXX.XX.XX
Substitute XXX.XXX.XX.XX with the IP address that we will send to you and use your username instead of ukdritestuser.
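Because the IP address changes with each cluster restart, it can be convenient to keep it in one place. The sketch below writes a host alias to your SSH config file and replaces the previous entry whenever you call it with a new address. The alias addi-hpc, the function name write_addi_host, and the BEGIN/END marker comments are all made up for this example.

```shell
# Sketch only: the alias "addi-hpc", the function name and the marker
# comments are illustrative assumptions.
# Usage: write_addi_host <ip_address> <username> [config_file]
write_addi_host() {
  local ip="$1" user="$2" config="${3:-$HOME/.ssh/config}"
  local tmp

  mkdir -p "$(dirname "$config")"
  touch "$config"
  tmp="$(mktemp)" || return 1

  # Drop the block written by an earlier call, keep everything else.
  sed '/^# BEGIN addi-hpc$/,/^# END addi-hpc$/d' "$config" > "$tmp"

  # Append a fresh block with the current IP address.
  {
    printf '# BEGIN addi-hpc\n'
    printf 'Host addi-hpc\n'
    printf '    HostName %s\n' "$ip"
    printf '    User %s\n' "$user"
    printf '# END addi-hpc\n'
  } >> "$tmp"

  cat "$tmp" > "$config"
  rm -f "$tmp"
  chmod 600 "$config"
}
```

After running write_addi_host with the address we sent you and your username, ssh addi-hpc is equivalent to the full ssh command above. When we announce a new IP address, run the function again with the new value.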
The cluster contains two kinds of storage:
temporary working storage partition: /nfsdata
data partition: /data
Each user has a personal folder on both partitions. For example, if your username is ukdritestuser:
/data/ukdritestuser
/nfsdata/ukdritestuser
We provide template job scripts for available pipelines and related jobs:
/nfsdata/scripts/job_scripts/
You should copy the corresponding template job script (.sh) to one of your own subfolders under /nfsdata/$USER/ and change the input files and parameters in it. See Pipelines for more details.
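The copy step can be sketched as a small helper that places a template in a project subfolder and refuses to overwrite a copy you have already edited. The function name copy_template and the two overridable variables are assumptions for this example, and run_nfcore_rnaseq.sh is used only as a plausible template name; check /nfsdata/scripts/job_scripts/ for the scripts that actually exist.

```shell
# Sketch only: copy_template, TEMPLATE_DIR and WORK_ROOT are illustrative names.
# The defaults follow the paths described on this page.
TEMPLATE_DIR="${TEMPLATE_DIR:-/nfsdata/scripts/job_scripts}"
WORK_ROOT="${WORK_ROOT:-/nfsdata/${USER:-$(id -un)}}"

# Usage: copy_template <template.sh> <project_subfolder>
# Prints the path of your personal copy.
copy_template() {
  local template="$1" project="$2"
  local dest_dir="$WORK_ROOT/$project"
  local dest="$dest_dir/$template"

  if [ ! -f "$TEMPLATE_DIR/$template" ]; then
    echo "template not found: $TEMPLATE_DIR/$template" >&2
    return 1
  fi

  mkdir -p "$dest_dir"

  # Keep an existing copy so that earlier edits are not lost.
  if [ -e "$dest" ]; then
    echo "keeping existing copy: $dest" >&2
  else
    cp "$TEMPLATE_DIR/$template" "$dest"
  fi

  echo "$dest"
}
```

For example, copy_template run_nfcore_rnaseq.sh my_project would print /nfsdata/$USER/my_project/run_nfcore_rnaseq.sh, which you can then open in an editor to set the input files and parameters.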
Files related to genomes and gene annotations can be found here:
/nfsdata/genome/
This includes:
/nfsdata/genome/gprofiler/ucsc
/nfsdata/genome/ensembl/
/nfsdata/genome/hcop
/nfsdata/genome/uniprot
/nfsdata/genome/gprofiler/
Utility scripts for simple file operations are stored at:
/nfsdata/scripts/utility/
Available Nextflow pipelines are added at:
/nfsdata/scripts/nf-core/
Please use the template job scripts located at /nfsdata/scripts/job_scripts/ to run pipelines.
All software on the cluster is run in containers. The corresponding Apptainer files are located here:
/nfsdata/apptainer/
All computational processes should be submitted through the Slurm batch system. The cloud will spawn the required nodes on the fly. It may take up to 15 minutes until a new node is spawned and your job starts. Use these core Slurm commands to manage your cluster jobs:
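sbatch run_nfcore_rnaseq.sh (Submits a job script)
scancel 123456 (Cancels the job with this ID)
squeue -u username (Lists the pending and running jobs of a user)
sacct -j 123456 (Shows accounting information for a job)
The cluster uses Slurm as its scheduler. It contains the typical Microsoft Azure cloud Slurm cluster partitions; run sinfo to list them.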
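The submit-and-check cycle with sbatch, squeue, and sacct can be sketched as two small helpers. This is one possible wrapper, not a script that ships with the cluster: the function names submit_job and wait_for_job and the 60-second polling interval are assumptions. Since a new node can take up to 15 minutes to start, the job may stay pending for a while before it runs.

```shell
# Sketch only: submit_job and wait_for_job are illustrative helper names.

# Usage: submit_job <job_script.sh>
# Submits the script and prints only the numeric job ID.
submit_job() {
  local script="$1" job_id
  # --parsable makes sbatch print "jobid" or "jobid;cluster" and nothing else.
  job_id="$(sbatch --parsable "$script")" || return 1
  echo "${job_id%%;*}"
}

# Usage: wait_for_job <job_id> [poll_seconds]
# Blocks while the job is pending or running, then prints its accounting record.
wait_for_job() {
  local job_id="$1" interval="${2:-60}"
  # squeue -h prints no header, so empty output means the job left the queue.
  while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
    sleep "$interval"
  done
  sacct -j "$job_id" --format=JobID,JobName,State,Elapsed,ExitCode
}
```

A typical call would be job_id="$(submit_job run_nfcore_rnaseq.sh)" followed by wait_for_job "$job_id". Running this inside a tmux session lets you disconnect while the job is waiting.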
These rules ensure cluster stability, high performance, and fair resource sharing for all users.
[...] /data for your primary, long-term storage needs.
[...] /nfsdata only for temporary storage and job submissions.
[...] /nfsdata.
[...] /nfsdata directories regularly.
[...] /data directory
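Since /data is meant for long-term storage and /nfsdata for temporary work, finished results need to be moved across at some point. The sketch below shows one way to do that with rsync. The function name archive_project and the two overridable variables are assumptions for this example. The helper deliberately does not delete anything from /nfsdata; cleaning up is left to you once you have checked the copy.

```shell
# Sketch only: archive_project, WORK_ROOT and DATA_ROOT are illustrative names.
# The defaults follow the personal folders described on this page.
WORK_ROOT="${WORK_ROOT:-/nfsdata/${USER:-$(id -un)}}"
DATA_ROOT="${DATA_ROOT:-/data/${USER:-$(id -un)}}"

# Usage: archive_project <project_folder>
archive_project() {
  local project="$1"
  local src="$WORK_ROOT/$project" dest="$DATA_ROOT/$project"

  if [ ! -d "$src" ]; then
    echo "no such folder: $src" >&2
    return 1
  fi

  mkdir -p "$dest"

  # The trailing slashes make rsync copy the contents of src into dest
  # instead of creating a nested dest/<project_folder> directory.
  rsync -avh "$src/" "$dest/" || return 1

  # Show both sizes so the copy can be compared by eye before any cleanup.
  du -sh "$src" "$dest"
}
```

Running the function a second time only transfers files that changed, so it is safe to call after each pipeline run.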
Here is a list of five installed command-line tools that help with working on the HPC cluster.
tmux allows you to run multiple terminal sessions inside a single window, detach from them, and reattach later without losing running processes.
tmux new -s [session_name] (Starts a new session)
tmux attach -t [session_name] (Reattaches to an existing session)
nano is a user-friendly, command-line text editor designed for quick edits directly inside the terminal.
nano filename.txt (Opens or creates a file)
rsync is a fast, versatile tool for copying and synchronising files locally or across remote servers.
rsync -avh /nfsdata/$USER/project_folder/ /data/$USER/project_folder/ (Copies the folder contents on the cluster; without the trailing slash on the source, rsync would create a nested project_folder inside the destination)
rsync -avh ukdritestuser@remote:/data/ukdritestuser/project_folder local_project_folder/ (Run on your own computer to download a folder; the username is written out because $USER would expand to your local username)
du estimates and displays the file space usage of directories and files.
du -h --max-depth=1 /nfsdata/$USER
df displays the amount of available and used disk space on file systems.
df -h
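To find out what is taking up space before cleaning up, the du call above can be combined with sort and head. The helper name largest_dirs and its default count of 10 are assumptions for this sketch; it relies on the GNU versions of du and sort, which support --max-depth and human-readable sorting with -h.

```shell
# Sketch only: largest_dirs is an illustrative helper name.
# Usage: largest_dirs [directory] [count]
largest_dirs() {
  local dir="${1:-/nfsdata/${USER:-$(id -un)}}" count="${2:-10}"
  # du reports the directory itself and each immediate subdirectory;
  # sort -hr orders the human-readable sizes from largest to smallest,
  # so the total for the directory itself normally appears near the top.
  du -h --max-depth=1 "$dir" | sort -hr | head -n "$count"
}
```

For example, largest_dirs /nfsdata/$USER 5 lists the total and the largest subfolders of your temporary working folder.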