Usage Guide¶
This guide explains how to run the Fada pipeline, using the same commands that are exercised in the repository's GitHub workflows.
Prerequisites¶
System Requirements¶
- Python 3.9 or higher (tested with Python 3.11.13)
- Snakemake workflow management system (tested with version 7.32.4)
- Note: This pipeline has not been tested with Snakemake 8.x or 9.x versions
- Container runtime (Docker or Singularity/Apptainer)
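The version requirements above can be checked before installing anything else. The following is a minimal sketch, assuming bash and GNU `sort -V` are available; the helper names `version_ge`, `version_major`, and `check_versions` are hypothetical and not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the versions listed above.

# version_ge A B: succeeds when dotted version A is >= B (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# version_major V: print the major component, e.g. 7.32.4 -> 7.
version_major() {
    printf '%s\n' "${1%%.*}"
}

# check_versions PYTHON_VERSION SNAKEMAKE_VERSION
# Fails if Python is older than 3.9; warns if Snakemake is 8.x or newer,
# since the pipeline has only been tested with 7.32.4.
check_versions() {
    local py_version="$1" smk_version="$2"
    if ! version_ge "$py_version" "3.9"; then
        echo "Python $py_version is older than 3.9" >&2
        return 1
    fi
    if [ "$(version_major "$smk_version")" -ge 8 ]; then
        echo "warning: Snakemake $smk_version is untested with this pipeline" >&2
    fi
    echo "ok"
}

# Typical call, reading the versions from the installed tools:
#   check_versions \
#     "$(python3 -c 'import platform; print(platform.python_version())')" \
#     "$(snakemake --version)"
```

The check only warns on Snakemake 8.x or 9.x rather than failing, because the guide says those versions are untested, not that they are known to break.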
Installation¶
- Install Python dependencies:
python3.11 -m venv fada_env
source fada_env/bin/activate
pip install -r requirements.txt
- For container-based execution, make sure that Singularity/Apptainer is installed.
Workflow Types¶
Fada currently supports three main workflow configurations, each optimized for specific sequencing strategies and analysis goals.
Input Data Files¶
Each workflow requires its own samples and units input files, which can be created with hydra-genetics create-input-files:
- PacBio Twist Cancer:
samples_pacbio_twist_cancer.tsv
units_pacbio_twist_cancer.tsv
- PacBio WGS:
samples_pacbio_wgs.tsv
units_pacbio_wgs.tsv
- ONT STR:
samples_ont_str.tsv
units_ont_str.tsv
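Before starting a run, it can help to confirm that both files for the chosen workflow are present and non-empty. This is a minimal sketch; the function name `check_inputs` is hypothetical, and it assumes the files sit in the directory from which Snakemake is launched unless another directory is given.

```shell
#!/usr/bin/env bash
# check_inputs MODIFIER [DIR]
# Verifies that samples_<MODIFIER>.tsv and units_<MODIFIER>.tsv exist and
# are non-empty in DIR (default: current directory).
check_inputs() {
    local modifier="$1" dir="${2:-.}" missing=0 f
    for f in "samples_${modifier}.tsv" "units_${modifier}.tsv"; do
        if [ ! -s "${dir}/${f}" ]; then
            echo "missing or empty: ${dir}/${f}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_inputs pacbio_wgs && echo "inputs look complete"
```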
1. PacBio Twist Cancer Panel¶
Create the required samples and units input files¶
hydra-genetics create-input-files -d /path/to/pacbio/uBAM/ -t N -p PACBIO --post-file-modifier pacbio_twist_cancer
Outputs:
samples_pacbio_twist_cancer.tsv
units_pacbio_twist_cancer.tsv
Dry Run (Validation)¶
snakemake -n -s workflow/Snakefile \
--configfiles config/config.yaml config/config_pacbio_twist_cancer.yaml \
--config PIPELINE_REF_DATA=/path/to/reference/data/files sequenceid="test"
Running on a cluster¶
Recommended: use a cluster profile for better resource management. The command below uses the example profile included in the GitHub repository.
snakemake --profile profiles/marvin_cpu -s workflow/Snakefile \
--configfiles config/config.yaml config/config_pacbio_twist_cancer.yaml \
--config PIPELINE_REF_DATA=/path/to/reference/data/files sequenceid="your_run_id"
2. PacBio Whole Genome Sequencing (WGS)¶
Create the required samples and units input files¶
hydra-genetics create-input-files -d /path/to/pacbio/uBAM/ -t N -p PACBIO --post-file-modifier pacbio_wgs
Outputs:
samples_pacbio_wgs.tsv
units_pacbio_wgs.tsv
Production Run (Recommended: Use Profile)¶
snakemake --profile profiles/marvin_cpu -s workflow/Snakefile \
--configfiles config/config.yaml config/config_pacbio_wgs.yaml \
--config PIPELINE_REF_DATA=/path/to/reference/data/files \
--use-singularity
3. ONT Targeted STR Analysis¶
hydra-genetics create-input-files -p ONT -d /path/to/ONT/uBAM/ -t N -b 'NNNN' --post-file-modifier "ont_str"
Outputs:
samples_ont_str.tsv
units_ont_str.tsv
snakemake --profile profiles/marvin_cpu -s workflow/Snakefile \
--configfiles config/config.yaml config/config_ont_target_str.yaml \
--config PIPELINE_REF_DATA=/path/to/reference/data/files \
--use-singularity
Command Parameters Explained¶
Configuration Files¶
- --configfiles config/config.yaml: Main pipeline configuration
- Additional workflow-specific configs:
  - config/config_pacbio_twist_cancer.yaml: Cancer panel settings
  - config/config_pacbio_wgs.yaml: PacBio WGS settings
  - config/config_ont_target_str.yaml: ONT STR analysis settings
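Note that the ONT config file is named config_ont_target_str.yaml, while the matching input files use the modifier ont_str. A small lookup keeps the two in step when scripting runs. This is a sketch only; the function name `workflow_config` is hypothetical, and the paths are the ones listed in this guide.

```shell
#!/usr/bin/env bash
# workflow_config MODIFIER
# Maps an input-file modifier (as used in samples_<MODIFIER>.tsv) to the
# workflow-specific config file named in this guide.
workflow_config() {
    case "$1" in
        pacbio_twist_cancer) echo "config/config_pacbio_twist_cancer.yaml" ;;
        pacbio_wgs)          echo "config/config_pacbio_wgs.yaml" ;;
        ont_str)             echo "config/config_ont_target_str.yaml" ;;
        *) echo "unknown workflow: $1" >&2; return 1 ;;
    esac
}

# Example dry run for the ONT workflow:
#   snakemake -n -s workflow/Snakefile \
#     --configfiles config/config.yaml "$(workflow_config ont_str)" \
#     --config PIPELINE_REF_DATA=/path/to/reference/data/files
```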
Runtime Configuration¶
- --config PIPELINE_REF_DATA=reference: Specifies the reference data location
- --config sequenceid="test": Sets the sequence identifier (Twist Cancer workflow only; may be removed in the future)
- --config resources=resources.yaml: Custom resource allocation
Core Snakemake Parameters¶
- -s workflow/Snakefile: Specifies path to the main Snakefile to execute
- -n: Dry run mode - validates the workflow without execution
- -p: Print shell commands being executed
- --show-failed-logs: Display logs from failed jobs for debugging
Container Execution¶
- --use-singularity: Enable Singularity container execution
- --singularity-args: Container-specific arguments
  - --no-home: Don't bind home directory
  - --cleanenv: Use clean environment
  - --bind /path/to/your/data: Bind data directories to container
- --singularity-prefix singularity_files: Directory for cached container images
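The container flags above are typically combined so that --no-home, --cleanenv, and --bind travel as a single quoted string to --singularity-args. That grouping is an assumption based on how the flags are listed here. The sketch below only assembles the arguments; the function name `build_container_args` is hypothetical, and `mapfile` requires bash 4 or newer.

```shell
#!/usr/bin/env bash
# build_container_args DATA_DIR [PREFIX]
# Prints one Snakemake argument per line. The Singularity-level flags are
# kept together as a single argument to --singularity-args.
build_container_args() {
    local data_dir="$1" prefix="${2:-singularity_files}"
    printf '%s\n' \
        "--use-singularity" \
        "--singularity-args" \
        "--no-home --cleanenv --bind ${data_dir}" \
        "--singularity-prefix" \
        "${prefix}"
}

# Example: read the lines into an array so quoting survives, then append
# them to a dry run.
#   mapfile -t container_args < <(build_container_args /path/to/your/data)
#   snakemake -n -s workflow/Snakefile \
#     --configfiles config/config.yaml config/config_pacbio_wgs.yaml \
#     --config PIPELINE_REF_DATA=/path/to/reference/data/files \
#     "${container_args[@]}"
```

Passing the arguments through an array avoids the common mistake of letting the shell split the --singularity-args value into separate Snakemake options.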
Profile Configuration¶
Example Profile in This Repository¶
This repository includes an example SLURM profile at profiles/marvin_cpu/ that demonstrates:
- SLURM-DRMAA integration: Uses the DRMAA (Distributed Resource Management Application API) interface for job submission to SLURM
- Singularity container execution: Automatically enables Singularity with appropriate bind mounts and resource constraints
- Optimized settings: Configured for high-throughput execution with job parallelization and resource management
To use this profile as a template for your own cluster:
# Copy and modify the profile for your environment
cp -r profiles/marvin_cpu/ profiles/my_cluster/
# Edit profiles/my_cluster/config.yaml to match your cluster configuration
# Then run with your custom profile
snakemake --profile profiles/my_cluster/ -s workflow/Snakefile \
--configfiles config/config.yaml config/config_pacbio_twist_cancer.yaml \
--config PIPELINE_REF_DATA=reference sequenceid="your_sample_id"
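For orientation, a Snakemake 7 profile is a directory containing a config.yaml whose keys mirror command-line options. The sketch below writes a hypothetical starting point; it is not the content of profiles/marvin_cpu, and every value (job count, bind path, partition name, the native SLURM options in the drmaa string) is a placeholder to adapt to your cluster.

```shell
#!/usr/bin/env bash
# write_profile DIR PARTITION
# Writes a hypothetical Snakemake 7 profile to DIR/config.yaml.
# All values are illustrative placeholders, not the repository's settings.
write_profile() {
    local dir="$1" partition="$2"
    mkdir -p "$dir"
    cat > "${dir}/config.yaml" <<EOF
# Keys correspond to Snakemake 7 command-line options without the leading --.
jobs: 50
keep-going: true
latency-wait: 60
use-singularity: true
singularity-args: "--no-home --cleanenv --bind /path/to/your/data"
# Native SLURM options handed to DRMAA; {threads} is filled in per job by
# Snakemake. The leading space inside the quotes is intentional.
drmaa: " --partition=${partition} --cpus-per-task={threads}"
EOF
}

# Example:
#   write_profile profiles/my_cluster my_partition
```

Compare the generated file against profiles/marvin_cpu/config.yaml before using it; the repository's profile remains the reference for what the pipeline actually expects.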
For additional profile examples and documentation, see the Snakemake profiles documentation and the snakemake-profiles repository for ready-to-use cluster profiles.