Softwares used in the fada module¶

bedtools_intersect_cnvkit ¶

Exports CNVs that include the majority of the target gene from a VCF based on a design BED file, effectively filtering CNVs by genomic intervals of interest.

Rule¶

rule bedtools_intersect_cnvkit:
    input:
        left="cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.CNVS.vcf.gz",
        right=config.get("reference", {}).get("design_genes_bed", ""),
    output:
        vcf="cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.gene.CNVS.vcf",
    params:
        extra=f"-F 0.6 -u -header {config.get('bedtools_intersect_cnvkit', {}).get('extra', '')}",
    log:
        "cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.gene.CNVS.vcf.log",
    benchmark:
        repeat(
            "cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.gene.CNVS.vcf.benchmark.tsv",
            config.get("bedtools_intersect_cnvkit", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("bedtools_intersect_cnvkit", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("bedtools_intersect_cnvkit", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("bedtools_intersect_cnvkit", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("bedtools_intersect_cnvkit", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("bedtools_intersect_cnvkit", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("bedtools_intersect_cnvkit", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("bedtools_intersect_cnvkit", {}).get("container", config["default_container"])
    message:
        "{rule}: export CNVS that include the majority of the target gene from {input.left} based on {input.right}"
    wrapper:
        "v1.32.0/bio/bedtools/intersect"

input / output files¶

Rule parameters	Key	Value	Description
input	left	`"cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.CNVS.vcf.gz"`	CNVkit vcf with all CNVS
_ _	right	`config.get("reference", {}).get("design_genes_bed", "")`	bed file with target genes
output	vcf	`"cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.gene.CNVS.vcf"`	CNVkit vcf with only CNVS that include the whole target gene

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

cnvkit_segment ¶

Segments copy number ratio data (log2 bins) using CNVkit, incorporating SNP information for more accurate segmentation and germline HMM modeling.

Rule¶

rule cnvkit_segment:
    input:
        log2_bins="cnv_sv/cnvkit_batch/{sample}/{sample}_{type}.cnr",
        snps="snv_indels/deepvariant/{sample}_{type}.fix_af.bcftools_view.SNPS.vcf.gz",
    output:
        segment=temp("cnv_sv/cnvkit_segment/{sample}_{type}.cns"),
    params:
        extra=config.get("cnvkit_segment", {}).get("extra", ""),
        method=config.get("cnvkit_segment", {}).get("method", "hmm-germline"),
    log:
        "cnv_sv/cnvkit_segment/{sample}_{type}.segmetrics.cns.log",
    benchmark:
        repeat(
            "cnv_sv/cnvkit_segment/{sample}_{type}.segmetrics.cns.benchmark.tsv",
            config.get("cnvkit_segment", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("cnvkit_segment", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("cnvkit_segment", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("cnvkit_segment", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("cnvkit_segment", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("cnvkit_segment", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("cnvkit_segment", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("cnvkit_segment", {}).get("container", config["default_container"])
    message:
        "{rule}: segment {input.log2_bins} with cnvkit using SNPs from {input.snps}"
    shell:
        "cnvkit.py segment "
        "-v {input.snps} "
        "-m {params.method} "
        "-o {output.segment} "
        "{params.extra} {input.log2_bins}  > {log} 2>&1 "

input / output files¶

Rule parameters	Key	Value	Description
input	log2_bins	`"cnv_sv/cnvkit_batch/{sample}/{sample}_{type}.cnr"`	Bin-level log2 ratios file from cnvkit
_ _	snps	`"snv_indels/deepvariant/{sample}_{type}.fix_af.bcftools_view.SNPS.vcf.gz"`	SNPs vcf file to use for segmentation
output	segment	`"cnv_sv/cnvkit_segment/{sample}_{type}.cns"`	Segmented log2 ratios file from cnvkit

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

picard_bed_to_interval_list ¶

Converts a genomic design BED file into a Picard-style interval list, using the reference FASTA and sequence dictionary for header information.

Rule¶

rule picard_bed_to_interval_list:
    input:
        bed=config.get("reference", {}).get("design_bed", ""),
        reference=config["reference"]["fasta"],
        dict="qc/picard_create_sequence_dictionary/reference.dict",
    output:
        interval_list=temp("qc/picard_bed_to_interval_list/targets.interval_list"),
    params:
        extra=config.get("picard_bed_to_interval_list", {}).get("extra", ""),
    log:
        "qc/picard_bed_to_interval_list/targets.interval_list.log",
    benchmark:
        repeat(
            "qc/picard_bed_to_interval_list/targets.interval_list.benchmark.tsv",
            config.get("picard_bed_to_interval_list", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("picard_bed_to_interval_list", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("picard_bed_to_interval_list", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("picard_bed_to_interval_list", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("picard_bed_to_interval_list", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("picard_bed_to_interval_list", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("picard_bed_to_interval_list", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("picard_bed_to_interval_list", {}).get("container", config["default_container"])
    message:
        "{rule}: create interval_list file from {input.bed}"
    wrapper:
        "0.79.0/bio/picard/bedtointervallist"

input / output files¶

Rule parameters	Key	Value	Description
input	bed	`config.get("reference", {}).get("design_bed", "")`	design bed file (aka. target bed file)
	reference	`config["reference"]["fasta"]`	reference genome fasta file
_ _	dict	`"qc/picard_create_sequence_dictionary/reference.dict"`	sequence dictionary file for the reference genome
output	interval_list	`"qc/picard_bed_to_interval_list/targets.interval_list"`	picard interval_list file for design regions

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

picard_create_sequence_dictionary ¶

Generates a sequence dictionary (SAM/BAM header format) from a reference FASTA file, required by various Picard and GATK tools.

Rule¶

rule picard_create_sequence_dictionary:
    input:
        fasta=config["reference"]["fasta"],
    output:
        dict_file=temp("qc/picard_create_sequence_dictionary/reference.dict"),
    params:
        extra=config.get("picard_create_sequence_dictionary", {}).get("extra", ""),
    log:
        "qc/picard_create_sequence_dictionary/reference.dict.log",
    benchmark:
        repeat(
            "qc/picard_create_sequence_dictionary/reference.dict.benchmark.tsv",
            config.get("picard_create_sequence_dictionary", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("picard_create_sequence_dictionary", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("picard_create_sequence_dictionary", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("picard_create_sequence_dictionary", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("picard_create_sequence_dictionary", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("picard_create_sequence_dictionary", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("picard_create_sequence_dictionary", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("picard_create_sequence_dictionary", {}).get("container", config["default_container"])
    message:
        "{rule}: create a sequence dictionary file from {input}"
    wrapper:
        "0.79.0/bio/picard/createsequencedictionary"

input / output files¶

Rule parameters	Key	Value	Description
input	fasta	`config["reference"]["fasta"]`	reference genome fasta file
output	dict_file	`"qc/picard_create_sequence_dictionary/reference.dict"`	sequence dictionary file for the reference genome

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

export_qc_bedtools_intersect ¶

Intersects mosdepth per-base coverage data with an exon BED file to identify and export low-coverage regions across targeted exons.

Rule¶

rule export_qc_bedtools_intersect:
    input:
        left="qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz",
        coverage_csi="qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz.csi",
        right=config["reference"]["exon_bed"],
    output:
        results=temp("qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.bed"),
    params:
        extra=config.get("export_qc_bedtools_intersect", {}).get("extra", ""),
    log:
        "qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.log",
    benchmark:
        repeat(
            "qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.benchmark.tsv",
            config.get("export_qc_bedtools_intersect", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("export_qc_bedtools_intersect", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("export_qc_bedtools_intersect", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("export_qc_bedtools_intersect", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("export_qc_bedtools_intersect", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("export_qc_bedtools_intersect", {}).get("container", config["default_container"])
    message:
        "{rule}: export low cov regions from {input.left} based on {input.right}"
    wrapper:
        "v1.32.0/bio/bedtools/intersect"

input / output files¶

Rule parameters	Key	Value	Description
input	left	`"qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz"`	per-base coverage file from mosdepth
	coverage_csi	`"qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz.csi"`	index file for per-base.bed.gz file
_ _	right	`config["reference"]["exon_bed"]`	design bed used to only look at coverage based on bedfile
output	results	`"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.bed"`	bed file with coverage per base for design file

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

export_qc_bedtools_intersect_pgrs ¶

Similar to the above, but specifically intersects per-base coverage with a PGRS (polyguanosine-rich sequences) BED file to assess coverage in complex regions.

Rule¶

rule export_qc_bedtools_intersect_pgrs:
    input:
        left="qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz",
        coverage_csi="qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz.csi",
        right=config["reference"]["pgrs_bed"],
    output:
        results=temp("qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.bed"),
    params:
        extra=config.get("export_qc_bedtools_intersect_pgrs", {}).get("extra", ""),
    log:
        "qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.log",
    benchmark:
        repeat(
            "qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.benchmark.tsv",
            config.get("export_qc_bedtools_intersect_pgrs", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("export_qc_bedtools_intersect_pgrs", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("export_qc_bedtools_intersect_pgrs", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("export_qc_bedtools_intersect_pgrs", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("export_qc_bedtools_intersect_pgrs", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("export_qc_bedtools_intersect_pgrs", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("export_qc_bedtools_intersect_pgrs", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("export_qc_bedtools_intersect_pgrs", {}).get("container", config["default_container"])
    message:
        "{rule}: export low cov regions from {input.left} based on {input.right}"
    wrapper:
        "v1.32.0/bio/bedtools/intersect"

input / output files¶

Rule parameters	Key	Value	Description
input	left	`"qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz"`	per-base coverage file from mosdepth
	coverage_csi	`"qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz.csi"`	index file for per-base.bed.gz file
_ _	right	`config["reference"]["pgrs_bed"]`	design bed used to only look at coverage based on bedfile, in this case pgrs positions
output	results	`"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.bed"`	bed file with coverage per base for design file

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

export_qc_xlsx_tc_report ¶

A comprehensive Python script that aggregates various QC metrics—including mosdepth summaries, coverage thresholds, and read statistics—into a single Excel (XLSX) report for Twist Cancer designs.

Rule¶

rule export_qc_xlsx_tc_report:
    input:
        mosdepth_summary="qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.summary.txt",
        mosdepth_thresholds="qc/mosdepth_bed_exon/{sample}_{type}.thresholds.bed.gz",
        mosdepth_regions="qc/mosdepth_bed_exon/{sample}_{type}.regions.bed.gz",
        mosdepth_perbase="qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.bed",
        pgrs_coverage="qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.bed",
        design_bed=config["reference"]["exon_bed"],
        pgrs_bed=config["reference"]["pgrs_bed"],
        reads_summary="qc/extract_read_metrics/{sample}_{type}.summary.tsv",
        wanted_transcripts=config["reference"]["wanted_transcripts"],
    output:
        results=temp("qc/xlsx_report/{sample}_{type}.xlsx"),
    params:
        extra=config.get("export_qc_xlsx_report", {}).get("extra", ""),
        coverage_thresholds=config["mosdepth_bed"]["thresholds"],
        sequenceid=config["sequenceid"],
    log:
        "qc/xlsx_report/{sample}_{type}.xlsx.log",
    benchmark:
        repeat(
            "qc/xlsx_report/{sample}_{type}.xlsx.benchmark.tsv",
            config.get("export_qc_xlsx_report", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("export_qc_xlsx_report", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("export_qc_xlsx_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("export_qc_xlsx_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("export_qc_xlsx_report", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("export_qc_xlsx_report", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("export_qc_xlsx_report", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("export_qc_xlsx_report", {}).get("container", config["default_container"])
    message:
        "{rule}: collecting qc values into {output} for the twist cancer design"
    script:
        "../scripts/export_qc_xlsx_tc_report.py"

input / output files¶

Rule parameters	Key	Value	Description
input	mosdepth_summary	`"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.summary.txt"`	mosdepth bed summary file
	mosdepth_thresholds	`"qc/mosdepth_bed_exon/{sample}_{type}.thresholds.bed.gz"`	Mosdepth bed thresholds file
	mosdepth_regions	`"qc/mosdepth_bed_exon/{sample}_{type}.regions.bed.gz"`	mosdepth bed coverage per region file
	mosdepth_perbase	`"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.bed"`	mosdepth bed per-base result file subsampled into exons in export_qc_bedtools_intersect output
	pgrs_coverage	`"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.bed"`	mosdepth per-base file from export_qc_bedtools_intersect_pgrs output
	design_bed	`config["reference"]["exon_bed"]`	design bed defined in config-file
	pgrs_bed	`config["reference"]["pgrs_bed"]`	bedfile with PGRS score SNPs
	reads_summary	`"qc/extract_read_metrics/{sample}_{type}.summary.tsv"`	reads summary stats with duplication rate (generated by calculate_read_metrics.py)
_ _	wanted_transcripts	`config["reference"]["wanted_transcripts"]`	path to txt-file in bedformat of transcripts of interest
output	results	`"qc/xlsx_report/{sample}_{type}.xlsx"`	.xlsx file with summarized QC-values per sample

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

extract_read_metrics ¶

Calculates and extracts summary statistics from BAM files, including read lengths, qualities, and duplicate counts to assess library quality.

Rule¶

rule extract_read_metrics:
    input:
        bam="prealignment/pbmarkdup/{sample}_{type}.bam",
    output:
        reads_metrics=temp("qc/extract_read_metrics/{sample}_{type}.metrics.tsv"),
        reads_summary=temp("qc/extract_read_metrics/{sample}_{type}.summary.tsv"),
    params:
        extra=config.get("extract_read_metrics", {}).get("extra", ""),
    log:
        "qc/extract_read_metrics/{sample}_{type}.metrics.log",
    benchmark:
        repeat(
            "extract_read_metrics/{sample}_{type}.metrics.benchmark.tsv",
            config.get("extract_read_metrics", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("extract_read_metrics", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("extract_read_metrics", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("extract_read_metrics", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("extract_read_metrics", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("extract_read_metrics", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("extract_read_metrics", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("extract_read_metrics", {}).get("container", config["default_container"])
    message:
        "{rule}: extract read metrics from {input.bam}"
    script:
        "../scripts/calculate_read_metrics.py"

input / output files¶

Rule parameters	Key	Value	Description
input	bam	`"prealignment/pbmarkdup/{sample}_{type}.bam"`	unmapped bam file with duplicates marked by pbmardup
output	reads_metrics	`"qc/extract_read_metrics/{sample}_{type}.metrics.tsv"`	tab-delimited file with summary stats for each read
_ _	reads_summary	`"qc/extract_read_metrics/{sample}_{type}.summary.tsv"`	tab-delimited file with summary stats over all the reads in reads_metrics

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

strdust ¶

Calls Short Tandem Repeats (STRs) from BAM files using the STRdust tool, guided by a reference FASTA and a catalog of repeat loci.

Rule¶

rule strdust:
    input:
        bam="alignment/minimap2_align/{sample}_{type}.bam",
        bai="alignment/minimap2_align/{sample}_{type}.bam.bai",
        fasta=config.get("reference", {}).get("fasta", ""),
        repeats=config.get("reference", {}).get("str_bed", ""),
    output:
        vcf="cnv_sv/strdust/{sample}_{type}.vcf",
    params:
        extra=config.get("strdust", {}).get("extra", ""),
        sample=lambda wildcards: f"{wildcards.sample}_{wildcards.type}",
    log:
        "cnv_sv/strdust/{sample}_{type}.output.log",
    benchmark:
        repeat("cnv_sv/strdust/{sample}_{type}.output.benchmark.tsv", config.get("strdust", {}).get("benchmark_repeats", 1))
    threads: config.get("strdust", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("strdust", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("strdust", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("strdust", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("strdust", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("strdust", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("strdust", {}).get("container", config["default_container"])
    message:
        "{rule}: Call STRs in {input.bam} using STRdust"
    shell:
        "STRdust "
        "{params.extra} "
        "-t {threads} "
        "-R {input.repeats} "
        "--sample {params.sample} "
        "{input.fasta} "
        "{input.bam} > {output.vcf} 2> {log}"

input / output files¶

Rule parameters	Key	Value	Description
input	bam	`"alignment/minimap2_align/{sample}_{type}.bam"`	bam file with alignend long-reads
	bai	`"alignment/minimap2_align/{sample}_{type}.bam.bai"`	index for the bam file
	fasta	`config.get("reference", {}).get("fasta", "")`	reference genome fasta file
_ _	repeats	`config.get("reference", {}).get("str_bed", "")`	bed file of str regions for strdust
output	vcf	`"cnv_sv/strdust/{sample}_{type}.vcf"`	vcf file with str calls from strdust

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

strkit_call ¶

An alternative STR caller that performs locus-specific genotyping from long-read BAM files using the strkit tool.

Rule¶

rule strkit_call:
    input:
        bam="alignment/minimap2_align/{sample}_{type}.bam",
        bai="alignment/minimap2_align/{sample}_{type}.bam.bai",
        fasta=config.get("reference", {}).get("fasta", ""),
        repeats=config.get("strkit_call", {}).get("bed", ""),
    output:
        vcf="cnv_sv/strkit_call/{sample}_{type}.vcf",
    params:
        extra=config.get("strkit_call", {}).get("extra", ""),
        sample_id=lambda wildcards: f"{wildcards.sample}_{wildcards.type}",
    log:
        "cnv_sv/strkit_call/{sample}_{type}.vcf.log",
    benchmark:
        repeat(
            "cnv_sv/strkit_call/{sample}_{type}.output.benchmark.tsv", config.get("strkit_call", {}).get("benchmark_repeats", 1)
        )
    threads: config.get("strkit_call", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("strkit_call", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("strkit_call", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("strkit_call", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("strkit_call", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("strkit_call", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("strkit_call", {}).get("container", config["default_container"])
    message:
        "{rule}: Call STRs in {input.bam} using strkit"
    shell:
        "strkit call "
        "{input.bam} "
        "--ref {input.fasta} "
        "--sample-id {params.sample_id} "
        "--loci {input.repeats} "
        "--vcf {output.vcf} "
        "--processes {threads} &> {log}"

input / output files¶

Rule parameters	Key	Value	Description
input	bam	`"alignment/minimap2_align/{sample}_{type}.bam"`	bam file with alignend long-reads
	bai	`"alignment/minimap2_align/{sample}_{type}.bam.bai"`	index for the bam file
	fasta	`config.get("reference", {}).get("fasta", "")`	reference genome fasta file
_ _	repeats	`config.get("strkit_call", {}).get("bed", "")`	bed file of str regions for strkit
output	vcf	`"cnv_sv/strkit_call/{sample}_{type}.vcf"`	vcf file with str calls from strkit

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

add_vcf_ref ¶

A utility script that adds reference genome information to the header of a VCF file, ensuring compatibility with downstream tools that require the reference path.

Rule¶

rule add_vcf_ref:
    input:
        vcf="{file}.vcf.gz",
        ref=config["reference"]["fasta"],
    output:
        vcf="{file}_GRCh38.vcf",
    params:
        extra=config.get("add_vcf_ref", {}).get("extra", ""),
    log:
        "fada/add_vcf_ref/{file}_GRCh38.vcf.gz.log",
    benchmark:
        repeat("fada/add_vcf_ref/{file}_GRCh38.vcf.gz.benchmark.tsv", config.get("add_vcf_ref", {}).get("benchmark_repeats", 1))
    threads: config.get("add_vcf_ref", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("add_vcf_ref", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("add_vcf_ref", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("add_vcf_ref", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("add_vcf_ref", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("add_vcf_ref", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("add_vcf_ref", {}).get("container", config["default_container"])
    message:
        "{rule}: Add reference to the header of the {input.vcf}"
    script:
        "../scripts/add_ref.py"

input / output files¶

Rule parameters	Key	Value	Description
input	vcf	`"{file}.vcf.gz"`	Input vcf file
_ _	ref	`config["reference"]["fasta"]`	reference fasta file path
output	vcf	`"{file}_GRCh38.vcf"`	vcf with reference genome path in vcf-header

Configuration¶

Software settings (`config.yaml`)¶

Key	Type	Description
benchmark_repeats	integer	set number of times benchmark should be repeated
container	string	name or path to docker/singularity container
extra	string	parameters that should be forwarded

Resources settings (`resources.yaml`)¶

Key	Type	Description
mem_mb	integer	max memory in MB to be available
mem_per_cpu	integer	memory in MB used per cpu
partition	string	partition to use on cluster
threads	integer	number of threads to be available
time	string	max execution time

Softwares used in the fada module¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Rule¶

input / output files¶

Configuration¶

Software settings (config.yaml)¶

Resources settings (resources.yaml)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶

Software settings (`config.yaml`)¶

Resources settings (`resources.yaml`)¶