Softwares used in the fada module
Exports CNVs that include the majority of the target gene from a VCF based on a design BED file, effectively filtering CNVs by genomic intervals of interest.
Rule
rule bedtools_intersect_cnvkit:
input:
left="cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.CNVS.vcf.gz",
right=config.get("reference", {}).get("design_genes_bed", ""),
output:
vcf="cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.gene.CNVS.vcf",
params:
extra=f"-F 0.6 -u -header {config.get('bedtools_intersect_cnvkit', {}).get('extra', '')}",
log:
"cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.gene.CNVS.vcf.log",
benchmark:
repeat(
"cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.gene.CNVS.vcf.benchmark.tsv",
config.get("bedtools_intersect_cnvkit", {}).get("benchmark_repeats", 1),
)
threads: config.get("bedtools_intersect_cnvkit", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("bedtools_intersect_cnvkit", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("bedtools_intersect_cnvkit", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("bedtools_intersect_cnvkit", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("bedtools_intersect_cnvkit", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("bedtools_intersect_cnvkit", {}).get("time", config["default_resources"]["time"]),
container:
config.get("bedtools_intersect_cnvkit", {}).get("container", config["default_container"])
message:
"{rule}: export CNVS that include the majority of the target gene from {input.left} based on {input.right}"
wrapper:
"v1.32.0/bio/bedtools/intersect"
| Rule parameters |
Key |
Value |
Description |
| input |
left |
"cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.CNVS.vcf.gz" |
CNVkit vcf with all CNVS |
| _ _ |
right |
config.get("reference", {}).get("design_genes_bed", "") |
bed file with target genes |
| output |
vcf |
"cnv_sv/cnvkit_vcf/{sample}_{type}.annotate_cnv.refseq_genes.bcftools_view.gene.CNVS.vcf" |
CNVkit vcf with only CNVS that include the whole target gene |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Segments copy number ratio data (log2 bins) using CNVkit, incorporating SNP information for more accurate segmentation and germline HMM modeling.
Rule
rule cnvkit_segment:
input:
log2_bins="cnv_sv/cnvkit_batch/{sample}/{sample}_{type}.cnr",
snps="snv_indels/deepvariant/{sample}_{type}.fix_af.bcftools_view.SNPS.vcf.gz",
output:
segment=temp("cnv_sv/cnvkit_segment/{sample}_{type}.cns"),
params:
extra=config.get("cnvkit_segment", {}).get("extra", ""),
method=config.get("cnvkit_segment", {}).get("method", "hmm-germline"),
log:
"cnv_sv/cnvkit_segment/{sample}_{type}.segmetrics.cns.log",
benchmark:
repeat(
"cnv_sv/cnvkit_segment/{sample}_{type}.segmetrics.cns.benchmark.tsv",
config.get("cnvkit_segment", {}).get("benchmark_repeats", 1),
)
threads: config.get("cnvkit_segment", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("cnvkit_segment", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("cnvkit_segment", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("cnvkit_segment", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("cnvkit_segment", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("cnvkit_segment", {}).get("time", config["default_resources"]["time"]),
container:
config.get("cnvkit_segment", {}).get("container", config["default_container"])
message:
"{rule}: segment {input.log2_bins} with cnvkit using SNPs from {input.snps}"
shell:
"cnvkit.py segment "
"-v {input.snps} "
"-m {params.method} "
"-o {output.segment} "
"{params.extra} {input.log2_bins} > {log} 2>&1 "
| Rule parameters |
Key |
Value |
Description |
| input |
log2_bins |
"cnv_sv/cnvkit_batch/{sample}/{sample}_{type}.cnr" |
Bin-level log2 ratios file from cnvkit |
| _ _ |
snps |
"snv_indels/deepvariant/{sample}_{type}.fix_af.bcftools_view.SNPS.vcf.gz" |
SNPs vcf file to use for segmentation |
| output |
segment |
"cnv_sv/cnvkit_segment/{sample}_{type}.cns" |
Segmented log2 ratios file from cnvkit |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Converts a genomic design BED file into a Picard-style interval list, using the reference FASTA and sequence dictionary for header information.
Rule
rule picard_bed_to_interval_list:
input:
bed=config.get("reference", {}).get("design_bed", ""),
reference=config["reference"]["fasta"],
dict="qc/picard_create_sequence_dictionary/reference.dict",
output:
interval_list=temp("qc/picard_bed_to_interval_list/targets.interval_list"),
params:
extra=config.get("picard_bed_to_interval_list", {}).get("extra", ""),
log:
"qc/picard_bed_to_interval_list/targets.interval_list.log",
benchmark:
repeat(
"qc/picard_bed_to_interval_list/targets.interval_list.benchmark.tsv",
config.get("picard_bed_to_interval_list", {}).get("benchmark_repeats", 1),
)
threads: config.get("picard_bed_to_interval_list", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("picard_bed_to_interval_list", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("picard_bed_to_interval_list", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("picard_bed_to_interval_list", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("picard_bed_to_interval_list", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("picard_bed_to_interval_list", {}).get("time", config["default_resources"]["time"]),
container:
config.get("picard_bed_to_interval_list", {}).get("container", config["default_container"])
message:
"{rule}: create interval_list file from {input.bed}"
wrapper:
"0.79.0/bio/picard/bedtointervallist"
| Rule parameters |
Key |
Value |
Description |
| input |
bed |
config.get("reference", {}).get("design_bed", "") |
design bed file (aka. target bed file) |
|
reference |
config["reference"]["fasta"] |
reference genome fasta file |
| _ _ |
dict |
"qc/picard_create_sequence_dictionary/reference.dict" |
sequence dictionary file for the reference genome |
| output |
interval_list |
"qc/picard_bed_to_interval_list/targets.interval_list" |
picard interval_list file for design regions |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Generates a sequence dictionary (SAM/BAM header format) from a reference FASTA file, required by various Picard and GATK tools.
Rule
rule picard_create_sequence_dictionary:
input:
fasta=config["reference"]["fasta"],
output:
dict_file=temp("qc/picard_create_sequence_dictionary/reference.dict"),
params:
extra=config.get("picard_create_sequence_dictionary", {}).get("extra", ""),
log:
"qc/picard_create_sequence_dictionary/reference.dict.log",
benchmark:
repeat(
"qc/picard_create_sequence_dictionary/reference.dict.benchmark.tsv",
config.get("picard_create_sequence_dictionary", {}).get("benchmark_repeats", 1),
)
threads: config.get("picard_create_sequence_dictionary", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("picard_create_sequence_dictionary", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("picard_create_sequence_dictionary", {}).get(
"mem_per_cpu", config["default_resources"]["mem_per_cpu"]
),
partition=config.get("picard_create_sequence_dictionary", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("picard_create_sequence_dictionary", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("picard_create_sequence_dictionary", {}).get("time", config["default_resources"]["time"]),
container:
config.get("picard_create_sequence_dictionary", {}).get("container", config["default_container"])
message:
"{rule}: create a sequence dictionary file from {input}"
wrapper:
"0.79.0/bio/picard/createsequencedictionary"
| Rule parameters |
Key |
Value |
Description |
| input |
fasta |
config["reference"]["fasta"] |
reference genome fasta file |
| output |
dict_file |
"qc/picard_create_sequence_dictionary/reference.dict" |
sequence dictionary file for the reference genome |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Intersects mosdepth per-base coverage data with an exon BED file to identify and export low-coverage regions across targeted exons.
Rule
rule export_qc_bedtools_intersect:
input:
left="qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz",
coverage_csi="qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz.csi",
right=config["reference"]["exon_bed"],
output:
results=temp("qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.bed"),
params:
extra=config.get("export_qc_bedtools_intersect", {}).get("extra", ""),
log:
"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.log",
benchmark:
repeat(
"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.benchmark.tsv",
config.get("export_qc_bedtools_intersect", {}).get("benchmark_repeats", 1),
)
threads: config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("export_qc_bedtools_intersect", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("export_qc_bedtools_intersect", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("export_qc_bedtools_intersect", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("export_qc_bedtools_intersect", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("export_qc_bedtools_intersect", {}).get("time", config["default_resources"]["time"]),
container:
config.get("export_qc_bedtools_intersect", {}).get("container", config["default_container"])
message:
"{rule}: export low cov regions from {input.left} based on {input.right}"
wrapper:
"v1.32.0/bio/bedtools/intersect"
| Rule parameters |
Key |
Value |
Description |
| input |
left |
"qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz" |
per-base coverage file from mosdepth |
|
coverage_csi |
"qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz.csi" |
index file for per-base.bed.gz file |
| _ _ |
right |
config["reference"]["exon_bed"] |
design bed used to only look at coverage based on bedfile |
| output |
results |
"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.bed" |
bed file with coverage per base for design file |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Similar to the above, but specifically intersects per-base coverage with a PGRS (polyguanosine-rich sequences) BED file to assess coverage in complex regions.
Rule
rule export_qc_bedtools_intersect_pgrs:
input:
left="qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz",
coverage_csi="qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz.csi",
right=config["reference"]["pgrs_bed"],
output:
results=temp("qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.bed"),
params:
extra=config.get("export_qc_bedtools_intersect_pgrs", {}).get("extra", ""),
log:
"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.log",
benchmark:
repeat(
"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.benchmark.tsv",
config.get("export_qc_bedtools_intersect_pgrs", {}).get("benchmark_repeats", 1),
)
threads: config.get("export_qc_bedtools_intersect_pgrs", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("export_qc_bedtools_intersect_pgrs", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("export_qc_bedtools_intersect_pgrs", {}).get(
"mem_per_cpu", config["default_resources"]["mem_per_cpu"]
),
partition=config.get("export_qc_bedtools_intersect_pgrs", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("export_qc_bedtools_intersect_pgrs", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("export_qc_bedtools_intersect_pgrs", {}).get("time", config["default_resources"]["time"]),
container:
config.get("export_qc_bedtools_intersect_pgrs", {}).get("container", config["default_container"])
message:
"{rule}: export low cov regions from {input.left} based on {input.right}"
wrapper:
"v1.32.0/bio/bedtools/intersect"
| Rule parameters |
Key |
Value |
Description |
| input |
left |
"qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz" |
per-base coverage file from mosdepth |
|
coverage_csi |
"qc/mosdepth_bed_exon/{sample}_{type}.per-base.bed.gz.csi" |
index file for per-base.bed.gz file |
| _ _ |
right |
config["reference"]["pgrs_bed"] |
design bed used to only look at coverage based on bedfile, in this case pgrs positions |
| output |
results |
"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.bed" |
bed file with coverage per base for design file |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
A comprehensive Python script that aggregates various QC metrics—including mosdepth summaries, coverage thresholds, and read statistics—into a single Excel (XLSX) report for Twist Cancer designs.
Rule
rule export_qc_xlsx_tc_report:
input:
mosdepth_summary="qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.summary.txt",
mosdepth_thresholds="qc/mosdepth_bed_exon/{sample}_{type}.thresholds.bed.gz",
mosdepth_regions="qc/mosdepth_bed_exon/{sample}_{type}.regions.bed.gz",
mosdepth_perbase="qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.bed",
pgrs_coverage="qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.bed",
design_bed=config["reference"]["exon_bed"],
pgrs_bed=config["reference"]["pgrs_bed"],
reads_summary="qc/extract_read_metrics/{sample}_{type}.summary.tsv",
wanted_transcripts=config["reference"]["wanted_transcripts"],
output:
results=temp("qc/xlsx_report/{sample}_{type}.xlsx"),
params:
extra=config.get("export_qc_xlsx_report", {}).get("extra", ""),
coverage_thresholds=config["mosdepth_bed"]["thresholds"],
sequenceid=config["sequenceid"],
log:
"qc/xlsx_report/{sample}_{type}.xlsx.log",
benchmark:
repeat(
"qc/xlsx_report/{sample}_{type}.xlsx.benchmark.tsv",
config.get("export_qc_xlsx_report", {}).get("benchmark_repeats", 1),
)
threads: config.get("export_qc_xlsx_report", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("export_qc_xlsx_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("export_qc_xlsx_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("export_qc_xlsx_report", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("export_qc_xlsx_report", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("export_qc_xlsx_report", {}).get("time", config["default_resources"]["time"]),
container:
config.get("export_qc_xlsx_report", {}).get("container", config["default_container"])
message:
"{rule}: collecting qc values into {output} for the twist cancer design"
script:
"../scripts/export_qc_xlsx_tc_report.py"
| Rule parameters |
Key |
Value |
Description |
| input |
mosdepth_summary |
"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.summary.txt" |
mosdepth bed summary file |
|
mosdepth_thresholds |
"qc/mosdepth_bed_exon/{sample}_{type}.thresholds.bed.gz" |
Mosdepth bed thresholds file |
|
mosdepth_regions |
"qc/mosdepth_bed_exon/{sample}_{type}.regions.bed.gz" |
mosdepth bed coverage per region file |
|
mosdepth_perbase |
"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.per-base.exon_bed.bed" |
mosdepth bed per-base result file subsampled into exons in export_qc_bedtools_intersect output |
|
pgrs_coverage |
"qc/mosdepth_bed_exon/{sample}_{type}.mosdepth.pgrs_cov.bed" |
mosdepth per-base file from export_qc_bedtools_intersect_pgrs output |
|
design_bed |
config["reference"]["exon_bed"] |
design bed defined in config-file |
|
pgrs_bed |
config["reference"]["pgrs_bed"] |
bedfile with PGRS score SNPs |
|
reads_summary |
"qc/extract_read_metrics/{sample}_{type}.summary.tsv" |
reads summary stats with duplication rate (generated by calculate_read_metrics.py) |
| _ _ |
wanted_transcripts |
config["reference"]["wanted_transcripts"] |
path to txt-file in bedformat of transcripts of interest |
| output |
results |
"qc/xlsx_report/{sample}_{type}.xlsx" |
.xlsx file with summarized QC-values per sample |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Calculates and extracts summary statistics from BAM files, including read lengths, qualities, and duplicate counts to assess library quality.
Rule
rule extract_read_metrics:
input:
bam="prealignment/pbmarkdup/{sample}_{type}.bam",
output:
reads_metrics=temp("qc/extract_read_metrics/{sample}_{type}.metrics.tsv"),
reads_summary=temp("qc/extract_read_metrics/{sample}_{type}.summary.tsv"),
params:
extra=config.get("extract_read_metrics", {}).get("extra", ""),
log:
"qc/extract_read_metrics/{sample}_{type}.metrics.log",
benchmark:
repeat(
"extract_read_metrics/{sample}_{type}.metrics.benchmark.tsv",
config.get("extract_read_metrics", {}).get("benchmark_repeats", 1),
)
threads: config.get("extract_read_metrics", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("extract_read_metrics", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("extract_read_metrics", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("extract_read_metrics", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("extract_read_metrics", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("extract_read_metrics", {}).get("time", config["default_resources"]["time"]),
container:
config.get("extract_read_metrics", {}).get("container", config["default_container"])
message:
"{rule}: extract read metrics from {input.bam}"
script:
"../scripts/calculate_read_metrics.py"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"prealignment/pbmarkdup/{sample}_{type}.bam" |
unmapped bam file with duplicates marked by pbmardup |
| output |
reads_metrics |
"qc/extract_read_metrics/{sample}_{type}.metrics.tsv" |
tab-delimited file with summary stats for each read |
| _ _ |
reads_summary |
"qc/extract_read_metrics/{sample}_{type}.summary.tsv" |
tab-delimited file with summary stats over all the reads in reads_metrics |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
Calls Short Tandem Repeats (STRs) from BAM files using the STRdust tool, guided by a reference FASTA and a catalog of repeat loci.
Rule
rule strdust:
input:
bam="alignment/minimap2_align/{sample}_{type}.bam",
bai="alignment/minimap2_align/{sample}_{type}.bam.bai",
fasta=config.get("reference", {}).get("fasta", ""),
repeats=config.get("reference", {}).get("str_bed", ""),
output:
vcf="cnv_sv/strdust/{sample}_{type}.vcf",
params:
extra=config.get("strdust", {}).get("extra", ""),
sample=lambda wildcards: f"{wildcards.sample}_{wildcards.type}",
log:
"cnv_sv/strdust/{sample}_{type}.output.log",
benchmark:
repeat("cnv_sv/strdust/{sample}_{type}.output.benchmark.tsv", config.get("strdust", {}).get("benchmark_repeats", 1))
threads: config.get("strdust", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("strdust", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("strdust", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("strdust", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("strdust", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("strdust", {}).get("time", config["default_resources"]["time"]),
container:
config.get("strdust", {}).get("container", config["default_container"])
message:
"{rule}: Call STRs in {input.bam} using STRdust"
shell:
"STRdust "
"{params.extra} "
"-t {threads} "
"-R {input.repeats} "
"--sample {params.sample} "
"{input.fasta} "
"{input.bam} > {output.vcf} 2> {log}"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/minimap2_align/{sample}_{type}.bam" |
bam file with alignend long-reads |
|
bai |
"alignment/minimap2_align/{sample}_{type}.bam.bai" |
index for the bam file |
|
fasta |
config.get("reference", {}).get("fasta", "") |
reference genome fasta file |
| _ _ |
repeats |
config.get("reference", {}).get("str_bed", "") |
bed file of str regions for strdust |
| output |
vcf |
"cnv_sv/strdust/{sample}_{type}.vcf" |
vcf file with str calls from strdust |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
An alternative STR caller that performs locus-specific genotyping from long-read BAM files using the strkit tool.
Rule
rule strkit_call:
input:
bam="alignment/minimap2_align/{sample}_{type}.bam",
bai="alignment/minimap2_align/{sample}_{type}.bam.bai",
fasta=config.get("reference", {}).get("fasta", ""),
repeats=config.get("strkit_call", {}).get("bed", ""),
output:
vcf="cnv_sv/strkit_call/{sample}_{type}.vcf",
params:
extra=config.get("strkit_call", {}).get("extra", ""),
sample_id=lambda wildcards: f"{wildcards.sample}_{wildcards.type}",
log:
"cnv_sv/strkit_call/{sample}_{type}.vcf.log",
benchmark:
repeat(
"cnv_sv/strkit_call/{sample}_{type}.output.benchmark.tsv", config.get("strkit_call", {}).get("benchmark_repeats", 1)
)
threads: config.get("strkit_call", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("strkit_call", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("strkit_call", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("strkit_call", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("strkit_call", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("strkit_call", {}).get("time", config["default_resources"]["time"]),
container:
config.get("strkit_call", {}).get("container", config["default_container"])
message:
"{rule}: Call STRs in {input.bam} using strkit"
shell:
"strkit call "
"{input.bam} "
"--ref {input.fasta} "
"--sample-id {params.sample_id} "
"--loci {input.repeats} "
"--vcf {output.vcf} "
"--processes {threads} &> {log}"
| Rule parameters |
Key |
Value |
Description |
| input |
bam |
"alignment/minimap2_align/{sample}_{type}.bam" |
bam file with alignend long-reads |
|
bai |
"alignment/minimap2_align/{sample}_{type}.bam.bai" |
index for the bam file |
|
fasta |
config.get("reference", {}).get("fasta", "") |
reference genome fasta file |
| _ _ |
repeats |
config.get("strkit_call", {}).get("bed", "") |
bed file of str regions for strkit |
| output |
vcf |
"cnv_sv/strkit_call/{sample}_{type}.vcf" |
vcf file with str calls from strkit |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |
A utility script that adds reference genome information to the header of a VCF file, ensuring compatibility with downstream tools that require the reference path.
Rule
rule add_vcf_ref:
input:
vcf="{file}.vcf.gz",
ref=config["reference"]["fasta"],
output:
vcf="{file}_GRCh38.vcf",
params:
extra=config.get("add_vcf_ref", {}).get("extra", ""),
log:
"fada/add_vcf_ref/{file}_GRCh38.vcf.gz.log",
benchmark:
repeat("fada/add_vcf_ref/{file}_GRCh38.vcf.gz.benchmark.tsv", config.get("add_vcf_ref", {}).get("benchmark_repeats", 1))
threads: config.get("add_vcf_ref", {}).get("threads", config["default_resources"]["threads"])
resources:
mem_mb=config.get("add_vcf_ref", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
mem_per_cpu=config.get("add_vcf_ref", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
partition=config.get("add_vcf_ref", {}).get("partition", config["default_resources"]["partition"]),
threads=config.get("add_vcf_ref", {}).get("threads", config["default_resources"]["threads"]),
time=config.get("add_vcf_ref", {}).get("time", config["default_resources"]["time"]),
container:
config.get("add_vcf_ref", {}).get("container", config["default_container"])
message:
"{rule}: Add reference to the header of the {input.vcf}"
script:
"../scripts/add_ref.py"
| Rule parameters |
Key |
Value |
Description |
| input |
vcf |
"{file}.vcf.gz" |
Input vcf file |
| _ _ |
ref |
config["reference"]["fasta"] |
reference fasta file path |
| output |
vcf |
"{file}_GRCh38.vcf" |
vcf with reference genome path in vcf-header |
Configuration
Software settings (config.yaml)
| Key |
Type |
Description |
| benchmark_repeats |
integer |
set number of times benchmark should be repeated |
| container |
string |
name or path to docker/singularity container |
| extra |
string |
parameters that should be forwarded |
Resources settings (resources.yaml)
| Key |
Type |
Description |
| mem_mb |
integer |
max memory in MB to be available |
| mem_per_cpu |
integer |
memory in MB used per cpu |
| partition |
string |
partition to use on cluster |
| threads |
integer |
number of threads to be available |
| time |
string |
max execution time |