Resources for Reusing Tools and Scripts
Last updated on 2024-11-26
Overview
Questions
- How can I find existing solutions and CWL recipes for awkward problems?
Objectives
- Know good resources for finding solutions to common problems
Pre-written tool descriptions
Before you start writing a CWL workflow, check whether a CWL document already exists for the tools you want to use. Bio-cwl-tools is a library of CWL documents for biology and life-sciences tools.
The CWL documents for the previous steps were provided for you, but you can also find them in this library. In this episode you will use the bio-cwl-tools library to add the last step to the workflow.
Adding a new step to the workflow
The last step of our workflow is counting the RNA-seq reads, for which we will use the featureCounts tool.
Find the featureCounts tool in the bio-cwl-tools library 🌶
Find the featureCounts tool in the bio-cwl-tools library. Have a look at the CWL document. Which inputs does this tool need? And what are the outputs of this tool?
The featureCounts CWL document can be found in the GitHub repo.
It has three inputs:
- annotations: a GTF or GFF file containing the gene annotations
- mapped_reads: a BAM file containing the mapped reads
- reads_are_paired: a boolean value
These inputs can be found on lines 6, 9, and 12.
The output of this tool is a file called featurecounts (line 27).
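To show how this description maps onto CWL syntax, the inputs and outputs sections of a tool with this interface could look roughly like the sketch below. This is a simplified illustration, not a copy of the file in bio-cwl-tools: only the names and types listed above are taken from the tool, and the command-line bindings, base command, and output matching of the real document are deliberately left out.

```yaml
# Illustrative excerpt only: the names and types follow the description above.
# The real featureCounts.cwl contains more (for example command-line bindings).
inputs:
  annotations:
    type: File      # GTF or GFF file with the gene annotations
  mapped_reads:
    type: File      # BAM file with the mapped reads
  reads_are_paired:
    type: boolean   # boolean flag, as described above

outputs:
  featurecounts:
    type: File      # the output file of the tool
```

The input names matter when you wire the tool into a workflow: the keys under a step's in field must match them, and the entries in out must match the output names.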
Appending the featureCounts step to the workflow
We need a local copy of featureCounts in order to use it in our workflow. We already imported this as a git submodule during setup, so the tool should be located at bio-cwl-tools/subread/featureCounts.cwl.
Add the featureCounts tool to the workflow 🌶🌶
Please copy the rna_seq_workflow_2.cwl file to create rna_seq_workflow_3.cwl.
Add the featureCounts tool to the workflow as a workflow step.
Bonus: 🌶🌶🌶
Similar to the STAR tool, this tool also needs more RAM than the default.
Increase the RAM available to the tool without editing the CommandLineTool description itself.
Use the run field to add the featureCounts tool as a step in the workflow.
Use a requirements entry with ResourceRequirement to allocate a ramMin of 500 (CWL expresses ramMin in mebibytes).
```yaml
cwlVersion: v1.2
class: Workflow

inputs:
  rna_reads_fruitfly_forward:
    type: File
    format: http://edamontology.org/format_1930  # FASTQ
  rna_reads_fruitfly_reverse:
    type: File
    format: http://edamontology.org/format_1930  # FASTQ
  ref_fruitfly_genome: Directory
  fruitfly_gene_model: File

steps:
  quality_control_forward:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly_forward
    out: [html_file]

  quality_control_reverse:
    run: bio-cwl-tools/fastqc/fastqc_2.cwl
    in:
      reads_file: rna_reads_fruitfly_reverse
    out: [html_file]

  trim_low_quality_bases:
    run: bio-cwl-tools/cutadapt/cutadapt-paired.cwl
    in:
      reads_1: rna_reads_fruitfly_forward
      reads_2: rna_reads_fruitfly_reverse
      minimum_length: { default: 20 }
      quality_cutoff: { default: 20 }
    out: [ trimmed_reads_1, trimmed_reads_2, report ]

  mapping_reads:
    requirements:
      ResourceRequirement:
        ramMin: 5120
    run: bio-cwl-tools/STAR/STAR-Align.cwl
    in:
      RunThreadN: {default: 4}
      GenomeDir: ref_fruitfly_genome
      ForwardReads: trim_low_quality_bases/trimmed_reads_1
      ReverseReads: trim_low_quality_bases/trimmed_reads_2
      OutSAMtype: {default: BAM}
      SortedByCoordinate: {default: true}
      OutSAMunmapped: {default: Within}
      Overhang: { default: 36 }  # the length of the reads - 1
      Gtf: fruitfly_gene_model
    out: [alignment]

  index_alignment:
    run: bio-cwl-tools/samtools/samtools_index.cwl
    in:
      bam_sorted: mapping_reads/alignment
    out: [bam_sorted_indexed]

  count_reads:
    requirements:
      ResourceRequirement:
        ramMin: 500
    run: bio-cwl-tools/subread/featureCounts.cwl
    in:
      mapped_reads: index_alignment/bam_sorted_indexed
      annotations: fruitfly_gene_model
      reads_are_paired: {default: true}
    out: [featurecounts]

outputs:
  quality_report_forward:
    type: File
    outputSource: quality_control_forward/html_file
  quality_report_reverse:
    type: File
    outputSource: quality_control_reverse/html_file
  bam_sorted_indexed:
    type: File
    outputSource: index_alignment/bam_sorted_indexed
  featurecounts:
    type: File
    outputSource: count_reads/featurecounts
```
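Since this exercise only adds the last step, the difference from rna_seq_workflow_2.cwl should be small. Assuming the previous workflow already ends with the index_alignment step and defines the fruitfly_gene_model input, the additions can be sketched in isolation as one new step and one new workflow output:

```yaml
# Fragment, not a complete workflow: merge these entries into the existing
# steps and outputs sections of rna_seq_workflow_3.cwl.
steps:
  count_reads:
    requirements:
      ResourceRequirement:
        ramMin: 500   # in mebibytes; applies to this step only
    run: bio-cwl-tools/subread/featureCounts.cwl
    in:
      mapped_reads: index_alignment/bam_sorted_indexed  # output of the previous step
      annotations: fruitfly_gene_model                  # workflow-level input
      reads_are_paired: {default: true}
    out: [featurecounts]

outputs:
  featurecounts:
    type: File
    outputSource: count_reads/featurecounts
```

Placing the ResourceRequirement on the step, rather than in featureCounts.cwl, is what lets the workflow raise the memory request without modifying the shared tool description.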
Running the new workflow
The workflow is now complete; all that remains is to finish the YAML input file. Copy the workflow_input_2.yml file to workflow_input_3.yml, and add the last entry to the input file, which is the fruitfly_gene_model file.
```yaml
rna_reads_fruitfly_forward:
  class: File
  location: rnaseq/GSM461177_1_subsampled.fastqsanger
  format: http://edamontology.org/format_1930  # FASTQ
rna_reads_fruitfly_reverse:
  class: File
  location: rnaseq/GSM461177_2_subsampled.fastqsanger
  format: http://edamontology.org/format_1930  # FASTQ
ref_fruitfly_genome:
  class: Directory
  location: rnaseq/dm6-STAR-index
fruitfly_gene_model:
  class: File
  location: rnaseq/Drosophila_melanogaster.BDGP6.87.gtf
  format: http://edamontology.org/format_2306
```
Prerequisite
You have finished both the workflow and the input file, so you can now run the whole workflow.
Key Points
- bio-cwl-tools is a library of CWL documents for biology/life-sciences related tools