Debugging Workflows
Last updated on 2024-11-26 | Edit this page
Overview
Questions
“How can I check my CWL file for errors?”
“How can I get more information to help with solving an error?”
“What are some common error messages when using CWL?”
Objectives
Check a CWL file for errors
Output debugging information
Interpret and fix commonly encountered error messages keypoints:
Run the workflow with the
--validate
option to check for errorsThe
--debug
option will output more information‘Wiring’ errors won’t necessarily yield an error message
A Firm Reality Check
When working on a CWL workflow, you will probably encounter errors. There are many different ways for errors to occur.
It is always very important to check the error message in the terminal, because it will give you information on the error. This error message will give you the type of error as well as the line of code that contains the error.
We will showcase some of the common errors in this episode.
As a first step to check if your CWL script contains any errors, you
can run the workflow with the --validate
flag.
It is possible for a valid script to still generate an error.
If you encounter an error, the best practice is to re-run the
workflow with the --debug
flag. This will provide you with
extensive information on the error you encounter.
Syntax Errors
When writing a piece of code, it is very easy to make a mistake in your YAML syntax.
Some very common YAML errors are:
Tabs
Using tabs instead of spaces. In YAML files indentations are made using spaces, not tabs. Please download and run this example which includes a tab character.
ERROR Tool definition failed validation:
while scanning for the next token
file:///tab-error.cwl:5:1: found character '\t' that cannot start any token
Field Name Typos
Typos in field names. It is very easy to forget for example the capital letters in field names.
Errors with typos in field names will show
invalid field
.
rna_seq_workflow_fieldname_fail.cwl
YAML
cwlVersion: v1.2
class: Workflow
inputs:
rna_reads_fruitfly: File
ref_fruitfly_genome: Directory
steps:
quality_control:
run: bio-cwl-tools/fastqc/fastqc_2.cwl
in:
reads_file: rna_reads_fruitfly
out: [html_file]
mapping_reads:
requirements:
ResourceRequirement:
ramMin: 5120
run: bio-cwl-tools/STAR/STAR-Align.cwl
in:
RunThreadN: {default: 4}
GenomeDir: ref_fruitfly_genome
ForwardReads: rna_reads_fruitfly
OutSAMtype: {default: BAM}
SortedByCoordinate: {default: true}
OutSAMunmapped: {default: Within}
out: [alignment]
index_alignment:
run: bio-cwl-tools/samtools/samtools_index.cwl
in:
bam_sorted: mapping_reads/alignment
out: [bam_sorted_indexed]
outputs:
qc_html:
type: File
outputsource: quality_control/html_file
bam_sorted_indexed:
type: File
outputSource: index_alignment/bam_sorted_indexed
Validate command
ERROR Tool definition failed validation:
rna_seq_workflow_fieldname_fail.cwl:1:1: Object `rna_seq_workflow_fieldname_fail.cwl` is not valid
because
tried `Workflow` but
rna_seq_workflow_fieldname_fail.cwl:35:1: the `outputs` field is not valid because
rna_seq_workflow_fieldname_fail.cwl:36:3: item is invalid because
rna_seq_workflow_fieldname_fail.cwl:38:5: invalid field `outputsource`, expected one of:
'label', 'secondaryFiles', 'streamable', 'doc', 'id',
'format', 'outputSource', 'linkMerge', 'pickValue', 'type'
IDEs are your friend
Using an IDE can help warn of incorrect fields before needing to validate via the command-line tool
Variable Name Typos
Typos in variable names.
Similar to typos in field names, it is easy to make a mistake in
referencing to a variable.
These errors will show
Field references unknown identifier.
rna_seq_workflow_varname_fail.cwl
YAML
cwlVersion: v1.2
class: Workflow
inputs:
rna_reads_fruitfly: File
ref_fruitfly_genome: Directory
steps:
quality_control:
run: bio-cwl-tools/fastqc/fastqc_2.cwl
in:
reads_file: rna_reads_fruitfly
out: [html_file]
mapping_reads:
requirements:
ResourceRequirement:
ramMin: 5120
run: bio-cwl-tools/STAR/STAR-Align.cwl
in:
RunThreadN: {default: 4}
GenomeDir: ref_fruitfly_genome
ForwardReads: rna_reads_fruitfly
OutSAMtype: {default: BAM}
SortedByCoordinate: {default: true}
OutSAMunmapped: {default: Within}
out: [alignment]
index_alignment:
run: bio-cwl-tools/samtools/samtools_index.cwl
in:
bam_sorted: mapping_reads/alignments
out: [bam_sorted_indexed]
outputs:
qc_html:
type: File
outputSource: quality_control/html_file
bam_sorted_indexed:
type: File
outputSource: index_alignment/bam_sorted_indexed
Validate command
ERROR Tool definition failed validation:
rna_seq_workflow_varname_fail.cwl:8:1: checking field `steps`
rna_seq_workflow_varname_fail.cwl:29:3: checking object
`rna_seq_workflow_varname_fail.cwl#index_alignment`
rna_seq_workflow_varname_fail.cwl:31:5: checking field `in`
rna_seq_workflow_varname_fail.cwl:32:7: checking object
`rna_seq_workflow_varname_fail.cwl#index_alignment/bam_sorted`
Field `source` references unknown identifier
`mapping_reads/alignments`, tried
file:///.../rna_seq_workflow_varname_fail.cwl#mapping_reads/alignments
IDEs are your friend
Using an IDE, a simple Ctrl+F on a variable can help you see where that variable is present throughout a CWL code. Only one occurence of a variable might mean it has been spelt differently elsewhere.
Wiring error
Wiring errors often occur when you forget to add an output from a
workflow’s step to the outputs
section.
This doesn’t cause an error message, but there won’t be any output in your directory. To get the desired output you have to run the workflow again.
Best practice is to check your outputs
section before
running your script to make sure all the outputs you want are there.
Output Collections
All file / directory outputs of a workflow or tool will be placed into a single directory.
Ensure all expected files and directories are there. For string / boolean output, splitting stderr and stdout of the cwltool commandline into separate files allows the user to easily look through the stdout of a workflow without needing the noise of stderr. Redirects are briefly discussed in the second episode of this tutorial.
Type mismatch
Type errors take place when there is a mismatch in type between
variables. When you declare a variable in the inputs
section, the type of this variable has to match the type in the YAML
inputs file and the type used in one of the workflows steps. The error
message that is shown when this error occurs will tell you on which line
the mismatch happens.
rna_seq_workflow_type_fail.cwl
YAML
cwlVersion: v1.2
class: Workflow
inputs:
rna_reads_fruitfly: int
ref_fruitfly_genome: Directory
steps:
quality_control:
run: bio-cwl-tools/fastqc/fastqc_2.cwl
in:
reads_file: rna_reads_fruitfly
out: [html_file]
mapping_reads:
requirements:
ResourceRequirement:
ramMin: 5120
run: bio-cwl-tools/STAR/STAR-Align.cwl
in:
RunThreadN: {default: 4}
GenomeDir: ref_fruitfly_genome
ForwardReads: rna_reads_fruitfly
OutSAMtype: {default: BAM}
SortedByCoordinate: {default: true}
OutSAMunmapped: {default: Within}
out: [alignment]
index_alignment:
run: bio-cwl-tools/samtools/samtools_index.cwl
in:
bam_sorted: mapping_reads/alignment
out: [bam_sorted_indexed]
outputs:
qc_html:
type: File
outputSource: quality_control/html_file
bam_sorted_indexed:
type: File
outputSource: index_alignment/bam_sorted_indexed
Validation Command
rna_seq_workflow_type_fail.cwl:5:3: Source 'rna_reads_fruitfly' of type "int" is incompatible
rna_seq_workflow_type_fail.cwl:12:7: with sink 'reads_file' of type "File"
rna_seq_workflow_type_fail.cwl:5:3: Source 'rna_reads_fruitfly' of type "int" is incompatible
rna_seq_workflow_type_fail.cwl:23:7: with sink 'ForwardReads' of type ["File", {"type":
"array", "items": "File"}]
Format error
Some files need a specific format that needs to be specified in the YAML inputs file, for example the fastq file in the RNA-seq analysis.
When you don’t specify a format, an error will occur. You can for example use the EDAM ontology.
Format requirements
The format attribute for a File entry is only required if the format attribute is specified on the workflow input.
You may use
cwltool --make-template /path/to/cwl_workflow.cwl
to set
the formats for each input for you.
rna_seq_workflow_with_format.cwl
YAML
cwlVersion: v1.2
class: Workflow
inputs:
rna_reads_fruitfly: File
ref_fruitfly_genome: Directory
steps:
quality_control:
run: bio-cwl-tools/fastqc/fastqc_2.cwl
in:
reads_file: rna_reads_fruitfly
out: [html_file]
mapping_reads:
requirements:
ResourceRequirement:
ramMin: 5120
run: bio-cwl-tools/STAR/STAR-Align.cwl
in:
RunThreadN: {default: 4}
GenomeDir: ref_fruitfly_genome
ForwardReads: rna_reads_fruitfly
OutSAMtype: {default: BAM}
SortedByCoordinate: {default: true}
OutSAMunmapped: {default: Within}
out: [alignment]
index_alignment:
run: bio-cwl-tools/samtools/samtools_index.cwl
in:
bam_sorted: mapping_reads/alignment
out: [bam_sorted_indexed]
outputs:
qc_html:
type: File
outputSource: quality_control/html_file
bam_sorted_indexed:
type: File
outputSource: index_alignment/bam_sorted_indexed
workflow_input_undefined_format.yaml
YAML
rna_reads_fruitfly:
class: File
location: rnaseq/GSM461177_1_subsampled.fastqsanger
ref_fruitfly_genome:
class: Directory
location: rnaseq/dm6-STAR-index
ERROR Exception on step 'mapping_reads'
ERROR [step mapping_reads] Cannot make job: Expected value of 'ForwardReads' to have format http://edamontology.org/format_1930 but
File has no 'format' defined: {
"class": "File",
"location": "file:///.../rnaseq/GSM461177_1_subsampled.fastqsanger",
"size": 142867948,
"basename": "GSM461177_1_subsampled.fastqsanger",
"nameroot": "GSM461177_1_subsampled",
"nameext": ".fastqsanger"
}