10 Workflow Definition Language (WDL) Best Practices
WDL is a powerful language for creating workflows, but it can be tricky to use. Here are 10 best practices to help you get the most out of it.
Workflow Definition Language (WDL) is a domain-specific language for defining scientific analysis workflows. A workflow written in WDL can be executed on a variety of platforms, including cloud computing environments and high-performance computing (HPC) clusters.
Creating a workflow in WDL requires a good understanding of the language and its syntax. In this article, we will discuss 10 best practices for creating WDL workflows to ensure that your workflow is optimized for performance and reliability.
Defining all input and output parameters in the WDL helps to ensure that the workflow is reproducible. By explicitly declaring what data will be used as inputs, and what outputs are expected from each task, it becomes easier for users to understand how a workflow works and to replicate it if needed. This also makes it easier to debug any issues with the workflow since the user can quickly identify which tasks have incorrect inputs or outputs.
Additionally, defining all input and output parameters in the WDL makes workflows easier to extend. Because each task states exactly what data it accepts and what it produces, users can add more tasks and connect them to existing ones without guessing about compatibility, and type mismatches can be caught when the workflow is validated rather than partway through a run. This makes it much simpler to grow a workflow as the analysis gains steps.
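As a minimal sketch of explicit inputs and outputs, the example below assumes WDL version 1.0. The names `count_lines`, `count_lines_wf`, `text_file`, and `line_count` are hypothetical and chosen only for illustration.

```wdl
version 1.0

# Hypothetical task: counts the lines in a text file.
task count_lines {
  input {
    File text_file   # typed input, supplied by the caller
  }

  command <<<
    wc -l < ~{text_file}
  >>>

  output {
    Int line_count = read_int(stdout())   # typed output, parsed from stdout
  }
}

workflow count_lines_wf {
  input {
    File text_file
  }

  call count_lines { input: text_file = text_file }

  output {
    Int line_count = count_lines.line_count
  }
}
```

Anyone reading the `input` and `output` sections can see what the task needs and what it returns without reading the command itself.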
Hard-coding values in WDL files can lead to a number of issues. For example, if a value changes, it must be manually updated everywhere it appears in the code. This can be time-consuming and error-prone. Additionally, hard-coded values are not easily shared between different workflows, making them difficult to reuse.
To avoid these problems, WDL provides several ways to parameterize values. The most common way is to use variables, which WDL calls declarations. A declaration defines a value once under a name, and that name can then be referenced throughout the task or workflow. This makes it easy to update the value in one place and have it automatically applied everywhere else. Because every call refers to the same named value, the same setting can be passed to each task that needs it.
WDL also supports input parameters, which allow users to pass values into a workflow from outside sources. Input parameters provide an additional layer of flexibility by allowing users to specify values at runtime instead of having to hard code them in the WDL file.
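The sketch below shows one way to replace a hard-coded value with a named input, assuming WDL version 1.0. The task `greet`, the workflow `greet_wf`, and all parameter names are hypothetical. The `greeting` value is defined once, with a default, and passed to both calls.

```wdl
version 1.0

task greet {
  input {
    String name
    String greeting
  }

  command <<<
    echo "~{greeting}, ~{name}!"
  >>>

  output {
    String message = read_string(stdout())
  }
}

workflow greet_wf {
  input {
    String greeting = "Hello"   # default value; can be overridden at runtime
    String first_name
    String second_name
  }

  # The same task is called twice under different aliases.
  call greet as greet_first { input: name = first_name, greeting = greeting }
  call greet as greet_second { input: name = second_name, greeting = greeting }

  output {
    Array[String] messages = [greet_first.message, greet_second.message]
  }
}
```

Changing the greeting then means editing one default or supplying a different value in the inputs file, rather than touching each call.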
Parameter substitution allows for the dynamic passing of values into tasks. This means that instead of hard-coding a value, such as a file path or an integer, you can pass in a parameter which will be replaced with the actual value when the workflow is run. This makes it easier to modify and reuse workflows since the same WDL script can be used for different inputs without having to manually change any code.
Using parameter substitution also helps keep your WDL scripts organized and readable. Instead of having literal values scattered throughout the code, parameters are declared in the input section of each task or workflow and referenced by name within the command. This makes it much easier to understand what each task is doing and how tasks interact with one another.
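A minimal sketch of parameter substitution, assuming WDL version 1.0 and its `~{...}` placeholder syntax inside a `command <<< >>>` block. The task name `take_head` and its parameters are hypothetical.

```wdl
version 1.0

task take_head {
  input {
    File infile
    Int n_lines = 10                # number of lines to keep
    String out_name = "head.txt"    # name of the file to write
  }

  command <<<
    # Each ~{...} placeholder is replaced with the parameter's value at run time.
    head -n ~{n_lines} ~{infile} > ~{out_name}
  >>>

  output {
    File head_file = out_name
  }
}
```

The same task can keep 10 lines of one file or 1,000 lines of another without any change to the WDL source.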
Creating a separate task for each step of the workflow allows for better organization and readability. Each task can be written in its own section, making it easier to find and modify specific tasks without having to search through an entire workflow definition. Additionally, this structure makes it easier to debug errors since they are more likely to be isolated to one particular task.
When creating separate tasks, WDL provides several features that make it easy to connect them. The `call` statement invokes a task from a workflow. Dependencies between calls are implicit: when an input of one call refers to an output of another, the engine runs the second call only after the first has completed successfully. This ensures that tasks only run when all their required inputs are available. Furthermore, the `runtime` block can be used to set parameters such as memory requirements or disk space needed by a task. This helps ensure that tasks have enough resources to complete successfully.
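The following sketch chains two single-purpose tasks, assuming WDL version 1.0 and an engine that honors the `docker`, `memory`, and `cpu` runtime attributes. The task names, the `ubuntu:22.04` image choice, and the resource values are illustrative assumptions.

```wdl
version 1.0

task compress {
  input {
    File infile
  }

  command <<<
    gzip -c ~{infile} > compressed.gz
  >>>

  output {
    File gz = "compressed.gz"
  }

  runtime {
    docker: "ubuntu:22.04"
    memory: "2 GB"
    cpu: 1
  }
}

task checksum {
  input {
    File infile
  }

  command <<<
    md5sum ~{infile} | cut -d ' ' -f 1
  >>>

  output {
    String md5 = read_string(stdout())
  }

  runtime {
    docker: "ubuntu:22.04"
    memory: "1 GB"
  }
}

workflow compress_and_checksum {
  input {
    File infile
  }

  call compress { input: infile = infile }

  # Referencing compress.gz makes this call wait for compress to finish.
  call checksum { input: infile = compress.gz }

  output {
    File gz = compress.gz
    String md5 = checksum.md5
  }
}
```

If the checksum step fails, the problem is confined to one short task, and the compression step does not need to be touched.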
Scatter-gather is a workflow pattern in which work is split into multiple parts, processed independently, and then recombined. This can be used to parallelize tasks, meaning the parts run simultaneously on different machines or cores. As a result, the step finishes sooner than it would if each part were processed sequentially.
Using scatter-gather in WDL is relatively straightforward. The first step is to define an array of inputs which will be used to divide up the work. Then the call is wrapped in a `scatter` block that iterates over the array, so the task runs once per element. WDL has no gather keyword; the gather step is implicit. Outside the scatter block, each output of the scattered call is available as an array with one element per iteration, and that array can be passed to a downstream task that combines the parts into a single output.
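A minimal scatter-gather sketch, assuming WDL version 1.0. The tasks `to_upper` and `concat` are hypothetical: the first processes one file, and the second merges the per-file results.

```wdl
version 1.0

task to_upper {
  input {
    File infile
  }

  command <<<
    tr '[:lower:]' '[:upper:]' < ~{infile} > upper.txt
  >>>

  output {
    File upper = "upper.txt"
  }
}

task concat {
  input {
    Array[File] parts
  }

  command <<<
    # sep=" " joins the array elements into a space-separated argument list.
    cat ~{sep=" " parts} > merged.txt
  >>>

  output {
    File merged = "merged.txt"
  }
}

workflow scatter_gather {
  input {
    Array[File] files
  }

  # Scatter: one to_upper call per input file; the engine may run them in parallel.
  scatter (f in files) {
    call to_upper { input: infile = f }
  }

  # Gather: outside the scatter block, to_upper.upper has type Array[File].
  call concat { input: parts = to_upper.upper }

  output {
    File merged = concat.merged
  }
}
```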
Maps are collections of key-value pairs that allow values to be looked up by key, making them well suited to associating settings or labels with identifiers in WDL. Arrays are ordered collections of elements that can be accessed by index. This makes it easier to iterate over a set of values or perform operations on multiple elements at once. Objects and structs are structured types made up of named members; they hold data only and have no methods. In WDL 1.0 and later, user-defined structs with typed members are preferred over the untyped Object type, which is deprecated in newer versions of the specification. Structs can be used to represent complex data structures, including ones that nest maps and arrays.
Using these structured types allows developers to create more efficient workflows with fewer lines of code. For example, when using an array, the same operation can be performed on all elements without having to write separate code for each element. Additionally, using structured types helps make code more readable and maintainable. By breaking up complex data into smaller pieces, it is easier to understand what is happening in the workflow.
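The sketch below combines a struct, an array, and a map, assuming WDL version 1.0 and an engine that localizes files held inside structs. The struct `Sample`, the task `describe`, and the `labels` map are hypothetical names used only for illustration.

```wdl
version 1.0

# A user-defined structured type with two typed members.
struct Sample {
  String id
  File reads
}

task describe {
  input {
    Sample sample
    String label
  }

  command <<<
    echo "~{sample.id} (~{label}): $(wc -c < ~{sample.reads}) bytes"
  >>>

  output {
    String summary = read_string(stdout())
  }
}

workflow describe_samples {
  input {
    Array[Sample] samples
    Map[String, String] labels   # sample id -> label; every id is assumed to be present
  }

  # One call per array element; the map supplies the label for each sample.
  scatter (s in samples) {
    call describe { input: sample = s, label = labels[s.id] }
  }

  output {
    Array[String] summaries = describe.summary
  }
}
```

Grouping `id` and `reads` in one struct keeps related values together instead of passing two parallel arrays that must stay in the same order.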
Reusing tasks across workflows is a great way to reduce the amount of code that needs to be written and maintained. WDL does not support user-defined functions; the reusable units are tasks and workflows (and struct definitions), which can be imported from other files. By importing common tasks, users can avoid rewriting the same code for different workflows. This also makes it easier to keep track of changes in the code since all updates are made in one place.
Importing tasks also helps with readability and maintainability of WDL scripts. When using imports, the main workflow script contains only the logic that connects the steps, while the imported files contain the implementation details of each task. This allows developers to focus on the overall structure of the workflow without getting bogged down in the details.
To use imports in WDL, you must first define the task in its own file and then import that file into the main workflow script. The syntax for this looks like `import "path/to/file" as name`. Once imported, the task is called through its namespace, for example `call name.task_name`, and its outputs are used like those of any task defined within the main workflow script.
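As a sketch of this layout, assume two files in the same directory and WDL version 1.0. The file name `text_utils.wdl`, the namespace `utils`, and the task `count_words` are hypothetical. The first file holds the reusable task:

```wdl
version 1.0

# text_utils.wdl: a shared library of small text-processing tasks.
task count_words {
  input {
    File infile
  }

  command <<<
    wc -w < ~{infile}
  >>>

  output {
    Int n_words = read_int(stdout())
  }
}
```

The main workflow file then imports it and calls the task through the namespace:

```wdl
version 1.0

import "text_utils.wdl" as utils

workflow word_count_wf {
  input {
    File infile
  }

  # The call is named count_words by default, without the namespace prefix.
  call utils.count_words { input: infile = infile }

  output {
    Int n_words = count_words.n_words
  }
}
```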
Expressions are a powerful tool for constructing command line arguments in WDL. They allow users to dynamically generate values based on the inputs of their workflow, which can be used to customize and optimize tasks. For example, expressions can be used to set parameters such as memory requirements or number of cores needed for a task depending on the size of the input data. This allows users to create workflows that are more efficient and tailored to their specific needs.
Expressions also make it easier to write reusable code. By using expressions to construct command line arguments, users can avoid hard-coding values into their scripts. This makes it simpler to modify and reuse existing code without having to rewrite large sections of it. Additionally, expressions can be used to simplify complex logic by breaking it down into smaller components. This makes it easier to debug and maintain code over time.
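The sketch below uses expressions to derive an output file name and resource requests from the inputs, assuming WDL version 1.0. The task name, the sizing formulas, and the `disks` string format (which follows a convention used by some cloud backends) are assumptions; other engines may ignore or reject that attribute.

```wdl
version 1.0

task sort_large_file {
  input {
    File infile
    Int threads = 2
  }

  # Expressions evaluated before the command runs.
  Int disk_gb = ceil(size(infile, "GB") * 3) + 10                # assumed headroom factor
  String out_name = basename(infile, ".txt") + ".sorted.txt"

  command <<<
    sort --parallel=~{threads} ~{infile} > ~{out_name}
  >>>

  output {
    File sorted = out_name
  }

  runtime {
    docker: "ubuntu:22.04"
    cpu: threads
    memory: "~{threads * 2} GB"                  # assumed 2 GB per thread
    disks: "local-disk ~{disk_gb} HDD"           # backend-specific format
  }
}
```

A small input requests little disk space and a large one requests more, without anyone editing the task.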
Validating inputs and outputs is important because it ensures that the workflow will run as expected. It helps to identify any errors in the WDL code, such as typos or incorrect syntax, before running the workflow. This can save time and resources by preventing unnecessary runs of the workflow.
To validate inputs and outputs, a user should first check the syntax of their WDL script using a validation or linter tool, such as `womtool validate` or `miniwdl check`. These tools check for common mistakes, including syntax errors and type mismatches between task inputs and outputs, and report where they occur. Additionally, users should test their workflow with small sample data to make sure that all tasks are working correctly. This allows users to verify that the workflow produces the expected results. Finally, users should execute the workflow with their target workflow engine and confirm that all tasks complete successfully.
Comments are a great way to make your WDL code more readable and understandable. They can be used to explain the purpose of each task, provide additional context for complex operations, or even just remind yourself what you were thinking when writing the code. This makes it easier for other developers to understand and modify your code in the future.
Comments also help with debugging. If something goes wrong during execution, comments can provide valuable insight into why that might have happened. Additionally, if you need to revisit an old workflow after some time has passed, comments can help jog your memory about how things work.
When adding comments to your WDL code, there are a few best practices to keep in mind. Make sure they are concise and clear so that anyone reading them can quickly understand their purpose. Also, try to avoid using jargon or abbreviations that may not be familiar to everyone. Finally, use consistent formatting throughout your code to make it easier to read.
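As a brief sketch, WDL comments start with `#`, and the `meta` and `parameter_meta` sections offer a structured place for descriptions. The example assumes WDL version 1.0; the task name and the description strings are hypothetical.

```wdl
version 1.0

# Trim trailing whitespace from every line of a text file.
task trim_whitespace {
  input {
    File infile
  }

  meta {
    description: "Removes trailing spaces and tabs from each line."
  }

  parameter_meta {
    infile: "Plain-text file to clean."
  }

  command <<<
    # Shell comments can be used inside the command block as well.
    sed 's/[[:space:]]*$//' ~{infile} > trimmed.txt
  >>>

  output {
    File trimmed = "trimmed.txt"
  }
}
```

Free-form `#` comments suit notes for people reading the source, while `meta` and `parameter_meta` can be read by tools that generate documentation or input forms.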