View on GitHub

Computational Techniques for Life Sciences

Part of the TACC Institute Series, Immersive Training in Advanced Computation

Life Sciences Workflows

Automating workflows 1

While our app is running, let’s start looking at what we will eventually want to do to make our lives easier when running blastp. Let’s write our own script to automate (and document) our process.

# are we in the right directory?
pwd

# create the script file and make it executable
touch blastp_workflow.sh
chmod +x blastp_workflow.sh

# now open this file in your favorite text editor
# e.g. "vim blastp_workflow.sh"

Our blastp_workflow.sh script can start out something like this:

#!/bin/bash

echo "workflow started at $(date)"

# validate and preprocess our input file(s)
query="first10.pep"
database="/work/01114/jfonner/public/ctls2017/refseq_protein/refseq_protein"

# run blastp
blastp -num_threads 12 -query $query -db $database

# postprocess our results

echo "workflow finished at $(date)"

Nice! Try running your new script!

./blastp_workflow.sh

If we let it run to completion, it would take a while, so let’s stop our script and add some things by pressing Ctrl+C to terminate the command. We will use the time to add helpful features to our script and learn more about bash while we are at it.

Reading command line arguments

What if we want to run our workflow on something other than “first10.pep”? We could copy and edit our code, but we would end up with lots of copy of our code with lots of small edits. How do we know which version is the latest or best to use? The other option is to make our script generic by making the parts we want to change something we can specify when we run the workflow.

Let’s make it so that we can give the query as a command line argument that would look like this:

./blastp_workflow.sh first10.pep

Bash has a number of special variables for working with command line arguments:

Let’s try them out!

Make a new script called “arguments.sh”. In it, put this

#!/bin/bash

echo "the command you ran was $0"
echo "you passed in $# arguments: $@"

We could be a bit more fancy about how we handle arguments using an if statement. The syntax for an if statement looks like the following:

if [[ condition is true ]]; then
    # run this block of code
    echo "it's true!"
fi

We can also add else and elif code blocks:

if [[ condition is true ]]; then
    # run this block of code
    echo "it's true!"
elif [[ some other condition is true]]; then
	# run some other code
	echo "some other condition was true."
else
    echo "everything was false"
fi

The conditions themselves look something like this:

[[ $# -eq 1 ]]

That will be true if the number of arguments passed into the script was 1. Here is a full list of conditional expressions: https://www.gnu.org/software/bash/manual/html_node/Bash-Conditional-Expressions.html

Hands-on

1) Update arguments.sh to do the following:

2) Update blastp_workflow.sh to use a command line argument for the query AND check that you received one argument before running the workflow

Previous: Life Sciences Workflows Next: Automating Workflows 2