Computational Science in the Cloud Institute 2018

View on GitHub

Automating the Jupyter VM Deployment

In this section we build on what we did in the previous module towards fully automating the installation and configuration of Anaconda and the Jupyter notebook.

The basic steps of the manual installation process were:

The basic strategy is to use Ansible modules to accomplish each of these steps. We won’t work out a complete solution here, but we will work through a series of exercises that will expose the basic concepts needed to get to a complete solution. After working through the exercises, we will check out a complete, working solution from the course web site.

Ansible playbooks

Running single tasks is nice for ad hoc work, but they aren’t convenient for automating an entire installation process. Ansible playbooks allow us to combine tasks into a single script that can be run all at once.

Putting a Task into a Playbook

Playbooks use the YAML data format. YAML is similar to JSON but uses spaces to improve readability. Lists in YAML are designated by single dash (-) characters. Playbooks are ultimately lists of objects, each having at a minimum the following attributes:

Exercise. Use vi to open up a file called playbook.yml in your home directory. Do the following:
  * On the first line, add three dash (-) characters. This is part of the YAML spec and indicates the beginning of the file.
  * On the next line, specify the hosts you want to run the tasks on; in this case, `all`.
  * Next, specify the `tasks` attribute. Note the spacing! the `tasks` keyword should line up under the `hosts` keyword. Let's add a single task to use the `copy` module to copy our `test.txt` file to each remote server.

Here is what the file should have:

---

- hosts: all
  tasks:
  - name: check the uptime
    copy: src=test.txt dest=/root/remote.txt

Now, to run the playbook, use the ansible-playbook command and pass the hosts with the -i flag and the name of the playbook:

$ ansible-playbook -i hosts playbook.yml

Be sure to run it at least twice. Is our playbook idempotent?

More on Ansible Modules

Ansible has many powerful modules to accomplish tasks at a higher-level. We have already seen the copy module; here are a few more:

The get_url module

The get_url module retrieves files from a public URL. This is exactly what we need to download the Anaconda installer. In it’s most basic form, the get_url module takes two parameters:

Exercise. Modify your playbook to change the task to use the get_url module to download the Anaconda installer.

Here is what the file should have:

---

- hosts: all
  tasks:
  - name: download anaconda installer
    get_url:
      url: "https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh"
      dest: "/root/Anaconda3-4.4.0-Linux-x86_64.sh"

Run the playbook at least twice. Is it idempotent? Confirm the results.

The shell module

The shell module is similar to the command module, but executes the command in a shell. The shell module:

We can use this module to:

Exercise. Let's add another task to our playbook to run the Anaconda installer after it is downloaded. 

What about that interactive prompt? Fortunately, the installer has an option for us. Use the -h flag to see all options:

$ bash Anaconda3-4.4.0-Linux-x86_64.sh -h
usage: Anaconda3-4.4.0-Linux-x86_64.sh [options]

Installs Anaconda3 4.4.0

    -b           run install in batch mode (without manual intervention),
                 it is expected the license terms are agreed upon
    -f           no error if install prefix already exists (force)
    -h           print this help message and exit
    -p PREFIX    install prefix, defaults to /root/anaconda3

Great, looks like we can pass the -b flag and install it without a prompt. Therefore, as a first attempt, we can run the installer with something like:

  - name: run the Anaconda installer
    shell: "bash Anaconda3-4.4.0-Linux-x86_64.sh -b"

If we append that to our file, the result is:

---

- hosts: all
  tasks:
  - name: download anaconda installer
    get_url:
      url: "https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh"
      dest: "/root/Anaconda3-4.4.0-Linux-x86_64.sh"

  - name: run the installer
    shell: "bash Anaconda3-4.4.0-Linux-x86_64.sh -b"

Let’s run it and see what happens. Running the installer does take some time but it appears to have worked. If we ssh over to one of our hosts we see the anaconda3 directory in our home dir. However, there are still two issues:

Making a shell module task idempotent

Typically, shell and command tasks will not be idempotent. One technique for getting around this is the creates argument. The creates indicates to Ansible that this task creates some resource (a file or directory). When the resource already exists, Ansible will not run the task.

Exercise. Let's modify our playbook to add a `creates` parameter to our installer task. The installer creates the `anaconda3` directory so we'll use that as a check.

The modified playbook should look like:

---

- hosts: all
  tasks:
  - name: download anaconda installer
    get_url:
      url: "https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh"
      dest: "/root/Anaconda3-4.4.0-Linux-x86_64.sh"

  - name: run the installer
    shell: "bash Anaconda3-4.4.0-Linux-x86_64.sh -b"
    args:
      creates: "/root/anaconda3"

Run the playbook again to ensure we do not get an error.

Exercise. Let's use the shell module again to ensure that /root/.bash_profile exists. In the step below we'll then ensure it has the correct contents.

The modified playbook will look like:

---

- hosts: all
  tasks:
  - name: download anaconda installer
    get_url:
      url: "https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh"
      dest: "/root/Anaconda3-4.4.0-Linux-x86_64.sh"

  - name: run the installer
    shell: "bash Anaconda3-4.4.0-Linux-x86_64.sh -b"
    args:
      creates: "/root/anaconda3"

  - name: ensure the .bash_profile file is present
    shell: "touch /root/.bash_profile"
    args:
      creates: "/root/.bash_profile"

The lineinfile module

The lineinfile module can be used to manipulate text files in an idempotent way. It has a lot of options, but the key ones we will use are:

Exercise. Add a task to the playbook to ensure that the anaconda3 bin directory is on the $PATH. We'll append the bin directory to our $PATH in our .bash_profile file, which we know exists because we ensured it in the previous task.

Here’s what the file should look like now:

---

- hosts: all
  tasks:
  - name: download anaconda installer
    get_url:
      url: "https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh"
      dest: "/root/Anaconda3-4.4.0-Linux-x86_64.sh"

  - name: run the installer
    shell: "bash Anaconda3-4.4.0-Linux-x86_64.sh -b"
    args:
      creates: "/root/anaconda3"

  - name: ensure the .bash_profile file is present
    shell:  "touch /root/.bash_profile"
    args:
      creates: "/root/.bash_profile"

  - name: Ensure Anaconda to bash_profile
    lineinfile:
      state: present
      path: "/root/.bash_profile"
      line: "export PATH=/root/anaconda3/bin:$PATH"

Run the playbook a couple of times to make sure it works and is idempotent.

A Word on Roles

Ansible Roles are a way to package Ansible functionality into a modular component that can be shared across multiple playbooks. Suppose you wanted to have two different Jupyter environments: one geared towards bioinformaticians and one for hazards engineering. To get the most code reuse you could organize things as follows:

Then you could have two playbooks:

Checking Out and Running the Full Solution

The complete solution is available from the project git repository. Use the git clone command to copy the files to your VM:

$ git clone https://github.com/TACC/CSC2018Institute.git

This should have created a directory called CSC2018Institute in the current working directory.

The playbook also requires some small changes to your hosts files. First, put the hosts in the master group by adding

[master]

at the top of the hosts file.

Second, add an additional line in your hosts file to set the password for Jupyter. Add the following to the bottom of your hosts file:


[all:vars]
jpassword=your_password

Once your hosts file is updated, run the playbook with the command:

$ ansible-playbook -i hosts CSC2018Institute/jupyter.yml

If all goes well you should see messages similar to:

ok: [day3-1] => {
    "msg": [
        "*** Jupyter Notebook installed. Launch by re-sourcing your .bash_profile and running the notebook: ***", 
        "*** source ~/.bash_profile ***", 
        "*** jupyter notebook --allow-root ***", 
        "*** Then point your browser at: ***", 
        "*** https://YOUR_PUBLIC_IP:8887 ***"
    ]
}

On your VM with a public IP, follow the instructions to source your ~/.bash_profile and launch the notebook server.