
Computational Techniques for Life Sciences

Part of the TACC Institute Series, Immersive Training in Advanced Computation

Choosing the Correct Hardware

You may think that “bigger” or “faster” computers are automatically faster, but that is not always the case. Each of our systems has its own unique strengths that serve specific types of computational workloads. No matter what field you work in, you know that a specialized tool beats a general-purpose one.

Computers are exactly the same. Here is a condensed overview of some of our systems.

| System | Pros | Cons |
| --- | --- | --- |
| Stampede | Thousands of nodes, Xeon Phi accelerators | Deprecated software stack, high demand |
| Stampede 2 | Thousands of nodes, KNL processors | Slow for serial code |
| Lonestar 5 | Compute, GPUs, large-memory nodes | UT only, slow external network |
| Wrangler | SSD filesystem, hosted databases, Hadoop, HDFS | Low node count |
| Jetstream | Long-running instances, root access | Limited storage |
| Maverick | GPUs, high memory | Deprecated software stack |
| Chameleon | GPUs, bare-metal VMs, software-defined networking | Difficult to configure |
| Catapult | FPGAs | Windows :D |
| Hikari | Protected data | No scratch filesystem |

Just remember that after you choose a system, you should read the associated user guide to make sure you know how to use it to its full potential. I personally know that many people run on Wrangler, but hardly anyone looks up which filesystem the SSDs actually live on.

In this course, you have been running on Lonestar 5. It is a great general-purpose system with GPU and large-memory nodes; you just have to submit to the correct queues to utilize them. If you are associated with UT, you can request time on it for your projects without going through XSEDE.

CPU bound code

Wikipedia defines code to be CPU-bound

when the time for it to complete a task is determined principally by the speed of the central processor.

This means that your program’s runtime should not differ significantly whether your input comes from the SSDs on Wrangler or the tapes on Ranch.


You can speed up CPU-bound code by utilizing more cores on a single node (many-core parallelism) or multiple nodes working together (MPI); a sketch of the single-node approach follows below.

Please consider attending one of our “many-core” or “MPI” workshops to learn how to program in these paradigms.
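As an illustration, here is a minimal Python sketch of the single-node approach using the standard-library multiprocessing module. The score_sequence function and its input are hypothetical stand-ins for whatever CPU-heavy calculation your program actually performs.

```python
# Minimal sketch: spread a CPU-bound calculation across the cores of
# one node.  score_sequence and the input data are hypothetical.
from multiprocessing import Pool

def score_sequence(seq):
    """Stand-in for a CPU-heavy, per-item calculation."""
    return sum(ord(c) for c in seq) % 97

if __name__ == "__main__":
    sequences = ["ACGT" * 1000] * 10000  # hypothetical input data

    # Serial version: one core does all the work.
    serial = [score_sequence(s) for s in sequences]

    # Parallel version: one worker process per core by default.
    with Pool() as pool:
        parallel = pool.map(score_sequence, sequences)

    assert serial == parallel
```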


I/O bound code

Going back to Wikipedia, a code is called I/O bound if

the time it takes to complete a computation is determined principally by the period spent waiting for input/output operations to be completed.

You can usually identify slowdowns caused by I/O when your processes are not utilizing an entire core. This means that the code is either reading or writing a file, and it has to stop execution and wait for those file operations to finish.
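One quick way to check, sketched below in Python, is to compare wall-clock time against CPU time: if CPU time is much lower than wall time, your process spent most of its life waiting rather than computing. The io_heavy task here is a hypothetical example.

```python
# Rough sketch: low CPU utilization over a run suggests I/O-bound code.
import os
import time

def run_and_profile(task):
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    print(f"wall: {wall:.2f}s  cpu: {cpu:.2f}s  utilization: {cpu / wall:.0%}")

def io_heavy():
    # Hypothetical task: force writes all the way to disk, then clean up.
    with open("scratch.dat", "wb") as f:
        for _ in range(100):
            f.write(os.urandom(1024 * 1024))  # 1 MiB per write
            f.flush()
            os.fsync(f.fileno())              # wait for the disk
    os.remove("scratch.dat")

run_and_profile(io_heavy)  # utilization well below 100% points to I/O
```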

To remove the I/O bottleneck, you first need to identify what kinds of file operations your program performs. At the very least, you should pay attention to the files the program takes as input and produces, but you may also need an understanding of the code itself.

IOPS

If your code requires lots of IOPS, this usually means that it reads and/or writes many small files, or performs many small, scattered reads and writes inside larger files.

You can speed up programs like these by staging data on a faster filesystem (Wrangler’s SSDs or node-local /tmp) or by batching many small operations into fewer large ones, as sketched below.
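Here is a minimal Python sketch of the batching idea; the record data and file names are hypothetical.

```python
# Minimal sketch: many tiny writes (high IOPS) vs. one large write.
records = [f"record {i}\n" for i in range(100_000)]  # hypothetical data

# IOPS-heavy: line buffering forces a small write per record.
with open("slow.txt", "w", buffering=1) as f:
    for r in records:
        f.write(r)

# Friendlier: batch everything into a single large write.
with open("fast.txt", "w") as f:
    f.write("".join(records))
```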

Throughput

If your code requires lots of throughput, it almost certainly reads and/or writes large (> 1 GB) files. Running many processes against the filesystem at once can oversaturate the link to it and actually slow things down. Luckily, Lustre ($SCRATCH, $WORK) is ideal for this kind of workload.

When you find that your program is being slowed down by throughput, you can try the following things:

- read and write in large sequential chunks instead of many small ones (see the sketch after this list)
- keep large files on Lustre ($SCRATCH or $WORK) and stripe them across multiple storage targets
- split one enormous output file into several smaller files written in parallel
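The first item is sketched below in Python: stream a large file in big sequential chunks. The path and chunk size are illustrative, not prescriptive.

```python
# Minimal sketch: throughput-friendly reading in large sequential chunks.
CHUNK = 64 * 1024 * 1024  # 64 MiB per read keeps the filesystem streaming

path = "/scratch/username/reads.fastq"  # hypothetical input file
total = 0
with open(path, "rb") as f:
    while chunk := f.read(CHUNK):
        total += len(chunk)
print(f"read {total} bytes in large chunks")
```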

File count

Lastly, your code can slow down based on how many files it reads or writes. We previously recommended circumventing throughput bottlenecks by writing multiple files, but if you make too many, even a simple ls command can overload the Lustre metadata servers. The Trinity transcriptome assembler is guilty of this.

If you notice that your program is making huge numbers of files, you can try writing the intermediate files to node-local storage (such as /tmp) and copying back only a single archive, or consolidating your outputs into fewer, larger files. A sketch of the archiving approach follows.
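Here is a minimal Python sketch of that workflow, assuming a node-local /tmp and a hypothetical $SCRATCH destination path.

```python
# Minimal sketch: create many small files on node-local storage, then
# move one archive to Lustre so the metadata servers see a single file.
import tarfile
from pathlib import Path

local = Path("/tmp/job_output")  # node-local scratch, not Lustre
local.mkdir(parents=True, exist_ok=True)

# The application writes its thousands of small files here...
for i in range(10_000):
    (local / f"part_{i}.txt").write_text(f"result {i}\n")

# ...and only one gzipped tarball lands on $SCRATCH.
with tarfile.open("/scratch/username/job_output.tar.gz", "w:gz") as tar:
    tar.add(str(local), arcname="job_output")
```

This keeps both the job itself and any later ls or rm commands from hammering the metadata servers.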

