Computational Techniques for Life Sciences

Part of the TACC Institute Series, Immersive Training in Advanced Computation

How can I clean up my data?

Compression and archiving

$ gzip -c file > file.gz
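As a minimal sketch of what the -c flag does: gzip writes the compressed data to standard output and leaves the input file in place, so the redirect decides where the compressed copy lands. The scratch directory and the file name sample.txt are assumptions made up for illustration.

```shell
# Sketch: gzip -c writes compressed data to stdout and leaves the input untouched.
# "sample.txt" is a hypothetical file created in a scratch directory.
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/sample.txt"

# Compress to a new file; the original is not removed
gzip -c "$workdir/sample.txt" > "$workdir/sample.txt.gz"

# Both the original and the compressed copy are listed
ls "$workdir"

# Decompress to stdout and compare byte-for-byte with the original
gzip -dc "$workdir/sample.txt.gz" | cmp - "$workdir/sample.txt" && echo "round trip OK"
```

Without -c, gzip replaces the input file with the .gz file, which may not be what you want while experimenting.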

Example: find the larger “random” file in the “test” directory and compress it at each compression level, from -1 (fastest) to -9 (best compression)

$ cd research_6
$ for i in $(seq 1 9); do gzip -c -${i} data_6_7 > data_6_7_${i}.gz; done
$ ls -lrsh
total 271M
  0 -rw------- 1 beckbw G-814141   0 Jun  6 02:34 data_6_9
  0 -rw------- 1 beckbw G-814141   0 Jun  6 02:34 data_6_8
28M -rw------- 1 beckbw G-814141 28M Jun  6 08:30 data_6_7_9.gz
28M -rw------- 1 beckbw G-814141 28M Jun  6 08:30 data_6_7_8.gz
28M -rw------- 1 beckbw G-814141 28M Jun  6 08:30 data_6_7_7.gz
28M -rw------- 1 beckbw G-814141 28M Jun  6 08:30 data_6_7_6.gz
28M -rw------- 1 beckbw G-814141 28M Jun  6 08:30 data_6_7_5.gz
28M -rw------- 1 beckbw G-814141 28M Jun  6 08:30 data_6_7_4.gz
28M -rw------- 1 beckbw G-814141 28M Jun  6 08:30 data_6_7_3.gz
28M -rw------- 1 beckbw G-814141 28M Jun  6 08:30 data_6_7_2.gz
28M -rw------- 1 beckbw G-814141 28M Jun  6 08:30 data_6_7_1.gz
27M -rw------- 1 beckbw G-814141 27M Jun  6 02:37 data_6_7

DISCUSSION: Did compression save me anything?
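One way to explore this question is to compare how gzip handles random bytes and highly repetitive data. The sketch below assumes that /dev/urandom and head -c are available (true on typical Linux systems); the file names random.bin and repeat.txt are hypothetical and are created in a scratch directory, not in the course data.

```shell
# Sketch: compare gzip on random vs. repetitive data of the same size.
# All file names are hypothetical and live in a scratch directory.
demo=$(mktemp -d)

# 1 MiB of random bytes (there is no pattern for gzip to exploit)
head -c 1048576 /dev/urandom > "$demo/random.bin"

# 1 MiB of the letter A (a single repeated pattern)
head -c 1048576 /dev/zero | tr '\0' 'A' > "$demo/repeat.txt"

gzip -c "$demo/random.bin" > "$demo/random.bin.gz"
gzip -c "$demo/repeat.txt" > "$demo/repeat.txt.gz"

# Print sizes in bytes for the originals and the compressed copies
wc -c "$demo/random.bin" "$demo/random.bin.gz" "$demo/repeat.txt" "$demo/repeat.txt.gz"
```

The random file should come out slightly larger after compression, because gzip adds a header and trailer but finds nothing to shrink, while the repetitive file shrinks to a tiny fraction of its size. If data_6_7 holds random bytes, that would be consistent with the listing above.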


Archiving and Transferring

$ cd $WORK
$ tar -cvf test.tar $HOME/test
MISTAKE: the archive now contains the full /home1 path. Use -C to change into $HOME first, so only test/ is stored:
$ tar -cvf test.tar -C $HOME test
$ tar -tvf test.tar
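The effect of -C can be checked on a small scratch tree before touching real data. This is a sketch under assumptions: the directory layout and the file notes.txt are hypothetical, and the comment about stripping the leading "/" describes GNU tar's default behavior.

```shell
# Sketch: how -C changes the paths stored in a tar archive.
# All directory and file names below are hypothetical scratch paths.
scratch=$(mktemp -d)
mkdir -p "$scratch/home/test"
echo "sample" > "$scratch/home/test/notes.txt"

# Absolute path: the whole directory path is stored in the archive
# (GNU tar strips the leading "/" and prints a warning on stderr).
tar -cf "$scratch/abs.tar" "$scratch/home/test" 2>/dev/null

# With -C, tar changes into the given directory first,
# so only the relative path test/... is stored.
tar -cf "$scratch/rel.tar" -C "$scratch/home" test

echo "--- archived with an absolute path ---"
tar -tf "$scratch/abs.tar"
echo "--- archived with -C ---"
tar -tf "$scratch/rel.tar"
```

Listing an archive with -t before transferring or extracting it is a cheap way to catch unwanted leading directories.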

Compressing Archives