title | author | date |
---|---|---|
Shell for Bioinformatics | Sheldon McKay, Mary Piper, Radhika Khetani, Meeta Mistry, Jihe Liu | September 28, 2020 |
Time | Topic | Instructor |
---|---|---|
9:30 - 10:10 | Workshop introduction | Will |
10:10 - 11:40 | Introduction to Shell | Upen |
11:40 - 12:00 | Overview of self-learning materials and homework submission | Will |
I. Please study the contents and work through all the code within the following lessons:
- Wildcards and shortcuts in Shell
Perhaps you are interested in listing only the files that have a .txt extension, or you want to navigate to your home directory quickly. There are many shortcuts in Shell that will help you do these types of tasks.
This lesson will cover:
- Utilizing wildcards for selecting multiple files
- Implementing shortcuts for moving around the Shell quickly
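As a rough illustration of the two topics above, the following sketch uses wildcards and directory shortcuts in a throwaway directory. The file names are hypothetical and are created only for this example; they are not part of the workshop dataset.

```shell
# Work in a temporary directory so none of your own files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create a few empty, hypothetical files to match against.
touch notes.txt results.txt sample1.fastq sample2.fastq

# '*' matches any run of characters: list only the files ending in .txt.
txt_files=$(ls *.txt)
echo "$txt_files"

# '?' matches exactly one character: sample1.fastq and sample2.fastq.
fastq_files=$(ls sample?.fastq)
echo "$fastq_files"

# Shortcuts: '~' stands for your home directory, '..' is the parent
# directory, and 'cd -' returns to the directory you were in before.
echo ~
cd ..
cd -
```

The last command should bring you back to the temporary directory you started in.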
-
Now that you can navigate around the Shell environment, you are likely interested to know how to view and edit your files.
This lesson will cover:
- Viewing your files
- Editing your files using vim
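A minimal sketch of viewing a file from the command line, assuming a small hypothetical file created just for the demonstration. The interactive tools (`less` and `vim`) are shown as comments only, since they take over the terminal until you quit them.

```shell
# Create a small, hypothetical six-line file to look at.
demo_file=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 > "$demo_file"

# Print the whole file to the screen.
cat "$demo_file"

# Show only the first two lines, or only the last line.
first_two=$(head -n 2 "$demo_file")
last_one=$(tail -n 1 "$demo_file")
echo "$first_two"
echo "$last_one"

# Interactive viewing and editing (run these yourself in a terminal):
#   less "$demo_file"    # scroll with the arrow keys, press q to quit
#   vim "$demo_file"     # press i to insert text, Esc then :wq to save and quit
```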
-
You will encounter large files that need a search function to find the information you are looking for. You might also be interested in writing the output of that search to a file or using it as the input to another command.
This lesson will cover:
- Searching files using grep
- Writing the output of a command to a file
- Redirecting the output of a command to an additional command
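The three ideas above can be sketched as follows. The file `reads.fa` and its contents are made up for this example (they are not the workshop dataset), and `bad_reads.txt` is likewise a hypothetical output name.

```shell
# Work in a temporary directory with a small, hypothetical sequence file.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf '%s\n' '>seq1' 'ACGTNNNN' '>seq2' 'ACGTACGT' '>seq3' 'NNNNACGT' > reads.fa

# Search: print every line that contains the pattern NNNN.
grep NNNN reads.fa

# Redirect: '>' writes the matching lines to a file instead of the screen
# (it overwrites the file if it already exists; '>>' would append instead).
grep NNNN reads.fa > bad_reads.txt

# Pipe: '|' sends the output of one command to another command as input.
# Here the matches are counted, then only the first match is kept.
grep NNNN reads.fa | wc -l
first_bad=$(grep NNNN reads.fa | head -n 1)
echo "$first_bad"
```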
NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word `compute` in it).
1. Log in using `ssh [email protected]` and enter your password (replace the "XX" in the username with the number you were assigned in class).
2. Once you are on the login node, use `srun --pty -p interactive -t 0-2:30 --mem 1G /bin/bash` to get on a compute node.
3. Proceed once your command prompt has the word `compute` in it.
4. If you log out between lessons (using the `exit` command twice), please follow points 1. and 2. above to log back in and get on a compute node when you restart with the self-learning.
II. Complete the exercises:
- Each lesson above contains exercises; please go through each of them.
- Copy over your solutions into the Google Form the day before the next class.
- If you get stuck due to an error while running code in the lesson, email us.
Time | Topic | Instructor |
---|---|---|
9:30 - 10:10 | Self-learning lessons review | All |
10:10 - 10:55 | Shell scripts and variables in Shell | Upen |
10:55 - 12:00 | Loops and automation | Will |
I. Please study the contents and work through all the code within the following lessons:
- Permissions and Environment Variables
When using a multi-user system like the O2 cluster, you may want to limit access to your work. Permissions exist to clearly delineate who has the ability to read, write and execute your files.
Also, when working in a UNIX system, there is a core set of default variables that control the behavior of your command line. One of the most important of these is the $PATH variable, which tells the system where to look for the commands that you give it.
This lesson will cover:
- Interpreting and modifying existing permissions
- Querying environment variables
- Reading and appending to the $PATH variable
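A minimal sketch of these three steps, assuming a hypothetical script called `my_tool` created in a temporary directory. The change to `$PATH` made here lasts only for the current shell session.

```shell
# Create a small, hypothetical script in a temporary directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf '#!/bin/sh\necho hello from my_tool\n' > my_tool

# Interpret permissions: the first column shows read (r), write (w) and
# execute (x) for the user, the group, and everyone else, in that order.
ls -l my_tool

# Modify permissions: let the owner (u) execute the file, and make sure
# the group (g) and others (o) cannot write to it.
chmod u+x my_tool
chmod go-w my_tool
ls -l my_tool

# Query an environment variable.
echo "$PATH"

# Append a directory to $PATH so the shell also looks there for commands.
PATH="$PATH:$demo_dir"

# 'command -v' reports where the shell now finds the command.
tool_path=$(command -v my_tool)
echo "$tool_path"
```

Appending (rather than prepending) means that directories already listed in `$PATH` are searched first, so existing commands keep their priority.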
- Introduction to High-performance computing
Now that you have had a chance to explore the O2 cluster, let's focus on the components of this system, how it differs from your personal computer, and the advantages that it offers in terms of parallelization.
This lesson will cover:
- Differentiating a high-performance computing cluster like O2 from your personal computer
- Discussing the large parallelization advantage that O2 has over a personal computer
NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word `compute` in it). For login instructions, please see above.
II. Complete the exercises:
- Each lesson above contains exercises; please go through each of them.
- Copy over your solutions into the Google Form the day before the next class.
- If you get stuck due to an error while running code in the lesson, email us.
Time | Topic | Instructor |
---|---|---|
9:30 - 10:00 | Self-learning lessons review | All |
10:00 - 11:00 | Introduction to the O2 cluster | Will |
11:00 - 11:30 | Exercise (answer key) | Upen |
11:30 - 11:45 | Introduction to the O2 cluster - data storage | Will |
11:45 - 12:00 | Wrap up | Upen |
Introduction to Shell: Dataset
If you are interested in learning some more advanced tools for working on the command line, we encourage you to walk through the materials linked below:
Cheat sheets:
- http://fosswire.com/post/2007/08/unixlinux-command-cheat-sheet/
- https://github.com/swcarpentry/boot-camps/blob/master/shell/shell_cheatsheet.md
- tldr: Simplified version of the `man` pages (online and searchable)
Online tutorials:
- Explain Shell
- Introduction to the Command Line for Genomics
- BASH Programming - Introduction HOW-TO
- Bioinformatics from the Command Line
General help:
- Google it! If you don't know how to do something, try Googling it; other people have probably had the same question.
- Learn by doing! There's no real other way to learn this than by trying it out.
- Use vim on your laptop
- Move around the directory structure on your laptop using the Terminal/Shell counts
- Open folders and files using the command `open`
- Automate something you don't really need to automate
- Use `man bash` to get more information about bash (Bourne-again shell)
These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- Some materials used in these lessons were derived from work that is Copyright © Data Carpentry (http://datacarpentry.org/). All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).