EC2, DGX and S3
This wiki page collects general knowledge on cloud computing and useful things to know when working on AWS or the DGX.
DGX01 is an IP group-specific server with a large NVIDIA graphics card, so it can be used for larger calculations (especially when GPUs are needed). Sign in with your Google credentials at http://dgx01.broadinstitute.org. There is a better way to connect, which makes downloading files easier; ask in the IP group.
The home directory of the DGX is very small, so never store things there. Store everything under /dgx1nas1/storage/data.
You will find the data relevant to this project under /dgx1nas1/cellpainting-datasets/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/
There is a fair amount of overlap and duplicated structure across the S3 buckets, so be careful about which data is most up to date. I did most of my work on S3 because the DGX was overloaded and I needed more compute, but there are still some scattered files from me under /dgx1nas1/storage/data/meikelb/.
As mentioned, S3 holds much of the same data as the DGX: https://s3.console.aws.amazon.com/s3/buckets/imaging-platform?prefix=projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/
For the first part of the project, two things are important: the compressed images of the LINCS dataset and the cell locations. Check out the DeepProfiler page of this wiki for more details.
Location files: s3://imaging-platform/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/locations/
- Everything on S3 can be done via the AWS CLI.
- How to pull a single file from S3:

  ```
  aws s3 cp s3://my_bucket/my_folder/my_file.ext my_copied_file.ext
  ```

- Use `--recursive` to pull folders (see the example below).
- Downloading from S3 to EC2 is faster than downloading to your local machine, but still not very fast, so try to download or upload only once.
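As a concrete example, here is a sketch of pulling the location files for a single plate. The plate name SQ00014812 is purely illustrative (not a value from this project); list the prefix first to see what actually exists:

```
# see what is under the locations prefix
aws s3 ls s3://imaging-platform/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/locations/

# pull one plate's location files recursively (SQ00014812 is a hypothetical plate name)
aws s3 cp --recursive \
  s3://imaging-platform/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace/deep_learning/locations/SQ00014812/ \
  locations/SQ00014812/
```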
Read the internal IP wiki and the AWS launch documentation before launching instances.
For DeepProfiler, we want to optimize the speed of training and profiling. This means running TensorFlow on GPUs. Even the smallest GPU instance speeds up DeepProfiler (DP) roughly 40-fold. The instances I have tried out so far are the P2 and P3 families; the P2 seems to be the cheapest of the GPU instances.
I used the Deep Learning AMI (Ubuntu 18.04) Version 43.0 and installed DeepProfiler on top. This image comes with TensorFlow and the other relevant deep learning tooling. Make the local EBS volume around 150 GB.
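A minimal CLI sketch of launching such an instance; the AMI ID, key pair, and security group below are placeholders, not values from this project (look up the Deep Learning AMI ID for your region in the console):

```
# launch a p2.xlarge with a 150 GB root EBS volume (all IDs are placeholders)
aws ec2 run-instances \
  --image-id ami-XXXXXXXX \
  --instance-type p2.xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-XXXXXXXX \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":150}}]'

# once logged in, confirm the GPU is visible
nvidia-smi
```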
Move files from EC2 to local or vice versa:

```
# local → EC2
scp -i /path/my-key-pair.pem /path/my-file.txt ubuntu@my-instance-public-dns-name:path/
# EC2 → local
scp -i /path/my-key-pair.pem ubuntu@my-instance-public-dns-name:path/my-file.txt /path/
```
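If you move many files back and forth, rsync over SSH only re-copies what has changed; a sketch, assuming rsync is installed on both ends:

```
rsync -avz -e "ssh -i /path/my-key-pair.pem" \
  /path/my-dir/ ubuntu@my-instance-public-dns-name:path/my-dir/
```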
Send all bash output to a logfile while also displaying it in the terminal:

```
yourcommand 2>&1 | tee logs/logfile.txt
```
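Use `tee -a logs/logfile.txt` instead if you want to append to an existing logfile rather than overwrite it.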
Attach your volume to this instance:

```
mkdir TARGETNAME
# check the name of the volume
lsblk
sudo mount /dev/VOLUMENAME /home/ubuntu/TARGETNAME
sudo chmod 777 ~/TARGETNAME/
```
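If you want the mount to survive a reboot, you can add it to /etc/fstab; a sketch, assuming an ext4 filesystem:

```
# find the volume's UUID
sudo blkid /dev/VOLUMENAME
# append an fstab entry (replace <uuid> with the value printed above)
echo 'UUID=<uuid> /home/ubuntu/TARGETNAME ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
```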
After increasing the size of a volume, grow the filesystem:

```
sudo resize2fs /dev/xvdf
```
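Note that if the filesystem sits on a partition rather than on the bare device (e.g. the root volume), you have to grow the partition first; a sketch for a root volume at /dev/xvda1:

```
sudo growpart /dev/xvda 1
sudo resize2fs /dev/xvda1
```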
Use tmux and, more importantly, detach your sessions so they don't die when you lose your SSH connection!
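Minimal tmux workflow:

```
tmux new -s training        # start a named session
# run your long job inside the session, then detach with Ctrl-b followed by d
tmux attach -t training     # reattach after reconnecting over SSH
tmux ls                     # list running sessions
```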