
add compy (login and compute) and chrlogin to globus endpoints #275

Merged
merged 1 commit into from
Jul 21, 2023


@mahf708 (Contributor) commented Jun 19, 2023

Fixes #274.

  • Especially important to review: a controversial workaround that steers the regex to Compy when `HOSTNAME` starts with `compy` and the FQDN matches `n*.local`.
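The workaround described above can be sketched as a small pre-processing step before the regex lookup. This is a hypothetical illustration, not the code merged in this PR: the function name `effective_hostname`, the stand-in name `compy.pnl.gov`, and translating the glob `n*.local` into the regex `n.*\.local` are all assumptions, chosen so that the result matches the `compy.*\.pnl\.gov` entry in the endpoint map.

```python
import re
from typing import Optional

# Hypothetical: the glob "n*.local" from the PR description, written as a regex.
COMPY_COMPUTE_FQDN = re.compile(r"n.*\.local")


def effective_hostname(hostname_env: Optional[str], fqdn: str) -> str:
    """Return the name to match against the endpoint regexes (sketch only).

    Compy compute nodes are assumed to report an FQDN like "n0042.local",
    which matches none of the site patterns. If the HOSTNAME environment
    variable starts with "compy", substitute a name that the Compy
    pattern accepts; otherwise return the FQDN unchanged.
    """
    if (
        hostname_env is not None
        and hostname_env.startswith("compy")
        and COMPY_COMPUTE_FQDN.fullmatch(fqdn)
    ):
        return "compy.pnl.gov"  # hypothetical stand-in name
    return fqdn
```

In practice the two inputs would come from `os.environ.get("HOSTNAME")` and `socket.getfqdn()`. Keeping the function pure (no environment access inside) makes the workaround easy to test on any machine.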

@mahf708 force-pushed the compy_zstashini branch 2 times, most recently from 5964d94 to 558fdf3 (June 19, 2023 14:09)
@mahf708 changed the title from "fix endpoints regex?" to "@mahf708 add compy (login and compute) and chrlogin to globus endpoints" Jun 19, 2023
@mahf708 changed the title from "@mahf708 add compy (login and compute) and chrlogin to globus endpoints" to "add compy (login and compute) and chrlogin to globus endpoints" Jun 19, 2023
@mahf708 marked this pull request as ready for review June 20, 2023 15:30
```python
r"b\d+\.lcrc\.anl\.gov": "61f9954c-a4fa-11ea-8f07-0a21f750d19b",
r"chr.*\.lcrc\.anl\.gov": "61f9954c-a4fa-11ea-8f07-0a21f750d19b",
r"cori.*\.nersc\.gov": "9d6d99eb-6d04-11e5-ba46-22000b92c6ec",
r"compy.*\.pnl\.gov": "68fbd2fa-83d7-11e9-8e63-029d279f7e24",
r"perlmutter.*\.nersc\.gov": "6bdc7956-fc0f-4ad2-989c-7aa5ee643a79",  # If this doesn't work, use cori
```
@forsyth2 (Collaborator):

You make a good point about including the compute nodes for Compy. I think Perlmutter compute nodes don't fit this pattern either.

I usually run zstash from a login node, but I suppose some may find it useful to run a large Globus transfer from a compute node.
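To see why compute nodes slip through, here is a minimal sketch of a first-match lookup over the endpoint entries quoted above. It assumes the lookup applies `re.fullmatch` to the FQDN (zstash's actual matching call may differ), and the function name `find_endpoint` and node names such as `compy01.pnl.gov`, `n0042.local`, and `nid001234` are hypothetical examples, not confirmed hostnames.

```python
import re
from typing import Optional

# The hostname-regex -> Globus endpoint UUID entries quoted in the review above.
ENDPOINTS = {
    r"b\d+\.lcrc\.anl\.gov": "61f9954c-a4fa-11ea-8f07-0a21f750d19b",
    r"chr.*\.lcrc\.anl\.gov": "61f9954c-a4fa-11ea-8f07-0a21f750d19b",
    r"cori.*\.nersc\.gov": "9d6d99eb-6d04-11e5-ba46-22000b92c6ec",
    r"compy.*\.pnl\.gov": "68fbd2fa-83d7-11e9-8e63-029d279f7e24",
    r"perlmutter.*\.nersc\.gov": "6bdc7956-fc0f-4ad2-989c-7aa5ee643a79",
}


def find_endpoint(fqdn: str) -> Optional[str]:
    """Return the UUID of the first pattern matching the whole FQDN, else None."""
    for pattern, uuid in ENDPOINTS.items():
        if re.fullmatch(pattern, fqdn):
            return uuid
    return None  # no site pattern matched, e.g. a bare compute-node name
```

Under these assumptions, login-style names resolve to an endpoint, while compute-node names that carry no site domain (such as the hypothetical `n0042.local` or `nid001234`) return `None`, which is the gap this PR addresses for Compy.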

@mahf708 (Contributor, Author):

I was going to suggest at some point that, instead of asking people to use `screen`, we could suggest an alternative (one I personally prefer): submit the zstash job as a compute job. But first, I want to investigate what we can do to make use of a full node (e.g., more threading).

@forsyth2 (Collaborator) commented Jun 22, 2023

@mahf708 The issue I discovered with using compute nodes is that Globus transfers sometimes take longer than the maximum wall-clock time (12 hours on Perlmutter, according to https://docs.nersc.gov/jobs/policy/). So I think we'd need to set up some sort of restart-file system (as E3SM itself does) and use `zstash update`.
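A back-of-the-envelope way to see the wall-clock problem is to divide the expected transfer time by the job limit. In this sketch, the function name `jobs_needed`, the data volume, and the throughput are hypothetical inputs for illustration; only the 12-hour Perlmutter limit comes from the comment above.

```python
import math


def jobs_needed(total_tb: float, rate_tb_per_hour: float, wall_clock_hours: float = 12.0) -> int:
    """Estimate how many back-to-back wall-clock-limited jobs a transfer needs.

    Assumes a constant (hypothetical) throughput and that each resubmitted job
    picks up where the previous one stopped, e.g. via a restart mechanism.
    """
    if rate_tb_per_hour <= 0 or wall_clock_hours <= 0:
        raise ValueError("rate and wall-clock limit must be positive")
    hours = total_tb / rate_tb_per_hour  # total transfer time in hours
    return max(1, math.ceil(hours / wall_clock_hours))
```

For example, 30 TB at an assumed 1 TB/hour takes 30 hours, so it would need three 12-hour jobs; anything beyond one job is where restarting with `zstash update` would come in.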

@mahf708 (Contributor, Author) commented Jun 22, 2023

@forsyth2 I haven't tested these changes "in production" yet. Would you like me to test them before we proceed, or do you think your test suite will be able to handle them?

@forsyth2 (Collaborator):

@mahf708 The only test that uses Globus is this one: https://github.com/E3SM-Project/zstash/blob/main/tests/test_globus.py#L182. I suppose we could run that test on both a Compy login node and a Compy compute node.

@forsyth2 (Collaborator) commented Jul 21, 2023

> Especially important to review: a controversial workaround that steers the regex to Compy when `HOSTNAME` starts with `compy` and the FQDN matches `n*.local`.

Unfortunately, I don't know much about the `HOSTNAME` issue (I rarely run zstash from compute nodes).

That said, I ran `python -m unittest tests/test_globus.py` on login nodes and compute nodes for Chrysalis, Perlmutter, and Compy (using this branch), and it passed in all 6 cases. (For reference, I ran on compute nodes using `salloc -n 1` on Chrysalis and Compy, and `salloc --constraint=cpu -n 1 --qos=regular` on Perlmutter.)

So, I think this is good to merge. Thanks @mahf708

@forsyth2 merged commit ea71351 into E3SM-Project:main Jul 21, 2023
Development

Successfully merging this pull request may close these issues.

Add Compy endpoint to Globus file
2 participants