Costing of 1/40th simulation #23
Costing for storage: the current 1/20th standard output is 16 GB per month = 192 GB / year. The current 1/20th daily output is 4.3 GB per month, covering 7 variables at daily frequency. For the 1/40th we might want u, v, and transport on 3 density surfaces, plus the depth of those density surfaces, so 8 variables at 3-hourly frequency. Scaling up (4× the grid points, 8/7 the variables, 8× the output frequency) gives about 157 GB per month ≈ 2 TB per year, so asking for 20 TB would probably be sufficient. Thoughts on any of this appreciated.
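To make that arithmetic explicit, here's a rough sketch of how the 157 GB / 2 TB figures come about. The scaling factors (4× grid points, 8/7 variables, daily to 3-hourly) are assumptions based on the numbers above, not measurements:

```python
# Rough storage arithmetic for the proposed 1/40th 3-hourly output, assuming
# output volume scales linearly with grid points, variable count and frequency.

gb_per_month_1_20_daily = 4.3   # 7 variables at daily frequency on the 1/20th grid
var_factor = 8 / 7              # 7 variables -> 8 (u, v, transport on 3 surfaces + depths)
freq_factor = 24 / 3            # daily -> 3-hourly = 8x more records
grid_factor = 4                 # 1/40th has ~4x the horizontal grid points of the 1/20th

gb_per_month_1_40 = gb_per_month_1_20_daily * var_factor * freq_factor * grid_factor
tb_per_year = gb_per_month_1_40 * 12 / 1000

print(f"{gb_per_month_1_40:.0f} GB / month, {tb_per_year:.1f} TB / year")
# -> ~157 GB / month, ~1.9 TB / year, so ~20 TB would cover a 10-year run
```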
I am hoping that, with a little work, we can get more scaling out of the model (perhaps 30k cores?). I saw that Ben asked if we needed help on the model; perhaps we could ask for Paul or Rui to help @micaeljtoliveira with the profiling?
These estimates assume perfect parallel speedup, which is a best-case scenario and unlikely in practice. More cores also means more crashes, so actual throughput may not improve as much as we'd like, and the SU requirement with more cores would be inflated by both crashes and imperfect parallel scaling. So these SU estimates are a lower bound, and we won't know how close we can get to it without doing test runs.
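To make the lower-bound point concrete, a toy sketch; the efficiency and crash-overhead figures here are made up purely for illustration:

```python
# Hypothetical illustration of how imperfect scaling and crashes inflate the
# ideal-scaling SU estimate. None of these factors have been measured yet.

ideal_msu_per_yr = 2.1          # ideal-scaling estimate for the 1/40th (see the costing further down)

parallel_efficiency = 0.8       # assumed: 80% of ideal speedup at ~15000 cores
crash_overhead = 0.1            # assumed: 10% of walltime lost to crashes and reruns

actual_msu_per_yr = ideal_msu_per_yr / parallel_efficiency * (1 + crash_overhead)
print(f"~{actual_msu_per_yr:.1f} MSU / year")   # ~2.9 MSU/yr under these assumptions
```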
Yep, agreed. I've passed these numbers on to Al and Ben at NCI and indicated that they're a lower bound. I've also asked whether they can help with optimisation to see if we can get any speed-up.
This could be useful if we decide to do more fine-grained profiling. Currently I'm using the timings provided by MOM6, but these cover relatively large portions of the code. To get more detailed profiling we will probably need specialized HPC instrumentation tools, and NCI staff will surely have lots of experience with those.
Looks like the actual numbers are a bit better than predicted when using ~10000 cores. The question now is whether we can use more cores and what the parallel efficiency would be in that case.
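For that follow-up question, parallel efficiency from two benchmark runs is just the measured speedup divided by the ideal speedup; a minimal sketch, with placeholder timings rather than our actual numbers:

```python
# Parallel efficiency relative to a reference run: 1.0 means perfect scaling.
def parallel_efficiency(n_ref, t_ref, n_new, t_new):
    speedup = t_ref / t_new            # how much faster the bigger run actually is
    ideal_speedup = n_new / n_ref      # how much faster it would be with perfect scaling
    return speedup / ideal_speedup

# Hypothetical example: 2.75 h/month on ~10000 cores vs 1.6 h/month on 20000 cores
print(parallel_efficiency(10000, 2.75, 20000, 1.6))   # ~0.86
```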
Update on the cost: the Panan 1/40th has been running for a few months now, and it has indeed been using ~1.5 MSU/yr. With the tile collation I believe the final cost of the 10-year simulation will be about 16 MSU.
NCI have asked for an updated costing of our planned 1/40th run. Based on scaling up the 1/20th, I have:
1/20th:
cpus: 3744
Walltime: 2 hr 45 min for 1 month = 33 hours for 1 year
SU: 22 kSU for 1 month = 264 kSU for 1 year
1/40th:
cpus: 4 × the 1/20th cpus ≈ 15000 cpus
Walltime: 2× (due to the halved time step) = 66 hours / year
SU: 8× (4× cells × 2× time steps) = 2.1 MSU / year
If we run 10 years on 15000 cpus, that will cost 21 MSU (plus more for crashes and for when we output high-frequency data) and take 28 days…
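Writing those scaling assumptions out explicitly (ideal scaling only, i.e. 4× grid points and a halved time step, ignoring crashes and extra output):

```python
# Ideal-scaling arithmetic from the measured 1/20th numbers to the 1/40th estimate.

cpus_1_20 = 3744
walltime_hr_per_yr_1_20 = 33       # 2 hr 45 min per month
ksu_per_yr_1_20 = 264              # 22 kSU per month

cpus_1_40 = 4 * cpus_1_20                                # ~15000 cores (4x grid points)
walltime_hr_per_yr_1_40 = 2 * walltime_hr_per_yr_1_20    # 66 hours / year (2x time steps)
msu_per_yr_1_40 = 8 * ksu_per_yr_1_20 / 1000             # ~2.1 MSU / year (4x cells * 2x steps)

years = 10
print(f"{years} years: ~{years * msu_per_yr_1_40:.0f} MSU, "
      f"~{years * walltime_hr_per_yr_1_40 / 24:.0f} days of continuous running")
# -> ~21 MSU and ~28 days
```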
It sounds like we may need to ask for somewhat more than that, because their current benchmarking suggests the new cores' performance is not as good as the existing Cascade Lakes. One problem I see here is that this is going to take 28 days (assuming it runs continuously)! Do we think there's any possibility of scaling this up to more cores to get more throughput? @micaeljtoliveira @angus-g @aekiss @AndyHoggANU ?