Add GCC 9.3.0 & 10.3.0 to NESSI.NO 2022.11 #46

Merged

Conversation

poksumdo
Collaborator

No description provided.

@trz42
Owner

trz42 commented Nov 16, 2022

Verified patch. Seems to work. Setting bot:build label.

trz42 self-assigned this Nov 16, 2022
trz42 added the bot:build (Instruct bot to build software stack) label Nov 16, 2022
@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance Fox-PR62 for architecture x86_64-amd-zen2 in job dir /fp/projects01/ec88/pilot.nessi/PR62/jobs/2022.11/pr_46/153616

date job status comment
Nov 16 21:47:23 UTC 2022 submitted job id 153616 awaits release by job manager
Nov 16 21:48:20 UTC 2022 released job awaits launch by Slurm scheduler
Nov 16 23:55:49 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-amd-zen2-1668642724.tar.gz (1.561 GiB) in job dir
Nov 20 19:49:47 UTC 2022 uploaded transfer of eessi-2022.11-software-linux-x86_64-amd-zen2-1668642724.tar.gz to S3 bucket succeeded
Nov 20 07:55:47 PM UTC 2022 staged tarball eessi-2022.11-software-linux-x86_64-amd-zen2-1668642724.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/164 for approval
Nov 20 08:12:46 PM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-x86_64-amd-zen2-1668642724.tar.gz approved, see PR https://github.com/trz42/staging/pull/164
Nov 20 08:16:19 PM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-x86_64-amd-zen2-1668642724.tar.gz successfully ingested at 2022.11/software/linux/x86_64/amd/zen2/
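
The tarball names and ingestion paths in these status tables appear to follow a fixed pattern. The sketch below is inferred only from the log lines in this conversation, not from the bot's source: the regular expression, `ingest_path` and `build_time` are hypothetical names, and the trailing number is assumed to be a Unix epoch timestamp.

```python
import re
from datetime import datetime, timezone

# Assumed layout (inferred from the bot comments, not from the bot's code):
#   eessi-<version>-<layer>-<os>-<arch parts joined by '-'>-<epoch>.tar.gz
TARBALL_RE = re.compile(
    r"^eessi-(?P<version>[^-]+)-(?P<layer>[^-]+)-(?P<os>[^-]+)"
    r"-(?P<arch>.+)-(?P<epoch>\d+)\.tar\.gz$"
)


def _parse(tarball: str) -> "re.Match[str]":
    match = TARBALL_RE.match(tarball)
    if match is None:
        raise ValueError(f"unexpected tarball name: {tarball}")
    return match


def ingest_path(tarball: str) -> str:
    """Map a tarball name to the directory it is reported as ingested at."""
    m = _parse(tarball)
    # 'x86_64-amd-zen2' becomes 'x86_64/amd/zen2'; 'aarch64-generic' becomes 'aarch64/generic'.
    arch_dirs = m["arch"].replace("-", "/")
    return f"{m['version']}/{m['layer']}/{m['os']}/{arch_dirs}/"


def build_time(tarball: str) -> datetime:
    """Interpret the trailing number as seconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(int(_parse(tarball)["epoch"]), tz=timezone.utc)


name = "eessi-2022.11-software-linux-x86_64-amd-zen2-1668642724.tar.gz"
print(ingest_path(name))  # 2022.11/software/linux/x86_64/amd/zen2/
print(build_time(name))
```

For the zen2 tarball above, the computed path matches the "ingested at" entry in the table.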

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance Fram-PR62 for architecture x86_64-intel-broadwell in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/4628033

date job status comment
Nov 16 21:47:24 UTC 2022 submitted job id 4628033 awaits release by job manager
Nov 16 21:48:26 UTC 2022 released job awaits launch by Slurm scheduler
Nov 17 01:23:44 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-intel-broadwell-1668647903.tar.gz (1.561 GiB) in job dir
Nov 20 19:50:10 UTC 2022 uploaded transfer of eessi-2022.11-software-linux-x86_64-intel-broadwell-1668647903.tar.gz to S3 bucket succeeded
Nov 20 07:57:06 PM UTC 2022 staged tarball eessi-2022.11-software-linux-x86_64-intel-broadwell-1668647903.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/165 for approval
Nov 20 08:17:42 PM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-x86_64-intel-broadwell-1668647903.tar.gz approved, see PR https://github.com/trz42/staging/pull/165
Nov 20 08:21:06 PM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-x86_64-intel-broadwell-1668647903.tar.gz successfully ingested at 2022.11/software/linux/x86_64/intel/broadwell/

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance Betzy-PR62 for architecture x86_64-amd-zen2 in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/483955

date job status comment
Nov 16 21:47:24 UTC 2022 submitted job id 483955 awaits release by job manager
Nov 16 21:47:27 UTC 2022 released job awaits launch by Slurm scheduler
Nov 16 22:32:32 UTC 2022 finished 😢 FAILURE
  • Found slurm output slurm-483955.out in job dir
  • Slurm output lacks message "No missing modules!".

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance CitC-PR62 for architecture x86_64-intel-haswell in job dir /mnt/shared/home/trz42/pilot.nessi/PR62/jobs/2022.11/pr_46/3158

date job status comment
Nov 16 09:47:31 PM UTC 2022 submitted job id 3158 awaits release by job manager
Nov 16 09:48:31 PM UTC 2022 released job awaits launch by Slurm scheduler
Nov 17 01:46:10 AM UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-intel-haswell-1668649304.tar.gz (1.561 GiB) in job dir
Nov 20 07:47:24 PM UTC 2022 uploaded transfer of eessi-2022.11-software-linux-x86_64-intel-haswell-1668649304.tar.gz to S3 bucket succeeded
Nov 20 07:59:37 PM UTC 2022 staged tarball eessi-2022.11-software-linux-x86_64-intel-haswell-1668649304.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/167 for approval
Nov 20 08:24:44 PM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-x86_64-intel-haswell-1668649304.tar.gz approved, see PR https://github.com/trz42/staging/pull/167
Nov 20 08:28:09 PM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-x86_64-intel-haswell-1668649304.tar.gz successfully ingested at 2022.11/software/linux/x86_64/intel/haswell/

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance CitC-PR62 for architecture aarch64-generic in job dir /mnt/shared/home/trz42/pilot.nessi/PR62/jobs/2022.11/pr_46/3159

date job status comment
Nov 16 09:47:33 PM UTC 2022 submitted job id 3159 awaits release by job manager
Nov 16 09:48:29 PM UTC 2022 released job awaits launch by Slurm scheduler
Nov 17 12:26:54 AM UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-aarch64-generic-1668644506.tar.gz (1.507 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance CitC-PR62 for architecture aarch64-graviton2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR62/jobs/2022.11/pr_46/3160

date job status comment
Nov 16 09:47:34 PM UTC 2022 submitted job id 3160 awaits release by job manager
Nov 16 09:48:27 PM UTC 2022 released job awaits launch by Slurm scheduler
Nov 17 12:27:56 AM UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-aarch64-graviton2-1668644548.tar.gz (1.507 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance Saga-PR62 for architecture x86_64-intel-cascadelake in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/7166783

date job status comment
Nov 16 21:47:38 UTC 2022 submitted job id 7166783 awaits release by job manager
Nov 16 21:48:33 UTC 2022 released job awaits launch by Slurm scheduler
Nov 17 00:13:46 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-intel-cascadelake-1668643776.tar.gz (1.561 GiB) in job dir
Nov 20 19:50:05 UTC 2022 uploaded transfer of eessi-2022.11-software-linux-x86_64-intel-cascadelake-1668643776.tar.gz to S3 bucket succeeded
Nov 20 07:58:18 PM UTC 2022 staged tarball eessi-2022.11-software-linux-x86_64-intel-cascadelake-1668643776.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/166 for approval
Nov 20 08:21:14 PM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-x86_64-intel-cascadelake-1668643776.tar.gz approved, see PR https://github.com/trz42/staging/pull/166
Nov 20 08:24:37 PM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-x86_64-intel-cascadelake-1668643776.tar.gz successfully ingested at 2022.11/software/linux/x86_64/intel/cascadelake/

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance Saga-PR62 for architecture x86_64-intel-skylake_avx512 in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/7166784

date job status comment
Nov 16 21:47:41 UTC 2022 submitted job id 7166784 awaits release by job manager
Nov 16 21:48:31 UTC 2022 released job awaits launch by Slurm scheduler
Nov 16 22:21:24 UTC 2022 finished 😢 FAILURE
  • Found slurm output slurm-7166784.out in job dir
  • Slurm output lacks message "No missing modules!".
  • Slurm output lacks message about created tarball.
  • No tarball matching eessi-*software-*.tar.gz found in job dir.
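
The failure reasons listed by the bot suggest a small set of checks on the job directory. The following is a minimal sketch of two of them, assuming the Slurm output file is named slurm-<jobid>.out as in the comments above; `check_job_result` is a hypothetical name and is not taken from the bot's code. The check for the "created tarball" message is left out because its exact wording does not appear in this conversation.

```python
from pathlib import Path
from typing import List


def check_job_result(job_dir: str, job_id: int) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems: List[str] = []
    job_path = Path(job_dir)

    # Check 1: the Slurm output must contain the "No missing modules!" message.
    slurm_out = job_path / f"slurm-{job_id}.out"
    if slurm_out.is_file():
        text = slurm_out.read_text(errors="replace")
    else:
        problems.append(f"No slurm output {slurm_out.name} in job dir.")
        text = ""
    if "No missing modules!" not in text:
        problems.append('Slurm output lacks message "No missing modules!".')

    # Check 2: at least one tarball matching the reported glob must exist.
    if not list(job_path.glob("eessi-*software-*.tar.gz")):
        problems.append(
            "No tarball matching eessi-*software-*.tar.gz found in job dir."
        )
    return problems
```

Returning the list of problems rather than a boolean makes it straightforward to render them as the bullet points seen in the FAILURE rows.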

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance eX3-PR62 for architecture x86_64-amd-zen in job dir /home/thomarob/pilot.nessi/PR62/jobs/2022.11/pr_46/406318

date job status comment
Nov 16 21:47:32 UTC 2022 submitted job id 406318 awaits release by job manager
Nov 16 21:48:01 UTC 2022 released job awaits launch by Slurm scheduler
Nov 17 04:10:36 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-amd-zen-1668657912.tar.gz (1.562 GiB) in job dir
Nov 20 19:49:56 UTC 2022 uploaded transfer of eessi-2022.11-software-linux-x86_64-amd-zen-1668657912.tar.gz to S3 bucket succeeded
Nov 20 07:54:36 PM UTC 2022 staged tarball eessi-2022.11-software-linux-x86_64-amd-zen-1668657912.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/163 for approval
Nov 20 08:09:13 PM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-x86_64-amd-zen-1668657912.tar.gz approved, see PR https://github.com/trz42/staging/pull/163
Nov 20 08:12:39 PM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-x86_64-amd-zen-1668657912.tar.gz successfully ingested at 2022.11/software/linux/x86_64/amd/zen/

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance Saga-PR62 for architecture x86_64-generic in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/7166785

date job status comment
Nov 16 21:47:45 UTC 2022 submitted job id 7166785 awaits release by job manager
Nov 16 21:48:27 UTC 2022 released job awaits launch by Slurm scheduler
Nov 16 22:21:27 UTC 2022 finished 😢 FAILURE
  • Found slurm output slurm-7166785.out in job dir
  • Slurm output lacks message "No missing modules!".
  • Slurm output lacks message about created tarball.
  • No tarball matching eessi-*software-*.tar.gz found in job dir.

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance eX3-PR62 for architecture x86_64-amd-zen3 in job dir /home/thomarob/pilot.nessi/PR62/jobs/2022.11/pr_46/406319

date job status comment
Nov 16 21:47:34 UTC 2022 submitted job id 406319 awaits release by job manager
Nov 16 21:48:03 UTC 2022 released job awaits launch by Slurm scheduler
Nov 17 02:10:25 UTC 2022 finished 😢 FAILURE
  • Found slurm output slurm-406319.out in job dir
  • Slurm output lacks message "No missing modules!".

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance eX3-PR62 for architecture aarch64-generic in job dir /home/thomarob/pilot.nessi/PR62/jobs/2022.11/pr_46/406320

date job status comment
Nov 16 21:47:35 UTC 2022 submitted job id 406320 awaits release by job manager
Nov 16 21:47:54 UTC 2022 released job awaits launch by Slurm scheduler
Nov 17 01:41:20 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz (1.507 GiB) in job dir
Nov 20 19:52:43 UTC 2022 uploaded transfer of eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz to S3 bucket succeeded
Nov 20 08:01:52 PM UTC 2022 staged tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/168 for approval
Nov 20 08:32:07 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168
Nov 20 08:40:28 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168
Nov 20 08:43:11 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168
Nov 20 08:47:56 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168
Nov 20 08:49:25 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168
Nov 20 08:50:53 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168
Nov 20 08:52:22 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168
Nov 20 08:53:52 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168
Nov 20 08:55:20 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168
Nov 20 08:56:49 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168
Nov 20 08:58:18 PM UTC 2022 rejected 👎 tarball eessi-2022.11-software-linux-aarch64-generic-1668648808.tar.gz rejected, see PR https://github.com/trz42/staging/pull/168

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 16, 2022

New job on instance eX3-PR62 for architecture aarch64-thunderx2 in job dir /home/thomarob/pilot.nessi/PR62/jobs/2022.11/pr_46/406321

date job status comment
Nov 16 21:47:36 UTC 2022 submitted job id 406321 awaits release by job manager
Nov 16 21:48:00 UTC 2022 released job awaits launch by Slurm scheduler
Nov 17 01:41:21 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-aarch64-thunderx2-1668648830.tar.gz (1.507 GiB) in job dir
Nov 20 19:59:04 UTC 2022 uploaded transfer of eessi-2022.11-software-linux-aarch64-thunderx2-1668648830.tar.gz to S3 bucket succeeded
Nov 20 20:39:11 UTC 2022 uploaded transfer of eessi-2022.11-software-linux-aarch64-thunderx2-1668648830.tar.gz to S3 bucket succeeded
Nov 20 08:41:46 PM UTC 2022 staged tarball eessi-2022.11-software-linux-aarch64-thunderx2-1668648830.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/171 for approval
Nov 20 08:43:22 PM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-aarch64-thunderx2-1668648830.tar.gz approved, see PR https://github.com/trz42/staging/pull/171
Nov 20 08:46:31 PM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-aarch64-thunderx2-1668648830.tar.gz successfully ingested at 2022.11/software/linux/aarch64/thunderx2/

@trz42
Owner

trz42 commented Nov 18, 2022

Testing fix(es) to handle non-bot jobs. Begin with the unfixed version (the job manager should crash in step 2). Continue with the partial fix provided in PR63 (the job manager might still crash, but no longer in process_new_job). Lastly, test the additional changes provided by the trz42/eessi-bot-software-layer branch fix-non-bot-job-leaking.

  • 0. Disabled bot on all instances except the one on AWS.

  • 1. Resent bot:build ... wait until bot job runs.

  • 2. Submit an interactive job with srun -C shape=c4.2xlarge --pty /bin/bash ... check if job manager crashes.
    job manager log

    [20221118-T04:03:27] job manager main loop: current_jobs='3173,3172,3171,3174'
    [20221118-T04:03:27] job manager main loop: new_jobs='3174'
    [20221118-T04:03:27] process_new_job(): run scontrol command: /usr/bin/scontrol --oneliner show jobid 3174
    [20221118-T04:03:27] process_new_job(): work dir of job 3174: '/mnt/shared/home/trz42'
    [20221118-T04:03:27] process_new_job(): create a symlink: /mnt/shared/home/trz42/pilot.nessi/PR62/jobs/ids/submitted/3174 -> /mnt/shared/home/trz42
    [20221118-T04:03:27] process_new_job(): run scontrol command: /usr/bin/scontrol release 3174
    

    Note that in the log above the destination of the symlink for the non-bot job is the $HOME directory, because that is the non-bot job's working directory; the job was likely submitted from there.

    job manager crashes with

    Traceback (most recent call last):
      File "/usr/lib64/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib64/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/mnt/shared/home/trz42/pilot.nessi/PR62/eessi_bot_job_manager.py", line 475, in <module>
        main()
      File "/mnt/shared/home/trz42/pilot.nessi/PR62/eessi_bot_job_manager.py", line 453, in main
        job_manager.process_new_job(current_jobs[nj])
      File "/mnt/shared/home/trz42/pilot.nessi/PR62/eessi_bot_job_manager.py", line 181, in process_new_job
        repo_name = metadata_pr['repo'] or ''
    KeyError: 'repo'
    
  • 3. Stop bot (smee, job manager, event handler, logging)

  • 4. Remove non-bot job and its traces on filesystem (directory & symlink(s)).

  • 5. Get fix from EESSI/eessi-bot-software-layer (the first PR has already been merged). Restart the bot from a new environment (app.cfg copied from the PR62 environment; it should be compliant with main:HEAD). Observe 1-2 iterations of the job manager running without a crash.

  • 6. Submit an interactive job with srun -C shape=c4.2xlarge --pty /bin/bash ... check if job manager crashes. Should not crash in process_new_job. Check if interactive job is listed as new (new_jobs) and subsequently as known job (known_jobs in next iteration of job manager's main loop).
    event handler log: looks good, the non-bot job is skipped. Open questions: why does this message appear in the event handler log? Also, the variables ({job_metadata_path}, {jobid}) are not substituted in the messages.

    [20221118-T04:24:51] issue_comment event handled!
    [20221118-T04:25:38] No metadata file found at {job_metadata_path}, so not a bot job
    [20221118-T04:25:38] No metadata file found at {job_metadata_path} for job {jobid}, so skipping it
    

    job manager log, iteration $n$: new job 3175 is detected; it is a non-bot job

    [20221118-T04:25:38] job manager main loop: known_jobs='3173,3172,3171'
    [20221118-T04:25:38] get_current_jobs(): run squeue command: /usr/bin/squeue --long --user=trz42
    [20221118-T04:25:38] get_current_jobs(): squeue output
    b'Fri Nov 18 04:25:38 2022\n             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)\n              3173   compute eessi-bo    trz42  RUNNING      21:29 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0001\n              3172   compute eessi-bo    trz42  RUNNING      20:29 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0003\n              3171   compute eessi-bo    trz42  RUNNING      21:29 1-00:00:00      1 fair-mastodon-c4-2xlarge-0001\n              3175   compute     bash    trz42  RUNNING       0:20 UNLIMITED      1 fair-mastodon-c4-2xlarge-0002\n'
    [20221118-T04:25:38] job manager main loop: current_jobs='3173,3172,3171,3175'
    [20221118-T04:25:38] job manager main loop: new_jobs='3175'
    [20221118-T04:25:38] process_new_job(): run scontrol command: /usr/bin/scontrol --oneliner show jobid 3175
    [20221118-T04:25:38] process_new_job(): work dir of job 3175: '/tmp'
    [20221118-T04:25:38] job manager main loop: finished_jobs=''
    [20221118-T04:25:38] job manager main loop: sleep 60 seconds
    

    job manager log, iteration $n+1$: the previously new job 3175 (a non-bot job) has leaked into the list of known_jobs

    [20221118-T04:26:38] job manager main loop: known_jobs='3173,3172,3171,3175'
    [20221118-T04:26:38] get_current_jobs(): run squeue command: /usr/bin/squeue --long --user=trz42
    [20221118-T04:26:38] get_current_jobs(): squeue output
    b'Fri Nov 18 04:26:38 2022\n             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)\n              3173   compute eessi-bo    trz42  RUNNING      22:29 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0001\n              3172   compute eessi-bo    trz42  RUNNING      21:29 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0003\n              3171   compute eessi-bo    trz42  RUNNING      22:29 1-00:00:00      1 fair-mastodon-c4-2xlarge-0001\n              3175   compute     bash    trz42  RUNNING       1:20 UNLIMITED      1 fair-mastodon-c4-2xlarge-0002\n'
    [20221118-T04:26:38] job manager main loop: current_jobs='3173,3172,3171,3175'
    [20221118-T04:26:38] job manager main loop: new_jobs=''
    [20221118-T04:26:38] job manager main loop: finished_jobs=''
    
  • 7. If job manager did not crash (because the non-bot job did not finish yet), cancel the non-bot job and observe the behaviour of the job manager (might crash now.)

    job manager log: because job 3175 was in the list of known jobs (known_jobs) but is no longer in the list of current jobs (current_jobs), it is now considered a finished job (finished_jobs)

    [20221118-T04:28:38] job manager main loop: known_jobs='3173,3172,3171,3175'
    [20221118-T04:28:38] get_current_jobs(): run squeue command: /usr/bin/squeue --long --user=trz42
    [20221118-T04:28:38] get_current_jobs(): squeue output
    b'Fri Nov 18 04:28:38 2022\n             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)\n              3173   compute eessi-bo    trz42  RUNNING      24:29 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0001\n              3172   compute eessi-bo    trz42  RUNNING      23:29 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0003\n              3171   compute eessi-bo    trz42  RUNNING      24:29 1-00:00:00      1 fair-mastodon-c4-2xlarge-0001\n'
    [20221118-T04:28:38] job manager main loop: current_jobs='3173,3172,3171'
    [20221118-T04:28:38] job manager main loop: new_jobs=''
    [20221118-T04:28:38] job manager main loop: finished_jobs='3175'
    

    job manager crashes with

    Traceback (most recent call last):
      File "/usr/lib64/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib64/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/mnt/shared/home/trz42/pilot.nessi/non-bot-job-fix-part1/eessi_bot_job_manager.py", line 649, in <module>
        main()
      File "/mnt/shared/home/trz42/pilot.nessi/non-bot-job-fix-part1/eessi_bot_job_manager.py", line 630, in main
        job_manager.process_finished_job(known_jobs[fj])
      File "/mnt/shared/home/trz42/pilot.nessi/non-bot-job-fix-part1/eessi_bot_job_manager.py", line 345, in process_finished_job
        sym_dst = os.readlink(job_dir)
    FileNotFoundError: [Errno 2] No such file or directory: '/mnt/shared/home/trz42/pilot.nessi/PR62/jobs/ids/submitted/3175'
    
  • 8. Stop bot (smee, job manager, event handler, logging).

  • 9. Remove non-bot job and its traces on filesystem (directory & symlink(s)). This was not necessary because the changed process_new_job did not create the symlink.

  • 10. Get fix from trz42/eessi-bot-software-layer (branch fix-non-bot-job-leaking). Restart the bot from a new environment (app.cfg copied from the PR62 environment; it should be compliant with branch fix-non-bot-job-leaking). Observe 1-2 iterations of the job manager running without a crash.

  • 11. Submit an interactive job with srun -C shape=c4.2xlarge --pty /bin/bash ... check if job manager crashes.

  • 12. Check if interactive job is listed as new and subsequently as currently known job. (good if not --> indication that job manager should not crash when non-bot job is canceled)
    job manager log, iteration $n$: new job 3176 is detected; it is a non-bot job (the skip message is logged to the event handler's log)

    [20221118-T04:46:23] job manager main loop: known_jobs='3173,3172,3171'
    [20221118-T04:46:23] get_current_jobs(): run squeue command: /usr/bin/squeue --long --user=trz42
    [20221118-T04:46:23] get_current_jobs(): squeue output
    b'Fri Nov 18 04:46:23 2022\n             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)\n              3176   compute     bash    trz42 CONFIGUR       0:05 UNLIMITED      1 fair-mastodon-c4-2xlarge-0002\n              3173   compute eessi-bo    trz42  RUNNING      42:14 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0001\n              3172   compute eessi-bo    trz42  RUNNING      41:14 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0003\n              3171   compute eessi-bo    trz42  RUNNING      42:14 1-00:00:00      1 fair-mastodon-c4-2xlarge-0001\n'
    [20221118-T04:46:23] job manager main loop: current_jobs='3176,3173,3172,3171'
    [20221118-T04:46:23] job manager main loop: new_jobs='3176'
    

    job manager log, iteration $n+1$: the previously new job 3176 (a non-bot job) has NOT leaked into the list of known_jobs, so it is again treated as a new job (no problem, as it will be skipped by process_new_job $\rightarrow$ some potential for future optimisation)

    [20221118-T04:47:23] job manager main loop: known_jobs='3173,3172,3171'
    [20221118-T04:47:23] get_current_jobs(): run squeue command: /usr/bin/squeue --long --user=trz42
    [20221118-T04:47:23] get_current_jobs(): squeue output
    b'Fri Nov 18 04:47:23 2022\n             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)\n              3176   compute     bash    trz42 CONFIGUR       1:05 UNLIMITED      1 fair-mastodon-c4-2xlarge-0002\n              3173   compute eessi-bo    trz42  RUNNING      43:14 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0001\n              3172   compute eessi-bo    trz42  RUNNING      42:14 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0003\n              3171   compute eessi-bo    trz42  RUNNING      43:14 1-00:00:00      1 fair-mastodon-c4-2xlarge-0001\n'
    [20221118-T04:47:23] job manager main loop: current_jobs='3176,3173,3172,3171'
    [20221118-T04:47:23] job manager main loop: new_jobs='3176'
    
  • 13. Cancel non-bot job and observe behaviour of job manager.
    job manager log, iteration $n+k$: non-bot job 3176 has finished and no longer appears in the output of the squeue command, hence it is not in the list of current_jobs; because it was never in known_jobs either, it is not considered a finished bot job, so nothing happens and the job manager keeps running

    [20221118-T04:50:24] job manager main loop: known_jobs='3173,3172,3171'
    [20221118-T04:50:24] get_current_jobs(): run squeue command: /usr/bin/squeue --long --user=trz42
    [20221118-T04:50:24] get_current_jobs(): squeue output
    b'Fri Nov 18 04:50:24 2022\n             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)\n              3173   compute eessi-bo    trz42  RUNNING      46:15 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0001\n              3172   compute eessi-bo    trz42  RUNNING      45:15 1-00:00:00      1 fair-mastodon-c6g-2xlarge-0003\n              3171   compute eessi-bo    trz42  RUNNING      46:15 1-00:00:00      1 fair-mastodon-c4-2xlarge-0001\n'
    [20221118-T04:50:24] job manager main loop: current_jobs='3173,3172,3171'
    [20221118-T04:50:24] job manager main loop: new_jobs=''
    [20221118-T04:50:24] job manager main loop: finished_jobs=''
    

    👍
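
The behaviour observed in steps 12 and 13 can be summarised as set arithmetic over job ids. The sketch below is an illustration under stated assumptions, not the job manager's actual code: `parse_squeue`, `update_known_jobs` and the `is_bot_job` predicate are hypothetical names, and the predicate stands in for whatever metadata-file check the fixed process_new_job performs. The log call uses an f-string so that the job id is substituted, unlike the `{jobid}` placeholders seen in the event handler log above.

```python
from typing import Callable, Set, Tuple


def parse_squeue(output: str) -> Set[str]:
    """Collect job ids from `squeue --long` text output.

    Job rows start with a numeric id; the date line and the header line do not.
    """
    job_ids = set()
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            job_ids.add(fields[0])
    return job_ids


def update_known_jobs(
    known: Set[str],
    current: Set[str],
    is_bot_job: Callable[[str], bool],
    log: Callable[[str], None] = print,
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (next_known, new, finished) for one main-loop iteration."""
    new = current - known        # in squeue output, not yet tracked
    finished = known - current   # tracked, but gone from squeue output
    next_known = known - finished
    for job_id in sorted(new):
        if is_bot_job(job_id):
            next_known.add(job_id)
        else:
            # Non-bot jobs are never added to the known set, so they can
            # never show up later as "finished" bot jobs.
            log(f"No metadata file found for job {job_id}, so skipping it")
    return next_known, new, finished
```

With this arrangement a non-bot job is reported as new in every iteration while it runs, which is the minor inefficiency noted in step 12, and it disappears silently once cancelled, as seen in step 13.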

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 18, 2022

New job on instance Betzy-PR62 for architecture x86_64-amd-zen2 in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/484316

date job status comment
Nov 18 04:00:47 UTC 2022 submitted job id 484316 awaits release by job manager
Nov 18 04:01:22 UTC 2022 released job awaits launch by Slurm scheduler
Nov 18 04:44:27 UTC 2022 finished 😢 FAILURE
  • Found slurm output slurm-484316.out in job dir
  • Slurm output lacks message "No missing modules!".

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 18, 2022

New job on instance CitC-PR62 for architecture x86_64-intel-haswell in job dir /mnt/shared/home/trz42/pilot.nessi/PR62/jobs/2022.11/pr_46/3171

date job status comment
Nov 18 04:00:54 AM UTC 2022 submitted job id 3171 awaits release by job manager
Nov 18 04:01:26 AM UTC 2022 released job awaits launch by Slurm scheduler
Nov 18 08:02:14 AM UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-intel-haswell-1668758305.tar.gz (1.561 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 18, 2022

New job on instance CitC-PR62 for architecture aarch64-generic in job dir /mnt/shared/home/trz42/pilot.nessi/PR62/jobs/2022.11/pr_46/3172

date job status comment
Nov 18 04:00:56 AM UTC 2022 submitted job id 3172 awaits release by job manager
Nov 18 04:01:24 AM UTC 2022 released job awaits launch by Slurm scheduler
Nov 18 07:49:10 AM UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-aarch64-generic-1668753273.tar.gz (1.507 GiB) in job dir
Nov 20 07:48:31 PM UTC 2022 uploaded transfer of eessi-2022.11-software-linux-aarch64-generic-1668753273.tar.gz to S3 bucket succeeded
Nov 20 07:52:05 PM UTC 2022 staged tarball eessi-2022.11-software-linux-aarch64-generic-1668753273.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/161 for approval
Nov 20 08:01:59 PM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-aarch64-generic-1668753273.tar.gz approved, see PR https://github.com/trz42/staging/pull/161
Nov 20 08:05:24 PM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-aarch64-generic-1668753273.tar.gz successfully ingested at 2022.11/software/linux/aarch64/generic/

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 18, 2022

New job on instance CitC-PR62 for architecture aarch64-graviton2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR62/jobs/2022.11/pr_46/3173

date job status comment
Nov 18 04:00:57 AM UTC 2022 submitted job id 3173 awaits release by job manager
Nov 18 04:01:23 AM UTC 2022 released job awaits launch by Slurm scheduler
Nov 18 07:49:08 AM UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-aarch64-graviton2-1668753460.tar.gz (1.507 GiB) in job dir
Nov 20 07:49:28 PM UTC 2022 uploaded transfer of eessi-2022.11-software-linux-aarch64-graviton2-1668753460.tar.gz to S3 bucket succeeded
Nov 20 07:53:18 PM UTC 2022 staged tarball eessi-2022.11-software-linux-aarch64-graviton2-1668753460.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/162 for approval
Nov 20 08:05:31 PM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-aarch64-graviton2-1668753460.tar.gz approved, see PR https://github.com/trz42/staging/pull/162
Nov 20 08:09:05 PM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-aarch64-graviton2-1668753460.tar.gz successfully ingested at 2022.11/software/linux/aarch64/graviton2/

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 18, 2022

New job on instance Fram-PR62 for architecture x86_64-intel-broadwell in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/4635455

date job status comment
Nov 18 11:13:51 UTC 2022 submitted job id 4635455 awaits release by job manager
Nov 18 11:21:50 UTC 2022 released job awaits launch by Slurm scheduler
Nov 18 15:03:41 UTC 2022 finished 😢 FAILURE
  • Found slurm output slurm-4635455.out in job dir
  • Slurm output lacks message about created tarball.
  • No tarball matching eessi-*software-*.tar.gz found in job dir.

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 18, 2022

New job on instance Betzy-PR62 for architecture x86_64-amd-zen2 in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/484480

date job status comment
Nov 18 11:13:52 UTC 2022 submitted job id 484480 awaits release by job manager
Nov 18 11:14:08 UTC 2022 released job awaits launch by Slurm scheduler
Nov 18 11:57:14 UTC 2022 finished 😢 FAILURE
  • Found slurm output slurm-484480.out in job dir
  • Slurm output lacks message "No missing modules!".

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 19, 2022

New job on instance Fram-PR62 for architecture x86_64-intel-broadwell in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/4637304

date job status comment
Nov 19 19:27:32 UTC 2022 submitted job id 4637304 awaits release by job manager
Nov 19 19:28:08 UTC 2022 released job awaits launch by Slurm scheduler
Nov 19 19:29:10 UTC 2022 finished 😢 FAILURE
  • Found slurm output slurm-4637304.out in job dir
  • Slurm output lacks message "No missing modules!".
  • Slurm output lacks message about created tarball.
  • No tarball matching eessi-*software-*.tar.gz found in job dir.
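The three checks reported in these failure comments can be sketched as follows. This is a minimal illustration only, not the bot's actual implementation: the function name `check_job_dir` and the `tarball_marker` string are hypothetical, since the log does not show the exact text of the "created tarball" message. Only the file name pattern `slurm-<job id>.out`, the message "No missing modules!", and the glob `eessi-*software-*.tar.gz` are taken from the comments above.

```python
from pathlib import Path


def check_job_dir(job_dir, job_id, tarball_marker="created tarball"):
    """Return a list of problems found in a job dir; an empty list means success.

    tarball_marker is a hypothetical placeholder: the real wording of the
    "created tarball" message is not shown in the log.
    """
    job_dir = Path(job_dir)
    problems = []

    # 1. Look for the Slurm output file of this job.
    slurm_out = job_dir / f"slurm-{job_id}.out"
    if slurm_out.is_file():
        text = slurm_out.read_text(errors="replace")
    else:
        problems.append(f"No slurm output {slurm_out.name} in job dir")
        text = ""

    # 2. The build is expected to report that no modules are missing.
    if "No missing modules!" not in text:
        problems.append('Slurm output lacks message "No missing modules!".')

    # 3. The build is expected to report that a tarball was created.
    if tarball_marker not in text:
        problems.append("Slurm output lacks message about created tarball.")

    # 4. A tarball with the expected name pattern must exist in the job dir.
    if not any(job_dir.glob("eessi-*software-*.tar.gz")):
        problems.append(
            "No tarball matching eessi-*software-*.tar.gz found in job dir."
        )

    return problems
```

Called as `check_job_dir("/path/to/job/dir", 4637304)`, a sketch like this would return one entry per failed check, which mirrors the bullet lists in the FAILURE rows.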

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 19, 2022

New job on instance Betzy-PR62 for architecture x86_64-amd-zen2 in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/485463

date job status comment
Nov 19 19:27:32 UTC 2022 submitted job id 485463 awaits release by job manager
Nov 19 19:28:23 UTC 2022 released job awaits launch by Slurm scheduler
Nov 19 19:30:25 UTC 2022 finished 😢 FAILURE
  • Found slurm output slurm-485463.out in job dir
  • Slurm output lacks message "No missing modules!".
  • Slurm output lacks message about created tarball.
  • No tarball matching eessi-*software-*.tar.gz found in job dir.

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 19, 2022

New job on instance Saga-PR62 for architecture x86_64-intel-cascadelake in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/7185038

date job status comment
Nov 19 19:27:36 UTC 2022 submitted job id 7185038 awaits release by job manager
Nov 19 19:28:13 UTC 2022 released job awaits launch by Slurm scheduler
Nov 19 22:01:31 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-intel-cascadelake-1668895019.tar.gz (1.561 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 19, 2022

New job on instance Saga-PR62 for architecture x86_64-intel-skylake_avx512 in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/7185039

date job status comment
Nov 19 19:27:38 UTC 2022 submitted job id 7185039 awaits release by job manager
Nov 19 19:28:11 UTC 2022 released job awaits launch by Slurm scheduler
Nov 19 22:22:37 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-intel-skylake_avx512-1668896262.tar.gz (1.561 GiB) in job dir
Nov 20 19:56:54 UTC 2022 uploaded transfer of eessi-2022.11-software-linux-x86_64-intel-skylake_avx512-1668896262.tar.gz to S3 bucket succeeded
Nov 20 08:29:23 PM UTC 2022 staged tarball eessi-2022.11-software-linux-x86_64-intel-skylake_avx512-1668896262.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/170 for approval
Nov 20 08:36:00 PM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-x86_64-intel-skylake_avx512-1668896262.tar.gz approved, see PR https://github.com/trz42/staging/pull/170
Nov 20 08:39:19 PM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-x86_64-intel-skylake_avx512-1668896262.tar.gz successfully ingested at 2022.11/software/linux/x86_64/intel/skylake_avx512/
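The relationship between a tarball name and the path it is ingested at can be illustrated with a short sketch. It assumes, based only on the names visible in this log, that tarballs follow the shape `eessi-<version>-software-<os>-<arch>-<number>.tar.gz` and that the ingest path replaces the hyphens in the architecture part with slashes. The regular expression and the function name `ingest_path` are hypothetical and not taken from the bot or the ingestion scripts.

```python
import re

# Assumed naming scheme, inferred from the tarball names in this log:
#   eessi-<version>-software-<os>-<arch>-<trailing number>.tar.gz
TARBALL_RE = re.compile(
    r"^eessi-(?P<version>[^-]+)-software-(?P<os>[^-]+)"
    r"-(?P<arch>.+)-(?P<stamp>\d+)\.tar\.gz$"
)


def ingest_path(tarball_name):
    """Map a tarball file name to the repository path it would be ingested at."""
    match = TARBALL_RE.match(tarball_name)
    if match is None:
        raise ValueError(f"unexpected tarball name: {tarball_name}")
    # e.g. "x86_64-intel-skylake_avx512" becomes "x86_64/intel/skylake_avx512"
    arch_path = match["arch"].replace("-", "/")
    return f'{match["version"]}/software/{match["os"]}/{arch_path}/'


print(ingest_path(
    "eessi-2022.11-software-linux-x86_64-intel-skylake_avx512-1668896262.tar.gz"
))
# prints 2022.11/software/linux/x86_64/intel/skylake_avx512/
```

The printed path matches the one in the "ingested" row above; the same rule also reproduces `2022.11/software/linux/x86_64/generic/` and `2022.11/software/linux/aarch64/graviton2/` for the other tarballs in this conversation.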

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 19, 2022

New job on instance Saga-PR62 for architecture x86_64-generic in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/7185040

date job status comment
Nov 19 19:27:39 UTC 2022 submitted job id 7185040 awaits release by job manager
Nov 19 19:28:09 UTC 2022 released job awaits launch by Slurm scheduler
Nov 19 22:17:35 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-generic-1668895964.tar.gz (1.562 GiB) in job dir
Nov 20 19:53:30 UTC 2022 uploaded transfer of eessi-2022.11-software-linux-x86_64-generic-1668895964.tar.gz to S3 bucket succeeded
Nov 20 08:17:35 PM UTC 2022 staged tarball eessi-2022.11-software-linux-x86_64-generic-1668895964.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/169 for approval
Nov 20 08:32:23 PM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-x86_64-generic-1668895964.tar.gz approved, see PR https://github.com/trz42/staging/pull/169
Nov 20 08:35:47 PM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-x86_64-generic-1668895964.tar.gz successfully ingested at 2022.11/software/linux/x86_64/generic/

@trz42 trz42 added the bot:deploy Instruct bot to deploy built artefacts to Stratum 0 label Nov 20, 2022
@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance Fox-PR62 for architecture x86_64-amd-zen2 in job dir /fp/projects01/ec88/pilot.nessi/PR62/jobs/2022.11/pr_46/153926

date job status comment
Nov 20 21:05:13 UTC 2022 submitted job id 153926 awaits release by job manager
Nov 20 21:05:20 UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 21:07:22 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-amd-zen2-1668978430.tar.gz (0.000 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance Fram-PR62 for architecture x86_64-intel-broadwell in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/4646822

date job status comment
Nov 20 21:05:14 UTC 2022 submitted job id 4646822 awaits release by job manager
Nov 20 21:05:56 UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 21:08:59 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-intel-broadwell-1668978514.tar.gz (0.000 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance eX3-PR62 for architecture x86_64-amd-zen in job dir /home/thomarob/pilot.nessi/PR62/jobs/2022.11/pr_46/407110

date job status comment
Nov 20 21:05:05 UTC 2022 submitted job id 407110 awaits release by job manager
Nov 20 21:05:42 UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 21:07:50 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-amd-zen-1668978420.tar.gz (0.000 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance eX3-PR62 for architecture x86_64-amd-zen3 in job dir /home/thomarob/pilot.nessi/PR62/jobs/2022.11/pr_46/407111

date job status comment
Nov 20 21:05:07 UTC 2022 submitted job id 407111 awaits release by job manager
Nov 20 21:05:44 UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 21:18:52 UTC 2022 finished 😢 FAILURE
  • Found slurm output slurm-407111.out in job dir
  • Slurm output lacks message "No missing modules!".

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance Saga-PR62 for architecture x86_64-intel-cascadelake in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/7186851

date job status comment
Nov 20 21:05:19 UTC 2022 submitted job id 7186851 awaits release by job manager
Nov 20 21:06:05 UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 21:11:12 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-intel-cascadelake-1668978564.tar.gz (0.000 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance eX3-PR62 for architecture aarch64-generic in job dir /home/thomarob/pilot.nessi/PR62/jobs/2022.11/pr_46/407112

date job status comment
Nov 20 21:05:09 UTC 2022 submitted job id 407112 awaits release by job manager
Nov 20 21:05:38 UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 21:07:46 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-aarch64-generic-1668978309.tar.gz (0.000 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance Saga-PR62 for architecture x86_64-intel-skylake_avx512 in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/7186852

date job status comment
Nov 20 21:05:21 UTC 2022 submitted job id 7186852 awaits release by job manager
Nov 20 21:06:03 UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 21:08:07 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-intel-skylake_avx512-1668978440.tar.gz (0.000 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance CitC-PR62 for architecture x86_64-intel-haswell in job dir /mnt/shared/home/trz42/pilot.nessi/PR62/jobs/2022.11/pr_46/3178

date job status comment
Nov 20 09:05:21 PM UTC 2022 submitted job id 3178 awaits release by job manager
Nov 20 09:06:08 PM UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 09:14:36 PM UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-intel-haswell-1668978729.tar.gz (0.000 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance eX3-PR62 for architecture aarch64-thunderx2 in job dir /home/thomarob/pilot.nessi/PR62/jobs/2022.11/pr_46/407113

date job status comment
Nov 20 21:05:10 UTC 2022 submitted job id 407113 awaits release by job manager
Nov 20 21:05:40 UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 21:07:48 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-aarch64-thunderx2-1668978309.tar.gz (0.000 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance Saga-PR62 for architecture x86_64-generic in job dir /cluster/projects/nn9992k/pilot.nessi/PR62/jobs/2022.11/pr_46/7186853

date job status comment
Nov 20 21:05:22 UTC 2022 submitted job id 7186853 awaits release by job manager
Nov 20 21:06:00 UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 21:08:09 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-generic-1668978440.tar.gz (0.000 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance CitC-PR62 for architecture aarch64-generic in job dir /mnt/shared/home/trz42/pilot.nessi/PR62/jobs/2022.11/pr_46/3179

date job status comment
Nov 20 09:05:23 PM UTC 2022 submitted job id 3179 awaits release by job manager
Nov 20 09:06:06 PM UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 09:14:32 PM UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-aarch64-generic-1668978692.tar.gz (0.000 GiB) in job dir

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance CitC-PR62 for architecture aarch64-graviton2 in job dir /mnt/shared/home/trz42/pilot.nessi/PR62/jobs/2022.11/pr_46/3180

date job status comment
Nov 20 09:05:24 PM UTC 2022 submitted job id 3180 awaits release by job manager
Nov 20 09:06:03 PM UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 09:14:34 PM UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-aarch64-graviton2-1668978695.tar.gz (0.000 GiB) in job dir

@trz42
Owner

trz42 commented Nov 20, 2022

Trying to rebuild for zen3 ☝️ (all other targets are already ingested, so their jobs should return quickly). The resubmit script is not back in place yet; it would have been the first choice for rerunning a failed build job.

@trz42
Owner

trz42 commented Nov 20, 2022

The build on zen3 failed again. Next steps: stop the bot instances on the other resources, reconfigure eX3 to build only for zen3, and resend the event.

@eessi-bot-devel-trz42

eessi-bot-devel-trz42 bot commented Nov 20, 2022

New job on instance eX3-PR62 for architecture x86_64-amd-zen3 in job dir /home/thomarob/pilot.nessi/PR62/jobs/2022.11/pr_46/407117

date job status comment
Nov 20 21:52:52 UTC 2022 submitted job id 407117 awaits release by job manager
Nov 20 21:52:57 UTC 2022 released job awaits launch by Slurm scheduler
Nov 20 23:58:08 UTC 2022 finished 😁 SUCCESS tarball eessi-2022.11-software-linux-x86_64-amd-zen3-1668988490.tar.gz (1.562 GiB) in job dir
Nov 21 04:19:36 UTC 2022 uploaded transfer of eessi-2022.11-software-linux-x86_64-amd-zen3-1668988490.tar.gz to S3 bucket succeeded
Nov 21 04:21:19 AM UTC 2022 staged tarball eessi-2022.11-software-linux-x86_64-amd-zen3-1668988490.tar.gz downloaded to S0, merge PR https://github.com/trz42/staging/pull/172 for approval
Nov 21 04:49:56 AM UTC 2022 approved 👍 tarball eessi-2022.11-software-linux-x86_64-amd-zen3-1668988490.tar.gz approved, see PR https://github.com/trz42/staging/pull/172
Nov 21 04:53:13 AM UTC 2022 ingested 🎉 tarball eessi-2022.11-software-linux-x86_64-amd-zen3-1668988490.tar.gz successfully ingested at 2022.11/software/linux/x86_64/amd/zen3/

@trz42 trz42 merged commit 33f97b3 into trz42:nessi.no-2022.11 Nov 21, 2022
@poksumdo poksumdo deleted the nessi.no-2022.11-GCC-9.3.0_10.3.0 branch January 15, 2023 20:52
trz42 added a commit that referenced this pull request Mar 18, 2023
…chmarks-5.7.1-GCCcore-10.3.0

Add OSU-Micro-Benchmarks/5.7.1 with GCC/10.3.0 to NESSI/2022.11
trz42 pushed a commit that referenced this pull request Mar 18, 2023
Sync lmod branch with dev branch (pulling in #46)
Labels
bot:build Instruct bot to build software stack bot:deploy Instruct bot to deploy built artefacts to Stratum 0
3 participants