System Eiger not recognized after upgrade #1948

Closed
lucamar opened this issue Apr 26, 2021 · 10 comments

Comments

@lucamar
Contributor

lucamar commented Apr 26, 2021

After the recent system upgrade, Eiger is no longer recognized when I use the updated config.py from my fork together with the updated cscs-checks folder in the alps-21.04 branch of my fork. The list of matched checks is empty, since the system is not set in reframe.log:

$ grep SYSTEM reframe.log 
  RFM_SYSTEM=<not set>

This happened after I merged the latest ReFrame master into my fork on Friday, April 23rd: I now need to add --system eiger, otherwise RFM_SYSTEM is not set, as shown above. The last commit is below; do you think any change in it might affect mapping uan0[2-3] to eiger? lucamar@0ecb419

The issue occurs both with the latest reframe script (3.6.0-dev.3+0ecb419e) and with the one provided by the latest stable release (3.5.3). It does not occur if I omit the flags -c ~/GitHub/lucamar/reframe/cscs-checks -R, which select the updated list of checks:

ssh eiger
ml reframe
reframe -C /users/lucamar/GitHub/lucamar/reframe/config/cscs.py -l 

[List of matched checks]
- VcSimdTest (found in '/apps/eiger/UES/jenkins/1.4.0/software/reframe/3.5.3/cscs-checks/microbenchmarks/cpu/simd/vc.py')
Found 1 check(s)

If I use the updated list of checks instead, the system is not set and the list of matched checks is empty:

ssh eiger
ml reframe
reframe -C /users/lucamar/GitHub/lucamar/reframe/config/cscs.py -c ~/GitHub/lucamar/reframe/cscs-checks -R -l

[List of matched checks]
Found 0 check(s)

I don't know if I have introduced a regression in the checks, but before I merged commit 0ecb419edea6c3692521c97cdf3bd44b67125414 last Friday, the system Eiger was recognized without setting it manually. I have compared the two versions of the check vc.py:

diff -BEiyw --suppress-common-lines ~/GitHub/lucamar/reframe/cscs-checks/microbenchmarks/cpu/simd/vc.py /apps/eiger/UES/jenkins/1.4.0/software/reframe/3.5.3/cscs-checks/microbenchmarks/cpu/simd/vc.py
        if self.current_system.name in ['eiger', 'pilatus']:  |	        self.valid_prog_environs = ['builtin']
             self.valid_prog_environs = ['cpeGNU']	      <
        else:						      <
             self.valid_prog_environs = ['PrgEnv-gnu']	      <
							      <
            'module list',				      <
                'speedup': (1.32, -0.2, 0.2, '')	      <
            },						      <
            'pilatus:mc': {				      <

Any advice would be really appreciated, thanks in advance!

@teojgo
Contributor

teojgo commented Apr 26, 2021

@lucamar try setting the hostname to alps in the configuration. ReFrame first checks /etc/xthostname.
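As a minimal sketch of this suggestion, the fragment below shows where the hostname pattern lives in a ReFrame 3.x Python settings file such as config/cscs.py. It assumes that /etc/xthostname on the upgraded system contains alps, as the suggestion implies. The partition values are illustrative placeholders rather than the actual CSCS settings, and the other top-level sections that a full configuration needs (such as environments and logging) are omitted.

```python
# Hypothetical excerpt of a ReFrame 3.x settings file; not the real CSCS config.
site_configuration = {
    'systems': [
        {
            'name': 'eiger',
            # Each entry is a regular expression matched against the detected
            # hostname. 'alps' replaces a login-node pattern such as
            # 'uan0[2-3]' (assumed here from the issue description).
            'hostnames': ['alps'],
            'partitions': [
                {
                    # Placeholder partition so the fragment is well-formed.
                    'name': 'login',
                    'scheduler': 'local',
                    'launcher': 'local',
                    'environs': ['builtin'],
                },
            ],
        },
    ],
}
```

With a pattern like this in place, passing --system eiger by hand should no longer be needed, provided the detected hostname really is alps.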

@lucamar lucamar changed the title System Eiger not recognised after upgrade System Eiger not recognized after upgrade Apr 26, 2021
@lucamar
Contributor Author

lucamar commented Apr 26, 2021

Thanks for the advice @teojgo. However, even after creating a new entry alps in config/cscs.py, the log still shows RFM_SYSTEM=<not set>. For the time being I will keep using --system=$CLUSTER_NAME on Eiger.

@teojgo
Contributor

teojgo commented Apr 26, 2021

@lucamar are you passing the -C config/cscs.py flag?

@vkarak
Contributor

vkarak commented Apr 26, 2021

@teojgo @lucamar This looks like a bug that was not present in 3.5.0. See also the discussion in #1930.

@teojgo
Contributor

teojgo commented Apr 26, 2021

> Thanks for the advice @teojgo. However, even after creating a new entry alps in config/cscs.py, the log still shows RFM_SYSTEM=<not set>. For the time being I will keep using --system=$CLUSTER_NAME on Eiger.

I meant change the hostname value for eiger to alps

@lucamar
Contributor Author

lucamar commented Apr 26, 2021

> I meant change the hostname value for eiger to alps

Yes, I did it: see the updated config file in my fork. I also decided to try adding a system alps with partitions eiger and pilatus besides login, which may be more convenient (I still kept eiger and pilatus as systems; we can always remove them later).

Anyway, it seems that I was looking at the wrong line in the log file: it still does not define RFM_SYSTEM, even though with @teojgo's advice a check is now found on hostname alps (I have defined it in UlimitCheck within my fork):

/users/lucamar/GitHub/lucamar/reframe/bin/reframe -C /users/lucamar/GitHub/lucamar/reframe/config/cscs.py -c /users/lucamar/GitHub/lucamar/reframe/cscs-checks --recursive --prefix /scratch/e1000/lucamar/reframe-eiger --keep-stage-files --failure-stats -t prod -l

[ReFrame Setup]
  version:           3.6.0-dev.3+6cca68fb
  command:           '/users/lucamar/GitHub/lucamar/reframe/bin/reframe -C /users/lucamar/GitHub/lucamar/reframe/config/cscs.py -c /users/lucamar/GitHub/lucamar/reframe/cscs-checks --recursive --prefix /scratch/e1000/lucamar/reframe-eiger --keep-stage-files --failure-stats -t prod -l'
  launched by:       lucamar@uan03
  working directory: '/scratch/e1000/lucamar/reframe-eiger'
  settings file:     '/users/lucamar/GitHub/lucamar/reframe/config/cscs.py'
  check search path: (R) '/users/lucamar/GitHub/lucamar/reframe/cscs-checks'
  stage directory:   '/scratch/e1000/lucamar/reframe-eiger/stage'
  output directory:  '/scratch/e1000/lucamar/reframe-eiger/output'

[List of matched checks]
- UlimitCheck (found in '/users/lucamar/GitHub/lucamar/reframe/cscs-checks/prgenv/ulimit_check.py')
Found 1 check(s)

At the same time:

$ grep SYSTEM reframe.log
  RFM_SYSTEM=<not set>

@vkarak
Contributor

vkarak commented Apr 26, 2021

FYI: the best way to check which system has been picked is to run reframe --show-config=systems/0/name. @lucamar Did you only see this after you pulled from master? Did it work on the upgraded system before pulling? Can you point me to your config file from before you added the alps system?

@lucamar
Contributor Author

lucamar commented Apr 26, 2021

This was my local config/cscs.py last week: https://github.com/lucamar/reframe/blob/68c78f07a93fbabe4c37b6985f34c54f9057c1f3/config/cscs.py
I recall that I did not need to set --system=eiger until Friday afternoon.

@vkarak
Contributor

vkarak commented Apr 26, 2021

Even with that config, I can't reproduce the error:

[19:18:31] karakasv@uan02 [~/Devel/reframe][(68c78f07...)]
$ ./bin/reframe -C config/cscs.py --show-config=systems/0/name
./bin/reframe: could not initialize the graylog handler; ignoring ...
"generic"

Perhaps there was a system change and /etc/xthostname is now available, so ReFrame picks up its contents instead of the hostname, as pointed out by @teojgo.
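The detection order described here can be illustrated with a simplified Python sketch. This is not ReFrame's actual implementation: the function names detect_hostname and pick_system are hypothetical, and the two system entries, including the catch-all generic entry, are assumptions made for the example.

```python
import os
import re
import socket


def detect_hostname(xthostname_path='/etc/xthostname'):
    """Return the name used for system matching.

    The contents of the xthostname file take precedence when the file
    exists; otherwise the regular hostname is used.
    """
    if os.path.exists(xthostname_path):
        with open(xthostname_path) as fp:
            return fp.read().strip()

    return socket.gethostname()


def pick_system(systems, hostname):
    """Return the name of the first system whose pattern matches hostname."""
    for system in systems:
        for pattern in system['hostnames']:
            if re.match(pattern, hostname):
                return system['name']

    return None


# Assumed entries: a login-node pattern for eiger and a catch-all fallback.
systems = [
    {'name': 'eiger', 'hostnames': [r'uan0[2-3]']},
    {'name': 'generic', 'hostnames': [r'.*']},
]

print(pick_system(systems, 'uan03'))  # matches the eiger pattern
print(pick_system(systems, 'alps'))   # only the catch-all matches
```

Under these assumptions, once /etc/xthostname starts returning alps, a pattern written for the login nodes no longer matches and the lookup falls through to the catch-all entry, which is consistent with the "generic" result shown above.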

@vkarak vkarak removed this from the ReFrame sprint 21.04.2 milestone Apr 27, 2021
@vkarak
Contributor

vkarak commented Apr 27, 2021

In our case, it is now confirmed that the difference was due to how /etc/xthostname was defined. I guess we can close this issue now.

@lucamar lucamar closed this as completed Apr 27, 2021