Production job failed because of incorrect locale settings #46850

Closed
makortel opened this issue Dec 2, 2024 · 16 comments

Comments

@makortel
Contributor

makortel commented Dec 2, 2024

https://its.cern.ch/jira/browse/CMSPROD-226 shows production job failures (on 16_0_43) caused by incorrect locale settings. scram already warns

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "C.UTF-8",
	LANG = "C"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

and then later an exception

----- Begin Fatal Exception 23-Oct-2024 05:53:31 UTC-----------------------
An exception of category 'StdException' occurred while
   [0] Processing global begin LuminosityBlock run: 1 luminosityBlock: 7001
   [1] Calling method for module Pythia8GeneratorFilter/'generator'
Exception Message:
A std::exception was thrown.
locale::facet::_S_create_c_locale name not valid
----- End Fatal Exception -------------------------------------------------

is thrown from

auto tmp_path = boost::filesystem::unique_path(tmp_dir);

(that was run via ExternalGeneratorFilter).

The error is reproducible locally (in cmssw-el7 container) e.g. by setting LC_ALL=C.UTF-8 before running cmsRun.
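
The exception text above is the message libstdc++ produces when a named locale cannot be created. As a rough analogue (not the code path cmsRun takes), the following Python sketch asks the C library whether a given locale name is usable on the current node. The helper name `locale_is_usable` is hypothetical and only for illustration.

```python
import locale


def locale_is_usable(name: str, category: int = locale.LC_CTYPE) -> bool:
    """Return True if the C library accepts `name` for `category`."""
    # Calling setlocale without a name only queries the current setting.
    previous = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
        return True
    except locale.Error:
        # Raised when the locale is not installed or the name is malformed.
        return False
    finally:
        # Always restore whatever was active before the probe.
        locale.setlocale(category, previous)


if __name__ == "__main__":
    # "" means "take the name from LC_ALL / LC_CTYPE / LANG in the environment".
    for candidate in ("", "C", "C.UTF-8"):
        print(repr(candidate), locale_is_usable(candidate))
```

The result for `""` and `"C.UTF-8"` depends on which locales are installed, so it would be expected to differ between the cmssw-el7 and cmssw-el8/el9 containers.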

I assume we could also have other code that fails when the locale is set to an invalid value. I wonder how CMS applications should behave in that case. On one hand, an invalid locale feels like a rather silly reason to fail a job; on the other hand, how much can we trust string interpretation if the locale is invalid? Furthermore, I think we have largely ignored locale settings so far, and in the production use case all strings (e.g. the job configuration) originate from somewhere other than the worker node anyway (i.e. if we really wanted to handle locales properly, we would have to propagate the locale from wherever e.g. the configuration file is created, or something similar).

Should we perhaps enforce some locale in all scram environments?

@makortel
Contributor Author

makortel commented Dec 2, 2024

assign core

@cmsbuild
Contributor

cmsbuild commented Dec 2, 2024

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

cmsbuild commented Dec 2, 2024

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Dec 2, 2024

A new Issue was created by @makortel.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

makortel commented Dec 3, 2024

@smuzaffar Do we have anything about locales in our containers? For example, I see (via locale -a) the C.utf8 in cmssw-el9 and cmssw-el8, but not in cmssw-el7.

@smuzaffar
Contributor

@makortel , for el8 and el9 we installed glibc-langpack-en (cms-sw/cms-docker#186) to avoid these warnings from perl/scram. This package is not available for centos7. Also for production jobs we use old cmssw/cms:rhel7 (https://github.com/cms-sw/cms-docker/blob/master/cms/tags.yaml#L7-L10) container while for el8 and el9 we use cmssw-elX images.

Instead of rebuilding cmssw/cms:rhel7 which can break things I suggest that for slc7 we update the scram runtime site hook to set LC_CTYPE=C

@makortel
Contributor Author

makortel commented Dec 3, 2024

> I suggest that for slc7 we update the scram runtime site hook to set LC_CTYPE=C

My feeling (without diving deeply into intricacies of locales) is that could be a good idea. Or maybe setting LC_ALL?

Would the scram hook apply "immediately" to all jobs?

@smuzaffar
Contributor

> My feeling (without diving deeply into intricacies of locales) is that could be a good idea. Or maybe setting LC_ALL?

we can set both

> Would the scram hook apply "immediately" to all jobs?

yes, it will apply to all jobs when they set up the CMSSW environment (cmsenv)
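
On the LC_CTYPE versus LC_ALL question: under POSIX precedence, LC_ALL overrides the per-category variables, which in turn override LANG. The sketch below resolves the effective name for one category from an environment mapping. The function name `effective_locale` is hypothetical, and the final fallback to "C" is an assumption (POSIX leaves that default implementation-defined).

```python
def effective_locale(category: str, env: dict) -> str:
    """Resolve the locale name for one category using POSIX precedence."""
    # LC_ALL wins, then the category-specific variable, then LANG.
    for variable in ("LC_ALL", category, "LANG"):
        value = env.get(variable, "")
        if value:  # an empty value is treated the same as unset
            return value
    return "C"  # assumed default when nothing is set


# Environment reported by the perl warning in this issue.
failing_env = {"LC_CTYPE": "C.UTF-8", "LANG": "C"}
print(effective_locale("LC_CTYPE", failing_env))  # C.UTF-8

# The same environment with LC_ALL=C added on top.
fixed_env = dict(failing_env, LC_ALL="C")
print(effective_locale("LC_CTYPE", fixed_env))  # C
```

With the environment from the warning, LC_CTYPE resolves to C.UTF-8 even though LANG is C, which is why setting LC_ALL is sufficient to override it.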

@smuzaffar
Contributor

smuzaffar commented Dec 5, 2024

cms-sw/cmsdist#9554 adds the new scram runtime site-hook which sets LC_ALL=C for the slc6/slc7 environments. Once it is deployed on /cvmfs/cms.cern.ch, cmsenv should set this variable.
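
The hook itself is in cms-sw/cmsdist#9554 and is not reproduced in this thread. Purely as an illustration of its described effect, a job wrapper could apply the same override before launching a process. Everything in this sketch is an assumption: the helper name `with_c_locale`, and the idea of selecting slc6/slc7 by the prefix of the `SCRAM_ARCH` value.

```python
# Assumption: legacy environments are recognised by their SCRAM_ARCH prefix.
LEGACY_ARCH_PREFIXES = ("slc6_", "slc7_")


def with_c_locale(env: dict, scram_arch: str) -> dict:
    """Return a copy of `env` with LC_ALL=C forced on slc6/slc7 architectures."""
    fixed = dict(env)  # never modify the caller's mapping
    if scram_arch.startswith(LEGACY_ARCH_PREFIXES):
        fixed["LC_ALL"] = "C"
    return fixed


# Hypothetical usage in a wrapper script (not executed here):
#   import os, subprocess
#   env = with_c_locale(os.environ, os.environ.get("SCRAM_ARCH", ""))
#   subprocess.run(["cmsRun", "config.py"], env=env, check=True)
```

Returning a copy keeps the override local to the launched process, which mirrors the idea of fixing the environment at setup time without rebuilding the container.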

@smuzaffar
Contributor

cms-sw/cmsdist#9554 is ready but we will deploy it on Monday

@makortel
Contributor Author

makortel commented Dec 9, 2024

I ran a full runTheMatrix.py with a non-existent locale (on CMSSW_14_2_0_pre4 el8) and did not see any failures.

@vlimant
Contributor

vlimant commented Dec 10, 2024

Can you please confirm whether we need a new release for this, or whether the cmssw container is updated automatically and deployed for production?

@smuzaffar
Contributor

@vlimant, there is no need for a new release. The scram site hook has been deployed on CVMFS and should be picked up automatically by production jobs.

@makortel
Contributor Author

I think we can then close the issue

@makortel
Contributor Author

+core

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.
