Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cmslinks issues #803

Merged
merged 1 commit into from
May 31, 2024
Merged

Cmslinks issues #803

merged 1 commit into from
May 31, 2024

Conversation

eachristgr
Copy link
Contributor

@eachristgr eachristgr commented May 23, 2024

As noticed during resolving issue CMSDM-173, cmslinks.py could not be executed successfully from the corresponding cronjob and its logs were not visible in the corresponding container's logs.

Details can be found in the corresponding issue #804.

@haozturk haozturk requested a review from dynamic-entropy May 23, 2024 14:25
Copy link
Contributor

@dynamic-entropy dynamic-entropy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kindly

  • Squash these commits or modify the commit message to represent the changes
  • Also, describe the issue and the change please

I haven't gone through the entire PR, but made a small comment about the logging level.
Cheers

@@ -235,10 +251,18 @@ def update(self, overwrite=False, disable=True, dry=False, srcselect=r'\S+', dst

OPTIONS = PARSER.parse_args()

# Configure logger
# Add new logging level, so usefull info is seperated from info
USEFULL_INFO_NUM = 25
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need a custom logging level?

Copy link
Contributor Author

@eachristgr eachristgr May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the script there are info and warning logs. By default logging level is set to info, however info logs mainly populate the logs instead of providing helpful information. In addition, there is information that there is need to be logged, for example when a new distance is set or when a distance is overwritten. So, a new log level, that sits between info and warning levels, was set for this king of information. This new level was set also as default sο that there is no unneeded filling of the logs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to remove/modify the level for existing logs. I believe the 5 default log levels are enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unimportant logs moved to debug level

logging.warning(f'Problem getting PNN from gitlab for RSE {rse} SITE {site.get("site")}. Error: {str(e)}')

# If PNN could be retrieved skip to the next RSE (do not add entry to the list)
if pnn is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add the continue under exceptions above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that the desired 'rse' field may be empty. Moving continue under the exception above will skip to the next site. However, in case that no sites have the 'rse' field set, the pnn value will remain 'None'. Appending this value to self.rselist will create problems in other parts of the code. So, preventing RSEs with pnn value of None to be appended in the list is mandatory.

@@ -182,21 +193,26 @@ def update(self, overwrite=False, disable=True, dry=False, srcselect=r'\S+', dst
if dry:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can refactor this and everywhere else to:

logging.log ("message")
if not dry_run:
    //perform the actual operation

Copy link
Contributor

@dynamic-entropy dynamic-entropy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to refactor the existing code.

@eachristgr
Copy link
Contributor Author

It is worth mentioning that since the script was not be running, applying script changes and running the cronejob with the updated code will apply several changes in the distances of most RSEs.

@dynamic-entropy
Copy link
Contributor

Can you find out which distances will be effected and from what to what?
( A dry-run should tell you that).

Also, remember to re-request review when ready.

@eachristgr
Copy link
Contributor Author

Can you find out which distances will be effected and from what to what? ( A dry-run should tell you that).

Also, remember to re-request review when ready.

Changes have been made. Request ready to be reviewed again.

Commits are squashed.

cmslinks_dryrun_logs.txt contains logs from a dry run. In the first lines you can see warnings regarding pnn issue and the rest is logs regarding distances.

Link summary: updated 657, disabled 1383, created 911

Copy link
Contributor

@dynamic-entropy dynamic-entropy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Try to make sense of the results of the dry-run.
I have assigned this to you: https://its.cern.ch/jira/browse/CMSTRANSF-515

@eachristgr
Copy link
Contributor Author

As mentioned in the Jira issue, I would like to highlight following points.

  • Distances will be updated following policies that are mentioned on the top of cmslinks.py:
DEFAULT_EXCLUDE_LINKS = (
    {'dest': {'type': 'temp'}, 'src': {}},
    {'dest': {'rse': 'T2_US_Caltech'}, 'src': {'rse': 'T2_US_Caltech_Ceph'}},
    {'dest': {'rse': 'T1_UK_RAL_Tape_Test'}, 'src': {}},
    {'dest': {}, 'src': {'rse': 'T1_UK_RAL_Tape_Test'}},
    {'dest': {'rse': 'T1_UK_RAL_Tape'}, 'src': {'rse': '^(?!T1_UK_RAL_Disk|T0_CH_CERN_Disk).*$'}},
    {'dest': {'rse': '^(?!T1_UK_RAL_Disk).*$'}, 'src': {'rse': 'T1_UK_RAL_Tape'}},
)

DEFAULT_DISTANCE_RULES = {'site': 1, 'region&country': 4, 'country': 7, 'region': 10, 'other': 13}
  • For T1|T2 RSEs with 'cms_type' = 'real' (cmslinks_dryrun_logs_prod.txt), it is noted that:

    • There are no changes between them.
    • The policies above affecting most of the RSEs e.g. disabling links to RSEs with 'type' = 'temp' and to|from 'T1_UK_RAL_Tape_Test'
    • Distances to|from T3_CH_CERN_CTA_CastorTest|T3_CH_CERN_CTA_Test are set 13.

From my understanding the DEFAULT_EXCLUDE_LINKS is up to date with the changes requested in the Jira issue. Let me know what you think.

@dynamic-entropy
Copy link
Contributor

Hello @eachristgr
Thanks for this.
Can you check if the T3_CH_CERN_CTA_CastorTest and T3_CH_CERN_CTA RSEs exist as "alive" rses, i.e. if they haven't been deleted?

@dynamic-entropy dynamic-entropy merged commit f3b8eac into dmwm:master May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants