[SDK] improve PVC creation name error #2496
7036b29 to e005825
Thanks for the contribution! Basically LGTM, just a small comment.
    # RFC 1123 regex for valid PVC names: lowercase alphanumeric, '-', or '.'.
    return bool(
        re.match(
            r"^[a-z0-9]([a-z0-9\-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9\-]*[a-z0-9])?)*$", name
Would using the same regex format as shown in the error message improve readability and maintainability?
Would you please elaborate on what you mean here?
I mean using this regex format in the ValueError message: '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*', since it took me a while to check whether they stand for the same thing. Or is there a specific reason you changed the format a little bit?
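The equivalence question above can be checked mechanically. The following is a minimal sketch, not code from the PR: it assumes the pattern quoted in the error message is anchored with ^ and $ in the same way as the one in the diff, and the sample names are made up for illustration.

```python
import re

# Pattern as written in the diff under review (hyphen escaped, placed last).
DIFF_PATTERN = r"^[a-z0-9]([a-z0-9\-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9\-]*[a-z0-9])?)*$"

# Pattern as quoted in the error message (hyphen first, unescaped).
# Assumption: anchors are added here so both patterns are compared on equal terms.
MESSAGE_PATTERN = r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"

# Hypothetical sample names covering valid and invalid cases.
SAMPLES = ["my-pvc", "my.pvc", "a", "a-b.c-d", "My-PVC", "-pvc", "pvc-", "my..pvc", "my_pvc", ""]


def matches(pattern, name):
    """Return True if the name matches the given pattern."""
    return re.match(pattern, name) is not None


# Map each sample to a pair: (result with diff pattern, result with message pattern).
results = {name: (matches(DIFF_PATTERN, name), matches(MESSAGE_PATTERN, name)) for name in SAMPLES}

for name, (diff_ok, message_ok) in results.items():
    print(f"{name!r}: diff={diff_ok} message={message_ok}")
```

Inside a character class, a leading unescaped hyphen and an escaped hyphen both denote a literal '-', so the two spellings should agree on every input; the samples only spot-check that.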
OK, thanks for bringing this to my attention. Shall I add a unit test for it? Please let me know.
Also, there are two failing CI tests. I'm guessing they are flaky tests; could this change have caused the failures?
Since the unit tests for the tune API are still under review, I think you can add your unit test after that one is merged.
I think the CI test failures are caused by resource problems. I've rerun the tests once, but one of them still failed due to a network connectivity issue. Maybe we can try running them again later.
Thanks for your time and help in this matter.
/rerun-all
Thanks for the contribution! /lgtm
Is it fine if I add a few unit tests after the main unit test PR gets merged, keeping this PR open until then?
/assign @tenzen-y @andreyvelich
The main unit test is already approved, but it seems the CI test is still in progress. I think we can merge this PR first, and it would be better to open a new PR to add unit tests for this.
@@ -557,6 +557,20 @@ class name in this argument.
            # Create PVC for the Storage Initializer.
            # TODO (helenxie-bit): PVC Creation should be part of Katib Controller.
            try:
                if not utils.is_valid_pvc_name(name):
I am wondering what the goal is of introducing additional validation on top of the Kubernetes default validation.
Are we trying to make this message more user-friendly?
cc @kubeflow/wg-training-leads
Yes, I think that's the point.
Yes, basically my goal was to make the message more user-friendly. I'm open to suggestions.
Can we handle this error the same way as the 409 case (CRD already exists), so we won't introduce additional validation?
katib/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py
Lines 138 to 143 in 05dbea6
    except Exception as e:
        if hasattr(e, "status") and e.status == 409:
            raise Exception(
                f"A Katib Experiment with the name "
                f"{namespace}/{experiment_name} already exists."
            )
Yes, you are right. I'll change it.
c74b4d6 to 3273e0f
…h correct name example Signed-off-by: mahdikhashan <[email protected]>
Signed-off-by: mahdikhashan <[email protected]>
3273e0f to 651b231
Done. I kindly ask for your review.
@@ -569,6 +569,11 @@ class name in this argument.
                    ),
                )
            except Exception as e:
                if hasattr(e, "status") and e.status == 422:
I'm thinking we should move this check after this line since the error belongs to that scenario:
    else:
Additionally, to make the error easier to understand, we could tweak the error message a bit. How about this:
    if hasattr(e, "status") and e.status == 422:
        raise ValueError(
            f"An Experiment with the name {name} is not valid: the name must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character."
        )
    else:
        raise RuntimeError(f"failed to create PVC. Error: {e}")
I agree with the ValueError message (with a small change). I'm less sure about combining it with the else branch, since we would then have to wait for these steps first:
    pvc_list = self.core_api.list_namespaced_persistent_volume_claim(
        namespace=namespace
    )
    # Check if the PVC with the specified name exists.
    for pvc in pvc_list.items:
        if pvc.metadata.name == name:
            print(
                f"PVC '{name}' already exists in namespace " f"{namespace}."
            )
            break
So my idea is that the function fails fast when the name is invalid; if the name is valid, it continues with the check for an existing PVC of that name. WDYT?
That makes sense.
Signed-off-by: mahdikhashan <[email protected]>
Thanks for the contribution! /lgtm
                        f"alphanumeric characters ('a-z', '0-9'), hyphens ('-'), or periods ('.'). "
                        f"It must also start and end with an alphanumeric character."
                    )

                pvc_list = self.core_api.list_namespaced_persistent_volume_claim(
Should we also simplify this logic similar to this one:
katib/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py
Lines 140 to 143 in d6c7319
    raise Exception(
        f"A Katib Experiment with the name "
        f"{namespace}/{experiment_name} already exists."
    )
E.g. if status_code is 409 we just print that PVC already exists.
If we add more details to the error message, it’ll make it easier for users to understand, which is the goal of this PR. But you’re right—Kubernetes API will also return detailed error reasons. So it depends on whether we want to keep the error messages consistent across the board.
My opinion leans a bit more toward Helen's, but I'm open to any changes.
I just meant that all of this:
    pvc_list = self.core_api.list_namespaced_persistent_volume_claim(
        namespace=namespace
    )
    # Check if the PVC with the specified name exists.
    for pvc in pvc_list.items:
        if pvc.metadata.name == name:
            print(
                f"PVC '{name}' already exists in namespace " f"{namespace}."
            )
            break
    else:
        raise RuntimeError(f"failed to create PVC. Error: {e}")
can be replaced with
    elif hasattr(e, "status") and e.status == 409:
        print(f"PVC '{name}' already exists in namespace " f"{namespace}.")
    else:
        raise RuntimeError(f"failed to create PVC. Error: {e}")
Does that make sense?
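Put together, the handler proposed in this thread would dispatch on the status code instead of listing PVCs. Below is a minimal, self-contained sketch of that shape. FakeApiError and handle_pvc_creation_error are hypothetical names used only for illustration (the real code catches the exception raised by the Kubernetes client inside the SDK method), and the ValueError wording is illustrative rather than the merged message.

```python
class FakeApiError(Exception):
    """Hypothetical stand-in for a client error that carries an HTTP status."""

    def __init__(self, status):
        super().__init__(f"status {status}")
        self.status = status


def handle_pvc_creation_error(e, name, namespace):
    """Map a PVC creation failure to a user-facing outcome (illustrative only)."""
    if hasattr(e, "status") and e.status == 422:
        # 422: the API server rejected the object, e.g. because the name is invalid.
        raise ValueError(
            f"PVC name '{name}' is not valid. It must consist of lowercase "
            f"alphanumeric characters, '-' or '.', and must start and end "
            f"with an alphanumeric character."
        )
    elif hasattr(e, "status") and e.status == 409:
        # 409: a PVC with this name already exists, which is not treated as fatal.
        print(f"PVC '{name}' already exists in namespace {namespace}.")
    else:
        # Anything else is surfaced as a generic creation failure.
        raise RuntimeError(f"failed to create PVC. Error: {e}")


# Usage: the conflict case only prints a notice and returns.
handle_pvc_creation_error(FakeApiError(409), "my-pvc", "default")
```

Compared with the list-and-scan version, this relies on the API server's own response and avoids an extra list call in the error path.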
Oh I see, SGTM 😄
yes, agreed.
done.
Signed-off-by: mahdikhashan <[email protected]>
Thanks for this contribution @mahdikhashan!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: andreyvelich
What this PR does / why we need it:
This PR handles potential name errors for PVCs gracefully.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2491