-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make sure string function index bounds are correctly documented #25478
Comments
The same with In particular the part of docs of
and related part of docs of
Also see #24255 (I did not update that PR because I do not know what is the expected behavior). |
The docs are what I initially thought the bounds should be, what they actually ended up being is:
Reasoning: it's useful and common to want to pass a just-out-of-bounds index and it makes sense to do so as long as the operation isn't going to go even more out of bounds. So the full in-bounds-or-just-out-of-bounds range of indices is ok for |
Thanks. The only missing piece of specification is how:
This is actually what is stopping me with #24255. Or maybe it is unspecified (i.e. it should return any value that is not within bounds of the string - then the current implementation is OK - only performance should be improved) |
Does it matter? You can't do any computations with those out-of-bounds indices anyway. The only thing you can really do with such an index is check if it is out of bounds or not and compare it to other indices. So I think the only constraints need to be that the functions are consistent when called with the same string object and that they should be strictly monotonic in |
The issue is about notion of:
If we define it as identical under If we define it as two strings identical under
strings It is OK for me to leave it as is, but I want to have a clear documenation. The reason is that people develop custom |
And PR #25525 shows that it might matter (I have spotted that problem because of inconsistent behavior of |
|
I would be fine with defining imaginary out-of-bounds characters to all be exactly 1 code unit. The |
All you write is exactly what I want to say. I think (I will check it to be 100% sure) that we can simply remove methods
Currently it should not have a performance penalty I think because The only issue is that in the future probably those generic methods should be specialized for So In short - I believe there is a simple way to make I could make an appropriate PR showing this approach - should I (I am asking because you are assigned to this issue). |
We can add
Please do! Assignment just means I'm responsible to make sure it gets done, not that I have to actually do. Thank you! |
Apart from PRs I have made going back to the code of this issue I want to make sure this is intended:
and also
which is not covered by the contract given in the docs (I do not know if it was intended or not). Those three cases make me realize that actually those the only three places in the whole Base where I fully agree with:
but then the question is in what situations it is good that In Base those are only those three places I have indicated, and you may decide that:
Fortunately |
You may want to try changing the bounds to |
Thanks. Actually this led me to finding some more bugs:
I did the check what you ask and here are my conclusions:
everything else should not be impacted (fortunately So in short the narrow
I am indifferent, i.e. I can live with both definitions given they are properly documented as both approaches have their benefits and costs. |
@nalimilan - you were working with those |
In the initial PR I wasn't a fan of |
One additional benefit of wide In short: @StefanKarpinski - I think we have all cards about |
@StefanKarpinski: please just indicate your final preference as there are bugs whose fixing depends on the decision 😄. |
I need some time to review all of these issues and changes. I'll be able to look at it some time today. |
While we're here, why abbreviate |
We are done with my PRs cleaning up the errors in the implementations of the related functions I could find. They now all work assuming So we can leave it as is and start writing precise documentations the functions or reconsider range of accepted values in @StefanKarpinski I know this is a busy period in Julia development but this change will be seriously breaking for strings ecosystem if we want to change it in the future. I can live with both definitions. You have worked on this more and longer so I leave it for you to decide 😄. |
I do not know if it is the right place too keep track of this, but somewhere in the documentation we should note that
as this is a natural property people could assume in their code. |
It is additive for valid UTF-8 strings, however, which could be noted at the same time. |
https://discourse.julialang.org/t/documentation-for-thisind-doesnt-match-behavior/8239
The text was updated successfully, but these errors were encountered: