-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #11460, Fix #11464 uppercase/lowercase/map on a UTF16String should return a UTF16String #11735
Conversation
@JeffBezanson @stevengj @tkelman Could one or all of you release your 🍅s on this one? Thanks! |
sizehint!(buf, length(str.data)) | ||
for ch in str | ||
c2 = fun(ch) | ||
!isa(c2, Char) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this structure result in coverage reporting separate hits for both lines? I don't see a test that hits this error path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now added tests for that.
4ad0aa0
to
192bb91
Compare
Bump! Does this look good to go now? Thanks! |
You didn't answer my question, do separate hits show up for both lines in the .cov files with that code structure? |
A thought on these Unicode tests: it could be good to have these tests cover some characters outside the normal ASCII characters, just to avoid regressions. |
About the .cov files, I think there is some problem with the coverage output... Yes, I can add some more tests... I was just hurrying! |
Okay then considering that limitation/bug, there's no reason to go for the 2-line short-circuit formatting. Either use the 1-liner short-circuiting version and be aware that coverage won't say whether the error path is tested, or use a 3-line |
It's not working consistently for the the line case either! |
Did you run without inlining? ref #9493 |
What do I need to do? #9493 is 196 sometimes long comments... |
I did the following:
and got this result: - function map(fun, str::UTF16String)
9 buf = UInt16[]
1 sizehint!(buf, length(str.data))
1 for ch in str
1 c2 = fun(ch)
1 if !isa(c2, Char)
0 throw(UnicodeError(UTF_ERR_MAP_CHAR, 0, 0))
- end
0 uc = reinterpret(UInt32, c2)
1 if uc < 0x10000
5 if utf16_is_surrogate(UInt16(uc))
5 throw(UnicodeError(UTF_ERR_INVALID_CHAR, 0, uc))
- end
2 push!(buf, UInt16(uc))
2 elseif uc <= 0x10ffff
2 push!(buf, UInt16(0xd7c0 + (uc >> 10)))
0 push!(buf, UInt16(0xdc00 + (uc & 0x3ff)))
- else
1 throw(UnicodeError(UTF_ERR_INVALID_CHAR, 0, uc))
- end
- end
2 push!(buf, 0)
3 UTF16String(buf)
- end |
It makes no difference, 1 line |
@tkelman? Not sure what I should do from here, since the unit tests show that every path is covered, but the coverage test is messed up... (or there's something I'm missing in running it). Thanks! |
I haven't the slightest clue how the coverage code is implemented. That might be a bug, in which case it should be reported separately, ideally with a standalone reproduction case that can be run from an unmodified build of master, so people who do know what's going on can determine the problem. |
192bb91
to
9d2cf60
Compare
But you see why I couldn't exactly answer your question about the results of the coverage test, and why that is not relevant to whether or not this change is ready, right? |
As far as you know, did I call the coverage test you wanted correctly? |
Did you also delete |
I tried to follow the instructions, but there is no
Should I delete some other file? |
Would it be |
Removing
|
Yes it would be .dylib on mac (or .dll for windows). That LLVM error is probably a separate bug. What version of LLVM are you using? |
LLVM 3.6.1 usually... I'd forgotten that I'd changed that... I'll try to rebuild with default and retest... ugh! |
Tip: since you're a frequent contributor and like to be on the bleeding edge, it might be worth it having multiple build trees available (I have between 2 and 4 at any given time). |
Please, how do you do that? Have different repositories? (Ah, you've noticed that I like to be on the bleeding edge? 😀) I really want to try using LLVM 3.7 as well, but I haven't figured out yet what I need to do... |
Just different directories with different names. The top level dir defaults to $ git clone [email protected]:JuliaLang/julia.git my_super_secret_totally_awesome_julia_build |
So, it is a different local repository... OK, good idea, thanks! |
It's not even all that necessary to have separate checkouts (you'll need to wait for all of openblas to rebuild in the new copy, and we don't change versions of that very often). I personally keep multiple LLVM versions within the same checkout, they'll exist side-by-side under |
Also helpful are |
Wow! Thanks for all the great info! 37 !isa(c2, Char) &&
1 throw(UnicodeError(UTF_ERR_MAP_CHAR, 0, 0)) Does correctly report coverage... @kshyatt should know about it. |
My personal vote goes to the 3-line Your mac issue with LLVM 3.6.1 is probably known and fixed by #11515, but only on LLVM 3.7. Keno had some long-standing patches to LLVM that took a while to get merged. |
My personal view is that, if we have to do 2 lines like you've quoted to get good coverage, then the |
OK, I'm perfectly happy with the 3-line, I was just going by one of my many review comments, and I think @kshyatt's great coverage work shows that that is more important than just conciseness. |
I use the |
Better yet! Thanks, all of the suggestions have been helpful! |
Bump! Anything else to do here? |
I think it should be fine, will merge in a day or so if no other feedback. P.S: There's really no need to bump open PR's every day. People are busy and aware there are lots of open PR's. If you think something has been forgotten about, waiting at least a couple of days before posting a reminder will help you come across as a little less pushy. |
OK, sure thing! Sorry! |
Needs a rebase apparently. Was about to merge otherwise. |
9d2cf60
to
1e02cb2
Compare
…ase/map `uppercase`/`lowercase`/`map` on a `UTF16String` now returns a `UTF16String`, consistent with `ASCIIString`, `UTF8String`, and `UTF32String` returning the same type string as input.
1e02cb2
to
3e591e0
Compare
OK, it's hard to keep up with all the changes... It's now rebased. |
Again, Appveyor hit by #11807, argh! |
Looks good to me; can we get the tests to pass by re-running the tests now that 11807 is closed? |
Not until #11818 gets fixed. |
Once again, the failure wasn't do to my change, 3 passed on Travis, and 1 on AppVeyor. |
The other string types,
ASCIIString
,UTF8String
, andUTF32String
all return a string of the same type, butUTF16String
would always return aUTF8String
, creating type instability.This also adds testing for all 4 basic string types.