Fix #11460, Fix #11464 uppercase/lowercase/map on a UTF16String should return a UTF16String #11735

ScottPJones · 2015-06-17T13:57:30Z

The other string types, ASCIIString, UTF8String, and UTF32String all return a string of the same type, but UTF16String would always return a UTF8String, creating type instability.
This also adds testing for all 4 basic string types.

ScottPJones · 2015-06-18T23:32:13Z

@JeffBezanson @stevengj @tkelman Could one or all of you release your 🍅s on this one? Thanks!

tkelman · 2015-06-18T23:45:18Z

base/utf16.jl

+    sizehint!(buf, length(str.data))
+    for ch in str
+        c2 = fun(ch)
+        !isa(c2, Char) &&


Does this structure result in coverage reporting separate hits for both lines? I don't see a test that hits this error path.

I've now added tests for that.

ScottPJones · 2015-06-19T12:18:05Z

Bump! Does this look good to go now? Thanks!

tkelman · 2015-06-19T16:46:35Z

You didn't answer my question, do separate hits show up for both lines in the .cov files with that code structure?

IainNZ · 2015-06-19T18:51:12Z

A thought on these Unicode tests: it could be good to have these tests cover some characters outside the normal ASCII characters, just to avoid regressions.

ScottPJones · 2015-06-19T18:57:39Z

About the .cov files, I think there is some problem with the coverage output...
It's not reporting at all on the line where the error is printed out, even though the error is definitely there.

Yes, I can add some more tests... I was just hurrying!

tkelman · 2015-06-19T19:03:51Z

It's not reporting at all on the line where the error is printed out, even though the error is definitely there.

Okay then considering that limitation/bug, there's no reason to go for the 2-line short-circuit formatting. Either use the 1-liner short-circuiting version and be aware that coverage won't say whether the error path is tested, or use a 3-line if .. end

ScottPJones · 2015-06-19T20:02:01Z

It's not working consistently for the the line case either!

tkelman · 2015-06-19T20:23:14Z

Did you run without inlining? ref #9493

ScottPJones · 2015-06-19T20:25:45Z

What do I need to do? #9493 is 196 sometimes long comments...

ScottPJones · 2015-06-19T20:40:48Z

I did the following:

ScottsRetinaMBP:julia scott$ rm base/*.cov
ScottsRetinaMBP:julia scott$ rm test/*.cov
ScottsRetinaMBP:julia scott$ usr/bin/julia --inline=no --code-coverage=all test/runtests.jl "strings"
     * strings              in  16.70 seconds
    SUCCESS

and got this result:

        - function map(fun, str::UTF16String)
        9     buf = UInt16[]
        1     sizehint!(buf, length(str.data))
        1     for ch in str
        1         c2 = fun(ch)
        1         if !isa(c2, Char)
        0             throw(UnicodeError(UTF_ERR_MAP_CHAR, 0, 0))
        -         end
        0         uc = reinterpret(UInt32, c2)
        1         if uc < 0x10000
        5             if utf16_is_surrogate(UInt16(uc))
        5                 throw(UnicodeError(UTF_ERR_INVALID_CHAR, 0, uc))
        -             end
        2             push!(buf, UInt16(uc))
        2         elseif uc <= 0x10ffff
        2             push!(buf, UInt16(0xd7c0 + (uc >> 10)))
        0             push!(buf, UInt16(0xdc00 + (uc & 0x3ff)))
        -         else
        1             throw(UnicodeError(UTF_ERR_INVALID_CHAR, 0, uc))
        -         end
        -     end
        2     push!(buf, 0)
        3     UTF16String(buf)
        - end

ScottPJones · 2015-06-19T20:43:05Z

It makes no difference, 1 line &&, 2 line && or if, it's always incorrect...

ScottPJones · 2015-06-19T23:32:50Z

@tkelman? Not sure what I should do from here, since the unit tests show that every path is covered, but the coverage test is messed up... (or there's something I'm missing in running it). Thanks!

tkelman · 2015-06-19T23:36:02Z

I haven't the slightest clue how the coverage code is implemented. That might be a bug, in which case it should be reported separately, ideally with a standalone reproduction case that can be run from an unmodified build of master, so people who do know what's going on can determine the problem.

ScottPJones · 2015-06-19T23:41:04Z

But you see why I couldn't exactly answer your question about the results of the coverage test, and why that is not relevant to whether or not this change is ready, right?

ScottPJones · 2015-06-19T23:41:52Z

As far as you know, did I call the coverage test you wanted correctly?

tkelman · 2015-06-19T23:47:00Z

Did you also delete usr/lib/julia/sys.so? I was referring to the instructions in the first post of #9493, not the rest of the conversation there.

ScottPJones · 2015-06-19T23:51:50Z

I tried to follow the instructions, but there is no sys.so on my system...

ScottsRetinaMBP:julia scott$ ls usr/lib/julia
inference.dylib     inference0.dylib    sys.dylib
inference.dylib.dSYM    inference0.dylib.dSYM   sys.dylib.dSYM
inference.ji        inference0.ji       sys.ji
inference.o     inference0.o        sys.o

Should I delete some other file?

ScottPJones · 2015-06-19T23:52:34Z

Would it be .dylib here?

ScottPJones · 2015-06-19T23:54:18Z

Removing .dylib gives me this: 😞

ScottsRetinaMBP:julia scott$ usr/bin/julia --inline=no --code-coverage=all test/runtests.jl "strings"
LLVM ERROR: Program used external function '_lcnt713' which could not be resolved!

tkelman · 2015-06-19T23:59:09Z

Yes it would be .dylib on mac (or .dll for windows). That LLVM error is probably a separate bug. What version of LLVM are you using?

ScottPJones · 2015-06-20T00:00:44Z

LLVM 3.6.1 usually... I'd forgotten that I'd changed that... I'll try to rebuild with default and retest... ugh!

kmsquire · 2015-06-20T00:45:37Z

LLVM 3.6.1 usually... I'd forgotten that I'd changed that... I'll try to rebuild with default and retest... ugh!

Tip: since you're a frequent contributor and like to be on the bleeding edge, it might be worth it having multiple build trees available (I have between 2 and 4 at any given time).

ScottPJones · 2015-06-20T00:58:51Z

Please, how do you do that? Have different repositories? (Ah, you've noticed that I like to be on the bleeding edge? 😀) I really want to try using LLVM 3.7 as well, but I haven't figured out yet what I need to do...

kmsquire · 2015-06-20T01:05:24Z

Just different directories with different names. The top level dir defaults to julia, but you can call it whatever you like:

$ git clone [email protected]:JuliaLang/julia.git my_super_secret_totally_awesome_julia_build

ScottPJones · 2015-06-20T01:06:58Z

So, it is a different local repository... OK, good idea, thanks!

tkelman · 2015-06-20T01:08:05Z

It's not even all that necessary to have separate checkouts (you'll need to wait for all of openblas to rebuild in the new copy, and we don't change versions of that very often). I personally keep multiple LLVM versions within the same checkout, they'll exist side-by-side under deps you just have to make sure to do make cleanall between changing versions in Make.user.

mbauman · 2015-06-20T01:13:10Z

Also helpful are make -C deps update-llvm (for svn - I tend to only update LLVM-svn after I see a recent patch to fix upstream breakage) and make -C deps reinstall-llvm. Just don't do a distclean and they'll happily live pre-built right alongside each other, waiting to be "reinstalled."

ScottPJones · 2015-06-20T01:25:10Z

Wow! Thanks for all the great info!
OK, now to the good/bad news...
Coverage works fine with LLVM 3.3, hosed and not at all useful with current release LLVM 3.6.1.
Using the form:

       37         !isa(c2, Char) &&
        1             throw(UnicodeError(UTF_ERR_MAP_CHAR, 0, 0))

Does correctly report coverage... @kshyatt should know about it.
Of course, she may prefer the if style, which is totally fine, I just started using the && form because I was told to follow the current idiom in the code...
What should be the preferred idiom, considering that the 1-line form is bad for coverage analysis?

tkelman · 2015-06-20T01:27:18Z

My personal vote goes to the 3-line if because it'll be simpler to understand for newcomers to the language, if we're going to need to spread across 2+ lines to satisfy line-based coverage tools.

Your mac issue with LLVM 3.6.1 is probably known and fixed by #11515, but only on LLVM 3.7. Keno had some long-standing patches to LLVM that took a while to get merged.

kshyatt · 2015-06-20T01:27:22Z

My personal view is that, if we have to do 2 lines like you've quoted to get good coverage, then the if style is preferable. I think the &&/|| idiom is great for one-line concise "this has to be true and if not, panic!" checks but if we're using two lines, the if is easier to read. I'm happy to defer to others on this.

ScottPJones · 2015-06-20T01:31:09Z

OK, I'm perfectly happy with the 3-line, I was just going by one of my many review comments, and I think @kshyatt's great coverage work shows that that is more important than just conciseness.
Now I'm going to have to have a PR to change at least all my Unicode conversion code to go back to if,
if they get merged in! 😀

carlobaldassi · 2015-06-20T01:36:07Z

Just different directories with different names.

I use the git-new-workdir script, which has the same effect but does not duplicate the .git information, and keeps it in sync. Basically, you have different checkouts of the same repository in different directories.

ScottPJones · 2015-06-20T13:17:53Z

Better yet! Thanks, all of the suggestions have been helpful!

ScottPJones · 2015-06-20T19:16:34Z

Bump! Anything else to do here?

tkelman · 2015-06-21T00:02:08Z

I think it should be fine, will merge in a day or so if no other feedback.

P.S: There's really no need to bump open PR's every day. People are busy and aware there are lots of open PR's. If you think something has been forgotten about, waiting at least a couple of days before posting a reminder will help you come across as a little less pushy.

ScottPJones · 2015-06-21T00:07:16Z

OK, sure thing! Sorry!

tkelman · 2015-06-22T10:57:31Z

Needs a rebase apparently. Was about to merge otherwise.

…ase/map `uppercase`/`lowercase`/`map` on a `UTF16String` now returns a `UTF16String`, consistent with `ASCIIString`, `UTF8String`, and `UTF32String` returning the same type string as input.

ScottPJones · 2015-06-22T12:00:54Z

OK, it's hard to keep up with all the changes... It's now rebased.
I'd like to split up string.jl into a lot more logical pieces, and also split up the tests so that they match
the files, that will prevent a lot of these unnecessary rebasing and manual merging. Sound OK to you for another PR?

ScottPJones · 2015-06-22T21:04:17Z

Again, Appveyor hit by #11807, argh!

stevengj · 2015-06-23T20:14:30Z

Looks good to me; can we get the tests to pass by re-running the tests now that 11807 is closed?

tkelman · 2015-06-23T20:16:53Z

Not until #11818 gets fixed.

ScottPJones · 2015-06-24T02:46:54Z

Once again, the failure wasn't do to my change, 3 passed on Travis, and 1 on AppVeyor.

Fix #11460, Fix #11464 uppercase/lowercase/map on a UTF16String should return a UTF16String

ScottPJones changed the title ~~Fix #11464 map on a UTF16String should return a UTF16String~~ Fix #11460, Fix #11464 uppercase/lowercase/map on a UTF16String should return a UTF16String Jun 17, 2015

tkelman reviewed Jun 18, 2015
View reviewed changes

ScottPJones force-pushed the spj/utf16map branch from 4ad0aa0 to 192bb91 Compare June 19, 2015 04:55

ScottPJones force-pushed the spj/utf16map branch from 192bb91 to 9d2cf60 Compare June 19, 2015 23:38

ScottPJones mentioned this pull request Jun 20, 2015

RFC: changing code idiom for more accurate coverage information #11792

Closed

ScottPJones force-pushed the spj/utf16map branch from 9d2cf60 to 1e02cb2 Compare June 22, 2015 11:50

ScottPJones added 2 commits June 22, 2015 07:56

Fix JuliaLang#11460, Fix JuliaLang#11464 UTF16String uppercase/lowerc…

7e84bcf

…ase/map `uppercase`/`lowercase`/`map` on a `UTF16String` now returns a `UTF16String`, consistent with `ASCIIString`, `UTF8String`, and `UTF32String` returning the same type string as input.

Added new tests for full coverage

3e591e0

ScottPJones force-pushed the spj/utf16map branch from 1e02cb2 to 3e591e0 Compare June 22, 2015 11:58

tkelman added a commit that referenced this pull request Jun 24, 2015

Merge pull request #11735 from ScottPJones/spj/utf16map

a282fcd

Fix #11460, Fix #11464 uppercase/lowercase/map on a UTF16String should return a UTF16String

tkelman merged commit a282fcd into JuliaLang:master Jun 24, 2015

Fix #11460, Fix #11464 uppercase/lowercase/map on a UTF16String should return a UTF16String #11735

Fix #11460, Fix #11464 uppercase/lowercase/map on a UTF16String should return a UTF16String #11735

Conversation

ScottPJones commented Jun 17, 2015

ScottPJones commented Jun 18, 2015

tkelman Jun 18, 2015

Choose a reason for hiding this comment

ScottPJones Jun 19, 2015

Choose a reason for hiding this comment

ScottPJones commented Jun 19, 2015

tkelman commented Jun 19, 2015

IainNZ commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

tkelman commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

tkelman commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

tkelman commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

tkelman commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

ScottPJones commented Jun 19, 2015

tkelman commented Jun 19, 2015

ScottPJones commented Jun 20, 2015

kmsquire commented Jun 20, 2015

ScottPJones commented Jun 20, 2015

kmsquire commented Jun 20, 2015

ScottPJones commented Jun 20, 2015

tkelman commented Jun 20, 2015

mbauman commented Jun 20, 2015

ScottPJones commented Jun 20, 2015

tkelman commented Jun 20, 2015

kshyatt commented Jun 20, 2015

ScottPJones commented Jun 20, 2015

carlobaldassi commented Jun 20, 2015

ScottPJones commented Jun 20, 2015

ScottPJones commented Jun 20, 2015

tkelman commented Jun 21, 2015

ScottPJones commented Jun 21, 2015

tkelman commented Jun 22, 2015

ScottPJones commented Jun 22, 2015

ScottPJones commented Jun 22, 2015

stevengj commented Jun 23, 2015

tkelman commented Jun 23, 2015

ScottPJones commented Jun 24, 2015