Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #11460, Fix #11464 uppercase/lowercase/map on a UTF16String should return a UTF16String #11735

Merged
merged 2 commits into from
Jun 24, 2015

Conversation

ScottPJones
Copy link
Contributor

The other string types, ASCIIString, UTF8String, and UTF32String all return a string of the same type, but UTF16String would always return a UTF8String, creating type instability.
This also adds testing for all 4 basic string types.

@ScottPJones ScottPJones changed the title Fix #11464 map on a UTF16String should return a UTF16String Fix #11460, Fix #11464 uppercase/lowercase/map on a UTF16String should return a UTF16String Jun 17, 2015
@ScottPJones
Copy link
Contributor Author

@JeffBezanson @stevengj @tkelman Could one or all of you release your 🍅s on this one? Thanks!

sizehint!(buf, length(str.data))
for ch in str
c2 = fun(ch)
!isa(c2, Char) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this structure result in coverage reporting separate hits for both lines? I don't see a test that hits this error path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now added tests for that.

@ScottPJones
Copy link
Contributor Author

Bump! Does this look good to go now? Thanks!

@tkelman
Copy link
Contributor

tkelman commented Jun 19, 2015

You didn't answer my question, do separate hits show up for both lines in the .cov files with that code structure?

@IainNZ
Copy link
Member

IainNZ commented Jun 19, 2015

A thought on these Unicode tests: it could be good to have these tests cover some characters outside the normal ASCII characters, just to avoid regressions.

@ScottPJones
Copy link
Contributor Author

About the .cov files, I think there is some problem with the coverage output...
It's not reporting at all on the line where the error is printed out, even though the error is definitely there.

Yes, I can add some more tests... I was just hurrying!

@tkelman
Copy link
Contributor

tkelman commented Jun 19, 2015

It's not reporting at all on the line where the error is printed out, even though the error is definitely there.

Okay then considering that limitation/bug, there's no reason to go for the 2-line short-circuit formatting. Either use the 1-liner short-circuiting version and be aware that coverage won't say whether the error path is tested, or use a 3-line if .. end

@ScottPJones
Copy link
Contributor Author

It's not working consistently for the the line case either!

@tkelman
Copy link
Contributor

tkelman commented Jun 19, 2015

Did you run without inlining? ref #9493

@ScottPJones
Copy link
Contributor Author

What do I need to do? #9493 is 196 sometimes long comments...

@ScottPJones
Copy link
Contributor Author

I did the following:

ScottsRetinaMBP:julia scott$ rm base/*.cov
ScottsRetinaMBP:julia scott$ rm test/*.cov
ScottsRetinaMBP:julia scott$ usr/bin/julia --inline=no --code-coverage=all test/runtests.jl "strings"
     * strings              in  16.70 seconds
    SUCCESS

and got this result:

        - function map(fun, str::UTF16String)
        9     buf = UInt16[]
        1     sizehint!(buf, length(str.data))
        1     for ch in str
        1         c2 = fun(ch)
        1         if !isa(c2, Char)
        0             throw(UnicodeError(UTF_ERR_MAP_CHAR, 0, 0))
        -         end
        0         uc = reinterpret(UInt32, c2)
        1         if uc < 0x10000
        5             if utf16_is_surrogate(UInt16(uc))
        5                 throw(UnicodeError(UTF_ERR_INVALID_CHAR, 0, uc))
        -             end
        2             push!(buf, UInt16(uc))
        2         elseif uc <= 0x10ffff
        2             push!(buf, UInt16(0xd7c0 + (uc >> 10)))
        0             push!(buf, UInt16(0xdc00 + (uc & 0x3ff)))
        -         else
        1             throw(UnicodeError(UTF_ERR_INVALID_CHAR, 0, uc))
        -         end
        -     end
        2     push!(buf, 0)
        3     UTF16String(buf)
        - end

@ScottPJones
Copy link
Contributor Author

It makes no difference, 1 line &&, 2 line && or if, it's always incorrect...

@ScottPJones
Copy link
Contributor Author

@tkelman? Not sure what I should do from here, since the unit tests show that every path is covered, but the coverage test is messed up... (or there's something I'm missing in running it). Thanks!

@tkelman
Copy link
Contributor

tkelman commented Jun 19, 2015

I haven't the slightest clue how the coverage code is implemented. That might be a bug, in which case it should be reported separately, ideally with a standalone reproduction case that can be run from an unmodified build of master, so people who do know what's going on can determine the problem.

@ScottPJones
Copy link
Contributor Author

But you see why I couldn't exactly answer your question about the results of the coverage test, and why that is not relevant to whether or not this change is ready, right?

@ScottPJones
Copy link
Contributor Author

As far as you know, did I call the coverage test you wanted correctly?

@tkelman
Copy link
Contributor

tkelman commented Jun 19, 2015

Did you also delete usr/lib/julia/sys.so? I was referring to the instructions in the first post of #9493, not the rest of the conversation there.

@ScottPJones
Copy link
Contributor Author

I tried to follow the instructions, but there is no sys.so on my system...

ScottsRetinaMBP:julia scott$ ls usr/lib/julia
inference.dylib     inference0.dylib    sys.dylib
inference.dylib.dSYM    inference0.dylib.dSYM   sys.dylib.dSYM
inference.ji        inference0.ji       sys.ji
inference.o     inference0.o        sys.o

Should I delete some other file?

@ScottPJones
Copy link
Contributor Author

Would it be .dylib here?

@ScottPJones
Copy link
Contributor Author

Removing .dylib gives me this: 😞

ScottsRetinaMBP:julia scott$ usr/bin/julia --inline=no --code-coverage=all test/runtests.jl "strings"
LLVM ERROR: Program used external function '_lcnt713' which could not be resolved!

@tkelman
Copy link
Contributor

tkelman commented Jun 19, 2015

Yes it would be .dylib on mac (or .dll for windows). That LLVM error is probably a separate bug. What version of LLVM are you using?

@ScottPJones
Copy link
Contributor Author

LLVM 3.6.1 usually... I'd forgotten that I'd changed that... I'll try to rebuild with default and retest... ugh!

@kmsquire
Copy link
Member

LLVM 3.6.1 usually... I'd forgotten that I'd changed that... I'll try to rebuild with default and retest... ugh!

Tip: since you're a frequent contributor and like to be on the bleeding edge, it might be worth it having multiple build trees available (I have between 2 and 4 at any given time).

@ScottPJones
Copy link
Contributor Author

Please, how do you do that? Have different repositories? (Ah, you've noticed that I like to be on the bleeding edge? 😀) I really want to try using LLVM 3.7 as well, but I haven't figured out yet what I need to do...

@kmsquire
Copy link
Member

Just different directories with different names. The top level dir defaults to julia, but you can call it whatever you like:

$ git clone [email protected]:JuliaLang/julia.git my_super_secret_totally_awesome_julia_build

@ScottPJones
Copy link
Contributor Author

So, it is a different local repository... OK, good idea, thanks!

@tkelman
Copy link
Contributor

tkelman commented Jun 20, 2015

It's not even all that necessary to have separate checkouts (you'll need to wait for all of openblas to rebuild in the new copy, and we don't change versions of that very often). I personally keep multiple LLVM versions within the same checkout, they'll exist side-by-side under deps you just have to make sure to do make cleanall between changing versions in Make.user.

@mbauman
Copy link
Member

mbauman commented Jun 20, 2015

Also helpful are make -C deps update-llvm (for svn - I tend to only update LLVM-svn after I see a recent patch to fix upstream breakage) and make -C deps reinstall-llvm. Just don't do a distclean and they'll happily live pre-built right alongside each other, waiting to be "reinstalled."

@ScottPJones
Copy link
Contributor Author

Wow! Thanks for all the great info!
OK, now to the good/bad news...
Coverage works fine with LLVM 3.3, hosed and not at all useful with current release LLVM 3.6.1.
Using the form:

       37         !isa(c2, Char) &&
        1             throw(UnicodeError(UTF_ERR_MAP_CHAR, 0, 0))

Does correctly report coverage... @kshyatt should know about it.
Of course, she may prefer the if style, which is totally fine, I just started using the && form because I was told to follow the current idiom in the code...
What should be the preferred idiom, considering that the 1-line form is bad for coverage analysis?

@tkelman
Copy link
Contributor

tkelman commented Jun 20, 2015

My personal vote goes to the 3-line if because it'll be simpler to understand for newcomers to the language, if we're going to need to spread across 2+ lines to satisfy line-based coverage tools.

Your mac issue with LLVM 3.6.1 is probably known and fixed by #11515, but only on LLVM 3.7. Keno had some long-standing patches to LLVM that took a while to get merged.

@kshyatt
Copy link
Contributor

kshyatt commented Jun 20, 2015

My personal view is that, if we have to do 2 lines like you've quoted to get good coverage, then the if style is preferable. I think the &&/|| idiom is great for one-line concise "this has to be true and if not, panic!" checks but if we're using two lines, the if is easier to read. I'm happy to defer to others on this.

@ScottPJones
Copy link
Contributor Author

OK, I'm perfectly happy with the 3-line, I was just going by one of my many review comments, and I think @kshyatt's great coverage work shows that that is more important than just conciseness.
Now I'm going to have to have a PR to change at least all my Unicode conversion code to go back to if,
if they get merged in! 😀

@carlobaldassi
Copy link
Member

Just different directories with different names.

I use the git-new-workdir script, which has the same effect but does not duplicate the .git information, and keeps it in sync. Basically, you have different checkouts of the same repository in different directories.

@ScottPJones
Copy link
Contributor Author

Better yet! Thanks, all of the suggestions have been helpful!

@ScottPJones
Copy link
Contributor Author

Bump! Anything else to do here?

@tkelman
Copy link
Contributor

tkelman commented Jun 21, 2015

I think it should be fine, will merge in a day or so if no other feedback.

P.S: There's really no need to bump open PR's every day. People are busy and aware there are lots of open PR's. If you think something has been forgotten about, waiting at least a couple of days before posting a reminder will help you come across as a little less pushy.

@ScottPJones
Copy link
Contributor Author

OK, sure thing! Sorry!

@tkelman
Copy link
Contributor

tkelman commented Jun 22, 2015

Needs a rebase apparently. Was about to merge otherwise.

…ase/map

`uppercase`/`lowercase`/`map` on a `UTF16String` now returns a `UTF16String`,
consistent with `ASCIIString`, `UTF8String`, and `UTF32String` returning
the same type string as input.
@ScottPJones
Copy link
Contributor Author

OK, it's hard to keep up with all the changes... It's now rebased.
I'd like to split up string.jl into a lot more logical pieces, and also split up the tests so that they match
the files, that will prevent a lot of these unnecessary rebasing and manual merging. Sound OK to you for another PR?

@ScottPJones
Copy link
Contributor Author

Again, Appveyor hit by #11807, argh!

@stevengj
Copy link
Member

Looks good to me; can we get the tests to pass by re-running the tests now that 11807 is closed?

@tkelman
Copy link
Contributor

tkelman commented Jun 23, 2015

Not until #11818 gets fixed.

@ScottPJones
Copy link
Contributor Author

Once again, the failure wasn't do to my change, 3 passed on Travis, and 1 on AppVeyor.

tkelman added a commit that referenced this pull request Jun 24, 2015
Fix #11460, Fix #11464 uppercase/lowercase/map on a UTF16String should return a UTF16String
@tkelman tkelman merged commit a282fcd into JuliaLang:master Jun 24, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants