Encoding issue with jdt.ls javadoc #524

tolusha · 2018-01-19T15:34:58Z

It turns out that the original source isn't UTF-8 encoded.
I am not sure if it is possible to handle that.

VS Code: (screenshot not preserved)

Che: (screenshot not preserved)

snjeza · 2018-01-21T21:49:03Z

This is an m2e bug. junit-3.8.1-sources.jar uses the ISO-8859-1 encoding and m2e doesn't allow changing source attachment encoding.
The issue can't be reproduced when using junit >= 3.8.2.
See https://bugs.eclipse.org/bugs/show_bug.cgi?id=385391
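
For illustration only, here is a minimal sketch of why an ISO-8859-1 source attachment looks garbled when it is read as UTF-8. The class name `SourceEncodingDemo` and the sample text are hypothetical and are not taken from the junit sources.

```java
import java.nio.charset.StandardCharsets;

class SourceEncodingDemo {

    /** Decodes the bytes as UTF-8; malformed sequences become U+FFFD. */
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Decodes the bytes as ISO-8859-1, where every byte maps to one character. */
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical sample: U+00E9 is the single byte 0xE9 in ISO-8859-1,
        // which is not a valid UTF-8 sequence on its own.
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(decodeAsUtf8(latin1Bytes));   // last character is replaced
        System.out.println(decodeAsLatin1(latin1Bytes)); // original text is recovered
    }
}
```

The same bytes are only readable when the attachment encoding matches the encoding the sources were written in, which is why the encoding has to be configurable or known.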

@fbricon Would you like me to try to fix this issue in m2e?

fbricon · 2018-01-22T04:18:40Z

@snjeza before you try to fix m2e, do you have a general idea on how to do it?

snjeza · 2018-01-22T17:50:40Z

We could add a Maven property like the following:

<m2e.source.attachment.encoding>groupId:artifactId:version:ISO-8859-1</m2e.source.attachment.encoding>

fbricon · 2018-01-22T18:42:46Z

Mmm that would not work OOTB. And you probably need to handle javadoc and sources differently.

An alternative might be to, before attaching the file, inspect the 1st (java) file in there and try to detect the encoding via ICU4J (see https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream/4013565#4013565), for instance.

@ifedorenko WDYT?

ifedorenko · 2018-01-22T18:54:00Z

Agree with Fred, the encoding has to come from somewhere and encoding detection seems like a reasonable thing to do. Although I guess it won't be 100% reliable, for things like mixed-encoding sources, not using all sources during autodetection, etc.

Other options you may want to try:

- Look inside the artifact's pom.xml file for a source encoding property, which may or may not be set correctly.
- Maintain a hand-crafted and/or crowd-sourced mapping from artifact SHA to encoding.
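
The SHA-to-encoding mapping could look roughly like the sketch below. Everything here is an assumption for illustration: the class name `ArtifactEncodingRegistry`, the choice of SHA-1, and the in-memory map are hypothetical and are not part of m2e or jdt.ls.

```java
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ArtifactEncodingRegistry {

    // Hypothetical table: lowercase hex SHA-1 of the sources jar -> its encoding.
    private final Map<String, Charset> encodingsBySha1;

    ArtifactEncodingRegistry(Map<String, Charset> encodingsBySha1) {
        this.encodingsBySha1 = new HashMap<>(encodingsBySha1);
    }

    /** Returns the lowercase hex SHA-1 digest of the given content. */
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Looks up the recorded encoding for an artifact, if any is known. */
    Optional<Charset> lookup(byte[] artifactContent) {
        return Optional.ofNullable(encodingsBySha1.get(sha1Hex(artifactContent)));
    }
}
```

An unknown artifact yields an empty `Optional`, so a caller could fall back to the default UTF-8 behaviour instead of guessing.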

good luck ;-)

fbricon · 2018-01-22T19:02:19Z

@snjeza If going with ICU4J, make sure you're not depending on the icu4j bundle, but the packages, since jdt.ls only embeds the base icu4j jar

snjeza · 2018-01-22T19:06:50Z

@fbricon @ifedorenko ~~CharsetDetector returns UTF-8 for junit-3.8.1-sources.jar!/junit/framework/TestSuite.java.~~

ifedorenko · 2018-01-22T19:30:59Z

Autodetection will never be 100% reliable, especially if it does not consider all source files.

snjeza · 2018-01-22T19:54:55Z

~~CharsetDetector doesn't work even when detecting TestSuite.java~~

A solution for a gradle project is:

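apply plugin: 'java'
apply plugin: 'eclipse'

eclipse {
    classpath {
        file {
            whenMerged {
                // Find the classpath entry that refers to the junit 3.8.1 sources jar
                def source = entries.find { it.path.contains('junit-3.8.1-sources.jar') }
                // Only set the attribute if such an entry exists
                if (source != null) {
                    source.entryAttributes['source_encoding'] = 'ISO-8859-1'
                }
            }
        }
    }
}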

fbricon · 2018-01-22T20:00:42Z

@snjeza I'm curious, does icu4j detect anything other than utf-8 if you scan the entire junit-3.8.1-sources.jar?

snjeza · 2018-01-22T21:40:36Z

> I'm curious, does icu4j detect anything other than utf-8 if you scan the entire junit-3.8.1-sources.jar?

icu4j detects encoding correctly when scanning junit-3.8.1-sources.jar, icu4j-60.2-sources.jar, commons-io-2.6-sources.jar ...

> If going with ICU4J, make sure you're not depending on the icu4j bundle, but the packages, since jdt.ls only embeds the base icu4j jar

@fbricon we should add the icu4j dependency to the org.eclipse.m2e.jdt bundle. jdt.ls would have to include the icu4j bundle.
com.ibm.icu.base doesn't include CharsetDetector.

fbricon · 2018-01-22T23:29:07Z

How long does it take to full-scan a jar?
I'm interested to know whether we can accurately detect the encoding after scanning, say, 5, 10, or 100 files in a jar. I'd rather sample than perform a complete archive scan.
Can you see if other frameworks produce similar results? I'm not thrilled by the idea of adding another 12MB to jdt.ls simply to be able to detect source code encoding.
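
As a rough sketch of the sampling idea, the code below inspects at most `maxFiles` `.java` entries of a sources jar. Note that it does not use ICU4J: as a dependency-free stand-in it only checks whether each sampled file is strictly valid UTF-8 and otherwise assumes ISO-8859-1. The class name `SourceJarEncodingSampler`, the ISO-8859-1 fallback, and the `maxFiles` parameter are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class SourceJarEncodingSampler {

    /** Returns true if the bytes decode as UTF-8 without any malformed input. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /**
     * Samples up to maxFiles .java entries. Returns ISO-8859-1 (an assumed
     * fallback) as soon as one sampled file is not valid UTF-8, else UTF-8.
     */
    static Charset guessEncoding(InputStream jar, int maxFiles) throws IOException {
        int inspected = 0;
        try (ZipInputStream zip = new ZipInputStream(jar)) {
            ZipEntry entry;
            while (inspected < maxFiles && (entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".java")) {
                    continue; // only source files count towards the sample
                }
                inspected++;
                if (!isValidUtf8(readAll(zip))) {
                    return StandardCharsets.ISO_8859_1;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Reads the current zip entry fully into memory. */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

A small sample keeps the scan cheap, but it can miss the few files that contain non-ASCII characters, so a pure-ASCII sample will always be reported as UTF-8.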

snjeza · 2018-01-31T00:29:14Z

See https://git.eclipse.org/r/#/c/116390/2

fbricon · 2018-03-27T22:42:47Z

Since encoding detection gives worse results than the default UTF-8 encoding,
we're gonna need a new command to explicitly set encoding to a URI (now that m2e persists that information).

fbricon · 2018-04-12T16:36:01Z

WIP to expose a setSourceEncoding command:

snjeza self-assigned this Jan 19, 2018

fbricon added the bug label Jan 19, 2018

snjeza added the upstream label Jan 22, 2018

fbricon added this to the End January 2018 milestone Jan 24, 2018

snjeza added a commit to snjeza/m2e-core-tests that referenced this issue Jan 31, 2018

Encoding issue

b9f9a6f

Test for eclipse-jdtls/eclipse.jdt.ls#524 Signed-off-by: Snjezana Peco <[email protected]>

snjeza mentioned this issue Jan 31, 2018

Encoding issue tesla/m2e-core-tests#57

Closed

fbricon modified the milestones: End January 2018, Mid February 2018 Jan 31, 2018

fbricon modified the milestones: Mid February 2018, End February 2018 Feb 12, 2018

tsmaeder mentioned this issue Feb 12, 2018

Sprint 145 Issues #546

Closed

fbricon modified the milestones: End February 2018, Mid March 2018 Feb 22, 2018

fbricon mentioned this issue Feb 28, 2018

The encoding "TAB" missed when peeking a reference in Java source code redhat-developer/vscode-java#453

Closed

fbricon modified the milestones: Mid March 2018, End March 2018 Mar 15, 2018

fbricon removed this from the End March 2018 milestone Mar 27, 2018

fbricon assigned fbricon and unassigned snjeza Jun 11, 2018
