This repository has been archived by the owner on Mar 7, 2019. It is now read-only.
On Windows 10, javasphinx-apidoc doesn't work when run on Python 3.6.4, but it does when run on Python 2.7. This is with javasphinx==0.9.15 for both.
It looks like the script, or possibly the Python stdlib, is expecting the files it reads to be encoded in cp1252, but the files are actually UTF-8. This will fail on any byte that isn't a valid cp1252 character.
For example, when reading the character 🐍 (U+1F40D, encoded in UTF-8 as b'\xF0\x9F\x90\x8D'), the script throws an exception because it treats those bytes as 4 separate characters, and byte 0x90 is not defined in cp1252.
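The failure can be reproduced without javasphinx at all. In Python 3, `open()` in text mode without an `encoding=` argument decodes using `locale.getpreferredencoding(False)`, which is typically cp1252 on a US/Western European Windows install. The traceback ending in `encodings\cp1252.py` suggests that is what happens at `apidoc.py` line 191, but that is an inference from the traceback, not from reading `apidoc.py`. This Python 3 sketch (the helper `first_undecodable_byte` is hypothetical, written only for this example) decodes the same four bytes with both codecs:

```python
# Standalone reproduction of the decode error; no javasphinx involved.
# first_undecodable_byte is a hypothetical helper written for this example.

def first_undecodable_byte(data, codec):
    """Return (offset, byte value) of the first byte `codec` rejects, or None."""
    try:
        data.decode(codec)
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


snake = "\U0001F40D".encode("utf-8")    # U+1F40D as UTF-8
print(snake)                            # b'\xf0\x9f\x90\x8d'

# UTF-8 decodes the four bytes back to the single code point.
print(hex(ord(snake.decode("utf-8"))))          # 0x1f40d
print(first_undecodable_byte(snake, "utf-8"))   # None

# cp1252 treats every byte as its own character: 0xF0 -> 'ð', 0x9F -> 'Ÿ',
# but 0x90 has no mapping in cp1252, so decoding fails at offset 2.
print(first_undecodable_byte(snake, "cp1252"))  # (2, 144)
```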
The stack trace shown is:
Traceback (most recent call last):
  File "C:\dev\env\python\Python36\Scripts\javasphinx-apidoc-script.py", line 11, in <module>
    load_entry_point('javasphinx==0.9.15', 'console_scripts', 'javasphinx-apidoc')()
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 347, in main
    opts.member_headers, opts.parser_lib)
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 228, in generate_documents
    this_file_documents = generate_from_source_file(doc_compiler, source_file, cache_dir)
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 191, in generate_from_source_file
    source = f.read()
  File "c:\dev\env\python\python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 24: character maps to <undefined>
Whilst it works in Python 2, I feel like this is purely by accident, due to Python 2's very "liberal" string handling and the fact that the file happens to be UTF-8. If my file were encoded in something else, e.g. EUC-JP or Shift_JIS, the tool would fail. The official javadoc tool has an -encoding option for exactly this.
It would be good if javasphinx-apidoc could take an --encoding parameter and read/decode all source files using that encoding.
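To illustrate why guessing the encoding is fragile, here is a short Python 3 sketch using Shift_JIS input (the string テスト is just an example). Assuming UTF-8 crashes, assuming cp1252 does not crash but silently produces mojibake, and only the encoding the file was actually saved in gives the right text:

```python
# "テスト" ("test") as a Japanese-locale editor might save it, in Shift_JIS.
sjis = "テスト".encode("shift_jis")
print(sjis)                      # b'\x83e\x83X\x83g'

# Guess 1: UTF-8. 0x83 is a UTF-8 continuation byte, so it cannot start
# a character and decoding raises.
try:
    sjis.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)               # True

# Guess 2: cp1252. Every byte here is defined in cp1252, so there is no
# error, just garbage text ('ƒeƒXƒg').
as_cp1252 = sjis.decode("cp1252")

# Correct: decode with the encoding the file was written in ('テスト').
as_sjis = sjis.decode("shift_jis")
print(as_sjis == "テスト")        # True
```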
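A minimal sketch of what such an option could look like, assuming an argparse-based command line. `build_parser`, `read_source`, the `--encoding` flag and its `utf-8` default are all hypothetical here; this is not how javasphinx-apidoc's real option parsing or file reading is known to be structured. The key part is passing the encoding explicitly to `io.open`, which accepts `encoding=` on both Python 2.7 and 3.x:

```python
import argparse
import io
import os
import tempfile


def build_parser():
    # Hypothetical parser: only the proposed --encoding option is shown.
    parser = argparse.ArgumentParser(prog="javasphinx-apidoc")
    parser.add_argument(
        "--encoding",
        default="utf-8",  # assumed default; the locale default is what breaks here
        help="encoding of the Java source files (cf. javadoc -encoding)",
    )
    return parser


def read_source(path, encoding):
    # Decode with the requested encoding instead of the platform's locale
    # default (cp1252 on this machine). io.open works on Python 2.7 and 3.x.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


# Usage: write a UTF-8 Java file containing U+1F40D, then read it back.
opts = build_parser().parse_args(["--encoding", "utf-8"])
workdir = tempfile.mkdtemp()
java_path = os.path.join(workdir, "utf8.java")
with open(java_path, "wb") as f:
    f.write(u"/** \U0001F40D */\nclass EncodingProblems {}\n".encode("utf-8"))

text = read_source(java_path, opts.encoding)
print(len(text.splitlines()))  # 2
os.remove(java_path)
os.rmdir(workdir)
```

Defaulting to UTF-8 rather than the locale encoding is a design choice: it would change behaviour for anyone currently relying on cp1252 sources, so keeping the locale default and only overriding it when --encoding is given would be an equally reasonable option.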
Full Example
This was done using PowerShell ISE to "ensure" that the Unicode characters were printed correctly, but the same error happens in cmd.exe, Git Bash, etc.
PS C:\dev\work\Mobile-SDK-Android\docs> Get-Content .\java\utf8.java -Encoding UTF8
package java;
/**
* 🐍 U+1F40D -> \xF0\x9F\x90\x8D
* 👐 U+1F450 -> \xF0\x9F\x91\x90
*/
public class EncodingProblems {
    public static void main(String[] args) {
        System.out.println("Hello!");
    }
}
PS C:\dev\work\Mobile-SDK-Android\docs> C:\dev\env\python\Python36\Scripts\javasphinx-apidoc.exe --output-dir=tmp/ java/
C:\dev\env\python\Python36\Scripts\javasphinx-apidoc.exe : Traceback (most recent call last):
At line:1 char:1
+ C:\dev\env\python\Python36\Scripts\javasphinx-apidoc.exe --output-dir ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (Traceback (most recent call last)::String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
  File "C:\dev\env\python\Python36\Scripts\javasphinx-apidoc-script.py", line 11, in <module>
    load_entry_point('javasphinx==0.9.15', 'console_scripts', 'javasphinx-apidoc')()
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 347, in main
    opts.member_headers, opts.parser_lib)
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 228, in generate_documents
    this_file_documents = generate_from_source_file(doc_compiler, source_file, cache_dir)
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 191, in generate_from_source_file
    source = f.read()
  File "c:\dev\env\python\python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 24: character maps to <undefined>
PS C:\dev\work\Mobile-SDK-Android\docs> C:\dev\env\python\Python27\Scripts\javasphinx-apidoc.exe --output-dir=tmp/ java/
Hi, this is related to issue #56 and issue #37.