Handle different encodings #24

Open
davidanthoff opened this issue Jul 20, 2017 · 5 comments

Comments

@davidanthoff
Member

Would be great if the kind of functionality being added in JuliaData/DataFrames.jl#1194 was also available here.
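For context, here is one common case: a file saved in ISO-8859-1 (Latin-1) rather than UTF-8. The sketch below is a hypothetical illustration only. It is not what DataFrames.jl#1194 implements, and `latin1_to_string` is an assumed helper name that exists in neither TextParse nor DataFrames.jl. Because every Latin-1 byte `b` maps to the Unicode code point `U+00b`, the bytes can be transcoded to a UTF-8 `String` in plain Julia before parsing.

```julia
# Hypothetical helper (not part of TextParse or DataFrames.jl): decode
# ISO-8859-1 (Latin-1) bytes into a UTF-8 String. Each Latin-1 byte b is
# the code point U+00b, so converting every byte to a Char is sufficient.
latin1_to_string(bytes::AbstractVector{UInt8}) = String(map(Char, bytes))

raw = UInt8[0x63, 0x61, 0x66, 0xe9]   # "café" encoded as Latin-1 (é = 0xE9)

println(isvalid(String, raw))         # false: a lone 0xE9 is not valid UTF-8
decoded = latin1_to_string(raw)
println(decoded)                      # café
println(ncodeunits(decoded))          # 5: 'é' needs two bytes in UTF-8
```

This trick only covers Latin-1. Other single-byte or multi-byte encodings need a real transcoding table or library.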

@shashi
Collaborator

shashi commented Jul 26, 2017

Thanks for the reference! Would be good to have.

@pevnak

pevnak commented Aug 25, 2018

I do not know if the problem is related to this, but it seems to me that TextParse does not correctly handle Unicode in headers.
Loading this fails:

"α"
0.05

while this succeeds:

"a"
0.05

I found the error while using CSVFiles.jl, but the stack trace clearly points into TextParse:

 [1] string_index_err(::String, ::Int64) at ./strings/string.jl:12
 [2] getindex(::String, ::UnitRange{Int64}) at ./strings/string.jl:246
 [3] _substring at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/field.jl:268 [inlined]
 [4] tryparsenext(::TextParse.StringToken{String}, ::String, ::Int64, ::Int64, ::TextParse.LocalOpts) at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/field.jl:264
 [5] tryparsenext(::TextParse.Quoted{String,TextParse.StringToken{String}}, ::String, ::Int64, ::Int64, ::TextParse.LocalOpts) at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/field.jl:353

Thanks for the help.

@andreasnoack
Contributor

The example works fine for me. Could you provide a complete example with the full error message and information about your Julia and package versions?

@pevnak

pevnak commented Aug 27, 2018

The error is:

julia> load("/tmp/test.csv")
Error showing value of type CSVFiles.CSVFile:
ERROR: StringIndexError("\"α\"\n0.05", 3)
Stacktrace:
 [1] string_index_err(::String, ::Int64) at ./strings/string.jl:12
 [2] getindex(::String, ::UnitRange{Int64}) at ./strings/string.jl:246
 [3] _substring at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/field.jl:268 [inlined]
 [4] tryparsenext(::TextParse.StringToken{String}, ::String, ::Int64, ::Int64, ::TextParse.LocalOpts) at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/field.jl:264
 [5] tryparsenext(::TextParse.Quoted{String,TextParse.StringToken{String}}, ::String, ::Int64, ::Int64, ::TextParse.LocalOpts) at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/field.jl:353
 [6] macro expansion at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/util.jl:27 [inlined]
 [7] tryparsenext(::TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String}}}, ::String, ::Int64, ::Int64, ::TextParse.LocalOpts) at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/field.jl:552
 [8] macro expansion at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/util.jl:27 [inlined]
 [9] quotedsplit(::String, ::TextParse.LocalOpts, ::Bool, ::Int64, ::Int64) at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/csv.jl:640
 [10] readcolnames(::String, ::TextParse.LocalOpts, ::Int64, ::Array{String,1}) at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/csv.jl:447
 [11] #_csvread_internal#26(::Bool, ::Char, ::Char, ::Type, ::Bool, ::Int64, ::Nothing, ::Nothing, ::Int64, ::Nothing, ::Bool, ::Array{String,1}, ::Array{String,1}, ::DataStructures.OrderedDict{Union{Int64, String},AbstractArray{T,1} where T}, ::Int64, ::Nothing, ::Array{Any,1}, ::String, ::Int64, ::typeof(TextParse._csvread_internal), ::String, ::Char) at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/csv.jl:196
 [12] (::getfield(TextParse, Symbol("#kw##_csvread_internal")))(::NamedTuple{(:filename,),Tuple{String}}, ::typeof(TextParse._csvread_internal), ::String, ::Char) at ./none:0
 [13] #22 at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/csv.jl:104 [inlined]
 [14] #open#298(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(TextParse, Symbol("##22#24")){Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},String,Char}, ::String, ::Vararg{String,N} where N) at ./iostream.jl:369
 [15] open at ./iostream.jl:367 [inlined]
 [16] #_csvread_f#20 at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/csv.jl:102 [inlined]
 [17] _csvread_f at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/csv.jl:94 [inlined]
 [18] #csvread#16(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::String, ::Char) at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/csv.jl:76
 [19] csvread(::String, ::Char) at /Users/tpevny/.julia/packages/TextParse/VFtjK/src/csv.jl:76
 [20] getiterator(::CSVFiles.CSVFile) at /Users/tpevny/.julia/packages/CSVFiles/Mzpbp/src/CSVFiles.jl:69
 [21] show(::IOContext{REPL.Terminals.TTYTerminal}, ::CSVFiles.CSVFile) at /Users/tpevny/.julia/packages/CSVFiles/Mzpbp/src/CSVFiles.jl:22
 [22] show(::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::CSVFiles.CSVFile) at ./sysimg.jl:195
 [23] display(::REPL.REPLDisplay, ::MIME{Symbol("text/plain")}, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:131
 [24] display(::REPL.REPLDisplay, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:135
 [25] display(::CSVFiles.CSVFile) at ./multimedia.jl:287
 [26] #invokelatest#1 at ./essentials.jl:691 [inlined]
 [27] invokelatest at ./essentials.jl:690 [inlined]
 [28] print_response(::IO, ::Any, ::Any, ::Bool, ::Bool, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:154
 [29] print_response(::REPL.AbstractREPL, ::Any, ::Any, ::Bool, ::Bool) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:139
 [30] (::getfield(REPL, Symbol("#do_respond#40")){Bool,getfield(REPL, Symbol("##50#59")){REPL.LineEditREPL,REPL.REPLHistoryProvider},REPL.LineEditREPL,REPL.LineEdit.Prompt})(::Any, ::Any, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:708
 [31] #invokelatest#1 at ./essentials.jl:691 [inlined]
 [32] invokelatest at ./essentials.jl:690 [inlined]
 [33] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/REPL/src/LineEdit.jl:2261
 [34] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:1029
 [35] run_repl(::REPL.AbstractREPL, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:191
 [36] (::getfield(Base, Symbol("##831#833")){Bool,Bool,Bool,Bool})(::Module) at ./logging.jl:311
 [37] #invokelatest#1 at ./essentials.jl:691 [inlined]
 [38] invokelatest at ./essentials.jl:690 [inlined]
 [39] macro expansion at ./logging.jl:308 [inlined]
 [40] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./client.jl:340
 [41] exec_options(::Base.JLOptions) at ./client.jl:252
 [42] _start() at ./client.jl:432

and the list of my packages is:

(v0.7) pkg> st
    Status `~/.julia/environments/v0.7/Project.toml`
  [79e6a3ab] Adapt v0.3.1
  [7d9fca2a] Arpack v0.2.3
  [336ed68f] CSV v0.3.1
  [5d742f6a] CSVFiles v0.9.0
  [a93c6f00] DataFrames v0.13.1
  [864edb3b] DataStructures v0.11.0
  [b4f34e82] Distances v0.7.3
  [31c24e10] Distributions v0.16.2
  [5789e2e9] FileIO v1.0.1
  [587475ba] Flux v0.6.5
  [28b8d3ca] GR v0.32.3
  [7073ff75] IJulia v1.9.3
  [c8e1da08] IterTools v1.0.0
  [682c06a0] JSON v0.19.0
  [b1bec4e5] LIBSVM v0.3.0
  [7f8f8fb0] LearnBase v0.2.2+ #master (https://github.com/JuliaML/LearnBase.jl.git)
  [9920b226] MLDataPattern v0.4.0+ #master (https://github.com/JuliaML/MLDataPattern.jl.git)
  [872c559c] NNlib v0.4.1
  [429524aa] Optim v0.16.0
  [58dd65bb] Plotly v0.1.1
  [91a5bcdd] Plots v0.19.3
  [c46f51b8] ProfileView v0.3.0
  [d330b81b] PyPlot v2.6.0
  [295af30f] Revise v0.7.1
  [f2b01f46] Roots v0.7.1
  [2913bbd2] StatsBase v0.25.0
  [e0df1984] TextParse v0.6.0
  [37b6cedf] Traceur v0.1.1
  [b8865327] UnicodePlots v0.3.1
  [98cad3c8] ValueHistories v0.5.0
  [10745b16] Statistics

(v0.7) pkg>

Thanks for the help.

@andreasnoack
Contributor

I just tried it on 0.7, and it is indeed an issue there. Please report this as a separate issue. It used to work on 0.6, so it is a regression.
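The `StringIndexError("\"α\"\n0.05", 3)` reported above is consistent with a byte-offset bug. Since Julia 0.7, `String` indices are UTF-8 code-unit (byte) offsets, and `α` occupies bytes 2 and 3 of the header line, so index 3 is not a valid character start. The sketch below reproduces that failure with plain `Base` functions and shows one way to compute a safe field end with `prevind`. It assumes, without having checked TextParse's source, that the parser takes the byte just before the closing quote as the end of the field. The variable names are illustrative only.

```julia
# Minimal sketch, assuming Julia >= 0.7. String indices are UTF-8 byte
# offsets: '"' is byte 1, 'α' spans bytes 2-3, and the closing '"' is byte 4.
s = "\"α\"\n0.05"

open_quote  = 1
close_quote = findnext(isequal('"'), s, open_quote + 1)   # byte index 4

# Assumed naive logic: the field ends one *byte* before the closing quote.
# Byte 3 is the second byte of 'α', so slicing there throws.
naive_end = close_quote - 1
err = try
    s[open_quote + 1:naive_end]
    nothing
catch e
    e
end
println(typeof(err))      # StringIndexError

# Safer: step back one *character* with prevind, which lands on byte 2,
# the first (and only valid) index of 'α'.
safe_end = prevind(s, close_quote)
field = s[open_quote + 1:safe_end]
println(field)            # α
```

On Julia 0.6, indexing was more permissive about this, which would fit the observation that the same file loaded there. `thisind` and `nextind` serve the same purpose when walking forward through a line. An actual fix in TextParse would have to follow the parser's own indexing code.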

@davidanthoff mentioned this issue Nov 29, 2018

4 participants