Errors in saving non-standard element types #32

Open
bkamins opened this issue Sep 10, 2018 · 2 comments

@bkamins

bkamins commented Sep 10, 2018

Consider the following code:

julia> df = DataFrame(x = [',','\n', ','])
3×1 DataFrame
│ Row │ x    │
│     │ Char │
├─────┼──────┤
│ 1   │ ','  │
│ 2   │ '\n' │
│ 3   │ ','  │

julia> df |> save("test.csv")

julia> println(read("test.csv", String))
"x"
,


,


julia>

The saved file is broken because non-string values are written without quotes.
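To make the breakage concrete, here is a minimal sketch of what a naive line-based reader would see. The `content` string is an assumption reconstructed from the printed output above, not read from an actual file, and the splitting logic is only an illustration, not how any particular CSV reader works.

```julia
# Assumed file contents, reconstructed from the printed output above:
# a quoted header followed by the three Char values written without quotes.
content = "\"x\"\n,\n\n\n,\n"

# A naive reader splits on newlines and drops blank lines.
lines = split(content, '\n'; keepempty = false)

# Every remaining data line is a bare comma, which splits into two empty fields,
# even though the header declares a single column.
fields = [split(line, ',') for line in lines[2:end]]

println(length(lines) - 1, " data lines, ", length(first(fields)), " fields each")
```

The original three rows with one column each come back as two rows with two empty columns each, so the data cannot be recovered.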

Here is an extreme example. It is unlikely to occur in practice, but it shows that this case could be handled better. The code continues from the example above:

julia> DataFrame(d=[df, df]) |> save("test2.csv")

julia> println(read("test2.csv", String))
"d"
3×1 DataFrame
│ Row │ x    │
│     │ Char │
├─────┼──────┤
│ 1   │ ','  │
│ 2   │ '\n' │
│ 3   │ ','  │
3×1 DataFrame
│ Row │ x    │
│     │ Char │
├─────┼──────┤
│ 1   │ ','  │
│ 2   │ '\n' │
│ 3   │ ','  │

Again, the file cannot be read back at all (even as a string) because the values are not quoted.

Finally, consider a more realistic scenario, which is also broken because values are not quoted:

julia> df = DataFrame(a=Date("2000-10-10"), b=Date("2000-11-11"))
1×2 DataFrame
│ Row │ a          │ b          │
│     │ Date       │ Date       │
├─────┼────────────┼────────────┤
│ 1   │ 2000-10-10 │ 2000-11-11 │

julia> df |> save("test3.csv", delim="-")

julia> println(read("test3.csv", String))
"a"-"b"
2000-10-10-2000-11-11
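
As a quick check of why this output is ambiguous, the two lines printed above can be split on the delimiter. This is only a sketch using `split` from Base; the variable names are illustrative.

```julia
# The two lines of test3.csv as printed above.
header = "\"a\"-\"b\""
row = "2000-10-10-2000-11-11"

# Splitting on the delimiter gives a different field count for header and data.
ncols = length(split(header, '-'))
nfields = length(split(row, '-'))

println("header columns: ", ncols, ", data fields: ", nfields)
```

The header yields 2 columns while the data row yields 6 fields, so a reader has no way to tell where one date ends and the next begins.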

@davidanthoff Not sure which of the issues above can be fixed, but I wanted you to at least be aware of them.

@davidanthoff
Member

Thanks for reporting these; they are clearly bugs!

I guess a quick, partial fix would be to just always write quotes around Char (that really seems better in general), and maybe also around dates? Or maybe around every type, except when we know that we don't need them (numbers, some other exceptions)?
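
A rough sketch of the "quote everything except numbers" idea might look like the following. The helpers `quote_field` and `write_row` are hypothetical names invented for illustration and are not part of CSVFiles.jl. Doubling embedded quote characters is an assumed escaping convention (the common CSV one), not something decided in this thread.

```julia
using Dates

# Hypothetical helper: render one cell.
# Numbers are written bare; every other type is converted to text and quoted.
function quote_field(value; quotechar::Char = '"')
    value isa Number && return string(value)
    text = string(value)
    q = string(quotechar)
    # Assumed convention: escape an embedded quote character by doubling it.
    escaped = replace(text, q => q * q)
    return string(quotechar, escaped, quotechar)
end

# Hypothetical helper: join the rendered cells of one row with the delimiter.
write_row(values; delim::Char = ',') = join((quote_field(v) for v in values), delim)

row = write_row([Date("2000-10-10"), Date("2000-11-11")]; delim = '-')
println(row)
```

With quoting in place, the date row from the report is written as `"2000-10-10"-"2000-11-11"`, so the delimiter inside each value no longer collides with the field separator.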

@bkamins
Author

bkamins commented Sep 10, 2018

This is what I thought. The only problem is that once you quote values, you might need to escape something inside the quotes (as in the last example with dates). When reading the file back, you would then have to unquote each string before trying to parse it, which introduces computational overhead (and I guess this is what TextParse.jl wants to avoid).
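
To illustrate the extra read-side step, here is a minimal sketch of unquoting a field before parsing it. `unquote_field` is a hypothetical helper, and the doubled-quote escaping it undoes is an assumed convention. It says nothing about how TextParse.jl is actually implemented.

```julia
using Dates

# Hypothetical helper: strip surrounding quotes and undo doubled quote characters.
# Unquoted input is returned unchanged.
function unquote_field(field::AbstractString; quotechar::Char = '"')
    q = string(quotechar)
    if length(field) >= 2 && startswith(field, q) && endswith(field, q)
        # Drop the first and last character, then collapse doubled quotes.
        inner = field[nextind(field, firstindex(field)):prevind(field, lastindex(field))]
        return replace(inner, q * q => q)
    end
    return String(field)
end

# A quoted date has to be unquoted first; only then can it be parsed.
raw = "\"2000-10-10\""
parsed = Date(unquote_field(raw))
println(parsed)
```

The `replace` call allocates a new string for every quoted field, which is the kind of per-value overhead described above.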
