Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing error? #333

Closed
ChrisRackauckas opened this issue Apr 6, 2022 · 7 comments
Closed

Parsing error? #333

ChrisRackauckas opened this issue Apr 6, 2022 · 7 comments

Comments

@ChrisRackauckas
Copy link

https://github.com/SciML/ModelingToolkit.jl/runs/5852797241?check_suite_focus=true#step:4:143

It seems to point to https://github.com/SciML/ModelingToolkit.jl/blob/v8.6.0/test/error_handling.jl#L60

@KristofferC said

You probably found a bug in CSTParser.jl. I suggest open an issue on it with the problematic code.

@ChrisRackauckas
Copy link
Author

ChrisRackauckas commented Apr 6, 2022

@simeonschaub
Copy link
Contributor

The culprit here seems to be a format character at the start of the file, which Julia's parser just ignores:

julia> CSTParser.parse("\ufeffusing Test")
  1:13  errortoken
  1:3    errortoken( CSTParser.UnexpectedToken)
  1:3     errortoken( CSTParser.Unknown)
  4:13   using
  4:7       1:0   OP: .
  4:7      Test

julia> Meta.parse("\ufeffusing Test")
:(using Test)

@pfitzseb
Copy link
Member

pfitzseb commented Apr 6, 2022

FEFF is the big-endian UTF-16 BOM. Seems reasonable to only support UTF-8 encoded files to me.

@pfitzseb
Copy link
Member

pfitzseb commented Apr 6, 2022

Nvm, this actually is UTF-8 with BOM:

julia> first(codeunits(read("src/OrdinaryDiffEq.jl", String)), 6)
6-element Vector{UInt8}:
 0xef
 0xbb
 0xbf
 0x22
 0x22
 0x22

It's somewhat arguable that we should support that.

@StefanKarpinski
Copy link

StefanKarpinski commented Apr 6, 2022

Julia itself just treats U+FEFF as a space:

julia> Meta.parse("[1\ufeff2]")
:([1 2])

That's a simple approach that allows it as a BOM as well since parsing doesn't care about space at the beginning of a file.

@pfitzseb
Copy link
Member

pfitzseb commented Apr 6, 2022

JuliaLang/Tokenize.jl#197 does that.
One potential issue with that approach is that JuliaFormatter is likely to remove the unnecessary leading space, which some Microsoft tools apparently don't like.

@StefanKarpinski
Copy link

Maybe it would make sense to treat a leading BOM as a special token that JuliaFormatter knows to retain. Or the logic could be changed from deleting leading whitespace to trimming leading whitespace to just the BOM if there is a BOM.

@pfitzseb pfitzseb closed this as completed May 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants