Parsing error? #333

ChrisRackauckas · 2022-04-06T15:01:42Z

https://github.com/SciML/ModelingToolkit.jl/runs/5852797241?check_suite_focus=true#step:4:143

It seems to point to https://github.com/SciML/ModelingToolkit.jl/blob/v8.6.0/test/error_handling.jl#L60

You probably found a bug in CSTParser.jl. I suggest open an issue on it with the problematic code.

ChrisRackauckas · 2022-04-06T15:06:51Z

The same thing seems to show up in many places.

https://github.com/SciML/OrdinaryDiffEq.jl/runs/5853304680?check_suite_focus=true#step:4:104
https://github.com/SciML/OrdinaryDiffEq.jl/blob/v6.8.0/src/OrdinaryDiffEq.jl#L303

etc.

simeonschaub · 2022-04-06T15:33:00Z

The culprit here seems to be a format character at the start of the file, which Julia's parser just ignores:

julia> CSTParser.parse("\ufeffusing Test")
  1:13  errortoken
  1:3    errortoken( CSTParser.UnexpectedToken)
  1:3     errortoken( CSTParser.Unknown)
  4:13   using
  4:7       1:0   OP: .
  4:7      Test

julia> Meta.parse("\ufeffusing Test")
:(using Test)
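As a stopgap on the caller's side, one could drop a leading U+FEFF before handing the source to a parser. This is only a sketch using Base functions; `strip_bom` is a hypothetical helper name, not something CSTParser.jl or JuliaFormatter provides.

```julia
# Hypothetical helper (not part of CSTParser.jl): drop one leading U+FEFF, if present.
function strip_bom(src::AbstractString)
    if startswith(src, "\ufeff")
        # U+FEFF takes three code units in UTF-8, so step with nextind
        # instead of assuming a fixed character width.
        return src[nextind(src, firstindex(src)):end]
    end
    return src
end

stripped = strip_bom("\ufeffusing Test")
expr = Meta.parse(stripped)  # parses the source with the format character already removed
```

Note that this discards the BOM, so it is only suitable for parsing, not for round-tripping a file.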

pfitzseb · 2022-04-06T15:44:51Z

FEFF is the big-endian UTF-16 BOM. Seems reasonable to only support UTF-8 encoded files to me.

pfitzseb · 2022-04-06T16:08:08Z

Nvm, this actually is UTF-8 with BOM:

julia> first(codeunits(read("src/OrdinaryDiffEq.jl", String)), 6)
6-element Vector{UInt8}:
 0xef
 0xbb
 0xbf
 0x22
 0x22
 0x22

It's somewhat arguable that we should support that.
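For anyone who wants to check their own files, the byte inspection above can be wrapped in a small predicate. This is a sketch under the assumption that the input is UTF-8; `UTF8_BOM`, `has_utf8_bom`, and `has_utf8_bom_file` are names made up for illustration.

```julia
# The three bytes that encode U+FEFF in UTF-8 (the same bytes shown above).
const UTF8_BOM = UInt8[0xef, 0xbb, 0xbf]

# Hypothetical predicate: true when the byte sequence starts with the UTF-8 BOM.
function has_utf8_bom(bytes::AbstractVector{UInt8})
    length(bytes) >= 3 || return false
    return all(bytes[i] == UTF8_BOM[i] for i in 1:3)
end

# Convenience wrapper for a file path; `read(path)` returns the raw bytes.
has_utf8_bom_file(path::AbstractString) = has_utf8_bom(read(path))
```

Usage would be something like `has_utf8_bom_file("src/OrdinaryDiffEq.jl")`, run from the package root.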

StefanKarpinski · 2022-04-06T16:15:18Z

Julia itself just treats U+FEFF as a space:

julia> Meta.parse("[1\ufeff2]")
:([1 2])

That's a simple approach that allows it as a BOM as well since parsing doesn't care about space at the beginning of a file.

pfitzseb · 2022-04-06T16:19:48Z

JuliaLang/Tokenize.jl#197 does that.
One potential issue with that approach is that JuliaFormatter is likely to remove the unnecessary leading space, which some Microsoft tools apparently don't like.

StefanKarpinski · 2022-04-07T16:45:15Z

Maybe it would make sense to treat a leading BOM as a special token that JuliaFormatter knows to retain. Or the logic could be changed from deleting leading whitespace to trimming leading whitespace to just the BOM if there is a BOM.
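The second option could look roughly like the following. This is a sketch of the idea only, not JuliaFormatter's actual logic; `trim_leading_keep_bom` is a hypothetical name, and it assumes the BOM, if present, is the very first character of the source.

```julia
# Hypothetical sketch: trim leading whitespace, but keep a leading BOM if there is one.
function trim_leading_keep_bom(src::AbstractString)
    bom = "\ufeff"
    if startswith(src, bom)
        # Remove the BOM, strip whitespace from what follows, then put the BOM back.
        rest = src[nextind(src, firstindex(src)):end]
        return bom * lstrip(rest)
    end
    # No BOM: plain leading-whitespace removal.
    return String(lstrip(src))
end
```

With this, a file that starts with a BOM still starts with one after trimming, and files without a BOM are handled as before.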

simeonschaub mentioned this issue Apr 6, 2022

error when encountering format characters JuliaLang/JuliaSyntax.jl#23

Closed

pfitzseb closed this as completed May 2, 2022
