-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cannot parse EMPTY token with white space #1013
Comments
Hi Paul, Thanks for the example. I could easily reproduce it Python, using shapely.from_wkt
Within the Geos code base, it certainly seems like the intention was to ignore whitespace, as geos/src/io/StringTokenizer.cpp Line 145 in 0aef713
StringTokenizer::nextToken
Possible explanationWithin Line 331 in 0aef713
tokenizer->peekNextToken does return the enum StringTokenizer::TT_WORD , or perhaps due to some unlikely, very subtle bug it incorrectly returns StringTokenizer::TT_NUMBER . The third, error handling, else clause has nothing to do with it. In either scenario, flow continues to call WKTReader::getPreciseCoordinate (either directly in the number case), or if treated as TT_WORD) probably via calls to WKTReader::readPointText and WKTReader::getCoordinates . This calls WKTReader::getNextNumber
The Exception seen in Shapely above: Line 140 in 0aef713
If Line 66 in 0aef713
in WKTReader::getNextEmptyOrOpener , within the call therein to ``WKTReader::getNextWord, the call to tokenizer->nextToken()` , which advances the iterator, should instead call `tokenizer->peekNextToken`?
All those functions are fundamental to the module though, well tested, and used by so many other WKT types (e.g. WKTReader::getCoordinates is also used by POINT, LINESTRING and LINEARRING and POLYGON) that very strong justification is needed before changing them. A special function for Multipoint could be added but that would duplicate code and increase maintenance work. Doing it properly, there's a big risk of breaking something else in fixing this particular bug. So before I spend even longer digging in to this, please could someone confirm what the intended scope of the flavour of WKT that Geos will parse is? In the Geos WKT grammar spec, EMPTY is not supported -within the parens- in Multipoints at all (meaning a Multipoint is either a non-empty sequence of points, or it's empty. And a point cannot be empty):
https://libgeos.org/specifications/wkt/ So yes this is a bug. But please can someone either confirm or rule out, that the bug is actually because EMPTY should not be supported within the parens of a MULTIPOINT at all? Parsers can be free to choose to parse a superset of a standard (if that standard doesn't require an error to be raised on invalid grammar). WKT is intended to be human readable. Either requiring structural whitespace or forbidding it would both defeat that very purpose. More importantly, in the official WKT spec, section 7.2.2 (2D geom only), multipoints are defined as empty, or a non-empty sequence of point texts, and point texts may be empty:
https://www.ogc.org/standard/sfa/ So perhaps my point is just a quibble that the BNF for WKT in the geos documentation needs making consistent with (the OG) the OGC's. |
This does not parse
this does
Here's a test case
The text was updated successfully, but these errors were encountered: