Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TWKB read and write implementation #452

Closed
wants to merge 28 commits into from
Closed

Conversation

groldan
Copy link
Member

@groldan groldan commented Jul 19, 2019

I have picked up where James and Jody left for TWKBReader and TWKBWriter and finished it with full compatibility with the implementation available in PostGIS.

I've also added a couple optimizations that are present in the spec but that the PostGIS version seems not to follow, for which I would still like to write some more unit tests, and hence I'm submitting this PR as a draft for the time being, for consideration or any discussion needed.

Most test cases use pre-computed data from PostGIS. Check the generate_test_data.sql file, which generates .csv files with expected inputs and output for all combinations of geometry type, xy/z/m precision, and other encoding parameters.

I'll be working on adding more documentation and test cases for the optimizations and then move the PR out of the draft state.

Being my first contribution to JTS I'm looking forward to comments on any aspect of it.

Cheers,
Gabriel.

jnh5y and others added 23 commits July 18, 2019 22:44
Signed-off-by: Jim Hughes <[email protected]>
Signed-off-by: Jim Hughes <[email protected]>
Signed-off-by: Jim Hughes <[email protected]>
Signed-off-by: Jim Hughes <[email protected]>
Signed-off-by: Jim Hughes <[email protected]>
Signed-off-by: Gabriel Roldan <[email protected]>
This class, coming from [1], reads and writes variable
length, zig-zag encoded integers, has already been
accepted by LocationTech's IP review on the Geogig project,
and lets us get rid of the protocol buffers dependency
which is only being used to read and write varints.

[1] <https://svn.apache.org/repos/asf/mahout/branches/mahout-0.8/core/src/main/java/org/apache/mahout/math/Varint.java>

Signed-off-by: Gabriel Roldan <[email protected]>
@groldan
Copy link
Member Author

groldan commented Jul 19, 2019

Fixes #246 BTW

groldan added 2 commits July 18, 2019 23:17
Pre-computed data from PostGIS. Check the generate_test_data.sql file,
which generates .csv files with expected inputs and output for all
combinations of geometry type, xy/z/m precision, and other encoding parameters.

Signed-off-by: Gabriel Roldan <[email protected]>
Remove the dependency on Google protocol buffers
and finish the implementation of TWKB reader and
writer.

Signed-off-by: Gabriel Roldan <[email protected]>
@groldan
Copy link
Member Author

groldan commented Jul 19, 2019

@aaime I just found there's a TWKB reader in gt-postgis, didn't know that. Made some tests, seems it has some issues though. You might be interested in this implementation that seems more robust.

The test case bellow parses a geometry with the JTS and the GeoTools TWKB readers, with this output:

input: POINT ZM(2147483647.1234567 2147483648.123457 1 2)
JTS TWKB reader: POINT ZM(2147483647.1234567 2147483648.123457 1 2)
GT  TWKB reader: POINT Z(213.8718216 0.1234568 1)
public @Test void checkJtsVsGtReader() throws Exception {
        long largeOrdinate = Integer.MAX_VALUE;
        String wkt = String.format("POINT ZM(%s.12345678 %s.12345678 1 2)", largeOrdinate,
                largeOrdinate + 1);
        WKTReader wktReader = new WKTReader();
        wktReader.setIsOldJtsCoordinateSyntaxAllowed(false);
        Geometry geom = wktReader.read(wkt);
        System.err.println("input: " + new WKTWriter(4).write(geom));

        twkbWriter.setXYPrecision(7).setZPrecision(7).setMPrecision(7);

        byte[] twkb = twkbWriter.write(geom);

        Geometry twkbRead = twkbReader.read(twkb);
        System.err.println("JTS TWKB reader: " + new WKTWriter(4).write(twkbRead));

        Geometry gtTwkbRead = geoToolsTWKBReader.read(twkb);
        System.err.println("GT  TWKB reader: " + new WKTWriter(4).write(gtTwkbRead));

        assertEquals(geom, twkbRead);
        assertEquals(geom, gtTwkbRead);
    }

The GT parser will also choke on TWKB byte strings that contain an "idlist", like one generated with the following SQL:

    SELECT ST_AsText(ST_GeomFromTWKB( ST_AsTWKB(ARRAY['POINT(0 0)'::Geometry, 'POINT(1 1)'::Geometry], '{1,2}'::bigint[], 0, 0, 0, false, false) )),
    ST_AsTWKB(ARRAY['POINT(0 0)'::Geometry, 'POINT(1 1)'::Geometry], '{1,2}'::bigint[], 0, 0, 0, false, false);
      st_astext      |      st_astwkb       
    ---------------------+----------------------
    MULTIPOINT(0 0,1 1) | \x040402020400000202

Ok, it won't com like that from the queries made by gt-postgis, but still. The following test case:

public @Test void testParseWithIdList() throws Exception {
        byte[] twkb = WKBReader.hexToBytes("040402020400000202");
        Geometry expected = new WKTReader().read("MULTIPOINT(0 0, 1 1)");
        Geometry parsedJTS = twkbReader.read(twkb);
        assertEquals(expected, parsedJTS);
        Geometry parsedGT = geoToolsTWKBReader.read(twkb);
        assertEquals(expected, parsedGT);
    }

Fails at the last line with expected:<MULTIPOINT ((0 0), (1 1))> but was:<MULTIPOINT ((1 2), (1 2))>

It may be that the CoordinateSequenceFactory creates
a CoordinateSequence with no measures set even if
it was created for 4 dimensions. Explicitly setting
the measure fixes it. Test the reader respects the
provided GeometryFactory, if any.

Signed-off-by: Gabriel Roldan <[email protected]>
@aaime
Copy link

aaime commented Jul 19, 2019

Indeed not interested in either cases for the moment. We could switch regardless to the JTS version if it has the same or better performance characteristics (calculations, allocations), one less bit to maintain! :-D

@aaime
Copy link

aaime commented Jul 19, 2019

By the way, both @jodygarnett and @jnh5y were made aware of the GeoTools implementation, you did not talk to the JTS folks before starting this work?

@groldan
Copy link
Member Author

groldan commented Jul 19, 2019

Indeed not interested in either cases for the moment.

Oh, the first one is a case of correctness, though it should be an easy fix. Basically where the spec says "varint" it means "varlong", and the gt reader is discarding the longer values, chopping ordinates at 32-bit ints. See how it parsed to 213.876..:

input: POINT ZM(2147483647.1234567 2147483648.123457 1 2)
JTS TWKB reader: POINT ZM(2147483647.1234567 2147483648.123457 1 2)
GT  TWKB reader: POINT Z(213.8718216 0.1234568 1)

We could switch regardless to the JTS version if it has the same or better performance characteristics (calculations, allocations), one less bit to maintain! :-D

Being the first iteration, been more focused on correctness, so we'll see. Didn't have time to compare parsing performance nor optimize yet.

@groldan
Copy link
Member Author

groldan commented Jul 19, 2019

By the way, both @jodygarnett and @jnh5y were made aware of the GeoTools implementation, you did not talk to the JTS folks before starting this work?

I did not! started as a weekend/bored time playground and as I've seen the branch was incomplete I just kept going. In any case the GT one is only a parser, no encoder, and it'd be good to have both in JTS just like there exist for WKT and WKB.

@dr-jts
Copy link
Contributor

dr-jts commented Jul 19, 2019

Nice to see a PR from you @groldan !

Will be a while until I can give this a thorough review. But one thing I think is that this should be moved down into a jts.io.twkb package. The base jts.io package is getting a bit crowded...

@groldan
Copy link
Member Author

groldan commented Jul 20, 2019

Hi @dr-jts, thanks, long time no see.

one thing I think is that this should be moved down into a jts.io.twkb package. The base jts.io package is getting a bit crowded...

Agreed and done.

@murdos
Copy link
Contributor

murdos commented Jan 3, 2022

Hi! Are there chance this PR being reviewed and merged? Having this feature in JTS would be nice 😃

@mprins
Copy link
Contributor

mprins commented Sep 26, 2022

I think this can be closed? considering #854

@dr-jts dr-jts closed this Sep 26, 2022
@groldan groldan deleted the twkb branch September 27, 2022 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants