Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose Charset in a ShapeFileReader API #3464

Merged
merged 4 commits into from
May 12, 2022

Conversation

vlulla
Copy link
Contributor

@vlulla vlulla commented May 10, 2022

Overview

This PR makes charset configurable, but defaults to ISO-8859-1 in order not to change bahavior of the old functions.

Checklist

  • ./CHANGELOG.md updated, if necessary. Link to the issue if closed, otherwise the PR.
  • Unit tests added for bug-fix or new feature

Closes #3445

Copy link
Member

@pomadchin pomadchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for a nice PR!

I left a couple of comments, only one of them is critical.

Could I ask you to sign ECA please to make Eclipse license checker happy?

(Eclipse grabs your name and email from the commit, so doublecheck that you commited under the correct email (that was used to sign ECA), right now the ECA is required for (vlla@****.com) email).

@pomadchin
Copy link
Member

Let's do the following:

  1. We need a test that reproduces the error
  2. We need a test that makes sure that the reproduced error is fixed
  3. It looks like the change is slighly larger than two methods addition:
def readSimpleFeatures(path: String): Seq[SimpleFeature] = readSimpleFeatures(new URL(s"file://${new File(path).getAbsolutePath}"), DEFAULT_CHARSET)
def readSimpleFeatures(path: String, charSet: Charset): Seq[SimpleFeature] = readSimpleFeatures(new URL(s"file://${new File(path).getAbsolutePath}"), charSet)

def readSimpleFeatures(url: URL): Seq[SimpleFeature] = readSimpleFeatures(url, DEFAULT_CHARSET)
def readSimpleFeatures(url: URL, charSet: Charset): Seq[SimpleFeature] = ???

// but also all user API methods should have overloads

// ... (just an example below, all polygon methods should have the charset support) 
def readMultiPolygonFeatures(path: String): Seq[MultiPolygonFeature[Map[String,Object]]] =
  readMultiPolygonFeatures(path, DEFAULT_CHARSET)
def readMultiPolygonFeatures(path: String, charSet: Charset): Seq[MultiPolygonFeature[Map[String,Object]]] =
  readSimpleFeatures(path, charSet)
    .flatMap { ft => ft.geom[MultiPolygon].map(MultiPolygonFeature(_, ft.attributeMap)) }
// ...

The test may look this way:

class ShapeFileReaderSpec extends AnyFunSpec with Matchers {
  describe("ShapeFileReader") {
    it("should read MultiPolygons feature attributes") {
      val path = "shapefile/data/shapefiles/demographics/demographics.shp"
      val features = ShapeFileReader.readMultiPolygonFeatures(path)
      features.size should be (160)
      for(MultiPolygonFeature(_: MultiPolygon, data: Map[String, Object]) <- features) {
        data.keys.toSeq should be (Seq("LowIncome", "gbcode", "ename", "WorkingAge", "TotalPop", "Employment"))
      }
    }

    // https://github.com/locationtech/geotrellis/issues/3445
    it("should read UTF-8 MultiPolygons feature attributes") {
      val path = "shapefile/data/shapefiles/demographics-utf8/demographics.shp"
      val features = ShapeFileReader.readMultiPolygonFeatures(path, Charset.forName("UTF-8"))
      features.size should be (160)

      features.take(4).map(_.data("ename").asInstanceOf[String]) shouldBe Seq("南关街道", "七里烟香", "谢庄镇", "Cheng Guan Zhen")
    }
  }
}

Note: I pushed a test file that should work with the test above.

@pomadchin pomadchin changed the title fix: issue 3445 Expose Charset in a ShapeFileReader API May 11, 2022
@vlulla
Copy link
Contributor Author

vlulla commented May 11, 2022

OOPS, i did not see your later comments and tried to fix the typo....sorry!

@vlulla
Copy link
Contributor Author

vlulla commented May 12, 2022

I have made all the changes that you had mentioned. Please let me know if I have still missed anything.

@vlulla vlulla requested a review from pomadchin May 12, 2022 15:36
Copy link
Member

@pomadchin pomadchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Vijay! LGTM, could you rm the comment from tests? I can update the CHANGELOG for you myself to show you how it is done.

@vlulla
Copy link
Contributor Author

vlulla commented May 12, 2022

Thank you for the patient guidance! It has been very educational.

@pomadchin pomadchin merged commit 04dc66c into locationtech:master May 12, 2022
@axb21
Copy link

axb21 commented May 12, 2022

Ah great! I said in the other issue that I'd make a pull request for this but then got distracted by life. Sorry about that, but I'm glad to see @vlulla jumped in and did it!

@vlulla vlulla deleted the fix/issue-3445 branch May 12, 2022 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants