-
Notifications
You must be signed in to change notification settings - Fork 364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose Charset in a ShapeFileReader API #3464
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for a nice PR!
I left a couple of comments, only one of them is critical.
Could I ask you to sign ECA please to make Eclipse license checker happy?
(Eclipse grabs your name and email from the commit, so doublecheck that you commited under the correct email (that was used to sign ECA), right now the ECA is required for (vlla@****.com) email).
shapefile/src/main/scala/geotrellis/shapefile/ShapeFileReader.scala
Outdated
Show resolved
Hide resolved
shapefile/src/test/scala/geotrellis/shapefile/ShapeFileReaderSpec.scala
Outdated
Show resolved
Hide resolved
shapefile/src/test/scala/geotrellis/shapefile/ShapeFileReaderSpec.scala
Outdated
Show resolved
Hide resolved
Let's do the following:
def readSimpleFeatures(path: String): Seq[SimpleFeature] = readSimpleFeatures(new URL(s"file://${new File(path).getAbsolutePath}"), DEFAULT_CHARSET)
def readSimpleFeatures(path: String, charSet: Charset): Seq[SimpleFeature] = readSimpleFeatures(new URL(s"file://${new File(path).getAbsolutePath}"), charSet)
def readSimpleFeatures(url: URL): Seq[SimpleFeature] = readSimpleFeatures(url, DEFAULT_CHARSET)
def readSimpleFeatures(url: URL, charSet: Charset): Seq[SimpleFeature] = ???
// but also all user API methods should have overloads
// ... (just an example below, all polygon methods should have the charset support)
def readMultiPolygonFeatures(path: String): Seq[MultiPolygonFeature[Map[String,Object]]] =
readMultiPolygonFeatures(path, DEFAULT_CHARSET)
def readMultiPolygonFeatures(path: String, charSet: Charset): Seq[MultiPolygonFeature[Map[String,Object]]] =
readSimpleFeatures(path, charSet)
.flatMap { ft => ft.geom[MultiPolygon].map(MultiPolygonFeature(_, ft.attributeMap)) }
// ... The test may look this way: class ShapeFileReaderSpec extends AnyFunSpec with Matchers {
describe("ShapeFileReader") {
it("should read MultiPolygons feature attributes") {
val path = "shapefile/data/shapefiles/demographics/demographics.shp"
val features = ShapeFileReader.readMultiPolygonFeatures(path)
features.size should be (160)
for(MultiPolygonFeature(_: MultiPolygon, data: Map[String, Object]) <- features) {
data.keys.toSeq should be (Seq("LowIncome", "gbcode", "ename", "WorkingAge", "TotalPop", "Employment"))
}
}
// https://github.com/locationtech/geotrellis/issues/3445
it("should read UTF-8 MultiPolygons feature attributes") {
val path = "shapefile/data/shapefiles/demographics-utf8/demographics.shp"
val features = ShapeFileReader.readMultiPolygonFeatures(path, Charset.forName("UTF-8"))
features.size should be (160)
features.take(4).map(_.data("ename").asInstanceOf[String]) shouldBe Seq("南关街道", "七里烟香", "谢庄镇", "Cheng Guan Zhen")
}
}
} Note: I pushed a test file that should work with the test above. |
OOPS, i did not see your later comments and tried to fix the typo....sorry! |
I have made all the changes that you had mentioned. Please let me know if I have still missed anything. |
shapefile/src/test/scala/geotrellis/shapefile/ShapeFileReaderSpec.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Vijay! LGTM, could you rm the comment from tests? I can update the CHANGELOG for you myself to show you how it is done.
Thank you for the patient guidance! It has been very educational. |
Ah great! I said in the other issue that I'd make a pull request for this but then got distracted by life. Sorry about that, but I'm glad to see @vlulla jumped in and did it! |
Overview
This PR makes charset configurable, but defaults to
ISO-8859-1
in order not to change bahavior of the old functions.Checklist
Closes #3445