After processing saves round shape_distances in exponential notation #73

Closed
Ge-Rag opened this issue Oct 19, 2023 · 7 comments

Comments


Ge-Rag commented Oct 19, 2023

I am running a feed through gtfstools to remove some route types.

After saving the feed to a file, I noticed that shape distances of 100000 are written in exponential notation. The feed is from Norway.
https://data.public-transport.earth/gtfs/no

Before: NSB:JourneyPattern:R13-1554,2047,60.295310,11.216110,100000.0

After: NSB:JourneyPattern:R13-1554,2047,60.29531,11.21611,1e+05

The 1e+05 messes up my further processing.

I would appreciate it if this could be resolved.

regards,
George

Member

dhersz commented Oct 19, 2023

Hello @Ge-Rag, thanks for opening the issue.

I confirm that I can reproduce this behavior.

Upon further inspection, I can see that this behavior comes from {gtfsio} and is ultimately caused by how data.table::fwrite() is used there.

Could you please elaborate on how this is messing up your processing in later stages?

Member

dhersz commented Oct 19, 2023

To fix this, we just need to set the scipen argument of fwrite(). A common practice seems to be using scipen = 999.
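For illustration, here is a minimal sketch of the effect of scipen when data.table::fwrite() is called directly. It does not reproduce the {gtfsio} code: the shapes data frame and the two temporary file names are made up for this example.

```r
# Hypothetical data: values that are shorter to print in exponential form
shapes <- data.frame(
  shape_id = c("a", "b"),
  shape_dist_traveled = c(100000, 10000000)
)

default_path <- tempfile(fileext = ".txt")
fixed_path <- tempfile(fileext = ".txt")

# scipen = 0 is what fwrite() falls back to when the global "scipen"
# option has not been changed
data.table::fwrite(shapes, default_path, scipen = 0)

# A large scipen penalizes exponential notation, so numbers are written
# in fixed notation instead
data.table::fwrite(shapes, fixed_path, scipen = 999)

default_lines <- readLines(default_path)
fixed_lines <- readLines(fixed_path)

print(default_lines)
print(fixed_lines)
```

Comparing the two printed vectors should show whether the installed data.table version writes the distances in exponential or fixed notation.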

Author

Ge-Rag commented Oct 19, 2023 via email

Member

dhersz commented Oct 19, 2023

A reproducible example:

mock_shapes <- data.frame(
  shape_id = c("a", "b", "c"),
  shape_pt_sequence = 1:3,
  shape_pt_lat = 40:42,
  shape_pt_lon = 40:42,
  shape_dist_traveled = c(1, 10000000, 10000001)
)

tmpdir <- tempfile()
dir.create(tmpdir)
shapes_path <- file.path(tmpdir, "shapes.txt")
data.table::fwrite(mock_shapes, shapes_path, scipen = 999)
zip_path <- zip::zipr(tempfile(fileext = ".zip"), shapes_path)

readLines(shapes_path)
#> [1] "shape_id,shape_pt_sequence,shape_pt_lat,shape_pt_lon,shape_dist_traveled"
#> [2] "a,1,40,40,1"                                                             
#> [3] "b,2,41,41,10000000"                                                      
#> [4] "c,3,42,42,10000001"

gtfs <- gtfsio::import_gtfs(zip_path)
gtfs$shapes
#>    shape_id shape_pt_sequence shape_pt_lat shape_pt_lon shape_dist_traveled
#> 1:        a                 1           40           40               1e+00
#> 2:        b                 2           41           41               1e+07
#> 3:        c                 3           42           42               1e+07

exported_gtfs_dir <- tempfile()
gtfsio::export_gtfs(gtfs, exported_gtfs_dir, as_dir = TRUE)

readLines(file.path(exported_gtfs_dir, "shapes.txt"))
#> [1] "shape_id,shape_pt_sequence,shape_pt_lat,shape_pt_lon,shape_dist_traveled"
#> [2] "a,1,40,40,1"                                                             
#> [3] "b,2,41,41,1e+07"                                                         
#> [4] "c,3,42,42,10000001"

new_zip_path <- zip::zipr(
  tempfile(fileext = ".zip"),
  file.path(exported_gtfs_dir, "shapes.txt")
)
reimported_gtfs <- gtfsio::import_gtfs(new_zip_path)

reimported_gtfs$shapes$shape_dist_traveled
#> [1] 1e+00 1e+07 1e+07
format(reimported_gtfs$shapes$shape_dist_traveled, scientific = FALSE)
#> [1] "       1" "10000000" "10000001"

Member

dhersz commented Oct 19, 2023

> Hello dhersz,
>
> thanks for the quick reply.
>
> After running an extract for trains only, I have to generate shapes, which I do with pfaedle. Pfaedle refuses to execute with fields containing non-numerical data.
>
> Regards,
> George

Interesting to see that pfaedle can't deal with numbers in scientific notation. I tried other CSV parsers in R (read.csv(), readr::read_csv()) and they seem to have no problems dealing with it.

Still, I'll send a patch to {gtfsio} to change the current behavior.
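The parsing behavior mentioned above can be checked with base R alone. This is only a small sketch: the csv_text lines are made up for the example, and it says nothing about how pfaedle itself parses numbers.

```r
# Hypothetical two-row file: the same distance written in exponential
# and in fixed notation
csv_text <- c(
  "shape_id,shape_dist_traveled",
  "a,1e+05",
  "b,100000"
)

parsed <- read.csv(text = csv_text)

# Both values are read into a numeric column and compare as equal
print(is.numeric(parsed$shape_dist_traveled))
print(parsed$shape_dist_traveled[1] == parsed$shape_dist_traveled[2])
```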

Member

dhersz commented Oct 20, 2023

Hi @Ge-Rag, the new version of {gtfsio} on CRAN should contain a fix for this issue. Can you try updating the version you're using (install.packages("gtfsio")) and check if the issue is in fact gone, please?
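One possible way to check an exported file is to scan it for fields written in exponential notation. The helper below is a hypothetical sketch in base R (find_sci_notation() is not part of {gtfsio} or {gtfstools}), and the demo file only mimics the shapes.txt from the reproducible example earlier in this thread.

```r
# Hypothetical helper: returns the lines of a comma-separated text file
# that contain a whole field written in exponential notation (e.g. 1e+05).
# Note that an identifier that happens to look like "1e5" would also match.
find_sci_notation <- function(path) {
  lines <- readLines(path)
  pattern <- "(^|,)[+-]?[0-9]+(\\.[0-9]+)?[eE][+-]?[0-9]+(,|$)"
  lines[grepl(pattern, lines)]
}

# Demo file with one offending row
demo_path <- tempfile(fileext = ".txt")
writeLines(
  c(
    "shape_id,shape_pt_sequence,shape_pt_lat,shape_pt_lon,shape_dist_traveled",
    "a,1,40,40,1",
    "b,2,41,41,1e+07",
    "c,3,42,42,10000001"
  ),
  demo_path
)

offending <- find_sci_notation(demo_path)
print(offending)
```

Pointing the helper at the shapes.txt of a freshly exported feed should return an empty character vector if no field is written in exponential notation.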

Author

Ge-Rag commented Oct 20, 2023 via email
