Use OSM data for geocoding in all boroughs #179

Merged: 71 commits, Nov 24, 2024
b646535
output direct to intersections.csv
danvk Nov 17, 2024
85b5437
get_intersection_center
danvk Nov 17, 2024
c09734c
rename intersection -> grid
danvk Nov 17, 2024
aa60155
generate all-intersections file
danvk Nov 17, 2024
4807a51
geocode broadway & 59
danvk Nov 17, 2024
8d6ea67
extract avenue parsing code
danvk Nov 17, 2024
9783527
exact geocoding above 125
danvk Nov 17, 2024
4159b04
checkpoint; this abstraction is breaking
danvk Nov 17, 2024
6083a61
update test logs
danvk Nov 17, 2024
a40696a
pare back logging, track parse vs. grid
danvk Nov 17, 2024
23aa26e
factor out a single grid.geocode_intersection function
danvk Nov 17, 2024
51cbdc3
Sutton Place
danvk Nov 17, 2024
326807b
Riverside Drive
danvk Nov 17, 2024
36aed0f
update test logs
danvk Nov 17, 2024
d104318
clean up logging
danvk Nov 17, 2024
b704937
debug logging
danvk Nov 17, 2024
4c106db
interpolate between streets
danvk Nov 17, 2024
892678b
track interpolations
danvk Nov 17, 2024
501d55a
loosen up street matching
danvk Nov 17, 2024
f9f84b2
parse ordinals; +26
danvk Nov 17, 2024
a8d8a45
logging
danvk Nov 20, 2024
dd005ad
generate all intersections
danvk Nov 20, 2024
3da1d0c
49532 NYC intersections
danvk Nov 20, 2024
7989b38
de-dupe on name; 46420 intersections
danvk Nov 20, 2024
6a626f0
move Grid into a class
danvk Nov 20, 2024
61f62c0
normalize Fifth Avenue -> 5th Avenue
danvk Nov 20, 2024
89cdcf8
exact intersection geocoding for OSM (all boroughs)
danvk Nov 20, 2024
99d0444
normalize second
danvk Nov 20, 2024
1d92ae4
expand_abbrevs
danvk Nov 21, 2024
d53252c
pare back normalization a bit
danvk Nov 21, 2024
aed7043
require full match for ordinal rewrite
danvk Nov 21, 2024
57ab3a3
fix St. Nicholas bug
danvk Nov 21, 2024
c8bf03a
try stripping dirs
danvk Nov 21, 2024
2d4f0e8
bug fix
danvk Nov 21, 2024
c3970ea
try double-strip; probably overkill
danvk Nov 21, 2024
036d2b9
copy-paste mode and filters for geogpt batch
danvk Nov 21, 2024
9281d2d
prompt variation asking to avoid ave/ave intersection
danvk Nov 21, 2024
30ff35e
ask for an array response
danvk Nov 21, 2024
debd7fa
Merge branch 'master' into more-intersections
danvk Nov 21, 2024
58f23bb
refactor coders
danvk Nov 21, 2024
e5ee1ab
update some tests
danvk Nov 21, 2024
9d4836c
fix special cases coder; able to repro current results
danvk Nov 21, 2024
311a276
Merge branch 'master' into more-intersections
danvk Nov 22, 2024
e14fd80
rv irrelevant change
danvk Nov 22, 2024
a0488dc
pare back to direction stripping
danvk Nov 22, 2024
ddfbc43
TODO
danvk Nov 22, 2024
01e1f08
default to images.ndjson
danvk Nov 22, 2024
536abf2
Be more careful about matching "Park Ave" not just "Park"; handle Riv…
danvk Nov 22, 2024
5b5b3ac
fix odd 144 bug
danvk Nov 22, 2024
009fadd
Exclude Central Park South/West/East/North
danvk Nov 22, 2024
bd33bf1
pare back logging, update tests
danvk Nov 22, 2024
870c914
update data, stats
danvk Nov 22, 2024
c4a909a
stats, sizes for static site
danvk Nov 22, 2024
871fdda
attempt to restore generate_intersections.csv
danvk Nov 24, 2024
3939915
write both
danvk Nov 24, 2024
54bbd99
I am confused
danvk Nov 24, 2024
ae637a2
never match St/Dr at start of street name
danvk Nov 24, 2024
1b0f8ce
so many intersection files
danvk Nov 24, 2024
e7894eb
generate all three
danvk Nov 24, 2024
6b00920
St at start is always Saint, not Street
danvk Nov 24, 2024
11e3124
update test stats
danvk Nov 24, 2024
4c54edd
pare back logging; geocode.py runs in ~5s
danvk Nov 24, 2024
50d0309
update site stats
danvk Nov 24, 2024
ea742af
Merge branch 'master' into five-boro-osm
danvk Nov 24, 2024
6b7f80d
ruff check
danvk Nov 24, 2024
0a58148
no status bar
danvk Nov 24, 2024
b3399a7
one more place
danvk Nov 24, 2024
c3e6418
spell it right
danvk Nov 24, 2024
b385f5f
consistent rounding
danvk Nov 24, 2024
899576b
update data for rounding
danvk Nov 24, 2024
39df34c
keep the old name
danvk Nov 24, 2024
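
Several commits above touch street-name normalization ("normalize Fifth Avenue -> 5th Avenue", "require full match for ordinal rewrite", "St at start is always Saint, not Street", "never match St/Dr at start of street name"). The sketch below shows one way those rules might fit together. It is not the code from this PR: `normalize_street`, `ORDINAL_WORDS`, and `SUFFIXES` are hypothetical names, and the lookup tables are deliberately tiny.

```python
# Hypothetical lookup tables for illustration only; the PR's real tables are not shown on this page.
ORDINAL_WORDS = {"first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th", "fifth": "5th"}
SUFFIXES = {"st": "Street", "ave": "Avenue", "dr": "Drive", "pl": "Place", "blvd": "Boulevard"}


def normalize_street(name: str) -> str:
    """Normalize a street name token by token (assumed rules, not the PR's)."""
    tokens = name.replace(".", "").split()
    out = []
    for i, tok in enumerate(tokens):
        key = tok.lower()
        if i == 0 and key == "st":
            # "St" at the start of a name is read as Saint, never Street.
            out.append("Saint")
        elif key in ORDINAL_WORDS:
            # Rewrite only when the whole token is an ordinal word.
            out.append(ORDINAL_WORDS[key])
        elif i > 0 and key in SUFFIXES:
            # Expand abbreviations, but never in the leading position.
            out.append(SUFFIXES[key])
        else:
            out.append(tok)
    return " ".join(out)
```

Under these assumptions, `normalize_street("St. Nicholas Ave")` gives `"Saint Nicholas Avenue"` and `normalize_street("Fifth Avenue")` gives `"5th Avenue"`, while a token such as `"Fifthly"` is left alone because only whole-token matches are rewritten.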
10 changes: 5 additions & 5 deletions .github/workflows/e2etest.yml
@@ -13,16 +13,16 @@ jobs:
  tar -xzf geocache.tgz
  - name: Run geocoder
  run: |
- PYTHONPATH=. poetry run oldnyc/geocode/geocode.py --ids_filter test/random200-ids.txt --images_ndjson data/images.ndjson --output_format id-location.txt --geocode > >(tee test/random200-geocoded.txt) 2> >(tee test/random200.logs.txt >&2)
+ PYTHONPATH=. poetry run oldnyc/geocode/geocode.py --ids_filter test/random200-ids.txt --output_format id-location.txt --geocode --no-progress-bar > >(tee test/random200-geocoded.txt) 2> >(tee test/random200.logs.txt >&2)
  # See https://stackoverflow.com/a/692407/388951 for the stdout/stderr redirection
  - name: Generate intersections
  run: |
  export PYTHONPATH=.
- poetry run python oldnyc/geocode/osm/generate_intersections.py > data/intersections.csv
+ poetry run python oldnyc/geocode/osm/generate_intersections.py
  - name: Generate truth data
  run: |
  export PYTHONPATH=.
- poetry run oldnyc/geocode/geocode.py --images_ndjson data/images.ndjson --output_format geojson --ids_filter data/geocode/random500-ids.txt --geocode > /tmp/images.geojson
+ poetry run oldnyc/geocode/geocode.py --output_format geojson --ids_filter data/geocode/random500-ids.txt --geocode > /tmp/images.geojson
  poetry run oldnyc/geocode/truth/make_localturk_csv.py data/geocode/random500-ids.txt /tmp/images.geojson data/geocode/random500.csv
  # We don't actually care about the diff on this file, just that make_localturk_csv.py doesn't error out.
  git checkout data/geocode/random500.csv
@@ -32,7 +32,7 @@ jobs:
  - name: Check performance on truth data
  run: |
  export PYTHONPATH=.
- poetry run oldnyc/geocode/geocode.py --ids_filter data/geocode/truth-ids.txt --images_ndjson data/images.ndjson --output_format geojson --geocode > /tmp/actual.geojson
+ poetry run oldnyc/geocode/geocode.py --ids_filter data/geocode/truth-ids.txt --output_format geojson --geocode --no-progress-bar > /tmp/actual.geojson
  poetry run oldnyc/geocode/calculate_metrics.py --stats_only --truth_data data/geocode/truth.geojson --computed_data /tmp/actual.geojson > test/geocode-performance.txt
  - name: Check for diffs
  run: |
@@ -84,7 +84,7 @@ jobs:
  run: |
  export PYTHONPATH=.
  tar -xzf geocache.tgz
- poetry run oldnyc/geocode/geocode.py --images_ndjson data/images.ndjson --lat_lon_map data/lat-lon-map.txt --output_format lat-lon-to-ids.json --geocode > data/lat-lon-to-ids.json 2> >(tee >( sed -n '/Finalizing/,$p' > test/geocoding-stats.txt) >&2)
+ poetry run oldnyc/geocode/geocode.py --lat_lon_map data/lat-lon-map.txt --output_format lat-lon-to-ids.json --geocode --no-progress-bar > data/lat-lon-to-ids.json 2> >(tee >( sed -n '/Finalizing/,$p' > test/geocoding-stats.txt) >&2)
  - name: Generate static site
  run: |
  export PYTHONPATH=.
2 changes: 2 additions & 0 deletions data/README.md
@@ -23,6 +23,8 @@ TODO:

  ## Repro instructions

+ If there are no instructions for a file here, check `e2etest.yml`.
+
  ### osm-roads.json

  Run `data/nyc-named-roads.overpass-query.txt` through the Overpass API. This will produce a big JSON file that needs to be filtered. You can do this with: