Another common abbreviation for Level #6

Open
poorlymac opened this issue Aug 16, 2019 · 1 comment

@poorlymac

Hi,

A (unfortunately) common abbreviation for level that I have come across is a simple L. For example: UNIT 900, L 9, 50 THINGO ST, HOOHAAVILLE, VIC 3000. I even tried adding L to lookups.py and deleting the cache, but to no avail. The kind of result I get is:
"flat_number": "9009",
"flat_number_prefix": "L",
"flat_type": "UNIT",
"locality_name": "HOOHAAVILLE",
"number_first": "50",
"original": "UNIT 900, L 9, 50 THINGO ST, HOOHAAVILLE, VIC 3000",
"postcode": "3000",
"state": "VICTORIA",
"street_name": "THINGO",
"street_type": "STREET"

With no space (L9), the 9 instead gets dragged into the street number 50.

Is there a way to get L in and recognised?

@jasonrig
Owner

As with issue #5, I think this will come down to tuning the address generation code used during training. Adding L to lookups.py didn't work because the model assigns a high probability to L being a unit number prefix, and a high probability to both 900 and 9 being the unit number itself. The code then concatenates 900 and 9, because it groups characters of the same class.
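A simplified sketch of why the output shows `"flat_number": "9009"`. It assumes the model emits one class label per character, that separators get no label, and that the post-processing joins every character sharing a label, even across non-adjacent runs. The helper names `expand` and `group_by_class` and the per-character predictions are hypothetical and are not taken from the project's code.

```python
from typing import Dict, List, Optional, Tuple

Labelled = List[Tuple[str, Optional[str]]]

def expand(segments: List[Tuple[str, Optional[str]]]) -> Labelled:
    """Turn (text, label) segments into one (character, label) pair per character."""
    return [(ch, label) for text, label in segments for ch in text]

def group_by_class(chars: Labelled) -> Dict[str, str]:
    """Concatenate every character that shares a class label.

    Characters labelled None (separators) are dropped. Non-adjacent runs
    with the same label are joined together, which is how "900" and "9"
    end up as a single "9009".
    """
    fields: Dict[str, str] = {}
    for ch, label in chars:
        if label is None:
            continue
        fields[label] = fields.get(label, "") + ch
    return fields

# Hypothetical per-character predictions for the start of the failing address:
# "L" is tagged as a flat number prefix, and both "900" and "9" as flat numbers.
predicted = expand([
    ("UNIT", "flat_type"),
    (" ", None),
    ("900", "flat_number"),
    (", ", None),
    ("L", "flat_number_prefix"),
    (" ", None),
    ("9", "flat_number"),
    (", ", None),
    ("50", "number_first"),
])

print(group_by_class(predicted))
# {'flat_type': 'UNIT', 'flat_number': '9009', 'flat_number_prefix': 'L', 'number_first': '50'}
```

Under this assumption, no lookup entry for L can help: once the per-character labels are wrong, the grouping step faithfully merges them. Only better training examples would change the labels themselves.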

Perhaps issues #5 and #6 can be worked on together, since both would involve tuning the address synthesis code.

If you want to have a play around and see whether anything works for you, I would work through the synthesise_address function. It's a little ugly, but you can see that, for a given clean record from GNAF, it mutates the string in various ways while keeping track of each character's class (unit type, street name, etc.). The goal would be to include more examples of the failing cases you're seeing in these random mutations.
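A minimal sketch of the kind of mutation being suggested. It assumes characters are tracked as (character, label) pairs and that the label set has separate level type and level number classes. `synth_level`, `LEVEL_SPELLINGS`, and the label names are hypothetical and are not part of `synthesise_address`. The idea is simply to sample the bare "L" spelling, with and without a following space, so the model sees both "L 9" and "L9" labelled as a level.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical label names; the real project defines its own class set.
LEVEL_TYPE = "level_type"
LEVEL_NUMBER = "level_number"

# Surface forms to sample from, including the bare "L" seen in real data.
LEVEL_SPELLINGS = ["LEVEL", "LVL", "L"]

Labelled = List[Tuple[str, Optional[str]]]

def labelled(text: str, label: Optional[str]) -> Labelled:
    """Tag every character of `text` with the same class label (None = separator)."""
    return [(ch, label) for ch in text]

def synth_level(level_number: int, rng: random.Random) -> Labelled:
    """Render a level component in a randomly chosen form, e.g. "LEVEL 9",
    "LVL9" or "L 9", while recording the class of every character."""
    spelling = rng.choice(LEVEL_SPELLINGS)
    gap = rng.choice(["", " "])  # cover both "L 9" and "L9"
    return (
        labelled(spelling, LEVEL_TYPE)
        + labelled(gap, None)
        + labelled(str(level_number), LEVEL_NUMBER)
    )

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        component = synth_level(9, rng)
        text = "".join(ch for ch, _ in component)
        print(f"UNIT 900, {text}, 50 THINGO ST ->", [label for _, label in component])
```

In a real training pipeline, a component like this would be spliced into the labelled address alongside the unit, street, and locality parts, so the model learns that a lone L before a number marks a level rather than a unit number prefix.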
