Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Remove semantic parsing code #3207

Merged
merged 13 commits into from
Oct 10, 2019
Merged

Conversation

matt-gardner
Copy link
Contributor

I've started setting up github.com/allenai/allennlp-semparse (not ready yet, but working on it). Once we're ready, this PR will remove all semantic parsing code, dependencies, and tests from the main allennlp repo. This should happen when we're ready for a 1.0 release. I think we should have a 0.9 release before that with the interpret code and other stuff, so it'll be a little bit before this should be merged. Just seeing what this does to CI.

@epwalsh
Copy link
Member

epwalsh commented Aug 28, 2019

I'm not sure if this has been discussed yet, but how about naming external submodules in the form allennlp.submodule instead of allennlp_submodule? So allennlp_semparse would be allennlp.semparse. The repo name could stay the same, but the project would be structured like this:

      1 .
      2 └── allennlp
      3     └── semparse
      4         └── __init__.py

This is how arxiv organizes all of their Python projects. See https://github.com/arXiv/arxiv-canonical for example.

@matt-gardner
Copy link
Contributor Author

I was avoiding that structure because it did not seem like it would work if, e.g., you pip-installed several allennlp packages. Does it work? That seems totally crazy. How does python manage resolution? E.g., from allennlp import semparse. Where does it look?

I'm not opposed to this on principle if it works, though it'd be helpful to think through the pros and cons of this setup.

@matt-gardner
Copy link
Contributor Author

One con: we were intending these separate packages to be examples of how someone might use allennlp in their own modeling project. There's no need to duplicate allennlp's package structure in your own project, and it's overly confusing if we have all of these nested packages in a separate repo, just to keep things in line with how they were when they were all combined in the main repo.

@epwalsh
Copy link
Member

epwalsh commented Aug 28, 2019

I was avoiding that structure because it did not seem like it would work if, e.g., you pip-installed several allennlp packages. Does it work? That seems totally crazy. How does python manage resolution? E.g., from allennlp import semparse. Where does it look?

I'm not opposed to this on principle if it works, though it'd be helpful to think through the pros and cons of this setup.

You would have to import it as import allennlp.semparse as semparse or something like that. I believe from allennlp import semparse would not work.

@matt-gardner
Copy link
Contributor Author

You would have to import it as import allennlp.semparse as semparse or something like that. I believe from allennlp import semparse would not work.

Even that confuses me, I would expect the lookup on allennlp.semparse to collide in crazy ways between the two packages. But if you say it works, I'll believe you =). I'm just surprised that python can figure that out.

@epwalsh
Copy link
Member

epwalsh commented Aug 28, 2019

I just double-checked for sanity, and yes, it really does work :)

@matt-gardner matt-gardner changed the title [WIP] Remove semantic parsing code Remove semantic parsing code Oct 10, 2019
@matt-gardner matt-gardner requested a review from DeNeutoy October 10, 2019 20:15
Copy link
Contributor

@DeNeutoy DeNeutoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woo hoo, nice one 🎆

Can you also remove the now unneeded libraries from setup.py too?

https://github.com/allenai/allennlp/blob/master/setup.py


# Used by semantic parsing code to format and postprocess SQL
sqlparse>=0.2.4

# For text normalization
ftfy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These text normalisation libraries can also go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is actually used by the OpenAI word splitter, not the semparse code.

requirements.txt Outdated
@@ -49,12 +49,6 @@ gevent>=1.3.6
# Used by semantic parsing code to strip diacritics from unicode strings.
unidecode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this unidecode library apart from for the semparse code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one could have been removed when we got rid of the SEMPRE integration, looks like. Removing it now.

@matt-gardner
Copy link
Contributor Author

@brendan-ai2, with this merged, we're now in a position to set up CI for sniff tests on the sub-repos.

@matt-gardner matt-gardner merged commit 841ec87 into allenai:master Oct 10, 2019
@matt-gardner matt-gardner deleted the remove_semparse branch October 10, 2019 23:59
@brendan-ai2
Copy link
Contributor

@matt-gardner, that's great! Could you temporarily add me as an admin on https://github.com/allenai/allennlp-semparse so I can look at how the CI is configured there? I'll try to replicate Github's CI in TC.

@matt-gardner
Copy link
Contributor Author

I just gave you admin privileges, but I don't think you actually need it - you can just look here: https://github.com/allenai/allennlp-semparse/blob/master/.github/workflows/main.yml. Though, thinking about this more, I think the right thing to do is to create a single allennlp-hub repo (or similar), and have that one pip install all of the sub repos, and have something like our current pretrained.py, and our sniff tests. Then our CI can pip install allennlp-hub and run the sniff tests from there. That should simplify things a lot.

@brendan-ai2
Copy link
Contributor

Sounds like a good plan. Thanks for laying it out! I've assigned myself to #3351.

reiyw pushed a commit to reiyw/allennlp that referenced this pull request Nov 12, 2019
* Remove semantic parsing code

* Fix module imports, remove more tests

* More fixes

* fix predict test

* fix another test

* remove more docs

* last doc errors, i think...

* Remove unnecessary requirements

* Removing some fixutre data that I missed earlier

* Fix merge conflicts with black

* More merge conflicts, pin to pytorch 1.2

* black merge conflicts

* Remove unidecode
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants