Remove semantic parsing code #3207

matt-gardner · 2019-08-28T00:19:46Z

I've started setting up github.com/allenai/allennlp-semparse (not ready yet, but working on it). Once we're ready, this PR will remove all semantic parsing code, dependencies, and tests from the main allennlp repo. This should happen when we're ready for a 1.0 release. I think we should have a 0.9 release before that with the interpret code and other stuff, so it'll be a little bit before this should be merged. Just seeing what this does to CI.

epwalsh · 2019-08-28T22:35:36Z

I'm not sure if this has been discussed yet, but how about naming external submodules in the form allennlp.submodule instead of allennlp_submodule? So allennlp_semparse would be allennlp.semparse. The repo name could stay the same, but the project would be structured like this:

      1 .
      2 └── allennlp
      3     └── semparse
      4         └── __init__.py

This is how arxiv organizes all of their Python projects. See https://github.com/arXiv/arxiv-canonical for example.

matt-gardner · 2019-08-28T22:49:38Z

I was avoiding that structure because it did not seem like it would work if, e.g., you pip-installed several allennlp packages. Does it work? That seems totally crazy. How does python manage resolution? E.g., from allennlp import semparse. Where does it look?

I'm not opposed to this on principle if it works, though it'd be helpful to think through the pros and cons of this setup.

matt-gardner · 2019-08-28T22:51:06Z

One con: we were intending these separate packages to be examples of how someone might use allennlp in their own modeling project. There's no need to duplicate allennlp's package structure in your own project, and it's overly confusing if we have all of these nested packages in a separate repo, just to keep things in line with how they were when they were all combined in the main repo.

epwalsh · 2019-08-28T22:54:50Z

I was avoiding that structure because it did not seem like it would work if, e.g., you pip-installed several allennlp packages. Does it work? That seems totally crazy. How does python manage resolution? E.g., from allennlp import semparse. Where does it look?

I'm not opposed to this on principle if it works, though it'd be helpful to think through the pros and cons of this setup.

You would have to import it as import allennlp.semparse as semparse or something like that. I believe from allennlp import semparse would not work.

matt-gardner · 2019-08-28T22:58:23Z

You would have to import it as import allennlp.semparse as semparse or something like that. I believe from allennlp import semparse would not work.

Even that confuses me, I would expect the lookup on allennlp.semparse to collide in crazy ways between the two packages. But if you say it works, I'll believe you =). I'm just surprised that python can figure that out.

epwalsh · 2019-08-28T23:06:47Z

I just double-checked for sanity, and yes, it really does work :)

DeNeutoy

Woo hoo, nice one 🎆

Can you also remove the now unneeded libraries from setup.py too?

https://github.com/allenai/allennlp/blob/master/setup.py

DeNeutoy · 2019-10-10T22:43:53Z

requirements.txt

-
-# Used by semantic parsing code to format and postprocess SQL
-sqlparse>=0.2.4
-
 # For text normalization
 ftfy


These text normalisation libraries can also go

This one is actually used by the OpenAI word splitter, not the semparse code.

DeNeutoy · 2019-10-10T22:44:26Z

requirements.txt

@@ -49,12 +49,6 @@ gevent>=1.3.6
 # Used by semantic parsing code to strip diacritics from unicode strings.
 unidecode


I don't think we need this unidecode library apart from for the semparse code

This one could have been removed when we got rid of the SEMPRE integration, looks like. Removing it now.

matt-gardner · 2019-10-10T23:59:19Z

@brendan-ai2, with this merged, we're now in a position to set up CI for sniff tests on the sub-repos.

brendan-ai2 · 2019-10-11T19:37:19Z

@matt-gardner, that's great! Could you temporarily add me as an admin on https://github.com/allenai/allennlp-semparse so I can look at how the CI is configured there? I'll try to replicate Github's CI in TC.

matt-gardner · 2019-10-11T19:51:18Z

I just gave you admin privileges, but I don't think you actually need it - you can just look here: https://github.com/allenai/allennlp-semparse/blob/master/.github/workflows/main.yml. Though, thinking about this more, I think the right thing to do is to create a single allennlp-hub repo (or similar), and have that one pip install all of the sub repos, and have something like our current pretrained.py, and our sniff tests. Then our CI can pip install allennlp-hub and run the sniff tests from there. That should simplify things a lot.

brendan-ai2 · 2019-10-11T22:34:33Z

Sounds like a good plan. Thanks for laying it out! I've assigned myself to #3351.

* Remove semantic parsing code * Fix module imports, remove more tests * More fixes * fix predict test * fix another test * remove more docs * last doc errors, i think... * Remove unnecessary requirements * Removing some fixutre data that I missed earlier * Fix merge conflicts with black * More merge conflicts, pin to pytorch 1.2 * black merge conflicts * Remove unidecode

matt-gardner added 9 commits October 10, 2019 09:13

Remove semantic parsing code

5336944

Fix module imports, remove more tests

bbb8ef2

More fixes

c0382f4

fix predict test

816274a

fix another test

d47a428

remove more docs

f026ca4

last doc errors, i think...

7ad8bab

Remove unnecessary requirements

47c9c9f

Removing some fixutre data that I missed earlier

f25ad04

matt-gardner force-pushed the remove_semparse branch from c29f7d7 to f25ad04 Compare October 10, 2019 16:15

matt-gardner changed the title ~~[WIP] Remove semantic parsing code~~ Remove semantic parsing code Oct 10, 2019

matt-gardner added 3 commits October 10, 2019 11:29

Fix merge conflicts with black

091e350

More merge conflicts, pin to pytorch 1.2

cab42ff

black merge conflicts

1dec91d

matt-gardner requested a review from DeNeutoy October 10, 2019 20:15

DeNeutoy approved these changes Oct 10, 2019

View reviewed changes

Remove unidecode

4fc9a93

matt-gardner merged commit 841ec87 into allenai:master Oct 10, 2019

matt-gardner deleted the remove_semparse branch October 10, 2019 23:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove semantic parsing code #3207

Remove semantic parsing code #3207

matt-gardner commented Aug 28, 2019

epwalsh commented Aug 28, 2019

matt-gardner commented Aug 28, 2019

matt-gardner commented Aug 28, 2019

epwalsh commented Aug 28, 2019

matt-gardner commented Aug 28, 2019

epwalsh commented Aug 28, 2019

DeNeutoy left a comment

DeNeutoy Oct 10, 2019

matt-gardner Oct 10, 2019

DeNeutoy Oct 10, 2019

matt-gardner Oct 10, 2019

matt-gardner commented Oct 10, 2019

brendan-ai2 commented Oct 11, 2019

matt-gardner commented Oct 11, 2019

brendan-ai2 commented Oct 11, 2019

		@@ -49,12 +49,6 @@ gevent>=1.3.6
		# Used by semantic parsing code to strip diacritics from unicode strings.
		unidecode

Remove semantic parsing code #3207

Remove semantic parsing code #3207

Conversation

matt-gardner commented Aug 28, 2019

epwalsh commented Aug 28, 2019

matt-gardner commented Aug 28, 2019

matt-gardner commented Aug 28, 2019

epwalsh commented Aug 28, 2019

matt-gardner commented Aug 28, 2019

epwalsh commented Aug 28, 2019

DeNeutoy left a comment

Choose a reason for hiding this comment

DeNeutoy Oct 10, 2019

Choose a reason for hiding this comment

matt-gardner Oct 10, 2019

Choose a reason for hiding this comment

DeNeutoy Oct 10, 2019

Choose a reason for hiding this comment

matt-gardner Oct 10, 2019

Choose a reason for hiding this comment

matt-gardner commented Oct 10, 2019

brendan-ai2 commented Oct 11, 2019

matt-gardner commented Oct 11, 2019

brendan-ai2 commented Oct 11, 2019