Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[runner.py] don't call get_application #411

Merged
merged 7 commits into from
Dec 24, 2021
Merged

Conversation

pawelmhm
Copy link
Contributor

@pawelmhm pawelmhm commented Nov 5, 2021

This is because get_application launches default Twisted reactor when called from runner, and we want to respect reactor settings from Scrapy. I also think there is some weird circularity in current code.

Fixes #377

How code in master branch works now

  1. twisted Application is launched with some reactor settings, get_application() called once
  2. server listens to requests
  3. scrapyd-client sends deploy request, POST to addversion endpoint comes
  4. webservice launches subprocess scrapyd.runner with argument list, to list Scrapy spiders
  5. this subprocess calls get_application() 2nd time, this time without twistd specific settings, so even if you pass asyncioreactor to twistd, this won't be respected, and it will launch default Twisted reactor (epoll on Linux)
  6. scrapy execute() is called, scrapy CrawlerProcess is trying to install reactor, but it realizes there is already default Twisted epoll reactor running, error is supressed
  7. scrapy raises Exception when it tries to verify that reactor from settings is equal to reactor running, exception: The installed reactor (twisted.internet.epollreactor.EPollReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)

To fix this:

on point 5) instead of calling get_application we just get Eggstorage, without application object. I think there is no need for an application object in runner which is called from Application already as subprocess. If root process is creating subprocess I do not expect it to create another object that manages root process. So now it won't call application

TODO

  • unit tests
  • check if it doesn't break anything important

this is because get_application launches default Twisted
reactor when called from runner, and we want to respect
Reactor settings from Scrapy.
@pawelmhm
Copy link
Contributor Author

pawelmhm commented Nov 5, 2021

fixes #377

@pawelmhm
Copy link
Contributor Author

pawelmhm commented Nov 9, 2021

waiting for this: #412 to be merged, after that I'll add unit test here

@jpmckinney
Copy link
Contributor

Sounds good to me. I've done a quick review of #412. Let me know if anything else you need.

@codecov
Copy link

codecov bot commented Nov 22, 2021

Codecov Report

Merging #411 (388953d) into master (2eb5409) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #411      +/-   ##
==========================================
+ Coverage   79.63%   79.67%   +0.03%     
==========================================
  Files          23       23              
  Lines        1051     1053       +2     
==========================================
+ Hits          837      839       +2     
  Misses        214      214              
Flag Coverage Δ
unittests 79.67% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
scrapyd/runner.py 97.22% <100.00%> (+0.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2eb5409...388953d. Read the comment docs.

@pawelmhm
Copy link
Contributor Author

Windows tests for 3.8+ are hanging for some reason, I tested on local windows laptop, and it worked fine, not sure what is it in environment here. I don't know if it runs this test or if it is just hanging there before starting tests

image

@pawelmhm
Copy link
Contributor Author

removed windows, created issue for investigating this

@pawelmhm
Copy link
Contributor Author

pawelmhm commented Dec 10, 2021

@jpmckinney so I'll integrate this PR with current master and merge. Then we can release 1.3 version, there is a lot of fresh code available. Next step in 2022 we can release 2.0 with this #421, possibly also this #304?

@pawelmhm pawelmhm force-pushed the decouple-runner-from-app branch from cebbc36 to 388953d Compare December 14, 2021 17:45
@jpmckinney
Copy link
Contributor

@jpmckinney so I'll integrate this PR with current master and merge. Then we can release 1.3 version, there is a lot of fresh code available. Next step in 2022 we can release 2.0 with this #421, possibly also this #304?

Sounds good to me!

@pawelmhm pawelmhm merged commit b45c205 into master Dec 24, 2021
@pawelmhm pawelmhm deleted the decouple-runner-from-app branch December 24, 2021 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scrapyd does not support spiders that use AsyncioSelectorReactor
2 participants