signal only works in main thread #48

Open
rosendyakov opened this issue Sep 28, 2022 · 5 comments

@rosendyakov

Hello, I'm developing a very simple Flask app that runs only locally, and I want to scrape some Reddit posts using your API. I followed the example in the documentation, but whenever I run my script I get the following error:

ValueError: signal only works in main thread

I read that the Flask-SocketIO package can cause this, but I see that this project uses websocket-client, which is a different package.

Would really appreciate your input.

@mattpodolak
Owner

Hey @rosendyakov, can you provide the following info and the minimum amount of code needed to re-create the issue? This will help me look into this further:

python version:
flask version:
pmaw version:

@SeifReda30

I have the same issue when deploying a web application that uses pmaw:

  File "/app/adam-radar/Python-Scripts/User Specified Scripts/Discussion Platforms/Reddit/reddit_submissions_by_keywords.py", line 111, in reddit_submissions
    api_request_generator = api.search_submissions(q=keyword,after=start_time,before=end_time)
  File "/home/appuser/venv/lib/python3.9/site-packages/pmaw/PushshiftAPI.py", line 77, in search_submissions
    return self._search(kind="submission", **kwargs)
  File "/home/appuser/venv/lib/python3.9/site-packages/pmaw/PushshiftAPIBase.py", line 304, in _search
    self.req.check_sigs()
  File "/home/appuser/venv/lib/python3.9/site-packages/pmaw/Request.py", line 110, in check_sigs
    signal.signal(getattr(signal, "SIG" + sig), self._exit)
  File "/usr/local/lib/python3.9/signal.py", line 56, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
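
For context, the traceback above ends in `signal.signal`, which CPython only allows on the main thread. The following is a minimal standard-library sketch of that restriction, not pmaw code; `register_handler`, `worker`, and `results` are hypothetical names used only for illustration.

```python
import signal
import threading


def register_handler() -> str:
    """Try to install a SIGINT handler and report what happened."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"


results = {}

# Expected to succeed when this top-level code runs on the main thread.
results["main"] = register_handler()


def worker() -> None:
    # Same call, but from a non-main thread: CPython raises ValueError here.
    results["worker"] = register_handler()


thread = threading.Thread(target=worker)
thread.start()
thread.join()

print(results)
```

Web frameworks and serverless hosts often run request handlers on worker threads, so a library call that registers signal handlers internally can fail there even though the same call works in a plain script.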

@mike-mo

mike-mo commented Feb 27, 2023

I am hitting the same symptom, though my setup is a bit more involved (Azure Durable Functions). To make up for the added complexity, I published my repro at https://github.com/mike-mo/azure-durable-pmaw

Python version: 3.10.10
Azure Functions Core Tools Version: 4.0.5030
Azure Functions Runtime Version: 4.15.2.20177
pmaw version: 3.0.0

The same Python and pmaw versions work fine when running a basic script that fetches the information, so it must have something to do with how these frameworks handle threading.
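
One way to check the threading hypothesis is to ask Python which thread the code is running on. This is a small diagnostic sketch using only the standard library; `in_main_thread` is a hypothetical helper, and the `ThreadPoolExecutor` merely stands in for whatever worker thread a framework might use.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def in_main_thread() -> bool:
    """Return True only when called from the interpreter's main thread."""
    return threading.current_thread() is threading.main_thread()


# Stand-in for a framework dispatching a handler onto a worker thread.
with ThreadPoolExecutor(max_workers=1) as executor:
    worker_result = executor.submit(in_main_thread).result()

print("called from a pool worker:", worker_result)
```

Logging the result of the same check inside a Flask view or an Azure Functions handler, just before the pmaw call, should show whether the handler runs off the main thread.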

@CryptoRahino

Having the same issue here. I'm using `multiprocessing.pool.ThreadPool` to call the API function:

```python
import logging
from multiprocessing.pool import ThreadPool
from pathlib import Path

from pandas import DataFrame, concat

logger = logging.getLogger(__name__)


def run_download(subreddits: list, start_date: int, end_date: int, additional_args: dict,
                 working_dir: Path = None) -> DataFrame:
    logger.info(f"Starting Download from Reddit using subreddits {subreddits}")
    all_df = DataFrame()

    with ThreadPool() as pool:
        query = {'start_date': start_date,
                 'end_date': end_date,
                 "working_dir": working_dir,
                 **additional_args}
        # Each _get_subreddit call runs on a pool worker thread, not the main thread.
        lst_df = pool.starmap(_get_subreddit,
                              [(start_date, end_date, subreddit) for subreddit in subreddits])

        for df in lst_df:
            if df.empty:
                continue
            all_df = concat([df, all_df], axis=0)
    return all_df


# Appears to be excerpted from a class in the original project, hence `self`.
def _get_subreddit(self, start_time: int, end_time: int, subreddit_name=None, **kwargs) -> DataFrame:
    params = self.params.copy()
    params.update(kwargs)
    subreddit_name = subreddit_name or self.subreddit_name
    df = DataFrame(
        self.api.search_submissions(subreddit=subreddit_name, since=start_time,
                                    until=end_time, **params))
    return df.drop_duplicates('id')
```

I have tried this with an async function, but it didn't help either:

ValueError: signal only works in main thread of the main interpreter

@simoninnyc

Did you manage to solve this? I'm running into a similar issue using an Azure Durable Function with Scrapy and signal.
