
Cannot scrap documents into meilisearch #26

Closed
kappa-wingman opened this issue Jun 9, 2020 · 5 comments · Fixed by #28
Assignees
Labels
bug Something isn't working

Comments

@kappa-wingman
Greetings,

I am using Pelican as my blog generator.
I am not using Ubuntu, so I need to use Docker to run docs-scraper.

I can run the small tutorial and import data into MeiliSearch.

But I cannot run docs-scraper to get data into MeiliSearch.
Below is the error:

Traceback (most recent call last):
  File "./docs_scraper", line 22, in <module>
    run_config(sys.argv[1])
  File "/docs-scraper/scraper/src/index.py", line 43, in run_config
    config.custom_settings
  File "/docs-scraper/scraper/src/meilisearch_helper.py", line 108, in __init__
    settings = {**MeiliSearchHelper.SETTINGS, **custom_settings}
TypeError: 'NoneType' object is not a mapping

Below is my config.json
{
  "index_uid": "docs",
  "sitemap_urls": ["https://www.kappawingman.com/sitemap.xml"],
  "start_urls": ["https://www.kappawingman.com"],
  "selectors": {
    "lvl0": {
      "selector": ".entry-content",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "#main_content h1",
    "lvl2": ".toc-backref h2",
    "lvl3": ".toc-backref h3",
    "text": ".entry-content p, .entry-content li"
  },
  "strip_chars": " .,;:#",
  "scrap_start_urls": true
}

On the MeiliSearch console, I saw these messages:
[2020-06-09T16:15:34Z INFO tide::middleware::logger] DELETE /indexes/docs 204 17ms
[2020-06-09T16:15:34Z INFO tide::middleware::logger] POST /indexes 201 15ms

Any help would be appreciated, thanks.

@curquiza
Member

curquiza commented Jun 9, 2020

Hello @kappa-wingman! Thanks for using MeiliSearch and docs-scraper!

I managed to reproduce your issue with the v0.9.1 (currently the latest) Docker image.

It looks like it fails because you haven't set the custom_settings field, but this field is not mandatory, so you should NOT get any error. Sorry about that!
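For context, the crash is just Python refusing to unpack `None` as a mapping. Below is a minimal sketch of the failure and one possible guard; the names `DEFAULT_SETTINGS` and `merge_settings` are hypothetical stand-ins for illustration, not the actual fix merged in #28:

```python
# Hypothetical stand-in for MeiliSearchHelper.SETTINGS from the traceback.
DEFAULT_SETTINGS = {"searchableAttributes": ["*"]}

def merge_settings(custom_settings=None):
    # {**DEFAULT_SETTINGS, **None} raises
    # TypeError: 'NoneType' object is not a mapping
    # (the exact error above), so fall back to an empty dict when the
    # config file has no custom_settings field.
    return {**DEFAULT_SETTINGS, **(custom_settings or {})}
```

With a guard like this, a config file that omits custom_settings simply keeps the default settings instead of crashing.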

Temporary solution

@kappa-wingman If you cannot wait for the fix, you can use the Docker image v0.9.0, which works without any custom_settings field set.

$ docker run -t --rm \
    -e MEILISEARCH_HOST_URL=<your-meilisearch-host-url> \
    -e MEILISEARCH_API_KEY=<your-meilisearch-api-key> \
    -v <absolute-path-to-your-config-file>:/docs-scraper/config.json \
    getmeili/docs-scraper:v0.9.0 pipenv run ./docs_scraper config.json

Sorry about this issue, and thanks a lot for your report!! 😁

Fix the bug

@bidoubiwa, it looks like it's linked to the PR you did about the custom settings (#22). Did you test that it works when no custom_settings field is passed in the config file? Because I've tested it and it does not work.
Tell me if I'm wrong. I'm assigning you to the issue until you tell me the error does not come from there 😉
(Edit: it's weird that the tests did not notice this; if you can investigate that at the same time, that would be great 😊)

@kappa-wingman
Author

Thanks for the quick reply. I can import documents now.

@kappa-wingman kappa-wingman changed the title Cannot scrap documents info meilisearch Cannot scrap documents into meilisearch Jun 9, 2020
@bidoubiwa
Contributor

Hey! I'm on it

(Edit: it's weird that the tests did not notice this; if you can investigate that at the same time, that would be great 😊)

The tests don't check whether the scraper actually runs. Since the error occurs just before the scrape starts, the tests never reach that code path.

@curquiza
Member

@bidoubiwa I'll let you open an issue about improving the tests in docs-scraper :)

@curquiza
Member

@kappa-wingman, thanks to @bidoubiwa this bug is now fixed. The master branch and the latest Docker image are now up to date.
And the new Docker image v0.9.2 is now available! 🎉
Thanks again for your report!
