Cannot scrap documents into meilisearch #26
Hello @kappa-wingman! Thanks for using MeiliSearch and docs-scraper!
I managed to reproduce your issue with the
It looks like it does not work because you haven't set any
Temporary solution
@kappa-wingman If you cannot wait for the fix, you can use the docker image
$ docker run -t --rm \
-e MEILISEARCH_HOST_URL=<your-meilisearch-host-url> \
-e MEILISEARCH_API_KEY=<your-meilisearch-api-key> \
-v <absolute-path-to-your-config-file>:/docs-scraper/config.json \
getmeili/docs-scraper:v0.9.0 pipenv run ./docs_scraper config.json
Sorry about this issue, and thanks a lot for your report!! 😁
Fix the bug
@bidoubiwa, it looks like it's linked to the PR you did about the custom settings (#22). Did you test that it works if you don't pass any
Thanks for the quick reply. I can import documents now.
Hey! I'm on it
The tests don't check whether the scraper actually works. Since the error occurs just before the scrape starts, the tests never reach that code path.
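A regression test for this case could look roughly like the sketch below. It is only an illustration: `FakeHelper` and `DEFAULT_SETTINGS` are hypothetical stand-ins, not the real docs-scraper classes, and the `or {}` guard is one possible fix, not necessarily the one that was merged.

```python
import unittest

# Hypothetical defaults; the real helper's settings are not shown in this thread.
DEFAULT_SETTINGS = {"searchableAttributes": ["content"]}


class FakeHelper:
    """Hypothetical stand-in for the scraper's helper class."""

    def __init__(self, custom_settings=None):
        # Fall back to an empty dict when no custom settings are configured.
        self.settings = {**DEFAULT_SETTINGS, **(custom_settings or {})}


class HelperSettingsTest(unittest.TestCase):
    def test_no_custom_settings(self):
        # The case reported in this issue: the config has no custom settings.
        helper = FakeHelper(None)
        self.assertEqual(helper.settings, DEFAULT_SETTINGS)

    def test_custom_settings_override_defaults(self):
        helper = FakeHelper({"searchableAttributes": ["title"]})
        self.assertEqual(helper.settings["searchableAttributes"], ["title"])


def run_tests():
    """Run the two tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HelperSettingsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs both cases; the first one is the path that the existing tests did not cover.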
@bidoubiwa I'll let you open an issue about improving the tests in docs-scraper :)
@kappa-wingman, thanks to @bidoubiwa this bug is now fixed.
Greetings,
I am using Pelican as my blog generator.
I am not using Ubuntu, so I need to use Docker to run docs-scraper.
I can run the small tutorial and import data into MeiliSearch.
But I cannot run docs-scraper to get data into MeiliSearch.
Below is the error:
Traceback (most recent call last):
  File "./docs_scraper", line 22, in <module>
    run_config(sys.argv[1])
  File "/docs-scraper/scraper/src/index.py", line 43, in run_config
    config.custom_settings
  File "/docs-scraper/scraper/src/meilisearch_helper.py", line 108, in __init__
    settings = {**MeiliSearchHelper.SETTINGS, **custom_settings}
TypeError: 'NoneType' object is not a mapping
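The last frame of the traceback can be reproduced in isolation. The sketch below assumes that `custom_settings` is `None` when the key is missing from the config; `DEFAULT_SETTINGS` and both function names are hypothetical and only stand in for the real code.

```python
# Hypothetical stand-in for MeiliSearchHelper.SETTINGS; the real values are not shown here.
DEFAULT_SETTINGS = {"stopWords": []}


def merge_settings(custom_settings):
    """Unguarded merge, same expression shape as the failing line in the traceback."""
    return {**DEFAULT_SETTINGS, **custom_settings}


def merge_settings_guarded(custom_settings):
    """Same merge, but a missing (None) value is treated as an empty mapping."""
    return {**DEFAULT_SETTINGS, **(custom_settings or {})}


# Unpacking None with ** raises TypeError, because None is not a mapping.
try:
    merge_settings(None)
    error = None
except TypeError as exc:
    error = exc

# The guarded version simply returns the defaults.
merged = merge_settings_guarded(None)
```

With the guard in place, a config file without custom settings would fall back to the defaults instead of raising.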
Below is my config.json
{
  "index_uid": "docs",
  "sitemap_urls": ["https://www.kappawingman.com/sitemap.xml"],
  "start_urls": ["https://www.kappawingman.com"],
  "selectors": {
    "lvl0": {
      "selector": ".entry-content",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "#main_content h1",
    "lvl2": ".toc-backref h2",
    "lvl3": ".toc-backref h3",
    "text": ".entry-content p, .entry-content li"
  },
  "strip_chars": " .,;:#",
  "scrap_start_urls": true
}
On the meilisearch console, I saw these messages:
[2020-06-09T16:15:34Z INFO tide::middleware::logger] DELETE /indexes/docs 204 17ms
[2020-06-09T16:15:34Z INFO tide::middleware::logger] POST /indexes 201 15ms
Any help would be appreciated, thanks.