
The search functionality is provided by Algolia and is known as Algolia DocSearch.

Documentation

Hosted search

We are using the hosted search option where Algolia runs the docsearch-scraper.

Specific issues

docsearch-scraper

You can run the scraper yourself and serve the resulting index, which is also recommended for debugging. The Relevant issues below and the Debugging search (v2) section are good starting points.

Relevant issues

For details, look through the issue comments:

  • add search box #73
  • restrict DocSearch to relevant parts of the site #77
  • sitemapindex #79

Configuration

To change the configuration, make a PR against https://github.com/algolia/docsearch-configs/blob/master/configs/mdanalysis.json

The syntax is explained at https://docsearch.algolia.com/docs/config-file/

In order for anything to be indexed, it must match one of the selectors:

  • levels are mapped to heading tags
  • text is mapped to p, li, and similar tags
  • examine the generated documentation with the Firefox Web Developer Tools (or similar) to see which CSS selectors match the content that should be indexed

Example selectors

selectors": {
    "lvl0": "[itemprop='articleBody'] > .section h1, .page h1, .post h1, .body > .section h1",
    "lvl1": "[itemprop='articleBody'] > .section h2, .page h2, .post h2, .body > .section h2",
    "lvl2": "[itemprop='articleBody'] > .section h3, .page h3, .post h3, .body > .section h3",
    "lvl3": "[itemprop='articleBody'] > .section h4, .page h4, .post h4, .body > .section h4",
    "lvl4": "[itemprop='articleBody'] > .section h5, .page h5, .post h5, .body > .section h5",
    "text": "[itemprop='articleBody'] > .section p, .page p, .post p, .body > .section p, [itemprop='articleBody'] > .section li, .page li, .post li, .body > .section li"
  },

Working with sitemaps
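
Sitemaps tell the scraper which pages exist without it having to discover them by following links. As a rough sketch, the config points the scraper at sitemaps with the sitemap_urls key (the sitemap location below is only a placeholder, not the actual entry; check mdanalysis.json for the real configuration):

"sitemap_urls": [
    "https://www.mdanalysis.org/sitemap.xml"
],

If a page does not show up in the index, first check that it is actually listed in the sitemap (see the debugging notes below).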

When making a PR

Please:

Debugging search (v2)

Run a local version of the scraper that has index submission to Algolia disabled (to avoid running into the limits of the free plan). For example, install https://github.com/orbeckst/docsearch-scraper/tree/dryrun
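
A rough sketch of how such a local setup usually looks for the legacy docsearch-scraper (the dryrun fork may differ in details; the dummy credentials assume that index submission really is disabled):

git clone https://github.com/orbeckst/docsearch-scraper
cd docsearch-scraper
git checkout dryrun
# the scraper expects Algolia credentials in a .env file; with index
# submission disabled, dummy values are enough
printf 'APPLICATION_ID=dummy\nAPI_KEY=dummy\n' > .env
pipenv install
pipenv shell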

Have the config file handy (e.g., by cloning https://github.com/algolia/docsearch-configs).

Run the scraper and check the output:

./docsearch run ../docsearch-configs/configs/mdanalysis.json 2>&1 | tee RUN.log
less RUN.log

DocSearch: https://www.mdanalysis.org 0 records)
Ignored: from start url https://userguide.mdanalysis.org/stable/index.html
Ignored: from start url https://docs.mdanalysis.org/stable/index.html
DocSearch: https://www.mdanalysis.org/pages/privacy/ 12 records)
DocSearch: https://www.mdanalysis.org/pages/used-by/ 30 records)
...
...
DocSearch: https://www.mdanalysis.org/2015/12/15/The_benefit_of_social_coding/ 6 records)
DocSearch: https://www.mdanalysis.org/distopia/search.html 0 records)
Ignored from sitemap: https://www.mdanalysis.org/distopia/genindex.html
Ignored from sitemap: https://www.mdanalysis.org/distopia/index.html
DocSearch: https://www.mdanalysis.org/distopia/api/vector_triple.html 0 records)
DocSearch: https://www.mdanalysis.org/distopia/api/helper_functions.html 0 records)
DocSearch: https://www.mdanalysis.org/distopia/api/distopia.html 0 records)
DocSearch: https://www.mdanalysis.org/distopia/building_distopia.html 0 records)


Interpretation of results:
* lines with *N records* where N > 0: this is desired and shows that the scraper collected data records for the index
* lines with *0 records*: the selector rules did not match any elements on the page, so nothing was scraped
* *Ignored: from start url*: the scraper started crawling by following links from a *start url* but then hit a *stop_url* (see the config sketch after this list)
* *Ignored from sitemap:*: the scraper started from the sitemap (which is good!) but then hit a *stop_url*
* missing pages (e.g., nothing from the User Guide): check the sitemap file!
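
For context, the *start url* and *stop_url* messages refer to the start_urls and stop_urls lists in the config. A minimal sketch (the patterns below are purely illustrative, not the actual mdanalysis.json entries):

"start_urls": [
    "https://www.mdanalysis.org/"
],
"stop_urls": [
    "https://www.mdanalysis.org/distopia/genindex.html"
],

Any page matching a stop_urls pattern is skipped, which is what produces the Ignored lines above.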


