Add new Kiwix download mirror
Kiwix has many mirrors available to ease retrieval of ZIMs, nightly builds and other artifacts (see https://download.kiwix.org/README for details).
The current list of mirrors is available at https://download.kiwix.org/mirrors.html
MirrorBrain is used to manage these mirrors.
This document describes how to add a new mirror to the list of existing ones.
It does not describe how to create a mirror, only how to add it to this list.
The following information is needed from the mirror owner in order to add a new mirror:
- operator name + URL (who has to be credited for making this mirror available)
- URLs:
  - rsync: mandatory to allow MirrorBrain to check mirror status (but could be opened only to our IP)
  - http and/or ftp: must be public (sic)
- location (country code + continent code, could be inferred from server IP) of the mirror
- admin email address + name (to contact in case of issue)
Before configuring anything, confirm that:
- you can communicate with the admin email
- operator URL is working
- rsync URL is OK: rsync -avn rsync://xxxx
- HTTP/FTP URLs are working
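The URL checks above can be sketched as a small shell helper. This is a minimal sketch, not a script used by Kiwix: the function names (url_scheme, check_url_kinds, probe_mirror) are hypothetical, and probe_mirror assumes that rsync and curl are installed on the machine running the checks.

```shell
# url_scheme prints the scheme of a URL (the text before "://").
url_scheme() {
    printf '%s\n' "${1%%://*}"
}

# check_url_kinds RSYNC_URL PUBLIC_URL
# Offline sanity check: the first URL must be rsync://, the second one
# must be http://, https:// or ftp://.
check_url_kinds() {
    rsync_url=$1
    public_url=$2
    if [ "$(url_scheme "$rsync_url")" != "rsync" ]; then
        echo "not an rsync URL: $rsync_url" >&2
        return 1
    fi
    case "$(url_scheme "$public_url")" in
        http|https|ftp) return 0 ;;
        *) echo "not an HTTP/FTP URL: $public_url" >&2; return 1 ;;
    esac
}

# probe_mirror RSYNC_URL PUBLIC_URL
# Network check: dry-run rsync listing, then a HEAD request on the public URL.
probe_mirror() {
    check_url_kinds "$1" "$2" || return 1
    rsync -avn "$1" >/dev/null || return 1
    curl -fsSI "$2" >/dev/null
}
```

For instance, probe_mirror rsync://xxxx https://xxxx returns a non-zero status as soon as one of the checks fails.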
Internally at Kiwix, we have to decide on the score we give to the mirror (the score works as a priority: a mirror with a lower score gets lower priority). The following rule of thumb is used:
- 100: servers with limited bandwidth (based on some tests on our side; we prefer to give very low priority to slow mirrors to avoid users raising issues for something we have no control over)
- 500: Kiwix master mirror (since it is used by other mirrors to retrieve data, we prefer that end users do not rely on this one)
- 3000: good mirrors
- 5000: very good mirrors
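The rule of thumb above can be written down as a lookup, which may help keep scores consistent between operators. This is only an illustrative sketch: the category labels (limited, master, good, very-good) and the function name are hypothetical, only the numeric scores come from the list above.

```shell
# mirror_score CATEGORY
# Prints the score matching the rule of thumb; fails on an unknown category.
mirror_score() {
    case "$1" in
        limited)   echo 100 ;;   # servers with limited bandwidth
        master)    echo 500 ;;   # Kiwix master mirror
        good)      echo 3000 ;;  # good mirrors
        very-good) echo 5000 ;;  # very good mirrors
        *) echo "unknown category: $1" >&2; return 1 ;;
    esac
}
```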
We first have to add the mirror to the MirrorBrain (MB) database.
Open a shell on the apache container of the mirrorbrain-web-deployment pod located in the zim namespace.
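One possible way to open that shell is sketched below. It relies on two assumptions that are not stated in this document: that the Deployment object is itself named mirrorbrain-web-deployment (so that kubectl can resolve deploy/mirrorbrain-web-deployment to its pod), and that bash is available in the container. The KUBECTL variable and the mb_shell function are hypothetical names; setting KUBECTL=echo previews the command without running it.

```shell
# KUBECTL can be overridden, e.g. KUBECTL=echo to only print the command.
KUBECTL=${KUBECTL:-kubectl}

# mb_shell opens an interactive shell in the apache container.
# Assumptions: deployment name and bash availability (see above).
mb_shell() {
    "$KUBECTL" exec -it -n zim deploy/mirrorbrain-web-deployment -c apache -- bash
}
```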
Install vim: apt update && apt install -y vim
Some useful commands:
- mb list => show the list of configured mirror identifiers
- mb show <identifier> => show one mirror configuration
- mb edit <identifier> => edit one mirror configuration
You first need to decide on the mirror identifier; usually, we use the mirror hostname as the identifier.
Then, you should create the new mirror with mb new (spoiler: read till the end, this won't work):
mb new <identifier> -c <country_code> -r <continent_code> -H <http_url> -R <rsync_url> -e <admin_email> -a <admin_name> --operator-url=<operator_url> --operator-name=<operator_name>
E.g.
mb new mirror-sites-fr.mblibrary.info -c FR -r EU -H https://mirror-sites-fr.mblibrary.info/mirror-sites/download.kiwix.org/ -R rsync://mirror-sites-fr.mblibrary.info/download.kiwix.org/ -e [email protected] -a "Dr. Mamdouh Barakat" --operator-url="https://www.mbgroup.global/" --operator-name="MB Group"
Unfortunately, this command is broken at the stage where it tries to retrieve the country code, continent code and coordinates from the mirror IP. You can launch the command above to identify, from the error, the Python file that has to be manually updated to return dummy values instead of retrieving them from the GeoIP databases (normally /usr/local/lib/python2.7/dist-packages/mb/geoip.py).
Update this file manually to return dummy values ("fr", "eu", and "(0.000,0.000)" below):
- the lookup_country_code function will return "fr"
- the lookup_region_code function will return "eu"
- the lookup_coordinates function will return (0.000,0.000)
Launch the mb new ... command again, as described before (theoretically there is no need to pass the -c and -r args since they are overridden, but they are mandatory).
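Since mb new takes nine values, wrapping the call in a function can reduce the risk of mixing them up. The sketch below only uses the flags shown in this document; the function name create_mirror and the MB_CMD variable are hypothetical, and setting MB_CMD=echo previews the arguments instead of running mb.

```shell
# MB_CMD can be overridden, e.g. MB_CMD=echo to only print the arguments.
MB_CMD=${MB_CMD:-mb}

# create_mirror IDENTIFIER COUNTRY CONTINENT HTTP_URL RSYNC_URL \
#               ADMIN_EMAIL ADMIN_NAME OPERATOR_URL OPERATOR_NAME
create_mirror() {
    if [ "$#" -ne 9 ]; then
        echo "create_mirror: expected 9 arguments, got $#" >&2
        return 1
    fi
    identifier=$1 country=$2 continent=$3 http_url=$4 rsync_url=$5
    admin_email=$6 admin_name=$7 operator_url=$8 operator_name=$9
    "$MB_CMD" new "$identifier" -c "$country" -r "$continent" \
        -H "$http_url" -R "$rsync_url" \
        -e "$admin_email" -a "$admin_name" \
        --operator-url="$operator_url" --operator-name="$operator_name"
}
```

Quoting every variable keeps values containing spaces, such as the admin or operator name, as single arguments.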
Edit the configuration with mb edit <identifier> to:
- set country to its real value (must be lower-case)
- set region to its real value (continent, must be lower-case)
- set enabled to True
- set statusBaseurl to True
Theoretically, you then have to run mb test to confirm the mirror configuration, but this command fails with HTTPS mirrors.
We have a cronjob, mb-update-db, which updates the MirrorBrain DB with the latest mirror statuses multiple times per day.
This cronjob uses the following script: https://github.com/kiwix/container-images/blob/main/mirrorbrain/bin/update_mirrorbrain_db.sh
This script must be updated to scan new mirrors (either for ALLDIRS if all directories are mirrored, or ZIMDIRS/WMDIRS if only a portion of the data is mirrored).
Beware that adding new mirrors increases the cronjob duration, which might need to be adapted (to be discussed; do not worry too much, since the cronjob configuration prevents two jobs from running in parallel, as scanMirror operations cannot be run in parallel).
Push your modifications to the main branch and wait for CI completion (to rebuild the MB image).
Relaunch the mirrorbrain-web-deployment pod to use the new latest image (this also discards any local modifications you've made, which is a good thing).
This is mandatory because the cronjobs do not pull the image (their imagePullPolicy is IfNotPresent); only mirrorbrain-web-deployment always pulls the new image (its imagePullPolicy is Always). This is done on purpose to ensure that all pods use the same image, since they all run on the same node.
Wait for a full run of the mb-update-db cronjob after the image update; check the logs for info regarding the scan of the new mirror.
Once the scan is complete, check the list of mirrors for about 3 files in various folders (e.g. ZIMs, nightly builds, ...). For every file:
- select a URL on https://mirror.download.kiwix.org/ (e.g. https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_mini_2023-07.zim)
- append .mirrorlist to the URL, remove the "mirror." prefix from the hostname, and open this new URL (e.g. https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_mini_2023-07.zim.mirrorlist). This URL displays info about this single file, including the list of mirrors where it is available
- check that your new mirror is present in the list (at the bottom of the page)
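The URL transformation described above is mechanical, so it can be sketched as a shell function. The function names mirrorlist_url and mirror_listed are hypothetical; mirror_listed additionally assumes that curl is available and that the mirror hostname appears verbatim in the .mirrorlist page.

```shell
# mirrorlist_url FILE_URL
# Turns a https://mirror.download.kiwix.org/... file URL into the matching
# https://download.kiwix.org/....mirrorlist URL.
mirrorlist_url() {
    url=$1
    case "$url" in
        https://mirror.download.kiwix.org/*)
            path=${url#https://mirror.download.kiwix.org/}
            printf 'https://download.kiwix.org/%s.mirrorlist\n' "$path"
            ;;
        *)
            echo "unexpected URL: $url" >&2
            return 1
            ;;
    esac
}

# mirror_listed FILE_URL MIRROR_HOSTNAME
# Network check: succeeds if the hostname appears in the mirror list page.
mirror_listed() {
    list_url=$(mirrorlist_url "$1") || return 1
    curl -fsSL "$list_url" | grep -qF "$2"
}
```

Running mirror_listed on about 3 file URLs from different folders with the new mirror hostname gives a quick pass or fail for each of them.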
Check that https://mirror.download.kiwix.org/mirrors.html includes your mirror (beware that it will take time for this file to be mirrored to other mirrors).
By default, people are using rsync on download.kiwix.org as mirroring source, but this mirror allows only 3 concurrent connections since we do not want to be overwhelmed.
For official mirrors, we open access to a private mirror, master.download.kiwix.org, as mirroring source, whitelisted based on the target mirror IP, with a reserved seat for every official mirror.
To open access to the new official mirror, edit the file https://github.com/kiwix/k8s/blob/main/zim/rsyncd/rsyncd.yaml:
- Locate the master.download.kiwix.org section (https://github.com/kiwix/k8s/blob/fb48b67a4eb4498566471a711d2898e8eb84a042/zim/rsyncd/rsyncd.yaml#L34)
- Increase the max connections value by 1 (https://github.com/kiwix/k8s/blob/fb48b67a4eb4498566471a711d2898e8eb84a042/zim/rsyncd/rsyncd.yaml#L37)
- Add the hostname + IPv4 + IPv6 (if available) to the hosts allow list (https://github.com/kiwix/k8s/blob/fb48b67a4eb4498566471a711d2898e8eb84a042/zim/rsyncd/rsyncd.yaml#L40)
- Add a documentation comment with the contact info for this mirror (https://github.com/kiwix/k8s/blob/fb48b67a4eb4498566471a711d2898e8eb84a042/zim/rsyncd/rsyncd.yaml#L51); the 'X' tick must be added only once we have confirmation from the mirror owner that they have switched to master.download.kiwix.org
See https://github.com/kiwix/k8s/commit/fb48b67a4eb4498566471a711d2898e8eb84a042 for a sample change.
Deploy this change manually, first shutting down rsyncd and then starting it again:
- set the number of replicas to 0 for the deployment (https://github.com/kiwix/k8s/blob/fb48b67a4eb4498566471a711d2898e8eb84a042/zim/rsyncd/rsyncd.yaml#L86); this is mandatory because rsyncd is using a node port, so we cannot have one container terminating while another one is being created
- apply the file: kubectl apply -f zim/rsyncd/rsyncd.yaml (this will also update the configmap with the configuration changes made above)
- set the number of replicas to 1 for the deployment
- apply the file: kubectl apply -f zim/rsyncd/rsyncd.yaml
Do not forget to push your changes to GitHub.
You then have to inform the mirror owner that they have been granted access to master.download.kiwix.org:
Dear XXX,
I am pleased to inform you that your mirror is now part of the Kiwix downloads load-balancer. You should already be getting some traffic.
As mentioned before, official mirrors get a reserved slot on the rsync server (anonymous access is limited and frequently clogged). You are thus invited to change your rsync configuration to point to the master.download.kiwix.org module instead of download.kiwix.org. This module will only work from your IP (xx.xxx.xxx.xxx).
From:
rsync -vzrlptD --delete master.download.kiwix.org::download.kiwix.org/zim/ ./zim/
To:
rsync -vzrlptD --delete master.download.kiwix.org::master.download.kiwix.org/zim/ ./zim/
Please let us know once everything is fine on your end and please do not hesitate to contact us should you notice any unexpected behavior.
All the best,