Improve the README #1983

kelson42 · 2024-02-02T12:14:09Z

Some general remarks:
    - Why is the layout garbled in terms of line breaks? (maybe my fault)
    - "per default" looks like a literal translation from French; "by default" is probably more appropriate
    - other comments are inline in bold italic underline

> docker run --rm -it ghcr.io/openzim/mwoffliner:1.13.0 mwoffliner --help
starting redis-server in the background…
Create a fancy HTML dump of a Mediawiki instance in a ZIM file
Why "fancy"? Why "HTML dump"? 
  Usage: mwofflin
er
  Example, as a system tool:
  mwoffliner --mwUrl=https://en.wikipedia.org/ -
[email protected]
  Or, as a node script:
  node mwoffliner.js --mwUrl=htt
ps://en.wikipedia.org/ [email protected]
  Or, as a npm script: '
  npm r
un mwoffliner -- --mwUrl=https://en.wikipedia.org/ [email protected]

Options:
  --version                   Show version number                      [boolean]
  --help                      Show help                                [boolean]
  --mwUrl                     Mediawiki base URL.                     [required]
More precision would be welcome (should I include the /wiki/ path?)
  --adminEmail                Email of the mwoffliner user which will be put in
                              the HTTP user-agent string              [required]
What is it used for? Does the user need to exist in the Mediawiki instance?
  --articleList               List of articles to include. Can be a comma sepera
seperated => separated (typo)
                              ted list of titles or a local path or http(s) URL
                              to a file with one title (in UTF8) per line
  --articleListToIgnore       List of articles to ignore. Can be a comma seperat
seperated => separated (typo)
                              ed list of titles or a local path or http(s) URL t
                              o a file with one title (in UTF8) per line
  --customZimFavicon          Use this option to give a path to a PNG favicon, i
                              t will be used in place of the Mediawiki logo. Thi
                              s can be a local path or an HTTP(S) url
Favicons are not used anymore, if I'm not mistaken. Is it used as an illustration? What is the expected resolution? Is it automatically scaled to fill both resolutions (48 and 96)?
  --customZimTitle            Allow to configure a custom ZIM file title.
  --customZimDescription      Allow to configure a custom ZIM file description.
                              Max length is 80 chars.
  --customZimLongDescription  Allow to configure a custom ZIM file long descript
                              ion. Max length is 4000 chars.
  --customZimTags             Allow to configure custom ZIM file tags (semi-colo
                              n separated).
  --customZimLanguage         Allow to configure a custom ISO639-3 content langu
                              age code.
  --customMainPage            Allow to configure a custom page as welcome page.
  --filenamePrefix            For the part of the ZIM filename which is before t
                              he format & date parts.
  --format                    Specify a flavour for the scraping. If missing, sc
                              rape all article contents. Each --format argument
                              will cause a new local file to be created but opti
                              ons can be combined. Supported options are:
                               * nov
                              id: no video & audio content
                               * nopic: no pictures
                               (implies "novid")
                               * nopdf: no PDF files
                               * nodet
                              : only the first/head paragraph (implies "novid")

                              Format names can also be aliased using a ":"
                              Examp
                              le: "... --format=nopic:mini --format=novid,nopdf"
What is the format alias used for?
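To make the syntax in question concrete, here is a minimal sketch of how one `--format` value could be read, based only on the help text above. It is not mwoffliner's actual parser: the function name `parse_format` is hypothetical, and treating everything before the ":" as the option list is an assumption. It says nothing about what the alias is then used for.

```python
# Option names and implications are taken from the help text above.
SUPPORTED = {"novid", "nopic", "nopdf", "nodet"}
IMPLIES = {"nopic": {"novid"}, "nodet": {"novid"}}


def parse_format(value):
    """Split one --format value into (sorted options, alias or None).

    Hypothetical helper: assumes the alias is whatever follows the first ":".
    """
    names, _, alias = value.partition(":")
    options = set()
    for name in names.split(","):
        if name not in SUPPORTED:
            raise ValueError(f"unsupported format option: {name!r}")
        options.add(name)
        # Add the options that the help text says are implied.
        options |= IMPLIES.get(name, set())
    return sorted(options), (alias or None)
```

With the example from the help text, `parse_format("nopic:mini")` yields the options `nopic` and `novid` plus the alias `mini`, while `parse_format("novid,nopdf")` yields two options and no alias.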
  --keepEmptyParagraphs       Keep all paragraphs, even empty ones.
  --mwWikiPath                Mediawiki wiki base path (per default "/wiki/")
  --mwApiPath                 Mediawiki API path (per default "/w/api.php")
  --mwRestApiPath             Mediawiki Rest API path (per default "/api/rest_v1
                              ")
Rest => REST
  --mwModulePath              Mediawiki module load path (per default "/w/load.p
                              hp")
Are we speaking about https://www.mediawiki.org/wiki/Manual:Load.php? If yes, I would suggest renaming it to "Mediawiki ResourceLoader path"
  --mwDomain                  Mediawiki user domain (thought for private wikis)
  --mwUsername                Mediawiki username (thought for private wikis)
  --mwPassword                Mediawiki user password (thought for private wikis
                              )
  --minifyHtml                Try to reduce the size of the HTML
  --outputDirectory           Directory to write the downloaded content
  --publisher                 ZIM publisher meta data, per default 'Kiwix'
  --redis                     Redis path (redis:// URL or path to UNIX socket)
  --requestTimeout            Request timeout - in seconds(default value is 120
                              seconds)
  --resume                    Do not overwrite if ZIM file already created
This is not clear: will it restart from the last article processed?
  --speed                     Multiplicator for the number of parallel HTTP requ
                              ests on Parsoid backend (per default the number of
                               CPU cores). The default value is 1.
  --verbose                   Print information to the stdout if the level is "i
                              nfo" or "log", and to the stderr, if the level is
                              warn or error. The option can be empty or one of "
                              info", "log", "warn", "error", or "quiet". Option
                              with an empty value is equal to "info".The default
                               level is "error". If you choose the lower level t
                              hen you will see messages also from the more high
                              levels. For example, if you use warn then you will
                               see warnings and errors.
I really don't understand what goes to stdout, what goes to stderr, and what the default is. And why is it named "verbose"? Usually such flags are booleans, but here we can set a value as well?
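For what it is worth, this is how I read the current description, written as a small sketch. It is an interpretation of the help text, not mwoffliner's code: the names `should_print` and `stream_for` are hypothetical, and the ordering info < log < warn < error < quiet is an assumption taken from the order in which the help text lists the levels.

```python
import sys

# Assumed ordering, from least to most severe; "quiet" suppresses everything.
LEVELS = ["info", "log", "warn", "error", "quiet"]


def should_print(message_level, configured_level="error"):
    """Return True if a message is shown at the configured --verbose level.

    The default is "error"; an empty option value is treated as "info".
    """
    if configured_level == "":
        configured_level = "info"
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)


def stream_for(message_level):
    """"info" and "log" go to stdout, "warn" and "error" go to stderr."""
    return sys.stdout if message_level in ("info", "log") else sys.stderr
```

Under this reading, `--verbose=warn` shows warnings and errors (both on stderr), a bare `--verbose` shows everything, and omitting the option shows errors only.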
  --withoutZimFullTextIndex   Don't include a fulltext search index to the ZIM
  --webp                      Convert all jpeg, png and gif images to webp forma
                              t
  --addNamespaces             Force additional namespace (comma separated number
                              s)
  --getCategories             [WIP] Download category pages
What does "WIP" mean (i.e. what works and what does not)?
  --osTmpDir                  Override default operating system temporary direct
                              ory path environment variable
  --customFlavour             A custom processor that can filter and process art
                              icles (see extensions/*.js)
Should it be a path to the custom processor JS file? (not clear)
  --optimisationCacheUrl      S3 url, including credentials and bucket name
Not clear. The description should also state that this is a cache, something like: "S3 URL to a bucket under which the scraper will cache i_dont_know_what; the URL must include credentials (keyId and secretAccessKey) as well as the bucket name, e.g. https://s3.myprovider.com/?keyId=THISISAKEYID&secretAccessKey=THISISASECRETKEY&bucketName=this-is-my-bucket"
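As an illustration of the URL shape suggested above, here is a minimal sketch that splits such a URL into its parts. It is not how mwoffliner reads the option: the function name `parse_cache_url` and the returned keys are hypothetical, and the only assumption about the format is that the three values arrive as query parameters, as in the example URL.

```python
from urllib.parse import parse_qs, urlsplit


def parse_cache_url(url):
    """Extract endpoint, credentials and bucket name from an S3 cache URL.

    Hypothetical helper; assumes keyId, secretAccessKey and bucketName
    are passed as query parameters.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    required = ("keyId", "secretAccessKey", "bucketName")
    missing = [name for name in required if name not in query]
    if missing:
        raise ValueError("missing query parameter(s): " + ", ".join(missing))
    return {
        "endpoint": f"{parts.scheme}://{parts.netloc}",
        "key_id": query["keyId"][0],
        "secret_access_key": query["secretAccessKey"][0],
        "bucket_name": query["bucketName"][0],
    }
```

Raising an error that names the missing parameter is the kind of feedback that would make the option easier to use than the current one-line description.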


kelson42 added bug enhancement labels Feb 2, 2024

kelson42 added this to the 1.14.0 milestone Feb 2, 2024

kelson42 self-assigned this Feb 2, 2024

kelson42 mentioned this issue Feb 23, 2024

Better usage() #1996

Merged

kelson42 closed this as completed in #1996 Mar 5, 2024
