docs(zipdownloader): refactor slightly for name and version, crosslink
poikilotherm committed Feb 10, 2022
1 parent c91c071 commit 429ec3f
Showing 2 changed files with 38 additions and 33 deletions.
64 changes: 33 additions & 31 deletions doc/sphinx-guides/source/installation/advanced.rst
@@ -68,43 +68,45 @@ Once a standardized version of you Custom Terms are registered as a license, an
Optional Components
-------------------

.. _zipdownloader:

Standalone "Zipper" Service Tool
++++++++++++++++++++++++++++++++

As of Dataverse Software 5.0 we offer an experimental optimization for the multi-file, download-as-zip functionality. If this option
(``:CustomZipDownloadServiceUrl``) is enabled, instead of enforcing
the size limit on multi-file zipped downloads (as normally specified
by the option ``:ZipDownloadLimit``), we attempt to serve all the
files that the user requested (that they are authorized to download),
but the request is redirected to a standalone zipper service running
as a cgi-bin executable under Apache. Thus moving these potentially
long-running jobs completely outside the Application Server (Payara);
and preventing worker threads from becoming locked serving them. Since
zipping is also a CPU-intensive task, it is possible to have this
service running on a different host system, freeing the cycles on the
main Application Server. (The system running the service needs to have
access to the database as well as to the storage filesystem, and/or S3
bucket).

Please consult the scripts/zipdownload/README.md in the Dataverse Software 5.0+ source tree for more information.

To install: You can follow the instructions in the file above to build
``ZipDownloadService-v1.0.0.jar``. It will also be available, pre-built as part of the Dataverse Software 5.0 release on GitHub. Copy it, together with the shell
script scripts/zipdownload/cgi-bin/zipdownload to the cgi-bin
directory of the chosen Apache server (/var/www/cgi-bin standard).

Make sure the shell script (zipdownload) is executable, and edit it to configure the
database access credentials. Do note that the executable does not need
access to the entire Dataverse installation database. A security-conscious admin
can create a dedicated database user with access to just one table:
``CUSTOMZIPSERVICEREQUEST``.

You may need to make extra Apache configuration changes to make sure /cgi-bin/zipdownload is accessible from the outside.
For example, if this is the same Apache that's in front of your Dataverse installation Payara instance, you will need to add another pass through statement to your configuration:
As of Dataverse Software 5.0 we offer an **experimental** optimization for the multi-file, download-as-zip functionality.
If this option (``:CustomZipDownloadServiceUrl``) is enabled, instead of enforcing the size limit on multi-file zipped
downloads (as normally specified by the option ``:ZipDownloadLimit``), we attempt to serve all the files that the user
requested (that they are authorized to download), but the request is redirected to a standalone zipper service running
as a cgi-bin executable under Apache.

This moves these potentially long-running jobs completely outside the Application Server (Payara) and prevents
worker threads from becoming locked serving them. Since zipping is also a CPU-intensive task, it is possible to have
this service running on a different host system, freeing the cycles on the main Application Server. (The system running
the service needs to have access to the database as well as to the storage filesystem and/or S3 bucket.)

Please consult the `README at scripts/zipdownload <https://github.com/IQSS/dataverse/tree/master/scripts/zipdownload>`_
in the Dataverse Software 5.0+ source tree for more information.

To install:

1. Follow the instructions in the file above to build ``zipdownloader-0.0.1.jar``. (Also available from
`zipper.zip <https://github.com/IQSS/dataverse/releases/download/v5.0/zipper.zip>`_ of the
`Dataverse Software 5.0 release on GitHub <https://github.com/IQSS/dataverse/releases/tag/v5.0>`_).
2. Copy it, together with the shell script :download:`cgi-bin/zipdownload <../../../../scripts/zipdownload/cgi-bin/zipdownload>`
to the ``cgi-bin`` directory of the chosen Apache server (/var/www/cgi-bin standard).
3. Make sure the shell script (``zipdownload``) is executable, and edit it to configure the database access credentials.
Do note that the executable does not need access to the entire Dataverse installation database. A security-conscious
admin can create a dedicated database user with access to just one table: ``CUSTOMZIPSERVICEREQUEST``.
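
Steps 2 and 3 can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name ``install_zipper`` and its arguments are hypothetical, the jar is assumed to have been built or unpacked already, and editing the database credentials inside ``zipdownload`` remains a manual step.

```shell
# install_zipper JAR SCRIPT [CGI_BIN]
# Hypothetical helper (not part of the Dataverse source tree): copies the jar
# and the zipdownload shell script into the cgi-bin directory and marks the
# script executable.
install_zipper() {
    jar="$1"
    script="$2"
    cgi_bin="${3:-/var/www/cgi-bin}"   # default location named in step 2

    [ -f "$jar" ] || { echo "missing jar: $jar" >&2; return 1; }
    [ -f "$script" ] || { echo "missing script: $script" >&2; return 1; }
    [ -d "$cgi_bin" ] || { echo "missing cgi-bin directory: $cgi_bin" >&2; return 1; }

    cp "$jar" "$script" "$cgi_bin"/ || return 1
    chmod +x "$cgi_bin/$(basename "$script")"
}
```

A call might look like ``install_zipper zipdownloader-0.0.1.jar scripts/zipdownload/cgi-bin/zipdownload``, run as a user allowed to write to the cgi-bin directory.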

You may need to make extra Apache configuration changes to make sure ``/cgi-bin/zipdownload`` is accessible from the outside.
For example, if this is the same Apache that's in front of your Dataverse installation Payara instance, you will need to
add another pass through statement to your configuration:

``ProxyPassMatch ^/cgi-bin/zipdownload !``

Test this by accessing it directly at ``<SERVER URL>/cgi-bin/download``. You should get a ``404 No such download job!``. If instead you are getting an "internal server error", this may be an SELinux issue; try ``setenforce Permissive``. If you are getting a generic Dataverse collection "not found" page, review the ``ProxyPassMatch`` rule you have added.
Test this by accessing it directly at ``<SERVER URL>/cgi-bin/download``. You should get a ``404 No such download job!``.
If instead you are getting an "internal server error", this may be an SELinux issue; try ``setenforce Permissive``.
If you are getting a generic Dataverse collection "not found" page, review the ``ProxyPassMatch`` rule you have added.
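
The three outcomes above can be told apart with a short shell check. This is a rough sketch, not an official diagnostic: the function name ``explain_zipper_response`` is hypothetical, and it assumes that the generic Dataverse "not found" page also comes back with status 404, which is why the response body is inspected as well.

```shell
# explain_zipper_response STATUS BODY
# Hypothetical helper: maps an HTTP status code and response body to the
# troubleshooting hints given in the text above.
explain_zipper_response() {
    status="$1"
    body="$2"
    case "$status" in
        404)
            case "$body" in
                *"No such download job"*)
                    echo "ok: zipper is reachable" ;;
                *)
                    echo "404 without the zipper message: review the ProxyPassMatch rule" ;;
            esac ;;
        500)
            echo "internal server error: possibly an SELinux issue" ;;
        *)
            echo "unexpected status $status" ;;
    esac
}

# Example use against a live server; SERVER_URL is a placeholder.
# body=$(curl -s "$SERVER_URL/cgi-bin/download")
# status=$(curl -s -o /dev/null -w '%{http_code}' "$SERVER_URL/cgi-bin/download")
# explain_zipper_response "$status" "$body"
```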

To activate in your Dataverse installation::

7 changes: 5 additions & 2 deletions doc/sphinx-guides/source/installation/config.rst
@@ -2321,15 +2321,18 @@ If you don’t want date facets to be sorted chronologically, set:
:CustomZipDownloadServiceUrl
++++++++++++++++++++++++++++

The location of the "Standalone Zipper" service. If this option is specified, the Dataverse installation will redirect bulk/multi-file zip download requests to that location, instead of serving them internally. See the "Advanced" section of the Installation guide for information on how to install the external zipper. (This is still an experimental feature, as of Dataverse Software 5.0).
The location of the "Standalone Zipper" service. If this option is specified, the Dataverse installation will
redirect bulk/multi-file zip download requests to that location, instead of serving them internally.
See :ref:`zipdownloader` of the Advanced Installation guide for information on how to install the external zipper.
(This is still an **experimental** feature, as of Dataverse Software 5.0).

To enable redirects to the zipper installed on the same server as the main Dataverse Software application:

``curl -X PUT -d '/cgi-bin/zipdownload' http://localhost:8080/api/admin/settings/:CustomZipDownloadServiceUrl``

To enable redirects to the zipper on a different server:

``curl -X PUT -d 'https://zipper.example.edu/cgi-bin/zipdownload' http://localhost:8080/api/admin/settings/:CustomZipDownloadServiceUrl``
``curl -X PUT -d 'https://zipper.example.edu/cgi-bin/zipdownload' http://localhost:8080/api/admin/settings/:CustomZipDownloadServiceUrl``
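
To check or undo the setting, the same endpoint can be queried or deleted. The sketch below is hedged: the helper ``settings_url`` is hypothetical, and the use of ``GET`` and ``DELETE`` on the settings endpoint is an assumption based on how the admin settings API generally behaves, so confirm it against the API guide for your version.

```shell
# settings_url NAME [BASE]
# Hypothetical helper: builds the admin settings URL used in the commands above.
settings_url() {
    echo "${2:-http://localhost:8080}/api/admin/settings/$1"
}

# Show the current value (commented out: needs a running installation).
# curl "$(settings_url :CustomZipDownloadServiceUrl)"

# Remove the setting so zip downloads are served internally again
# (assumption: DELETE is supported on this endpoint).
# curl -X DELETE "$(settings_url :CustomZipDownloadServiceUrl)"
```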

:ArchiverClassName
++++++++++++++++++