The data store to access datasets from the CLMS API

xcube-dev/xcube-clms

xcube-clms

The xcube-clms Python package provides an xcube data store that enables access to datasets hosted by the Copernicus Land Monitoring Service (CLMS). The data store is called "clms" and is implemented as an xcube plugin. It uses the CLMS API under the hood.

Setup

Installing the xcube-clms plugin from the repository

To install xcube-clms directly from the git repository, clone the repository, cd into xcube-clms, and follow the steps below:

conda env create -f environment.yml
conda activate xcube-clms
pip install .

This sets up a new conda environment, installs all the dependencies required for xcube-clms, and then installs xcube-clms directly from the repository into the environment.

Create credentials to access the CLMS API

Create the credentials required for the CLMS API as a JSON file, following the documentation. The credentials are needed when the CLMS data store is initialized. See example/notebooks/CLMSDataStoreTutorial.ipynb for instructions on how to pass the credentials from the JSON file to the store.
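As a minimal sketch of that step, the snippet below reads the JSON file and hands its contents to the store. The `credentials` keyword, the helper names, and the file name `clms-credentials.json` are assumptions made for illustration; the tutorial notebook remains the authoritative reference.

```python
import json
from pathlib import Path


def load_clms_credentials(path):
    """Read the CLMS credentials JSON file and return its content as a dict."""
    with Path(path).open(encoding="utf-8") as f:
        credentials = json.load(f)
    if not isinstance(credentials, dict):
        raise ValueError(f"{path} must contain a JSON object")
    return credentials


def open_clms_store(credentials_path="clms-credentials.json"):
    """Create the "clms" data store (requires xcube and xcube-clms)."""
    # Imported lazily so the loader above works without xcube installed.
    from xcube.core.store import new_data_store

    # Assumption: the store accepts the parsed JSON via a `credentials` keyword.
    credentials = load_clms_credentials(credentials_path)
    return new_data_store("clms", credentials=credentials)
```

Keeping the file reading separate from the store creation makes it easy to check the credentials file on its own before any request is sent.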

Testing

To run the unit test suite:

pytest

Additional Notes about the data store

This data store introduces an initial mechanism for preloading data, including cache management, downloading, and file processing. It uses the experimental Preload API of the xcube data store framework.

The preload interface is needed because of how the CLMS API works: the user creates a data request, which then waits in a queue for an undetermined time before it is processed. Once the request is ready, the zip files are downloaded, unzipped, extracted into a cache, and processed, after which they can be opened using a cache data store. The default cache is a file data store located at /clms_cache in your current working directory, but users are free to choose a different data store.

Preloading allows the data store to request datasets for download from the CLMS API in either a blocking or a non-blocking way. It handles sending the download request, waiting in the queue, periodically checking the request status, downloading the data, and extracting and post-processing it.

The preload mechanism is invoked by calling .preload_data(*data_ids, **preload_params) on the CLMS data store instance.
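The difference between blocking and non-blocking waiting can be illustrated with a small, self-contained polling sketch. It is not the code used by xcube-clms: `get_status`, the status names, and both helper functions are hypothetical stand-ins for the periodic status check described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_for_request(get_status, poll_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Blocking variant: poll `get_status()` until the request is finished.

    `get_status` is a hypothetical callable returning one of
    "queued", "in_progress", "completed" or "cancelled".
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in ("completed", "cancelled"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"request still {status!r} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval


def wait_in_background(get_status, **kwargs):
    """Non-blocking variant: run the same loop in a worker thread."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(wait_for_request, get_status, **kwargs)
    # Do not block here; the already submitted job keeps running.
    executor.shutdown(wait=False)
    return future
```

The blocking call returns only once the request has finished, while the non-blocking call returns a future immediately so the caller can continue working and collect the result later.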

The following classes (components) are responsible for this mechanism:

Clms

  • Serves as the main interface to interact with the CLMS API. This class coordinates with the ClmsPreloadHandle class to preload the data into a cache data store.

DownloadTaskManager

  • Handles the download process, including managing download requests and checking their statuses.
  • Retrieves task statuses based on dataset and file IDs or task IDs, determining whether the download is pending, completed, or cancelled.
  • Initiates data downloads in chunks and manages zip file extraction, looking specifically for geo data. The definition of geo data is given in the notes of the function docstring.
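The selective extraction mentioned in the last point can be sketched as follows. The set of suffixes treated as geo data here is an assumption for illustration only (the package defines geo data in its own docstring), and `extract_geo_files` is a hypothetical helper, not part of DownloadTaskManager.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: treat GeoTIFF and NetCDF files as "geo data" for this sketch.
GEO_SUFFIXES = {".tif", ".tiff", ".nc"}


def extract_geo_files(zip_bytes, target_dir):
    """Extract only members with a geo suffix; return the extracted member names."""
    extracted = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Zip member names always use forward slashes.
            if PurePosixPath(info.filename).suffix.lower() in GEO_SUFFIXES:
                archive.extract(info, path=target_dir)
                extracted.append(info.filename)
    return extracted
```

Filtering by member name before extracting avoids writing documentation or metadata files from the archive into the cache.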

ClmsApiTokenHandler

  • Handles the creation and refreshing of the CLMS API token given the credentials which can be obtained following the steps here
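A token handler of this kind typically caches the token and renews it shortly before it expires. The sketch below shows only that caching logic; `fetch_token` is a hypothetical callable standing in for the actual CLMS token request, and the class is not the ClmsApiTokenHandler implementation.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, margin_seconds=60, clock=time.time):
        # `fetch_token` is a hypothetical callable returning
        # (token, lifetime_in_seconds); it stands in for the real API call.
        self._fetch_token = fetch_token
        self._margin = margin_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, fetching a new one if needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch_token()
            self._expires_at = now + lifetime
        return self._token
```

The safety margin means a token that is about to expire is replaced before a long-running download starts using it.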

FileProcessor

  • Handles the processing of downloaded data, extracting, stacking and storing geo files from downloaded zip files.

ClmsPreloadHandle

  • The main class responsible for orchestrating the preloading of datasets.
  • It coordinates with the DownloadTaskManager, ClmsApiTokenHandler and FileProcessor classes to handle the complete process: caching, downloading the data, making sure the token is valid, and processing the downloaded data.

CLMS API

  • Requires an EU account to register on the CLMS site.
  • Once registered, the user should create an access token json file as described here

CLMS API issues

The API currently has the following problems:

  • The datasets made available via requests contain a download link to a zip file, which is stated to be valid for 3 days. We found that this is not reliable: the link can expire earlier, so we cannot rely on this period to make sure that the download link still works. As a workaround, we manage our own expiry times. This issue has been raised with the CLMS service desk. Quoting their reply: "For the first issue mentioned by you: The status is completed and there is indicated that there are 2 days for expiring, but the download link is already expired, we are going to investigate this bug."
  • We use the API to find out whether a certain data_id has already been requested from the CLMS server and what its status is, so that we can either get the download link directly or, if it has not been requested yet or has expired, request it. This is not reliably possible: although expired downloads are no longer shown in the web UI, the API still returns expired requests as completed, with no information about whether they have expired or when they will expire. Quoting the CLMS helpdesk reply: "For the second issue mentioned by you: the @datarequest_search endpoint does not seem to be working as expected, we are going to consult the API experts so to check its functioning and in case an improvement is feasible in our side, we'll let you know." And their follow-up a week later: "After having analysed the possibility to improve the status of the downloads, our team answers the following: Currently, our download system is not able to extract information on whether the link has expired or not, therefore our API does not provide this information." Because of this, we had to create workarounds to find out whether a certain dataset's link has expired.
  • The cancel endpoint of the API does not work. This issue was raised with the helpdesk team as well. Quoting their reply: "Recently a new firewall of the CLMS Portal machine has been setup. This new firewall is blocking some of the process cancelation request. We've detected the issue and working with the IT team to solve it."
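One possible way to manage expiry times locally, as mentioned for the first two issues, is to compare the completion time of a request with a conservative lifetime chosen by the client. The sketch below only illustrates that idea: the 24-hour window and the helper name are assumptions, not values taken from xcube-clms or the CLMS API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a conservative window, well below the advertised 3 days.
ASSUMED_LINK_LIFETIME = timedelta(hours=24)


def is_link_usable(completed_at, now=None, lifetime=ASSUMED_LINK_LIFETIME):
    """Return True if a link from a request completed at `completed_at` is still trusted."""
    now = now or datetime.now(timezone.utc)
    return now - completed_at < lifetime
```

A request whose link is no longer trusted would simply be submitted again instead of relying on the status reported by the API.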
