
Things to work on while we are waiting for Pleiades credits #10

Open
6 of 12 tasks
rabernat opened this issue Dec 2, 2022 · 1 comment

Comments

@rabernat
Collaborator

rabernat commented Dec 2, 2022

None of this requires submitting any jobs to the Pleiades queue.

  • Review the OSN user guide (unfortunately not very informative). Note that we do not have an account to log in to the portal, only bucket credentials.
  • Try to connect to the object storage service at https://mghp.osn.xsede.org/ (Our bucket is cnh-bucket-1)
    • Receive credentials from Ryan via keybase (S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY)
    • Try adapting Ryan's short user guide to read and write data from the bucket
    • Try to use rclone to copy data to / from the bucket
    • Try to use the aws s3 CLI to copy data to / from the bucket
  • Benchmark read performance from OSN with different file types
    • Upload a netCDF file from Pleiades (for a small amount of data, you can do the transfer from the head node--no job submission required)
    • Read back the file from a local Python process using xarray and h5py (no kerchunk); benchmark the speed of common queries (report results in MB/s throughput)
    • Upload the kerchunk indexes and use kerchunk + fsspec reference file system for the same queries; compare results
    • Upload an equivalent Zarr and perform the same queries; compare results
  • Try to understand Chris Hill's upload scripts in https://github.com/ocean-transport/pleiades_llc_recipes/tree/master/osn_transfer - These are pretty confusing, but ultimately they allow you to transfer files from Pleiades to OSN within a batch job. Let's refactor them to be clearer and easier to understand. As soon as the credits come online, we can start shipping data.
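For the benchmarking items above, a small timing helper keeps the MB/s reporting consistent across the netCDF, kerchunk, and Zarr runs. This is only a sketch: `timed_read` and `throughput_mb_s` are hypothetical names, and the file object could come from fsspec, h5py, or a plain `open()`.

```python
import time


def throughput_mb_s(nbytes: int, seconds: float) -> float:
    """Convert a timed transfer into the MB/s figure the checklist asks for."""
    return nbytes / seconds / 1e6


def timed_read(f) -> tuple[int, float]:
    """Read an already-open binary file-like object, returning (bytes, seconds)."""
    t0 = time.perf_counter()
    data = f.read()
    return len(data), time.perf_counter() - t0
```

Each benchmark run would then be `nbytes, secs = timed_read(fs.open(path))` followed by `throughput_mb_s(nbytes, secs)`, with the same reporting for all three file layouts.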

cc @rsaim

@rsaim
Contributor

rsaim commented Dec 16, 2022

Here is how we can access the S3 buckets on OSN in Python.

> sraza2 @ pfe21 ~/pleiades_llc_recipes (master) 20:31:32
$ ipython3
In [1]: import boto3

In [2]: session = boto3.session.Session()

In [3]: client = session.client(service_name='s3', endpoint_url='https://mghp.osn.xsede.org')

In [4]: from pprint import pprint as pp

In [5]: pp(client.list_buckets())
{'Buckets': [{'CreationDate': datetime.datetime(2021, 2, 21, 17, 29, 43, 614000, tzinfo=tzutc()),
              'Name': 'cnh-bucket-2'},
             {'CreationDate': datetime.datetime(2021, 1, 14, 1, 22, 48, 485000, tzinfo=tzutc()),
              'Name': 'llc4320_tests'}],
 'Owner': {'DisplayName': 'cnh-bucket-1 Datamanager',
           'ID': 'cnh-bucket-1_datamanager'},
 'ResponseMetadata': {'HTTPHeaders': {'content-type': 'application/xml',
                                      'date': 'Fri, 16 Dec 2022 04:32:31 GMT',
                                      'transfer-encoding': 'chunked',
                                      'x-amz-request-id': 'tx00000a5f4fbcc55ace462-00639bf4df-2991e0-default'},
                      'HTTPStatusCode': 200,
                      'HostId': '',
                      'RequestId': 'tx00000a5f4fbcc55ace462-00639bf4df-2991e0-default',
                      'RetryAttempts': 0}}

Notes:

  • The above fails under Singularity on interactive nodes like r483i2n7, since we don't have network access from such nodes.
  • I can see the bucket cnh-bucket-2 but not cnh-bucket-1. Do we need to create it? @rabernat

Try to understand Chris Hill's upload scripts in https://github.com/ocean-transport/pleiades_llc_recipes/tree/master/osn_transfer

These scripts create rclone commands and launch them on pfe via ssh.
Example command created by transfer_to_osn.sh:

$ bash osn_transfer/transfer_to_osn.sh testfile.nc
Transferring "testfile.nc" to "s3://mghp.osn.xsede.org//testfile.nc".
#!/bin/bash
hostname
export RCLONE_S3_ACCESS_KEY_ID=SET_ACCESS_KEY 
export RCLONE_S3_SECRET_ACCESS_KEY=SET_SECRET_KEY 
~cnhill1/bin/rclone   --s3-no-check-bucket --multi-thread-streams 24 --s3-upload-concurrency 12 --s3-chunk-size 100M --transfers 8 --log-file=mylogfile.txt --log-level INFO   --s3-endpoint https://mghp.osn.xsede.org   copyto testfile.nc :s3:///testfile.nc.QUG3xQ3

I guess we can do equivalent things in Python.
