try to understand metering on the object store #56

Open
alaniwi opened this issue Jun 24, 2021 · 2 comments

alaniwi commented Jun 24, 2021

Docs at https://caringo.atlassian.net/wiki/spaces/public/pages/2443817185/Content+Metering

For an example, see Alan H's message on the ops channel on Slack (15:10, 24 June).

@alaniwi alaniwi self-assigned this Jun 24, 2021

alaniwi commented Jun 24, 2021

Examples where mc du reports a size greater than, or less than, the size on the filesystem as recorded in Ruth's CSV file.

Reading CSV file using:

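import csv

# Map dataset_id -> size in MB, as recorded in the CSV file.
sizes = {}
with open("cmip6-datasets_2020-10-27.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # Note the leading space in the " size_mb" column name.
        sizes[row["dataset_id"]] = float(row[" size_mb"])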
  • Example where mc du gives a smaller size:
mc du s3/CMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr
1.2GiB	CMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr
>>> sizes["CMIP6.DAMIP.CAS.FGOALS-g3.hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411"]
1753.73
  • Example where mc du gives a larger size:
mc du s3/CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr
2.3GiB	CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr
>>> sizes["CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL.historical.r13i1p1f2.Amon.va.gn.v20200205"]
1962.33
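
Since mc du reports binary units (GiB) while the CSV column is in megabytes, it is worth ruling out a pure unit mismatch first. The following is a minimal sketch, assuming size_mb means either decimal MB (10^6 bytes) or binary MiB (2^20 bytes); the issue does not say which. The function name gib_to_mb_and_mib is hypothetical, and the converted values are approximate because mc du rounds to one decimal place.

```python
GIB = 2**30  # bytes in one GiB, the unit printed by mc du in the examples


def gib_to_mb_and_mib(gib):
    """Convert a size in GiB to (decimal MB, binary MiB)."""
    size_bytes = gib * GIB
    return size_bytes / 10**6, size_bytes / 2**20


# (short label, mc du size in GiB, CSV size_mb) taken from the two examples above
examples = [
    ("FGOALS-g3 hist-GHG hus", 1.2, 1753.73),
    ("UKESM1-0-LL historical va", 2.3, 1962.33),
]

for label, gib, csv_mb in examples:
    mb, mib = gib_to_mb_and_mib(gib)
    print(f"{label}: mc du = {mb:.1f} MB / {mib:.1f} MiB, CSV = {csv_mb}")
```

Under either interpretation the first dataset is still smaller in mc du and the second still larger, so the choice of unit convention alone does not account for the differences.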

alaniwi commented Jun 24, 2021

~/mc-du-s3.out on sci contains the sizes of all the datasets as reported by mc du (look for lines ending in .zarr). We may want to compare each of these with the CSV file.

Lines can be parsed with e.g.

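import re

# Multipliers for the binary size suffixes printed by mc du.
units = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
# Raw string, with the dot in ".zarr" escaped so that it matches literally.
pattern = r'([0-9.]+)([A-Za-z]+)\s+(.*)\.zarr$'

for line in ........:
    m = re.match(pattern, line)
    if m:
        size = float(m.group(1)) * units[m.group(2)]  # size in bytes
        # mc du prints bucket/object; the CSV keys join the two with a dot.
        dataset_id = m.group(3).replace("/", ".")
        # now compare with sizes[dataset_id] (in MB) as shown above...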
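
The parsing and comparison steps can be put together roughly as follows. This is only a sketch under stated assumptions: the helper names parse_mc_du_line and compare_sizes are hypothetical, the CSV sizes are assumed to be decimal MB (the bytes_per_mb parameter can be set to 2**20 if they turn out to be MiB), and the sample data are just the two examples quoted earlier in this issue rather than the real contents of the output file.

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
PATTERN = re.compile(r"([0-9.]+)([A-Za-z]+)\s+(.*)\.zarr$")


def parse_mc_du_line(line):
    """Return (dataset_id, size_in_bytes) for a .zarr line, else None."""
    m = PATTERN.match(line.strip())
    if not m or m.group(2) not in UNITS:
        return None
    size_bytes = float(m.group(1)) * UNITS[m.group(2)]
    # mc du prints bucket/object; the CSV keys join the two with a dot.
    return m.group(3).replace("/", "."), size_bytes


def compare_sizes(lines, sizes_mb, bytes_per_mb=10**6):
    """Yield (dataset_id, mc_du_mb, csv_mb, ratio) for datasets found in both."""
    for line in lines:
        parsed = parse_mc_du_line(line)
        if parsed is None:
            continue
        dataset_id, size_bytes = parsed
        csv_mb = sizes_mb.get(dataset_id)
        if not csv_mb:  # skip datasets missing from the CSV or with zero size
            continue
        mc_mb = size_bytes / bytes_per_mb
        yield dataset_id, mc_mb, csv_mb, mc_mb / csv_mb


# Sample input: the two mc du lines and CSV values quoted in this issue.
sample_lines = [
    "1.2GiB\tCMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr",
    "2.3GiB\tCMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr",
]
sample_sizes = {
    "CMIP6.DAMIP.CAS.FGOALS-g3.hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411": 1753.73,
    "CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL.historical.r13i1p1f2.Amon.va.gn.v20200205": 1962.33,
}

results = list(compare_sizes(sample_lines, sample_sizes))
for dataset_id, mc_mb, csv_mb, ratio in results:
    print(f"{ratio:.2f}  {mc_mb:10.2f}  {csv_mb:10.2f}  {dataset_id}")
```

Sorting the results by ratio would make it easier to see whether the over- and under-reporting cluster by model, variable, or dataset size.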

But this is only about understanding mc du. If the metrics described at the URL quoted above do what we want, then we may not need to care about the size values from mc du / mc ls.
