Examples of where mc du values are greater than - or less than - values on filesystem as recorded in Ruth's CSV file.
Reading CSV file using:
import csv

sizes = {}
with open("cmip6-datasets_2020-10-27.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # note: the " size_mb" key has a leading space
        sizes[row["dataset_id"]] = float(row[" size_mb"])
Example where mc du gives a smaller size:
mc du s3/CMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr
1.2GiB CMIP6.DAMIP.CAS.FGOALS-g3/hist-GHG.r1i1p1f1.Amon.hus.gn.v20200411.zarr
mc du s3/CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr
2.3GiB CMIP6.CMIP.NIMS-KMA.UKESM1-0-LL/historical.r13i1p1f2.Amon.va.gn.v20200205.zarr
~/mc-du-s3.out on sci contains the sizes of all the datasets as reported by mc du (look for lines ending in .zarr). We may want to compare each of these with the CSV file.
Lines can be parsed with e.g.
import re

# binary unit multipliers as printed by mc du
units = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
# number, unit, whitespace, then a path ending in ".zarr"
pattern = r'([0-9.]+)([A-Za-z]+)\s+(.*)\.zarr$'
for line in ........:
    m = re.match(pattern, line)
    if m:
        size = float(m.group(1)) * units[m.group(2)]  # size in bytes
        dataset_id = m.group(3)
        # now compare with sizes[dataset_id] as shown above...
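A rough sketch of how the comparison itself might look, in case it is useful. The function names (`parse_du_line`, `compare`) are made up for illustration. It assumes the CSV dataset_id uses "." where mc du prints "/" between bucket and object, and that size_mb means MiB (pass `bytes_per_mb=10**6` if it is decimal MB); neither has been checked against the real files. Since mc du prints only one decimal place, small differences are treated as "similar" using a hypothetical 5% tolerance.

```python
import re

# Binary unit multipliers as printed by mc du.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
PATTERN = re.compile(r"([0-9.]+)([A-Za-z]+)\s+(.*)\.zarr$")


def parse_du_line(line):
    """Return (dataset_id, size_in_bytes) for a line ending in .zarr, else None."""
    m = PATTERN.match(line.strip())
    if m is None or m.group(2) not in UNITS:
        return None
    size_bytes = float(m.group(1)) * UNITS[m.group(2)]
    # ASSUMPTION: the CSV dataset_id joins bucket and object with "."
    # where mc du shows "/". Adjust if the CSV uses a different form.
    dataset_id = m.group(3).replace("/", ".")
    return dataset_id, size_bytes


def compare(du_lines, csv_sizes_mb, bytes_per_mb=2**20, rel_tol=0.05):
    """Yield (dataset_id, du_mb, csv_mb, verdict) for datasets found in both."""
    for line in du_lines:
        parsed = parse_du_line(line)
        if parsed is None:
            continue
        dataset_id, size_bytes = parsed
        if dataset_id not in csv_sizes_mb:
            continue
        du_mb = size_bytes / bytes_per_mb
        csv_mb = csv_sizes_mb[dataset_id]
        # ASSUMPTION: 5% tolerance, to absorb mc du's one-decimal rounding.
        if abs(du_mb - csv_mb) <= rel_tol * csv_mb:
            verdict = "similar"
        elif du_mb < csv_mb:
            verdict = "mc du smaller"
        else:
            verdict = "mc du larger"
        yield dataset_id, du_mb, csv_mb, verdict
```

This could then be run as `compare(f, sizes)` with `f` being the opened mc-du-s3.out file and `sizes` the dict built from the CSV above. Datasets that appear in only one of the two sources are silently skipped.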
But this is about understanding mc du. It might be that the metrics described in the docs linked below do what we want, in which case maybe we don't care about the size values from mc du / mc ls.
Docs at https://caringo.atlassian.net/wiki/spaces/public/pages/2443817185/Content+Metering
For an example, see Alan H's message on the ops channel on Slack (15:10, 24 June).