consolidated three EOBS recipe in meta recipe #169
When I tried to load recipes/EOBS/meta.yaml, I got an error. You should be able to replicate this error yourself. First make sure you're in the root of your cloned repo, then run:
import yaml  # note: you may need to `pip install pyyaml` first
with open("recipes/EOBS/meta.yaml", "r") as f:
    yaml.safe_load(f)
Please correct this. |
Hey there @cisaacstern. Any idea how to fix this pre-commit issue? Also, I think this is ready for a test run. |
Thanks for checking in here, @norlandrhagen. The pre-commit issue is being tracked in #88. It's on my radar to address soon, but we can ignore here for now. Re: test run, yes, I'll do that now. 😄 |
/run eobs-tg-tn-tx-rr-hu-pp |
/run eobs-wind-speed |
/run eobs-surface-downwelling |
Awesome thanks! |
I'm confused about what happened to these test runs. If they failed to submit to Dataflow, the bot should have added a 😕 emoji reaction to the comments above. And if they failed on Dataflow, we should have gotten a new comment back about that. 🤔 This will require some digging in the logs to get to the bottom of. I'll try to do that this afternoon. Feel free to ping me here next week if I haven't gotten back with a response yet. |
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
you bet, @norlandrhagen! |
@andersy005, similar to my confusion in #169 (comment), I do not see jobs for these recipe runs on the Dataflow dashboard. I am wondering if, in light of pangeo-forge/pangeo-forge-orchestrator#147, the issue is that starting these tests in such close succession to each other is spiking memory on the backend, and causing their submissions to fail. I'm not done with pangeo-forge/pangeo-forge-orchestrator#143 yet, so actually "proving" this in the logs is difficult, but I'll try an experiment now, of just deploying one of these, and watching the logs in my terminal in real time... |
/run eobs-wind-speed |
recipes/EOBS/meta.yaml
Outdated
  object: 'recipe:eobs-tg-tn-tx-rr-hu-pp'
- id: eobs-wind-speed
  object: 'recipe:eobs_wind_speed'
- id: eobs-surface-downwelling
  object: 'recipe:eobs_surface_downwelling'
@norlandrhagen & @andersy005, the problem is that these objects defined in the meta.yaml don't exist in the recipe.py.
Raphael, I tried to figure out which of the recipes
tg_tn_tx_rr_hu_pp_recipe = XarrayZarrRecipe(
file_pattern=tg_tn_tx_rr_hu_pp_pattern, target_chunks=target_chunks, subset_inputs=subset_inputs
)
qq_pattern_recipe = XarrayZarrRecipe(
file_pattern=qq_pattern, target_chunks=target_chunks, subset_inputs=subset_inputs
)
fg_pattern_recipe = XarrayZarrRecipe(
file_pattern=fg_pattern, target_chunks=target_chunks, subset_inputs=subset_inputs
)
correspond to each of these objects, but I couldn't figure it out based on the names alone.
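A check like this can be sketched in a few lines. This is only an illustration, assuming each `object` value has the form `module:export` (as the entries above suggest); the `missing_objects` helper and the stand-in `recipe_module` namespace are hypothetical and not part of pangeo-forge-runner.

```python
from types import SimpleNamespace


def missing_objects(recipes: list[dict], modules: dict[str, object]) -> list[str]:
    """Return the ids whose 'module:export' object cannot be found."""
    missing = []
    for entry in recipes:
        module_name, _, export = entry["object"].partition(":")
        module = modules.get(module_name)
        if module is None or not hasattr(module, export):
            missing.append(entry["id"])
    return missing


# Entries as they appear in the meta.yaml diff above.
recipes = [
    {"id": "eobs-wind-speed", "object": "recipe:eobs_wind_speed"},
    {"id": "eobs-surface-downwelling", "object": "recipe:eobs_surface_downwelling"},
]

# Stand-in for recipe.py, which defines differently named recipe objects.
recipe_module = SimpleNamespace(fg_pattern_recipe=object(), qq_pattern_recipe=object())

print(missing_objects(recipes, {"recipe": recipe_module}))
```

Running it against the real files would mean loading meta.yaml and importing recipe.py first; the stand-ins here just keep the sketch self-contained.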
> the problem is that these objects don't exist in the recipe.py
I presume you got this information from the Dataflow logs, right?
@andersy005, actually no. This is an important debugging trick, so I'll describe it here, and I'll also open an issue elsewhere to track documenting it.
So what I did was watch the server logs right after /run-ing the test. And I saw:
2022-09-29T02:02:56.875589+00:00 app[web.1]: 2022-09-29 02:02:56,875 DEBUG - orchestrator - Dumping bakery config to json: {'Bake': {'bakery_class': 'pangeo_forge_runner.bakery.dataflow.DataflowBakery', 'job_name': 'a6170692e70616e67656f2d666f7267652e6f7267251191'}, 'TargetStorage': {'fsspec_class': 's3fs.S3FileSystem', 'fsspec_args': {'client_kwargs': {'endpoint_url': 'https://ncsa.osn.xsede.org'}, 'default_cache_type': 'none', 'default_fill_cache': False, 'use_listings_cache': False, 'key': SecretStr('**********'), 'secret': SecretStr('**********')}, 'root_path': 'Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1191/eobs-wind-speed.zarr', 'public_url': 'https://ncsa.osn.xsede.org/{root_path}'}, 'InputCacheStorage': {'fsspec_class': 'gcsfs.GCSFileSystem', 'fsspec_args': {'bucket': 'pangeo-forge-prod-cache'}, 'root_path': 'pangeo-forge-prod-cache'}, 'MetadataCacheStorage': {'fsspec_class': 'gcsfs.GCSFileSystem', 'fsspec_args': {}, 'root_path': 'pangeo-forge-prod-cache/metadata/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1191/eobs-wind-speed.zarr'}, 'DataflowBakery': {'temp_gcs_location': 'gs://pangeo-forge-prod-dataflow/temp'}}
2022-09-29T02:02:56.877275+00:00 app[web.1]: 2022-09-29 02:02:56,877 DEBUG - orchestrator - Running command: ['pangeo-forge-runner', 'bake', '--repo=https://github.com/norlandrhagen/staged-recipes', '--ref=be1d60de2f3b9aed5fe480328b9d60e1ef0694ef', '--json', '--prune', '--Bake.recipe_id=eobs-wind-speed', '-f=/tmp/tmplol5klds.json', '--feedstock-subdir=recipes/EOBS']
2022-09-29T02:02:58.116682+00:00 heroku[web.1]: Process running mem=647M(120.0%)
2022-09-29T02:02:58.143937+00:00 heroku[web.1]: Error R14 (Memory quota exceeded)
2022-09-29T02:03:00.449388+00:00 app[web.1]: [2022-09-29 02:03:00 +0000] [60] [ERROR] Exception in ASGI application
2022-09-29T02:03:00.449397+00:00 app[web.1]: Traceback (most recent call last):
2022-09-29T02:03:00.449398+00:00 app[web.1]: File "/opt/app/pangeo_forge_orchestrator/routers/github_app.py", line 621, in run
2022-09-29T02:03:00.449398+00:00 app[web.1]: out = subprocess.check_output(cmd)
2022-09-29T02:03:00.449399+00:00 app[web.1]: File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
2022-09-29T02:03:00.449399+00:00 app[web.1]: return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2022-09-29T02:03:00.449400+00:00 app[web.1]: File "/usr/lib/python3.9/subprocess.py", line 528, in run
2022-09-29T02:03:00.449400+00:00 app[web.1]: raise CalledProcessError(retcode, process.args,
2022-09-29T02:03:00.449414+00:00 app[web.1]: subprocess.CalledProcessError: Command '['pangeo-forge-runner', 'bake', '--repo=https://github.com/norlandrhagen/staged-recipes', '--ref=be1d60de2f3b9aed5fe480328b9d60e1ef0694ef', '--json', '--prune', '--Bake.recipe_id=eobs-wind-speed', '-f=/tmp/tmplol5klds.json', '--feedstock-subdir=recipes/EOBS']' returned non-zero exit status 1.
But that didn't give me much detail on what actually caused the CalledProcessError. So I copied and pasted the config from the first log line
2022-09-29T02:02:56.875589+00:00 app[web.1]: 2022-09-29 02:02:56,875 DEBUG - orchestrator - Dumping bakery config to json: {'Bake': {'bakery_class': 'pangeo_forge_runner.bakery.dataflow.DataflowBakery', 'job_name': 'a6170692e70616e67656f2d666f7267652e6f7267251191'}, 'TargetStorage': {'fsspec_class': 's3fs.S3FileSystem', 'fsspec_args': {'client_kwargs': {'endpoint_url': 'https://ncsa.osn.xsede.org'}, 'default_cache_type': 'none', 'default_fill_cache': False, 'use_listings_cache': False, 'key': SecretStr('**********'), 'secret': SecretStr('**********')}, 'root_path': 'Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1191/eobs-wind-speed.zarr', 'public_url': 'https://ncsa.osn.xsede.org/{root_path}'}, 'InputCacheStorage': {'fsspec_class': 'gcsfs.GCSFileSystem', 'fsspec_args': {'bucket': 'pangeo-forge-prod-cache'}, 'root_path': 'pangeo-forge-prod-cache'}, 'MetadataCacheStorage': {'fsspec_class': 'gcsfs.GCSFileSystem', 'fsspec_args': {}, 'root_path': 'pangeo-forge-prod-cache/metadata/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1191/eobs-wind-speed.zarr'}, 'DataflowBakery': {'temp_gcs_location': 'gs://pangeo-forge-prod-dataflow/temp'}}
into a local JSON file on my laptop
{"Bake": {"bakery_class": "pangeo_forge_runner.bakery.dataflow.DataflowBakery", "job_name": "a6170692e70616e67656f2d666f7267652e6f7267251191"}, "TargetStorage": {"fsspec_class": "s3fs.S3FileSystem", "fsspec_args": {"client_kwargs": {"endpoint_url": "https://ncsa.osn.xsede.org"}, "default_cache_type": "none", "default_fill_cache": false, "use_listings_cache": false, "key": "**********", "secret": "**********"}, "root_path": "Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1191/eobs-wind-speed.zarr", "public_url": "https://ncsa.osn.xsede.org/{root_path}"}, "InputCacheStorage": {"fsspec_class": "gcsfs.GCSFileSystem", "fsspec_args": {"bucket": "pangeo-forge-prod-cache"}, "root_path": "pangeo-forge-prod-cache"}, "MetadataCacheStorage": {"fsspec_class": "gcsfs.GCSFileSystem", "fsspec_args": {}, "root_path": "pangeo-forge-prod-cache/metadata/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1191/eobs-wind-speed.zarr"}, "DataflowBakery": {"temp_gcs_location": "gs://pangeo-forge-prod-dataflow/temp"}}
and then I copied the pangeo-forge-runner command from the second log line
2022-09-29T02:02:56.877275+00:00 app[web.1]: 2022-09-29 02:02:56,877 DEBUG - orchestrator - Running command: ['pangeo-forge-runner', 'bake', '--repo=https://github.com/norlandrhagen/staged-recipes', '--ref=be1d60de2f3b9aed5fe480328b9d60e1ef0694ef', '--json', '--prune', '--Bake.recipe_id=eobs-wind-speed', '-f=/tmp/tmplol5klds.json', '--feedstock-subdir=recipes/EOBS']
and, replacing '-f=/tmp/tmplol5klds.json' with the path to my local JSON config (c.json), ran
$ pangeo-forge-runner bake --repo=https://github.com/norlandrhagen/staged-recipes --ref=be1d60de2f3b9aed5fe480328b9d60e1ef0694ef --json --prune --Bake.recipe_id=eobs-wind-speed -f=c.json --feedstock-subdir=recipes/EOBS
which gave me a descriptive error
{"message": "Error during running: 'eobs-tg-tn-tx-rr-hu-pp'", "exc_info": "Traceback (most recent call last):\n File \"/Users/charlesstern/miniconda3/envs/pfo-new/bin/pangeo-forge-runner\", line 8, in <module>\n sys.exit(main())\n File \"/Users/charlesstern/miniconda3/envs/pfo-new/lib/python3.9/site-packages/pangeo_forge_runner/cli.py\", line 28, in main\n app.start()\n File \"/Users/charlesstern/miniconda3/envs/pfo-new/lib/python3.9/site-packages/pangeo_forge_runner/cli.py\", line 23, in start\n super().start()\n File \"/Users/charlesstern/miniconda3/envs/pfo-new/lib/python3.9/site-packages/traitlets/config/application.py\", line 462, in start\n return self.subapp.start()\n File \"/Users/charlesstern/miniconda3/envs/pfo-new/lib/python3.9/site-packages/pangeo_forge_runner/commands/bake.py\", line 112, in start\n recipes = feedstock.parse_recipes()\n File \"/Users/charlesstern/miniconda3/envs/pfo-new/lib/python3.9/site-packages/pangeo_forge_runner/feedstock.py\", line 55, in parse_recipes\n recipes[r[\"id\"]] = self._import(r[\"object\"])\n File \"/Users/charlesstern/miniconda3/envs/pfo-new/lib/python3.9/site-packages/pangeo_forge_runner/feedstock.py\", line 43, in _import\n return self._import_cache[module][export]\nKeyError: 'eobs-tg-tn-tx-rr-hu-pp'", "status": "failed"}
For the record, my theory in #169 (comment) about memory overruns was incorrect. I believe the problem is #169 (comment) |
thank you for the heads up about potential memory spikes and tracking this issue, @cisaacstern! |
@cisaacstern Thanks for catching that issue. I updated the recipe, so that each id in the meta.yaml should now correspond to the correct recipe. |
/run eobs-tg-tn-tx-rr-hu-pp |
The test failed, but I'm sure we can find out why! Pangeo Forge maintainers are working diligently to provide public logs for contributors. |
@andersy005, rather than manually grab these error logs myself, I'm going to try to add a dataflow logs endpoint to the orchestrator right now, so that you can be more self-sufficient in diagnosing these issues going forward. I'll check back here once it's ready! |
that would be awesome, @cisaacstern. I haven't had a chance to do a deep dive into the dataflow functionality. Ping me if you need any feedback/review. |
/run eobs-tg-tn-tx-rr-hu-pp |
The test failed, but I'm sure we can find out why! Pangeo Forge maintainers are working diligently to provide public logs for contributors. |
recipes/EOBS/recipe.py
Outdated
target_chunks = {'time': 40}
dataset_version = 'v23.1e'
grid_res = '0.1'
subset_inputs = {'time': 700}


def make_filename(time: str, variable: str) -> str:
    return f'https://knmi-ecad-assets-prd.s3.amazonaws.com/ensembles/data/Grid_{grid_res}deg_reg_ensemble/{variable}_ens_mean_{grid_res}deg_reg_{dataset_version}.nc'  # noqa: E501
@norlandrhagen, it appears apache-beam doesn't work well with globally defined variables such as grid_res. Could you move these into the function body of make_filename?
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 155, in cache_input
fname = config.file_pattern[input_key]
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/patterns.py", line 219, in __getitem__
fname = self.format_function(**format_function_kwargs)
File "/tmp/tmpis_skwvg/recipes/EOBS/recipe.py", line 16, in make_filename
NameError: name 'grid_res' is not defined [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
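A minimal sketch of the requested change, reusing the values from the diff above (`grid_res = '0.1'`, `dataset_version = 'v23.1e'`). With the constants defined inside the function, it no longer looks up module-level names when a worker calls it. That this is why the Beam workers raised the NameError is an assumption; the thread does not confirm the exact cause.

```python
def make_filename(time: str, variable: str) -> str:
    # Constants live inside the function body, so calling it does not
    # depend on module-level globals being available on the worker.
    dataset_version = 'v23.1e'
    grid_res = '0.1'
    # `time` is accepted to match the original signature but is not
    # used in the URL.
    return (
        'https://knmi-ecad-assets-prd.s3.amazonaws.com/ensembles/data/'
        f'Grid_{grid_res}deg_reg_ensemble/'
        f'{variable}_ens_mean_{grid_res}deg_reg_{dataset_version}.nc'
    )


print(make_filename('unused', 'fg'))
```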
Yeah totally. Thanks for looking into this @andersy005
/run eobs-tg-tn-tx-rr-hu-pp |
/run eobs-wind-speed |
🎉 The test run of import xarray as xr
store = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1284/eobs-wind-speed.zarr"
ds = xr.open_dataset(store, engine='zarr', chunks={})
ds |
/run eobs-surface-downwelling |
Woohoo! Thanks for picking this back up @andersy005 |
🎉 The test run of import xarray as xr
store = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1285/eobs-surface-downwelling.zarr"
ds = xr.open_dataset(store, engine='zarr', chunks={})
ds |
🎉 The test run of import xarray as xr
store = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1283/eobs-tg-tn-tx-rr-hu-pp.zarr"
ds = xr.open_dataset(store, engine='zarr', chunks={})
ds |
You bet! Do you plan on pushing more commits to this or should I go ahead and merge it? |
Don't plan on pushing any more commits, so ready to merge! |
pangeo-forge/pangeo-forge-recipes#395