-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
berkeley
: Function ensure_schema_collections_and_alldocs
is incompatible with Berkeley schema
#658
Comments
hoping Donny or Eric can look at this in @PeopleMakeCulture absence. cc @shreddd |
Here's an excerpt from the logs of the GitHub Actions workflow run that @aclum linked to above (thanks!): Show/hide excerptwait-for-it.sh: fastapi:8000 is available after 0 seconds
/usr/local/lib/python3.10/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0
rootdir: /code
plugins: anyio-4.4.0, asyncio-0.24.0, requests-mock-1.12.1, cov-5.0.0, mock-3.14.0
asyncio: mode=strict, default_loop_scope=None
collected 109 items
demo/metadata_migration/notebooks/test_helpers.py . [ 0%]
tests/e2e/test_minter_api.py xxxx [ 4%]
tests/integration/test_minter_repository.py ....X... [ 11%]
tests/test_api/test_db_mongo.py ..... [ 16%]
tests/test_api/test_endpoints.py ssE
==================================== ERRORS ====================================
_______________ ERROR at setup of test_metadata_validate_json_0 ________________
@pytest.fixture
def api_site_client():
mdb = get_mongo_db()
> rs = ensure_test_resources(mdb)
tests/test_api/test_endpoints.py:204:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_api/test_endpoints.py:121: in ensure_test_resources
ensure_schema_collections_and_alldocs()
tests/test_api/test_endpoints.py:67: in ensure_schema_collections_and_alldocs
materialize_alldocs(
/usr/local/lib/python3.10/site-packages/dagster/_core/definitions/op_definition.py:457: in __call__
return direct_invocation_result(self, *args, **kwargs)
/usr/local/lib/python3.10/site-packages/dagster/_core/definitions/op_invocation.py:217: in direct_invocation_result
result = invoke_compute_fn(
/usr/local/lib/python3.10/site-packages/dagster/_core/execution/plan/compute_generator.py:129: in invoke_compute_fn
return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
nmdc_runtime/site/ops.py:1025: in materialize_alldocs
raise e
nmdc_runtime/site/ops.py:1018: in materialize_alldocs
nmdcdb = NMDCDatabase(
<string>:21: in __init__
???
/usr/local/lib/python3.10/site-packages/nmdc_schema/nmdc.py:440: in __post_init__
self._normalize_inlined_as_list(slot_name="biosample_set", slot_type=Biosample, key_name="id", keyed=True)
/usr/local/lib/python3.10/site-packages/linkml_runtime/utils/yamlutils.py:105: in _normalize_inlined_as_list
self._normalize_inlined(slot_name, slot_type, key_name, keyed, True)
/usr/local/lib/python3.10/site-packages/linkml_runtime/utils/yamlutils.py:190: in _normalize_inlined
cooked_obj = slot_type(**as_dict(list_entry))
<string>:589: in __init__
???
/usr/local/lib/python3.10/site-packages/nmdc_schema/nmdc.py:2120: in __post_init__
self.MissingRequiredField("associated_studies")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Biosample(id='nmdc:bsm-13-amrnys72', type='nmdc:Biosample', name='Sand microcosm microbial communities from a hyporhei...ample_type=None, technical_reps=None, analysis_type=[], sample_link=[], bulk_elect_conductivity=None, infiltrations=[])
field_name = 'associated_studies'
def MissingRequiredField(self, field_name: str) -> None:
""" Generic loader error handler """
> raise ValueError(f"{field_name} must be supplied")
E ValueError: associated_studies must be supplied
/usr/local/lib/python3.10/site-packages/linkml_runtime/utils/yamlutils.py:281: ValueError
---------------------------- Captured stdout setup -----------------------------
local mongodump already exists at /code/tests/nmdcdb For reference, that PR's base branch is the |
The error occurred during the "setup" phase of the following test:
|
def test_metadata_validate_json_0(api_site_client): | |
rv = api_site_client.request( | |
"POST", | |
"/metadata/json:validate", | |
{ | |
"field_research_site_set": [ | |
{"id": "nmdc:frsite-11-s2dqk408", "name": "BESC-470-CL2_38_23"}, | |
{"id": "nmdc:frsite-11-s2dqk408", "name": "BESC-470-CL2_38_23"}, | |
{"id": "nmdc:frsite-11-s2dqk408", "name": "BESC-470-CL2_38_23"}, | |
] | |
}, | |
) | |
assert rv.json()["result"] == "errors" |
Specifically, it occurred while the following fixture was being built:
@pytest.fixture
def api_site_client():
mdb = get_mongo_db()
rs = ensure_test_resources(mdb) # <-- the error occurred here
return RuntimeApiSiteClient(base_url=os.getenv("API_HOST"), **rs["site_client"])
ensure_test_resources
is defined here. Among other things, it calls a function named ensure_schema_collections_and_alldocs
, which is defined here.
Here's an excerpt from the dump_dir = ensure_local_mongodump_exists()
mongorestore_from_dir(mdb, dump_dir, skip_collections=["functional_annotation_agg"]) It calls a function named That function (and the constant it accesses) is defined here: nmdc-runtime/tests/test_api/test_endpoints.py Lines 31 to 51 in 2c300cc
Looks like that function downloads a pre-made database dump from the NERSC filesystem. |
The value of |
In other words, I think this couples the test suite to the specific schema that happened to be in effect when that specific dump was generated. I think this will be insufficient both for (a) the legacy-to-Berkeley roll out we are working on now; and (b) changes to certain aspects of the schema in the future. |
berkeley
ensure_schema_collections_and_alldocs test failingberkeley
: Test ensure_schema_collections_and_alldocs
is failing (mismatch between DB dump and schema)
berkeley
: Test ensure_schema_collections_and_alldocs
is failing (mismatch between DB dump and schema)berkeley
: Function ensure_schema_collections_and_alldocs
causing test failures (mismatch between DB and schema)
During today's infrastructure meeting, we discussed the possibility of disabling this problematic code (which may involve disabling some tests) in order to unblock some current work; and developers can work on implementing a long term fix in parallel. I'll look into this with that path forward in mind today. |
berkeley
: Function ensure_schema_collections_and_alldocs
causing test failures (mismatch between DB and schema)berkeley
: Function ensure_schema_collections_and_alldocs
is incompatible with Berkeley schema (causing test failures)
I created a PR in which I disabled the problematic code. The PR is here: #664 |
berkeley
: Function ensure_schema_collections_and_alldocs
is incompatible with Berkeley schema (causing test failures)berkeley
: Function ensure_schema_collections_and_alldocs
is incompatible with Berkeley schema
This came up in a PR where Patrick was working on updating ETL code.
there are errors about associated_studies being missing from biosample_set, upon further inspection it is trying to compare prod mongo, I inferred this based on it pulling in collection names which don't exist in berkeley like
omics_processing_set
, with the berkeley nmdc-schema release candidate releaseWe need an alldocs in mongo berkeley and for this test to point to that.
cc @dwinston @eecavanna @pkalita-lbl
The text was updated successfully, but these errors were encountered: