[Bug] _convert_payloads fails with One or more mappers failed to initialize #646

Closed
rakesh-163 opened this issue Sep 18, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@rakesh-163

What are you really trying to do?

I have an activity that calls a function that performs database operations. The first operation in that activity is a read. The SQLModel that I am trying to read is called a "Story".

Describe the bug

The activity runs fine until it starts to fail and causes failures in the workflow that calls it. The error says the workflow is accessing os.environ.get (see the stack trace below), but the function the activity calls does not use it. So either the failure should not occur, or the error is reporting the wrong reason for it. In either case, it is a bug.

I have also scoured the SQLAlchemy library for any use of os.environ.get and did not find one.

I have also passed through all external library imports at this point...

Any help would be appreciated!

Minimal Reproduction

It is hard to reproduce because most of the time the activity just succeeds. This particular activity usually starts to fail after an hour or so.

Environment/Versions

  • OS and processor: Linux
  • Temporal Version: Python SDK: temporalio==1.6.0
  • Are you using Docker or Kubernetes or building Temporal from source? I am using Docker for the workers.

Additional context

Here's the full stack trace that I see when the failure occurs:

{"message":"Failed decoding arguments","stackTrace":" File "/usr/local/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 326, in activate\n self._apply(job)\n\n File "/usr/local/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 422, in _apply\n self._apply_resolve_activity(job.resolve_activity)\n\n File "/usr/local/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 654, in _apply_resolve_activity\n ret_vals = self._convert_payloads(\n ^^^^^^^^^^^^^^^^^^^^^^^\n\n File "/usr/local/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 1563, in _convert_payloads\n raise RuntimeError("Failed decoding arguments") from err\n","cause":{"message":"One or more mappers failed to initialize - can't proceed with initialization of other mappers. Triggering mapper: 'Mapper[Story(story)]'. Original exception was: Cannot access os.environ.get from inside a workflow. If this is code from a module not used in a workflow or known to only be used deterministically from a workflow, mark the import as pass through.","stackTrace":" File "/usr/local/lib/python3.12/site-packages/temporalio/worker/_workflow_instance.py", line 1555, in _convert_payloads\n return self._payload_converter.from_payloads(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File "/usr/local/lib/python3.12/site-packages/temporalio/converter.py", line 307, in from_payloads\n values.append(converter.from_payload(payload, type_hint))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File "/usr/local/lib/python3.12/site-packages/temporalio/converter.py", line 583, in from_payload\n obj = value_to_type(type_hint, obj, self._custom_type_converters)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File "/usr/local/lib/python3.12/site-packages/temporalio/converter.py", line 1533, in value_to_type\n return getattr(hint, "parse_obj")(value)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File 
"/usr/local/lib/python3.12/site-packages/typing_extensions.py", line 2853, in wrapper\n return arg(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^\n\n File "/usr/local/lib/python3.12/site-packages/sqlmodel/main.py", line 951, in parse_obj\n return cls.model_validate(obj, update=update)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File "/usr/local/lib/python3.12/site-packages/sqlmodel/main.py", line 848, in model_validate\n return sqlmodel_validate(\n ^^^^^^^^^^^^^^^^^^\n\n File "/usr/local/lib/python3.12/site-packages/sqlmodel/_compat.py", line 311, in sqlmodel_validate\n new_obj = cls()\n ^^^^^\n\n File "", line 4, in init\n\n File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/state.py", line 566, in _initialize_instance\n manager.dispatch.init(self, args, kwargs)\n\n File "/usr/local/lib/python3.12/site-packages/sqlalchemy/event/attr.py", line 497, in call\n fn(*args, **kw)\n\n File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/mapper.py", line 4396, in _event_on_init\n instrumenting_mapper._check_configure()\n\n File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/mapper.py", line 2388, in _check_configure\n _configure_registries({self.registry}, cascade=True)\n\n File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/mapper.py", line 4204, in _configure_registries\n _do_configure_registries(registries, cascade)\n\n File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/mapper.py", line 4241, in _do_configure_registries\n raise e\n","applicationFailureInfo":{"type":"InvalidRequestError"}},"applicationFailureInfo":{"type":"RuntimeError"}}


Also, I am using this sandboxed runner on the workers to help deal with the datetime issue that the Pydantic models pose. I am not sure if this interacts with the converter bits in the stack trace above.

import dataclasses

from temporalio.worker.workflow_sandbox import (
    SandboxedWorkflowRunner,
    SandboxRestrictions,
)


def new_sandbox_runner() -> SandboxedWorkflowRunner:
    # TODO(cretz): Use with_child_unrestricted when #254
    # is fixed and released
    # Copy the default restricted-member map and drop the "datetime" entry
    # so that datetime is no longer restricted inside the sandbox.
    invalid_module_member_children = dict(
        SandboxRestrictions.invalid_module_members_default.children
    )
    del invalid_module_member_children["datetime"]
    return SandboxedWorkflowRunner(
        restrictions=dataclasses.replace(
            SandboxRestrictions.default,
            invalid_module_members=dataclasses.replace(
                SandboxRestrictions.invalid_module_members_default,
                children=invalid_module_member_children,
            ),
        )
    )

rakesh-163 added the bug label Sep 18, 2024
@cretz
Member

cretz commented Sep 23, 2024

Something in sqlalchemy is calling os.environ but they choose to swallow that stack trace and wrap with "One or more mappers failed to initialize" so you can't see where. I would recommend not using sqlalchemy ORM objects inside a workflow, but instead have simpler dataclass objects you translate to sqlalchemy equivalents in activities as needed.
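
A rough sketch of that split, using only the standard library. `StoryData`, `StoryRow`, `to_row` and `from_row` are hypothetical names, and `StoryRow` is only a stand-in for whatever ORM class (for example a SQLModel with table=True) the activity really uses. The idea is that the workflow only ever sees the plain dataclass, and the translation happens inside activities.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StoryData:
    """Plain, serializable shape passed between the workflow and activities."""

    id: int
    title: str


class StoryRow:
    """Stand-in for the real ORM class; hypothetical, not a SQLAlchemy type."""

    def __init__(self, id: int, title: str) -> None:
        self.id = id
        self.title = title


def to_row(data: StoryData) -> StoryRow:
    # Intended to run inside an activity, just before talking to the database.
    return StoryRow(**asdict(data))


def from_row(row: StoryRow) -> StoryData:
    # Intended to run inside an activity, just before returning to the workflow.
    return StoryData(id=row.id, title=row.title)
```

With this shape, decoding an activity result in the workflow only constructs a dataclass, so no mapper configuration is triggered inside the sandbox.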

@rakesh-163
Author

Hey Chad, Thanks for the reply. Appreciate that you looked inside the sqlalchemy codebase. Could you point me to the line of code that is doing the os.environ call? I could not find it when I grepped for it.

@cretz
Member

cretz commented Sep 25, 2024

> Could you point me to the line of code that is doing the os.environ call? I could not find it when I grepped for it.

It may be nested in something else and not directly called. To debug, first you'd need to patch sqlalchemy to not swallow the true stack trace of why a mapper fails to initialize. Probably around https://github.com/sqlalchemy/sqlalchemy/blob/rel_2_0_35/lib/sqlalchemy/orm/mapper.py#L4232-L4252. You need the true stack trace of that exception. Regardless, I would not recommend using sqlalchemy models in workflows because they are likely non-deterministic.
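
As a possible alternative to patching the library, here is a minimal sketch of a helper that prints the traceback of an exception plus every exception linked to it. `format_exception_chain` is a hypothetical helper, not part of SQLAlchemy or Temporal. The `extra_attrs` parameter exists because a wrapper may keep the original exception on a custom attribute instead of `__cause__`. Passing `"_configure_failed"` there is an assumption about SQLAlchemy internals and should be checked against the installed version.

```python
import traceback


def format_exception_chain(exc, extra_attrs=()):
    """Return one formatted traceback per exception reachable from exc.

    Follows __cause__, __context__, and any attribute named in extra_attrs
    that holds another exception instance.
    """
    seen = set()
    reports = []
    pending = [exc]
    while pending:
        current = pending.pop()
        if current is None or id(current) in seen:
            continue
        seen.add(id(current))
        # chain=False: each exception is formatted on its own, since the
        # chain is walked manually here.
        reports.append(
            "".join(
                traceback.format_exception(
                    type(current), current, current.__traceback__, chain=False
                )
            )
        )
        pending.extend([current.__cause__, current.__context__])
        for name in extra_attrs:
            linked = getattr(current, name, None)
            if isinstance(linked, BaseException):
                pending.append(linked)
    return reports
```

One way to use it would be to catch the InvalidRequestError where the model is first constructed and log each entry of `format_exception_chain(err, extra_attrs=("_configure_failed",))`.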

@nvachhar

I've hit lots of issues with SQLModel and Temporal. One thing I learned is that using any SQLModel where table=True as a type to pass between Workflows and Activities is fraught. These objects have tight relationships to sessions and it requires far too much care to get things right. Instead, having a base class where table=False is far safer. This class behaves like a Pydantic BaseModel and is a lot safer/easier to use.

@rakesh-163
Author

Yes! Indeed, I ended up doing exactly what you are suggesting. It is not ideal though.

Something in SQLAlchemy (the ORM) irritates the Python SDK.

I am not sure if there is any "real" non-determinism though. I am just trying to write to a database using an ORM, really. It is either that the ORM is doing something fancy (and I never had the time to dig into why), or it is because the Python SDK is being very aggressive about the use of os.environ.get -- when practically, the environment doesn't really change.

@nvachhar

I think assuming the environment doesn't change is dangerous. Workflows and activities can be running in different processes on different machines. If you scale out your worker, there could be multiple instances with different environments. Since workflows have to be deterministic, blocking the environment seems reasonable.
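
To illustrate the point, a small standard-library sketch: the environment is read once, outside the workflow, and the value travels as explicit input. All names here (`StoryWorkflowInput`, `build_input`, `workflow_logic`, the `APP_REGION` variable) are hypothetical, and `workflow_logic` is a plain function standing in for workflow code rather than a real Temporal workflow.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StoryWorkflowInput:
    story_id: int
    region: str


def build_input(
    story_id: int, environ: Mapping[str, str] = os.environ
) -> StoryWorkflowInput:
    # Runs in the client or an activity, where reading the environment is fine.
    return StoryWorkflowInput(
        story_id=story_id, region=environ.get("APP_REGION", "unknown")
    )


def workflow_logic(inp: StoryWorkflowInput) -> str:
    # Depends only on its input, so the same input gives the same result
    # regardless of which worker process or machine evaluates it.
    return f"story-{inp.story_id}@{inp.region}"
```

Two workers with different values of `APP_REGION` would still agree on the result for a given input, because the value is recorded in the input rather than looked up during workflow execution.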

IMHO, SQLModel is very appealing, but the differences between how table models and non-table models work are a bit jarring. I don't have the example handy, but even model validation is different in the two cases in a way I found shocking. I've never used SQLAlchemy without SQLModel, but there's quite a bit of subtlety in how to use ORM objects. Combine that with the strict rules of Temporal, and it makes for some complex work.

But let us all know if you ever do figure out what was accessing the environment. These errors are the most annoying to track down.
