
TF_FATAL can cause a process to exit cleanly instead of terminating with an error code #3317

Closed
andrewkaufman opened this issue Sep 25, 2024 · 2 comments


andrewkaufman commented Sep 25, 2024

Description of Issue

Issuing a TF_FATAL should abort the process with a non-zero exit code and write a crash report to stderr.

However, on POSIX systems, the mechanism uses Arch_DebuggerIsAttachedPosix to decide whether to exit with status 0 and skip the crash report, presumably to avoid interfering with an attached debugger. That check depends on ptrace permission settings, and on Linux distros that default to "restricted ptrace" it can report false positives, i.e. claim a debugger is attached when none is.
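For comparison, Linux reports the PID of any process currently tracing a task in the `TracerPid` field of `/proc/<pid>/status`, and reading that file requires no ptrace permission. The sketch below parses that field as an illustration of an alternative detection strategy. It is an assumption-based example, not the code from the USD fix, and both function names are hypothetical.

```python
from typing import Optional


def parse_tracer_pid(status_text: str) -> Optional[int]:
    """Return the TracerPid value from /proc/<pid>/status content, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1].strip())
    return None


def debugger_attached_linux() -> bool:
    """Hypothetical helper: True if some process is ptrace-attached to us.

    Reading /proc/self/status needs no ptrace permission, so a restricted
    Yama ptrace_scope cannot turn this into a false positive.
    Returns False when /proc is unavailable (e.g. on non-Linux systems).
    """
    try:
        with open("/proc/self/status", encoding="utf-8", errors="replace") as f:
            tracer = parse_tracer_pid(f.read())
    except OSError:
        return False
    return bool(tracer)


if __name__ == "__main__":
    print("debugger attached:", debugger_attached_linux())
```

Note that a non-zero `TracerPid` only means some tracer is attached; it could be a tool such as strace rather than an interactive debugger.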

The result is that TF_FATAL terminates the process early with no indication to the user of what happened or why. Worse, exit code 0 signals a successful completion, so in a render farm or other cloud compute scenario the task will appear to have completed successfully.

Steps to Reproduce

  1. Create a Dockerfile:

     FROM ubuntu:22.04

     RUN apt update && apt install -y python3.10 libpython3.10 pip

     RUN pip install usd-core

  2. Build a container:

     docker build -t fatal-no-op /path/to/dockerfile

  3. Run python and emit a Tf.Fatal:

     > docker run -it fatal-no-op python3.10 -c 'import pxr.Tf; pxr.Tf.Fatal("abort")'; echo "Python Exit code: $?"
     Python Exit code: 0
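To catch this in automation rather than by eye, a small harness can run the reproduction one-liner and treat an exit status of 0 as the bug. This is a hypothetical check written for this report; `check_fatal_exits_nonzero` is not part of USD.

```python
import subprocess
import sys
from typing import Sequence

# The one-liner from the reproduction steps.
REPRO = "import pxr.Tf; pxr.Tf.Fatal('abort')"


def check_fatal_exits_nonzero(cmd: Sequence[str]) -> bool:
    """Run cmd and return True if it terminated with a failure status.

    subprocess reports death-by-signal as a negative returncode (-N),
    so any non-zero value counts as the expected failure.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    return result.returncode != 0


if __name__ == "__main__":
    ok = check_fatal_exits_nonzero([sys.executable, "-c", REPRO])
    print("TF_FATAL exits with a failure status:", ok)
```

Run it inside the container built in the steps, where `pxr` is importable; elsewhere the `ImportError` alone would also produce a non-zero status and mask the result.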

Steps to Fix

This is fixed by #3014. If you rebuild the container using the USD builds from that PR, the final step produces the expected result:

> docker run -it fatal-fixed python3.10 -c 'import pxr.Tf; pxr.Tf.Fatal("abort")'; echo "Python Exit code: $?"

---------------------------- python3.10 terminated -----------------------------
python3.10 crashed. FATAL ERROR: Python Fatal Error: abort
in __main__.<module> at line 1 of <string>
writing crash report to [ 0bc0d5b385fa:/var/tmp/st_python3.10.1 ] ... done.
--------------------------------------------------------------------------------
Python Exit code: 139
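The value 139 follows the shell (and `docker run`) convention of reporting a signal-terminated process as 128 plus the signal number, so 139 corresponds to signal 11, which is SIGSEGV on Linux. The sketch below decodes such statuses; `describe_exit_status` is a hypothetical helper, and signal numbers are platform-specific.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status ($? in sh, or the code from `docker run`).

    Values above 128 conventionally mean "killed by signal (status - 128)".
    """
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with code {status}"


print(describe_exit_status(0))    # success
print(describe_exit_status(139))  # killed by SIGSEGV
```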

System Information (OS, Hardware)

Ubuntu 22.04 (or any Linux distro with restricted ptrace permissions)
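On kernels built with the Yama security module, the ptrace restriction level is exposed in `/proc/sys/kernel/yama/ptrace_scope` (0 = classic permissions, 1 = restricted, 2 = admin-only attach, 3 = no attach); Ubuntu defaults to 1. The sketch below reads that value to check whether a machine falls into the affected category. Inside a container the file typically reflects the host kernel's setting. `read_ptrace_scope` and `YAMA_SCOPES` are hypothetical names.

```python
from pathlib import Path
from typing import Optional

# Meaning of each Yama ptrace_scope level.
YAMA_SCOPES = {
    0: "classic ptrace permissions",
    1: "restricted ptrace (descendants only)",
    2: "admin-only attach",
    3: "no attach",
}


def read_ptrace_scope(path: str = "/proc/sys/kernel/yama/ptrace_scope") -> Optional[int]:
    """Return the Yama ptrace_scope value, or None if Yama is not available."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    scope = read_ptrace_scope()
    if scope is None:
        print("Yama ptrace_scope not available")
    else:
        print(f"ptrace_scope={scope}: {YAMA_SCOPES.get(scope, 'unknown')}")
```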

@jesschimein
Collaborator

Filed as internal issue #USD-10191

@nvmkuruc
Collaborator

@jesschimein This is fixed by #3014, which was released as part of 25.02.
