SNOW-856569: Segmentation Fault in cache.py pickle dump #1627
Comments
Current workaround is to patch these two lines out to prevent the cache from reading and writing to disk; a runtime sketch of that patch follows.
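For anyone needing that workaround without forking the package, here is a minimal runtime sketch. It assumes the file cache class is `snowflake.connector.cache.SFDictFileCache` and that `_save`/`_load` are the methods doing the disk I/O; `_save` is at least confirmed by the `__del__` snippet quoted later in this thread, while `_load` is my assumption about the exact method name.

```python
# Hedged workaround sketch: neuter the connector's on-disk cache so pickle
# never touches the filesystem. Apply before any connection is created.
# _save/_load as the disk I/O entry points is an assumption (see lead-in).
from snowflake.connector.cache import SFDictFileCache

SFDictFileCache._save = lambda self, *args, **kwargs: False  # skip pickle dump to disk
SFDictFileCache._load = lambda self, *args, **kwargs: False  # skip pickle load from disk
```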
@Ben-A-Canva Do you have the code to reproduce the problem?
Also, do you see the same problem on other operating systems, if you happen to use any?
At the request of our support agent (case 00555584), adding our repro steps here. We're experiencing the same thing as @Ben-A-Canva, and I'd add that this only started happening on the morning of 6/29 Pacific time - we previously had 500+ successful runs with exactly the same dependencies and project configuration. I understand you can't provide support for dbt-core, but since the error started occurring on a particular day, my suspicion is that it originates from an infrastructure change on the Snowflake side rather than a code change in dbt or the connector library.

Python version

$ python3 -VV
Python 3.9.16 (main, May 23 2023, 14:24:31)
[GCC 8.3.0]

Operating system and processor architecture

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
$ arch
x86_64

I also cannot get it to repro on MacOS 13.4.1.

Installed packages

$ pip freeze | grep snowflake
dbt-snowflake==1.5.1
snowflake-connector-python==3.0.3  # also repros on 3.0.4 and commit hash 48aa932aeffd84da101928297dfb851f96342d98

What did you do?

Same as above, running our existing dbt project.

What did you expect to see?

No segfaults, which was the behavior prior to 2023-06-28.

Abridged stack trace (from a single thread)
I can't share our entire dbt project, but here's an example repo in which I've been able to reproduce the error.
Thanks @verhey.
I was able to avoid the segfault by replacing these lines with:

    with open(tmp_file, "wb") as w_file:
        w_file.write(pickle.dumps(self))

I tried this after noting that the faulting thread reliably showed the same native stack trace, which suggested a problem with the CPython implementation of pickle.dump(). Perhaps part of the issue is that the object being pickled is modified by another thread during serialization? I verified the fix using the repro project (https://github.com/verhey/snowflake-thread-repro) created by @verhey. I was able to trigger the segfault every few runs with that repro project, and found that with the change above there was no segfault in over 20 runs.
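For clarity, here is the shape of that change. The "before" half is my sketch of the original pattern in cache.py, not an exact quote of the source; both halves sit inside the cache's save method, where `self` and `tmp_file` are in scope.

```python
import pickle

# Before (sketch): pickle.dump() serializes directly into the file object,
# interleaving the C-level pickling loop with file I/O while other threads
# may still be mutating `self`.
with open(tmp_file, "wb") as w_file:
    pickle.dump(self, w_file)

# After (the change quoted above): serialize fully into an in-memory bytes
# object first, then write the result out in a single call.
with open(tmp_file, "wb") as w_file:
    w_file.write(pickle.dumps(self))
```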
This seems to fix our segfaults. I'll need to do a few more test runs, but so far it's been fine for the 2 runs I've tried (it would fail about 95% of the time without the patch).
I couldn't reproduce it in my env yet. Could you try whether removing this __del__ method helps?

    def __del__(self) -> None:
        try:
            self._save()
        except Exception:
            # At tear-down time builtins module might be already gone, ignore every error
            pass
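One way to test that suggestion without editing the installed package would be to neutralize the destructor at runtime; a minimal sketch, assuming the method lives on `SFDictFileCache` in the connector's cache.py as quoted above:

```python
# Sketch: replace the destructor with a no-op so no pickling happens during
# garbage collection or interpreter teardown. Assumes __del__ is defined on
# SFDictFileCache, per the snippet quoted above.
from snowflake.connector.cache import SFDictFileCache

SFDictFileCache.__del__ = lambda self: None
```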
@sfc-gh-yixie My change is in the PR linked above.
I should have resolved all the issues now!
@sfc-gh-mkeller Glad the repro repo was useful. I've tested with that repo against two commits - the fix from @peterallenwebb's PR on this branch, and (at the time of writing this) the tip of your current PR on this one. Didn't get any segfaults in 20 runs for either of them. I also did a quick test of 10 further runs with no segfaults either. Wouldn't call my testing alone conclusive, but on previous builds that would almost certainly have triggered at least one segfault, so I'd be optimistic about the fixes at least.
Appreciate all of your work on this! Just in case this is a helpful datapoint: exactly as @verhey said, on 6/28 our logging revealed a sudden spike of segfaults across many internal workloads on our data platform, where we historically had none. Our data workloads run on effectively static servers in static containers, so it does not feel like a sudden dependency update. We are using thread-based parallelism with Prefect; as far as I know we create a new connection object per thread (which, to be frank, I am not sure is OK to do - a sketch of that pattern follows).
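For context, a minimal sketch of that one-connection-per-thread pattern; the account, user, and password values are placeholders, not values from our setup.

```python
# Sketch of per-thread connection usage as described above; each worker
# opens, uses, and closes its own connection. Credentials are placeholders.
from concurrent.futures import ThreadPoolExecutor

import snowflake.connector


def run_one(i: int) -> None:
    con = snowflake.connector.connect(
        account="my_account",  # placeholder
        user="my_user",        # placeholder
        password="********",   # placeholder
    )
    try:
        con.cursor().execute("select 1").fetchone()
    finally:
        con.close()  # each thread tears down its own connection


with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(run_one, range(32)))
```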
I did re-run my testing with your latest code in #1635, @sfc-gh-mkeller, and found that the segfaults were fixed.
@A132770 This Python stack trace, particularly with the thread stopped at line 528 in cache.py, fits the pattern exactly.
In our case we found it temporarily sufficient to roll back to before the cache change was implemented (2.7.9). Looking at some of our various containers, it appears that some slightly newer versions that include the cache change (e.g. 2.7.12) also did not manifest the segfaults - so it's not quite clear to me at what version this issue started, or why it seemed to be 'activated' on 6/28. Hope this helps someone!
Python version

Python 3.8.12 (default, Nov 17 2021, 08:36:07) [Clang 7.1.0 (tags/RELEASE_710/final)]

Operating system and processor architecture

Linux x86_64

Installed packages
What did you do?

Running dbt on our Linux machines with more than 1 thread triggers a segmentation fault in snowflake-connector-python for versions >= 3.x.x. We're not seeing this issue locally on MacOS.

What did you expect to see?

No segmentation faults. The offending line is here, and the truncated stack trace: