At times Segfault during deconstruction after upgrade from 1.76 to 1.80 #111
Comments
Can you provide a script that causes the segfault? |
Unfortunately not really. The script is huge (~70'000 lines) and does a lot. It only fails on certain data, but the difference is not easy to pin down because the script runs a lot of queries and only segfaults on the teardown of the driver. |
Can you enable tracing? https://metacpan.org/pod/DBI#trace |
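For anyone following along, here is a minimal sketch of how tracing could be switched on, assuming DBI is installed. The file name `dbi_trace.log` and the trace level are illustrative choices, not something requested in this thread.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical log file name; choose any writable path.
my $trace_file = 'dbi_trace.log';

# Class-level tracing sets the default trace settings for the whole process.
# Level 2 is a moderate level; higher levels are more verbose.
DBI->trace(2, $trace_file);

# Tracing can also be set per handle after connecting, e.g.
#   $dbh->trace(5, $trace_file);
# or without touching the script, through the environment:
#   DBI_TRACE=5=dbi_trace.log perl script.pl

# Write a marker line so this run is easy to find in the log.
DBI->trace_msg("tracing enabled for segfault investigation\n");
```

Comparing a trace from a crashing run with one from a clean run, as done later in the thread, is usually easier when both are written at the same level.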
I created 2 trace files, one from a run of the script that segfaults and one from a run where it doesn't. |
We're having the same issue with each test case that uses DBIx::Class, for example via Test::WWW::Mechanize::Catalyst. |
The trace files are quite big. Each contains round about 2 million lines. Can you repeat your test with Does anybody have a stack trace or a core dump? |
Yes, doing an explicit disconnect e.g. with |
Made new traces with tracelevel 5. |
I have this random issue too, but I'm not sure if it was the upgrade from oracle-instantclient12.2 to oracle-instantclient19.6 or the upgrade from perl-DBD-Oracle-1.74-12.2.0.1.0 to perl-DBD-Oracle-1.80-19.6.0.0.0. I'll try to downgrade perl-DBD-Oracle to see if we still get the random seg faults, but it is weird. |
@mrdvt92 it would almost certainly be 1.80 of DBD::Oracle |
It happens for me as well in 1.791. I'm able to recreate it in situations where there are multiple connections, at least one of them lives outside the main script, and no disconnect is called. Ex. connect.pl
connect.inc
Uncommenting $dbh->disconnect does fix the Seg Fault in this example. Setting local scope for $dbh2 also fixes it. Perl 5.30.0 (with threads) |
I am also observing this issue for a module with multiple Oracle connections. Using installs from BackPAN I was able to zero in on a change between versions 1.75_2 (no segmentation fault) and 1.77_1 (segmentation fault). [There were no versions available in between.] I also see the segmentation fault go away with an explicit disconnect for 1.77_1 and beyond. Perl 5.30.2 (no threads) [also observed for 5.30.1 with threads] |
We just ran into this issue by upgrading to the 19c client. Here is what I sent to the dbi-users list,
I am assuming this is the change causing the segfault with the 19c client: Destroy envhp with last dbh (GH#93, GH#89, Dean Hamstead, CarstenGrohmann) |
This appears to have something to do with global destruction. The following code segfaults: use DBI; { use DBI; { So there must be some object that's being destroyed in the wrong order when global destruction happens. (Tested on Perl 5.16.3, CentOS 7.8, DBD::Oracle 1.80, Oracle 18c) |
I added some debugging code. The one that does not segfault (with the my variables) prints this: In destructor: Calling dbd_db_disconnect The one that does segfault prints this: In destructor: Calling dbd_db_disconnect Notice how in the one that segfaults, dbd_dr_destroy is called before the second $dbh destructor is called. The global destructor is destroying objects in the wrong order. |
The attached patch fixes the problem for me. I would not say I'm particularly happy with this patch; I see it more as a workaround than a proper fix, but I'm attaching it for anyone who wants to try it out. |
Unfortunately the patch did not work for my case. I still got the same seg fault. It would be nice to have a proper fix for this because as it is now my $work is locked in at v1.76. |
What perl and oracle versions did people try with this patch? |
On 2020-08-07 17:47, Wesley Hinds wrote:
Unfortunately the patch did not work for my case
Are you sure you ran it against the patched version? I copied exactly
your case, and this is what I get:
First, I run it against the original DBD::Oracle version 1.80 and I get
the segmentation fault:
# make -C DBD-Oracle-1.80-ORIG install
# perl connect.pl
Segmentation fault
Next, I run it against the patched version and there's no segfault:
# make -C DBD-Oracle-1.80 install
# perl connect.pl
connect.pl and connect.inc are exactly as you posted. Here's connect.pl:
------------------------------------------------
use DBI;
use DBD::Oracle;
$dbh = DBI->connect("dbi:Oracle:XEPDB1", 'rtdb1', 'password');
require("connect.inc");
------------------------------------------------
and here is connect.inc:
------------------------------------------------
$dbh2 = DBI->connect("dbi:Oracle:XEPDB1", 'rtdb1', 'password');
------------------------------------------------
This is on CentOS Linux release 7.8.2003, and Perl 5.16.3.
Regards,
Dianne.
|
CentOS 7.8.2003 CentOS 8.2.2004 I tried with both my case and the @dfskoll case. I'm not using Oracle XE btw. I applied the patch correctly. I don't know what I'm doing wrong, it seems like it should work. Maybe someone else can give it a go. |
Can you flip this over to a pull request? That will have it run through Travis |
Hi,
I tried to create a pull request, but I lack permission.
$ git push --set-upstream origin work-around-segfault-on-handle-destruction
Username for 'https://github.com': dfskoll
Password for 'https://[email protected]':
remote: Permission to perl5-dbi/DBD-Oracle.git denied to dfskoll.
fatal: unable to access 'https://github.com/perl5-dbi/DBD-Oracle/': The
requested URL returned error: 403
Regards,
Dianne.
|
Were you pushing to your own fork? If not, try that, and then on your repo's page a button that will create a PR in the perl5-dbi/DBD-Oracle repo
should magically appear.
Chris
…
|
@mjegh are you around? It's looking like time for a release |
I'm not sure I can. I retired and don't have access to Oracle now, so I cannot even run the test suite. Also, the Linux machine I did the build on was at work. I might be able to get access for a while at the weekend. Can you point me at the Dist::Zilla instructions you gave me before? I can't find them. I'll try to work out what has been changed, as I've not been keeping up. |
This patch fixes perl5-dbi#111 During global destruction, the function dbd_dr_destroy is sometimes called before all handles are destroyed. It frees resources used in the handle DESTROY function, causing a segfault when the handle DESTROY function tries to disconnect the handle. This patch simply sets a flag in dbd_dr_destroy which makes per-handle DESTROY functions skip trying to disconnect the handle.
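The idea behind the workaround can be modelled in pure Perl. This is only a sketch of the technique described above: the real patch lives in the driver's C code, and every package, variable, and method name below is hypothetical.

```perl
use strict;
use warnings;

# Pure-Perl model of the workaround; the real patch is in the driver's C code.
# All package and variable names here are hypothetical.
package Model::Driver;

our $destroyed = 0;    # flag set once driver-level resources are gone
our @log;              # records what happened, for inspection

sub new { return bless {}, shift }

sub destroy {
    # Stands in for dbd_dr_destroy: frees shared resources, then sets the flag.
    $destroyed = 1;
    push @log, 'driver destroyed';
    return;
}

package Model::Handle;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

sub DESTROY {
    my ($self) = @_;
    if ($Model::Driver::destroyed) {
        # Shared resources are already freed, so a disconnect in the real
        # driver would touch freed memory. Skip it.
        push @Model::Driver::log, "$self->{name}: skipped disconnect";
        return;
    }
    push @Model::Driver::log, "$self->{name}: disconnected";
    return;
}

package main;

my $driver = Model::Driver->new;
my $first  = Model::Handle->new('dbh1');
my $second = Model::Handle->new('dbh2');

undef $first;        # normal order: the handle goes away before the driver
$driver->destroy;    # wrong order: driver torn down while dbh2 is still alive
undef $second;       # DESTROY sees the flag and skips the disconnect

print "$_\n" for @Model::Driver::log;
```

The trade-off in this model is that a handle destroyed after the driver is never cleanly disconnected, which matches the author's description of the patch as a workaround rather than a proper fix.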
On 2020-08-12 00:48, Christopher Jones wrote:
Were you pushing to your own fork? If not, try that, and then on
your repo's page a button that will create a PR in the
perl5-dbi/DBD-Oracle repo should magically appear.
OK, thank you! I'm relatively new to the Github workflow. That worked,
and I've created the PR.
Regards,
Dianne.
|
I've tried to rewrite the login6 function to support more concise caching of OCIEnv*. The attached patch fixes 2 problems:
The rewrite is relatively large. I had to add refcounting to cached environments and removed the use of global variables for storing information about charsets. Additionally, there are a few fixes that silence warnings (which really were small errors). I've added 2 tests: one reproduces the segfault problem, the other the problem with different charsets. Well, everything is relative. The tests work in CYGWIN. I also didn't have a chance to test DRCP and shared connections, though I suspect that support for shared connections is broken; at least it looks very suspicious. It would be good if someone tried to run it in other environments, since even copying of code could introduce unexpected side effects. |
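As a rough illustration of what refcounting a cached environment means, here is a pure-Perl model. It is a sketch only: the actual rewrite is in C and manages OCIEnv* handles, and the package name, the functions, and the choice of charset as the cache key are all assumptions made for this example.

```perl
use strict;
use warnings;

# Pure-Perl model of a reference-counted cache. The real change is in C and
# manages OCIEnv* handles; names and the cache key here are hypothetical.
package Model::EnvCache;

my %cache;     # key => { env => ..., refs => N }
our @freed;    # keys of environments that were released, for inspection

sub acquire {
    my ($key) = @_;
    # Create the environment on first use, otherwise share the cached one.
    my $entry = $cache{$key} //= { env => "env($key)", refs => 0 };
    $entry->{refs}++;
    return $entry->{env};
}

sub release {
    my ($key) = @_;
    my $entry = $cache{$key} or return;
    # Free the environment only when the last user lets go of it.
    if (--$entry->{refs} == 0) {
        delete $cache{$key};
        push @freed, $key;
    }
    return;
}

sub refs {
    my ($key) = @_;
    return $cache{$key} ? $cache{$key}{refs} : 0;
}

package main;

# Two connections share one charset; a third uses another.
Model::EnvCache::acquire('AL32UTF8') for 1 .. 2;
Model::EnvCache::acquire('WE8ISO8859P1');

Model::EnvCache::release('AL32UTF8');    # one user left, environment stays
print "refs: ", Model::EnvCache::refs('AL32UTF8'), "\n";

Model::EnvCache::release('AL32UTF8');    # last user gone, environment freed
print "freed: @Model::EnvCache::freed\n";
```

In this model an environment can never be freed while a connection still uses it, regardless of the order in which connections are torn down.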
@avorop wow awesome effort! |
@avorop are you able to submit this as a pull request? |
I've pulled the patch into a commit, with one very minor tweak to get it to apply. Here is the commit on a branch for people to test |
it looks like there are 64bit windows accommodations in that diff too |
I can't comment on the charset issue or DRCP. But the segfault issue was ever present in my $work environment. So far I'm happy to say I'm unable to reproduce the segfault after quite a bit of testing. CentOS 7.9.2009 3.10.0-1160.53.1.el7.x86_64 I am seeing this though:
And this rare one that is probably related:
So far it seems pretty good. I'll keep running this patch for a while and see if anything else comes up. Hopefully others can test as well. Thanks @avorop. |
if you can just run perl -Ilib t/14threads.t you should get a full error output. |
This test fails in all versions that I have. Unfortunately not on every run, which is normal for threads. I don't have the exact message, something about freeing an unreferenced SV. The interesting part is that I don't have ithreads in my perl, so I would assume that the test should be skipped in such an environment. It would be interesting to know what is wrong there, though; it might be a perl problem.
|
It appears I was wrong. My perl does have ithreads, at least in Cygwin, where I spent some time trying to understand where the issue comes from. It is connected to my changes. t/14threads.t uses imp_data to copy connection information between threads. I create an SV to hold envhp, and a pointer to it is stored in imp_dbh. When imp_data is captured, this pointer is copied as is; of course pointers to the allocated envhp and other handles are copied in the same way. This data is then passed to another thread and used by that thread to access the data pointed to. Everything works, EXCEPT that all SVs suddenly get funny reference counts. Here is the trace for one such SV: init_drh 8011e0b30 new envhp 4f48d0 in SV 8013d5588 The SV is not visible to Perl, so perl should not mess with its refcount, and it does not, unless imp_data is captured. But in the trace above the reference count is changed. There are even really funny values like 36. And then I came across this passage in perldoc perlapi:
That implies, that on Windows such blind copying of pointers between threads is dangerous, and if the memory gets overwritten, then it would result in crash. I don't know if handles allocated by Oracle suffer from the same problem or it uses some memory outside of current thread. Since there are not so many crashes, then the latter must be true. Anyway, below is patch that protects against "Freeing unreferenced SV". Though since I've spent so much time on this already, I shall try to rewrite current support of multi-threading to fix memory leaking when threads are in use. At least it appears to be possible. I'll let you know how it goes. |
Our system uses DBD::Oracle 1.74 with Oracle client 12.1.0.2. As soon as I pointed it at the Oracle 19.3.0.0 client, it produced a lot of core dumps from segmentation faults. I then compiled DBD::Oracle 1.83 against Oracle client 19.3.0.0 and it got worse, with more segmentation faults. Downgrading to 1.76 gave fewer segfaults. Downgrading to 1.74 was much better, but still segfaulted. After adding ulimit -u unlimited it has been stable, with no more segmentation faults so far. I hope that is the resolution. |
Please try this branch #147 |
Everyone please try https://github.com/perl5-dbi/DBD-Oracle/tree/cand-v1.90 |
I tested with 19c and 1.90_1 works for me, thanks! -sunnavy |
1.90_3 is now tagged, watch for it on metacpan soon |
I have created a v1.90 milestone and attached this issue to it |
This one is better: make test, connecting to Oracle, passed. But I still get a segfault about every 1000 runs.
|
@andynmaas can you provide more details as to how you are consistently creating a segfault? |
Hi,
Four cyclic jobs run in parallel in Control-M every 10 minutes. Every job runs more than one file transfer and accesses configuration in an Oracle table. After around 2000 file transfers it fails with a segfault if the Oracle 19c library is used. If I keep the binary on 19c but use the 18c library, it never fails with a segfault. It may be a problem in Control-M. So for now I just use the 18c library as the fix in prod.
|
Would you be able to make some tiny script that replicates it? |
Hi,
I have tested using Oracle client 19.3.0.0. It failed three of the 83 tests in t25. It also failed intermittently with a segmentation fault after around 4000 runs. The same happened with DBD::Oracle 1.74, 1.76, and 1.84. Using Oracle 18.5.0.0 or 12.1.0.2, make test connecting to the database passes and it never fails with a segmentation fault.
|
I would say it is an issue with Oracle Client 19.3 in particular. I ran 25plsql.t 4000 times without issue against 19.6 and cand-v1.90 (on a docker). The current client version available from Oracle is 19.16 so I would use that. I don't see this as issue with DBD::Oracle cand-v1.90. |
Hi,
I just want to give an update on this ORA-24550 segmentation fault. I upgraded DBD::Oracle to version 1.90_5 and compiled it with Oracle client 19.19.0.0. This version did not fail 25plsql.t in make test. But running Control-M 9.20fp2, after 4000 runs it failed again with a segmentation fault:
ORA-24550: signal received: [si_signo=11] [si_errno=0] [si_code=1] [si_int=0] [si_ptr=(nil)] [si_addr=0x4c4].
It looks like I am stuck using client version 18.5.0.0, which never causes that error.
(In reply to Dean Hamstead's notification of Wednesday, March 22, 2023: "Closed #111 as completed.")
|
Sorry, can you please clarify. Does segfault happen with version 19.19 of
Oracle libraries, but doesn't happen with version 18?
Best regards
Andrey Voropaev
|
For info: I also observed segfaults with DBD::Oracle 1.83, ocli client 21.8.0.0 and server 19.0.0.0.0. The workaround was to add an END block in our module responsible for opening database connections:
|
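For readers looking for the general shape of such a workaround, here is a minimal sketch of an END block that disconnects every handle before global destruction. It is not the poster's actual code: the package name `My::Connections` and its functions are hypothetical, and the handles only need to provide a `disconnect` method, as DBI database handles do.

```perl
use strict;
use warnings;

# Hypothetical helper package; not the poster's actual module.
package My::Connections;

my @handles;    # every handle opened through this module

# Remember a handle so it can be closed before global destruction.
# In real use this would receive the result of DBI->connect(...).
sub register {
    my ($dbh) = @_;
    push @handles, $dbh;
    return $dbh;
}

# Disconnect in reverse order of creation and forget the handles.
sub disconnect_all {
    while (@handles) {
        my $dbh = pop @handles;
        next unless $dbh;    # skip failed connects stored as undef
        # eval guards against a handle that is already closed or broken.
        eval { $dbh->disconnect; 1 } or warn "disconnect failed: $@";
    }
    return;
}

# END blocks run before global destruction, while the driver is still intact.
END { disconnect_all() }

package main;

# Usage sketch (assumes DBI, DBD::Oracle and valid credentials in the
# hypothetical variables $user and $password):
#   my $dbh = My::Connections::register(
#       DBI->connect('dbi:Oracle:XEPDB1', $user, $password)
#   );
```

This fits the reports earlier in the thread that an explicit disconnect makes the segfault go away, though one poster still saw crashes under the 19c client.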
Via the changelog we found a possibly relevant change:
https://metacpan.org/diff/file?target=MJEVANS/DBD-Oracle-1.80/&source=ZARQUON%2FDBD-Oracle-1.76#dbdimp.c
We use it in a rather complex internal tool and the segfault sometimes happens at the very end. Still we are able to consistently reproduce it.
How could we support you in finding the root cause?