Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New dictionary-related crash in 6.24.00-patches #7657

Closed
scott-snyder opened this issue Mar 24, 2021 · 10 comments · Fixed by #7773 or #7780
Closed

New dictionary-related crash in 6.24.00-patches #7657

scott-snyder opened this issue Mar 24, 2021 · 10 comments · Fixed by #7773 or #7780
Assignees
Milestone

Comments

@scott-snyder
Copy link
Contributor

hi -

In the past few days, we've been seeing a new dictionary-related crash
in the atlas builds using root-6.24.00-patches.

I apologize that i don't have a standalone reproducer for this, but it
can be reproduced readily on lxplus in a recent atlas nightly:

$ asetup master--dev4LCG,Athena,r2021-03-21
$ python
>>> import ROOT
>>> cl1 = ROOT.TClass.GetClass('LArAutoCorrMC')
 *** Break *** segmentation violation
 ...

Note that the problem does not occur if run against the dbg root libraries.

Here is the stack trace:

#0  0x00007fffefb5d19f in TCling::RegisterModule(char const*, char const**, char const**, char const*, char const*, void (*)(), std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > const&, char const**, bool, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#1  0x00007ffff551ec61 in TROOT::RegisterModule(char const*, char const**, char const**, char const*, char const*, void (*)(), std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > const&, char const**, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#2  0x00007fffd50a6cf2 in (anonymous namespace)::TriggerDictionaryInitialization_libAthenaPoolUtilitiesDict_Impl() ()
   from /cvmfs/atlas-nightlies.cern.ch/repo/sw/master--dev4LCG_Athena_x86_64-centos7-gcc8-opt/2021-03-21T0600/Athena/22.0.30/InstallArea/x86_64-centos7-gcc8-opt/lib/libAthenaPoolUtilitiesDict.so
#3  0x00007fffd508bad3 in __static_initialization_and_destruction_0(int, int) [clone .constprop.966] ()
   from /cvmfs/atlas-nightlies.cern.ch/repo/sw/master--dev4LCG_Athena_x86_64-centos7-gcc8-opt/2021-03-21T0600/Athena/22.0.30/InstallArea/x86_64-centos7-gcc8-opt/lib/libAthenaPoolUtilitiesDict.so
#4  0x00007fffd508bd1b in _GLOBAL__sub_I_AthenaPoolUtilitiesDictReflexDict.cxx
    ()
   from /cvmfs/atlas-nightlies.cern.ch/repo/sw/master--dev4LCG_Athena_x86_64-centos7-gcc8-opt/2021-03-21T0600/Athena/22.0.30/InstallArea/x86_64-centos7-gcc8-opt/lib/libAthenaPoolUtilitiesDict.so
#5  0x00007ffff7dea9c3 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff7def59e in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7dea7d4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff7deeb8b in _dl_open () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff720afab in dlopen_doit () from /lib64/libdl.so.2
#10 0x00007ffff7dea7d4 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#11 0x00007ffff720b5ad in _dlerror_run () from /lib64/libdl.so.2
#12 0x00007ffff720b041 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#13 0x00007fffefcd2e24 in cling::utils::platform::DLOpen(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#14 0x00007fffefbd7838 in cling::DynamicLibraryManager::loadLibrary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#15 0x00007fffefb06b14 in TCling::Load(char const*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#16 0x00007ffff55908e6 in TSystem::Load(char const*, char const*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#17 0x00007ffff551c9e3 in TROOT::LoadClass(char const*, char const*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#18 0x00007fffefb05a65 in TCling::ShallowAutoLoadImpl(char const*) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#19 0x00007fffefb1acd2 in TCling::DeepAutoLoadImpl(char const*, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#20 0x00007fffefb1b377 in TCling::AutoLoad(char const*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#21 0x00007ffff560ee86 in TClass::GetClass(char const*, bool, bool, unsigned long, unsigned long) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#22 0x00007ffff563608a in TProtoClass::FillTClass(TClass*) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#23 0x00007ffff56188e5 in TClass::Init(char const*, short, std::type_info const*, TVirtualIsAProxy*, char const*, char const*, int, int, ClassInfo_t*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#24 0x00007ffff5619f76 in TClass::TClass(char const*, short, std::type_info const&, TVirtualIsAProxy*, char const*, char const*, int, int, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#25 0x00007ffff561a48a in ROOT::CreateClass(char const*, short, std::type_info const&, TVirtualIsAProxy*, char const*, char const*, int, int) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#26 0x00007ffff562906a in ROOT::TGenericClassInfo::GetClass() ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#27 0x00007ffff560ee22 in TClass::GetClass(char const*, bool, bool, unsigned long, unsigned long) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#28 0x00007ffff563608a in TProtoClass::FillTClass(TClass*) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#29 0x00007ffff56188e5 in TClass::Init(char const*, short, std::type_info const*, TVirtualIsAProxy*, char const*, char const*, int, int, ClassInfo_t*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#30 0x00007ffff5619f76 in TClass::TClass(char const*, short, std::type_info const&, TVirtualIsAProxy*, char const*, char const*, int, int, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#31 0x00007ffff561a48a in ROOT::CreateClass(char const*, short, std::type_info const&, TVirtualIsAProxy*, char const*, char const*, int, int) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#32 0x00007ffff562906a in ROOT::TGenericClassInfo::GetClass() ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#33 0x00007ffff5606b70 in TClass::LoadClassDefault(char const*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#34 0x00007ffff560ee9c in TClass::GetClass(char const*, bool, bool, unsigned long, unsigned long) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patche
s/x86_64-centos7-gcc8-opt/lib/libCore.so
#35 0x00007ffff645805b in ?? ()
#36 0x000000000000042e in ?? ()
#37 0x00007fffefb8337e in TClingMethodInfo::TClingMethodInfo(TClingMethodInfo const&) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#38 0x00007ffff560f4b0 in ?? ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#39 0x00007fffffff3208 in ?? ()
#40 0x00007fffffff3180 in ?? ()
#41 0x000000010559b1b0 in ?? ()
#42 0x0000000000000000 in ?? ()

The ultimate trigger here is what it usually is --- trying to find the TClass
for something for which no dictionary was generated, leading to autoparsing.
I can get rid of the crash in the atlas build by generating dictionaries
for these classes (here, various specialization of
CondMultChanCollection<LArConditionsSubset<T> >).
However, i also looked at bit at the root cause of the crash.

The summary is that it seems to involve a Transaction object being released
multiple times back to the TransactionPool. And the fact that TransactionPool
is used only in opt builds is why we don't see this crash if libCling
is compiled with debugging.

The crash in RegisterModule happens in the last line here:

   for (auto& fwdDeclArgToSkipPair : fwdDeclsArgToSkip){
      const std::string& fwdDecl = fwdDeclArgToSkipPair.first;
      const int nArgsToSkip = fwdDeclArgToSkipPair.second;
      auto compRes = fInterpreter->declare(fwdDecl.c_str(), &T);
      assert(cling::Interpreter::kSuccess == compRes &&
            "A fwd declaration could not be compiled");
      if (compRes!=cling::Interpreter::kSuccess){
         Warning("TCling::RegisterModule",
               "Problems in declaring string '%s' were encountered.",
               fwdDecl.c_str()) ;
         continue;
      }

      // Drill through namespaces recursively until the template is found
      if(ClassTemplateDecl* TD = FindTemplateInNamespace(T->getFirstDecl().getSingleDecl())){

because the Transaction T we get back is empty (size()==0),
so T->getFirstDecl() is undefined.

The reason this happens is because we have the same Transaction object
entered twice in the TransactionPool. Interpreter::declare gets the Transaction
object from the pool and fills it in. It has a valid entry at this time.
But it then creates a nested transation. It gets a new pointer from the
pool, which points at the same Transaction object still in use.
When the constructor is run, the entry in that Transaction object is erased,
ultimately leading to getting an empty transaction above.

Now, the TransactionPool gets messed up on an earlier GetClass call,
at this code in LookupHelper.cxx:

                        // NOTE: We cannot instantiate the scope: not a valid decl.
                        // Need to unload it if this decl is a definition.
                        // But do not unload pre-existing fwd decls. Note that this might have failed
                        // because several other Decls failed to instantiate, leaving several Decls
                        // in invalid state. We should be unloading all of them, i.e. inload the
                        // current (possibly nested) transaction.
                        auto *T = const_cast<Transaction*>(m_Interpreter->getCurrentTransaction());
                        m_Interpreter->unload(*T);
                        *setResultType = nullptr;
                        return 0;

The call stack here is

#0  0x00007fffef9dde70 in cling::Interpreter::unload(cling::Transaction&)@plt
    ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#1  0x00007fffefbf28ce in cling::LookupHelper::findScope(llvm::StringRef, cling::LookupHelper::DiagSetting, clang::Type const**, bool) const ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#2  0x00007fffefb854d9 in TClingClassInfo::TClingClassInfo(cling::Interpreter*, char const*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#3  0x00007fffefb06c54 in TCling::GetInterpreterTypeName(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool)
    ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCling.so
#4  0x00007ffff560f184 in TClass::GetClass(char const*, bool, bool, unsigned long, unsigned long) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#5  0x00007ffff563608a in TProtoClass::FillTClass(TClass*) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#6  0x00007ffff56188e5 in TClass::Init(char const*, short, std::type_info const*, TVirtualIsAProxy*, char const*, char const*, int, int, ClassInfo_t*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#7  0x00007ffff5619f76 in TClass::TClass(char const*, short, std::type_info const&, TVirtualIsAProxy*, char const*, char const*, int, int, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#8  0x00007ffff561a48a in ROOT::CreateClass(char const*, short, std::type_info const&, TVirtualIsAProxy*, char const*, char const*, int, int) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#9  0x00007ffff562906a in ROOT::TGenericClassInfo::GetClass() ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#10 0x00007ffff560ee22 in TClass::GetClass(char const*, bool, bool, unsigned long, unsigned long) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#11 0x00007ffff563608a in TProtoClass::FillTClass(TClass*) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#12 0x00007ffff56188e5 in TClass::Init(char const*, short, std::type_info const*, TVirtualIsAProxy*, char const*, char const*, int, int, ClassInfo_t*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#13 0x00007ffff5619f76 in TClass::TClass(char const*, short, std::type_info const&, TVirtualIsAProxy*, char const*, char const*, int, int, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#14 0x00007ffff561a48a in ROOT::CreateClass(char const*, short, std::type_info const&, TVirtualIsAProxy*, char const*, char const*, int, int) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#15 0x00007ffff562906a in ROOT::TGenericClassInfo::GetClass() ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#16 0x00007ffff5606b70 in TClass::LoadClassDefault(char const*, bool) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
#17 0x00007ffff560ee9c in TClass::GetClass(char const*, bool, bool, unsigned long, unsigned long) ()
   from /cvmfs/sft-nightlies.cern.ch/lcg/nightlies/dev4/Sun/ROOT/v6-24-00-patches/x86_64-centos7-gcc8-opt/lib/libCore.so
...

The unload() call here may free the Transaction T
(via IncrementalParser::deregisterTransaction) and return it to the pool.
But the code in findScope is protected by

    // The user wants to see the template instantiation, existing or not.
    // Here we might not have an active transaction to handle
    // the caused instantiation decl.
    // Also quickFindDecl could trigger deserialization of decls.
    Interpreter::PushTransactionRAII pushedT(m_Interpreter);

The dtor of this runs when we hit the return above --- and this will
end up releasing the same Transaction object to the pool.

This version of the code in findScope was introduced by this change:

commit 036133908aebab2c2e17e97c36923eed6ab87c5b
Author: Axel Naumann <[email protected]>
Date:   Thu Dec 17 18:41:51 2020 +0100

    [cling] In failed lookup, unload only decls produced, not existing fwd decl:
    
    Before, pre-existing fwd decls of specializations got unloaded.
...

As a workaround, this change makes the crash go away:

diff --git a/interpreter/cling/lib/Interpreter/Interpreter.cpp b/interpreter/cling/lib/Interpreter/Interpreter.cpp
index 9686a1dcff..444b11c09b 100644
--- a/interpreter/cling/lib/Interpreter/Interpreter.cpp
+++ b/interpreter/cling/lib/Interpreter/Interpreter.cpp
@@ -106,7 +106,8 @@ namespace cling {
   }
 
   void Interpreter::PushTransactionRAII::pop() const {
-    if (m_Transaction->getState() == Transaction::kRolledBack)
+    if (m_Transaction->getState() == Transaction::kRolledBack ||
+        m_Transaction->getState() == Transaction::kNumStates)
       return;
     IncrementalParser::ParseResultTransaction PRT
       = m_Interpreter->m_IncrParser->endTransaction(m_Transaction);

but this doesn't feel like a proper fix.

I apologize again for not having a standalone reproducer for this,
but i hope this is helpful anyway.

@krasznaa
Copy link
Contributor

Note that we observed the same type of crash in some other places as well in ATLAS, which could also be handled by cleaning up the dictionaries of the affected libraries. So the error has an upside as well. (Though a simple error message from ROOT would've been more welcome. 😛)

(I mainly just wrote this comment to get notifications about this issue's evolution... 😉)

@Axel-Naumann
Copy link
Member

What a fantastic bug report, thank you SO much, @scott-snyder ! I'll have it fixed in the coming days / hopefully tomorrow. I'll keep you posted. (Just stating the obvious: this is blocking v6.24/00.)

Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Mar 30, 2021
When a Transaction is unloaded for which *also* a ScopedTransactionRAII
is waiting, the latter will potentially access a deleted Transaction.
It can be deleted due to the Pool being full. Even if it is not deleted
(as is the case in root-project#7657 )
it will get pushed into the Pool once by unload() and a second time by
the RAII!

The two options to track this case were full-blown ref counting or noting
that a Transaction is attached to an RAII scope. If that is the case,
freeing the Transaction must be skipped, as the RAII will take care of it.
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Mar 30, 2021
When a Transaction is unloaded for which *also* a ScopedTransactionRAII
is waiting, the latter will potentially access a deleted Transaction.
It can be deleted due to the Pool being full. Even if it is not deleted
(as is the case in root-project#7657 )
it will get pushed into the Pool once by unload() and a second time by
the RAII!

The two options to track this case were full-blown ref counting or noting
that a Transaction is attached to an RAII scope. If that is the case,
freeing the Transaction must be skipped, as the RAII will take care of it.
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Mar 31, 2021
When a Transaction is unloaded for which *also* a ScopedTransactionRAII
is waiting, the latter will potentially access a deleted Transaction.
It can be deleted due to the Pool being full. Even if it is not deleted
(as is the case in root-project#7657 )
it will get pushed into the Pool once by unload() and a second time by
the RAII!

The two options to track this case were full-blown ref counting or noting
that a Transaction is attached to an RAII scope. If that is the case,
freeing the Transaction must be skipped, as the RAII will take care of it.
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Mar 31, 2021
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Mar 31, 2021
When a Transaction is unloaded for which *also* a ScopedTransactionRAII
is waiting, the latter will potentially access a deleted Transaction.
It can be deleted due to the Pool being full. Even if it is not deleted
(as is the case in root-project#7657 )
it will get pushed into the Pool once by unload() and a second time by
the RAII!

The two options to track this case were full-blown ref counting or noting
that a Transaction is attached to an RAII scope. If that is the case,
freeing the Transaction must be skipped, as the RAII will take care of it.
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Mar 31, 2021
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Mar 31, 2021
When a Transaction is unloaded for which *also* a ScopedTransactionRAII
is waiting, the latter will potentially access a deleted Transaction.
It can be deleted due to the Pool being full. Even if it is not deleted
(as is the case in root-project#7657 )
it will get pushed into the Pool once by unload() and a second time by
the RAII!

The two options to track this case were full-blown ref counting or noting
that a Transaction is attached to an RAII scope. If that is the case,
freeing the Transaction must be skipped, as the RAII will take care of it.
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Mar 31, 2021
@Axel-Naumann
Copy link
Member

@scott-snyder @krasznaa is there any chance that you can confirm #7767 (for v6-24-00-patches) or #7752 (for master) is resolving this for ATLAS?

@krasznaa
Copy link
Contributor

I'm not 100% sure about this, but #7767 does not seem to be ABI compatible with the current head of v6-24-00-patches. Or is it?

Testing an ABI compatible change, while a bit cumbersome, would be doable. But testing a change that's not ABI compatible would require rebuilding pretty much all our code. Including all the parts of LCG that depend on ROOT. That we are just not set up to be able to do. 😦

But Scott may have a different view...

@pcanal
Copy link
Member

pcanal commented Mar 31, 2021

It is not ABI compatible per se but for your purpose it is sufficiently so. The code in cling/llvm is only used/access from libCling which should only be accessed through the TIntepre ter interface (i.e. no user code should need to be rebuild).

@krasznaa
Copy link
Contributor

In that case Scott will probably try this out himself. But if he doesn't, I'll give it a go tomorrow. 😉

@scott-snyder
Copy link
Contributor Author

hi -

This fixes at least the specific crashes i was looking at here.
Thanks!

Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Apr 1, 2021
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Apr 1, 2021
Outer RAIIs might still reference the Transaction, and unload is
assuming that it owns the transaction and can delete it / put it
into the TransactionPool.

This fixes root-project#7657

We still need to track ownership as what has happened here (unload
of a Transaction held by an RAII) can happen again / elsewhere.
This will be addressed by a subsequent PR in master.
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Apr 1, 2021
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Apr 1, 2021
Outer RAIIs might still reference the Transaction, and unload is
assuming that it owns the transaction and can delete it / put it
into the TransactionPool.

This fixes root-project#7657

We still need to track ownership as what has happened here (unload
of a Transaction held by an RAII) can happen again / elsewhere.
This will be addressed by a subsequent PR in master.
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Apr 2, 2021
Outer RAIIs might still reference the Transaction, and unload is
assuming that it owns the transaction and can delete it / put it
into the TransactionPool.

This fixes root-project#7657

We still need to track ownership as what has happened here (unload
of a Transaction held by an RAII) can happen again / elsewhere.
This will be addressed by a subsequent PR in master.
Axel-Naumann added a commit to Axel-Naumann/root that referenced this issue Apr 2, 2021
Outer RAIIs might still reference the Transaction, and unload is
assuming that it owns the transaction and can delete it / put it
into the TransactionPool.

This fixes root-project#7657

We still need to track ownership as what has happened here (unload
of a Transaction held by an RAII) can happen again / elsewhere.
This will be addressed by a subsequent PR in master.

(cherry picked from commit 0202a4b)
@Axel-Naumann
Copy link
Member

I had to redo the PRs; I now have the following:

Could you let me know whether that also solves the issues you saw? Any one of them is enough. Thank you, @krasznaa @scott-snyder !

@scott-snyder
Copy link
Contributor Author

Yes, that also seems to fix them.
thanks!

@Axel-Naumann
Copy link
Member

Thank you very much, @scott-snyder - both for this excellent analysis of the problem and the your quick turnaround testing!

Axel-Naumann added a commit that referenced this issue Apr 5, 2021
a cheap way to notice what went wrong in #7657.
@Axel-Naumann Axel-Naumann reopened this Apr 5, 2021
Axel-Naumann added a commit that referenced this issue Apr 5, 2021
Outer RAIIs might still reference the Transaction, and unload is
assuming that it owns the transaction and can delete it / put it
into the TransactionPool.

This fixes #7657

We still need to track ownership as what has happened here (unload
of a Transaction held by an RAII) can happen again / elsewhere.
This will be addressed by a subsequent PR in master.

(cherry picked from commit 0202a4b)
FonsRademakers pushed a commit to root-project/cling that referenced this issue Apr 5, 2021
FonsRademakers pushed a commit to root-project/cling that referenced this issue Apr 5, 2021
Outer RAIIs might still reference the Transaction, and unload is
assuming that it owns the transaction and can delete it / put it
into the TransactionPool.

This fixes root-project/root#7657

We still need to track ownership as what has happened here (unload
of a Transaction held by an RAII) can happen again / elsewhere.
This will be addressed by a subsequent PR in master.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment