Avoid deadlock in XrdRequestManager #12852

Merged: 2 commits merged on Jan 7, 2016

Conversation

Dr15Jones
Contributor

Avoid a lock inversion that can lead to a deadlock when an error occurs.

To reduce the chance of mutex-related deadlocks, the code that handles disabled sources now uses thread-safe containers.
We were getting a lock inversion between m_mutex and m_source_mutex when an error occurred. This is now avoided by not calling other functions while m_source_mutex is held.
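
The rule described above (never call out to other code while holding m_source_mutex) can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: only the two mutex names come from the description, while ManagerSketch, addSource, snapshotSources and reportSources are hypothetical names made up for the example.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a request manager. Only the mutex names
// m_mutex and m_source_mutex are taken from the pull request text.
class ManagerSketch {
public:
  void addSource(std::string name) {
    std::lock_guard<std::mutex> guard(m_source_mutex);
    m_activeSources.push_back(std::move(name));
  }

  // Copy the data while m_source_mutex is held, and do nothing else.
  // No other member function is called inside this critical section.
  std::vector<std::string> snapshotSources() const {
    std::lock_guard<std::mutex> guard(m_source_mutex);
    return m_activeSources;  // the copy is made before the guard is released
  }

  // Work that needs m_mutex runs only after m_source_mutex has been
  // released, so the two mutexes are never held at the same time.
  std::size_t reportSources() {
    const std::vector<std::string> names = snapshotSources();
    std::lock_guard<std::mutex> guard(m_mutex);
    m_lastReport = names;
    return m_lastReport.size();
  }

private:
  mutable std::mutex m_source_mutex;
  std::mutex m_mutex;
  std::vector<std::string> m_activeSources;
  std::vector<std::string> m_lastReport;
};
```

Because the two critical sections never nest in this sketch, no thread can hold one mutex while waiting for the other, whichever order the functions are called in.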
@cmsbuild
Contributor

A new Pull Request was created by @Dr15Jones (Chris Jones) for CMSSW_8_0_X.

It involves the following packages:

Utilities/XrdAdaptor

@cmsbuild, @smuzaffar, @Dr15Jones, @davidlange6 can you please review it and eventually sign? Thanks.
@Martin-Grunewald, @wddgit this is something you requested to watch as well.
@slava77, @Degano, @smuzaffar you are the release manager for this.

Following commands in first line of a comment are recognized

  • +1|approve[d]|sign[ed]: L1/L2's to approve it
  • -1|reject[ed]: L1/L2's to reject it
  • assign <category>[,<category>[,...]]: L1/L2's to request signatures from other categories
  • unassign <category>[,<category>[,...]]: L1/L2's to remove signatures from other categories
  • hold: L1/all L2's/release manager to mark it as on hold
  • unhold: L1/user who put this PR on hold
  • merge: L1/release managers to merge this request
  • [@cmsbuild,] please test: L1/L2 and selected users to start jenkins tests
  • [@cmsbuild,] please test with cms-sw/cmsdist#<PR>: L1/L2 and selected users to start jenkins tests using externals from cmsdist

@Dr15Jones
Contributor Author

@bbockelm please review. Can m_lastSourceCheck and m_nextActiveSourceCheck be read/written by different threads simultaneously?
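
If the answer turns out to be yes, one generic option is to keep such timestamps in atomics. The sketch below assumes the values can be stored as a nanosecond count. SourceCheckClock, claimCheck and the member names are hypothetical, and nothing here is taken from the actual RequestManager code.

```cpp
#include <atomic>
#include <cstdint>
#include <time.h>

// Hypothetical helper: holds "last check" and "next check" times as
// atomic nanosecond counts so several threads may read and update them.
class SourceCheckClock {
public:
  static std::int64_t toNanos(const timespec& ts) {
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
  }

  // Returns true for exactly one caller once `now` has reached the
  // deadline; that caller also moves the deadline forward.
  bool claimCheck(const timespec& now, std::int64_t intervalNanos) {
    const std::int64_t nowNs = toNanos(now);
    std::int64_t due = m_nextCheckNs.load();
    while (nowNs >= due) {
      // On failure `due` is refreshed with the current stored value,
      // so the loop re-tests against what another thread wrote.
      if (m_nextCheckNs.compare_exchange_weak(due, nowNs + intervalNanos)) {
        m_lastCheckNs.store(nowNs);
        return true;
      }
    }
    return false;
  }

  std::int64_t lastCheckNanos() const { return m_lastCheckNs.load(); }

private:
  std::atomic<std::int64_t> m_lastCheckNs{0};
  std::atomic<std::int64_t> m_nextCheckNs{0};
};
```

The compare-and-exchange loop lets only one thread win a given check interval without taking a mutex. Whether that is needed depends on the answer to the question above.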

@Dr15Jones
Contributor Author

please test

@cmsbuild
Contributor

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/10404/console

@Dr15Jones
Contributor Author

This is intended to fix deadlocks such as the following:

Thread 6 (Thread 0x7fc148d3a700 (LWP 31499)):
#0  0x0000003a6860e2e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003a686095a3 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x0000003a68609487 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fc14ac75a89 in XrdAdaptor::RequestManager::getPrettyActiveSourceNames(std::vector<std::string, std::allocator<std::string> >&) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libUtilitiesXrdAdaptor.so
#4  0x00007fc14ac75b82 in XrdAdaptor::RequestManager::addConnections(cms::Exception&) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libUtilitiesXrdAdaptor.so
#5  0x00007fc14ac76332 in XrdAdaptor::RequestManager::OpenHandler::HandleResponseWithHosts(XrdCl::XRootDStatus*, XrdCl::AnyObject*, std::vector<XrdCl::HostInfo, std::allocator<XrdCl::HostInfo> >*) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libUtilitiesXrdAdaptor.so
#6  0x00007fc14ad24dfc in ?? () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/external/slc6_amd64_gcc493/lib/libXrdCl.so.2
#7  0x00007fc14ad080cf in XrdCl::XRootDMsgHandler::HandleResponse() () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/external/slc6_amd64_gcc493/lib/libXrdCl.so.2
#8  0x00007fc14ad09202 in XrdCl::XRootDMsgHandler::HandleError(XrdCl::Status, XrdCl::Message*) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/external/slc6_amd64_gcc493/lib/libXrdCl.so.2
#9  0x00007fc14ad09ca8 in XrdCl::XRootDMsgHandler::Process(XrdCl::Message*) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/external/slc6_amd64_gcc493/lib/libXrdCl.so.2
#10 0x00007fc14aced20a in XrdCl::Stream::HandleIncMsgJob::Run(void*) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/external/slc6_amd64_gcc493/lib/libXrdCl.so.2
#11 0x00007fc14ad4dc16 in XrdCl::JobManager::RunJobs() () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/external/slc6_amd64_gcc493/lib/libXrdCl.so.2

...

Thread 2 (Thread 0x7fc09a53f700 (LWP 31644)):
#0  0x0000003a6860e2e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003a686095a3 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x0000003a68609487 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fc14ac777ad in XrdAdaptor::RequestManager::OpenHandler::open() () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libUtilitiesXrdAdaptor.so
#4  0x00007fc14ac7ab85 in XrdAdaptor::RequestManager::checkSourcesImpl(timespec&, unsigned long) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libUtilitiesXrdAdaptor.so
#5  0x00007fc14ac7bb0c in XrdAdaptor::RequestManager::checkSources(timespec&, unsigned long) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libUtilitiesXrdAdaptor.so
#6  0x00007fc14ac7bccc in XrdAdaptor::RequestManager::handle(std::shared_ptr<XrdAdaptor::ClientRequest>) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libUtilitiesXrdAdaptor.so
#7  0x00007fc14ac67a5e in XrdFile::read(void*, unsigned long) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libUtilitiesXrdAdaptor.so
#8  0x00007fc18cbecf29 in StorageAccountProxy::read(void*, unsigned long) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libUtilitiesStorageFactory.so
#9  0x00007fc18cbd193a in IOInput::xread(void*, unsigned long) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libUtilitiesStorageFactory.so
#10 0x00007fc18c20ccce in TStorageFactoryFile::ReadBuffer(char*, int) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/lib/slc6_amd64_gcc493/libIOPoolTFileAdaptor.so
#11 0x00007fc1979ad85d in TBasket::ReadBasketBuffers(long long, int, TFile*) () from /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc493/cms/cmssw-patch/CMSSW_8_0_THREADED_X_2015-12-23-1100/external/slc6_amd64_gcc493/lib/libTree.so
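
Both threads above are blocked in pthread_mutex_lock, which is consistent with two code paths taking the same pair of mutexes in opposite order. As a generic illustration only (it is not the approach taken in this pull request, which avoids nested calls instead), the sketch below shows std::lock acquiring two mutexes without deadlock even when the call sites name them in opposite order. TwoLockSketch, onError and onCheck are hypothetical names, and the mapping of the mutexes to the two traces is an assumption.

```cpp
#include <mutex>

// Hypothetical illustration of two code paths that need the same two
// mutexes but name them in opposite order.
struct TwoLockSketch {
  std::mutex m_mutex;
  std::mutex m_source_mutex;
  int errorCalls = 0;
  int checkCalls = 0;

  // Path that names m_mutex first.
  int onError() {
    std::unique_lock<std::mutex> first(m_mutex, std::defer_lock);
    std::unique_lock<std::mutex> second(m_source_mutex, std::defer_lock);
    std::lock(first, second);  // deadlock-avoiding acquisition of both
    return ++errorCalls;
  }

  // Path that names m_source_mutex first.
  int onCheck() {
    std::unique_lock<std::mutex> first(m_source_mutex, std::defer_lock);
    std::unique_lock<std::mutex> second(m_mutex, std::defer_lock);
    std::lock(first, second);  // safe despite the reversed order
    return ++checkCalls;
  }
};
```

With plain nested lock_guard objects in place of std::lock, two threads running onError and onCheck concurrently could each hold one mutex and wait forever for the other.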

@bbockelm
Contributor

bbockelm commented Jan 6, 2016

Looks good to me!

I'd still like to get the mutex out of getPrettyActiveSourceNames, getDisabledSourceNames, and prepareOpaqueString ... those are potential sources of pain in the future. However, this should solve the issue at hand.

@Dr15Jones
Contributor Author

+1

@cmsbuild
Contributor

cmsbuild commented Jan 6, 2016

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_0_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @Degano, @smuzaffar

@Dr15Jones
Contributor Author

@davidlange6 ping

@davidlange6
Contributor

+1

cmsbuild added a commit that referenced this pull request on Jan 7, 2016
cmsbuild merged commit b7cd0d5 into cms-sw:CMSSW_8_0_X on Jan 7, 2016
Dr15Jones deleted the threadSafeRequestManager branch on January 8, 2016 14:13