Infinite recursion in the component manager with pending service
Categories
(Core :: XPCOM, defect)
Tracking
()
People
(Reporter: gsvelto, Unassigned)
References
Details
(Keywords: crash)
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/9db15b80-17a2-489a-8788-3e52c0201016
Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Top 10 frames of crashing thread:
0 XUL mozilla::dom::Document::HasRecentlyStartedForegroundLoads dom/base/Document.cpp:16919
1 XUL mozilla::MainThreadIdlePeriod::GetIdlePeriodHint xpcom/threads/MainThreadIdlePeriod.cpp:59
2 XUL mozilla::IdlePeriodState::GetIdleDeadlineInternal xpcom/threads/IdlePeriodState.cpp:68
3 XUL mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal xpcom/threads/nsThreadUtils.h:577
4 XUL nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1197
5 XUL NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:513
6 XUL nsComponentManagerImpl::GetServiceByContractID xpcom/components/nsComponentManager.cpp:1517
7 XUL nsGetServiceByContractIDWithError::operator const xpcom/components/nsComponentManagerUtils.cpp:253
8 XUL nsCOMPtr_base::assign_from_gs_contractid_with_error xpcom/base/nsCOMPtr.cpp:91
9 XUL mozilla::dom::PerformanceMainThread::GetRandomTimelineSeed dom/performance/PerformanceMainThread.h:54
Low-volume NULL-pointer access. This does not seem a new crash since we've got reports going back six months with consistent stacks across all platforms.
Comment 1•4 years ago
|
||
I started looking at this.
This looks like it is infinite recursion. I'm not sure why it shows up as a null pointer.
- mozilla::MainThreadIdlePeriod::GetIdlePeriodHint(mozilla::TimeStamp*)
- mozilla::dom::Document::HasRecentlyStartedForegroundLoads()
- mozilla::dom::Performance::Now()
- mozilla::dom::PerformanceMainThread::GetRandomTimelineSeed()
- nsCOMPtr_base::assign_from_gs_contractid_with_error(nsGetServiceByContractIDWithError const&, nsID const&)
- nsGetServiceByContractIDWithError::operator()(nsID const&, void**) const
- nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**)
- NS_ProcessNextEvent(nsIThread*, bool)
- mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_4>::Run()
- mozilla::IdlePeriodState::GetIdleDeadlineInternal(bool, mozilla::detail::BaseAutoUnlock<mozilla::Mutex&> const&)
...and then it repeats.
I'm guessing that the transition from step 4 to 5 is in RelativeTimeline::GetRandomTimelineSeed()
, on this line:
nsCOMPtr<nsIRandomGenerator> randomGenerator = do_GetService("@mozilla.org/security/random-generator;1", &rv);
I'm guessing that the transition from step 7 to 8 happens in nsComponentManagerImpl::GetServiceLocked()
. I haven't looked at this code before, but I think what is happening is that we see that somebody else is trying to create the same service, so we spin the event loop to wait for them to finish. If the thread that is already trying to create the service is the current one, then we bail out.
The implementation of the random generator is nsRandomGenerator, which is constructed by a call to mozilla::psm::NSSConstructor<nsRandomGenerator>(nullptr, aIID, aResult)
. nsRandomGenerator is completely trivial, so I don't think this will do anything fancy.
I'm guessing that some other thread starts creating the random generator, and then after that fails or while it is happening, the main thread comes along and patiently waits. It has to be another thread because that bypasses the reentrance check. In the report in crash 0, the other threads looked idle to me.
I see a few issues. One is that the reentrance check fails to catch the same thread case, if another thread tried to create the service first.
The other is that nsComponentManagerImpl::GetServiceLocked() can return after it calls AddPendingService without calling RemovePendingService in the "Factory did not return an object but returned success" case, though I have no idea how that could actually happen for nsRandomGenerator, so maybe that's not the issue.
Comment 2•4 years ago
|
||
This idle performance stuff could obviously detect the infinite recursion, but it looks like it should be the job of the component manager, so I've moved this bug to XPCOM.
Comment 3•4 years ago
|
||
Nika mentioned bug 1634065 which involves the same component manager loop spinning.
Comment 4•4 years ago
|
||
Nika even noticed the same issue with pending services in her patch review: "As a side note, this early return worries me a bit, as we'll never remove the pending service entry if we take it, and any other requesting threads will block indefinitely"
Comment 5•4 years ago
|
||
It looks like it was bug 684887 that moved the return before the remove.
Comment 6•4 years ago
|
||
I looked at a number of other crashes where the protosignature contains nsComponentManagerImpl, but I didn't see anything else that looked like this.
Comment 7•4 years ago
|
||
I'm still not sure how nsRandomGenerator could cause this issue. I think I'll wait to see how bug 1634065 shakes out, because that might also end up fixing the issue.
Updated•4 years ago
|
Comment 8•2 years ago
|
||
Looks like the error case I noticed which you quoted in comment 4 is still around (https://searchfox.org/mozilla-central/rev/2d24d893669ad0fe8d76b0427b25369d35fcc19b/xpcom/components/nsComponentManager.cpp#979,988). This particular signature hasn't happened since October, but it might still be worth fixing that edge case.
Comment 9•8 months ago
|
||
Closing because no crashes reported for 12 weeks.
Description
•