Closed Bug 1671914 Opened 4 years ago Closed 8 months ago

Infinite recursion in the component manager with pending service

Categories

(Core :: XPCOM, defect)

Unspecified
All
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: gsvelto, Unassigned)

References

Details

(Keywords: crash)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/9db15b80-17a2-489a-8788-3e52c0201016

Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS

Top 10 frames of crashing thread:

0 XUL mozilla::dom::Document::HasRecentlyStartedForegroundLoads dom/base/Document.cpp:16919
1 XUL mozilla::MainThreadIdlePeriod::GetIdlePeriodHint xpcom/threads/MainThreadIdlePeriod.cpp:59
2 XUL mozilla::IdlePeriodState::GetIdleDeadlineInternal xpcom/threads/IdlePeriodState.cpp:68
3 XUL mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal xpcom/threads/nsThreadUtils.h:577
4 XUL nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1197
5 XUL NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:513
6 XUL nsComponentManagerImpl::GetServiceByContractID xpcom/components/nsComponentManager.cpp:1517
7 XUL nsGetServiceByContractIDWithError::operator const xpcom/components/nsComponentManagerUtils.cpp:253
8 XUL nsCOMPtr_base::assign_from_gs_contractid_with_error xpcom/base/nsCOMPtr.cpp:91
9 XUL mozilla::dom::PerformanceMainThread::GetRandomTimelineSeed dom/performance/PerformanceMainThread.h:54

Low-volume NULL-pointer access. This does not seem a new crash since we've got reports going back six months with consistent stacks across all platforms.

I started looking at this.

This looks like it is infinite recursion. I'm not sure why it shows up as a null pointer.

  1. mozilla::MainThreadIdlePeriod::GetIdlePeriodHint(mozilla::TimeStamp*)
  2. mozilla::dom::Document::HasRecentlyStartedForegroundLoads()
  3. mozilla::dom::Performance::Now()
  4. mozilla::dom::PerformanceMainThread::GetRandomTimelineSeed()
  5. nsCOMPtr_base::assign_from_gs_contractid_with_error(nsGetServiceByContractIDWithError const&, nsID const&)
  6. nsGetServiceByContractIDWithError::operator()(nsID const&, void**) const
  7. nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**)
  8. NS_ProcessNextEvent(nsIThread*, bool)
  9. mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_4>::Run()
  10. mozilla::IdlePeriodState::GetIdleDeadlineInternal(bool, mozilla::detail::BaseAutoUnlock<mozilla::Mutex&> const&)

...and then it repeats.

I'm guessing that the transition from step 4 to 5 is in RelativeTimeline::GetRandomTimelineSeed(), on this line:
nsCOMPtr<nsIRandomGenerator> randomGenerator = do_GetService("@mozilla.org/security/random-generator;1", &rv);

I'm guessing that the transition from step 7 to 8 happens in nsComponentManagerImpl::GetServiceLocked(). I haven't looked at this code before, but I think what is happening is that we see that somebody else is trying to create the same service, so we spin the event loop to wait for them to finish. If the thread that is already trying to create the service is the current one, then we bail out.

The implementation of the random generator is nsRandomGenerator, which is constructed by a call to mozilla::psm::NSSConstructor<nsRandomGenerator>(nullptr, aIID, aResult). nsRandomGenerator is completely trivial, so I don't think this will do anything fancy.

I'm guessing that some other thread starts creating the random generator, and then after that fails or while it is happening, the main thread comes along and patiently waits. It has to be another thread because that bypasses the reentrance check. In the report in crash 0, the other threads looked idle to me.

I see a few issues. One is that the reentrance check fails to catch the same thread case, if another thread tried to create the service first.

The other is that nsComponentManagerImpl::GetServiceLocked() can return after it calls AddPendingService without calling RemovePendingService in the "Factory did not return an object but returned success" case, though I have no idea how that could actually happen for nsRandomGenerator, so maybe that's not the issue.

Assignee: nobody → continuation
Component: DOM: Navigation → XPCOM

This idle performance stuff could obviously detect the infinite recursion, but it looks like it should be the job of the component manager, so I've moved this bug to XPCOM.

Summary: Crash in [@ mozilla::dom::Document::HasRecentlyStartedForegroundLoads] → Infinite recursion in the component manager with pending service

Nika mentioned bug 1634065 which involves the same component manager loop spinning.

See Also: → 1634065

Nika even noticed the same issue with pending services in her patch review: "As a side note, this early return worries me a bit, as we'll never remove the pending service entry if we take it, and any other requesting threads will block indefinitely"

It looks like it was bug 684887 that moved the return before the remove.

I looked at a number of other crashes where the protosignature contains nsComponentManagerImpl, but I didn't see anything else that looked like this.

I'm still not sure how nsRandomGenerator could cause this issue. I think I'll wait to see how bug 1634065 shakes out, because that might also end up fixing the issue.

Assignee: continuation → nobody

Looks like the error case I noticed which you quoted in comment 4 is still around (https://searchfox.org/mozilla-central/rev/2d24d893669ad0fe8d76b0427b25369d35fcc19b/xpcom/components/nsComponentManager.cpp#979,988). This particular signature hasn't happened since October, but it might still be worth fixing that edge case.

Severity: -- → S3

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.