Closed Bug 1528620 Opened 5 years ago Closed 2 years ago

Crash in [@ mozilla::dom::TimeoutExecutor::MaybeExecute]

Categories

(Core :: DOM: Core & HTML, defect, P2)

Unspecified
Windows 10

Tracking


RESOLVED INCOMPLETE
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- wontfix
firefox66 --- fix-optional
firefox67 --- fix-optional

People

(Reporter: calixte, Unassigned)

References

(Blocks 1 open bug)


This bug is for crash report bp-c49282ab-903e-4965-b4bd-ea2180190216.

Top 10 frames of crashing thread:

0 xul.dll void mozilla::dom::TimeoutExecutor::MaybeExecute dom/base/TimeoutExecutor.cpp:177
1 xul.dll nsresult mozilla::dom::TimeoutExecutor::Run dom/base/TimeoutExecutor.cpp:231
2 xul.dll nsresult mozilla::ThrottledEventQueue::Inner::Executor::Run xpcom/threads/ThrottledEventQueue.cpp:76
3 xul.dll nsresult mozilla::SchedulerGroup::Runnable::Run xpcom/threads/SchedulerGroup.cpp:292
4 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1162
5 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:474
6 xul.dll void mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:88
7 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
8 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
9 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137

There are 15 crashes (from 7 installations) in Nightly 67, starting with build ID 20190216093716.
The moz_crash_reason is always MOZ_RELEASE_ASSERT(timeout->mFiringIndex > mLastFiringIndex), which was added in [1].
For 10 of the crashes, the user-agent locale is th.

[1] https://hg.mozilla.org/mozilla-central/rev?node=f8058a73d119
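
For readers without the patch handy: the assertion enforces that timeouts fire in strictly increasing order of a firing index assigned at scheduling time. A minimal, illustrative sketch of that invariant (class and member names here are stand-ins, not the actual code landed in [1]):

#include <cstdint>
#include "mozilla/Assertions.h"  // MOZ_RELEASE_ASSERT

struct TimeoutSketch {
  uint64_t mFiringIndex = 0;  // assigned when the timeout is scheduled
};

class TimeoutManagerSketch {
 public:
  void Schedule(TimeoutSketch& aTimeout) {
    aTimeout.mFiringIndex = ++mNextFiringIndex;
  }

  void Fire(TimeoutSketch& aTimeout) {
    // Firing out of order, or firing a stale/corrupted Timeout object,
    // trips this check and produces the moz_crash_reason quoted above.
    MOZ_RELEASE_ASSERT(aTimeout.mFiringIndex > mLastFiringIndex);
    mLastFiringIndex = aTimeout.mFiringIndex;
    // ... run the timeout callback ...
  }

 private:
  uint64_t mNextFiringIndex = 0;
  uint64_t mLastFiringIndex = 0;
};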

Flags: needinfo?(rjesup)

Thanks! This is what making that assertion a DIAGNOSTIC_ASSERT was meant to surface (and it should be DIAGNOSTIC, not RELEASE... I'll check)
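
For context on the two macros: MOZ_RELEASE_ASSERT is fatal in every build type, while MOZ_DIAGNOSTIC_ASSERT is fatal only where MOZ_DIAGNOSTIC_ASSERT_ENABLED is defined (debug and Nightly-style builds) and compiles away in release. A rough sketch of the difference in intent:

#include <cstdint>
#include "mozilla/Assertions.h"

void CheckFiringOrder(uint64_t aFiringIndex, uint64_t aLastFiringIndex) {
  // Fatal in every build type, including release builds shipped to users.
  MOZ_RELEASE_ASSERT(aFiringIndex > aLastFiringIndex);

  // Fatal only in builds where MOZ_DIAGNOSTIC_ASSERT_ENABLED is defined
  // (debug and Nightly-style builds); compiled out elsewhere, so release
  // users would not crash on this check.
  MOZ_DIAGNOSTIC_ASSERT(aFiringIndex > aLastFiringIndex);
}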

Flags: needinfo?(rjesup)

bc: FYI.

There are some repeated crashes from a few individual installs, separated enough in time that they aren't duplicate reports. They're on unsurprising sites (Facebook, etc.), so I wonder if there's an extension involved, or something else that would make specific installations vulnerable. 19 crashes in a day and a half (from fewer installations than that) means it's definitely not easy to trigger. I'm going to let this ride at least through the weekend; perhaps land something to dump how far off the firing index is (to make sure it wasn't trashed or something).

Flags: needinfo?(bob)

Interesting: right now (27 crashes) ~60% of them have <1 min of uptime. This suggests it happens during tab restoration, though that's not guaranteed. Some uptimes are much longer, with one >1 hour. No unusual patterns for extensions, though 22/27 have uBlock Origin.

I have a number of opt crashes from over the weekend. I've sent you a log. Let me know what else I can do to help.

Flags: needinfo?(bob)

Here are the correlations from Nightly, specifically the add-ons. Also note that 80% of the crashes are from people running Windows Insider builds:

(43.64% in signature vs 01.80% overall) Addon "No Coin" = true
(41.82% in signature vs 00.46% overall) Addon "ShopBack Cashback Button" = true
(41.82% in signature vs 00.46% overall) Addon "Add to Buyee" = true
(41.82% in signature vs 00.46% overall) Addon "{d263384e-1c00-4dce-bedf-22f8f7dbd1c1}" = true
(41.82% in signature vs 00.46% overall) Addon "Show Facebook Computer Vision Tags" = true
(41.82% in signature vs 00.46% overall) Addon "Dealcha! ช้อปออนไลน์ได้เงินคืน" = true
(41.82% in signature vs 00.46% overall) Addon "mmbfionaddihoimkpiebodbgfmhjolkf@chrome-store-foxified--687092732" = true
(41.82% in signature vs 00.48% overall) Addon "{d1646fcf-76ad-49c5-b8b2-e496e9b71189}" = true
(41.82% in signature vs 00.48% overall) Addon "TubeBuddy Plugin" = true
(41.82% in signature vs 00.48% overall) Addon "YouTube Stop AutoPlay Next" = true
(41.82% in signature vs 00.52% overall) Addon "Restartfirefox@vazextension.com" = true
(41.82% in signature vs 00.54% overall) Addon "Location Guard" = true
(41.82% in signature vs 00.66% overall) Addon "Webmail Ad Blocker" = true
(41.82% in signature vs 00.68% overall) Addon "enable-right-click-and-copy@afnankhan" = true
(41.82% in signature vs 00.76% overall) Addon "fbpElectroWebExt@fbpurity.com" = true
(41.82% in signature vs 00.90% overall) Addon "Text Link" = true
(41.82% in signature vs 00.92% overall) Addon "Share Backported" = true
(41.82% in signature vs 01.00% overall) Addon "No Coin" Version = 0.4.14
(45.45% in signature vs 11.88% overall) Module "libGLESv2.dll" = true [55.81% vs 12.83% if platform_version = 10.0.17763]
(41.82% in signature vs 01.20% overall) Addon "LeechBlock NG" = true
(41.82% in signature vs 01.24% overall) Addon "@min-vid" = true
(41.82% in signature vs 01.34% overall) Addon "mozilla_cc3@internetdownloadmanager.com" Version = 6.32.3
(43.64% in signature vs 03.17% overall) Addon "mozilla_cc3@internetdownloadmanager.com" = true
(41.82% in signature vs 01.64% overall) Addon "User-Agent Switcher" = true
(41.82% in signature vs 01.66% overall) Addon "MEGA" = true
(41.82% in signature vs 01.72% overall) Addon "Enhanced Steam" = true
(41.82% in signature vs 01.80% overall) Addon "Smart HTTPS" = true
(41.82% in signature vs 01.84% overall) Addon "Image Search Options" = true
(43.64% in signature vs 07.94% overall) Module "sxs.dll" = true [55.81% vs 13.92% if platform_version = 10.0.17763]
(80.00% in signature vs 37.66% overall) Module "Windows.UI.dll" = true [100.0% vs 70.38% if platform_version = 10.0.17763]
(41.82% in signature vs 02.28% overall) Addon "Disconnect" = true
(76.36% in signature vs 38.88% overall) Addon "uBlock Origin" = true
(41.82% in signature vs 02.83% overall) Addon "Firefox Color" = true

Priority: -- → P2

Is this a duplicate of bug 1527414?

Flags: needinfo?(rjesup)

The short spike of breakpoint crashes was the same as bug 1527414. There is one spurt of EXEC crashes in 66b13, but they're all from a single installation.

However, the older crashes here (65 and before) show that we have an ongoing security issue, including one clear use-after-free (UAF):
https://crash-stats.mozilla.com/report/index/1ddcfc8b-5394-44e9-a191-64b790190201
(there are 3 more in the last 3 months; this one and one other are from the same installation, a day apart)

Most of the failures are on the mOwner->RunTimeout() call, often READ failures, which implies an issue with mOwner.

Group: core-security
Flags: needinfo?(rjesup)

I looked at this a little. mOwner is indeed a weak pointer, but AFAICT it is a weak pointer to TimeoutManager, and ~TimeoutManager clears the pointer so I think that can't be it. Maybe we're calling a method on a field somewhere, and the method ends up clearing the field? I can't tell where that would be from the stack.
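
For reference, here is a paraphrased sketch of the weak back-pointer arrangement being described (illustrative names and bodies, not the actual TimeoutExecutor/TimeoutManager sources):

class TimeoutManagerSketch;  // owner; expected to clear the back pointer before dying

class TimeoutExecutorSketch {
 public:
  explicit TimeoutExecutorSketch(TimeoutManagerSketch* aOwner) : mOwner(aOwner) {}

  // Called from the manager's destructor, so mOwner should never dangle.
  void Shutdown() { mOwner = nullptr; }

  void MaybeExecute() {
    if (!mOwner) {
      return;  // the manager (and its window) is already gone
    }
    // mOwner->RunTimeout(...) in the real code; a read through a stale
    // mOwner here would look exactly like the READ failures in these reports.
  }

 private:
  TimeoutManagerSketch* mOwner;  // weak back pointer, cleared in Shutdown()
};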

Group: core-security → dom-core-security

Yeah, assuming both are always on the main thread (it's not asserted, but it looks to be the case) - mTimeoutManager is part of the nsPIDOMWindow.

Failures are almost always on the mOwner-> call, as far as I can see.

Component: DOM → DOM: Core & HTML

What's our next step here?

Flags: needinfo?(rjesup)

FWIW, I was also looking at this (various kungFuDeathGrip-type usage and such) and couldn't find the reason for it.
The timer callback should be kept alive while calling it, and RunTimeout keeps the window alive, etc.

(In reply to Olli Pettay [:smaug] from comment #11)

> FWIW, I was also looking at this (various kungFuDeathGrip-type usage and such) and couldn't find the reason for it.
> The timer callback should be kept alive while calling it, and RunTimeout keeps the window alive, etc.

Is it reasonable that we mark this "stalled", then?

I think so. I did ask jesup to take another look too (but IIRC he is on vacation right now).

The stack today reads as follows:

We have an EXCEPTION_ACCESS_VIOLATION_READ at address 0xffffffffffffffff, probably while reading the TimeDuration TimeoutExecutor::mAllowedEarlyFiringTime, a member of a TimeoutExecutor instance (which is also an nsITimerCallback). We arrived here through callbackDuringFire.mCallback.i->Notify(mITimer);, where the callback union member used in this case is declared as nsITimerCallback* MOZ_OWNING_REF i;, and i most probably points to an already freed (and overwritten) region of memory.

As we use raw (but annotated) pointers in that callback union, I assume we are seeing a UAF on the TimeoutExecutor object, though it might have happened even earlier in the pointer chain that leads there. I am not sure whether the intended behavior is that the sender of such an event guarantees the survival of all associated objects, which would sound fragile to me (and I don't know who the sender is in this case).
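
To make the hazard concrete, a simplified sketch (the names below stand in for the real timer-callback union and are not copied from it): a raw pointer, even with the MOZ_OWNING_REF annotation, only documents ownership for static analysis and does not AddRef/Release by itself, so nothing at this point forces the callback object to outlive the fire.

#include "nsITimer.h"  // nsITimer, nsITimerCallback

struct CallbackSketch {
  // Stand-in for the union member `nsITimerCallback* MOZ_OWNING_REF i;`
  // quoted above; the refcount is managed manually elsewhere.
  nsITimerCallback* i = nullptr;
};

void FireSketch(CallbackSketch& aCallbackDuringFire, nsITimer* aTimer) {
  // Dereferences *i (vtable load) and then reads members of the object
  // behind it. If that object was already freed and its memory poisoned or
  // reused, this is the EXCEPTION_ACCESS_VIOLATION_READ described above.
  nsresult rv = aCallbackDuringFire.i->Notify(aTimer);
  (void)rv;
}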

Reading a bit further, it seems that a TimeoutExecutor can be created only by a TimeoutManager, which in turn is bound to an nsPIDOMWindowInner instance. Interestingly, there I read:

  // The executor is specific to the nsGlobalWindow/TimeoutManager, but it
  // can live past the destruction of the window if its scheduled.  Therefore
  // it must be a separate ref-counted object.
  RefPtr<TimeoutExecutor> mExecutor;
  // For timeouts run off the idle queue
  RefPtr<TimeoutExecutor> mIdleExecutor;

So "it can live past the destruction of the window if it's scheduled". But I see only very few RefPtr who could assure this (two of them surround just the dispatch but not the execution) and actually I wonder, if there is any object with a longer lifetime than the window where we could attach it if not the scheduled event object itself.

:smaug, do you see something more actionable now or should we keep it stalled?

Flags: needinfo?(bugs)
Flags: needinfo?(rjesup)

There are still crashes with this signature, but they are different from the one described in comment 0, and they mostly look like random junk, so I'm going to close this.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(bugs)
Resolution: --- → INCOMPLETE

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Group: dom-core-security