Closed Bug 1446038 Opened 6 years ago Closed 2 years ago

Crash in XPCWrappedNative::CallMethod while starting the addon manager

Categories

(Core :: XPConnect, defect, P3)

59 Branch
All
Windows
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox-esr60 --- wontfix
firefox-esr68 --- wontfix
firefox59 + wontfix
firefox60 + wontfix
firefox61 - wontfix
firefox62 - wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix

People

(Reporter: philipp, Unassigned)

Details

(4 keywords)

Crash Data

This bug was filed from the Socorro interface and is
report bp-8a94a204-a751-433b-b2e9-df8f50180315.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll XPCWrappedNative::CallMethod js/xpconnect/src/XPCWrappedNative.cpp:1234
1 xul.dll XPC_WN_CallMethod js/xpconnect/src/XPCWrappedNativeJSOps.cpp:929
2 xul.dll js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:473
3 xul.dll InternalCall js/src/vm/Interpreter.cpp:522
4 xul.dll Interpret js/src/vm/Interpreter.cpp:3096
5 xul.dll js::RunScript js/src/vm/Interpreter.cpp:423
6 xul.dll js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:495
7 xul.dll InternalCall js/src/vm/Interpreter.cpp:522
8 xul.dll js::fun_call js/src/jsfun.cpp:1189
9 xul.dll js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:473

=============================================================

this startup crash signature is rising in 59.0 release builds on windows. the crash address is indicating a UAF situation in many cases.
Crash Signature: [@ XPCWrappedNative::CallMethod] → [@ XPCWrappedNative::CallMethod] [@ XPCConvert::NativeData2JS ]
Tracking this for 59 since it's a fairly high volume startup crash. 

mccr8, can you take a look? (since overholt is out on PTO)
Flags: needinfo?(continuation)
I put off looking into this because the top of the stack is just us calling into C++ from JS, but I noticed just now that below all of the stuff about running JS on the stack, there's a call to nsXREDirProvider::DoStartup(). Which matches up with comment 1 which says it is a startup crash. In fact, 68% of the crashes with this signature have this exact same proto signature.

The crash is on this call to observe:
    // Init the Extension Manager
    nsCOMPtr<nsIObserver> em = do_GetService("@mozilla.org/addons/integration;1");
    if (em) {
      em->Observe(nullptr, "addons-startup", nullptr);

Andrew, could you look at this? I'm not sure how easy it will be to investigate without a JS stack, but maybe you know of something that changed in this code. Thanks.
Component: XPConnect → Add-ons Manager
Flags: needinfo?(continuation) → needinfo?(aswan)
Product: Core → Toolkit
Summary: Crash in XPCWrappedNative::CallMethod → Crash in XPCWrappedNative::CallMethod while starting the addon manager
(In reply to Andrew McCreight [:mccr8] from comment #2)
> I put off looking into this because the top of the stack is just us calling
> into C++ from JS, but I noticed just now that below all of the stuff about
> running JS on the stack, there's a call to nsXREDirProvider::DoStartup().
> Which matches up with comment 1 which says it is a startup crash. In fact,
> 68% of the crashes with this signature have this exact same proto signature.
> 
> The crash is on this call to observe:
>     // Init the Extension Manager
>     nsCOMPtr<nsIObserver> em =
> do_GetService("@mozilla.org/addons/integration;1");
>     if (em) {
>       em->Observe(nullptr, "addons-startup", nullptr);
> 
> Andrew, could you look at this? I'm not sure how easy it will be to
> investigate without a JS stack, but maybe you know of something that changed
> in this code. Thanks.

This is just the call that triggers add-on manager startup. It's unfortunately not a lot to go on, since that triggers the loading of all sorts of toolkit stuff, and a half dozen or so system add-ons.

And, unfortunately, I don't see any useful clues in that stack... I wonder if we can find a cheap way to annotate crash reports with the name/interface of XPConnect methods that are being called...
Flags: needinfo?(aswan)
I'm going to mark this as sec-moderate, as the crash happens so early in startup, I doubt there's any web content running, so I can't imagine it is exploitable.
Keywords: sec-moderate
(In reply to Andrew McCreight [:mccr8] from comment #4)
> I'm going to mark this as sec-moderate, as the crash happens so early in
> startup, I doubt there's any web content running, so I can't imagine it is
> exploitable.

At that point in startup, there really can't be any web content running. Not unless we have something like a legacy extension running that makes sync HTTP requests very early in startup. Which shouldn't be possible in 59 release.
Group: core-security → toolkit-core-security
Not much we're likely going to be able to do for this on 59 at this point, but it's still happening often enough that we should try to figure out something for 60 if at all possible.
Group: toolkit-core-security → core-security
Component: Add-ons Manager → XPConnect
Product: Toolkit → Core
Group: core-security → dom-core-security
This doesn't seem actionable, so I don't think there's any point in relman tracking it. 
Andrew, do you think someone on your team might add diagnostic info (suggested in comment 3)?
Flags: needinfo?(overholt)
I'll take a look at this as I go through the "unactionable" JS crash list and see if there are any similarities.
Whiteboard: [#jsapi:crashes-retriage]
Thanks, Ted.
Flags: needinfo?(overholt)
I looked at a few minidumps and a lot of these are a mess. We seem to complete smash the stack in a number of cases. Sometimes we crash before the actual XPC call (but still after we start using method info and other virtual calls) and sometimes after.

One thing that looks odd to me is that I would expect crashes a few instructions after a call, should still have the expected return address on stack that CPU used to |ret| from the call, but I usually do not see this. It is unclear how we got to were we crash.

Often, further up the stack looks fine and has the data structures we expect either complete or with the lower-addressed (top-of-stack) words clobbered.

I'm at a loss for how these got so corrupted.
My best idea at this point is to annotate crash reports with a small memory region (in non-release builds), and copy the interface::method_name into it whenever we're about to make an XPC call. That should be pretty cheap relative to the total XPC call overhead, and should hopefully at least give us something to go on.
Given the reduction of the spike in reports here, and comment 5, I'm going to mark this P3.

Should we spin comment 11 out into a separate bug?
Flags: needinfo?(kmaglione+bmo)
Priority: -- → P3
(In reply to Andrew Overholt [:overholt] from comment #13)
> Should we spin comment 11 out into a separate bug?

I don't think so. It's not worth adding the overhead unless we need it to track this down, so if we're going to do it, it should probably be done as part of this bug.
Flags: needinfo?(kmaglione+bmo)
Still some crashes for 62/63. If we are going to add annotation as mentioned in comment 11 it would likely be for 64/65.

Ted, this has been around for a long time and wontfix for many releases. Do you know if this is actionable at this point?

Flags: needinfo?(tcampbell)

Crash rate seems to have dropped off. I don't see anything actionable here.

One thing that Nika pointed out to me is that the info described in Comment 11 is generally already in crash reports since it is on stack. We worked through one example and where able to determine the calling method. This was a one-off non-startup example so probably not representative.

With effort, we could look at stack locals in sample crashes and find names of methods and interfaces. I don't see enough crashes to justify myself doing the work right now.

Flags: needinfo?(tcampbell)
Keywords: stalled
Whiteboard: [#jsapi:crashes-retriage]

There are crashes with this signature, but I'm not seeing any UAFs.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Group: dom-core-security
You need to log in before you can comment on or make changes to this bug.