[Crash][@ nsQueryInterface::operator() | nsCOMPtr_base::assign_from_qi]

RESOLVED FIXED in Firefox 40, Firefox OS v2.2

Status

RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: ntroast, Assigned: mccr8)

Tracking

({crash})

unspecified
2.2 S10 (17apr)
ARM
Gonk (Firefox OS)
crash
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:2.2+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed, b2g-v2.2 fixed, b2g-master fixed)

Details

(Whiteboard: [b2g-crash][caf-crash 598][caf priority: p1][CR 817665])

Attachments

(11 attachments, 1 obsolete attachment)

(Reporter)

Description

3 years ago
We observed the following crash signature during testing.

[@ nsQueryInterface::operator() | nsCOMPtr_base::assign_from_qi | xpc_TryUnmarkWrappedGrayObject | nsObserverList::UnmarkGrayStrongObservers ]

Cafbot will upload the decoded minidump and extra file.
Created attachment 8587992 [details]
EXTRA file attachment -
Created attachment 8587993 [details]
decoded minidump -
NI Birunthan/Froyd, to start with as they seem to have touched this area of code or can help direct to right people.

Please note, this is a critical FXOS reqression impacting our critical release milestone and any help is appreciated.
Flags: needinfo?(nfroyd)
Flags: needinfo?(birunthan)
Uh.  Looks like we have a bogus nsIObserver pointer while we're notifying observers during a CC while we're minimizing memory?  ni?'ing mccr8 to see if the CC stack looks like something recognizable.
Flags: needinfo?(nfroyd) → needinfo?(continuation)
(Assignee)

Comment 5

3 years ago
I haven't seen anything like this in particular before.  It looks like we're QIing something and it crashes in there?  The crash address, 0x1000000, is odd.  Could the nsIObserver pointer be null?  I'd hope we'd be safe from totally bogus pointers, but maybe not.

poiru just did style clean up for this file.
Flags: needinfo?(continuation)
Flags: needinfo?(birunthan)
(Reporter)

Comment 6

3 years ago
Looks like all the ni? got cleared here. Bhavana, based on the information provided so far is there anyone else we can loop in here?
Flags: needinfo?(bbajaj)
(Assignee)

Comment 7

3 years ago
Nathan and I are looking into it.
Assignee: nobody → continuation
Flags: needinfo?(bbajaj)
(Assignee)

Comment 8

3 years ago
Is this a crash that happened a single time, or one that has come up more than once?
(Reporter)

Comment 9

3 years ago
This crash came up 4 times during the test run.
Exact same crash report (registers, crash address, etc.) each time?
(Reporter)

Comment 11

3 years ago
I can have cafbot upload all the decoded minidumps.
Created attachment 8588071 [details]
EXTRA file attachment -
Created attachment 8588072 [details]
decoded minidump -
Created attachment 8588073 [details]
EXTRA file attachment -
Created attachment 8588074 [details]
decoded minidump -
Created attachment 8588075 [details]
EXTRA file attachment -
Created attachment 8588076 [details]
decoded minidump -
(Assignee)

Comment 18

3 years ago
Thanks.  Yes, all of the crashes are on the same address, 0x1000000.

Updated

3 years ago
Whiteboard: [CR 817665]

Updated

3 years ago
Whiteboard: [CR 817665] → [caf priority: p1][CR 817665]

Updated

3 years ago
Whiteboard: [caf priority: p1][CR 817665] → [b2g-crash][caf-crash 598][caf priority: p1][CR 817665]

Updated

3 years ago
Keywords: crash
Created attachment 8588106 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.119
Created attachment 8588107 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.119
(Assignee)

Comment 22

3 years ago
I've been reading through this code but I haven't come up with anything in particular.

The best idea I've had is to write some code that looks for this value we keep crashing on, and print out the particular observer message thing we're looking for if it finds it, then crashes.  If we managed to hit that code, it would give us a better idea where things are going wrong, and, if it is related to the actual observer, a smaller amount of code to look at.

Comment 23

3 years ago
Do you want to provide a logging patch to identify the issue?
Flags: needinfo?(continuation)
(Assignee)

Comment 24

3 years ago
I'll work on a logging patch.

Benjamin, do you have any ideas?  As perhaps the person most familiar with nsObserverService.
Flags: needinfo?(benjamin)

Comment 25

3 years ago
Crash reason:  SIGSEGV
Crash address: 0x1000000

That's a totally suspicious crash address (it's the content of r0).

Not knowing much about b2g release engineering, do we have the disassembly of nsQueryInterface::operator() so that we know what is supposed to be in r0 at the time of this crash? And do we have a core which will show us whether 0x1000000 is a reasonable or crazy address?

I am automatically suspicious here also because this is inside of a hashtable enumeration: if anyone is modifying the observer hashtable from within this stack, then the enumeration is bad:

nsObserverList::UnmarkGrayStrongObservers
-> xpc_TryUnmarkWrappedGrayObject
Flags: needinfo?(benjamin)
(Assignee)

Comment 26

3 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #25)
> I am automatically suspicious here also because this is inside of a
> hashtable enumeration: if anyone is modifying the observer hashtable from
> within this stack, then the enumeration is bad:
> 
> nsObserverList::UnmarkGrayStrongObservers
> -> xpc_TryUnmarkWrappedGrayObject

Yeah, I suppose I should do address that before I go on more of a fishing expedition.  xpc_TryUnmarkWrappedGrayObject does a QI then an UnmarkGrayGCThing.  The latter is a JS engine thing that can't GC according to the static analysis, so that should be okay, but a QI could do anything, including triggering a GC, so that would be bad.

Updated

3 years ago
blocking-b2g: 2.2? → 2.2+
(Assignee)

Comment 27

3 years ago
Created attachment 8588758 [details] [diff] [review]
Be sure we aren't iterating over a hashtable while it is modified in nsObserverService::UnmarkGrayStrongObservers().

I have only tested this a little bit locally, but maybe this will help, if you want to try it out.

This copies out all strong observers into a single array, then does the unmarking iteration over that. I could certainly see a QI for a JS-implemented thing somehow triggering a GC and causing observer problems, though I wasn't able to trigger that locally.

We'll see how the try run goes:
 https://treeherder.mozilla.org/#/jobs?repo=try&revision=85ff1816a341
(Assignee)

Updated

3 years ago
Flags: needinfo?(continuation)
(Assignee)

Comment 28

3 years ago
The try run looks fine.  You could see if this patch fixes the crash.
Flags: needinfo?(ikumar)

Comment 29

3 years ago
Thanks.
Nick/Greg - can you please land it in our build.
Flags: needinfo?(ntroast)
Flags: needinfo?(ikumar)
Flags: needinfo?(ggrisco)
(Assignee)

Comment 30

3 years ago
Created attachment 8588813 [details] [diff] [review]
Be sure we aren't iterating over a hashtable while it is modified in nsObserverService::UnmarkGrayStrongObservers().

I just renamed a function and removed a comment.  This probably makes sense to land even if it doesn't fix this particular issue.
Attachment #8588813 - Flags: review?(nfroyd)
(Assignee)

Updated

3 years ago
Attachment #8588758 - Attachment is obsolete: true

Comment 31

3 years ago
(In reply to Inder from comment #29)
> Thanks.
> Nick/Greg - can you please land it in our build.

Ok, working on it.
Flags: needinfo?(ntroast)
Flags: needinfo?(ggrisco)
Comment on attachment 8588813 [details] [diff] [review]
Be sure we aren't iterating over a hashtable while it is modified in nsObserverService::UnmarkGrayStrongObservers().

Review of attachment 8588813 [details] [diff] [review]:
-----------------------------------------------------------------

I think making the commit comment a little more explicit, something like:

"It may be the case that the QI in xpc_TryUnmarkWrappedGrayObject could cause a GC, which might cause an observer to be destroyed, which might removeObserver() itself, which might mutate the hashtable that we're attempting to enumerate."

would be good.  I didn't fully grok what was going on until mccr8 made it clear to me on IRC.
Attachment #8588813 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 33

3 years ago
Updated the comment to be more explicit.  Thanks for the review.

https://hg.mozilla.org/integration/mozilla-inbound/rev/074e4fb089b1
Keywords: leave-open

Comment 34

3 years ago
Comment on attachment 8588813 [details] [diff] [review]
Be sure we aren't iterating over a hashtable while it is modified in nsObserverService::UnmarkGrayStrongObservers().

[Triage Comment]
Attachment #8588813 - Flags: approval-mozilla-b2g37+
Depends on: 1153373

Comment 39

3 years ago
Since this is landed on 2.2.
Mark the status as FIXED.

--
Keven
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
(Assignee)

Updated

3 years ago
Keywords: leave-open
status-b2g-v2.2: --- → fixed
status-b2g-master: --- → fixed
status-firefox38: --- → wontfix
status-firefox39: --- → wontfix
status-firefox40: --- → fixed
Target Milestone: --- → 2.2 S10 (17apr)
You need to log in before you can comment on or make changes to this bug.