Closed Bug 1150916 Opened 9 years ago Closed 9 years ago

[Crash][@ nsQueryInterface::operator() | nsCOMPtr_base::assign_from_qi]

Categories

(Firefox OS Graveyard :: Stability, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.2+, firefox38 wontfix, firefox39 wontfix, firefox40 fixed, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED FIXED
2.2 S10 (17apr)
blocking-b2g 2.2+
Tracking Status
firefox38 --- wontfix
firefox39 --- wontfix
firefox40 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: ntroast, Assigned: mccr8)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 598][caf priority: p1][CR 817665])

Attachments

(11 files, 1 obsolete file)

We observed the following crash signature during testing.

[@ nsQueryInterface::operator() | nsCOMPtr_base::assign_from_qi | xpc_TryUnmarkWrappedGrayObject | nsObserverList::UnmarkGrayStrongObservers ]

Cafbot will upload the decoded minidump and extra file.
NI Birunthan/Froyd, to start with as they seem to have touched this area of code or can help direct to right people.

Please note, this is a critical FXOS reqression impacting our critical release milestone and any help is appreciated.
Flags: needinfo?(nfroyd)
Flags: needinfo?(birunthan)
Uh.  Looks like we have a bogus nsIObserver pointer while we're notifying observers during a CC while we're minimizing memory?  ni?'ing mccr8 to see if the CC stack looks like something recognizable.
Flags: needinfo?(nfroyd) → needinfo?(continuation)
I haven't seen anything like this in particular before.  It looks like we're QIing something and it crashes in there?  The crash address, 0x1000000, is odd.  Could the nsIObserver pointer be null?  I'd hope we'd be safe from totally bogus pointers, but maybe not.

poiru just did style clean up for this file.
Flags: needinfo?(continuation)
Flags: needinfo?(birunthan)
Looks like all the ni? got cleared here. Bhavana, based on the information provided so far is there anyone else we can loop in here?
Flags: needinfo?(bbajaj)
Nathan and I are looking into it.
Assignee: nobody → continuation
Flags: needinfo?(bbajaj)
Is this a crash that happened a single time, or one that has come up more than once?
This crash came up 4 times during the test run.
Exact same crash report (registers, crash address, etc.) each time?
I can have cafbot upload all the decoded minidumps.
Thanks.  Yes, all of the crashes are on the same address, 0x1000000.
Whiteboard: [CR 817665]
Whiteboard: [CR 817665] → [caf priority: p1][CR 817665]
Whiteboard: [caf priority: p1][CR 817665] → [b2g-crash][caf-crash 598][caf priority: p1][CR 817665]
Keywords: crash
I've been reading through this code but I haven't come up with anything in particular.

The best idea I've had is to write some code that looks for this value we keep crashing on, and print out the particular observer message thing we're looking for if it finds it, then crashes.  If we managed to hit that code, it would give us a better idea where things are going wrong, and, if it is related to the actual observer, a smaller amount of code to look at.
Do you want to provide a logging patch to identify the issue?
Flags: needinfo?(continuation)
I'll work on a logging patch.

Benjamin, do you have any ideas?  As perhaps the person most familiar with nsObserverService.
Flags: needinfo?(benjamin)
Crash reason:  SIGSEGV
Crash address: 0x1000000

That's a totally suspicious crash address (it's the content of r0).

Not knowing much about b2g release engineering, do we have the disassembly of nsQueryInterface::operator() so that we know what is supposed to be in r0 at the time of this crash? And do we have a core which will show us whether 0x1000000 is a reasonable or crazy address?

I am automatically suspicious here also because this is inside of a hashtable enumeration: if anyone is modifying the observer hashtable from within this stack, then the enumeration is bad:

nsObserverList::UnmarkGrayStrongObservers
-> xpc_TryUnmarkWrappedGrayObject
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #25)
> I am automatically suspicious here also because this is inside of a
> hashtable enumeration: if anyone is modifying the observer hashtable from
> within this stack, then the enumeration is bad:
> 
> nsObserverList::UnmarkGrayStrongObservers
> -> xpc_TryUnmarkWrappedGrayObject

Yeah, I suppose I should do address that before I go on more of a fishing expedition.  xpc_TryUnmarkWrappedGrayObject does a QI then an UnmarkGrayGCThing.  The latter is a JS engine thing that can't GC according to the static analysis, so that should be okay, but a QI could do anything, including triggering a GC, so that would be bad.
blocking-b2g: 2.2? → 2.2+
I have only tested this a little bit locally, but maybe this will help, if you want to try it out.

This copies out all strong observers into a single array, then does the unmarking iteration over that. I could certainly see a QI for a JS-implemented thing somehow triggering a GC and causing observer problems, though I wasn't able to trigger that locally.

We'll see how the try run goes:
 https://treeherder.mozilla.org/#/jobs?repo=try&revision=85ff1816a341
Flags: needinfo?(continuation)
The try run looks fine.  You could see if this patch fixes the crash.
Flags: needinfo?(ikumar)
Thanks.
Nick/Greg - can you please land it in our build.
Flags: needinfo?(ntroast)
Flags: needinfo?(ikumar)
Flags: needinfo?(ggrisco)
I just renamed a function and removed a comment.  This probably makes sense to land even if it doesn't fix this particular issue.
Attachment #8588813 - Flags: review?(nfroyd)
Attachment #8588758 - Attachment is obsolete: true
(In reply to Inder from comment #29)
> Thanks.
> Nick/Greg - can you please land it in our build.

Ok, working on it.
Flags: needinfo?(ntroast)
Flags: needinfo?(ggrisco)
Comment on attachment 8588813 [details] [diff] [review]
Be sure we aren't iterating over a hashtable while it is modified in nsObserverService::UnmarkGrayStrongObservers().

Review of attachment 8588813 [details] [diff] [review]:
-----------------------------------------------------------------

I think making the commit comment a little more explicit, something like:

"It may be the case that the QI in xpc_TryUnmarkWrappedGrayObject could cause a GC, which might cause an observer to be destroyed, which might removeObserver() itself, which might mutate the hashtable that we're attempting to enumerate."

would be good.  I didn't fully grok what was going on until mccr8 made it clear to me on IRC.
Attachment #8588813 - Flags: review?(nfroyd) → review+
Updated the comment to be more explicit.  Thanks for the review.

https://hg.mozilla.org/integration/mozilla-inbound/rev/074e4fb089b1
Keywords: leave-open
Comment on attachment 8588813 [details] [diff] [review]
Be sure we aren't iterating over a hashtable while it is modified in nsObserverService::UnmarkGrayStrongObservers().

[Triage Comment]
Attachment #8588813 - Flags: approval-mozilla-b2g37+
Depends on: 1153373
Since this is landed on 2.2.
Mark the status as FIXED.

--
Keven
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Keywords: leave-open
Target Milestone: --- → 2.2 S10 (17apr)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: