Crash in InvalidArrayIndex_CRASH | nsTArray_Impl<T>::operator[] | mozilla::a11y::AccessibleWrap::GetRemoteIAccessibleFor

RESOLVED FIXED in Firefox 57

Status

()

P2
critical
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: MarcoZ, Assigned: aklotz)

Tracking

(Blocks: 1 bug, {crash, regression})

Trunk
mozilla58
x86
Windows 10
crash, regression
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 disabled, firefox56 disabled, firefox57 fixed, firefox58 fixed)

Details

(Whiteboard: [AV:Webroot SecureAnywhere][aes+], crash signature)

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is 
report bp-f319733b-ba0b-42c9-a426-191dd2170206.
=============================================================
I got this crash after Nightly froze again while I was working in a tab. This time, I was actually waiting for Jet Airways to process a booking update. Nightly froze, and I was forced to right-click its taskbar icon and select Close Window. Crash happened immediately afterwards. Leaving this here FYI, and perhaps you see something significant in the stack.

The site works fine with E10S off.
So, The only way I can see an out of bounds array access there is for the array to be mutated while we iterate over it, which really shouldn't be happening.  The only way I can see for it to happen is that while GetProxyAccessibleInSubtree() is makeing remote com calls to the content process the parent process handles IPC messages.  Aron does that seem like something that can be happening?
Flags: needinfo?(aklotz)
(Assignee)

Updated

2 years ago
Flags: needinfo?(aklotz)
Whiteboard: aes+
(Assignee)

Comment 2

2 years ago
(In reply to Trevor Saunders (:tbsaunde) from comment #1)
> So, The only way I can see an out of bounds array access there is for the
> array to be mutated while we iterate over it, which really shouldn't be
> happening.  The only way I can see for it to happen is that while
> GetProxyAccessibleInSubtree() is makeing remote com calls to the content
> process the parent process handles IPC messages.  Aron does that seem like
> something that can be happening?

I wouldn't expect that to happen given that the parent main thread is blocked on a remote COM call. ie, it shouldn't be able to dispatch any incoming IPDL while waiting for a reply from COM.

OTOH, since the main thread is in the single-threaded apartment, COM is continuously pumping messages while waiting for that reply. If we handle a message in such a way that we inadvertently trigger a nested event loop that ends up dispatching IPDL, then I guess it could happen.
really low volume crash, removing from our tracking list.
Whiteboard: aes+
Crash Signature: [@ InvalidArrayIndex_CRASH | nsTArray_Impl<T>::operator[] | mozilla::a11y::AccessibleWrap::GetRemoteIAccessibleFor] → [@ InvalidArrayIndex_CRASH | nsTArray_Impl<T>::operator[] | mozilla::a11y::AccessibleWrap::GetRemoteIAccessibleFor] [@ InvalidArrayIndex_CRASH | mozilla::a11y::AccessibleWrap::GetRemoteIAccessibleFor ]
(In reply to alexander :surkov from comment #4)
> Marco, do you have steps to reproduce?

Unfortunately not. I only got this once directly before I filed it. Haven't seen it happening since.

Comment 6

a year ago
the signature is regressing in volume during the 57.0b cycle.

one striking correlation is (61.88% in signature vs 00.34% overall) Module "WRusr.dll" = true
status-firefox56: --- → disabled
status-firefox57: --- → affected
status-firefox58: --- → affected
status-firefox-esr52: --- → disabled
Keywords: regression
Whiteboard: [AV:Webroot SecureAnywhere]
Currently #17 in the browser crash list. Wondering if there's something we can do here, this looks like some sort of race - 

https://crash-stats.mozilla.com/report/index/71c5d4c2-ea2a-4e73-aafa-1dc120171023
Whiteboard: [AV:Webroot SecureAnywhere] → [AV:Webroot SecureAnywhere][aes+]
(Assignee)

Comment 8

a year ago
Frame 36 is interesting, looks like COM reentry caused a mutation of the array in question.
(In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #2)
> (In reply to Trevor Saunders (:tbsaunde) from comment #1)
> > So, The only way I can see an out of bounds array access there is for the
> > array to be mutated while we iterate over it, which really shouldn't be
> > happening.  The only way I can see for it to happen is that while
> > GetProxyAccessibleInSubtree() is makeing remote com calls to the content
> > process the parent process handles IPC messages.  Aron does that seem like
> > something that can be happening?
> 
> I wouldn't expect that to happen given that the parent main thread is
> blocked on a remote COM call. ie, it shouldn't be able to dispatch any
> incoming IPDL while waiting for a reply from COM.
> 
> OTOH, since the main thread is in the single-threaded apartment, COM is
> continuously pumping messages while waiting for that reply. If we handle a
> message in such a way that we inadvertently trigger a nested event loop that
> ends up dispatching IPDL, then I guess it could happen.

Any way to guard this?
Priority: P3 → P2
(Assignee)

Comment 10

a year ago
Jamie said that he was going to take a look at this.
Assignee: nobody → jteh
Status: NEW → ASSIGNED
This crash has a high correlation with Webroot anti-virus so it may not be 100% our fault:

(60.87% in signature vs 00.39% overall) Module "WRusr.dll" = true [74.29% vs 02.36% if platform_version = 10.0.15063]
(Assignee)

Comment 12

a year ago
I'm going to take this since we need to land something ASAP for Beta. A band-aid fix is simple enough...
Assignee: jteh → aklotz
(Assignee)

Comment 13

a year ago
This patch isn't perfect, but it's a low-risk fix that is acceptable to beta. Since GetProxiedAccessibleInSubtree makes a remote COM call, it is possible for something to reenter and mutate the remoteDocs array while that call is pending.

By moving the length check into the loop's condition, we ensure that we always verify that the length is still valid.

I think a better long-term fix involves a bunch of stuff with iterators which I don't want to introduce for such a tight timeline.
Attachment #8923571 - Flags: review?(yzenevich)
(Assignee)

Comment 14

a year ago
Comment on attachment 8923571 [details] [diff] [review]
Always re-check the array length

Actually, Jamie, would you mind looking at this?
Attachment #8923571 - Flags: review?(yzenevich) → review?(jteh)
Comment on attachment 8923571 [details] [diff] [review]
Always re-check the array length

Nice! Please add a brief comment explaining that we're doing this to deal with the array length changing due to COM re-entry. I'm concerned that someone down the track might think this was a mistake and try to optimise it out. :) Also, don't forget to change the r= in the commit message.
Attachment #8923571 - Flags: review?(jteh) → review+
(Assignee)

Comment 16

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/1b310709a1998f913245e2aafe17fe3c95b5906e
Bug 1336971: Ensure that we always re-examine the length of the top-level remote doc array to pick up any changes due to mutation; r=Jamie

Comment 17

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/1b310709a199
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
status-firefox58: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Please request Beta approval on this when you get a chance.
Flags: needinfo?(aklotz)
(Assignee)

Comment 19

a year ago
Comment on attachment 8923571 [details] [diff] [review]
Always re-check the array length

Approval Request Comment
[Feature/Bug causing the regression]: Windows a11y on e10s
[User impact if declined]: crashes
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: Really simple fix
[String changes made/needed]: None
Flags: needinfo?(aklotz)
Attachment #8923571 - Flags: approval-mozilla-beta?
Comment on attachment 8923571 [details] [diff] [review]
Always re-check the array length

Seems low risk, crash fix with decent volume on beta, beta57+
Attachment #8923571 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 21

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/05e050d6c868
status-firefox57: affected → fixed
(In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #19)
> [Is this code covered by automated tests?]: Yes
> [Has the fix been verified in Nightly?]: Yes
> [Needs manual test from QE? If yes, steps to reproduce]: No

Setting qe-verify- based on Aaron's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.