Closed Bug 1270916 Opened 3 years ago Closed 3 years ago

Crash in IPCError-browser | (msgtype=0x2C0004,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri

Categories

(Core :: Disability Access APIs, defect, P2, critical)

x86_64
Windows 10
defect

Tracking

()

RESOLVED FIXED
Tracking Status
e10s + ---
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 + fixed
firefox53 --- fixed

People

(Reporter: kbrosnan, Assigned: tbsaunde)

References

(Depends on 4 open bugs, Blocks 1 open bug)

Details

(Keywords: crash, Whiteboard: aes+)

Crash Data

Attachments

(13 files)

3.31 KB, patch
davidb
: review+
Details | Diff | Splinter Review
2.06 KB, patch
davidb
: review+
Details | Diff | Splinter Review
2.45 KB, patch
davidb
: review+
Details | Diff | Splinter Review
2.73 KB, patch
davidb
: review+
Details | Diff | Splinter Review
1.54 KB, patch
davidb
: review+
Details | Diff | Splinter Review
4.56 KB, patch
davidb
: review+
Details | Diff | Splinter Review
1.67 KB, patch
davidb
: review+
Details | Diff | Splinter Review
3.33 KB, patch
davidb
: review+
Details | Diff | Splinter Review
22.91 KB, patch
davidb
: review+
Details | Diff | Splinter Review
1.86 KB, patch
davidb
: review+
Details | Diff | Splinter Review
1.26 KB, patch
davidb
: review+
Details | Diff | Splinter Review
18.79 KB, patch
davidb
: review+
surkov
: feedback-
Details | Diff | Splinter Review
1.50 KB, patch
davidb
: review+
surkov
: feedback+
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is 
report bp-6c34c50b-23e3-4ca4-8ea2-91c142160506.
=============================================================

Crash in private browsing sometimes.
Component: IPC → Disability Access APIs
(In reply to Kevin Brosnan [:kbrosnan] from comment #0)
> This bug was filed from the Socorro interface and is 
> report bp-6c34c50b-23e3-4ca4-8ea2-91c142160506.
> =============================================================
> 
> Crash in private browsing sometimes.

do you have some kind of str?

There's not really much to go on here, TabParent::RecvPDocAccessibleConstructor returned false for some reason, but there's a couple reasons that could happen, and without any real data its hard to know which happened.  Even if we knew which it was odds are that would just be exposing some bug in what the child process is doing, and we'd need to look at the series of messages coming from the child process.
Force enable e10s on a touch screen W10 machine. Chance it could be related to fc67f24d-2726-4247-acf6-7dd4b2160504 / [@ mozilla::a11y::DocAccessibleParent::RecvShowEvent ]

I was seeing this crash around once a day. As of right now it has been ~3 days since I've seen the crash.
Would it help to turn those different failure cases into different NS_RUNTIMEABORT calls, perhaps with additional information in diagnostics (via nsPrintfCString or via crash report annotations), to distinguish which case people are hitting here?

(Note that that would change the crash signature.  Maybe even change a child process crash into a parent process crash???  But it would give you more information...)
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #3)
> Would it help to turn those different failure cases into different
> NS_RUNTIMEABORT calls, perhaps with additional information in diagnostics
> (via nsPrintfCString or via crash report annotations), to distinguish which
> case people are hitting here?
> 
> (Note that that would change the crash signature.  Maybe even change a child
> process crash into a parent process crash???  But it would give you more
> information...)

I've considered that, and it might help, but I'm afraid we need to know enough about what's happening in the child it'll not be that useful.  So for now I'd rather spend time fixing bug 1272706 and anything similar we can quickly find by adding more tests of e10s a11y.  Its very believable that bug causes these errors or at least some of them.
Flags: needinfo?(tbsaunde+mozbugs)
In case it helps:
* A comment from one crash: "Content gradually got slower and slower until crash"
* This URL seems to be common: https://www.torrentday.com/browse.php
Duplicate of this bug: 1195940
Whiteboard: aes-win
Whiteboard: aes-win → aes+
Priority: -- → P2
I can repro this consistently on my Surface Book. Since it is holding up my delivery of a11y+e10s builds, I'll take it. :-)
Assignee: nobody → aklotz
Blocks: 1258839
I can't repro this anymore :-(
Assignee: aklotz → nobody
No longer blocks: 1258839
Crash volume for signature 'IPCError-browser | (msgtype=0x2C0004,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri':
 - nightly (version 50): 296 crashes from 2016-06-06.
 - aurora  (version 49): 281 crashes from 2016-06-07.
 - beta    (version 48): 218 crashes from 2016-06-06.
 - release (version 47): 1 crash from 2016-05-31.
 - esr     (version 45): 0 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       4      33      45      50      51      74      21
 - aurora       31      20      32      48      58      48      30
 - beta         46      19      14      13      52      26      30
 - release       0       0       0       1       0       0       0
 - esr           0       0       0       0       0       0       0

Affected platforms: Windows, Mac OS X, Linux
This is on 48 beta so it' likely show up in release. It's not tied to accessibility clients either, but is in the accessibility code. David is this something your team can look at? Is it something we should be concerned about?
Flags: needinfo?(dbolter)
FWIW, this is not a top crasher currently.
(In reply to Jim Mathies [:jimm] from comment #10)
> This is on 48 beta so it' likely show up in release. It's not tied to
> accessibility clients either, but is in the accessibility code. David is
> this something your team can look at? Is it something we should be concerned
> about?

Comment 2 STR is "Force enable e10s on a touch screen W10 machine", will this be something to expect users to do on Release? I think we're counting on W10 touch screen users to quietly use single process right?

It is definitely something we want to fix for 51/a11y-e10s ship.
Flags: needinfo?(dbolter)
(In reply to David Bolter [:davidb] from comment #12)
> (In reply to Jim Mathies [:jimm] from comment #10)
> > This is on 48 beta so it' likely show up in release. It's not tied to
> > accessibility clients either, but is in the accessibility code. David is
> > this something your team can look at? Is it something we should be concerned
> > about?
> 
> Comment 2 STR is "Force enable e10s on a touch screen W10 machine", will
> this be something to expect users to do on Release? I think we're counting
> on W10 touch screen users to quietly use single process right?
> 
> It is definitely something we want to fix for 51/a11y-e10s ship.

I'm not sure. I guess these numbers could reflect forced enable users. They seem a bit high to me, hard to say.

- beta         46      19      14      13      52      26      30
This is the #3 (with just 23 crashes) in the top crasher list for 48 release with e10s enabled: https://crash-stats.mozilla.com/search/?product=Firefox&version=48.0&dom_ipc_enabled=%21__null__&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature.

There's a very small number of crashes on 48 release with e10s enabled for now, so I'm not sure whether this information is interesting or not.
Helping this stay on Trevor's radar.
Assignee: nobody → tbsaunde+mozbugs
Crash Signature: [@ IPCError-browser | (msgtype=0x2C0004,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] → [@ IPCError-browser | (msgtype=0x2C0004,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] [@ IPCError-browser | (msgtype=0x2E0004,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri]
Duplicate of this bug: 1293555
I force-enable e10s on my Windows 10 laptop. I hit this PDocAccessibleConstructor multiple times per day (or per hour sometimes) in Nightly 51, but I seem to rarely hit it in Aurora 50.
Duplicate of this bug: 1303728
[Tracking Requested - why for this release]: I would like to ship proper APZ touch support in 52 which requires e10s. The accessibility code needs to not crash with e10s enabled on touch devices for a good user experience.
Tracking 52+ for the reason in Comment 20.
Hi :tbsaunde,
can you help shed some light here?
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to Gerry Chang [:gchang] from comment #22)
> Hi :tbsaunde,
> can you help shed some light here?

I was under the impression, and my searching crash-stats seems to agree that this crash has disappeared for unclear reasons.  So I'm not sure there's something to do here?
Flags: needinfo?(tbsaunde+mozbugs)
I think the signature morphs. Best to search on "Msg_PDocAccessibleConstructor".
Crash Signature: [@ IPCError-browser | (msgtype=0x2C0004,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] [@ IPCError-browser | (msgtype=0x2E0004,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] → [@ IPCError-browser | PBrowser::Msg_PDocAccessibleConstructor Processing error: message was deserialized, but the handler] [@ IPCError-browser | (msgtype=0x300004,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] [@ IPC…
Hi David,
From the first 2 signatures, I don't see any crashes on 51, does that mean it's gone for 51?
(In reply to David Bolter [:davidb] from comment #26)
> STR from bug 1307414 comment 0 is
> 
> 1) Go to http://theinspirationroom.com/daily/2016/samsung-touchable-ink/
> 2) Tab crashes

FWIW I filed the other bug originally, but I can not reproduce anymore. This seems to match Aaron's observation for reproducing in comment 7 and comment 8.

Maybe this was fixed due to some other change.
I'll mark 51 as wontfix. Feel free to change it if hit the crash again.
Crash Signature: ,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] → ,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] [@ IPCError-browser | PBrowser::Msg_PDocAccessibleConstructor Processing error: message was deserialized, but the handler ]
Trevor please give this priority.
Flags: needinfo?(dbolter) → needinfo?(tbsaunde+mozbugs)
Crash Signature: ,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] [@ IPCError-browser | PBrowser::Msg_PDocAccessibleConstructor Processing error: message was deserialized, but the handler ] → ,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] [@ IPCError-browser | PBrowser::Msg_PDocAccessibleConstructor Processing error: message was deserialized, but the handler]
Crash Signature: ,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] [@ IPCError-browser | PBrowser::Msg_PDocAccessibleConstructor Processing error: message was deserialized, but the handler] → ,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri]
#4 top crash on Windows for Nightly 20161028030204.
Crash Signature: ,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] → ,name=PBrowser::Msg_PDocAccessibleConstructor) Processing error: message was deseri] [@ IPCError-browser | PBrowser::Msg_PDocAccessibleConstructor Processing error: message was deserialized, but the handler ]
Double checking - I believe e10s + a11y + touchscreen is aimed at 52 and should be disabled by default on beta 51. Is that correct? Thanks.
Correct.
#2 crash in Nightly 20161104030212 with 204 occurrences.
Blocks: 1258839
#2 crash in Nightly with 237 occurrences.
Blocks: 1314707
Soon we will want to operate on either a reorder event or a mutation event.
Attachment #8809506 - Flags: review?(dbolter)
Soon we will use the list to track the order of the events.
Attachment #8809508 - Flags: review?(dbolter)
(In reply to Trevor Saunders (:tbsaunde) from comment #40)
> Created attachment 8809508 [details] [diff] [review]
> allow keeping AccTreeMutation in a list
> 
> Soon we will use the list to track the order of the events.

Is it something coming in replace of EventTree or it it supposed to work in parallel?

Can you share your thinking what makes us crashing and how the approach will address it.
Soon we will want to know if events should be emitted independt of EventTrees.
Attachment #8809597 - Flags: review?(dbolter)
They are pretty useless wrappers, and in the future it will be useful to have
  access to the event object independent of the EventTree.
Attachment #8809598 - Flags: review?(dbolter)
Mutation events are kept in a queue before firing.   The queue is only
coalesced when necessary, at present this is when queueing a hide event, or
just before firing all the events.  It may be possible to do without the
former, but that is left as future work.  The state of what types of events an
accessible is a target of is stored in the accessible.  Combining that with a
map from accessible and type pairs to events we can quickly remove unnecessary
events from the queue when we need to coalesce.
Attachment #8809707 - Flags: review?(dbolter)
This ensures that if creating the sub tree creates events to fire they will go
before the show event for the root of the tree.  It is fine to fire show events
for the subtree before the root because they will just get coalesced away
anyway.  However it is important that any hide events come before the hidden
subtree appears in the new tree.
Attachment #8809708 - Flags: review?(dbolter)
Since we will fire events in the correct order we can use the index of the
event target at the time the event is fired.  This protects from weird cases
where the target is inserted, and then children before the target are removed.
Attachment #8809709 - Flags: review?(dbolter)
This puts events in the queue instead of the event tree, and then fires them
based on the queue.  Some tests need to be adjusted to make sure they check
constraints on event order correctly.
Attachment #8809710 - Flags: review?(dbolter)
Now that these pass we can renabled them.
Attachment #8809711 - Flags: review?(dbolter)
(In reply to alexander :surkov from comment #41)
> (In reply to Trevor Saunders (:tbsaunde) from comment #40)
> > Created attachment 8809508 [details] [diff] [review]
> > allow keeping AccTreeMutation in a list
> > 
> > Soon we will use the list to track the order of the events.
> 
> Is it something coming in replace of EventTree or it it supposed to work in
> parallel?

the commit messages should explain that, and if they don't make it clear what they do then I'll fix that.

> Can you share your thinking what makes us crashing and how the approach will
> address it.

I forget exactly, but more or less the same issue that always causes this, bogus event ordering or data, so don't do that, and its fixed.
Flags: needinfo?(tbsaunde+mozbugs)
(In reply to Trevor Saunders (:tbsaunde) from comment #51)
> (In reply to alexander :surkov from comment #41)
> > (In reply to Trevor Saunders (:tbsaunde) from comment #40)
> > > Created attachment 8809508 [details] [diff] [review]
> > > allow keeping AccTreeMutation in a list
> > > 
> > > Soon we will use the list to track the order of the events.
> > 
> > Is it something coming in replace of EventTree or it it supposed to work in
> > parallel?
> 
> the commit messages should explain that, and if they don't make it clear
> what they do then I'll fix that.

there's a whole bunch of patches here, it'd be nice to get it answered explicitly.

> > Can you share your thinking what makes us crashing and how the approach will
> > address it.
> 
> I forget exactly, but more or less the same issue that always causes this,
> bogus event ordering or data, so don't do that, and its fixed.

bug 1294853 was supposed to address those cases, do you have examples, where bogus ordering happens?

do you have confirmation that the patch actually fixes the crash?
Comment on attachment 8809711 [details] [diff] [review]
enable the browser tests that where disabled for e10s

Review of attachment 8809711 [details] [diff] [review]:
-----------------------------------------------------------------

Alex this looks like a good sign.
Attachment #8809711 - Flags: feedback?(surkov.alexander)
Comment on attachment 8809711 [details] [diff] [review]
enable the browser tests that where disabled for e10s

Review of attachment 8809711 [details] [diff] [review]:
-----------------------------------------------------------------

do they crash/assert/fail without the patch? In bug 1272706 I suspected that the assertion will be fixed by bug 1294853, but it seems no one was around yet to check it.

this patch of course looks ok, so f+ :)
Attachment #8809711 - Flags: feedback?(surkov.alexander) → feedback+
Comment on attachment 8809499 [details] [diff] [review]
allow tracking in an accessible if it has a pending show / hide / reorder event

Review of attachment 8809499 [details] [diff] [review]:
-----------------------------------------------------------------

::: accessible/generic/Accessible.h
@@ +973,5 @@
> +
> +  /**
> +   * Set if there is a pending reorder event for this accessible.
> +   */
> +  void SetReorderEventTarget(bool aTarget) { mReorderEventTarget = aTarget; }

I'm not sure why these params are called aTarget?
Attachment #8809499 - Flags: review?(dbolter) → review+
Attachment #8809506 - Flags: review?(dbolter) → review+
Attachment #8809507 - Flags: review?(dbolter) → review+
Comment on attachment 8809508 [details] [diff] [review]
allow keeping AccTreeMutation in a list

Review of attachment 8809508 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, with caveat that we get some smoke testing and idea of perf before landing all these patches. I'm bolstered by the need for unbreaking e10s though...
Attachment #8809508 - Flags: review?(dbolter) → review+
Comment on attachment 8809597 [details] [diff] [review]
make TreeMutation track if it should be queueing events

Review of attachment 8809597 [details] [diff] [review]:
-----------------------------------------------------------------

::: accessible/base/EventTree.h
@@ +47,5 @@
>    Accessible* mParent;
>    uint32_t mStartIdx;
>    uint32_t mStateFlagsCopy;
>    EventTree* mEventTree;
> +  bool mQueueEvents;

I'd like a comment here.
Attachment #8809597 - Flags: review?(dbolter) → review+
Attachment #8809598 - Flags: review?(dbolter) → review+
Attachment #8809601 - Flags: review?(dbolter) → review+
Comment on attachment 8809603 [details] [diff] [review]
add a map from accessible and event type to an event

Review of attachment 8809603 [details] [diff] [review]:
-----------------------------------------------------------------

interesting
Attachment #8809603 - Flags: review?(dbolter) → review+
Comment on attachment 8809710 [details] [diff] [review]
switch to use the new mutation event queue system

Review of attachment 8809710 [details] [diff] [review]:
-----------------------------------------------------------------

::: accessible/tests/mochitest/events/test_coalescence.html
@@ +662,5 @@
>          new invokerChecker(EVENT_HIDE, getNode('t11_c2')),
> +        new orderChecker(),
> +        new asyncInvokerChecker(EVENT_SHOW, 't11_c2_child'),
> +        new asyncInvokerChecker(EVENT_SHOW, 't11_c2'),
> +        new orderChecker(),

what is async nature of these? and what is orderChecker (it's not part of this patch)?
(In reply to David Bolter [:davidb] from comment #56)
> Comment on attachment 8809508 [details] [diff] [review]
> allow keeping AccTreeMutation in a list
> 
> Review of attachment 8809508 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me, with caveat that we get some smoke testing and idea of perf before
> landing all these patches.

I'd be interesting to see the numbers too, coalescence by accessible tree should be slower than coalescence by EventTree as accessible tree is expected to be bigger. Also the approach makes not trivial (impossible?) the implementation of show/hide/reorder coalescence for large numbers (bug 1301829).

technically, if the crash problem is caused by unexpected show/hide events ordering, then it should be possible to go with approach of event tree that preserves event ordering.
(In reply to alexander :surkov from comment #59)
> Comment on attachment 8809710 [details] [diff] [review]
> switch to use the new mutation event queue system
> 
> Review of attachment 8809710 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: accessible/tests/mochitest/events/test_coalescence.html
> @@ +662,5 @@
> >          new invokerChecker(EVENT_HIDE, getNode('t11_c2')),
> > +        new orderChecker(),
> > +        new asyncInvokerChecker(EVENT_SHOW, 't11_c2_child'),
> > +        new asyncInvokerChecker(EVENT_SHOW, 't11_c2'),
> > +        new orderChecker(),
> 
> what is async nature of these? and what is orderChecker (it's not part of
> this patch)?

One thing I discovered that might help is to look at all the patches together and use find in page:
https://bugzilla.mozilla.org/attachment.cgi?bugid=1270916&action=viewall
(In reply to David Bolter [:davidb] from comment #61)

> > what is async nature of these? and what is orderChecker (it's not part of
> > this patch)?
> 
> One thing I discovered that might help is to look at all the patches
> together and use find in page:
> https://bugzilla.mozilla.org/attachment.cgi?bugid=1270916&action=viewall

it doesn't locate orderChecker too.
(In reply to alexander :surkov from comment #62)
> (In reply to David Bolter [:davidb] from comment #61)
> 
> > > what is async nature of these? and what is orderChecker (it's not part of
> > > this patch)?
> > 
> > One thing I discovered that might help is to look at all the patches
> > together and use find in page:
> > https://bugzilla.mozilla.org/attachment.cgi?bugid=1270916&action=viewall
> 
> it doesn't locate orderChecker too.

You're right! Hmm.
(In reply to alexander :surkov from comment #52)
> (In reply to Trevor Saunders (:tbsaunde) from comment #51)
> > (In reply to alexander :surkov from comment #41)
> > > (In reply to Trevor Saunders (:tbsaunde) from comment #40)
> > > > Created attachment 8809508 [details] [diff] [review]
> > > > allow keeping AccTreeMutation in a list
> > > > 
> > > > Soon we will use the list to track the order of the events.
> > > 
> > > Is it something coming in replace of EventTree or it it supposed to work in
> > > parallel?
> > 
> > the commit messages should explain that, and if they don't make it clear
> > what they do then I'll fix that.
> 
> there's a whole bunch of patches here, it'd be nice to get it answered
> explicitly.

its a replacement.

> > > Can you share your thinking what makes us crashing and how the approach will
> > > address it.
> > 
> > I forget exactly, but more or less the same issue that always causes this,
> > bogus event ordering or data, so don't do that, and its fixed.
> 
> bug 1294853 was supposed to address those cases, do you have examples, where
> bogus ordering happens?

the browser tests that still fail.

> do you have confirmation that the patch actually fixes the crash?

tests now succeed so it seems very likely at least some of the issues are fixed.
(In reply to alexander :surkov from comment #59)
> Comment on attachment 8809710 [details] [diff] [review]
> switch to use the new mutation event queue system
> 
> Review of attachment 8809710 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: accessible/tests/mochitest/events/test_coalescence.html
> @@ +662,5 @@
> >          new invokerChecker(EVENT_HIDE, getNode('t11_c2')),
> > +        new orderChecker(),
> > +        new asyncInvokerChecker(EVENT_SHOW, 't11_c2_child'),
> > +        new asyncInvokerChecker(EVENT_SHOW, 't11_c2'),
> > +        new orderChecker(),
> 
> what is async nature of these? and what is orderChecker (it's not part of
> this patch)?

its nondeterministic which comes first, but in a sense it always was, we're free to build independent trees in any order.

it comes from bug 1316164

(In reply to alexander :surkov from comment #60)
> (In reply to David Bolter [:davidb] from comment #56)
> > Comment on attachment 8809508 [details] [diff] [review]
> > allow keeping AccTreeMutation in a list
> > 
> > Review of attachment 8809508 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > r=me, with caveat that we get some smoke testing and idea of perf before
> > landing all these patches.
> 
> I'd be interesting to see the numbers too, coalescence by accessible tree
> should be slower than coalescence by EventTree as accessible tree is
> expected to be bigger. Also the approach makes not trivial (impossible?) the
> implementation of show/hide/reorder coalescence for large numbers (bug
> 1301829).

that bug is just a really bad idea.  Anyway the case that it wants to deal with should be pretty fast with this system.

> technically, if the crash problem is caused by unexpected show/hide events
> ordering, then it should be possible to go with approach of event tree that
> preserves event ordering.

its just software, but that code is seriously complicated and comes with no explanation of how it is supposed to work.
(In reply to David Bolter [:davidb] from comment #55)
> Comment on attachment 8809499 [details] [diff] [review]
> allow tracking in an accessible if it has a pending show / hide / reorder
> event
> 
> Review of attachment 8809499 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: accessible/generic/Accessible.h
> @@ +973,5 @@
> > +
> > +  /**
> > +   * Set if there is a pending reorder event for this accessible.
> > +   */
> > +  void SetReorderEventTarget(bool aTarget) { mReorderEventTarget = aTarget; }
> 
> I'm not sure why these params are called aTarget?

Its just should it be set to being a target, or not.
I don't really care what they are named if you have a better idea.

(In reply to David Bolter [:davidb] from comment #56)
> Comment on attachment 8809508 [details] [diff] [review]
> allow keeping AccTreeMutation in a list
> 
> Review of attachment 8809508 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me, with caveat that we get some smoke testing and idea of perf before
> landing all these patches. I'm bolstered by the need for unbreaking e10s
> though...

quickly testing a build I don't see anything wrong, and one of the sites in this bug that crashed a while back doesn't, but there could be many reasons for that.

talos couldn't find any statistically significant effect on performance.
Comment on attachment 8809707 [details] [diff] [review]
add methods to maintain a queue of mutation events to coalesce or fire

Review of attachment 8809707 [details] [diff] [review]:
-----------------------------------------------------------------

I bet I missed something... and I'm a little worried about not respecting DOM event order but we've already sort of gone down that road.

::: accessible/base/NotificationController.cpp
@@ +148,5 @@
> +  // Because we could be hiding the target of a show event we need to get rid
> +  // of any such events.  It may be possible to do less than coallesce all
> +  // events, however that is easiest.
> +  if (aEvent->GetEventType() == nsIAccessibleEvent::EVENT_HIDE) {
> +    CoalesceMutationEvents();

Do we know what kind of hit we get? (Let's see what Talos does)

@@ +297,5 @@
> +  mMutationMap.RemoveEvent(aEvent);
> +}
> +
> +void
> +NotificationController::CoalesceMutationEvents()

What is the expected time complexity for this method?

@@ +364,5 @@
> +          break;
> +        }
> +
> +        parent = parent->Parent();
> +        }

nit: remove two spaces of the above indent.

@@ +497,5 @@
> +                              event->mAccessible);
> +      if (!mDocument) {
> +        return;
> +      }
> +      }

nit: this last brace and the rest of the block (next 4 lines) are indented too much.

@@ +505,5 @@
> +        mDocument->ShutdownChildrenInSubtree(event->mAccessible);
> +      }
> +  }
> +
> +  // Group the show events by the parent of there target.

nit: "their"

@@ +518,5 @@
> +    showEvents.GetOrInsert(parent).AppendElement(event);
> +  }
> +
> +  // We need to fire show events for the children of an accessible in the order
> +  // of there indices at this point.  So sort each set of events for the same

nit: "their"

@@ +542,5 @@
> +    nsTArray<AccTreeMutationEvent*>& events = iter.Data();
> +    events.Sort(AccIdxComparator());
> +    for (AccTreeMutationEvent* event: events) {
> +      nsEventShell::FireEvent(event);
> +    if (!mDocument) {

nit: you lost indent here, please add it back for the rest of the 'for' block.

@@ +557,5 @@
> +
> +    }
> +  }
> +
> +  // Now we can fire the reorder events after all the show and hide events.

At some point I imagine event order will be spec'ed... I shudder to think about that.
Attachment #8809707 - Flags: review?(dbolter) → review+
Comment on attachment 8809708 [details] [diff] [review]
call CreateSubtree() before firing a show event for the tree root

Review of attachment 8809708 [details] [diff] [review]:
-----------------------------------------------------------------

hmm
Attachment #8809708 - Flags: review?(dbolter) → review+
Attachment #8809709 - Flags: review?(dbolter) → review+
Comment on attachment 8809710 [details] [diff] [review]
switch to use the new mutation event queue system

Review of attachment 8809710 [details] [diff] [review]:
-----------------------------------------------------------------

This really brings home that there is quite an event expectation change. How much it matters is unknown. This work may well bounce depending on real world testing.

::: accessible/base/EventTree.cpp
@@ +62,5 @@
>    if (static_cast<uint32_t>(aChild->mIndexInParent) < mStartIdx) {
>      mStartIdx = aChild->mIndexInParent + 1;
>    }
>  
> +  if (!mQueueEvents) {

Assuming we need to do this to ship e10s a11y right?

::: accessible/tests/mochitest/treeupdate/test_bug1189277.html
@@ +46,5 @@
>        }
>      }
>  
>      //enableLogging("tree");
> +    gA11yEventDumpToConsole = true;

nit: don't forget to comment this out.
Attachment #8809710 - Flags: review?(dbolter)
Attachment #8809710 - Flags: review+
Attachment #8809710 - Flags: feedback?(surkov.alexander)
Comment on attachment 8809711 [details] [diff] [review]
enable the browser tests that where disabled for e10s

Review of attachment 8809711 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #8809711 - Flags: review?(dbolter) → review+
Summary: this is a bit scary. That said, we have a train system and I'd like to see performance characteristics and feedback from the wild. So with that in mind, I'm good with this landing now.

Risk: I'm a terrible reviewer. I'm okay reviewing this because of e10s a11y urgency but I can't be critical path for follow up work in this code. Alexander will have a better understanding than me of whether this can ultimately replace the event tree work, given both performance in the wild but also needing to consider code simplicity.
(In reply to Trevor Saunders (:tbsaunde) from comment #65)

> > > +        new orderChecker(),
> > > +        new asyncInvokerChecker(EVENT_SHOW, 't11_c2_child'),
> > > +        new asyncInvokerChecker(EVENT_SHOW, 't11_c2'),
> > > +        new orderChecker(),
> > 
> > what is async nature of these? and what is orderChecker (it's not part of
> > this patch)?
> 
> its nondeterministic which comes first, but in a sense it always was, we're
> free to build independent trees in any order.

can you explain more? these mutations are caused by aria-owns, which are processed in same order as they were triggered. Also before these patches the test didn't excepted any nondeterministic ordering, and as far as I know, it had no intermittent failures, which would point at some async nature.

> it comes from bug 1316164

it was probably wrong bug number, at least it didn't give me any good explanation

> (In reply to alexander :surkov from comment #60)

> > I'd be interesting to see the numbers too, coalescence by accessible tree
> > should be slower than coalescence by EventTree as accessible tree is
> > expected to be bigger. Also the approach makes not trivial (impossible?) the
> > implementation of show/hide/reorder coalescence for large numbers (bug
> > 1301829).
> 
> that bug is just a really bad idea.

can you please comment into that bug, why you find this idea bad?

>  Anyway the case that it wants to deal
> with should be pretty fast with this system.

It's not about show/hide events only, it's about reorder events too, and the new system won't help with it, since it has to traverse subtrees for coalescence.

> > technically, if the crash problem is caused by unexpected show/hide events
> > ordering, then it should be possible to go with approach of event tree that
> > preserves event ordering.
> 
> its just software, but that code is seriously complicated and comes with no
> explanation of how it is supposed to work.

arguably the new system is not less complicated. agree EventTree could benefit from more detailed comments

(In reply to Trevor Saunders (:tbsaunde) from comment #66)

> > r=me, with caveat that we get some smoke testing and idea of perf before
> > landing all these patches. I'm bolstered by the need for unbreaking e10s
> > though...
> 
> quickly testing a build I don't see anything wrong, and one of the sites in
> this bug that crashed a while back doesn't, but there could be many reasons
> for that.
> 
> talos couldn't find any statistically significant effect on performance.

just for the record, a11y talos coverage is not good, so I wouldn't affirm we are in good shape based on talos result. 

There are two large real-life tests, which weren't yet added into the talos, bug 1268562 and bug 1264278, and it'd be interesting to see how the new system performs on them.
Comment on attachment 8809710 [details] [diff] [review]
switch to use the new mutation event queue system

Review of attachment 8809710 [details] [diff] [review]:
-----------------------------------------------------------------

Assuming feedback was requested for the approach in general, I set f- as no enough justification was given for switching between event coalesce systems. Having said that, I don't want the f- considered as a stopper for the patch as it possibly addresses the crashes.

Concerns:
* no trustworthy perf testing was done
* there's new async behavior in show events (not yet clearly explained)
* the approach prevents us from further event coalescence perfromance improvements
* no good justification was given why we need to replace old system on a new one
* this is fairly big and code complicated change late in the game, which may make us unstable for a while
Attachment #8809710 - Flags: feedback?(surkov.alexander) → feedback-
(In reply to David Bolter [:davidb] from comment #67)
> Comment on attachment 8809707 [details] [diff] [review]
> add methods to maintain a queue of mutation events to coalesce or fire
> 
> Review of attachment 8809707 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I bet I missed something... and I'm a little worried about not respecting
> DOM event order but we've already sort of gone down that road.

This actually kind of goes back to pre event tree order somewhat, in that most hide events used to come first, and now they do again.

> ::: accessible/base/NotificationController.cpp
> @@ +148,5 @@
> > +  // Because we could be hiding the target of a show event we need to get rid
> > +  // of any such events.  It may be possible to do less than coallesce all
> > +  // events, however that is easiest.
> > +  if (aEvent->GetEventType() == nsIAccessibleEvent::EVENT_HIDE) {
> > +    CoalesceMutationEvents();
> 
> Do we know what kind of hit we get? (Let's see what Talos does)

vs the other method suggested there? no, though that is certainly an interesting thing  to investigate, but I didn't want to spend the time.

as I said vs event tree on the tests we have there's no statistically noticeable difference when I ran the tests.

> @@ +297,5 @@
> > +  mMutationMap.RemoveEvent(aEvent);
> > +}
> > +
> > +void
> > +NotificationController::CoalesceMutationEvents()
> 
> What is the expected time complexity for this method?

its really hard to give a rigorous algorithmic complexity for anything involving any coalescing algorithm because you need to figure what the worst possible thing is and that's not necessarily obvious.

its probably more interesting to look at what common cases are anyway, and while this should do better than the event tree code in some cases I'm not sure if those are more or less common.
Keywords: leave-open
Pushed by tsaunders@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/35ebbb5e0d02
allow tracking in an accessible if it has a pending show / hide / reorder event r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/baca5dd6bab6
add a common base class of AccMutationEvent and AccReorderEvent r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/2829bb27b8c4
allow downcasting AccEvent to AccTreeMutationEvent r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/10d579e5dec1
allow keeping AccTreeMutation in a list r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/9b27e6ae9481
make TreeMutation track if it should be queueing events r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/b2694ad14de7
allow tracking when a mutation event was fired relative to other mutation events r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/074b41be0f3e
add a map from accessible and event type to an event r=davidb
Sorry had to back these out for bustage, such as https://treeherder.mozilla.org/logviewer.html#?job_id=39283229&repo=mozilla-inbound#L4134
Flags: needinfo?(tbsaunde+mozbugs)
Pushed by tsaunders@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8e5f19f19b7f
allow tracking in an accessible if it has a pending show / hide / reorder event r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/4c58195d6bcf
add a common base class of AccMutationEvent and AccReorderEvent r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd68b5fd88a6
allow downcasting AccEvent to AccTreeMutationEvent r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/91a8f65a69fe
allow keeping AccTreeMutation in a list r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/eefa622b467a
make TreeMutation track if it should be queueing events r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/a9d6410f068e
remove EventTree::{Shown,Hidden}() r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/034efc9a4408
allow tracking when a mutation event was fired relative to other mutation events r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8ead1b86059
add a map from accessible and event type to an event r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ca49b5d1c51
add methods to maintain a queue of mutation events to coalesce or fire r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/60189bf08272
call CreateSubtree() before firing a show event for the tree root
https://hg.mozilla.org/integration/mozilla-inbound/rev/8ee47068d1fb
use accessible->IndexInParent() in DocAccessibleChildBase::ShowEvent() r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/916cbaf21a63
switch to use the new mutation event queue system r=davidb
Pushed by tsaunders@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8f2d9f13579c
allow tracking in an accessible if it has a pending show / hide / reorder event r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/72e9db0aa74f
add a common base class of AccMutationEvent and AccReorderEvent r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/15e1b251bcc7
allow downcasting AccEvent to AccTreeMutationEvent r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/a61d922a5ae3
allow keeping AccTreeMutation in a list r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/e89942576524
make TreeMutation track if it should be queueing events r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/2c044a48a938
remove EventTree::{Shown,Hidden}() r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/9814186062bc
allow tracking when a mutation event was fired relative to other mutation events r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/f4585a8624ee
add a map from accessible and event type to an event r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/0919d58c4e9e
add methods to maintain a queue of mutation events to coalesce or fire r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/1b22ff1fe4a6
call CreateSubtree() before firing a show event for the tree root
https://hg.mozilla.org/integration/mozilla-inbound/rev/43708cd293fa
use accessible->IndexInParent() in DocAccessibleChildBase::ShowEvent() r=davidb
https://hg.mozilla.org/integration/mozilla-inbound/rev/387d3acae9e9
switch to use the new mutation event queue system r=davidb
\o/ woot!
Keywords: leave-open
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
+1 to the woot!

Clearing bustage NI on Trevor.

+NI: just want this on Marco's radar as a milestone for QA/perf testing, and for context with any reports from the wild.
Flags: needinfo?(tbsaunde+mozbugs) → needinfo?(mzehe)
Pushed by tsaunders@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0371c68cb129
enable the browser tests that were disabled for e10s r=davidb
Blocks: 1318725
I am now back from sick leave and am running with the latest nightly build in E10S. So far nothing bad found. If I find something, I'll file new bugs.
Flags: needinfo?(mzehe)
There are some crash reports with 51 Beta.
After testing for three days, I can say that there are some general performance/hang problems, described in bug 1319777, and a very specific performance degradation related to text editing in multi-line text fields, filed as bug 1319774.
Depends on: 1319777, 1319774
We need to make a call on uplifting this fix to 52. Without another fix in hand I think we should right?
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(mzehe)
Flags: needinfo?(jmathies)
Right, we should, but also be aware (and communicate) that there might be fallout followups.
Flags: needinfo?(mzehe)
(In reply to David Bolter [:davidb] from comment #90)
> We need to make a call on uplifting this fix to 52. Without another fix in
> hand I think we should right?

Yes, we need this for a11y+e10s stability.
Flags: needinfo?(jmathies)
(In reply to David Bolter [:davidb] from comment #90)
> We need to make a call on uplifting this fix to 52. Without another fix in
> hand I think we should right?

agreed, the worst fallout seems to be bug 1319777 which we should understand, but I think the results here are pretty good.
Flags: needinfo?(tbsaunde+mozbugs)
Comment on attachment 8809499 [details] [diff] [review]
allow tracking in an accessible if it has a pending show / hide / reorder event

Approval Request Comment
[Feature/Bug causing the regression]:
e10s
[User impact if declined]:
e10s+touch screen support will slip a release. currently targeted at 52.
[Is this code covered by automated tests?]:
yes
[Has the fix been verified in Nightly?]:
yes, ~3 weeks.
[Needs manual test from QE? If yes, steps to reproduce]: 
no
[List of other uplifts needed for the feature/fix]:
I'm flagging the first patch here, but this request is for the entire patch set that originally landed.
[Is the change risky?]:
low
[Why is the change risky/not risky?]:
this code landed right after the merge, aurora and nightly should be in pretty close sync.
[String changes made/needed]:
none
Attachment #8809499 - Flags: approval-mozilla-aurora?
Depends on: 1318569
Comment 94
Flags: needinfo?(jcristau)
Comment on attachment 8809499 [details] [diff] [review]
allow tracking in an accessible if it has a pending show / hide / reorder event

let's take these accessibility changes in aurora52.

As far as I can tell this is 8f2d9f13579c:387d3acae9e9 and 0371c68cb129 on m-c.
Flags: needinfo?(jcristau)
Attachment #8809499 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
needs rebasing for aurora
Flags: needinfo?(tbsaunde+mozbugs)
Marking this fixed in 53, belatedly.
Other work to uplift to aurora along with this: bug 1316789, bug 1316154,  bug 1316163, bug 1316119
Flags: needinfo?(tbsaunde+mozbugs)
Depends on: 1322593
Still top #7 of Aurora 20161225004004 on Windows, any ideas? Note the crash stack on Socorro doesn't match the signature.
Flags: needinfo?(tbsaunde+mozbugs)
A crash with the same signature is #10 top content crasher on Beta. I'm not sure if it's the same crash or a different crash with the same signature.
(In reply to Marco Castelluccio [:marco] (PTO until Jan 3) from comment #104)
> A crash with the same signature is #10 top content crasher on Beta. I'm not
> sure if it's the same crash or a different crash with the same signature.

Different signature. Those patches haven't been uplifted to beta because that code is not supposed to be running on beta. Probably people who have force-enabled.
(In reply to Aaron Klotz [:aklotz] from comment #105)
> (In reply to Marco Castelluccio [:marco] (PTO until Jan 3) from comment #104)
> Different signature.

Correction, it is the same signature as this bug. If it shows up on 52 and 53, that's for different reasons.
Flags: needinfo?(tbsaunde+mozbugs)
Depends on: 1346439
Duplicate of this bug: 1299243
Depends on: 1360160
Depends on: 1363027
Depends on: 1368784
Depends on: 1370669
You need to log in before you can comment on or make changes to this bug.