Closed Bug 563327 Opened 10 years ago Closed 10 years ago

Accessibility code subverts attempts to use refresh driver

Categories

(Core :: Disability Access APIs, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla2.0b1
Tracking Status
blocking2.0 --- final+

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

(Blocks 2 open bugs)

Details

(Keywords: perf)

Attachments

(2 files, 3 obsolete files)

I just realized that if accessibility is active then on mutations that involve accessibility changes the following happens:

1)  nsAccEventQueue::Push is called to push the accessibility event.  This calls
    PrepareFlush.
2)  PrepareFlush posts a runnable to call Flush.
3)  Flush flushes layout on the presshell.

The upshot is that the reflows (and frame construction) involved are processed off the flush runnable, not off the refresh driver.  This is probably not a problem for restyles, since accessibility doesn't hook into specified style changes, and computed styles are recomputed off refresh driver already.

Is there a reason this code needs to Flush() as soon as possible?  If not, can it use a refresh observer to flush when the visual presentation would be refreshing anyway?
(In reply to comment #0)
> Is there a reason this code needs to Flush() as soon as possible?  If not, can
> it use a refresh observer to flush when the visual presentation would be
> refreshing anyway?

That was introduced when we were attempting to use bad frame pointers. The theory at the time was that we were holding onto the frame pointer and then making calls that ultimately made the frame pointer bad. We also moved to using weak frames. In at least one case it turned out we were causing and reframing that made the pointer go bad, it was just a broken frame tree.

I'm not sure we need to keep the flush or not - note we do it in two places.
> I'm not sure we need to keep the flush or not

That wasn't my question, and I'm not asking you to change how you flush the presshell in nsAccEventQueue::Flush.  The question is whether nsAccEventQueue::Flush needs to happen ASAP asynchronously after nsAccEventQueue::Push is called, or whether running nsAccEventQueue::Flush on the "update the display" heartbeat timer layout uses is ok.
(In reply to comment #2)
> > I'm not sure we need to keep the flush or not
> 
> That wasn't my question, and I'm not asking you to change how you flush the
> presshell in nsAccEventQueue::Flush.  The question is whether
> nsAccEventQueue::Flush needs to happen ASAP asynchronously after
> nsAccEventQueue::Push is called, or whether running nsAccEventQueue::Flush on
> the "update the display" heartbeat timer layout uses is ok.

Ah. Thanks for the clarification. I see no reason it could run off the "update the display" heartbeat. How do we make that happen?
Sigh. "could not run"...
See layout/base/nsRefreshDriver.h; you'd want to use a Flush_Display observer to make sure that it happens after the normal layout flush has already happened.  

You may need to add some virtual methods (probably on presshell?) to be able to call this from accessibility code.
Boris, can you make a call on whether or not this should block.  It's hard for me to tell.  Clearing the nom for now.
blocking2.0: ? → ---
I think this should block, yes.  This is subverting one of the more important layout performance optimizations we've made in 1.9.3.
blocking2.0: --- → ?
Just taking.  I mostly needed my questions from comment 0 answered by the a11y folks; I can do the hookup myself.
Assignee: nobody → bzbarsky
OK, so I have a partial patch, but I ran into two issues:

1)  Does nsAccEventQueue have longer lifetime than a given presshell?  If so,
    does it have some way to find out when a presshell is destroyed?  We may
    simply need a notification that presshell sends in Destroy()...
2)  Does Flush() really need to Flush_Layout, or is flushing interruptible layout
    good enough?
Attached patch What I have so far (obsolete) — Splinter Review
When a11y is notified about document changes (frame or DOM changes) then a11y events are fired after timeout. When pending a11y events are constructed all we need to know whether the changed node is visible or not to build correct accessible tree. When we are about to fire pending event to AT then we need to be ready to provide more frame information (nsAccessible::GetState()).

I would say we don't need to flush layout inside nsEventShell::Flush() since we either constructed accessible or will construct it after timeout (of course if we can be sure about whether node is visible or not at this point). And we don't need to flush layout when we are about to fire AT events. The one case we need to flush when GetState() is called so that AT is able to grab updated states. That's theoretically.

Flush layout has been introduced in bug 444644. The first David's patches were similar to theoretical things I said above. The latest patches have additional flush layout when pending a11y event is added into queue. I'm not sure about the reason. Either I missed point in the bug discussion why it's necessary or David added it just because to be more sure.

Let's get David's opinion, try to rollback part of bug 444644 patch and ask Marco for testing.
(In reply to comment #9)
> OK, so I have a partial patch, but I ran into two issues:
> 
> 1)  Does nsAccEventQueue have longer lifetime than a given presshell?  If so,
>     does it have some way to find out when a presshell is destroyed?  We may
>     simply need a notification that presshell sends in Destroy()...

It shouldn't. No presshell means no accessible. That's theoretical. I think presshell is destroyed when document is destroyed, right? So when we handle "pagehide" event then we make sure to destroy our accessibles (at least after bug 566103).

> 2)  Does Flush() really need to Flush_Layout, or is flushing interruptible
> layout
>     good enough?

What's the difference between them. All we need to know if we can get correct frame info about visibility, size, position and etc.
> I think presshell is destroyed when document is destroyed, right? 

No, it can be destroyed significantly earlier.  In fact it can be destroyed and then another one created for the same document, either immediately or much later.  All it takes is for the document to be in an iframe and for the display property of the iframe to change.

> What's the difference between them.

Flush_Layout guarantees up-to-date geometry.

Flush_InterruptibleLayout will start layout, but if it's taking too long stop and return to the calling code so as to avoid freezing up the browser.  So geometry might not be fully up to date after a Flush_InterruptibleLayout.

The latter is what we normally do before painting the page on screen.  Note that if an interruptible layout is interrupted, then another layout is immediately scheduled.

So the question is whether you really need size/position information that is fully up-to-date or whether a best effort at it is good enough.  You will get up-to-date style information no matter what.
(In reply to comment #13)
> > I think presshell is destroyed when document is destroyed, right? 
> 
> No, it can be destroyed significantly earlier.  In fact it can be destroyed and
> then another one created for the same document, either immediately or much
> later.  All it takes is for the document to be in an iframe and for the display
> property of the iframe to change.

Do we get "pagehide" event if presshell is destroyed?
(In reply to comment #13)
 
> So the question is whether you really need size/position information that is
> fully up-to-date or whether a best effort at it is good enough.  You will get
> up-to-date style information no matter what.

We need correct size/position for example when nsIAccessible::getBounds is called (and I think in some other cases).

However I don't think getState() requires updated size/position. All we need to know at this point is whether node is visible and on(off)screen. Iirc the current code of getState() deals with frame rect but it might be avoided I guess.

It makes sense to not do "full flush" for getState() since this method is called by AT often. I'm not sure they do the same for getBounds() and etc.
> Do we get "pagehide" event if presshell is destroyed?

No.  pagehide just refers to the state of the document, not the state of the presentation.

> All we need to know at this point is whether node is visible and on(off)screen.

Do you need to know more about that than the painting code would?  That is, if it's supposed to end up offscreen but hasn't gotten there yet because our reflow got interrupted but will end up there at the next refresh, is that good enough?
(In reply to comment #16)
> > Do we get "pagehide" event if presshell is destroyed?

Current code depends on presshell and it's assumed it must be not null, otherwise accessible is defunct. We need to handle this since currently we rely on "pagehide" event only iirc. I need to do some debugging and file a following up bug to deal with this this I think.

> Do you need to know more about that than the painting code would?  That is, if
> it's supposed to end up offscreen but hasn't gotten there yet because our
> reflow got interrupted but will end up there at the next refresh, is that good
> enough?

How does it look on screen? I guess nowise. In general AT shouldn't have more information than sighted user have. But it makes me bother since AT would grab wrong info and may cache it.
> I need to do some debugging and file a following up bug to deal with this this
> I think.

OK, sounds good.  Please cc me?  It sounds like it needs to block this bug...

How does AT handle layout changes (e.g. due to a style change or the window being resized of whatever) right now?
(In reply to comment #18)
> > I need to do some debugging and file a following up bug to deal with this this
> > I think.
> 
> OK, sounds good.  Please cc me?  It sounds like it needs to block this bug...

If the unique way of destroying presshell while document is kept alive is styling changes of iframe (or any other document container) then this should be handled successfully. Are there other options to destroy presshell while DOM document is alive?

> How does AT handle layout changes (e.g. due to a style change or the window
> being resized of whatever) right now?

Layout calls nsIAccessibleService::InvalidateSubtreeFor() (http://mxr.mozilla.org/mozilla-central/ident?i=InvalidateSubtreeFor&filter=)
> then this should be handled successfully.

What code handles it right now?

> Are there other options to destroy presshell while DOM document is alive?

Realistically, no. Just style changes on some ancestor frame.
(In reply to comment #20)
> > then this should be handled successfully.
> 
> What code handles it right now?

I think this code - http://mxr.mozilla.org/mozilla-central/source/layout/base/nsCSSFrameConstructor.cpp#9181. We get notifications about frame changes and invalidate the accessible tree so that if iframe is hidden then we destroy underlying accessible document.
Attached patch revert part of bug 444644 (obsolete) — Splinter Review
revert part of bug 444644, see comment #11.

David, what's the reason to flush layout when we're about to fire events (when I read the bug 444644 then I was failed to find a reason)?
Attachment #449451 - Flags: review?(bolterbugz)
Attachment #449451 - Flags: feedback?(marco.zehe)
I've got linux a11y mochitests fail:

1927 ERROR TEST-UNEXPECTED-FAIL | chrome://mochikit/content/a11y/accessible/events/test_mutation.html | Test timed out.
1934 ERROR TEST-UNEXPECTED-FAIL | chrome://mochikit/content/a11y/accessible/events/test_scroll.xul | [SimpleTest/SimpleTest.js, window.onerror] An error occurred - uncaught exception: [accessible/events.js, gA11yEventObserver.observe] This is expected if a previous test has been aborted... Initial exception was: [ ReferenceError: nsIAccessibleEvent is not defined ] at :0
10744 ERROR TEST-UNEXPECTED-FAIL | chrome://mochikit/content/a11y/accessible/test_name_markup.html | Test timed out.
10747 ERROR TEST-UNEXPECTED-FAIL | chrome://mochikit/content/a11y/accessible/test_name_nsRootAcc.xul | [SimpleTest/SimpleTest.js, window.onerror] An error occurred - uncaught exception: [accessible/events.js, gA11yEventObserver.observe] This is expected if a previous test has been aborted... Initial exception was: [ ReferenceError: nsIAccessibleEvent is not defined ] at :0

It sounds likely something wrong at the point where we get notifications from layout and DOM changes. I've filed bug 570275 for layout notifications today. But iirc test_name_markup.html uses DOM mutations so it might be not covered by that bug.

The same time I think we shouldn't flush layout when we flush events, however we should consider doing something when we're notified about changes (because at this point we create accessibles).
Comment on attachment 449451 [details] [diff] [review]
revert part of bug 444644

Yes, it makes a11y fail - I tested locally on windows: lazy frame construction seems to kill a11y. We need to figure out correct solution.
Attachment #449451 - Flags: review?(bolterbugz)
Attachment #449451 - Flags: feedback?(marco.zehe)
I really do think just notifying on refresh driver (see comment 5) should be fine, no?
(In reply to comment #26)
> I really do think just notifying on refresh driver (see comment 5) should be
> fine, no?

I'm not sure I really follow you. In general I think we don't need to flush layout when we're about to fire events. However we fail because we can't create an accessible because something wrong with frame. We have two bugs for the problem: bug 570322 and bug 570275.

When we are about to create an accessible then we should be sure correct frame is created for DOM node and we should know whether DOM node is hidden or not. Boris what magic can we do to ensure in these?
> In general I think we don't need to flush layout when we're about to fire
> events.

OK.  But you need to flush frame construction, no?  You'd want to do that off the refresh driver, I would think.

> Boris what magic can we do to ensure in these?

Flush with Flush_Frames.  But again, you don't want to do this off your current "do it really soon" timer.

And if you do it off the refresh driver, there's no extra cost to flushing layout first.  You did look at the patch I attached, right?
Your patch should definitely help in this case: when we are about to fire pending accessibility events then we can be sure we construct right accessibles.

But the problem is still here: we can't construct an accessible without frame. For example, in some cases we create an accessible when we put an event to queue. If we do frame flush on accessible creation then this then I think that makes this patch not useful. 

Probably we need to make an accessible "closer" to DOM node rather than to frame so that we can create right accessible without frames.
(In reply to comment #29)
> Probably we need to make an accessible "closer" to DOM node rather than to
> frame so that we can create right accessible without frames.

Maybe bug 544498 blocks us then... not sure though.
> If we do frame flush on accessible creation then this then I think that
> makes this patch not useful. 

Indeed.  But that problem is already there with or without this patch; I don't propose to try to fix it here.  Is that ok?
(In reply to comment #31)
> > If we do frame flush on accessible creation then this then I think that
> > makes this patch not useful. 
> 
> Indeed.  But that problem is already there with or without this patch; I don't
> propose to try to fix it here.  Is that ok?

Of course. I'll do investigation how we can move from accessible creation based on frame type to not avoid lazy frame construction. Since it sounds that's all we can do.
> We get notifications about frame changes and invalidate the accessible tree 

OK. So this happens _after_ presshell destruction.  So to avoid the asserts in ~nsRefreshDriver we need to add a notification that PresShell::Destroy sends to accessibility to unregister the refresh observer.
I filed bug 571459 for this.
(In reply to comment #29)
 
> Probably we need to make an accessible "closer" to DOM node rather than to
> frame so that we can create right accessible without frames.

Thinking more it find this wrong. If I'm right then table frames are created for <div style="display: table">, if so then we should use table accessible and it doesn't sound right if we would use styling information in this case). So probably we should make accessible objects closer to frames rather than to DOM nodes and then listen frame changes rather than DOM nodes changes. Technically screen readers don't care about DOM and they need to read what's on screen.
> If I'm right then table frames are created for <div style="display: table">

Yes.

Why are we creating accessibles for things with no frame again?
(In reply to comment #36)
 
> Why are we creating accessibles for things with no frame again?

No we don't create but we are trying and failed, for example, when we are notified by nsIMutationObserver, at this point I think we may fail to get an frame and create an accessible. However there are cases when we would want to have an accessible without a frame (for example, for hidden DOM nodes if they are referred by nodes that are accessible) but that's not supported now.
Boris, how long do frame live and when are they destroyed? Are they living for the DOM node (assuming display style isn't changed) while presshell exists?
> Boris, how long do frame live and when are they destroyed?

It varies....

> Are they living for the DOM node (assuming display style isn't changed)

Not necessarily.  Display changes on any ancestor, insertions of siblings of certain kids, changes to CSS properties other than display can all trigger frame reconstructs.
Ok, then we shouldn't rely on frame life cycle but we need to listen any changes that may result in accessible recreation: these are display style changes I think, at least as the first approximation. Also we shouldn't need to update an accessible tree on DOM mutations, we should do this when lazy frame construction is finished and frames are created. Though can we be sure frames stays uncreated forever and during significant amount of time which results AT user itsn't notified about changes (for example, if he has open chat in background tab and somebody pings him)?
Frames are what we paint, so if you stick to frame lifetimes you will be giving an experience no worse than that for sighted users, right?
(In reply to comment #41)
> Frames are what we paint, so if you stick to frame lifetimes you will be giving
> an experience no worse than that for sighted users, right?

I think yes though 
1) excepting the case of mouse moves can make frames create or something else: sighted and blind users have different styles to work.
2) and I afraid frames may be created and die a bit more often then we would want, for example like you said insertion a sibling may force recreate siblings frames or changes of CSS properties.
Depends on: 571459
Attached patch Part 2. Hook up the a11y bits. (obsolete) — Splinter Review
Attachment #448159 - Attachment is obsolete: true
Attachment #449451 - Attachment is obsolete: true
Attachment #450697 - Flags: review?(surkov.alexander)
(In reply to comment #44)
> Created an attachment (id=450697) [details]
> Part 2.  Hook up the a11y bits.

1) You use nsCOMPtr<nsIPresShell> presShell because AddRefreshObserver/RemoveRefreshObserver may destroy it?
2) I think I'm fine to take this patch since we don't have "correct" way either and the correct way might take some time but could you update your patch to trunk, few days ago I pushed changes into nsAccEventQueue and it sounds you didn't grab them.
> 1) You use nsCOMPtr<nsIPresShell> presShell because
> AddRefreshObserver/RemoveRefreshObserver may destroy it?

No, because nsDocAccessible::GetPresShell returns already_AddRefed.

> but could you update your patch to trunk,

Sure thing.
(In reply to comment #46)
> > 1) You use nsCOMPtr<nsIPresShell> presShell because
> > AddRefreshObserver/RemoveRefreshObserver may destroy it?
> 
> No, because nsDocAccessible::GetPresShell returns already_AddRefed.

oh, right, I got confused with nsIDocument :)
Attachment #450697 - Attachment is obsolete: true
Attachment #450726 - Flags: review?(surkov.alexander)
Attachment #450697 - Flags: review?(surkov.alexander)
Comment on attachment 450726 [details] [diff] [review]
Part 2 merged to tip

is swap elements approach quicker than current one?
Attachment #450726 - Flags: review?(surkov.alexander) → review+
Boris is there guarantee that refresh triggers in reasonable time if there's no mutation events for example, when focus event in the queue only?
> is swap elements approach quicker than current one?

It's somewhat faster (swap of two integer values in memory instead of a memmove), but more importantly it leads to what I see as simpler and more robust code: no need to sprinkle knowledge about the event-processing loop through the rest of the file.

> Boris is there guarantee that refresh triggers in reasonable time if there's no
> mutation events for example, when focus event in the queue only?

It doesn't matter what's in the queue.  Refresh is triggered if there are refresh observers; since this code adds a refresh observer once something is in the queue, refresh will get triggered sometime after that.  The refresh timer is a 20ms slack repeating timer, so it should fire within 20ms of adding as an observer (might be less if the timer was already armed when the observer is added), as long as the system is not under heavy load.  If the event queue is starved, or load is heavy, it might take longer, of course.
Depends on: 571986
I backed out both part 1 and part 2 of this patch (because I was not sure if it's OK to leave the part 1 in the tree) because of bug 571986.

http://hg.mozilla.org/mozilla-central/rev/4dc9a5d9167b
Here's the problem line or two:

    nsAccEvent *accEvent = mEvents[index];
    if (accEvent->mEventRule != nsAccEvent::eDoNotEmit)

That crashes on the accEvent dereference.  I should be using |events|, not |mEvents|, there.  Will fix locally and retest before pushing.
Depends on: 573085
Pushed:
  http://hg.mozilla.org/mozilla-central/rev/f5054d64d01d
  http://hg.mozilla.org/mozilla-central/rev/d4156799e66a
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a6
blocking2.0: ? → final+
You need to log in before you can comment on or make changes to this bug.