Closed Bug 1362760 Opened 7 years ago Closed 7 years ago

Consider starting a newly added tab as an active docshell

Categories

(Firefox :: Tabbed Browser, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)

References

Details

Attachments

(1 file)

I have been debugging some test failures with my patches for bug 1343728 for weeks now and I finally found out the reason for why the failures happen!

Consider this try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=514c0110a4a5b441bd9eafc276988ca2f12e68d5&selectedJob=93751442  There are many test failures but they're all of similar nature but I have picked layout/style/test/test_animations_effect_timing_duration.html for debugging.  What generally happens is that we we try to open a test page in a new tab, the test page runs some scripts during page load which do something like call nsIDOMWindowUtils.advanceTimeAndRefresh() which runs a refresh driver tick from JS.  The refresh driver tick ends up calling nsViewManager::ProcessPendingUpdatesPaint() since my patch is removing the create window sync IPC, now this code in the content process is running very much earlier than before, much earlier before the parent has actually opened a tab, and it turns out that the docshell isn't marked as active yet, so our widget thinks that we don't need to paint and this returns false <https://searchfox.org/mozilla-central/rev/6580dd9a4eb0224773897388dba2ddf5ed7f4216/view/nsViewManager.cpp#445> and we bail out early without every calling PresShell::Paint() and actually painting something.  Then the test listens for a MozAfterPaint event which will arrive but since no actual painting will happen the MozAfterPaint will have an empty invalidated rect which will cause the test to fail.  If you modify this test to open its test page in a new window (which is also changed by my patches) the test works.

I finally narrowed this down to this code: <https://searchfox.org/mozilla-central/rev/6580dd9a4eb0224773897388dba2ddf5ed7f4216/browser/base/content/tabbrowser.xml#2201>  This code is very old, it dates back to bug 343515, and to the first WIP patch there (attachment 401797 [details] [diff] [review]) and as far as I can tell there was no specific reason why this was done this way.  The specific thing that causes this test failure is that due to this line, when we get to ProcessPengingUpdatesPaint() above our widget thinks it doesn't need painting.  I see no reason why we can't change this to true.

I sent a patch to do that on top of my old try push to the try server last night and it seemed to work out well: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a32776f3a5c6e88a2f262278f6d8214f4645de9b  There are other test failures as a result of this (for example the test for bug 343515 and a few other things that depend on the current behavior break but that's not unexpected.)

Before I spent the time to chase down these test failures and fix them, Mike, do you think it's reasonable to start out the docshell as active in _insertBrowser()?
Flags: needinfo?(mconley)
As we discussed in person, so long as we can be sure that the tab that we're inserting is the one we're going to be showing, then yes, I think starting it activated is fine.

Naturally, the cases we want to avoid are when a browser insertion occurs in a background tab.
Flags: needinfo?(mconley)
I think I'm going to keep this restricted to the code path triggered by window.open().  Try server seems to think that our tests are mostly happy with that!
Assignee: nobody → ehsan
Currently the activation of the docshell races with some JavaScript triggered
paints in tests but the long running sync IPC that exists when creating a new
window acts as a synchronoization point of sorts to make these tests mostly
pass.  This path makes the docshell activated from the get-go to remove this
race condition and guarantee that the paints triggered in the test will always
succeed.  This side effect shouldn't be observable to web page script.
Attachment #8867559 - Flags: review?(mconley)
Hey ehsan, I haven't let this drop off my radar - just trying to get my head wrapped around the change to the assertion in onLayersReady. Give me a bit more time with rr.
Okay, I was able to reproduce this, added some logging, and I think I understand what's going on (and why this assertion failure gave me a bad feeling).

Because the docShellIsActive is being set to true _outside_ of the async tab switcher, the async tab switcher cannot be sure of the state of the layers for that tab by the time requestTab is called for it.

A workaround would be to maybe have the async tab switcher interpret a requested tab with docShellIsActive set to true, but a false nsITabParent.hasPresented as still "in flight" and set its state to STATE_LOADING...

but I'm also worried about the fact that we're not doing the display port suppression for the tab we're inserting either. I think it's possible we may regress some tab switch painting cases because we'll end up painting more than we need to during the tab switch.

ehsan, correct me if I'm wrong, but the _whole_ reason you needed to do this was for that one silly test that tries to make paint happen via JS, right? Like, there's no other reason, correct? If so, I wonder if we can find a way of altering that test to wait for the tab to be switched to (and presented) before trying to do its JS-initted paint.
Flags: needinfo?(ehsan)
Comment on attachment 8867559 [details] [diff] [review]
When creating a new <xul:browser> for a new tab, activate its docshell immediately

Clearing review request until I hear more from my previous needinfo.
Attachment #8867559 - Flags: review?(mconley)
I still have to digest the rest of your comment because I don't know how the tab switcher works, but this wasn't done just to fix a single test failure, but hundreds of them.

Here is a "before" try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=514c0110a4a5b441bd9eafc276988ca2f12e68d5
Here is an "after": https://treeherder.mozilla.org/#/jobs?repo=try&revision=c7500ffbd2036f399256c4e8ea825ce9ff77e0b7
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #8)
> I still have to digest the rest of your comment because I don't know how the
> tab switcher works, but this wasn't done just to fix a single test failure,
> but hundreds of them.
> 

Right - the assertion change was done to avoid many test failures, that I understand.

But, as I understand, the original reason why it's necessary to activate the docShell earlier like this is because of a single test that kicks off paint specially from JS, right? That's what I recall from our meeting last week about this. Can you point me at this test?
(In reply to Mike Conley (:mconley) from comment #9)
> (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from
> comment #8)
> > I still have to digest the rest of your comment because I don't know how the
> > tab switcher works, but this wasn't done just to fix a single test failure,
> > but hundreds of them.
> > 
> 
> Right - the assertion change was done to avoid many test failures, that I
> understand.

No, sorry, I didn't explain correctly.  The original reason why I needed to activate the docshell earlier was because of these hundreds of test failures all over the place (the try pushes above).

Without the assertion fix, there are fewer test failures (but still more than a single one), see this try push: <https://treeherder.mozilla.org/#/jobs?repo=try&revision=a94c4150576a91f3415fd88a35e3fd21635ccb72>

Apologies for having confused you earlier.
Flags: needinfo?(ehsan)
billm, how do you feel about the assertion change that's happening here?

By putting the docShellIsActive = true outside of the async tab switcher, the state of that tab defaults to STATE_UNLOADED even though the request to get layers has been made... this is why we're asserting when onLayersReady is called on the tab with STATE_UNLOADED.

Should we just get rid of this assertion?
Flags: needinfo?(wmccloskey)
The tab switcher really needs to know whether we have layers or not. Changing the assertion definitely seems wrong to me.

I guess I don't understand the original problem. We do eventually set docShellIsActive on the tab, so it should get painted at that point. Is the problem that the test starts running sooner, and expects a MozAfterPaint event before that? If so, why isn't there a problem without the async window.open stuff? Couldn't the JS still run before the docShellIsActive=true message arrives?
Flags: needinfo?(wmccloskey)
The problem is that the test window.open()s a new window which loads a paint that during its onload handler causes a paint from JS by calling nsIDOMWindowUtils.advanceTimeAndRefresh() and then expects this paint to cause layer updates.  There is a race condition between this paint and the delivery of the SetDocShellIsActive message to the content process, because the view manager refuses to paint the presshell if the widget is marked as invisible (which means the docshell is inactive) but you still get a MozAfterPaint event in either case, but if no real paint happened the MozAfterPaint won't include any invalidated rects, which is the reason the test failure that I was trying to debug happened.  The change I had made was basically make window.open() a lot faster, and therefore make advanceTimeAndRefresh() run a lot earlier and therefore the view manager would refuse to paint.  I think this race condition would even manifest before my patch in the form of intermittent test failures.

It is important the situation in this test cannot happen in the real world since there is no API for a web page to synchronously paint from JS live advanceTimeAndRefresh() does.

As the solution, I decided to mark the docshell as active as soon as I created the browser element when opening a new tab in this case.  And this assertion started to fire as a result.  I think what I have done is effectively extend the meaning of STATE_UNLOADED to also include "unloaded but going to receive layers soon when the SetDocShellIsActive message is recieved in the content process".  Perhaps I should add a flag for that state and assert explicitly on that?
ni?ing billm for feedback on ehsan's idea in comment 13
Flags: needinfo?(wmccloskey)
Hmm, I'm not really sure what to do here. It seems like these tests are just broken. But I guess there are too many to fix them all. Is there any way to avoid sending the MozAfterPaint event until we've actually painted?

One thing I worry about with this patch is that it seems to make the docshell active even if we're not switching to the new tab. That is, it doesn't consider the bgLoad value here:
http://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/browser/base/content/tabbrowser.xml#1589
That seems pretty wrong to me. Is that behavior also required for these tests? If it is, then I'm much more concerned.

I guess I can't think of any actual problem that would arise from weakening the assertion. Generally, though, the tab switcher code assumes that it's possible to know which docshells are active when the switcher starts. This patch violates that since some docshells will be active and the tabswitcher won't know it. I guess the worst thing that would happen here is that we would needlessly have to pay an IPC extra roundtrip to figure out that the docshell is active even though we thought it wasn't. But that's not a regression, so I guess it's fine.

Another idea we might consider is to "seed" the tab switcher so it knows this docshell has already been requested. That shouldn't be too hard since the code to do the switch is close by:
http://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/browser/base/content/tabbrowser.xml#1617
Flags: needinfo?(wmccloskey)
WONTFIX'ing based on bug 1343728 comment 16
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: