Closed Bug 1409289 Opened 4 years ago Closed 3 years ago

TalosError("timeout") on tabswitch

Categories

(Core :: Graphics: WebRender, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox57 --- unaffected
firefox58 --- unaffected
firefox62 --- fixed

People

(Reporter: vliu, Assigned: kats)

References

Details

(Whiteboard: [wr-reserve])

Attachments

(1 file, 1 obsolete file)

It is still has this timeout error on try if we enabled talos-g2 in qr-talos field. This bug intends to fix it and enable this test.
Assignee: nobody → vliu
From looked into it in local, I found event.transactionId read as a zero value so skipped the check. I will look into it more. Also, I will further verify the problem I saw in local is the same on try. 

[1]: http://searchfox.org/mozilla-central/rev/a984558fa2bbde6492d3fb918496fc0b0835b2ce/testing/talos/talos/tests/tabswitch/bootstrap.js#314
It is obvious that we miss overriding GetLastTransactionId in WebRenderLayerManager. The patch also pass the g2 on try. Please see [2] for details. Could you please have a review for the patch? Thanks.

[2]:https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5cf4a655b40bc81a2f3c3974d228ca9f5ef275e&selectedJob=137482480
Attachment #8919236 - Flags: review?(bugmail)
Comment on attachment 8919236 [details] [diff] [review]
0001-Bug-1409289-Override-GetLastTransactionId-in-WebRend.patch

Review of attachment 8919236 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome, thanks!
Attachment #8919236 - Flags: review?(bugmail) → review+
Status: NEW → ASSIGNED
Priority: -- → P1
Whiteboard: [wr-mvp]
It seems that my patch caused a try fail on R4 and R8 in [3]. I have to take time to confirm that.

[3]:https://treeherder.mozilla.org/#/jobs?repo=try&revision=72aabaeddb3a568721f091264be3ad7d00842a6a
Whiteboard: [wr-mvp] → [wr-reserve]
The R4 and R8 failures are due to the mechanism of webrender's NotifyInvalidation. Those tests are about checking the snapshot of the animation when we receive requestAnimationFrame callback.
The original gecko triggers the NotifyInvalidation by calculating the invalidation of layer tree. For Webrender, we trigger the NotifyInvalidation when there are style changes or frame changes, but we cannot identify if there is any real invalidation. So for those tests, webrender will wait until the animation finishes, and the original gecko will stop the test once it receives requestAnimationFrame().
Blocks: 1416634
(In reply to Ethan Lin[:ethlin] from comment #6)
> The R4 and R8 failures are due to the mechanism of webrender's
> NotifyInvalidation. Those tests are about checking the snapshot of the
> animation when we receive requestAnimationFrame callback.
> The original gecko triggers the NotifyInvalidation by calculating the
> invalidation of layer tree. For Webrender, we trigger the NotifyInvalidation
> when there are style changes or frame changes, but we cannot identify if
> there is any real invalidation. So for those tests, webrender will wait
> until the animation finishes, and the original gecko will stop the test once
> it receives requestAnimationFrame().

Hi Matt,
From above comment, I wonder it is similar to the issue Bug 1419851 Comment 29. Before the patch in this bug, it is easy to call DelayedFireDOMPaintEvent because mTransactions[i].mTransactionId in [1] always be zero. With the patch in this bug, it is hard to satisfy this condition because animation keep painting. Do you think of any possible way to do it? Thanks 


[1]: https://searchfox.org/mozilla-central/rev/9d920555ec81f1c9d4b7fa3b08e23eb88efb60e1/layout/base/nsPresContext.cpp#2767
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(matt.woodrow)
Assignee: vincent.liu1013 → bugmail
Unfortunately bug 1437036 didn't magically fix this. I'll investigate more.
Looks like a race condition between the 'about:tabswitch#auto' page getting loaded by pageloader at [1] and the about: handler getting registered at [2]. When pageloader wins it tries to load the about: page before the handler is registered, and so we just get a "Hmm. That address doesn’t look right. Please check that the URL is correct and try again." page and the test hangs.

I'm not yet sure at a high level how these things get triggered. It seems to me that they're both extensions loaded into the browser so there's no guarantee of which one runs first, but I'll dig into it a bit more.

[1] https://searchfox.org/mozilla-central/rev/2b9779c59390ecc47be7a70d99753653d8eb5afc/testing/talos/talos/pageloader/chrome/pageloader.js#325
[2] https://searchfox.org/mozilla-central/rev/2b9779c59390ecc47be7a70d99753653d8eb5afc/testing/talos/talos/tests/tabswitch/content/tabswitch-content-process.js#39
So, the 'tabswitch' webextension entry point is the background.js file [1] which is run by the browser as a background script [2] when the webextension is loaded. The pageloader addon entry point is the bootstrap.js [3] startup function, which installs a command-line handler and loads pageloader.xul and then when that's loaded, runs the plInit function [4].

The correct fix here would be to have the pageloader addon wait until the rest of the addons/webextensions are loaded before proceeding, but as far as I can tell there's no API to actually detect this. The reason it works at all right now (in non-WebRender) is because the magical sleep delays at [5] are long enough that the webextensions get a chance to do their setup. So for WebRender purposes I think increasing the magic numbers should be good enough.

[1] https://searchfox.org/mozilla-central/rev/2b9779c59390ecc47be7a70d99753653d8eb5afc/testing/talos/talos/tests/tabswitch/background.js#13
[2] https://searchfox.org/mozilla-central/rev/2b9779c59390ecc47be7a70d99753653d8eb5afc/testing/talos/talos/tests/tabswitch/manifest.json#12
[3] https://searchfox.org/mozilla-central/rev/2b9779c59390ecc47be7a70d99753653d8eb5afc/testing/talos/talos/pageloader/bootstrap.js#107
[4] https://searchfox.org/mozilla-central/rev/2b9779c59390ecc47be7a70d99753653d8eb5afc/testing/talos/talos/pageloader/chrome/pageloader.js#103
[5] https://searchfox.org/mozilla-central/rev/2b9779c59390ecc47be7a70d99753653d8eb5afc/testing/talos/talos/pageloader/chrome/pageloader.js#241-242
Attachment #8919236 - Attachment is obsolete: true
Comment on attachment 8975894 [details]
Bug 1409289 - Increase magic number delay to satisfy increased WebRender initialization overhead.

https://reviewboard.mozilla.org/r/244088/#review250038

please test this on all talos jobs with --rebuild 10; I really want to make sure this doesn't break anything else.
Attachment #8975894 - Flags: review?(jmaher) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e4bbc6fc7355
Increase magic number delay to satisfy increased WebRender initialization overhead. r=jmaher
https://hg.mozilla.org/mozilla-central/rev/e4bbc6fc7355
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
You need to log in before you can comment on or make changes to this bug.