Perma-checkerboarding and broken paints after tab switching, sometimes (with Ctrl+Tab); fixed by another tab switch
Categories
(Firefox :: Tabbed Browser, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox84 | --- | unaffected |
firefox85 | + | verified |
firefox86 | --- | verified |
People
(Reporter: mstange, Assigned: emilio)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: correctness, regression)
Attachments
(6 files)
4.22 MB,
video/webm
|
Details | |
1.13 KB,
text/plain
|
Details | |
27.61 KB,
application/octet-stream
|
Details | |
48 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details | Review |
155.74 KB,
image/png
|
Details | |
48 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details | Review |
This happened to me about 5 times this week. It never happened to me before.
Basically, what happens is that a tab simply stops painting. The process is still responsive, APZ still works (albeit with perma-checkerboarding), the mouse cursor updates as I move the mouse, but the screen just never updates with new content. Until I switch tabs, that is!
One time it started happening after I opened a link in a background tab. The other times it happened after a tab switch.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 1•3 years ago
|
||
Markus, is this with webrender enabled or disabled?
Reporter | ||
Comment 2•3 years ago
•
|
||
WebRender enabled.
I just hit this again, with build ID 20201218095607. Profile: https://share.firefox.dev/3hcpY0F
It looks like the tab thinks it's in the background. I found one refresh tick call, here: https://share.firefox.dev/37tI7Ua
It was triggered by InactiveRefreshDriverTimer::TimerTickOne
. The inactive refresh driver timer should only be used for background tabs.
Next time this happens, I will check what gBrowser.selectedBrowser.docShellIsActive
on the browser console says.
Updated•3 years ago
|
Reporter | ||
Comment 3•3 years ago
|
||
Just happened again. gBrowser.selectedBrowser.docShellIsActive
is false
. Is the tab switcher maybe not activating the tab's docshell? Or is it trying to activate it but some other bug causes it to not stick?
Updated•3 years ago
|
Reporter | ||
Comment 5•3 years ago
|
||
I think this mostly happens if I switch to the tab using Ctrl+Tab. I have browser.ctrlTab.recentlyUsedOrder
enabled. The duplicate bug 1683782 also mentions Ctrl+Tab.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 6•3 years ago
|
||
This is a long shot, but could it be related to bug 1625590 (ctrl+tab
+ macos native fonts => high memory) which just landed for 85?
Updated•3 years ago
|
Comment 8•3 years ago
|
||
I did my best to find a regression window for what I observe on Linux (see bug 1684643) by opening several tabs with different websites by repeatedly using Ctrl+T and entering a random URL so that the pages were loaded in the background, and then switching between tabs with Ctrl+Tab (thanks for this hint!). I could reproduce the bug most easily with Twitter pages, but that might be a coincidence.
Result:
Last good revision: 8ade6a20744e637d81f8e3bfecd05526d144d8a3
First bad revision: a3add3f43cbcbfdf053d901689882975479842eb
Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=8ade6a20744e637d81f8e3bfecd05526d144d8a3&tochange=a3add3f43cbcbfdf053d901689882975479842eb
This is bug 1678786, and there already is bug 1682158 as a regression. I have no idea what this is about. Does this explain your observations?
Comment 9•3 years ago
|
||
I will clear out the flag for regressionwindow-wanted condiering the fact that Viktor provided one.
Comment 10•3 years ago
|
||
Bug 1678786 only touched a test file so it can't have caused this.
Comment 11•3 years ago
|
||
Sorry, I should have seen that.
When I used mozregression, I found that the Nightly build 2020-12-11 (changeset 1130661c79c222fb1acd29b7ec5dc5202cdd0d2d) is not affected by this bug. Although I had tested this one for quite long, I tested it again and still couldn't reproduce this bug, so I am quite sure that the changeset we are looking for is listed in the following pushlog containing 11 changesets:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=1130661c79c222fb1acd29b7ec5dc5202cdd0d2d&tochange=a3add3f43cbcbfdf053d901689882975479842eb
Looking at the bugs I think bug 1635914 looks related, so I tested the corresponding revision and could reproduce this bug after about 10 minutes. The pushlog from above can reduced to:
Updated•3 years ago
|
Updated•3 years ago
|
Comment 12•3 years ago
|
||
(Continuation from comment 11)
Emilio, can you help us here? I think bug 1635914 seems to have caused this bug.
Assignee | ||
Comment 14•3 years ago
|
||
To be clear, str is enabling the pref so that control+tab switches tabs, and just tabswitching for a while right? Curious, do you have fission enabled?
I'm not at a computer right now but the other thing I'd check is whether there's any tab switcher assertion when this happens on a debug build.
Assignee | ||
Comment 15•3 years ago
|
||
Viktor, do you know how could I more reliably reproduce this? Could you post the steps you found to do the bisection? I only could reproduce this once, and using quite a weird combination of actions involving restoring tabs and so on (and I haven't been able to repro it ever since). FTR:
so, open tab in the background, wait for it to load, close (with the context menu), Ctrl+shift+T to restore it, then Ctrl+Tab to switch to the original tab, then Ctrl+Tab again
Thanks in advance.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 16•3 years ago
|
||
(In reply to Viktor Jägersküpper from comment #8)
I did my best to find a regression window for what I observe on Linux (see bug 1684643) by opening several tabs with different websites by repeatedly using Ctrl+T and entering a random URL so that the pages were loaded in the background, and then switching between tabs with Ctrl+Tab (thanks for this hint!). I could reproduce the bug most easily with Twitter pages, but that might be a coincidence.
I also tried going to twitter, literally ctrl+clicking everything I found (so that it gets loaded in the background), then switching tabs with Ctrl+Tab, but also no dice.
Do you need to tab-switch while the page is loading? Or after it's loaded?
Assignee | ||
Comment 17•3 years ago
|
||
Also, curious if this reproduces with browser.tabs.remote.warmup.enabled=false
.
Assignee | ||
Updated•3 years ago
|
Comment 18•3 years ago
|
||
I am just trying to find the shortest STR, give me a few minutes. The problem is that I can't easily reproduce the bug, I have to try many many times.
Comment 19•3 years ago
|
||
These are my STR:
- Go to a random website in the first tab and wait until it's loaded (I used bugzilla.mozilla.org)
- Open a second tab with
Ctrl
+T
, type a random Twitter URL (e.g. twitter.com/firefoxnightly) and hitEnter
- Immediately open another tab with
Ctrl
+T
and wait about 15 seconds so that the page in the second tab should have been loaded - Enter another Twitter URL and immediately switch to second tab with
Ctrl
+Tab
- Scroll down and hope that you can reproduce this bug
- If you weren't successful, wait about 15 seconds so that the page in the third tab should have been loaded
- Switch to the third tab with
Ctrl
+Tab
, scroll down and hope that you can reproduce this bug - If you still can't reproduce the bug, close the second and third tab and repeat steps 2-5/7
Unfortunately it can take some time to reproduce the bug this way, so this is really ugly. I am not even sure if Ctrl
+Tab
is relevant, I used it because Markus and Albert mentioned it. I don't know why the first tab is necessary, but without it I couldn't reproduce the bug.
For me (on Linux) browser.ctrlTab.recentlyUsedOrder
is enabled by default. I don't have fission enabled.
browser.tabs.remote.warmup.enabled
is set to true
by default for me. I will test with setting it to false
, although I don't know what this is about. I will also test with a debug build. (Not clearing the needinfo at the moment, I will test tomorrow.)
Comment 20•3 years ago
|
||
Since I just observed the bug on a live blog news site, you should know that I have a very slow CPU (Intel Core 2 Duo) and I think any site that is "bloated" like Twitter (Javascript plus images and whatever) might do the trick so that the page isn't loaded yet when you switch tabs with Ctrl
+Tab
.
Assignee | ||
Comment 21•3 years ago
|
||
Awesome, thanks! I tweaked a bit that method, using session-restore, and I seem to be able to reproduce it somewhat consistently like this. Basically, having two twitter tabs open, then Ctrl+Tab to the second so it starts loading (it's nice that we remember recently used tabs across a restart, was not expecting that!), wait for a paint, then quickly Ctrl+Tab back, then Ctrl+Tab once everything seems idle on the last tab.
I have a bunch of other twitter tabs in the background, but those are leftovers from a test.
I'll try to play myself with those prefs, but this gives me a way to investigate this. Thank you!
Assignee | ||
Comment 22•3 years ago
|
||
Catching this on rr has been tricky so far (since it's not so consistent as I thought).
Here's a logging patch I'm using. Will attach the log asap.
Assignee | ||
Comment 23•3 years ago
|
||
Interesting context is 44. I think this is the relevant bit, and there seems to be a race between the parent and the child process:
console.log: "set docShellIsActive(true): 44, false, https://twitter.com/firefoxnightly, set docShellIsActive@chrome://global/content/elements/browser-custom-element.js:398:142\nmaybeActivateDocShell@resource:///modules/AsyncTabSwitcher.jsm:552:7\nupdateDisplay@resource:///modules/AsyncTabSwitcher.jsm:481:16\npostActions@resource:///modules/AsyncTabSwitcher.jsm:673:10\nhandleEvent@resource:///modules/AsyncTabSwitcher.jsm:1116:12\nqueueUnload@resource:///modules/AsyncTabSwitcher.jsm:1052:10\nrequestTab@resource:///modules/AsyncTabSwitcher.jsm:1048:10\nupdateCurrentBrowser@chrome://browser/content/tabbrowser.js:996:31\n_setupEventListeners/<@chrome://browser/content/tabbrowser.js:5287:16\nset selectedIndex@chrome://global/content/elements/tabbox.js:197:14\nset selectedPanel@chrome://global/content/elements/tabbox.js:216:7\nset selectedIndex@chrome://global/content/elements/tabbox.js:545:11\nset selectedItem@chrome://global/content/elements/tabbox.js:565:35\nset selectedTab@chrome://global/content/elements/tabbox.js:87:11\nset selectedTab@chrome://browser/content/tabbrowser.js:290:7\nctrlTab_close@chrome://browser/content/browser-ctrlTab.js:427:9\nctrlTab_pick@chrome://browser/content/browser-ctrlTab.js:345:12\nctrlTab_handleEvent@chrome://browser/content/browser-ctrlTab.js:598:18\n"
[parent 267120] DidSetIsActive(44, 0 -> 1)
[child 267622] DidSetIsActive(44, 0 -> 1)
[child 267622] DidSetIsActive(44, 1 -> 0)
[parent 267120] DidSetIsActive(44, 1 -> 1)
[parent 267120] DidSetIsActive(44, 1 -> 0)
It seems like The parent activates, then the child, then the child overrides it at some point... I believe unless Chrome JS code is doing funky stuff (which I don't think is doing) the only possible callsite is this one. Which I already flagged as suspect, but was replacing this.
So I that call was wrong before, but now that's propagating to the parent process and we don't handle no-op activeness changes it is causing even more badness...
Assignee | ||
Comment 24•3 years ago
|
||
I flagged this as sketchy before (though it was trying to preserve
existing behavior).
However now that that state propagates to the parent process and races
with the state that the parent process reads, it started causing
correctness issues.
Just remove this line, it shouldn't be needed. I'm not sure how to write
a test for this, unfortunately :(
Assignee | ||
Comment 25•3 years ago
|
||
Viktor, whenever the build with the patch is ready here, can you confirm the patch fixes the issue?
I haven't been able to repro the issue with the patch, but I'm not reproducing so consistently so sanity-checking would be awesome.
Thanks a lot for helping us track this down.
Comment 26•3 years ago
|
||
Unfortunately the issue is still there. I used my STR from comment 19 and reproduced the bug at the first attempt. To be sure, I repeated the steps several times until I reproduced it again.
Just to be sure that I didn't mess this up, this is how I downloaded the try build: On the treeherder page I clicked on "+3" behind "Bpgo" in the "Linux x64 shippable opt" line, then I clicked on "B", then I clicked on "Artifacts" (appearing below) and then I downloaded the "target.tar.bz2" file from the list. I used a new profile for testing.
Assignee | ||
Comment 27•3 years ago
|
||
hmmm, oh well, thanks for testing, will dig a bit more then / keep trying to repro with that patch too.
Ugh, I wish this was not so timing dependent so I could capture this with rr...
Assignee | ||
Comment 28•3 years ago
|
||
To be clear, when you reproduce the issue with the patch applied, if you press Ctrl+Shift+Alt+I and go to the console tab, and type gBrowser.tabs[gBrowser.tabs.length - 1].linkedBrowser.docShellIsActive
, the result is still false
?
Comment 29•3 years ago
|
||
Will check that.
I couldn't reproduce the bug after setting browser.tabs.remote.warmup.enabled=false
during about 15 minutes of repeated testing.
Comment 30•3 years ago
|
||
I couldn't reproduce the bug after setting
browser.tabs.remote.warmup.enabled=false
during about 15 minutes of repeated testing.
With the regular Nightly build.
Comment 31•3 years ago
|
||
I get gBrowser is not defined
(see attached screenshot), but I also get this with my regular Nightly build. Am I doing something wrong?
Assignee | ||
Comment 32•3 years ago
|
||
Yes, you don't want the devtools console, you want the browser console, you can enable from DevTools -> Settings -> Enable broser chrome and add-on debugging toolboxes + Enable remote debugging.
Comment 33•3 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #28)
To be clear, when you reproduce the issue with the patch applied, if you press Ctrl+Shift+Alt+I and go to the console tab, and type
gBrowser.tabs[gBrowser.tabs.length - 1].linkedBrowser.docShellIsActive
, the result is stillfalse
?
The result is still false
.
In the meantime I reproduced the bug with fission enabled for a regular Nightly build.
Do you still want me to try a (recent) debug build and attach the terminal output here? (I know how I can get a debug build with mozregression: mozregression --repo autoland --build-type debug --launch 2021-01-06
)
Assignee | ||
Comment 34•3 years ago
|
||
There's JS running since we save the active status till we restore it,
so arbitrary things can happen, including receiving an IPC message from
the child saying that we're now really active.
If then we restore the wrong (old) status, stuff gets confused and
sadness ensues.
Screenshotting background tabs seems to work without this so it's not
clear to me why messing with the activeness state was necessary to begin
with.
Assignee | ||
Comment 35•3 years ago
|
||
(In reply to Viktor Jägersküpper from comment #33)
(In reply to Emilio Cobos Álvarez (:emilio) from comment #28)
To be clear, when you reproduce the issue with the patch applied, if you press Ctrl+Shift+Alt+I and go to the console tab, and type
gBrowser.tabs[gBrowser.tabs.length - 1].linkedBrowser.docShellIsActive
, the result is stillfalse
?The result is still
false
.
Bummer, then I think there may be yet another bug lurking around, but I think the other patch I posted is correct.
Do you still want me to try a (recent) debug build and attach the terminal output here? (I know how I can get a debug build with mozregression:
mozregression --repo autoland --build-type debug --launch 2021-01-06
)
No, no worries, that's super-helpful already. I discarded the CrossProcessPaint code because it looked synchronous to me, and because I didn't think it'd be used in this case, but after dumping a couple stacks I managed to reproduce it and see where it goes wrong / what overrides the activeness again from the child (and then propagates to the parent).
In hindsight, it seems obvious how that code is involved... Ctrl+T shows a switcher with tab captures, that are created on the fly using this API... And that code is not really necessarily synchronous because flushing notifications and painting can do IPC...
If you have the time, can you confirm that with this build you can no longer see the issue?
If you can, I'd be a bit baffled, but hey, I've been wrong before so... :-)
Comment 36•3 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #35)
If you have the time, can you confirm that with this build you can no longer see the issue?
Good news! I couldn't reproduce the bug with this build (using a fresh profile with fission disabled by default).
If you have to change the patches, feel free to ask me again for testing.
Assignee | ||
Updated•3 years ago
|
Comment 37•3 years ago
|
||
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/badff6002a34 BrowserChild::MakeHidden() shouldn't mess with tab state. r=nika
Comment 38•3 years ago
|
||
bugherder |
Comment 39•3 years ago
|
||
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f4af0087a1b4 CrossProcessPaint code shouldn't mess with tab state from the content process. r=mattwoodrow,NeilDeakin
Assignee | ||
Updated•3 years ago
|
Comment 40•3 years ago
|
||
bugherder |
Assignee | ||
Comment 42•3 years ago
|
||
Comment on attachment 9195771 [details]
Bug 1683188 - BrowserChild::MakeHidden() shouldn't mess with tab state. r=nika
Beta/Release Uplift Approval Request
- User impact if declined: Comment 0
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: See various comments for different str which reproduce slightly more frequently.
- List of other uplifts needed: None
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): Somewhat trivial removals, but in a tricky area. However I think these should be safe enough for uplift.
- String changes made/needed: none
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 43•3 years ago
|
||
Unfortunately, I cannot confirm this fix, since I was not able to reproduce the initial issue. I've tried on two different mac machines (OS version 10.15 and 11.1), by following the steps from comment 19 and comment 21, on Nightly 86.0a1 (2020-12-17).
Markus, can you please help us checking if this bug is fixed on latest Nightly 86.0a1?
Updated•3 years ago
|
Comment 44•3 years ago
|
||
Would it be safer to back out the regressing change from 85? We're about to build the last beta today.
Assignee | ||
Comment 45•3 years ago
|
||
The regressing change blocked some other fixes so it's not clear to me everything that should be backed out. Uplifting these seems less risky to me.
Reporter | ||
Comment 46•3 years ago
|
||
(In reply to Ciprian Georgiu [:ciprian_georgiu], Release Desktop QA from comment #43)
Markus, can you please help us checking if this bug is fixed on latest Nightly 86.0a1?
I'll report if I run into it again. If I don't hit it during the next 8 hours, it's probably fixed.
Comment 47•3 years ago
|
||
Comment on attachment 9195771 [details]
Bug 1683188 - BrowserChild::MakeHidden() shouldn't mess with tab state. r=nika
approved for 85.0b9
Updated•3 years ago
|
Comment 48•3 years ago
|
||
bugherder uplift |
Comment 49•3 years ago
|
||
I haven't seen the issue the whole day since updating my Dev Edition today to 85.0b9
Reporter | ||
Comment 50•3 years ago
|
||
I haven't seen this again on Nightly since the patch landed.
Comment 51•3 years ago
|
||
Comment on attachment 9195931 [details]
Bug 1683188 - CrossProcessPaint code shouldn't mess with tab state from the content process. r=mattwoodrow
(In reply to Emilio Cobos Álvarez (:emilio) from comment #34)
Screenshotting background tabs seems to work without this so it's not
clear to me why messing with the activeness state was necessary to begin
with.
The removed code was added to fix bug 1657036, but presumably bug 1635914 fixed that in a better way.
Assignee | ||
Comment 52•3 years ago
|
||
Yeah, that came up during review of that patch :)
Comment 53•3 years ago
|
||
Marking this verified as fixed on latest Nightly and Beta, based on comments 49 and 50. Thank you!
Updated•3 years ago
|
Description
•