Closed Bug 1317212 Opened 8 years ago Closed 7 years ago

[E10S a11y] tab-throbber in tab is forever spinning after detaching the tab

Categories

(Core :: DOM: UI Events & Focus Handling, defect)

51 Branch
x86
Windows 10
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla53
Tracking Status
firefox50 --- unaffected
firefox51 + wontfix
firefox52 + verified
firefox53 --- verified

People

(Reporter: alice0775, Assigned: mconley)

References

Details

(Keywords: multiprocess, regression, ux-mode-error)

Attachments

(2 files)

[Tracking Requested - why for this release]:tab-throbber in the detached tab is forever spinning after detach tab

Build Identifier:
https://hg.mozilla.org/mozilla-central/rev/b37be3d705d929ee52280051d58cedc70a47626f
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 ID:20161113030203
and
https://hg.mozilla.org/releases/mozilla-aurora/rev/950df3cfb7798d1c6b1f422457a0e398a2b74587
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 ID:20161113004006


Reproduced on Nightly52.0a1 and Aurora51.0a2.
Not reproduced on 50RC2. 

Disabling e10s seems to fix the problem.


Reproducible: 100%

Steps To Reproduce:
1. Open tab more than 2 tabs
   (e.g., [about:home][about:home])
2. Wait to finish loading tabs
3. Detach a tab

Actual Results:
tab-throbber in the detached tab is forever spinning.

Expected Results:
tab-throbber should not be spinning.


Regression Window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9e9080a2e3462fc576b679603853f4395fadff8f&tochange=75abc730f8202005d6469a7ec324b7aba5a89374

Regressed by: 75abc730f820	Kartikaya Gupta — Bug 1300421 - Back out 4 csets from bug 1288760 for regressing event coordinate reporting. r=jfkthame
Flags: needinfo?(bugmail)
Component: Tabbed Browser → Event Handling
Product: Firefox → Core
Version: 52 Branch → 51 Branch
Track 51+/52+ as new regression in 51.
Hi Kartikaya,
It seems caused by bug 1300421, can you help shed some light here?
Interesting, the change was a backout and seems pretty unrelated to the STR. I can reproduce the bug as described using my OS X 2016-11-10 nightly with my default profile (which has various addons included TreeStyleTabs). However I *cannot* reproduce it on the same nightly when I launch it with a clean profile via mozregression. I'll try on Windows too, leaving needinfo for now.
I always tested with clean new profile. But in this case, it needs to enable e10s.

FYI, I re-tested and I got same regression window.
I played around with this, and I can reproduce it some of the time but not always. On Windows, if I start with the window maximized, and then drag the tab down into the content area to have it detach into a new window, I don't see the problem. However if I have the window non-maximized in the upper-left corner of my screen, and drag the tab outside the window to the right side, I see the problem. I was trying other scenarios and I haven't yet nailed down exactly what the circumstances are that cause this to occur - for example with the window non-maximized, if I drag the tab down into the content area it happens sometimes but not always.
I'm not able to consistently reproduce this. Usually I can repro it within a few attempts but sometimes it takes 10+ attempts to drag the tab before I hit it, so I don't know if I can trust it. Alice, can you describe how you are dragging the tab - where your original window is positioned, where you are dragging to, etc? As much detail as possible would be helpful. This might just be a intermittent bug but you seem to be able to reproduce it 100% so if you are doing something differently from me I'd like to know what.
Flags: needinfo?(bugmail) → needinfo?(alice0775)
I think the Bug 1300421 expose an existing bug more frequently.
Because, [A]when drag and drop a tab on contents area. --- Bug 1300421 triggers this problem almost 100%.
However, [B]when drag and drop a tab on Desktop above Title bar, It happens before landing Bug 1300421 but it is very rare(less than 10%).
Flags: needinfo?(alice0775)
Screen cast: https://youtu.be/4TTxiEVg7gw
(In reply to Alice0775 White from comment #4)
> But in this case, it needs to enable e10s.

Ah, makes sense. Probably attachment 8776054 [details] [diff] [review] should be reapplied.
It will not regress bug 1300421. :jfkthame, thoughts?
Flags: needinfo?(jfkthame)
(In reply to Masatoshi Kimura [:emk] from comment #9)
> (In reply to Alice0775 White from comment #4)
> > But in this case, it needs to enable e10s.
> 
> Ah, makes sense. Probably attachment 8776054 [details] [diff] [review]
> should be reapplied.
> It will not regress bug 1300421. :jfkthame, thoughts?

That sounds right to me; I think that patch should be OK, independent of the rest of the patches that landed and were backed out in 1288760.
Flags: needinfo?(jfkthame)
I've rebased patch 2.3 from bug 1288760 to current mozilla-central, and pushed to try at   https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa58d3d99caa88ba8ac0e4b7dee4118f781ec246 so we can see how that goes.
Alice, could you confirm whether the try build from comment 11 helps with this issue? Thanks!
Flags: needinfo?(alice0775)
(In reply to Jonathan Kew (:jfkthame) from comment #11)
> I've rebased patch 2.3 from bug 1288760 to current mozilla-central, and
> pushed to try at  
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=aa58d3d99caa88ba8ac0e4b7dee4118f781ec246 so we can
> see how that goes.

Nothing is changed. The problem is still reproduced.
https://hg.mozilla.org/try/rev/aa58d3d99caa88ba8ac0e4b7dee4118f781ec246
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0 ID:20161115032324
Flags: needinfo?(alice0775)
So apparently that doesn't help... if you have time to investigate anything further, that'd be great.
Flags: needinfo?(VYV03354)
Interestingly, 
Nightly53.0a1 works properly if force disable accessibility.
i.e., accessibility.force_disabled = 1 in about:config: (need restart browser).

So,  In order to reproduce the problem, 
It seems to need [1]enabled e10s(default) && [2]accessibility device(maybe IME) is activated on Windows.
Blocks: 1310788
Summary: [e10s] tab-throbber in tab is forever spinning after detaching the tab → [E10S a11y] tab-throbber in tab is forever spinning after detaching the tab
And also, setting accessibility.ipc_architecture.enabled = false seems fix the problem.
Flags: needinfo?(surkov.alexander)
Drive by comment: I wonder if https://hg.mozilla.org/mozilla-central/rev/387d3acae9e9 (landed yesterday) makes the problem better or worse?
Trevor is probably right person to ask about a11y e10s on OSX and Windows.

Do we have any stacks of where it hangs?
Flags: needinfo?(surkov.alexander)
(In reply to David Bolter [:davidb] from comment #17)
> Drive by comment: I wonder if
> https://hg.mozilla.org/mozilla-central/rev/387d3acae9e9 (landed yesterday)
> makes the problem better or worse?

It seems that nothing is changed.
I can still reproduce the problem with almost 100%.

https://hg.mozilla.org/mozilla-central/rev/28e2a6dde76ab6ad4464a3662df1bd57af04398a
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0 ID:20161118030222
(In reply to alexander :surkov from comment #18)
> Do we have any stacks of where it hangs?

or perf profile?
Hey Alice0775, I'm unable to see the video at https://www.youtube.com/watch?v=4TTxiEVg7gw&feature=youtu.be - apparently it's listed as "private". Can you please make the video public instead?
Flags: needinfo?(alice0775)
(In reply to Mike Conley (:mconley) from comment #21)
> Hey Alice0775, I'm unable to see the video at
> https://www.youtube.com/watch?v=4TTxiEVg7gw&feature=youtu.be - apparently
> it's listed as "private". Can you please make the video public instead?

sorry. I modified it.
Flags: needinfo?(alice0775)
Flags: needinfo?(mconley)
I can easily reproduce this on my touchscreen laptop. I've started investigating.
Hi :mconley,
Do you have any updates/progress here?
Yes, I think I've sorted this out. Patch and explanation coming up.
Flags: needinfo?(mconley)
(Explanation is in the commit message, comment 26)
Attachment #8815041 - Flags: review?(gijskruitbosch+bugs)
Some orange in my try push to sort out.
Flags: needinfo?(mconley)
Flags: needinfo?(VYV03354)
Assigning since it looks like Mike's working on this.
Assignee: nobody → mconley
(In reply to Andrew Overholt [:overholt] from comment #29)
> Assigning since it looks like Mike's working on this.

Yes, I'm hoping to get this addressed soon-ish, but I suspect any fix will be too risky for 51. We should probably wontfix it there (I just did this).
Attachment #8815041 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8818366 [details]
Bug 1317212 - Make tab progress listeners smarter at ignoring nsIWebProgress notifications from an initial about:blank.

Hey Gijs, wondering what you think of this - especially since you've semi-recently dealt with nsIWebProgress-like sync issues between content and parent for the URL bar.

Basically, the problem is that with the other patch in this series, I believe we have the possibility of _missing_ legitimate nsIWebProgress messages with mBlank set to true in the tab progress listener. Consider this scenario:

1) Current tab is non-remote
2) Some browser code somewhere hooks up a progress listener on the tab waiting for onLocationChange.
3) User navigates tab to a remote-friendly URL X
4) Remoteness swap occurs, and due to the other patch in this series, a new mTabProgressListener is constructed which has the mBlank member set to true, so it'll ignore the first set of messages until mBlank is set to false
5) The remote browser is told to load X before the initial about:blank load completes / messages are sent to the parent.
6) Messages for loading X are sent to the parent
7) The parent ignores the onLocationChange for X, because it thinks they're from about:blank
8) Code from (2) never fires. :(

I believe this is what causes browser/components/sessionstore/test/browser_parentProcessRestoreHash.js to fail without this second patch here.

So my proposal is to make mTabProgressListener "smarter" by not just assuming that the first few messages are from about:blank, but actually checking the location to ensure it matches about:blank.

I'm not certain this is the best way of going about this though, and wanted to see if you had additional thoughts.
Flags: needinfo?(mconley) → needinfo?(gijskruitbosch+bugs)
Attachment #8818366 - Flags: feedback?(gijskruitbosch+bugs)
(In reply to Mike Conley (:mconley) from comment #37)
> Comment on attachment 8818366 [details]
> Bug 1317212 - Make tab progress listeners smarter at ignoring nsIWebProgress
> notifications from an initial about:blank.
> 
> Hey Gijs, wondering what you think of this - especially since you've
> semi-recently dealt with nsIWebProgress-like sync issues between content and
> parent for the URL bar.
> 
> Basically, the problem is that with the other patch in this series, I
> believe we have the possibility of _missing_ legitimate nsIWebProgress
> messages with mBlank set to true in the tab progress listener. Consider this
> scenario:
> 
> 1) Current tab is non-remote
> 2) Some browser code somewhere hooks up a progress listener on the tab
> waiting for onLocationChange.
> 3) User navigates tab to a remote-friendly URL X
> 4) Remoteness swap occurs, and due to the other patch in this series, a new
> mTabProgressListener is constructed which has the mBlank member set to true,
> so it'll ignore the first set of messages until mBlank is set to false
> 5) The remote browser is told to load X before the initial about:blank load
> completes / messages are sent to the parent.
> 6) Messages for loading X are sent to the parent
> 7) The parent ignores the onLocationChange for X, because it thinks they're
> from about:blank
> 8) Code from (2) never fires. :(
> 
> I believe this is what causes
> browser/components/sessionstore/test/browser_parentProcessRestoreHash.js to
> fail without this second patch here.
> 
> So my proposal is to make mTabProgressListener "smarter" by not just
> assuming that the first few messages are from about:blank, but actually
> checking the location to ensure it matches about:blank.
> 
> I'm not certain this is the best way of going about this though, and wanted
> to see if you had additional thoughts.

I think your diagnosis is correct. However, shouldn't we be setting mBlank to false earlier? Your patch doesn't do that and still checks mBlank for some of the other notifications. Effectively right now we try to set mBlank to false once the first full pageload hits. Perhaps we should be setting it if we detect the STATE_START (or any later changes of a new document ?)
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mconley)
Comment on attachment 8815041 [details]
Bug 1317212 - <xul:browser>'s that flip remoteness should not send progress updates for the initial about:blank load.

https://reviewboard.mozilla.org/r/96040/#review100222

Tentative r=me... this is theoretically correct, as long as we fix the raciness that ensues.
Attachment #8815041 - Flags: review?(gijskruitbosch+bugs) → review+
Attachment #8818366 - Flags: feedback?(gijskruitbosch+bugs)
The problem seems to be fixed on Nightly53.0a1.

I can reproduce the problem on the bad build Nightly52.0a1(2016-11-13).
However, I cannot reproduce on Nightly53.0a1(2017-01-14).

Fixed range
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=275c894ee97095e64df89b6a575754e4949c21ae&tochange=243cf7829d19447e20b6b73b39940615794cdf9a

Fixed by: Bug 1309596
Mike, do we still need to do something in this bug? I expect so... but I'm not sure. :-)
I get the feeling that this only appears fixed because the race that I laid on in comment 37 is still happening, but the side that was winning most often before with the STR is now losing more often. I think the bug is still there, but harder to hit with the STR. :/

Sorry, I got distracted from this bug with a few other things. I'm coming back around to it now.
(In reply to :Gijs from comment #38)
> I think your diagnosis is correct. However, shouldn't we be setting mBlank
> to false earlier? Your patch doesn't do that and still checks mBlank for
> some of the other notifications. Effectively right now we try to set mBlank
> to false once the first full pageload hits. Perhaps we should be setting it
> if we detect the STATE_START (or any later changes of a new document ?)

I've modified it so that if we have mBlank is true, and we see an onStateChange for any location that's not about:blank or not a top-level load, mBlank is set to false immediately. Sound okay?
Flags: needinfo?(mconley)
Comment on attachment 8818366 [details]
Bug 1317212 - Make tab progress listeners smarter at ignoring nsIWebProgress notifications from an initial about:blank.

https://reviewboard.mozilla.org/r/98440/#review105792

::: browser/base/content/tabbrowser.xml:696
(Diff revision 3)
>                } else if (aStateFlags & nsIWebProgressListener.STATE_STOP &&
>                           aStateFlags & nsIWebProgressListener.STATE_IS_NETWORK) {

So if we hit the first case in your if statement, where we're ignoring initial about:blank messages, and we get a state_stop + state_is_network for about:blank, we also enter this block and set the URL bar, remove "busy" and "progress" and so on from the tab... that doesn't seem right. Or am I missing something?

I realize that we were doing this before, but it feels like something we should fix if we think this race condition is likely to keep biting us. So, in particular, in the case where:

```js
(ignoreBlank && aStateFlags & nsIWebProgressListener.STATE_STOP && aStateFlags & nsIWebProgressListener.STATE_IS_NETWORK)
```

(but not the other part of that if statement)

it feels like maybe we should not enter this block? If we get the final 'stop' for about:blank, that is? Do bad things happen in that case, or am I misunderstanding because it's midnight and I'm tired? :-)
Comment on attachment 8818366 [details]
Bug 1317212 - Make tab progress listeners smarter at ignoring nsIWebProgress notifications from an initial about:blank.

https://reviewboard.mozilla.org/r/98440/#review105792

> So if we hit the first case in your if statement, where we're ignoring initial about:blank messages, and we get a state_stop + state_is_network for about:blank, we also enter this block and set the URL bar, remove "busy" and "progress" and so on from the tab... that doesn't seem right. Or am I missing something?
> 
> I realize that we were doing this before, but it feels like something we should fix if we think this race condition is likely to keep biting us. So, in particular, in the case where:
> 
> ```js
> (ignoreBlank && aStateFlags & nsIWebProgressListener.STATE_STOP && aStateFlags & nsIWebProgressListener.STATE_IS_NETWORK)
> ```
> 
> (but not the other part of that if statement)
> 
> it feels like maybe we should not enter this block? If we get the final 'stop' for about:blank, that is? Do bad things happen in that case, or am I misunderstanding because it's midnight and I'm tired? :-)

In fact, I suppose if we don't do that and the page really only loads the initial about:blank, nothing else will happen and so we might not reset state correctly, so not executing this block at all is probably wrong, too. Do we have a way of knowing at this point whether another load is pending already? :-(
Comment on attachment 8818366 [details]
Bug 1317212 - Make tab progress listeners smarter at ignoring nsIWebProgress notifications from an initial about:blank.

https://reviewboard.mozilla.org/r/98440/#review105936

Meh, considering it's not a regression and that it's not clear what to do about my earlier comment r=me
Attachment #8818366 - Flags: review?(gijskruitbosch+bugs) → review+
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1b00058c2288
<xul:browser>'s that flip remoteness should not send progress updates for the initial about:blank load. r=Gijs
https://hg.mozilla.org/integration/autoland/rev/92f50b968d7f
Make tab progress listeners smarter at ignoring nsIWebProgress notifications from an initial about:blank. r=Gijs
https://hg.mozilla.org/mozilla-central/rev/1b00058c2288
https://hg.mozilla.org/mozilla-central/rev/92f50b968d7f
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Please request Aurora approval on this when you're comfortable doing so.
Flags: needinfo?(mconley)
Yeah, I'm gonna let this bake for a few days. Thanks for the reminder.
Comment on attachment 8815041 [details]
Bug 1317212 - <xul:browser>'s that flip remoteness should not send progress updates for the initial about:blank load.

Approval Request Comment
[Feature/Bug causing the regression]:

e10s, and bug 1254669

[User impact if declined]:

There is the potential for a race where a tab's busy / done state might not match reality. (Example - a tab will still show the "busy" spinner, when the page has in reality finished loading). It looks like this race is probably only hit-able by users when doing tab tear out / tear in.

[Is this code covered by automated tests?]:

Tab tear out and tear in has some automated tests, yes. This particular race is, unfortunately, hard to write an automated regression test for.

[Has the fix been verified in Nightly?]:

Actually no. The problem went away for the original reporter, probably due to the fact that other unrelated patches caused the conditions of the "race" to change, even though the potential of the race was still there.

[Needs manual test from QE? If yes, steps to reproduce]: 

Some testing of tab tear out and tear in behaviour would be good. The STR for this particular bug can be found in comment 0.

[List of other uplifts needed for the feature/fix]:

None.

[Is the change risky?]:

I'd say mostly not. 

[Why is the change risky/not risky?]:

We're making the thing that dispatches messages about tab progress events a bit smarter about how it does that... there's always room for error, but I think we're probably okay here. I'm going on my gut here, tbh.

[String changes made/needed]:

None.
Flags: needinfo?(mconley)
Attachment #8815041 - Flags: approval-mozilla-aurora?
Comment on attachment 8818366 [details]
Bug 1317212 - Make tab progress listeners smarter at ignoring nsIWebProgress notifications from an initial about:blank.

See above.
Attachment #8818366 - Flags: approval-mozilla-aurora?
Comment on attachment 8815041 [details]
Bug 1317212 - <xul:browser>'s that flip remoteness should not send progress updates for the initial about:blank load.

fix tab spinner when detaching, beta52+
Attachment #8815041 - Flags: approval-mozilla-aurora? → approval-mozilla-beta+
Comment on attachment 8818366 [details]
Bug 1317212 - Make tab progress listeners smarter at ignoring nsIWebProgress notifications from an initial about:blank.

fix tab spinner when detaching, beta52+, should be in b2
Attachment #8818366 - Flags: approval-mozilla-aurora? → approval-mozilla-beta+
Requesting qe verification for this issue, per comment 54.  Thanks Mike!
Flags: qe-verify+
I reproduced this issue using 50.0a1, build ID: 20161113030203 on Windows 10 x32.
I can confirm this issue is fixed, I verified Fx 53.0a2, build ID: 20170209004018 and Fx52.0b5, build ID: 20170209120619, on Windows 10 x32.

Cheers!
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Blocks: 1353945
Component: Event Handling → User events and focus handling
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: