Closed Bug 1211098 Opened 4 years ago Closed 3 years ago

Intermittent e10s browser_bug562649.js | userTypedValue matches test URI after switching tabs - Got null, expected data:text/plain,bug562649 and three more

Categories

(Firefox :: General, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 54
Tracking Status
e10s + ---
firefox51 --- wontfix
firefox52 --- fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: philor, Assigned: gkrizsanits)

References

(Blocks 3 open bugs)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

https://treeherder.mozilla.org/logviewer.html#?job_id=4960132&repo=fx-team


147 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | userTypedValue matches test URI after switching tabs - Got null, expected data:text/plain,bug562649
148 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | location bar value matches test URI after switching tabs - Got , expected data:text/plain,bug562649
150 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | location bar value matches test URI as the page has loaded - Got , expected data:text/plain,bug562649
153 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | Found an unexpected tab at the end of test run: data:text/plain,bug562649 -
Summary: Intermittent browser_bug562649.js | userTypedValue matches test URI after switching tabs - Got null, expected data:text/plain,bug562649 and three more → Intermittent e10s browser_bug562649.js | userTypedValue matches test URI after switching tabs - Got null, expected data:text/plain,bug562649 and three more
Blocks: e10s-tests
tracking-e10s: --- → +
Mike, looks like you've got some history with this test. Any thoughts?
Flags: needinfo?(mconley)
I'm afraid I don't have any - I just re-enabled the test when I saw that it was working. It looks like there's a race here with the setting of userTypedValue.

Hey Gijs, do you know how userTypedValue is supposed to get set when a tab is removed? I know that you've done some URL bar stuff lately, and I was wondering if you had it swapped into your head.
Flags: needinfo?(mconley) → needinfo?(gijskruitbosch+bugs)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #7)
> I'm afraid I don't have any - I just re-enabled the test when I saw that it
> was working. It looks like there's a race here with the setting of
> userTypedValue.
> 
> Hey Gijs, do you know how userTypedValue is supposed to get set when a tab
> is removed? I know that you've done some URL bar stuff lately, and I was
> wondering if you had it swapped into your head.

userTypedValue is stored per-browser, so if you're checking it on the tabbrowser, you're really getting the one from the selected browser: https://dxr.mozilla.org/mozilla-central/rev/d6ee82b9a74155b6bfd544166f036fc572ae8c56/browser/base/content/tabbrowser.xml#4052-4054

I expect that the tab switch here should be made to use BTU.switchToTab (the async, e10s-compatible version), but it would be good to check the context of bug 562649 to see if having this change be asynchronous is likely to cause problems in practice, or if we should stop providing this forwarding implementation, or make it forward differently in e10s (IIRC we update selectedBrowser async? So I wonder if we should update this sync to the to-be-selected-browser) or... something. :-)

Does that make sense?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mconley)
It makes sense, yes.

What also is interesting is that this intermittent has stopped being reported. I'm closing this one was INCOMPLETE unless we start seeing it again.
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(mconley)
Resolution: --- → INCOMPLETE
https://treeherder.mozilla.org/logviewer.html#?job_id=37727822&repo=mozilla-inbound has the first error message, the others might be gone away because of the location bar changes.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo if bug is intermittent) from comment #10)
> https://treeherder.mozilla.org/logviewer.html#?job_id=37727822&repo=mozilla-
> inbound has the first error message, the others might be gone away because
> of the location bar changes.

Has this recently reappeared, and if so do we know if it's correlated with some change we've made? What "location bar changes" do you mean specifically?
Flags: needinfo?(aryx.bugmail)
(and in particular, I wonder if I regressed this with bug 1000458?)
(In reply to :Gijs Kruitbosch from comment #12)
> Has this recently reappeared, and if so do we know if it's correlated with
> some change we've made? What "location bar changes" do you mean specifically?
My comment was under the assumption that this issue hadn't appeared for the months (so I assumed changes in that timeframe), but it already started earlier last Wednesday, the failures got starred against this bug but the bug not reopened.

All 6 starred failures are on OSX 10.10 opt: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1211098&startday=2016-10-01&endday=2016-10-17&tree=all

Didn't spot anything obvious in the list of fixed bugs in Firefox::Location bar or Toolkit::XUL Widgets. There are recent changes about Enter handling like yours in bug 1000458 (landed on central 2016-10-04) or from mak in September, but an earlier failure would have been likely. Maybe a change in Core?
Flags: needinfo?(aryx.bugmail)
We (me and Drew) did a bunch of changes to urlbar code to clarify some codepaths, so it's likely those may have skewed some timings and retriggered this as well.
I don't think it's really useful to go after the culprit here, we should rather try to figure out what paths the test touches that change userTypedValue and whether it's measuring it at the wrong time or not properly detecting a real regression.
Recent bugs include bug 1304501 and bug 1306639, for example, that landed in the last few days.
bug 1000458 is specifically about switching tabs and hitting enter, which is also the failure message in this test, so I'm worried it might be related to that...

It's true that analyzing the test further will likely also help figure out what's wrong. I'll try and have a look later this week.
Flags: needinfo?(gijskruitbosch+bugs)
I can't reproduce and there have been no re-ocurrences today or yesterday. Even if there were 1 or 2 like the other days, this is so low-frequency that I'm not sure how to debug - it would be a slow process of continually adding logging to figure out in what state this is breaking. So going to clear ni for now, and if this becomes more frequent I can investigate.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to :Gijs from comment #8)

> I expect that the tab switch here should be made to use BTU.switchToTab (the
> async, e10s-compatible version), but it would be good to check the context
> of bug 562649 to see if having this change be asynchronous is likely to
> cause problems in practice, or if we should stop providing this forwarding
> implementation, or make it forward differently in e10s (IIRC we update
> selectedBrowser async? So I wonder if we should update this sync to the
> to-be-selected-browser) or... something. :-)
> 
> Does that make sense?

I had to disable this test for landing e10s-multi, because it became more frequent and I didn't want
to block landing on this test. I would love to fix this but I'm not sure I understand the second part
of your comment, could you please elaborate on that so I can give it a try? Thanks!
Flags: needinfo?(gijskruitbosch+bugs)
Blocks: 1315042
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #27)
> (In reply to :Gijs from comment #8)
> 
> > I expect that the tab switch here should be made to use BTU.switchToTab (the
> > async, e10s-compatible version), but it would be good to check the context
> > of bug 562649 to see if having this change be asynchronous is likely to
> > cause problems in practice, or if we should stop providing this forwarding
> > implementation, or make it forward differently in e10s (IIRC we update
> > selectedBrowser async? So I wonder if we should update this sync to the
> > to-be-selected-browser) or... something. :-)
> > 
> > Does that make sense?
> 
> I had to disable this test for landing e10s-multi, because it became more
> frequent and I didn't want
> to block landing on this test. I would love to fix this but I'm not sure I
> understand the second part
> of your comment, could you please elaborate on that so I can give it a try?
> Thanks!

I don't know anymore, it was 8 months ago. I also can't see the test failures you had with e10s-multi and I'm worried other things broke than this intermittent... anyway, at a hunch, does replacing this:

  gBrowser.selectedTab = gBrowser.addTab();
  gBrowser.removeCurrentTab();

with

  let tab = gBrowser.addTab();
  yield BrowserTestUtils.switchToTab(gBrowser, tab);
  gBrowser.removeTab(tab);

work?

if not, replace the last gBrowser.removeTab(tab); with yield BrowserTestUtils.removeTab(tab);
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gkrizsanits)
This test is full of races. It opens an URI in a new tab, and then it assumes before that would load, but after it finished switching to it can open another tab load its content and close it and then magically the load of the first tab will finish only after we removed this second tab and switched back to the original.

It's not happening. I tried to fix the test by eliminating all the races, but then it turns out that this case will never happen with e10s-multi (and even without that it's a matter of luck if this test succeeds). I could not reproduce the original case this test was invented for, so I see no point wasting more time on fixing this for multi.
Flags: needinfo?(gkrizsanits)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #29)
> This test is full of races.

I don't see how it can be. The test is doing everything synchronously, except the final waiting for the tab to load. :-\

> It opens an URI in a new tab, and then it
> assumes before that would load, but after it finished switching to it can
> open another tab load its content

No, the test doesn't depend on the content of the second tab loading at all. It opens, selects, and closes it synchronously. It doesn't wait anywhere.

> It's not happening. I tried to fix the test by eliminating all the races,
> but then it turns out that this case will never happen with e10s-multi (and
> even without that it's a matter of luck if this test succeeds).

Why not? The test is verifying that we persist the URI of a pending load on the tab until the load finishes, and that it gets removed afterwards, for tabs opened via the external URI opening API. Saying this can "never happen with e10s-multi" does not make a lot of sense to me.
Flags: needinfo?(gkrizsanits)
(In reply to :Gijs from comment #30)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #29)
> > This test is full of races.
> 
> I don't see how it can be. The test is doing everything synchronously,
> except the final waiting for the tab to load. :-\

How could everything be synchronous? The very first operation opens a tab, switches to the tab and loads the content of the tab. None of those operations are synchronous. WHILE some of those operations are underway we open another tab and switch to it and close it and switch back. Again, we're in e10s, switching is not synchronous that's why we have BrowserTestUtils.switchTab, no? 

> No, the test doesn't depend on the content of the second tab loading at all.
> It opens, selects, and closes it synchronously. It doesn't wait anywhere.

You are right except the switching part. Although we fake sync in some cases the first load can
finish sooner than all this.

> Why not? The test is verifying that we persist the URI of a pending load on
> the tab until the load finishes, and that it gets removed afterwards, for
> tabs opened via the external URI opening API. Saying this can "never happen
> with e10s-multi" does not make a lot of sense to me.

The first openURI finishes its load before we could finish the second tab opening/switching/removing/switching back dance. So by the time we reach this: http://searchfox.org/mozilla-central/source/browser/base/content/test/urlbar/browser_bug562649.js#13 the userTypedValue is reset already.
Flags: needinfo?(gkrizsanits)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #31)
> (In reply to :Gijs from comment #30)
> > (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #29)
> > > This test is full of races.
> > 
> > I don't see how it can be. The test is doing everything synchronously,
> > except the final waiting for the tab to load. :-\
> 
> How could everything be synchronous? The very first operation opens a tab,
> switches to the tab and loads the content of the tab. None of those
> operations are synchronous.

Uh, yes they are. The internal state of the tabbrowser is updated synchronously, the tab is created synchronously, its browser is created synchronously. The loading is async, but at that stage in the test that doesn't matter yet.

> WHILE some of those operations are underway we
> open another tab and switch to it and close it and switch back. Again, we're
> in e10s, switching is not synchronous that's why we have
> BrowserTestUtils.switchTab, no? 

AIUI you'd use switchTab if you care about the visual UI state. In this case, we don't. gBrowser.selectedTab = tab updates logical selected state but not visual selection, AIUI (we wait for the tab switch to be complete). Logical selected state is all that's needed here.

> > No, the test doesn't depend on the content of the second tab loading at all.
> > It opens, selects, and closes it synchronously. It doesn't wait anywhere.
> 
> You are right except the switching part. Although we fake sync in some cases
> the first load can
> finish sooner than all this.

I don't understand what you mean by "fake sync".

> > Why not? The test is verifying that we persist the URI of a pending load on
> > the tab until the load finishes, and that it gets removed afterwards, for
> > tabs opened via the external URI opening API. Saying this can "never happen
> > with e10s-multi" does not make a lot of sense to me.
> 
> The first openURI finishes its load before we could finish the second tab
> opening/switching/removing/switching back dance. So by the time we reach
> this:
> http://searchfox.org/mozilla-central/source/browser/base/content/test/urlbar/
> browser_bug562649.js#13 the userTypedValue is reset already.

OK, then it sounds like we can easily fix this by using a slow-loading server file that we force to not return a response until we manage to do everything else, right?
Flags: needinfo?(gkrizsanits)
FWIW, it's still not clear to me how we'd manage to update the internal state of a main-thread JS object in response to the load if there's no event loop being spun by any of this code, which, AIUI, we don't do for either tab switching or tab creation. If the JS in the test runs to completion as expected, that just shouldn't be possible. Where is it not doing so?
(In reply to :Gijs from comment #32)
> AIUI you'd use switchTab if you care about the visual UI state. In this
> case, we don't. gBrowser.selectedTab = tab updates logical selected state
> but not visual selection, AIUI (we wait for the tab switch to be complete).
> Logical selected state is all that's needed here.

Right that makes sense, I didn't know what part of it is sync and what is not.

> I don't understand what you mean by "fake sync".

Fake sync is what we do during close tab for example with a nested event loop: http://searchfox.org/mozilla-central/source/toolkit/content/widgets/remote-browser.xml#352

> OK, then it sounds like we can easily fix this by using a slow-loading
> server file that we force to not return a response until we manage to do
> everything else, right?

How does that work in practice? I'm not great with server files in testing, can set up the server to wait with a request and then ping it from script to finish it? That would fix it I think. I would definitely give it a try.

(In reply to :Gijs from comment #33)
> FWIW, it's still not clear to me how we'd manage to update the internal
> state of a main-thread JS object in response to the load if there's no event
> loop being spun by any of this code, which, AIUI, we don't do for either tab
> switching or tab creation. If the JS in the test runs to completion as
> expected, that just shouldn't be possible. Where is it not doing so?

I guess then the culprit is the removeTab, maybe we can request it to skip the permit part? I was not sure if we do similar tricks somewhere else as well (my front end knowledge is not that great).

It's great talking this over with someone, thanks a lot for the help, I have been losing my mind here...
Flags: needinfo?(gkrizsanits)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #34)
> (In reply to :Gijs from comment #32)
> > AIUI you'd use switchTab if you care about the visual UI state. In this
> > case, we don't. gBrowser.selectedTab = tab updates logical selected state
> > but not visual selection, AIUI (we wait for the tab switch to be complete).
> > Logical selected state is all that's needed here.
> 
> Right that makes sense, I didn't know what part of it is sync and what is
> not.

Re-reading the test though, I'm not sure I'm right, as it also tries to read the gURLBar value. I expect we won't update the gURLBar value until we switch tabs visually. :-(

So we should probably switch to BTU.switchToTab.

> > I don't understand what you mean by "fake sync".
> 
> Fake sync is what we do during close tab for example with a nested event
> loop:
> http://searchfox.org/mozilla-central/source/toolkit/content/widgets/remote-
> browser.xml#352

Ahhh, of course, I missed that. Yes, that's what's causing the issue here.

> > OK, then it sounds like we can easily fix this by using a slow-loading
> > server file that we force to not return a response until we manage to do
> > everything else, right?
> 
> How does that work in practice? I'm not great with server files in testing,
> can set up the server to wait with a request and then ping it from script to
> finish it? That would fix it I think. I would definitely give it a try.

Yes, I've used this before. You want an sjs file for this, a bit like this:

https://dxr.mozilla.org/mozilla-central/source/browser/base/content/test/urlbar/slow-page.sjs

Then you can use a repeating timer instead of a one-off, and use the getState method to check for shared state, and in the 'faster' if() branch, you can set shared state that triggers the original request resolving. It's a bit clumsy, but it should work to control what you want here. :-)

Docs are here https://developer.mozilla.org/en-US/docs/Mozilla/httpd.js/HTTP_server_for_unit_tests

Feel free to ping me on IRC if you get stuck (though I'm by no means an expert...).

> (In reply to :Gijs from comment #33)
> > FWIW, it's still not clear to me how we'd manage to update the internal
> > state of a main-thread JS object in response to the load if there's no event
> > loop being spun by any of this code, which, AIUI, we don't do for either tab
> > switching or tab creation. If the JS in the test runs to completion as
> > expected, that just shouldn't be possible. Where is it not doing so?
> 
> I guess then the culprit is the removeTab, maybe we can request it to skip
> the permit part? I was not sure if we do similar tricks somewhere else as
> well (my front end knowledge is not that great).

Ah, this would also be an option, though see above. The removeTab API has a second options arg (an object) on which you could set skipPermitUnload to true, which should make the test fully sync as-is... but in that case I'm fairly sure that the gURLBar checks aren't actually doing anything useful. That said, they're already not useful, and this would be sufficient (I think) to re-enable for e10s-multi, and then we can make the test better in a followup.

> It's great talking this over with someone, thanks a lot for the help, I have
> been losing my mind here...

Thanks for bearing with my questions as well!
(In reply to :Gijs from comment #35)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #34)
> > (In reply to :Gijs from comment #32)
> Re-reading the test though, I'm not sure I'm right, as it also tries to read
> the gURLBar value. I expect we won't update the gURLBar value until we
> switch tabs visually. :-(

Ah, yeah now that you mention it that part was failing after I started using BrowserTestUtils to try and make the test more predictable.

> Ah, this would also be an option, though see above. The removeTab API has a
> second options arg (an object) on which you could set skipPermitUnload to
> true, which should make the test fully sync as-is... but in that case I'm
> fairly sure that the gURLBar checks aren't actually doing anything useful.
> That said, they're already not useful, and this would be sufficient (I
> think) to re-enable for e10s-multi, and then we can make the test better in
> a followup.

I ended up doing this for now, it did the trick it was the tab closing. Just for the good measure I added the flag to both tab closing in the test (the second should only help finishing the test slightly faster)
Assignee: nobody → gkrizsanits
Status: REOPENED → ASSIGNED
Attachment #8831978 - Flags: review?(gijskruitbosch+bugs)
Comment on attachment 8831978 [details] [diff] [review]
Fixing race from tab closing. v1

Review of attachment 8831978 [details] [diff] [review]:
-----------------------------------------------------------------

r=me . Can you file a follow-up to make this test more useful?

Also, shouldn't this fix include re-enabling the test on Nightly for e10s-multi in the browser.ini file? It looks like it's perma-disabled for e10s at the moment...
Attachment #8831978 - Flags: review?(gijskruitbosch+bugs) → review+
Blocks: 1335351
(In reply to :Gijs from comment #37)
> Comment on attachment 8831978 [details] [diff] [review]
> Fixing race from tab closing. v1
> 
> Review of attachment 8831978 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me . Can you file a follow-up to make this test more useful?
> 

Sure, bug 1335351.

> Also, shouldn't this fix include re-enabling the test on Nightly for
> e10s-multi in the browser.ini file? It looks like it's perma-disabled for
> e10s at the moment...

Ah yeah, I forgot that I had this part in another patch... thanks!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cfb399f6a1b78010de91a1974ac9ec7d8d7d7c02
https://hg.mozilla.org/mozilla-central/rev/e807d24e9b10
Status: ASSIGNED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 54
You need to log in before you can comment on or make changes to this bug.