Closed
Bug 1211098
Opened 9 years ago
Closed 8 years ago
Intermittent e10s browser_bug562649.js | userTypedValue matches test URI after switching tabs - Got null, expected data:text/plain,bug562649 and three more
Categories
(Firefox :: General, defect)
Firefox
General
Tracking
()
RESOLVED
FIXED
Firefox 54
People
(Reporter: philor, Assigned: gkrizsanits)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
1.39 KB,
patch
|
Gijs
:
review+
|
Details | Diff | Splinter Review |
https://treeherder.mozilla.org/logviewer.html#?job_id=4960132&repo=fx-team
147 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | userTypedValue matches test URI after switching tabs - Got null, expected data:text/plain,bug562649
148 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | location bar value matches test URI after switching tabs - Got , expected data:text/plain,bug562649
150 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | location bar value matches test URI as the page has loaded - Got , expected data:text/plain,bug562649
153 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | Found an unexpected tab at the end of test run: data:text/plain,bug562649 -
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Updated•9 years ago
|
Summary: Intermittent browser_bug562649.js | userTypedValue matches test URI after switching tabs - Got null, expected data:text/plain,bug562649 and three more → Intermittent e10s browser_bug562649.js | userTypedValue matches test URI after switching tabs - Got null, expected data:text/plain,bug562649 and three more
Updated•9 years ago
|
Blocks: e10s-tests
tracking-e10s:
--- → +
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 5•9 years ago
|
||
Mike, looks like you've got some history with this test. Any thoughts?
Flags: needinfo?(mconley)
Comment hidden (Intermittent Failures Robot) |
Comment 7•9 years ago
|
||
I'm afraid I don't have any - I just re-enabled the test when I saw that it was working. It looks like there's a race here with the setting of userTypedValue.
Hey Gijs, do you know how userTypedValue is supposed to get set when a tab is removed? I know that you've done some URL bar stuff lately, and I was wondering if you had it swapped into your head.
Flags: needinfo?(mconley) → needinfo?(gijskruitbosch+bugs)
Comment 8•9 years ago
|
||
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #7)
> I'm afraid I don't have any - I just re-enabled the test when I saw that it
> was working. It looks like there's a race here with the setting of
> userTypedValue.
>
> Hey Gijs, do you know how userTypedValue is supposed to get set when a tab
> is removed? I know that you've done some URL bar stuff lately, and I was
> wondering if you had it swapped into your head.
userTypedValue is stored per-browser, so if you're checking it on the tabbrowser, you're really getting the one from the selected browser: https://dxr.mozilla.org/mozilla-central/rev/d6ee82b9a74155b6bfd544166f036fc572ae8c56/browser/base/content/tabbrowser.xml#4052-4054
I expect that the tab switch here should be made to use BTU.switchToTab (the async, e10s-compatible version), but it would be good to check the context of bug 562649 to see if having this change be asynchronous is likely to cause problems in practice, or if we should stop providing this forwarding implementation, or make it forward differently in e10s (IIRC we update selectedBrowser async? So I wonder if we should update this sync to the to-be-selected-browser) or... something. :-)
Does that make sense?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mconley)
Comment 9•9 years ago
|
||
It makes sense, yes.
What also is interesting is that this intermittent has stopped being reported. I'm closing this one was INCOMPLETE unless we start seeing it again.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(mconley)
Resolution: --- → INCOMPLETE
Comment 10•8 years ago
|
||
https://treeherder.mozilla.org/logviewer.html#?job_id=37727822&repo=mozilla-inbound has the first error message, the others might be gone away because of the location bar changes.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Comment hidden (Intermittent Failures Robot) |
Comment 12•8 years ago
|
||
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo if bug is intermittent) from comment #10)
> https://treeherder.mozilla.org/logviewer.html#?job_id=37727822&repo=mozilla-
> inbound has the first error message, the others might be gone away because
> of the location bar changes.
Has this recently reappeared, and if so do we know if it's correlated with some change we've made? What "location bar changes" do you mean specifically?
Flags: needinfo?(aryx.bugmail)
Comment 13•8 years ago
|
||
(and in particular, I wonder if I regressed this with bug 1000458?)
Comment 14•8 years ago
|
||
(In reply to :Gijs Kruitbosch from comment #12)
> Has this recently reappeared, and if so do we know if it's correlated with
> some change we've made? What "location bar changes" do you mean specifically?
My comment was under the assumption that this issue hadn't appeared for the months (so I assumed changes in that timeframe), but it already started earlier last Wednesday, the failures got starred against this bug but the bug not reopened.
All 6 starred failures are on OSX 10.10 opt: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1211098&startday=2016-10-01&endday=2016-10-17&tree=all
Didn't spot anything obvious in the list of fixed bugs in Firefox::Location bar or Toolkit::XUL Widgets. There are recent changes about Enter handling like yours in bug 1000458 (landed on central 2016-10-04) or from mak in September, but an earlier failure would have been likely. Maybe a change in Core?
Flags: needinfo?(aryx.bugmail)
Comment 15•8 years ago
|
||
We (me and Drew) did a bunch of changes to urlbar code to clarify some codepaths, so it's likely those may have skewed some timings and retriggered this as well.
I don't think it's really useful to go after the culprit here, we should rather try to figure out what paths the test touches that change userTypedValue and whether it's measuring it at the wrong time or not properly detecting a real regression.
Recent bugs include bug 1304501 and bug 1306639, for example, that landed in the last few days.
Comment 16•8 years ago
|
||
bug 1000458 is specifically about switching tabs and hitting enter, which is also the failure message in this test, so I'm worried it might be related to that...
It's true that analyzing the test further will likely also help figure out what's wrong. I'll try and have a look later this week.
Flags: needinfo?(gijskruitbosch+bugs)
Comment 17•8 years ago
|
||
I can't reproduce and there have been no re-ocurrences today or yesterday. Even if there were 1 or 2 like the other days, this is so low-frequency that I'm not sure how to debug - it would be a slow process of continually adding logging to figure out in what state this is breaking. So going to clear ni for now, and if this becomes more frequent I can investigate.
Flags: needinfo?(gijskruitbosch+bugs)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 27•8 years ago
|
||
(In reply to :Gijs from comment #8)
> I expect that the tab switch here should be made to use BTU.switchToTab (the
> async, e10s-compatible version), but it would be good to check the context
> of bug 562649 to see if having this change be asynchronous is likely to
> cause problems in practice, or if we should stop providing this forwarding
> implementation, or make it forward differently in e10s (IIRC we update
> selectedBrowser async? So I wonder if we should update this sync to the
> to-be-selected-browser) or... something. :-)
>
> Does that make sense?
I had to disable this test for landing e10s-multi, because it became more frequent and I didn't want
to block landing on this test. I would love to fix this but I'm not sure I understand the second part
of your comment, could you please elaborate on that so I can give it a try? Thanks!
Flags: needinfo?(gijskruitbosch+bugs)
Comment 28•8 years ago
|
||
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #27)
> (In reply to :Gijs from comment #8)
>
> > I expect that the tab switch here should be made to use BTU.switchToTab (the
> > async, e10s-compatible version), but it would be good to check the context
> > of bug 562649 to see if having this change be asynchronous is likely to
> > cause problems in practice, or if we should stop providing this forwarding
> > implementation, or make it forward differently in e10s (IIRC we update
> > selectedBrowser async? So I wonder if we should update this sync to the
> > to-be-selected-browser) or... something. :-)
> >
> > Does that make sense?
>
> I had to disable this test for landing e10s-multi, because it became more
> frequent and I didn't want
> to block landing on this test. I would love to fix this but I'm not sure I
> understand the second part
> of your comment, could you please elaborate on that so I can give it a try?
> Thanks!
I don't know anymore, it was 8 months ago. I also can't see the test failures you had with e10s-multi and I'm worried other things broke than this intermittent... anyway, at a hunch, does replacing this:
gBrowser.selectedTab = gBrowser.addTab();
gBrowser.removeCurrentTab();
with
let tab = gBrowser.addTab();
yield BrowserTestUtils.switchToTab(gBrowser, tab);
gBrowser.removeTab(tab);
work?
if not, replace the last gBrowser.removeTab(tab); with yield BrowserTestUtils.removeTab(tab);
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gkrizsanits)
Assignee | ||
Comment 29•8 years ago
|
||
This test is full of races. It opens an URI in a new tab, and then it assumes before that would load, but after it finished switching to it can open another tab load its content and close it and then magically the load of the first tab will finish only after we removed this second tab and switched back to the original.
It's not happening. I tried to fix the test by eliminating all the races, but then it turns out that this case will never happen with e10s-multi (and even without that it's a matter of luck if this test succeeds). I could not reproduce the original case this test was invented for, so I see no point wasting more time on fixing this for multi.
Flags: needinfo?(gkrizsanits)
Comment 30•8 years ago
|
||
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #29)
> This test is full of races.
I don't see how it can be. The test is doing everything synchronously, except the final waiting for the tab to load. :-\
> It opens an URI in a new tab, and then it
> assumes before that would load, but after it finished switching to it can
> open another tab load its content
No, the test doesn't depend on the content of the second tab loading at all. It opens, selects, and closes it synchronously. It doesn't wait anywhere.
> It's not happening. I tried to fix the test by eliminating all the races,
> but then it turns out that this case will never happen with e10s-multi (and
> even without that it's a matter of luck if this test succeeds).
Why not? The test is verifying that we persist the URI of a pending load on the tab until the load finishes, and that it gets removed afterwards, for tabs opened via the external URI opening API. Saying this can "never happen with e10s-multi" does not make a lot of sense to me.
Flags: needinfo?(gkrizsanits)
Assignee | ||
Comment 31•8 years ago
|
||
(In reply to :Gijs from comment #30)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #29)
> > This test is full of races.
>
> I don't see how it can be. The test is doing everything synchronously,
> except the final waiting for the tab to load. :-\
How could everything be synchronous? The very first operation opens a tab, switches to the tab and loads the content of the tab. None of those operations are synchronous. WHILE some of those operations are underway we open another tab and switch to it and close it and switch back. Again, we're in e10s, switching is not synchronous that's why we have BrowserTestUtils.switchTab, no?
> No, the test doesn't depend on the content of the second tab loading at all.
> It opens, selects, and closes it synchronously. It doesn't wait anywhere.
You are right except the switching part. Although we fake sync in some cases the first load can
finish sooner than all this.
> Why not? The test is verifying that we persist the URI of a pending load on
> the tab until the load finishes, and that it gets removed afterwards, for
> tabs opened via the external URI opening API. Saying this can "never happen
> with e10s-multi" does not make a lot of sense to me.
The first openURI finishes its load before we could finish the second tab opening/switching/removing/switching back dance. So by the time we reach this: http://searchfox.org/mozilla-central/source/browser/base/content/test/urlbar/browser_bug562649.js#13 the userTypedValue is reset already.
Flags: needinfo?(gkrizsanits)
Comment 32•8 years ago
|
||
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #31)
> (In reply to :Gijs from comment #30)
> > (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #29)
> > > This test is full of races.
> >
> > I don't see how it can be. The test is doing everything synchronously,
> > except the final waiting for the tab to load. :-\
>
> How could everything be synchronous? The very first operation opens a tab,
> switches to the tab and loads the content of the tab. None of those
> operations are synchronous.
Uh, yes they are. The internal state of the tabbrowser is updated synchronously, the tab is created synchronously, its browser is created synchronously. The loading is async, but at that stage in the test that doesn't matter yet.
> WHILE some of those operations are underway we
> open another tab and switch to it and close it and switch back. Again, we're
> in e10s, switching is not synchronous that's why we have
> BrowserTestUtils.switchTab, no?
AIUI you'd use switchTab if you care about the visual UI state. In this case, we don't. gBrowser.selectedTab = tab updates logical selected state but not visual selection, AIUI (we wait for the tab switch to be complete). Logical selected state is all that's needed here.
> > No, the test doesn't depend on the content of the second tab loading at all.
> > It opens, selects, and closes it synchronously. It doesn't wait anywhere.
>
> You are right except the switching part. Although we fake sync in some cases
> the first load can
> finish sooner than all this.
I don't understand what you mean by "fake sync".
> > Why not? The test is verifying that we persist the URI of a pending load on
> > the tab until the load finishes, and that it gets removed afterwards, for
> > tabs opened via the external URI opening API. Saying this can "never happen
> > with e10s-multi" does not make a lot of sense to me.
>
> The first openURI finishes its load before we could finish the second tab
> opening/switching/removing/switching back dance. So by the time we reach
> this:
> http://searchfox.org/mozilla-central/source/browser/base/content/test/urlbar/
> browser_bug562649.js#13 the userTypedValue is reset already.
OK, then it sounds like we can easily fix this by using a slow-loading server file that we force to not return a response until we manage to do everything else, right?
Flags: needinfo?(gkrizsanits)
Comment 33•8 years ago
|
||
FWIW, it's still not clear to me how we'd manage to update the internal state of a main-thread JS object in response to the load if there's no event loop being spun by any of this code, which, AIUI, we don't do for either tab switching or tab creation. If the JS in the test runs to completion as expected, that just shouldn't be possible. Where is it not doing so?
Assignee | ||
Comment 34•8 years ago
|
||
(In reply to :Gijs from comment #32)
> AIUI you'd use switchTab if you care about the visual UI state. In this
> case, we don't. gBrowser.selectedTab = tab updates logical selected state
> but not visual selection, AIUI (we wait for the tab switch to be complete).
> Logical selected state is all that's needed here.
Right that makes sense, I didn't know what part of it is sync and what is not.
> I don't understand what you mean by "fake sync".
Fake sync is what we do during close tab for example with a nested event loop: http://searchfox.org/mozilla-central/source/toolkit/content/widgets/remote-browser.xml#352
> OK, then it sounds like we can easily fix this by using a slow-loading
> server file that we force to not return a response until we manage to do
> everything else, right?
How does that work in practice? I'm not great with server files in testing, can set up the server to wait with a request and then ping it from script to finish it? That would fix it I think. I would definitely give it a try.
(In reply to :Gijs from comment #33)
> FWIW, it's still not clear to me how we'd manage to update the internal
> state of a main-thread JS object in response to the load if there's no event
> loop being spun by any of this code, which, AIUI, we don't do for either tab
> switching or tab creation. If the JS in the test runs to completion as
> expected, that just shouldn't be possible. Where is it not doing so?
I guess then the culprit is the removeTab, maybe we can request it to skip the permit part? I was not sure if we do similar tricks somewhere else as well (my front end knowledge is not that great).
It's great talking this over with someone, thanks a lot for the help, I have been losing my mind here...
Flags: needinfo?(gkrizsanits)
Comment 35•8 years ago
|
||
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #34)
> (In reply to :Gijs from comment #32)
> > AIUI you'd use switchTab if you care about the visual UI state. In this
> > case, we don't. gBrowser.selectedTab = tab updates logical selected state
> > but not visual selection, AIUI (we wait for the tab switch to be complete).
> > Logical selected state is all that's needed here.
>
> Right that makes sense, I didn't know what part of it is sync and what is
> not.
Re-reading the test though, I'm not sure I'm right, as it also tries to read the gURLBar value. I expect we won't update the gURLBar value until we switch tabs visually. :-(
So we should probably switch to BTU.switchToTab.
> > I don't understand what you mean by "fake sync".
>
> Fake sync is what we do during close tab for example with a nested event
> loop:
> http://searchfox.org/mozilla-central/source/toolkit/content/widgets/remote-
> browser.xml#352
Ahhh, of course, I missed that. Yes, that's what's causing the issue here.
> > OK, then it sounds like we can easily fix this by using a slow-loading
> > server file that we force to not return a response until we manage to do
> > everything else, right?
>
> How does that work in practice? I'm not great with server files in testing,
> can set up the server to wait with a request and then ping it from script to
> finish it? That would fix it I think. I would definitely give it a try.
Yes, I've used this before. You want an sjs file for this, a bit like this:
https://dxr.mozilla.org/mozilla-central/source/browser/base/content/test/urlbar/slow-page.sjs
Then you can use a repeating timer instead of a one-off, and use the getState method to check for shared state, and in the 'faster' if() branch, you can set shared state that triggers the original request resolving. It's a bit clumsy, but it should work to control what you want here. :-)
Docs are here https://developer.mozilla.org/en-US/docs/Mozilla/httpd.js/HTTP_server_for_unit_tests
Feel free to ping me on IRC if you get stuck (though I'm by no means an expert...).
> (In reply to :Gijs from comment #33)
> > FWIW, it's still not clear to me how we'd manage to update the internal
> > state of a main-thread JS object in response to the load if there's no event
> > loop being spun by any of this code, which, AIUI, we don't do for either tab
> > switching or tab creation. If the JS in the test runs to completion as
> > expected, that just shouldn't be possible. Where is it not doing so?
>
> I guess then the culprit is the removeTab, maybe we can request it to skip
> the permit part? I was not sure if we do similar tricks somewhere else as
> well (my front end knowledge is not that great).
Ah, this would also be an option, though see above. The removeTab API has a second options arg (an object) on which you could set skipPermitUnload to true, which should make the test fully sync as-is... but in that case I'm fairly sure that the gURLBar checks aren't actually doing anything useful. That said, they're already not useful, and this would be sufficient (I think) to re-enable for e10s-multi, and then we can make the test better in a followup.
> It's great talking this over with someone, thanks a lot for the help, I have
> been losing my mind here...
Thanks for bearing with my questions as well!
Assignee | ||
Comment 36•8 years ago
|
||
(In reply to :Gijs from comment #35)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #34)
> > (In reply to :Gijs from comment #32)
> Re-reading the test though, I'm not sure I'm right, as it also tries to read
> the gURLBar value. I expect we won't update the gURLBar value until we
> switch tabs visually. :-(
Ah, yeah now that you mention it that part was failing after I started using BrowserTestUtils to try and make the test more predictable.
> Ah, this would also be an option, though see above. The removeTab API has a
> second options arg (an object) on which you could set skipPermitUnload to
> true, which should make the test fully sync as-is... but in that case I'm
> fairly sure that the gURLBar checks aren't actually doing anything useful.
> That said, they're already not useful, and this would be sufficient (I
> think) to re-enable for e10s-multi, and then we can make the test better in
> a followup.
I ended up doing this for now, it did the trick it was the tab closing. Just for the good measure I added the flag to both tab closing in the test (the second should only help finishing the test slightly faster)
Assignee: nobody → gkrizsanits
Status: REOPENED → ASSIGNED
Attachment #8831978 -
Flags: review?(gijskruitbosch+bugs)
Comment 37•8 years ago
|
||
Comment on attachment 8831978 [details] [diff] [review]
Fixing race from tab closing. v1
Review of attachment 8831978 [details] [diff] [review]:
-----------------------------------------------------------------
r=me . Can you file a follow-up to make this test more useful?
Also, shouldn't this fix include re-enabling the test on Nightly for e10s-multi in the browser.ini file? It looks like it's perma-disabled for e10s at the moment...
Attachment #8831978 -
Flags: review?(gijskruitbosch+bugs) → review+
Assignee | ||
Comment 38•8 years ago
|
||
(In reply to :Gijs from comment #37)
> Comment on attachment 8831978 [details] [diff] [review]
> Fixing race from tab closing. v1
>
> Review of attachment 8831978 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> r=me . Can you file a follow-up to make this test more useful?
>
Sure, bug 1335351.
> Also, shouldn't this fix include re-enabling the test on Nightly for
> e10s-multi in the browser.ini file? It looks like it's perma-disabled for
> e10s at the moment...
Ah yeah, I forgot that I had this part in another patch... thanks!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cfb399f6a1b78010de91a1974ac9ec7d8d7d7c02
Comment 39•8 years ago
|
||
Pushed by gkrizsanits@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e807d24e9b10
Fixing race from tab closing. r=gijs
Comment 40•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 9 years ago → 8 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 54
Updated•8 years ago
|
status-firefox44:
affected → ---
status-firefox51:
--- → wontfix
status-firefox52:
--- → affected
status-firefox53:
--- → affected
Comment 41•8 years ago
|
||
bugherder uplift |
Flags: in-testsuite+
Comment 42•8 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•