Closed Bug 1453723 Opened 2 years ago Closed 1 year ago

Intermittent browser/base/content/test/urlbar/browser_urlbarAboutHomeLoading.js | URL bar should be empty - Got about:preferences, expected

Categories

(Firefox :: Address Bar, defect, P5)

defect

Tracking

()

RESOLVED FIXED
Firefox 63
Tracking Status
firefox-esr60 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: Gijs)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed])

Attachments

(2 files)

Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=173249497&repo=try

https://queue.taskcluster.net/v1/task/Ph7gEm4jSjy0UFNw_ywuMg/runs/0/artifacts/public/logs/live_backing.log

Seen for a central-as-beta simulation. Retriggers are fine.

05:46:45     INFO - TEST-PASS | browser/base/content/test/urlbar/browser_urlbarAboutHomeLoading.js | The browser should have no recorded userTypedValue - 
05:46:45     INFO - Leaving test bound clearURLBarAfterParentProcessURL
05:46:45     INFO - Entering test bound clearURLBarAfterParentProcessURLInExistingTab
05:46:45     INFO - Buffered messages finished
05:46:45     INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/urlbar/browser_urlbarAboutHomeLoading.js | URL bar should be empty - Got about:preferences, expected 
05:46:45     INFO - Stack trace:
05:46:45     INFO - chrome://mochikit/content/browser-test.js:test_is:1280
05:46:45     INFO - chrome://mochitests/content/browser/browser/base/content/test/urlbar/browser_urlbarAboutHomeLoading.js:clearURLBarAfterParentProcessURLInExistingTab:40
05:46:45     INFO - TEST-PASS | browser/base/content/test/urlbar/browser_urlbarAboutHomeLoading.js | The browser should have no recorded userTypedValue - 
05:46:45     INFO - Leaving test bound clearURLBarAfterParentProcessURLInExistingTab
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → INCOMPLETE
Recent log failure: https://treeherder.mozilla.org/logviewer.html#?job_id=194011248&repo=autoland&lineNumber=5376
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
There are 53 failures (all on debug) in the last 7 days.
 Linux32     -> 16 failures
 Linux64     -> 32 failures
 Osx-10-10   -> 4 failures
 Windows7-32 -> 1 failure
 
Recent log link: https://treeherder.mozilla.org/logviewer.html#?job_id=195425703&repo=autoland&lineNumber=3785

Joel, Marco is away until 26 Aug. Who can help us with this?
Flags: needinfo?(jmaher)
Whiteboard: [stockwell needswork]
this looks to have started August 14th or 15th, a bit old for a fresh orange.

:Gijs, could you help us find someone who could look into this test?  I do see that :bgrins changed this test (and many others) on August 17th- it could be the root cause but maybe a day or two too late.
Flags: needinfo?(jmaher) → needinfo?(gijskruitbosch+bugs)
So a few notes:

- The error message is not unique in this test file, but from a casual look it looks like it's the line 40 assertion that's failing everywhere.
- Both in that test task/function and in some of the other ones, we wait for the Initialized event, which is fired off a callback for DOMContentLoaded. So the sequence is:

-> wait for `Initialized` (from DCL) -> promise resolution tick -> load different URI -> wait for browserLoaded() promise

I expect that in the cases where we fail, the load event is coming from the about:preferences load, instead of the newly loaded thing.


I need to do some doublechecking if the bug that I was testing for here required the load of the original page to not be complete. That is, the trivial fix here would be to just wait for the browser to be loaded after loading about:preferences. However, if the test is specifically trying to check what happens if you load a URL that requires a process switch *while* the browser is in the middle of loading something else, that trivial fix would defeat the point of the test. Especially what with the upcoming work to move the new tab into its own process (though remote), we're going to be doing more process switching, and we need to make sure that we don't ignore subtle race conditions there.

The other thing is that I have no idea what's triggered this. There's a few different things that spring to mind:

- changes in how quickly we fire about:preferences' load event
- changes in how docshell dispatches load events
- changes in browser remoteness management and/or browser bindings.
- changes in the url bar / tabbrowser (seems less likely to me off-hand, though also possible).

Looking at this list of changes:

https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2018-08-14&enddate=2018-08-16

(which is deliberately a bit wide because we don't seem to know exactly when this broke)

Naively makes me suspect at least:

bug 1442058 (removing bindings has bitten us before, and the remoteness change comes with a DOM reinsertion with new bindings, so it's possible something odd's happening there now - maybe it's just faster than before?)
bug 1472491 (because sooooo many changes that could potentially affect this)


Brian, any chance it's the former?

Joel, any chance you could (remind me how to) bisect when this regressed further with retriggers or whatever?

Keeping my needinfo while I try to dig up why the test does what it does.
Flags: needinfo?(jmaher)
Flags: needinfo?(bgrinstead)
OK, so the test was meant to wait for about:preferences to load, so I guess we can just fix that instead... It'd still be good to understand why the frequency of these failures has changed, though.
Assignee: nobody → gijskruitbosch+bugs
Status: REOPENED → ASSIGNED
Flags: needinfo?(gijskruitbosch+bugs)
We need to collect more data on the ~16th to see what the failure rates look like.

this test:
browser/base/content/test/urlbar/browser_urlbarAboutHomeLoading.js

in this range:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=linux%20debug%20browser-chrome&tochange=6c9806cf666e4219fa6fe3ab95d99a360f7ad59e&fromchange=6b9aaa303e9a193b92d02659a0b19059007b5665&selectedJob=194213794


linux32/debug is bc10
linux64/debug is bc9

I retriggered a lot of jobs and did some backfilling, we will need to revisit this in an hour or so and see what the failure rate is and if it falls inside this range and either narrow it down, or go back a bit further in history.
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #17)
> We need to collect more data on the ~16th to see what the failure rates look
> like.
> 
> this test:
> browser/base/content/test/urlbar/browser_urlbarAboutHomeLoading.js
> 
> in this range:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-
> searchStr=linux%20debug%20browser-
> chrome&tochange=6c9806cf666e4219fa6fe3ab95d99a360f7ad59e&fromchange=6b9aaa303
> e9a193b92d02659a0b19059007b5665&selectedJob=194213794
> 
> 
> linux32/debug is bc10
> linux64/debug is bc9
> 
> I retriggered a lot of jobs and did some backfilling, we will need to
> revisit this in an hour or so and see what the failure rate is and if it
> falls inside this range and either narrow it down, or go back a bit further
> in history.

Looks like https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=cbfd7962458a790d91ae579c7e6f33ba046380f5&filter-searchStr=linux%20debug%20browser-chrome is quite orange on x64? Not sure if it's "orange enough" to match up with what caused this to spike...
I am not able to find a root cause by retriggering very easily.
the jobs fail so often due to other test failures- this specific failure is hard to find without collecting a few thousand more data points.
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #20)
> the jobs fail so often due to other test failures- this specific failure is
> hard to find without collecting a few thousand more data points.

:-(

This doesn't seem like a great place to be in. Can you ni me on some of the other intermittent issues that are making this failure hard to investigate?
Flags: needinfo?(jmaher)
Flags: needinfo?(jmaher)
Waiting for the browser to have switched remoteness before waiting for
it to load the non-about:preferences page should be enough to address this orange.
Keywords: leave-open
Flags: needinfo?(bgrinstead)
Comment on attachment 9003770 [details]
Bug 1453723 - ensure we wait for the correct load, r?bgrins

Brian Grinstead [:bgrins] has approved the revision.
Attachment #9003770 - Flags: review+
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/6617fb3e1363
ensure we wait for the correct load, r=bgrins
I accidentally cleared the needinfo from Comment 15. 

> bug 1442058 (removing bindings has bitten us before, and the remoteness change comes with
> a DOM reinsertion with new bindings, so it's possible something odd's happening there now -
> maybe it's just faster than before?)

It's possible that bug 1442058 changed some timings. Though in case, unlike some of the others, the element still has _a_ binding (browser or browser-remote), so it seems less likely. The other potential behavior changes:

1) If there was some caller where a browser gets created and we don't set the expando property, or we set it but loadURI had already been called.
2) browser.tabModalPromptBox changed from null (as defined in a <field>) to undefined.

(1) seems more likely than (2), but I don't know if either of them would have caused a spike here.
TV is orange with exactly the same errors as before, so this clearly isn't fixed. The problem is, I can't reproduce locally, not even with --verify. :-\
Explicitly specify which load we're waiting for to avoid intermittent oranges
Comment on attachment 9005251 [details]
Bug 1453723 - actually wait for the homepage to load, r?bgrins

Brian Grinstead [:bgrins] has approved the revision.
Attachment #9005251 - Flags: review+
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/6b9c111f3da9
actually wait for the homepage to load, r=bgrins
This looks fixed to me, but let's wait another week...
Looks OK now.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago1 year ago
Resolution: --- → FIXED
Whiteboard: [stockwell needswork] → [stockwell fixed]
Target Milestone: --- → Firefox 63
You need to log in before you can comment on or make changes to this bug.