Closed Bug 1094421 Opened 5 years ago Closed 4 years ago

Intermittent browser_aboutHome.js | Test timed out | Found a tab after previous test timed out: http://www.browser-searchsuggestionengine.com/searchSuggestionEngine

Categories

(Firefox :: General, defect)

defect
Not set
Points:
1

Tracking

()

RESOLVED WORKSFORME
Firefox 36
Tracking Status
firefox34 --- fixed
firefox35 --- fixed
firefox36 --- affected
firefox-esr31 --- unaffected

People

(Reporter: RyanVM, Unassigned)

References

Details

(Keywords: intermittent-failure)

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=1106743&repo=fx-team

builder 	Rev4 MacOSX Snow Leopard 10.6 fx-team debug test mochitest-browser-chrome-1
buildid 	20141105104849
builduid 	2ef88e1a80aa470390939fde520c423d
results 	warnings (1)
revision 	ca180a35db49
slave 		t-snow-r4-0042
starttime 	Wed Nov 05 2014 14:46:39 GMT-0500 (Eastern Standard Time)

11:52:06 INFO - 87 INFO TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | Engine name in DOM should match engine we just added
11:52:06 INFO - 88 INFO TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | Health Reporter instance available.
11:52:06 INFO - 89 INFO waitForDocLoadAndStopIt: Waiting for URL: http://browser-searchsuggestionengine.com/searchSuggestionEngine
11:52:06 INFO - 90 INFO TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | Searches provider is available.
11:52:06 INFO - 91 INFO Perform a search.
11:52:06 INFO - 92 INFO Console message: [JavaScript Warning: "ReferenceError: reference to undefined property data.selection" {file: "resource://app/modules/AboutHome.jsm" line: 191}]
11:52:06 INFO - 93 INFO TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | Detail is search engine name
11:52:06 INFO - 94 INFO TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | Health Reporter instance available.
11:52:06 INFO - 95 INFO TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | Searches provider is available.
11:52:06 INFO - 96 INFO TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | One more search recorded.
11:52:06 INFO - 97 INFO Console message: [JavaScript Error: "TelemetryStopwatch: key "FX_PAGE_LOAD_MS" was already initialized" {file: "resource://gre/modules/TelemetryStopwatch.jsm" line: 52}]
11:52:06 INFO - 98 INFO Console message: 1415217065909 Services.HealthReport.HealthReporter WARN Recording new remote ID: 26faa958-5e85-f84d-a7bb-4399b8e4aaac
11:52:06 INFO - 99 INFO Console message: 1415217065950 Services.DataReporting.Policy WARN Hard error submitting data: Server failure.
11:52:06 INFO - 100 INFO Longer timeout required, waiting longer... Remaining timeouts: 1
11:52:06 INFO - 101 INFO TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | Test timed out - expected PASS
11:52:06 INFO - 102 INFO TEST-OK | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | took 90232ms
11:52:06 INFO - 103 INFO TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | Found a tab after previous test timed out: http://www.browser-searchsuggestionengine.com/searchSuggestionEngine - expected PASS
Blocks: 1062096
Flags: needinfo?(gijskruitbosch+bugs)
See also: bug 1011270, bug 1052730, bug 1062496

However, looks like in at least the first case, there's this gem:

11:52:06 INFO - 89 INFO waitForDocLoadAndStopIt: Waiting for URL: http://browser-searchsuggestionengine.com/searchSuggestionEngine 

And we then end up with 
11:52:06 INFO - 103 INFO TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | Found a tab after previous test timed out: http://www.browser-searchsuggestionengine.com/searchSuggestionEngine - expected PASS 

Note the extra "www." there.

I expect that this is because it ends up doing a lookup for the target URL, which fails, at which point it tries using the "www." (hello, nsIURIFixup.FIXUP_ALTERNATE_URL) variant. I don't know if frame scripts would stay through these kinds of "redirects", and Mossop is suspicious about the order of these onStateChange events over in bug 1093594... but then again, this isn't an e10s run.

I *think* that the:

11:52:06     INFO -  97 INFO Console message: [JavaScript Error: "TelemetryStopwatch: key "FX_PAGE_LOAD_MS" was already initialized" {file: "resource://gre/modules/TelemetryStopwatch.jsm" line: 52}]

message corroborates the "whoops, and now we're doing a second load (ie with "www.") and something strange just happened" theory.

On builds where the test works, the dump() statements in the frame script for waitForDocLoadAndStopIt do fire:

18:28:28     INFO -  waitForDocLoadAndStopIt: onStateChange f0001: http://browser-searchsuggestionengine.com/searchSuggestionEngine
18:28:28     INFO -  waitForDocLoadAndStopIt: Document start: http://browser-searchsuggestionengine.com/searchSuggestionEngine

and there is no trace of them here.

I imagine that as far as this test failure is concerned, adding the "www." here to the .xml search spec is the trivial, "worth trying" fix. Dave/Drew, how does that sound to you and/or do you have other hypotheses?

(
sample working raw log: 
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/fx-team-macosx64-debug/1415237267/fx-team_snowleopard-debug_test-mochitest-browser-chrome-1-bm108-tests1-macosx-build815.txt.gz

sample broken raw log:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64-debug/1415266281/mozilla-inbound_mountainlion-debug_test-mochitest-browser-chrome-1-bm106-tests1-macosx-build1881.txt.gz
)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(dtownsend+bugmail)
Flags: needinfo?(adw)
(In reply to :Gijs Kruitbosch from comment #2)
> See also: bug 1011270, bug 1052730, bug 1062496
> 
> However, looks like in at least the first case, there's this gem:
> 
> 11:52:06 INFO - 89 INFO waitForDocLoadAndStopIt: Waiting for URL:
> http://browser-searchsuggestionengine.com/searchSuggestionEngine 
> 
> And we then end up with 
> 11:52:06 INFO - 103 INFO TEST-UNEXPECTED-FAIL |
> chrome://mochitests/content/browser/browser/base/content/test/general/
> browser_aboutHome.js | Found a tab after previous test timed out:
> http://www.browser-searchsuggestionengine.com/searchSuggestionEngine -
> expected PASS 
> 
> Note the extra "www." there.
> 
> I expect that this is because it ends up doing a lookup for the target URL,
> which fails, at which point it tries using the "www." (hello,
> nsIURIFixup.FIXUP_ALTERNATE_URL) variant. I don't know if frame scripts
> would stay through these kinds of "redirects", and Mossop is suspicious
> about the order of these onStateChange events over in bug 1093594... but
> then again, this isn't an e10s run.

Framescripts should persist across document loads and redirects except in the special case where a load switches the browser from remote to non-remote or vice-versa. But if this is failing in non-e10s then that shouldn't be an issue.

> I *think* that the:
> 
> 11:52:06     INFO -  97 INFO Console message: [JavaScript Error:
> "TelemetryStopwatch: key "FX_PAGE_LOAD_MS" was already initialized" {file:
> "resource://gre/modules/TelemetryStopwatch.jsm" line: 52}]
> 
> message corroborates the "whoops, and now we're doing a second load (ie with
> "www.") and something strange just happened" theory.
> 
> On builds where the test works, the dump() statements in the frame script
> for waitForDocLoadAndStopIt do fire:
> 
> 18:28:28     INFO -  waitForDocLoadAndStopIt: onStateChange f0001:
> http://browser-searchsuggestionengine.com/searchSuggestionEngine
> 18:28:28     INFO -  waitForDocLoadAndStopIt: Document start:
> http://browser-searchsuggestionengine.com/searchSuggestionEngine
> 
> and there is no trace of them here.
> 
> I imagine that as far as this test failure is concerned, adding the "www."
> here to the .xml search spec is the trivial, "worth trying" fix. Dave/Drew,
> how does that sound to you and/or do you have other hypotheses?

I don't really have anything more to add without spending a lot more time digging in so this seems like a reasonable quick fix to me.
Flags: needinfo?(dtownsend+bugmail)
(In reply to Dave Townsend [:mossop] from comment #4)
> I don't really have anything more to add without spending a lot more time
> digging in so this seems like a reasonable quick fix to me.

https://hg.mozilla.org/integration/fx-team/rev/5612c9fbbc7b
Flags: needinfo?(adw)
https://hg.mozilla.org/mozilla-central/rev/5612c9fbbc7b
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 36
Hi Gijs, can you provide a point value.
Iteration: --- → 36.2
Flags: qe-verify?
Flags: needinfo?(gijskruitbosch+bugs)
Flags: firefox-backlog+
Points: --- → 1
Flags: needinfo?(gijskruitbosch+bugs)
Flags: qe-verify? → qe-verify-
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Iteration: 36.2 → 36.3
Not actively investigating this. Please ping me if this is more frequent than once a week (which seems to be the current going rate post-patch) and/or this is otherwise high in the annoyance/frequency spectrum.
Assignee: gijskruitbosch+bugs → nobody
Iteration: 36.3 → ---
Well, that didn't take long...
Flags: needinfo?(gijskruitbosch+bugs)
Literally all of this is 10.6 debug. I looked at this for a bit, and I have no idea why the page isn't starting to load, *just* on 10.6 debug, apart from "look ma, I'm really slow", which is also kind of bug 1011270. I'll look at picking that up instead.
Flags: needinfo?(gijskruitbosch+bugs)
When I debug the test locally, I reproduce this bug almost all of times. Therefore, I'm wasting a lot of time. Cannot disable only the part of browser_aboutHome.js?

# And I have trouble with the test, I guess Promis.jsm or something included by the test has some bugs...
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #67)
> When I debug the test locally, I reproduce this bug almost all of times.
> Therefore, I'm wasting a lot of time. Cannot disable only the part of
> browser_aboutHome.js?

Can you clarify what you mean by "I'm wasting a lot of time"? What's happening when the test fails locally? The output from the failing test on tinderbox is not enough to figure out what is happening - if somehow the page loads but the code never picks up on it, or if the test times out before the code can pick up on it because loading the page is so slow.

> # And I have trouble with the test, I guess Promis.jsm or something included
> by the test has some bugs...

What do you mean by "I have trouble with the test" ?

Considering we use Promise.jsm in lots and lots of other tests, it seems unlikely to be promise.jsm itself. :-\
Flags: needinfo?(ryanvm)
Now with correct needinfo...
Flags: needinfo?(ryanvm) → needinfo?(masayuki)
(In reply to :Gijs Kruitbosch from comment #68)
> (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #67)
> > When I debug the test locally, I reproduce this bug almost all of times.
> > Therefore, I'm wasting a lot of time. Cannot disable only the part of
> > browser_aboutHome.js?
> 
> Can you clarify what you mean by "I'm wasting a lot of time"?

I meant that I needed to run the test a lot of times for checking the result of my changes.

> What's happening when the test fails locally? The output from the failing test on
> tinderbox is not enough to figure out what is happening - if somehow the
> page loads but the code never picks up on it, or if the test times out
> before the code can pick up on it because loading the page is so slow.

It shows Server Not Found error page. Even if I press Try Again button manually, it fails to find the server.

When I skip this test, it won't block my work, though:
http://mxr.mozilla.org/mozilla-central/source/browser/base/content/test/general/browser_aboutHome.js#81

> > # And I have trouble with the test, I guess Promis.jsm or something included
> > by the test has some bugs...
> 
> What do you mean by "I have trouble with the test" ?
> 
> Considering we use Promise.jsm in lots and lots of other tests, it seems
> unlikely to be promise.jsm itself. :-\

See bug 917322 comment 134 for an instance. And also this test causes a lot of orange bugs. I think that tests should be simpler as far as possible because complex test may block core's bug fix by bugs of tests.
Flags: needinfo?(masayuki)
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #70)
> (In reply to :Gijs Kruitbosch from comment #68)
> > What's happening when the test fails locally? The output from the failing test on
> > tinderbox is not enough to figure out what is happening - if somehow the
> > page loads but the code never picks up on it, or if the test times out
> > before the code can pick up on it because loading the page is so slow.
> 
> It shows Server Not Found error page. Even if I press Try Again button
> manually, it fails to find the server.

This doesn't really answer the question (the fact that the server is not found is expected), but I'm assuming that you mean the page loads but the test does not detect this and therefore does not continue correctly (and just waits, and eventually times out).

> > > # And I have trouble with the test, I guess Promis.jsm or something included
> > > by the test has some bugs...
> > 
> > What do you mean by "I have trouble with the test" ?
> > 
> > Considering we use Promise.jsm in lots and lots of other tests, it seems
> > unlikely to be promise.jsm itself. :-\
> 
> See bug 917322 comment 134 for an instance.

You should file a bug against Promise.jsm about this. However, it's related to promise.jsm's error handling, so that doesn't seem relevant here (as this error doesn't show up, nor do any other errors - the test just times out and then has a waiting tab).

> And also this test causes a lot
> of orange bugs. I think that tests should be simpler as far as possible
> because complex test may block core's bug fix by bugs of tests.

bug 1011270 is about splitting up the test, but that won't fix the type of failure you're seeing. We've been trying and failing to understand the failure you're seeing (as in, we could never reproduce it locally, and don't understand why it fails the way it does). If you want to debug it, that would be helpful.

Regarding "complex test" - to correctly test complex behaviour like whether FHR is updated correctly when using the search box on about:home, a complex test can't really be avoided here. It's also not clear to me what, in particular, you find complex about the test or how you think it should be improved.
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #70)
> And also this test causes a lot of orange bugs.

I only see this bug and bug 1062496. What other bugs does this test 'cause'?
(In reply to :Gijs Kruitbosch from comment #71)
> (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #70)
> > (In reply to :Gijs Kruitbosch from comment #68)
> > > What's happening when the test fails locally? The output from the failing test on
> > > tinderbox is not enough to figure out what is happening - if somehow the
> > > page loads but the code never picks up on it, or if the test times out
> > > before the code can pick up on it because loading the page is so slow.
> > 
> > It shows Server Not Found error page. Even if I press Try Again button
> > manually, it fails to find the server.
> 
> This doesn't really answer the question (the fact that the server is not
> found is expected), but I'm assuming that you mean the page loads but the
> test does not detect this and therefore does not continue correctly (and
> just waits, and eventually times out).

I don't understand the test what it checks, though. I don't see server not found error by my eyes if the test passes normally. Otherwise, stops on the error page and will be timeout.


> > > > # And I have trouble with the test, I guess Promis.jsm or something included
> > > > by the test has some bugs...
> > > 
> > > What do you mean by "I have trouble with the test" ?
> > > 
> > > Considering we use Promise.jsm in lots and lots of other tests, it seems
> > > unlikely to be promise.jsm itself. :-\
> > 
> > See bug 917322 comment 134 for an instance.
> 
> You should file a bug against Promise.jsm about this. However, it's related
> to promise.jsm's error handling, so that doesn't seem relevant here (as this
> error doesn't show up, nor do any other errors - the test just times out and
> then has a waiting tab).

I'm not familiar with the module and also JS itself. Therefore, I cannot decide what's the bug and what's the correct behavior. If I had enough time to research them, I'd done it...

> Regarding "complex test" - to correctly test complex behaviour like whether
> FHR is updated correctly when using the search box on about:home, a complex
> test can't really be avoided here. It's also not clear to me what, in
> particular, you find complex about the test or how you think it should be
> improved.

I believe that using other modules is too complex. When it becomes orange by a change of platform-side, it's hard to research if it detects a bug of the change or not.

(In reply to :Gijs Kruitbosch from comment #72)
> (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #70)
> > And also this test causes a lot of orange bugs.
> 
> I only see this bug and bug 1062496. What other bugs does this test 'cause'?

bug 981473?
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #73)
> (In reply to :Gijs Kruitbosch from comment #71)
> > (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #70)
> > > (In reply to :Gijs Kruitbosch from comment #68)
> > > > What's happening when the test fails locally? The output from the failing test on
> > > > tinderbox is not enough to figure out what is happening - if somehow the
> > > > page loads but the code never picks up on it, or if the test times out
> > > > before the code can pick up on it because loading the page is so slow.
> > > 
> > > It shows Server Not Found error page. Even if I press Try Again button
> > > manually, it fails to find the server.
> > 
> > This doesn't really answer the question (the fact that the server is not
> > found is expected), but I'm assuming that you mean the page loads but the
> > test does not detect this and therefore does not continue correctly (and
> > just waits, and eventually times out).
> 
> I don't understand the test what it checks, though. I don't see server not
> found error by my eyes if the test passes normally. Otherwise, stops on the
> error page and will be timeout.

The test clicks a button on about:home which triggers the search, and records it in FHR. The test invokes this code:

http://mxr.mozilla.org/mozilla-central/source/browser/base/content/test/general/head.js#395

which should be stopping the search pageload, which is why you don't see it in the successful case. I don't know why the test is not successful at catching the page load when the test fails.

> > Regarding "complex test" - to correctly test complex behaviour like whether
> > FHR is updated correctly when using the search box on about:home, a complex
> > test can't really be avoided here. It's also not clear to me what, in
> > particular, you find complex about the test or how you think it should be
> > improved.
> 
> I believe that using other modules is too complex.

This is a nice theory but really doesn't work for frontend tests. They often test interactions of different bits of the front end (like search and FHR, or search and customizable UI, or customizable UI and custom toolbars added in previous versions, ...) - that's their point.

Besides, Promise.jsm is essentially like DOM Promises (it predates DOM Promises), so if anything, the issue is with other modules, not so much with the test's use of promises. We could rewrite it to use DOM Promises, but it would very much surprise me if that stopped the oranges.

> When it becomes orange by
> a change of platform-side, it's hard to research if it detects a bug of the
> change or not.

As is the case for any random orange whose frequency changes by means of a platform change.

> (In reply to :Gijs Kruitbosch from comment #72)
> > (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #70)
> > > And also this test causes a lot of orange bugs.
> > 
> > I only see this bug and bug 1062496. What other bugs does this test 'cause'?
> 
> bug 981473?

This happens only once or twice a year, just like other DST-related failures. It's annoying and we should ideally fix it, but it has little do with the complexity or otherwise of the test.
OS: Mac OS X → All
Hardware: x86_64 → All
Inactive; closing (see bug 1180138).
Status: REOPENED → RESOLVED
Closed: 5 years ago4 years ago
Resolution: --- → WORKSFORME
Depends on: 1260330
You need to log in before you can comment on or make changes to this bug.