Closed Bug 1171303 Opened 4 years ago Closed 4 years ago

Frequent testOfflinePage GeckoEventExpecter - blockForEvent timeout: Robocop:JS

Categories

(Firefox for Android :: Testing, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 41
Tracking Status
firefox39 --- fixed
firefox40 --- fixed
firefox41 --- fixed

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Attachments

(1 file, 1 obsolete file)

Bug 1160008 collects robocop "blockForEvent timeout: Robocop:JS" failures for various tests. One of the most frequent currently is testOfflinePage.

log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=10416266
repository: mozilla-inbound
start_time: 2015-06-03T14:43:43
who: wkocher[at]mozilla[dot]com
machine: tst-linux64-spot-1720
buildname: Android 4.3 armv7 API 11+ mozilla-inbound opt test robocop-3
revision: 0c9e32f0a414

TEST-UNEXPECTED-FAIL | testOfflinePage | GeckoEventExpecter - blockForEvent timeout: Robocop:JS
0 ERROR Exception caught during test! - junit.framework.AssertionFailedError: TEST-UNEXPECTED-FAIL | testOfflinePage | GeckoEventExpecter - blockForEvent timeout: Robocop:JS
TEST-UNEXPECTED-FAIL | testOfflinePage | Exception caught - junit.framework.AssertionFailedError: TEST-UNEXPECTED-FAIL | testOfflinePage | GeckoEventExpecter - blockForEvent timeout: Robocop:JS
Return code: 1
From https://bugzilla.mozilla.org/show_bug.cgi?id=1160008#c324 .. c328:

The testOfflinePage failures look like hangs, with promiseOffline() waiting to observe network:offline-status-changed.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6343ee790276 re-creates the failures with extra logging in nsIOService::SetOffline to see if network:offline-status-changed is sent.

Compare (failure) http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/f9f464bf19320813018b5d33614764132af7a951cb2542315a8ca1af88f353aee3f6a7e0fec32bdff961d47e090805a58dc100611c6055360c291394bd711af2

06-02 13:10:27.961  1600  1617 I Robocop : {"message":"Robocop:JS should equal Robocop:JS","time":1433275827949,"source":"robocop","status":"PASS","test":"testOfflinePage","thread":null,"subtest":"Given message occurred for registered event: {\"message\":\"\\nTEST-PASS | testOfflinePage.js | [test_offline : 95] Docshell URI is the original URI.\\n\",\"innerType\":\"progress\",\"type\":\"Robocop:JS\"}","action":"test_status","pid":null}
06-02 13:10:27.961  1600  1617 D Robocop : received event Robocop:JS
06-02 13:10:27.980  1600  1613 D Robocop : unblocked on expecter for Robocop:JS
06-02 13:10:28.001  1600  1613 I Robocop : {"message":"[test_offline : 95] Docshell URI is the original URI.","time":1433275827983,"source":"robocop","status":"PASS","test":"testOfflinePage","thread":null,"subtest":"testOfflinePage.js","action":"test_status","pid":null}
06-02 13:10:28.011  1600  1617 I Gecko   : SetOffline: 0
06-02 13:10:28.131  1600  1617 D GeckoTabs: handleMessage: Tab:ViewportMetadata

with (success) http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/2a1dc1ab072bf348e73c84a2dc74be9b32da7abe8a420331f31805dfce7ccc630ab59c10ff7874d6f97e7662af9a5ed160f366aed0a0720ecef0fef8d06b886a

06-02 13:17:33.523  1794  1802 I Robocop : {"message":"[test_offline : 95] Docshell URI is the original URI.","time":1433276253523,"source":"robocop","status":"PASS","test":"testOfflinePage","thread":null,"subtest":"testOfflinePage.js","action":"test_status","pid":null}
06-02 13:17:33.534  1794  1804 I Gecko   : SetOffline: 0
06-02 13:17:33.534  1794  1804 I Gecko   : SetOffline - notify online
06-02 13:17:33.553  1794  1804 D Robocop : handleMessage called for: Robocop:JS; expecting: Robocop:JS
06-02 13:17:33.563  1794  1804 I Robocop : {"message":"Robocop:JS should equal Robocop:JS","time":1433276253563,"source":"robocop","status":"PASS","test":"testOfflinePage","thread":null,"subtest":"Given message occurred for registered event: {\"message\":\"\\nTEST-INFO | testOfflinePage.js | Received topic: network:offline-status-changed\\n\",\"innerType\":\"progress\",\"type\":\"Robocop:JS\"}","action":"test_status","pid":null}

SetOffline(0) is called in both cases, but in the failed tests, SetOffline does not send the notification...maybe because it thinks it is already online?


ailure:
06-02 13:10:05.712  1600  1617 I Gecko   : SetOffline: 1
06-02 13:10:05.720  1600  1617 I Gecko   : SetOffline - notify offline
...
06-02 13:10:23.262  1600  1617 I Gecko   : SetOffline: 0
06-02 13:10:23.282  1600  1617 I Gecko   : SetOffline - notify online
...
06-02 13:10:28.011  1600  1617 I Gecko   : SetOffline: 0


Success:

06-02 13:17:20.724  1794  1804 I Gecko   : SetOffline: 1
06-02 13:17:20.734  1794  1804 I Gecko   : SetOffline - notify offline
...
06-02 13:17:33.534  1794  1804 I Gecko   : SetOffline: 0
06-02 13:17:33.534  1794  1804 I Gecko   : SetOffline - notify online


The "extra" online request seems to happen while the test is waiting for DOMContentLoaded at

http://hg.mozilla.org/mozilla-central/annotate/e537a1ba501b/mobile/android/tests/browser/robocop/testOfflinePage.js#l89

Is there some kind of side-effect of the LoadURI call that causes us to go online? Or is there some independent event causing it?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac06a0bcca2a shows that the extra online request originates at http://hg.mozilla.org/mozilla-central/annotate/0920f2325a6d/mobile/android/components/AddonUpdateService.js#l52.

http://hg.mozilla.org/mozilla-central/annotate/0920f2325a6d/mobile/android/components/AddonUpdateService.js#l45 suggests that we could avoid this by setting PREF_ADDON_UPDATE_ENABLED (extensions.autoupdate.enabled) to false during the test (all robocop tests? all tests??).
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/692ec54e47a619ad6b4575a18a9ca0527ebd4228ece52d9ac22eb761ff22093fd30ea6843c6a110cec5afa636aaca1055e4fcbbdb8b256347e1b0a1dc924ce1f

06-03 14:49:42.304  1531  1548 I Gecko   : AUS sets offline false
06-03 14:49:42.304  1531  1548 I Gecko   : SetOffline: 0
06-03 14:49:42.314  1531  1548 I Gecko   : SetOffline - notify online
06-03 14:49:44.434  1531  1548 I GeckoConsole: 1433368184432	addons.update-checker	WARN	Request failed: http://10.0.2.2:8854/extensions-dummy/updateBackgroundURL - 404: Not Found
06-03 14:49:44.434  1531  1548 I Gecko   : 1433368184432	addons.update-checker	WARN	Request failed: http://10.0.2.2:8854/extensions-dummy/updateBackgroundURL - 404: Not Found
06-03 14:49:44.494  1531  1548 I GeckoConsole: 1433368184500	addons.update-checker	WARN	Request failed: http://10.0.2.2:8854/extensions-dummy/updateBackgroundURL - 404: Not Found
06-03 14:49:44.494  1531  1548 I Gecko   : 1433368184500	addons.update-checker	WARN	Request failed: http://10.0.2.2:8854/extensions-dummy/updateBackgroundURL - 404: Not Found
06-03 14:49:44.553  1531  1548 I GeckoConsole: 1433368184557	addons.update-checker	WARN	Request failed: http://10.0.2.2:8854/extensions-dummy/updateBackgroundURL - 404: Not Found
06-03 14:49:44.553  1531  1548 I Gecko   : 1433368184557	addons.update-checker	WARN	Request failed: http://10.0.2.2:8854/extensions-dummy/updateBackgroundURL - 404: Not Found
...
06-03 14:49:45.803  1531  1544 I Robocop : {"message":"[test_offline : 95] Docshell URI is the original URI.","time":1433368185798,"source":"robocop","status":"PASS","test":"testOfflinePage","thread":null,"subtest":"testOfflinePage.js","action":"test_status","pid":null}
06-03 14:49:45.823  1531  1548 I Gecko   : SetOffline: 0
Random drive-by ... isn't this the relevant part ... seems common through out, can't we catch and handle that?

06-03 14:52:48.894  1531  1548 I Gecko   : *************************
06-03 14:52:48.894  1531  1548 I Gecko   : A coding exception was thrown and uncaught in a Task.
06-03 14:52:48.894  1531  1548 I Gecko   : 
06-03 14:52:48.904  1531  1548 I Gecko   : Full message: TypeError: browser.contentDocument is undefined
06-03 14:52:48.904  1531  1548 I Gecko   : Full stack: test_offline@testOfflinePage.js:102:3
:capella -- I appreciate the drive-by. Thanks.

I'm not sure I fully understand that stack, but I think it is a consequence of the java-side timeout. On the java side, blockForEvent is waiting for Robocop:JS; on the js side, promiseOffline() is waiting for its notification from the IOService (which never comes). Eventually the java side times out and throws an exception, ending the test:

06-03 14:52:46.703  1531  1544 I Robocop : {"action":"log","message":"SimpleTest FINISHED","time":1433368366698,"pid":null,"level":"info","source":"robocop","thread":null}

and triggering browser shutdown

06-03 14:52:46.884  1531  1548 I GeckoDump: Robocop:Quit received -- requesting quit

It is only then, during shutdown, that we get some action in promiseOffline() and report the uncaught exception. I think the js test is trying to continue to run during shutdown and that is causing the exception. It would be nice to handle this shutdown issue better, but I'm going to focus on not getting into this situation in the first place: avoid the unfulfilled promise and avoid the timeout.
It's fair to catch the issue earlier in the failure chain. But here we're both addressing a test-specific situation right?

as-in, there'd still be a question of how to pattern-avoid these |blockForEvent timeout: Robocop:JS| failures?
This sets extensions.autoupdate.enabled=false in the test profile used by most tests, including all robocop tests. (The test profile already sets extensions.update.background.url="http://%(server)s/extensions-dummy/updateBackgroundURL", for a fast failure and to avoid network dependency...I'd prefer to avoid the attempt entirely.)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d819345eb658 suggests this eliminates the frequent testOfflinePage failures and does not introduce any new problems. There is a lower-frequency rc3 failure that seems to affect all tests in rc3 -- I think it is unrelated.


Another idea to consider: Should http://hg.mozilla.org/mozilla-central/annotate/0920f2325a6d/mobile/android/components/AddonUpdateService.js#l52 set offline to false? I imagine there's a good reason for it, but it seems heavy handed: I would expect offline changes to be initiated by user request or the IOService (auto recovery), but not by a service that just wants to use the network.
Attachment #8615321 - Flags: review?(mark.finkle)
(In reply to Mark Capella [:capella] from comment #6)
> as-in, there'd still be a question of how to pattern-avoid these
> |blockForEvent timeout: Robocop:JS| failures?

Yes. I don't have a solution for that.
Comment on attachment 8615321 [details] [diff] [review]
set extensions.autoupdate.enabled=false during most tests

Good find!
Attachment #8615321 - Flags: review?(mark.finkle) → review+
For some reason, this apparently made web-platform-tests on Windows and OSX permafail. At least, this was the best candidate for what was contained in that push.
https://treeherder.mozilla.org/logviewer.html#?job_id=10439276&repo=mozilla-inbound
Indeed that greened them up.
I am puzzled since I cannot find any references to extensions.autoupdate.enabled outside of /mobile. But https://treeherder.mozilla.org/#/jobs?repo=try&revision=1f2f564acbbc also confirms that my original patch causes the web-platform-test failures on Windows and OSX.
Set pref for robocop tests only. Avoids Windows/OSX bustage:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b4059f38bec
Attachment #8615321 - Attachment is obsolete: true
Attachment #8615596 - Flags: review?(mark.finkle)
Comment on attachment 8615596 [details] [diff] [review]
set extensions.autoupdate.enabled=false during robocop only

Eager to get this checked in, I'm going to re-interpret this as a trivial change over the approved patch and carry the r+.
Attachment #8615596 - Flags: review?(mark.finkle) → review+
https://hg.mozilla.org/mozilla-central/rev/8e3fffc15230
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 41
Flags: needinfo?(ryanvm)
Flags: needinfo?(ryanvm)
You need to log in before you can comment on or make changes to this bug.