Perma Android robocop testAboutPage | testIntentUri | testBrowserDiscovery | GeckoEventExpecter - blockForEvent timeout: Content:DOMContentLoaded
Categories
(GeckoView :: General, defect, P5)
Tracking
(firefox-esr60 unaffected, firefox-esr68 unaffected, firefox68 unaffected, firefox69 unaffected, firefox70 verified)
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | unaffected |
firefox68 | --- | unaffected |
firefox69 | --- | unaffected |
firefox70 | --- | verified |
People
(Reporter: intermittent-bug-filer, Assigned: gbrown)
References
(Regression)
Details
(Keywords: intermittent-failure, regression)
Attachments
(2 files)
Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=256333802&repo=mozilla-central
Full log: https://queue.taskcluster.net/v1/task/cPu2ihHmQrStMIBBEXJHbg/runs/1/artifacts/public/logs/live_backing.log
[task 2019-07-13T11:01:25.130Z] 11:01:25 INFO - TEST-PASS | testAboutPage | Url is correct - about:firefox should equal about:firefox
[task 2019-07-13T11:01:25.130Z] 11:01:25 INFO - Buffered messages finished
[task 2019-07-13T11:01:25.131Z] 11:01:25 WARNING - TEST-UNEXPECTED-FAIL | testAboutPage | GeckoEventExpecter - blockForEvent timeout: Content:DOMContentLoaded
[task 2019-07-13T11:01:25.131Z] 11:01:25 INFO - 0 ERROR Exception caught during test! - junit.framework.AssertionFailedError: TEST-UNEXPECTED-FAIL | testAboutPage | GeckoEventExpecter - blockForEvent timeout: Content:DOMContentLoaded
[task 2019-07-13T11:01:25.131Z] 11:01:25 INFO - at junit.framework.Assert.fail(Assert.java:50)
[task 2019-07-13T11:01:25.131Z] 11:01:25 INFO - at org.mozilla.gecko.FennecMochitestAssert._logMochitestResult(FennecMochitestAssert.java:124)
[task 2019-07-13T11:01:25.132Z] 11:01:25 INFO - at org.mozilla.gecko.FennecMochitestAssert.ok(FennecMochitestAssert.java:145)
[task 2019-07-13T11:01:25.132Z] 11:01:25 INFO - at org.mozilla.gecko.FennecNativeActions$GeckoEventExpecter.blockForEvent(FennecNativeActions.java:113)
[task 2019-07-13T11:01:25.133Z] 11:01:25 INFO - at org.mozilla.gecko.FennecNativeActions$GeckoEventExpecter.blockForEvent(FennecNativeActions.java:97)
[task 2019-07-13T11:01:25.133Z] 11:01:25 INFO - at org.mozilla.gecko.tests.OldBaseTest.loadUrlAndWait(OldBaseTest.java:221)
[task 2019-07-13T11:01:25.134Z] 11:01:25 INFO - at org.mozilla.gecko.tests.testAboutPage.testAboutPage(testAboutPage.java:26)
[task 2019-07-13T11:01:25.134Z] 11:01:25 INFO - at java.lang.reflect.Method.invokeNative(Native Method)
[task 2019-07-13T11:01:25.135Z] 11:01:25 INFO - at java.lang.reflect.Method.invoke(Method.java:525)
[task 2019-07-13T11:01:25.135Z] 11:01:25 INFO - at android.test.InstrumentationTestCase.runMethod(InstrumentationTestCase.java:214)
[task 2019-07-13T11:01:25.135Z] 11:01:25 INFO - at android.test.InstrumentationTestCase.runTest(InstrumentationTestCase.java:199)
[task 2019-07-13T11:01:25.136Z] 11:01:25 INFO - at android.test.ActivityInstrumentationTestCase2.runTest(ActivityInstrumentationTestCase2.java:192)
[task 2019-07-13T11:01:25.136Z] 11:01:25 INFO - at org.mozilla.gecko.tests.BaseRobocopTest.runTest(BaseRobocopTest.java:188)
[task 2019-07-13T11:01:25.137Z] 11:01:25 INFO - at junit.framework.TestCase.runBare(TestCase.java:134)
[task 2019-07-13T11:01:25.137Z] 11:01:25 INFO - at junit.framework.TestResult$1.protect(TestResult.java:115)
[task 2019-07-13T11:01:25.137Z] 11:01:25 INFO - at junit.framework.TestResult.runProtected(TestResult.java:133)
[task 2019-07-13T11:01:25.138Z] 11:01:25 INFO - at junit.framework.TestResult.run(TestResult.java:118)
[task 2019-07-13T11:01:25.138Z] 11:01:25 INFO - at junit.framework.TestCase.run(TestCase.java:124)
[task 2019-07-13T11:01:25.139Z] 11:01:25 INFO - at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:191)
[task 2019-07-13T11:01:25.139Z] 11:01:25 INFO - at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:176)
[task 2019-07-13T11:01:25.139Z] 11:01:25 INFO - at android.test.InstrumentationTestRunner.onStart(InstrumentationTestRunner.java:554)
[task 2019-07-13T11:01:25.140Z] 11:01:25 INFO - at org.mozilla.gecko.FennecInstrumentationTestRunner.onStart(FennecInstrumentationTestRunner.java:66)
[task 2019-07-13T11:01:25.140Z] 11:01:25 INFO - at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1701)
[task 2019-07-13T11:01:25.141Z] 11:01:25 WARNING - TEST-UNEXPECTED-FAIL | testAboutPage | Exception caught - junit.framework.AssertionFailedError: TEST-UNEXPECTED-FAIL | testAboutPage | GeckoEventExpecter - blockForEvent timeout: Content:DOMContentLoaded
[task 2019-07-13T11:01:25.141Z] 11:01:25 INFO - TEST-OK | testAboutPage | took 237570ms
Comment 1•5 years ago
|
||
Looks like this is a regression from Bug 1557096. https://hg.mozilla.org/integration/autoland/rev/4629e855ea33
It permafails on central on Android robocop tests: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&group_state=expanded&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=robocop&fromchange=f1d381735279b97f3f130e4fd54d3356bfad486c&selectedJob=256333802
Failure logs: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256341646&repo=mozilla-central&lineNumber=1713
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256333312&repo=mozilla-central&lineNumber=1454
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256333224&repo=mozilla-central&lineNumber=1688
Alvina, James, please take a look over this.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 4•5 years ago
|
||
(In reply to Cosmin Sabou [:CosminS] from comment #1)
Looks like this is a regression from Bug 1557096. https://hg.mozilla.org/integration/autoland/rev/4629e855ea33
It permafails on central on Android robocop tests: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&group_state=expanded&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=robocop&fromchange=f1d381735279b97f3f130e4fd54d3356bfad486c&selectedJob=256333802
Failure logs: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256341646&repo=mozilla-central&lineNumber=1713
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256333312&repo=mozilla-central&lineNumber=1454
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256333224&repo=mozilla-central&lineNumber=1688
Alvina, James, please take a look over this.
This seems to have increased failures for bug 1495113 too.
Comment 5•5 years ago
|
||
And also Bug 1502276 https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-07-08&endday=2019-07-15&tree=trunk&bug=1502276
Joel, can we go ahead and backout bug 1557096? There are three regressions from it and looks like snorp or awaseem are not responding to the ni.
Comment hidden (Intermittent Failures Robot) |
Comment 7•5 years ago
|
||
yes, I agree we should be backing out bug 1557096, we know the root cause and it is easy to reland code. As it looks like a need to uplift to Beta, do this before we uplift it :)
Comment 8•5 years ago
|
||
(In reply to Cosmin Sabou [:CosminS] from comment #1)
Looks like this is a regression from Bug 1557096. https://hg.mozilla.org/integration/autoland/rev/4629e855ea33
It permafails on central on Android robocop tests: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&group_state=expanded&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=robocop&fromchange=f1d381735279b97f3f130e4fd54d3356bfad486c&selectedJob=256333802
Failure logs: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256341646&repo=mozilla-central&lineNumber=1713
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256333312&repo=mozilla-central&lineNumber=1454
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256333224&repo=mozilla-central&lineNumber=1688
Alvina, James, please take a look over this.
Ah, definitely back it out and I'll be sure to fix it. I'm not really sure why these tests fail now, but I'll look into it. James is on PTO this week btw.
Comment hidden (Intermittent Failures Robot) |
Comment 10•5 years ago
•
|
||
Hi Geoff! :cpeterson let me know that you have experience on the robocop tests and may be able to help in fixing this. The same timeout happened in these two places:
- https://bugzilla.mozilla.org/show_bug.cgi?id=1565850
- https://bugzilla.mozilla.org/show_bug.cgi?id=1495113
This is related to this bug fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1557096
Do you know what could be causing this problem?
Assignee | ||
Comment 11•5 years ago
|
||
Let's have a look at the logcat:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256341646&repo=mozilla-central&lineNumber=1713
has logcat
https://taskcluster-artifacts.net/bLatnu5cTBar5f-nlqInNQ/0/public/test_info//logcat-emulator-5554.log
07-13 05:02:13.338 1087 1106 I Robocop : {"message":"about:firefox should equal about:firefox","time":1563019333335,"source":"robocop","status":"PASS","test":"testAboutPage","thread":null,"subtest":"Url is correct","action":"test_status","pid":null}
07-13 05:02:13.338 1087 1106 D Robocop : waiting for Content:DOMContentLoaded
07-13 05:02:13.348 1087 1087 D GeckoToolbar: onTabChanged: LOAD_ERROR
07-13 05:02:13.358 1087 1087 D GeckoBrowserApp: BrowserApp.onTabChanged: 1: LOAD_ERROR
07-13 05:02:13.378 1087 1110 D GeckoBrowser: Handled load error: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIWebNavigation.loadURI]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://global/content/elements/browser-custom-element.js :: loadURI/< :: line 989" data: no]
07-13 05:03:02.178 275 278 D dalvikvm: GC_CONCURRENT freed 713K, 39% free 5609K/9116K, paused 7ms+19ms, total 152ms
07-13 05:03:44.318 275 278 D dalvikvm: GC_CONCURRENT freed 724K, 39% free 5601K/9116K, paused 6ms+24ms, total 153ms
07-13 05:04:16.038 275 278 D dalvikvm: GC_CONCURRENT freed 708K, 39% free 5602K/9116K, paused 6ms+24ms, total 145ms
In this case testAboutPage is trying to open about:firefox and then wait for Content:DOMContentLoaded...but there are load errors reported. After the load errors, the test waits for the event and eventually times out. We should look closer at that load error.
Assignee | ||
Comment 12•5 years ago
|
||
(In reply to Alvina Waseem [:awaseem] from comment #10)
This is related to this bug fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1557096
Do you know what could be causing this problem?
I don't see any obvious link between the changes in bug 1557096 and these test failures. I believe the regression analysis in comment 1 was incorrect. I am retriggering to try to identify the real culprit:
Assignee | ||
Comment 14•5 years ago
|
||
Yes, that's it.
Comment 15•5 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #13)
Could be from bug 1560178.
This looks like it's me alright, from https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=robocop&tochange=9056a9e2cf4a32b1386216b49f6d5d2f353bccdb&fromchange=4629e855ea33d85f8adcc6e76264ac779b799bd1 . Can someone explain how these tier-2 tests are different from the "normal" robocop ones? Or have robocop tests all been moved to tier2?
Like in bug 1565908, I don't really understand how this has impacted any android runs. All the actual code (as opposed to test) changes are behind XRE_IsE10sParentProcess
, so they shouldn't ever run on fennec (which AIUI is what robocop runs against). Who knows about how robocop works (in particular how prefs get set for the user profile fennec uses and how it deals with turning e10s on/off) and how we'd check why that value is wrong here? :gbrown?
Assignee | ||
Comment 16•5 years ago
|
||
Robocop was moved to tier 2 simply because we stopped running it on integration branches after the last release (they run on mozilla-central and try only now).
Robocop should be running against fennec and that should be non-e10s.
Assignee | ||
Comment 17•5 years ago
|
||
Definitely Fennec:
07-17 09:06:12.767 275 437 I ActivityManager: Start proc org.mozilla.fennec_aurora for added application org.mozilla.fennec_aurora: pid=1086 uid=10043 gids={50043, 3003, 1028, 1015, 1006}
and I see the same pid for all gecko activity during each test -- looks single-process.
Assignee | ||
Comment 18•5 years ago
|
||
For the most part, robocop test harness code has no concept of e10s -- robocop is always fennec, always non-e10s. There is an e10s flag passed around when launching the test, and that flag is incorrect (e10s=True), but ultimately that flag is not used.
should confirm that the harness simply launches something like:
am instrument -e quit_and_finish 1 -e deviceroot /sdcard/tests -e class org.mozilla.gecko.tests.testAboutPage org.mozilla.roboexample.test/org.mozilla.gecko.FennecInstrumentationTestRunner
The user profile code is mostly shared with mochitest, so I wonder if something is being set incorrectly there...is there a particular pref I should check?
Assignee | ||
Comment 19•5 years ago
|
||
Assignee | ||
Comment 20•5 years ago
|
||
Any concerns here? Anything else I can check?
Assignee | ||
Comment 21•5 years ago
|
||
Checking browser.tabs.remote.autostart here:
Comment 22•5 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #21)
Checking browser.tabs.remote.autostart here:
It's false in the user.js file (repeatedly, even) so I'm even more puzzled than before... :-(
Assignee | ||
Comment 23•5 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #21)
Checking browser.tabs.remote.autostart here:
[task 2019-07-17T19:37:12.821Z] 19:37:12 INFO - ZZZ
[task 2019-07-17T19:37:12.822Z] 19:37:12 INFO - {'marionette.log.level': 'Trace', 'browser.tabs.remote.autostart': True}
[task 2019-07-17T19:37:12.822Z] 19:37:12 INFO - ZZZ
Oops! I think I know what's happening...
Comment 24•5 years ago
|
||
So fwiw, I think we should fix robocop to not set the pref to true either way - sounds like gbrown might know what's going wrong there (thanks for your help!). Come to think of it, perhaps I should have seen this coming given bug 1552442 / https://bugzilla.mozilla.org/show_bug.cgi?id=1549855#c6 - as always hindsight is a wonderful thing.
But given how much breaks if that pref is set, after bug 1560178, I'm wondering if we also want a code update for docshell/apprunner that hardcodes us to miss the block I added in bug 1560178 for fennec, at least for the potential uplift to 68esr, because bricking things for users who (for whatever reason, e.g. custom sync setups prior to bug 1538015 being fixed) have the pref set is... not great. I mean, clearly the intent of XRE_IsE10sParentProcess
is to only be true if we're (a) in the parent process and (b) in an app and configuration where e10s is being used, but it also clearly is imperfect by relying on this pref, which apparently on fennec could be "pretending" to be doing e10s even when we patently do not support it and will still be loading webpages same-process.
Boris & Nick, how would you feel about just adding an ifdef for the build app in XRE_IsE10sParentProcess
and always returning false
when built for fennec?
Comment 25•5 years ago
|
||
(In reply to :Gijs (he/him) from comment #24)
So fwiw, I think we should fix robocop to not set the pref to true either way - sounds like gbrown might know what's going wrong there (thanks for your help!). Come to think of it, perhaps I should have seen this coming given bug 1552442 / https://bugzilla.mozilla.org/show_bug.cgi?id=1549855#c6 - as always hindsight is a wonderful thing.
But given how much breaks if that pref is set, after bug 1560178, I'm wondering if we also want a code update for docshell/apprunner that hardcodes us to miss the block I added in bug 1560178 for fennec, at least for the potential uplift to 68esr, because bricking things for users who (for whatever reason, e.g. custom sync setups prior to bug 1538015 being fixed) have the pref set is... not great. I mean, clearly the intent of
XRE_IsE10sParentProcess
is to only be true if we're (a) in the parent process and (b) in an app and configuration where e10s is being used, but it also clearly is imperfect by relying on this pref, which apparently on fennec could be "pretending" to be doing e10s even when we patently do not support it and will still be loading webpages same-process.
I was about to nit pack about "clearly the intent"... but then I read the method and yes, that's clear. That should be false for Fennec with no way to change it.
Boris & Nick, how would you feel about just adding an ifdef for the build app in
XRE_IsE10sParentProcess
and always returningfalse
when built for fennec?
I have no particular problem with that, although I think my voice counts little here. snorp (PTO just this moment) and bholley are better folks, if not bz (as flagged).
Comment 26•5 years ago
|
||
I have no problem with adding that ifdef.
We should also consider filing a bug for a better way to implement XRE_IsE10sParentProcess
....
Comment 27•5 years ago
•
|
||
Sorry about the wrong investigation here. Thank you Geoff for the work done.
Comment hidden (Intermittent Failures Robot) |
Comment 29•5 years ago
|
||
I'll address the utility function in bug 1567848. I'm going to mark this fixed per :gbrown's work in the other bug given that this has all the relevant intermittent-failure flags etc. Thanks!
Updated•5 years ago
|
Updated•2 years ago
|
Description
•