Closed Bug 1565794 Opened 5 years ago Closed 5 years ago

Perma Android robocop testAboutPage | testIntentUri | testBrowserDiscovery | GeckoEventExpecter - blockForEvent timeout: Content:DOMContentLoaded

Categories

(GeckoView :: General, defect, P5)

defect

Tracking

(firefox-esr60 unaffected, firefox-esr68 unaffected, firefox68 unaffected, firefox69 unaffected, firefox70 verified)

VERIFIED FIXED
mozilla70
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox68 --- unaffected
firefox69 --- unaffected
firefox70 --- verified

People

(Reporter: intermittent-bug-filer, Assigned: gbrown)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files)

Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=256333802&repo=mozilla-central
Full log: https://queue.taskcluster.net/v1/task/cPu2ihHmQrStMIBBEXJHbg/runs/1/artifacts/public/logs/live_backing.log


[task 2019-07-13T11:01:25.130Z] 11:01:25 INFO - TEST-PASS | testAboutPage | Url is correct - about:firefox should equal about:firefox
[task 2019-07-13T11:01:25.130Z] 11:01:25 INFO - Buffered messages finished
[task 2019-07-13T11:01:25.131Z] 11:01:25 WARNING - TEST-UNEXPECTED-FAIL | testAboutPage | GeckoEventExpecter - blockForEvent timeout: Content:DOMContentLoaded
[task 2019-07-13T11:01:25.131Z] 11:01:25 INFO - 0 ERROR Exception caught during test! - junit.framework.AssertionFailedError: TEST-UNEXPECTED-FAIL | testAboutPage | GeckoEventExpecter - blockForEvent timeout: Content:DOMContentLoaded
[task 2019-07-13T11:01:25.131Z] 11:01:25 INFO - at junit.framework.Assert.fail(Assert.java:50)
[task 2019-07-13T11:01:25.131Z] 11:01:25 INFO - at org.mozilla.gecko.FennecMochitestAssert._logMochitestResult(FennecMochitestAssert.java:124)
[task 2019-07-13T11:01:25.132Z] 11:01:25 INFO - at org.mozilla.gecko.FennecMochitestAssert.ok(FennecMochitestAssert.java:145)
[task 2019-07-13T11:01:25.132Z] 11:01:25 INFO - at org.mozilla.gecko.FennecNativeActions$GeckoEventExpecter.blockForEvent(FennecNativeActions.java:113)
[task 2019-07-13T11:01:25.133Z] 11:01:25 INFO - at org.mozilla.gecko.FennecNativeActions$GeckoEventExpecter.blockForEvent(FennecNativeActions.java:97)
[task 2019-07-13T11:01:25.133Z] 11:01:25 INFO - at org.mozilla.gecko.tests.OldBaseTest.loadUrlAndWait(OldBaseTest.java:221)
[task 2019-07-13T11:01:25.134Z] 11:01:25 INFO - at org.mozilla.gecko.tests.testAboutPage.testAboutPage(testAboutPage.java:26)
[task 2019-07-13T11:01:25.134Z] 11:01:25 INFO - at java.lang.reflect.Method.invokeNative(Native Method)
[task 2019-07-13T11:01:25.135Z] 11:01:25 INFO - at java.lang.reflect.Method.invoke(Method.java:525)
[task 2019-07-13T11:01:25.135Z] 11:01:25 INFO - at android.test.InstrumentationTestCase.runMethod(InstrumentationTestCase.java:214)
[task 2019-07-13T11:01:25.135Z] 11:01:25 INFO - at android.test.InstrumentationTestCase.runTest(InstrumentationTestCase.java:199)
[task 2019-07-13T11:01:25.136Z] 11:01:25 INFO - at android.test.ActivityInstrumentationTestCase2.runTest(ActivityInstrumentationTestCase2.java:192)
[task 2019-07-13T11:01:25.136Z] 11:01:25 INFO - at org.mozilla.gecko.tests.BaseRobocopTest.runTest(BaseRobocopTest.java:188)
[task 2019-07-13T11:01:25.137Z] 11:01:25 INFO - at junit.framework.TestCase.runBare(TestCase.java:134)
[task 2019-07-13T11:01:25.137Z] 11:01:25 INFO - at junit.framework.TestResult$1.protect(TestResult.java:115)
[task 2019-07-13T11:01:25.137Z] 11:01:25 INFO - at junit.framework.TestResult.runProtected(TestResult.java:133)
[task 2019-07-13T11:01:25.138Z] 11:01:25 INFO - at junit.framework.TestResult.run(TestResult.java:118)
[task 2019-07-13T11:01:25.138Z] 11:01:25 INFO - at junit.framework.TestCase.run(TestCase.java:124)
[task 2019-07-13T11:01:25.139Z] 11:01:25 INFO - at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:191)
[task 2019-07-13T11:01:25.139Z] 11:01:25 INFO - at android.test.AndroidTestRunner.runTest(AndroidTestRunner.java:176)
[task 2019-07-13T11:01:25.139Z] 11:01:25 INFO - at android.test.InstrumentationTestRunner.onStart(InstrumentationTestRunner.java:554)
[task 2019-07-13T11:01:25.140Z] 11:01:25 INFO - at org.mozilla.gecko.FennecInstrumentationTestRunner.onStart(FennecInstrumentationTestRunner.java:66)
[task 2019-07-13T11:01:25.140Z] 11:01:25 INFO - at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1701)
[task 2019-07-13T11:01:25.141Z] 11:01:25 WARNING - TEST-UNEXPECTED-FAIL | testAboutPage | Exception caught - junit.framework.AssertionFailedError: TEST-UNEXPECTED-FAIL | testAboutPage | GeckoEventExpecter - blockForEvent timeout: Content:DOMContentLoaded
[task 2019-07-13T11:01:25.141Z] 11:01:25 INFO - TEST-OK | testAboutPage | took 237570ms

Flags: needinfo?(snorp)
Flags: needinfo?(awaseem)
Product: Firefox for Android → GeckoView
Regressed by: 1557096
Summary: Intermittent testAboutPage | GeckoEventExpecter - blockForEvent timeout: Content:DOMContentLoaded → Perma Android robocop testAboutPage | testIntentUri | testBrowserDiscovery | GeckoEventExpecter - blockForEvent timeout: Content:DOMContentLoaded

And also Bug 1502276 https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-07-08&endday=2019-07-15&tree=trunk&bug=1502276
Joel, can we go ahead and backout bug 1557096? There are three regressions from it and looks like snorp or awaseem are not responding to the ni.

Flags: needinfo?(jmaher)

yes, I agree we should be backing out bug 1557096, we know the root cause and it is easy to reland code. As it looks like a need to uplift to Beta, do this before we uplift it :)

Flags: needinfo?(jmaher)

Hi Geoff! :cpeterson let me know that you have experience on the robocop tests and may be able to help in fixing this. The same timeout happened in these two places:

  1. https://bugzilla.mozilla.org/show_bug.cgi?id=1565850
  2. https://bugzilla.mozilla.org/show_bug.cgi?id=1495113

This is related to this bug fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1557096
Do you know what could be causing this problem?

Flags: needinfo?(gbrown)

Let's have a look at the logcat:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256341646&repo=mozilla-central&lineNumber=1713
has logcat
https://taskcluster-artifacts.net/bLatnu5cTBar5f-nlqInNQ/0/public/test_info//logcat-emulator-5554.log

07-13 05:02:13.338  1087  1106 I Robocop : {"message":"about:firefox should equal about:firefox","time":1563019333335,"source":"robocop","status":"PASS","test":"testAboutPage","thread":null,"subtest":"Url is correct","action":"test_status","pid":null}
07-13 05:02:13.338  1087  1106 D Robocop : waiting for Content:DOMContentLoaded
07-13 05:02:13.348  1087  1087 D GeckoToolbar: onTabChanged: LOAD_ERROR
07-13 05:02:13.358  1087  1087 D GeckoBrowserApp: BrowserApp.onTabChanged: 1: LOAD_ERROR
07-13 05:02:13.378  1087  1110 D GeckoBrowser: Handled load error: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIWebNavigation.loadURI]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://global/content/elements/browser-custom-element.js :: loadURI/< :: line 989"  data: no]
07-13 05:03:02.178   275   278 D dalvikvm: GC_CONCURRENT freed 713K, 39% free 5609K/9116K, paused 7ms+19ms, total 152ms
07-13 05:03:44.318   275   278 D dalvikvm: GC_CONCURRENT freed 724K, 39% free 5601K/9116K, paused 6ms+24ms, total 153ms
07-13 05:04:16.038   275   278 D dalvikvm: GC_CONCURRENT freed 708K, 39% free 5602K/9116K, paused 6ms+24ms, total 145ms

In this case testAboutPage is trying to open about:firefox and then wait for Content:DOMContentLoaded...but there are load errors reported. After the load errors, the test waits for the event and eventually times out. We should look closer at that load error.

(In reply to Alvina Waseem [:awaseem] from comment #10)

This is related to this bug fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1557096
Do you know what could be causing this problem?

I don't see any obvious link between the changes in bug 1557096 and these test failures. I believe the regression analysis in comment 1 was incorrect. I am retriggering to try to identify the real culprit:

https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=robocop&tochange=9056a9e2cf4a32b1386216b49f6d5d2f353bccdb&fromchange=cd685b4cff6dbbfd5eed7dc27c1e169995c93653

Flags: needinfo?(gbrown)

Could be from bug 1560178.

Flags: needinfo?(gijskruitbosch+bugs)

Yes, that's it.

Regressed by: 1560178
No longer regressed by: 1557096

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #13)

Could be from bug 1560178.

This looks like it's me alright, from https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=robocop&tochange=9056a9e2cf4a32b1386216b49f6d5d2f353bccdb&fromchange=4629e855ea33d85f8adcc6e76264ac779b799bd1 . Can someone explain how these tier-2 tests are different from the "normal" robocop ones? Or have robocop tests all been moved to tier2?

Like in bug 1565908, I don't really understand how this has impacted any android runs. All the actual code (as opposed to test) changes are behind XRE_IsE10sParentProcess, so they shouldn't ever run on fennec (which AIUI is what robocop runs against). Who knows about how robocop works (in particular how prefs get set for the user profile fennec uses and how it deals with turning e10s on/off) and how we'd check why that value is wrong here? :gbrown?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gbrown)

Robocop was moved to tier 2 simply because we stopped running it on integration branches after the last release (they run on mozilla-central and try only now).

Robocop should be running against fennec and that should be non-e10s.

Definitely Fennec:

https://taskcluster-artifacts.net/Bhl_HxOsTO6CV-PVAAa5uA/0/public/test_info//logcat-emulator-5554.log

07-17 09:06:12.767 275 437 I ActivityManager: Start proc org.mozilla.fennec_aurora for added application org.mozilla.fennec_aurora: pid=1086 uid=10043 gids={50043, 3003, 1028, 1015, 1006}

and I see the same pid for all gecko activity during each test -- looks single-process.

For the most part, robocop test harness code has no concept of e10s -- robocop is always fennec, always non-e10s. There is an e10s flag passed around when launching the test, and that flag is incorrect (e10s=True), but ultimately that flag is not used.

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=0a0d5a5bbba37de5c5d0377b2e5afded22a04714

should confirm that the harness simply launches something like:

am instrument -e quit_and_finish 1 -e deviceroot /sdcard/tests -e class org.mozilla.gecko.tests.testAboutPage org.mozilla.roboexample.test/org.mozilla.gecko.FennecInstrumentationTestRunner

The user profile code is mostly shared with mochitest, so I wonder if something is being set incorrectly there...is there a particular pref I should check?

Any concerns here? Anything else I can check?

Flags: needinfo?(gbrown)

(In reply to Geoff Brown [:gbrown] from comment #21)

Checking browser.tabs.remote.autostart here:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=e5d35ed6716afd0a9e861f2f22ed879d16ecffc4

It's false in the user.js file (repeatedly, even) so I'm even more puzzled than before... :-(

(In reply to Geoff Brown [:gbrown] from comment #21)

Checking browser.tabs.remote.autostart here:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=e5d35ed6716afd0a9e861f2f22ed879d16ecffc4

[task 2019-07-17T19:37:12.821Z] 19:37:12 INFO - ZZZ
[task 2019-07-17T19:37:12.822Z] 19:37:12 INFO - {'marionette.log.level': 'Trace', 'browser.tabs.remote.autostart': True}
[task 2019-07-17T19:37:12.822Z] 19:37:12 INFO - ZZZ

Oops! I think I know what's happening...

Depends on: 1566957

So fwiw, I think we should fix robocop to not set the pref to true either way - sounds like gbrown might know what's going wrong there (thanks for your help!). Come to think of it, perhaps I should have seen this coming given bug 1552442 / https://bugzilla.mozilla.org/show_bug.cgi?id=1549855#c6 - as always hindsight is a wonderful thing.

But given how much breaks if that pref is set, after bug 1560178, I'm wondering if we also want a code update for docshell/apprunner that hardcodes us to miss the block I added in bug 1560178 for fennec, at least for the potential uplift to 68esr, because bricking things for users who (for whatever reason, e.g. custom sync setups prior to bug 1538015 being fixed) have the pref set is... not great. I mean, clearly the intent of XRE_IsE10sParentProcess is to only be true if we're (a) in the parent process and (b) in an app and configuration where e10s is being used, but it also clearly is imperfect by relying on this pref, which apparently on fennec could be "pretending" to be doing e10s even when we patently do not support it and will still be loading webpages same-process.

Boris & Nick, how would you feel about just adding an ifdef for the build app in XRE_IsE10sParentProcess and always returning false when built for fennec?

Flags: needinfo?(nalexander)
Flags: needinfo?(bzbarsky)

(In reply to :Gijs (he/him) from comment #24)

So fwiw, I think we should fix robocop to not set the pref to true either way - sounds like gbrown might know what's going wrong there (thanks for your help!). Come to think of it, perhaps I should have seen this coming given bug 1552442 / https://bugzilla.mozilla.org/show_bug.cgi?id=1549855#c6 - as always hindsight is a wonderful thing.

But given how much breaks if that pref is set, after bug 1560178, I'm wondering if we also want a code update for docshell/apprunner that hardcodes us to miss the block I added in bug 1560178 for fennec, at least for the potential uplift to 68esr, because bricking things for users who (for whatever reason, e.g. custom sync setups prior to bug 1538015 being fixed) have the pref set is... not great. I mean, clearly the intent of XRE_IsE10sParentProcess is to only be true if we're (a) in the parent process and (b) in an app and configuration where e10s is being used, but it also clearly is imperfect by relying on this pref, which apparently on fennec could be "pretending" to be doing e10s even when we patently do not support it and will still be loading webpages same-process.

I was about to nit pack about "clearly the intent"... but then I read the method and yes, that's clear. That should be false for Fennec with no way to change it.

Boris & Nick, how would you feel about just adding an ifdef for the build app in XRE_IsE10sParentProcess and always returning false when built for fennec?

I have no particular problem with that, although I think my voice counts little here. snorp (PTO just this moment) and bholley are better folks, if not bz (as flagged).

Flags: needinfo?(nalexander)

I have no problem with adding that ifdef.

We should also consider filing a bug for a better way to implement XRE_IsE10sParentProcess....

Flags: needinfo?(bzbarsky)

Sorry about the wrong investigation here. Thank you Geoff for the work done.

See Also: → 1567848

I'll address the utility function in bug 1567848. I'm going to mark this fixed per :gbrown's work in the other bug given that this has all the relevant intermittent-failure flags etc. Thanks!

Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla70
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: