Closed Bug 1476635 Opened 6 years ago Closed 3 years ago

Intermittent PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?) during testActivityStreamPocketReferrer

Categories

(Firefox for Android Graveyard :: Awesomescreen, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: aryx, Unassigned)

References

Details

(Whiteboard: [priority:low][stockwell disabled])

This looks like a regression from bug 1385464.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=rc3&tochange=d75a549c4e930a50b240b0d28f574dd211d6be28&fromchange=b4ac908007b9af47907b57891fc537201ef489a4&selectedJob=188734804

https://treeherder.mozilla.org/logviewer.html#?job_id=188734658&repo=mozilla-inbound&lineNumber=1565

[task 2018-07-18T12:29:44.709Z] 12:29:44     INFO -  TEST-START | testActivityStreamPocketReferrer
[task 2018-07-18T12:30:37.912Z] 12:30:37     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-07-18T12:30:37.912Z] 12:30:37     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
[task 2018-07-18T12:30:58.849Z] 12:30:58     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-07-18T12:30:58.849Z] 12:30:58     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
[task 2018-07-18T12:30:59.253Z] 12:30:59     INFO -  INFO | automation.py | Application ran for: 0:01:16.568152
[task 2018-07-18T12:30:59.253Z] 12:30:59     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpklJI2fpidlog
[task 2018-07-18T12:30:59.872Z] 12:30:59     INFO -  /data/tombstones does not exist; tombstone check skipped
[task 2018-07-18T12:31:01.107Z] 12:31:01  WARNING -  PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?)
Summary: Intermittent PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?) → Intermittent PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?) during testActivityStreamPocketReferrer
That Try push is a month old, related code could have changed. Because parts of the Try build data is only stored for 2 weeks, retriggers don't work. The landing of 1385565 on inbound is similar to a push to inbound.
Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195

Needinfo :susheel if you think this bug should be re-triaged.
Priority: -- → P5
Blocks: 1167565
Flags: needinfo?(gbrown)
Depends on: 1480546
Note that many of these failures are reported in bug 1167565 (or elsewhere). The actual failure rate is about 2x what the Intermittent Failures Robot reports.
"Missing end of test marker" indicates that the test did not run to completion. Logs also suggest that the process did not crash. Enhanced logging (bug 1480546) makes the issue more clear: The test harness stopped waiting for the test because the top activity was no longer Firefox. That seemed to happen before the test finished.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=19f6d4c9ce7c9a5b10260bcd31bdea8801ca4322
Flags: needinfo?(gbrown)
(In reply to Geoff Brown [:gbrown] from comment #8)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=19f6d4c9ce7c9a5b10260bcd31bdea8801ca4322

https://treeherder.mozilla.org/logviewer.html#?job_id=191504753&repo=try&lineNumber=1603-1612

[task 2018-08-02T01:53:31.219Z] 01:53:31     INFO -  TEST-START | testActivityStreamPocketReferrer
[task 2018-08-02T01:54:14.329Z] 01:54:14     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-08-02T01:54:14.329Z] 01:54:14     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
[task 2018-08-02T01:54:35.267Z] 01:54:35     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-08-02T01:54:35.267Z] 01:54:35     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
[task 2018-08-02T01:54:35.267Z] 01:54:35     INFO -  remoteautomation wait for org.mozilla.fennec_aurora complete; top activity=com.android.launcher
[task 2018-08-02T01:54:35.572Z] 01:54:35     INFO -  INFO | automation.py | Application ran for: 0:01:17.499283
[task 2018-08-02T01:54:35.572Z] 01:54:35     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpwcZiFxpidlog
[task 2018-08-02T01:54:36.194Z] 01:54:36     INFO -  /data/tombstones does not exist; tombstone check skipped
[task 2018-08-02T01:54:37.432Z] 01:54:37  WARNING -  PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?)

https://treeherder.mozilla.org/logviewer.html#?job_id=191504753&repo=try&lineNumber=2127

[task 2018-08-02T01:54:38.080Z] 01:54:38     INFO -  08-01 18:54:14.815 I/ActivityManager(  277): moveTaskToBack: 15
[task 2018-08-02T01:54:38.080Z] 01:54:38     INFO -  08-01 18:54:14.845 W/GeckoEventDispatcher( 3662): No listener for Session:DataWritten
[task 2018-08-02T01:54:38.080Z] 01:54:38     INFO -  08-01 18:54:15.105 I/WindowManager(  277): Screenshot Window{4190bc58 u0 org.mozilla.fennec_aurora/org.mozilla.gecko.BrowserApp} was all black! mSurfaceLayer=21015 minLayer=21015 maxLayer=21015
[task 2018-08-02T01:54:38.081Z] 01:54:38     INFO -  08-01 18:54:15.295 I/dalvikvm-heap(  277): Grow heap (frag case) to 6.632MB for 380496-byte allocation
[task 2018-08-02T01:54:38.081Z] 01:54:38     INFO -  08-01 18:54:15.435 I/Choreographer( 3662): Skipped 36 frames!  The application may be doing too much work on its main thread.
[task 2018-08-02T01:54:38.081Z] 01:54:38     INFO -  08-01 18:54:15.675 W/EGL_emulation(  407): eglSurfaceAttrib not implemented
[task 2018-08-02T01:54:38.082Z] 01:54:38     INFO -  08-01 18:54:16.135 D/GeckoBrowserProvider( 3662): Expiring history.
[task 2018-08-02T01:54:38.082Z] 01:54:38     INFO -  08-01 18:54:16.145 D/GeckoBrowserProvider( 3662): Expiring highlights blocklist.
[task 2018-08-02T01:54:38.083Z] 01:54:38     INFO -  08-01 18:54:16.145 D/GeckoBrowserProvider( 3662): Expiring thumbnails.
[task 2018-08-02T01:54:38.083Z] 01:54:38     INFO -  08-01 18:54:16.165 D/GeckoNetworkManager( 3662): Incoming event stop for state OnWithListeners -> OffWithListeners

...

[task 2018-08-02T01:54:38.088Z] 01:54:38     INFO -  08-01 18:54:25.118 D/Robocop ( 3662): blockForEvent timeout: Content:DOMContentLoaded
[task 2018-08-02T01:54:38.088Z] 01:54:38     INFO -  08-01 18:54:25.125 I/Robocop ( 3662): EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-08-02T01:54:38.089Z] 01:54:38     INFO -  08-01 18:54:25.125 D/Robocop ( 3662): blockForEvent timeout: Content:DOMTitleChanged
[task 2018-08-02T01:54:38.089Z] 01:54:38     INFO -  08-01 18:54:25.125 I/Robocop ( 3662): EventExpecter: no longer listening for Content:DOMTitleChanged
You can also see that the browser instance, pid 3662:

... 08-01 18:54:25.125 I/Robocop ( 3662) ...

is still alive after the test run:

[task 2018-08-02T01:54:38.885Z] 01:54:38     INFO -       USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
...
[task 2018-08-02T01:54:38.896Z] 01:54:38     INFO -       u0_a23    407   38    317688 28712 ffffffff 400433dc S com.android.launcher
...
[task 2018-08-02T01:54:38.900Z] 01:54:38     INFO -       u0_a43    3662  38    685356 111548 ffffffff 400433dc S org.mozilla.fennec_aurora
Other failures on the same try push are consistent:
 - the test run ends early because com.android.launcher is in the foreground
 - ActivityManager reports "moveTaskToBack: 15" in logcat

:mcomella - Do you have any idea what's happening here? Can you take a look?
Flags: needinfo?(michael.l.comella)
Susheel, can you triage this?
Flags: needinfo?(michael.l.comella) → needinfo?(sdaswani)
Petru, what are you thoughts on this?
Flags: needinfo?(sdaswani) → needinfo?(petru.lingurar)
There are 31 occurrences for this failure in the last 7 days. This is happening on android-em-4-3 opt builds.

Any updates on this?
Whiteboard: [stockwell needswork]
This has a high occurrence rate, 4/5 retriggers are failing. 

petru, Susheel are there any updates here?
Flags: needinfo?(sdaswani)
Will investigate more on this based on Geoff findings.
Flags: needinfo?(sdaswani)
Flags: needinfo?(petru.lingurar)
Based on a try for this with more logs [1] I see that indeed the test fails because the launcher somehow is the foreground app.
> 11:28:11     INFO -  TEST-START | testActivityStreamPocketReferrer
> 11:28:54     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
> 11:28:54     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
> 11:29:15     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
> 11:29:15     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
> 11:29:15     INFO -  wait for org.mozilla.fennec_aurora complete; top activity=com.android.launcher
> 11:29:15     INFO -  INFO | automation.py | Application ran for: 0:01:17.485694
> 11:29:15     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpWklY1zpidlog
> 11:29:16     INFO -  /data/tombstones does not exist; tombstone check skipped
> 11:29:17  WARNING -  PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?)

but this happens before this [2] "NavigationHelper.goBack()" (the first goBack() from the first test method) is called which puts the activity to background.
> 11:29:18     INFO -  08-31 04:28:53.907 D/Robotium( 3715): goBack()
> 11:29:18     INFO -  08-31 04:28:53.907 D/Robotium( 3715): hideSoftKeyboard()
> 11:29:18     INFO -  08-31 04:28:55.278 I/ActivityManager(  277): moveTaskToBack: 15

The behaviour is the same in other failed tests.


[1] https://taskcluster-artifacts.net/IOIsaNhqSTSLieakG_1qbg/0/public/logs/live_backing.log
[2] https://dxr.mozilla.org/mozilla-central/rev/c2e3be6a1dd352b969a45f0b85e87674e24ad284/mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testActivityStreamPocketReferrer.java#59
Going on with the premise that the test presses back, and this action finishes the activity I added a check [1] for
> Tabs.getInstance().getSelectedTab().canDoBack() 
to guard against this scenario and now I see that
- the issue that this ticket refers to does not appear anymore (moveTaskToBack not present in logs anymore)
- the "testActivityStreamPocketReferrer" test can still fail with other intermittent causes [2]
- for each now failing test I see a screenshot with a blank webpage [3]. So maybe there's an issue with how Gecko loads the js file that the test is based on.

[1] https://hg.mozilla.org/try/rev/e7199375391e0271a3e403accf541f27fbf347d7#l1.49
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=42ff35fb2746744a93c82c5b170e2a834fe786f2&selectedJob=197555645
[3] https://taskcluster-artifacts.net/CLTZTengQ-S4ltV7dAV_Pg/0/public/test_info//robocop-screenshot-org.mozilla.gecko.tests.testActivityStreamPocketReferrer.jpg
(In reply to Petru-Mugurel Lingurar[:petru] from comment #27)
> - for each now failing test I see a screenshot with a blank webpage [3]. So
> maybe there's an issue with how Gecko loads the js file that the test is
> based on.

Those robocop screenshots are not able to capture web content: They always show blank content, even when a page has loaded successfully.
Trying to find more evidence to support that the .js isn't loaded and that's why pressing back actually closes the activity I logged the result from "Session:GetHistory" [1] and in the one case where the test failed [2] I see
> 11:08:28.995 E/testActivityStreamPocke( 3752): No history!
> [
>              As in other successful tests, it should be
>        History item: {url=http://mochi.test:8888/tests/robocop/robocop_javascript.html?slug=1536599365428&path=testActivityStreamPocketReferrer.js, title=testActivityStreamPocketReferrer.js, selected=true}
>        History item: {url=about:home, title=Nightly Home, selected=false}
> ]
> 11:08:30.746 D/Robotium( 3752): goBack()
> 11:08:32.525 I/ActivityManager(  282): moveTaskToBack: 15

[1] https://hg.mozilla.org/try/rev/ee3c68a16800259c89c89cd318dffcaa0e60bae9#l1.84
[2] https://treeherder.mozilla.org/#/jobs?repo=try&author=plingurar@mozilla.com&selectedJob=198474843

Jim, do you know why something like this would happen? Not really sure how to debug this and how to proceed.
Flags: needinfo?(nchen)
Whiteboard: [stockwell disable-recommended] → [priority:low][stockwell disable-recommended]
I'm not sure of the exact cause. I think maybe the page load isn't complete before we call goBack(). A lot of other tests assert the title of the page has changed [1], so maybe we should do that in this test as well.

[1] https://searchfox.org/mozilla-central/rev/6c82481caa506a240a626bb44a2b8cbe0eedb3a0/mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testHistory.java#26
Flags: needinfo?(nchen)
blockForReadyAndLoadJS calls NavigationHelper.enterAndLoadUrl, which calls sToolbar.enterEditingMode().enterUrl(url).commitEditingMode(), the latter of which in turn calls WaitHelper.waitForPageLoad. This looks loke it should wait for some page load events: https://dxr.mozilla.org/mozilla-central/rev/ac9f1219d11bf1a56ec1ace8e3ba9ff113b5cacb/mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/helpers/WaitHelper.java#124-125

I might have overlooked something, though, or there might be some bugs in the current implementation?
This bug has failed 46 time in the last 7 days. Occurs android-em-4-3-armv7-api16 on opt.

Recent log:
https://treeherder.mozilla.org/logviewer.html#?job_id=202109929&repo=autoland&lineNumber=1501

 INFO -  SimpleTest START
[task 2018-09-28T02:55:47.999Z] 02:55:47     INFO -  TEST-START | testPictureLinkContextMenu
[task 2018-09-28T02:56:10.045Z] 02:56:10     INFO -  GECKO | EventExpecter: no longer listening for Gecko:Ready
[task 2018-09-28T02:56:20.259Z] 02:56:20     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-09-28T02:56:30.474Z] 02:56:30     INFO -  GECKO | PaintExpecter: no longer listening for events
[task 2018-09-28T02:59:06.642Z] 02:59:06     INFO -  GECKO | EventExpecter: no longer listening for Tab:Added
[task 2018-09-28T02:59:16.862Z] 02:59:16     INFO -  GECKO | EventExpecter: no longer listening for Tab:Added
[task 2018-09-28T02:59:47.507Z] 02:59:47     INFO -  Failed to get top activity, retrying, once...
[task 2018-09-28T02:59:47.935Z] 02:59:47     INFO -  wait for org.mozilla.fennec_aurora complete; top activity=None
[task 2018-09-28T02:59:48.142Z] 02:59:48     INFO -  INFO | automation.py | Application ran for: 0:04:13.487150
[task 2018-09-28T02:59:48.142Z] 02:59:48     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmp7P4bfZpidlog
[task 2018-09-28T02:59:50.934Z] 02:59:50  WARNING -  PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?)
[task 2018-09-28T02:59:51.643Z] 02:59:51     INFO -  09-27 19:59:43.299 W/Resources( 2523): Preloaded drawable resource #0x10801de (android:drawable/btn_star_on_normal_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.644Z] 02:59:51     INFO -  09-27 19:59:43.308 W/Resources( 2523): Preloaded drawable resource #0x10801da (android:drawable/btn_star_on_disabled_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.645Z] 02:59:51     INFO -  09-27 19:59:43.318 W/Resources( 2523): Preloaded drawable resource #0x10801d0 (android:drawable/btn_star_off_disabled_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.646Z] 02:59:51     INFO -  09-27 19:59:43.328 W/Resources( 2523): Preloaded drawable resource #0x10801e0 (android:drawable/btn_star_on_pressed_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.647Z] 02:59:51     INFO -  09-27 19:59:43.338 W/Resources( 2523): Preloaded drawable resource #0x10801d6 (android:drawable/btn_star_off_pressed_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.648Z] 02:59:51     INFO -  09-27 19:59:43.338 W/Resources( 2523): Preloaded drawable resource #0x10801dc (android:drawable/btn_star_on_focused_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.648Z] 02:59:51     INFO -  09-27 19:59:43.348 W/Resources( 2523): Preloaded drawable resource #0x10801d2 (android:drawable/btn_star_off_focused_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.651Z] 02:59:51     INFO -  09-27 19:59:43.358 W/Resources( 2523): Preloaded drawable resource #0x10801d8 (android:drawable/btn_star_on_disabled_focused_holo_light) that varies with configuration!!

sdaswani: Can you please take a look at this bug?
Flags: needinfo?(sdaswani)
Whiteboard: [priority:low][stockwell disable-recommended] → [priority:low][stockwell disable-recommended][stockwell needswork]
Looked at this again today.
The only correlation I see is no history when asking for "Session:GetHistory" but i don't know why that would happen. And only on Android 4.3.
As such, I think this test should be skipped on 4.3.
Flags: needinfo?(sdaswani)
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9008729b15e5
Disable robocop testActivityStreamPocketReferrer; r=me,a=test-only
Keywords: leave-open
Whiteboard: [priority:low][stockwell disable-recommended][stockwell needswork] → [priority:low][stockwell disabled]
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe4d9985684b
Disable robocop testActivityStreamPocketReferrer, really; r=me,a=test-only
Priority: P5 → P1

Downgrading priority from P1 to P2

Priority: P1 → P2
We have completed our launch of our new Firefox on Android. The development of the new versions use GitHub for issue tracking. If the bug report still reproduces in a current version of [Firefox on Android nightly](https://play.google.com/store/apps/details?id=org.mozilla.fenix) an issue can be reported at the [Fenix GitHub project](https://github.com/mozilla-mobile/fenix/). If you want to discuss your report please use [Mozilla's chat](https://wiki.mozilla.org/Matrix#Connect_to_Matrix) server https://chat.mozilla.org and join the [#fenix](https://chat.mozilla.org/#/room/#fenix:mozilla.org) channel.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.