Intermittent PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?) during testActivityStreamPocketReferrer

NEW
Unassigned

Status

()

defect
P1
normal
10 months ago
6 months ago

People

(Reporter: aryx, Unassigned)

Tracking

(Blocks 1 bug, {leave-open})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [priority:low][stockwell disabled])

This looks like a regression from bug 1385464.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=rc3&tochange=d75a549c4e930a50b240b0d28f574dd211d6be28&fromchange=b4ac908007b9af47907b57891fc537201ef489a4&selectedJob=188734804

https://treeherder.mozilla.org/logviewer.html#?job_id=188734658&repo=mozilla-inbound&lineNumber=1565

[task 2018-07-18T12:29:44.709Z] 12:29:44     INFO -  TEST-START | testActivityStreamPocketReferrer
[task 2018-07-18T12:30:37.912Z] 12:30:37     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-07-18T12:30:37.912Z] 12:30:37     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
[task 2018-07-18T12:30:58.849Z] 12:30:58     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-07-18T12:30:58.849Z] 12:30:58     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
[task 2018-07-18T12:30:59.253Z] 12:30:59     INFO -  INFO | automation.py | Application ran for: 0:01:16.568152
[task 2018-07-18T12:30:59.253Z] 12:30:59     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpklJI2fpidlog
[task 2018-07-18T12:30:59.872Z] 12:30:59     INFO -  /data/tombstones does not exist; tombstone check skipped
[task 2018-07-18T12:31:01.107Z] 12:31:01  WARNING -  PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?)
Summary: Intermittent PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?) → Intermittent PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?) during testActivityStreamPocketReferrer
That Try push is a month old, related code could have changed. Because parts of the Try build data is only stored for 2 weeks, retriggers don't work. The landing of 1385565 on inbound is similar to a push to inbound.
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195

Needinfo :susheel if you think this bug should be re-triaged.
Priority: -- → P5
Comment hidden (Intermittent Failures Robot)

Updated

10 months ago
Blocks: 1167565

Updated

10 months ago
Flags: needinfo?(gbrown)

Updated

10 months ago
Depends on: 1480546
Note that many of these failures are reported in bug 1167565 (or elsewhere). The actual failure rate is about 2x what the Intermittent Failures Robot reports.
"Missing end of test marker" indicates that the test did not run to completion. Logs also suggest that the process did not crash. Enhanced logging (bug 1480546) makes the issue more clear: The test harness stopped waiting for the test because the top activity was no longer Firefox. That seemed to happen before the test finished.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=19f6d4c9ce7c9a5b10260bcd31bdea8801ca4322
Flags: needinfo?(gbrown)
(In reply to Geoff Brown [:gbrown] from comment #8)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=19f6d4c9ce7c9a5b10260bcd31bdea8801ca4322

https://treeherder.mozilla.org/logviewer.html#?job_id=191504753&repo=try&lineNumber=1603-1612

[task 2018-08-02T01:53:31.219Z] 01:53:31     INFO -  TEST-START | testActivityStreamPocketReferrer
[task 2018-08-02T01:54:14.329Z] 01:54:14     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-08-02T01:54:14.329Z] 01:54:14     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
[task 2018-08-02T01:54:35.267Z] 01:54:35     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-08-02T01:54:35.267Z] 01:54:35     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
[task 2018-08-02T01:54:35.267Z] 01:54:35     INFO -  remoteautomation wait for org.mozilla.fennec_aurora complete; top activity=com.android.launcher
[task 2018-08-02T01:54:35.572Z] 01:54:35     INFO -  INFO | automation.py | Application ran for: 0:01:17.499283
[task 2018-08-02T01:54:35.572Z] 01:54:35     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpwcZiFxpidlog
[task 2018-08-02T01:54:36.194Z] 01:54:36     INFO -  /data/tombstones does not exist; tombstone check skipped
[task 2018-08-02T01:54:37.432Z] 01:54:37  WARNING -  PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?)

https://treeherder.mozilla.org/logviewer.html#?job_id=191504753&repo=try&lineNumber=2127

[task 2018-08-02T01:54:38.080Z] 01:54:38     INFO -  08-01 18:54:14.815 I/ActivityManager(  277): moveTaskToBack: 15
[task 2018-08-02T01:54:38.080Z] 01:54:38     INFO -  08-01 18:54:14.845 W/GeckoEventDispatcher( 3662): No listener for Session:DataWritten
[task 2018-08-02T01:54:38.080Z] 01:54:38     INFO -  08-01 18:54:15.105 I/WindowManager(  277): Screenshot Window{4190bc58 u0 org.mozilla.fennec_aurora/org.mozilla.gecko.BrowserApp} was all black! mSurfaceLayer=21015 minLayer=21015 maxLayer=21015
[task 2018-08-02T01:54:38.081Z] 01:54:38     INFO -  08-01 18:54:15.295 I/dalvikvm-heap(  277): Grow heap (frag case) to 6.632MB for 380496-byte allocation
[task 2018-08-02T01:54:38.081Z] 01:54:38     INFO -  08-01 18:54:15.435 I/Choreographer( 3662): Skipped 36 frames!  The application may be doing too much work on its main thread.
[task 2018-08-02T01:54:38.081Z] 01:54:38     INFO -  08-01 18:54:15.675 W/EGL_emulation(  407): eglSurfaceAttrib not implemented
[task 2018-08-02T01:54:38.082Z] 01:54:38     INFO -  08-01 18:54:16.135 D/GeckoBrowserProvider( 3662): Expiring history.
[task 2018-08-02T01:54:38.082Z] 01:54:38     INFO -  08-01 18:54:16.145 D/GeckoBrowserProvider( 3662): Expiring highlights blocklist.
[task 2018-08-02T01:54:38.083Z] 01:54:38     INFO -  08-01 18:54:16.145 D/GeckoBrowserProvider( 3662): Expiring thumbnails.
[task 2018-08-02T01:54:38.083Z] 01:54:38     INFO -  08-01 18:54:16.165 D/GeckoNetworkManager( 3662): Incoming event stop for state OnWithListeners -> OffWithListeners

...

[task 2018-08-02T01:54:38.088Z] 01:54:38     INFO -  08-01 18:54:25.118 D/Robocop ( 3662): blockForEvent timeout: Content:DOMContentLoaded
[task 2018-08-02T01:54:38.088Z] 01:54:38     INFO -  08-01 18:54:25.125 I/Robocop ( 3662): EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-08-02T01:54:38.089Z] 01:54:38     INFO -  08-01 18:54:25.125 D/Robocop ( 3662): blockForEvent timeout: Content:DOMTitleChanged
[task 2018-08-02T01:54:38.089Z] 01:54:38     INFO -  08-01 18:54:25.125 I/Robocop ( 3662): EventExpecter: no longer listening for Content:DOMTitleChanged
You can also see that the browser instance, pid 3662:

... 08-01 18:54:25.125 I/Robocop ( 3662) ...

is still alive after the test run:

[task 2018-08-02T01:54:38.885Z] 01:54:38     INFO -       USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
...
[task 2018-08-02T01:54:38.896Z] 01:54:38     INFO -       u0_a23    407   38    317688 28712 ffffffff 400433dc S com.android.launcher
...
[task 2018-08-02T01:54:38.900Z] 01:54:38     INFO -       u0_a43    3662  38    685356 111548 ffffffff 400433dc S org.mozilla.fennec_aurora
Other failures on the same try push are consistent:
 - the test run ends early because com.android.launcher is in the foreground
 - ActivityManager reports "moveTaskToBack: 15" in logcat

:mcomella - Do you have any idea what's happening here? Can you take a look?
Flags: needinfo?(michael.l.comella)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Susheel, can you triage this?
Flags: needinfo?(michael.l.comella) → needinfo?(sdaswani)

Comment 15

9 months ago
Petru, what are you thoughts on this?
Flags: needinfo?(sdaswani) → needinfo?(petru.lingurar)
Comment hidden (Intermittent Failures Robot)
There are 31 occurrences for this failure in the last 7 days. This is happening on android-em-4-3 opt builds.

Any updates on this?
Whiteboard: [stockwell needswork]
Comment hidden (Intermittent Failures Robot)
This has a high occurrence rate, 4/5 retriggers are failing. 

petru, Susheel are there any updates here?
Flags: needinfo?(sdaswani)
Will investigate more on this based on Geoff findings.
Flags: needinfo?(sdaswani)
Flags: needinfo?(petru.lingurar)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Based on a try for this with more logs [1] I see that indeed the test fails because the launcher somehow is the foreground app.
> 11:28:11     INFO -  TEST-START | testActivityStreamPocketReferrer
> 11:28:54     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
> 11:28:54     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
> 11:29:15     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
> 11:29:15     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMTitleChanged
> 11:29:15     INFO -  wait for org.mozilla.fennec_aurora complete; top activity=com.android.launcher
> 11:29:15     INFO -  INFO | automation.py | Application ran for: 0:01:17.485694
> 11:29:15     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpWklY1zpidlog
> 11:29:16     INFO -  /data/tombstones does not exist; tombstone check skipped
> 11:29:17  WARNING -  PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?)

but this happens before this [2] "NavigationHelper.goBack()" (the first goBack() from the first test method) is called which puts the activity to background.
> 11:29:18     INFO -  08-31 04:28:53.907 D/Robotium( 3715): goBack()
> 11:29:18     INFO -  08-31 04:28:53.907 D/Robotium( 3715): hideSoftKeyboard()
> 11:29:18     INFO -  08-31 04:28:55.278 I/ActivityManager(  277): moveTaskToBack: 15

The behaviour is the same in other failed tests.


[1] https://taskcluster-artifacts.net/IOIsaNhqSTSLieakG_1qbg/0/public/logs/live_backing.log
[2] https://dxr.mozilla.org/mozilla-central/rev/c2e3be6a1dd352b969a45f0b85e87674e24ad284/mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testActivityStreamPocketReferrer.java#59
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Going on with the premise that the test presses back, and this action finishes the activity I added a check [1] for
> Tabs.getInstance().getSelectedTab().canDoBack() 
to guard against this scenario and now I see that
- the issue that this ticket refers to does not appear anymore (moveTaskToBack not present in logs anymore)
- the "testActivityStreamPocketReferrer" test can still fail with other intermittent causes [2]
- for each now failing test I see a screenshot with a blank webpage [3]. So maybe there's an issue with how Gecko loads the js file that the test is based on.

[1] https://hg.mozilla.org/try/rev/e7199375391e0271a3e403accf541f27fbf347d7#l1.49
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=42ff35fb2746744a93c82c5b170e2a834fe786f2&selectedJob=197555645
[3] https://taskcluster-artifacts.net/CLTZTengQ-S4ltV7dAV_Pg/0/public/test_info//robocop-screenshot-org.mozilla.gecko.tests.testActivityStreamPocketReferrer.jpg
(In reply to Petru-Mugurel Lingurar[:petru] from comment #27)
> - for each now failing test I see a screenshot with a blank webpage [3]. So
> maybe there's an issue with how Gecko loads the js file that the test is
> based on.

Those robocop screenshots are not able to capture web content: They always show blank content, even when a page has loaded successfully.
Comment hidden (Intermittent Failures Robot)
Trying to find more evidence to support that the .js isn't loaded and that's why pressing back actually closes the activity I logged the result from "Session:GetHistory" [1] and in the one case where the test failed [2] I see
> 11:08:28.995 E/testActivityStreamPocke( 3752): No history!
> [
>              As in other successful tests, it should be
>        History item: {url=http://mochi.test:8888/tests/robocop/robocop_javascript.html?slug=1536599365428&path=testActivityStreamPocketReferrer.js, title=testActivityStreamPocketReferrer.js, selected=true}
>        History item: {url=about:home, title=Nightly Home, selected=false}
> ]
> 11:08:30.746 D/Robotium( 3752): goBack()
> 11:08:32.525 I/ActivityManager(  282): moveTaskToBack: 15

[1] https://hg.mozilla.org/try/rev/ee3c68a16800259c89c89cd318dffcaa0e60bae9#l1.84
[2] https://treeherder.mozilla.org/#/jobs?repo=try&author=plingurar@mozilla.com&selectedJob=198474843

Jim, do you know why something like this would happen? Not really sure how to debug this and how to proceed.
Flags: needinfo?(nchen)

Updated

9 months ago
Whiteboard: [stockwell disable-recommended] → [priority:low][stockwell disable-recommended]
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
I'm not sure of the exact cause. I think maybe the page load isn't complete before we call goBack(). A lot of other tests assert the title of the page has changed [1], so maybe we should do that in this test as well.

[1] https://searchfox.org/mozilla-central/rev/6c82481caa506a240a626bb44a2b8cbe0eedb3a0/mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testHistory.java#26
Flags: needinfo?(nchen)
blockForReadyAndLoadJS calls NavigationHelper.enterAndLoadUrl, which calls sToolbar.enterEditingMode().enterUrl(url).commitEditingMode(), the latter of which in turn calls WaitHelper.waitForPageLoad. This looks loke it should wait for some page load events: https://dxr.mozilla.org/mozilla-central/rev/ac9f1219d11bf1a56ec1ace8e3ba9ff113b5cacb/mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/helpers/WaitHelper.java#124-125

I might have overlooked something, though, or there might be some bugs in the current implementation?
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
This bug has failed 46 time in the last 7 days. Occurs android-em-4-3-armv7-api16 on opt.

Recent log:
https://treeherder.mozilla.org/logviewer.html#?job_id=202109929&repo=autoland&lineNumber=1501

 INFO -  SimpleTest START
[task 2018-09-28T02:55:47.999Z] 02:55:47     INFO -  TEST-START | testPictureLinkContextMenu
[task 2018-09-28T02:56:10.045Z] 02:56:10     INFO -  GECKO | EventExpecter: no longer listening for Gecko:Ready
[task 2018-09-28T02:56:20.259Z] 02:56:20     INFO -  GECKO | EventExpecter: no longer listening for Content:DOMContentLoaded
[task 2018-09-28T02:56:30.474Z] 02:56:30     INFO -  GECKO | PaintExpecter: no longer listening for events
[task 2018-09-28T02:59:06.642Z] 02:59:06     INFO -  GECKO | EventExpecter: no longer listening for Tab:Added
[task 2018-09-28T02:59:16.862Z] 02:59:16     INFO -  GECKO | EventExpecter: no longer listening for Tab:Added
[task 2018-09-28T02:59:47.507Z] 02:59:47     INFO -  Failed to get top activity, retrying, once...
[task 2018-09-28T02:59:47.935Z] 02:59:47     INFO -  wait for org.mozilla.fennec_aurora complete; top activity=None
[task 2018-09-28T02:59:48.142Z] 02:59:48     INFO -  INFO | automation.py | Application ran for: 0:04:13.487150
[task 2018-09-28T02:59:48.142Z] 02:59:48     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmp7P4bfZpidlog
[task 2018-09-28T02:59:50.934Z] 02:59:50  WARNING -  PROCESS-CRASH | Automation Error: Missing end of test marker (process crashed?)
[task 2018-09-28T02:59:51.643Z] 02:59:51     INFO -  09-27 19:59:43.299 W/Resources( 2523): Preloaded drawable resource #0x10801de (android:drawable/btn_star_on_normal_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.644Z] 02:59:51     INFO -  09-27 19:59:43.308 W/Resources( 2523): Preloaded drawable resource #0x10801da (android:drawable/btn_star_on_disabled_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.645Z] 02:59:51     INFO -  09-27 19:59:43.318 W/Resources( 2523): Preloaded drawable resource #0x10801d0 (android:drawable/btn_star_off_disabled_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.646Z] 02:59:51     INFO -  09-27 19:59:43.328 W/Resources( 2523): Preloaded drawable resource #0x10801e0 (android:drawable/btn_star_on_pressed_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.647Z] 02:59:51     INFO -  09-27 19:59:43.338 W/Resources( 2523): Preloaded drawable resource #0x10801d6 (android:drawable/btn_star_off_pressed_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.648Z] 02:59:51     INFO -  09-27 19:59:43.338 W/Resources( 2523): Preloaded drawable resource #0x10801dc (android:drawable/btn_star_on_focused_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.648Z] 02:59:51     INFO -  09-27 19:59:43.348 W/Resources( 2523): Preloaded drawable resource #0x10801d2 (android:drawable/btn_star_off_focused_holo_light) that varies with configuration!!
[task 2018-09-28T02:59:51.651Z] 02:59:51     INFO -  09-27 19:59:43.358 W/Resources( 2523): Preloaded drawable resource #0x10801d8 (android:drawable/btn_star_on_disabled_focused_holo_light) that varies with configuration!!

sdaswani: Can you please take a look at this bug?
Flags: needinfo?(sdaswani)
Whiteboard: [priority:low][stockwell disable-recommended] → [priority:low][stockwell disable-recommended][stockwell needswork]
Looked at this again today.
The only correlation I see is no history when asking for "Session:GetHistory" but i don't know why that would happen. And only on Android 4.3.
As such, I think this test should be skipped on 4.3.

Updated

8 months ago
Flags: needinfo?(sdaswani)

Comment 40

8 months ago
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9008729b15e5
Disable robocop testActivityStreamPocketReferrer; r=me,a=test-only
Keywords: leave-open
Whiteboard: [priority:low][stockwell disable-recommended][stockwell needswork] → [priority:low][stockwell disabled]

Comment 41

8 months ago
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe4d9985684b
Disable robocop testActivityStreamPocketReferrer, really; r=me,a=test-only
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)

Updated

7 months ago
Priority: P5 → P1
You need to log in before you can comment on or make changes to this bug.