Closed
Bug 1004668
Opened 11 years ago
Closed 10 years ago
Intermittent Android 4.0 robocop-5 shutdown hang following testUITelemetry
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(firefox30 unaffected, firefox31 fixed, firefox32 fixed, firefox-esr24 unaffected)
RESOLVED
FIXED
Firefox 32
Tracking | Status | |
---|---|---|
firefox30 | --- | unaffected |
firefox31 | --- | fixed |
firefox32 | --- | fixed |
firefox-esr24 | --- | unaffected |
People
(Reporter: gbrown, Assigned: gbrown)
References
Details
(Keywords: intermittent-failure)
Attachments
(1 file, 1 obsolete file)
3.15 KB,
patch
|
rnewman
:
review+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #1004657 +++
Breaking out the Android panda instances from bug 918759 in robocop-5.
These seem to always follow testUITelemetry, which was disabled on Android 2.2 for a different error.
eg:
(In reply to TBPL Robot from comment #2789)
> philor
> https://tbpl.mozilla.org/php/getParsedLog.php?id=38749817&tree=Fx-Team
> Android 4.0 Panda fx-team opt test robocop-5 on 2014-04-29 16:11:17
> revision: 3e0ff6c29303
> slave: panda-0613
>
> TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds
> with no output
> Return code: 2
(In reply to TBPL Robot from comment #2805)
> philor
> https://tbpl.mozilla.org/php/getParsedLog.php?id=38852462&tree=Mozilla-
> Inbound
> Android 4.0 Panda mozilla-inbound opt test robocop-5 on 2014-04-30 21:00:15
> revision: 658330efb152
> slave: panda-0519
>
> TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds
> with no output
> Return code: 2
Assignee | ||
Comment 1•11 years ago
|
||
After the test ends, there is a service timeout, then an anr...
16:38:42 INFO - 04-29 16:28:04.578 I/Robocop ( 2883): 44 INFO SimpleTest FINISHED
16:38:42 INFO - 04-29 16:28:04.578 I/GeckoDump( 2883): Robocop:Quit received -- requesting quit
16:38:42 INFO - 04-29 16:28:04.882 D/GeckoThumbnailHelper( 2883): Using new thumbnail size: 186472 (width 286)
16:38:42 INFO - 04-29 16:28:04.882 D/GeckoThumbnailHelper( 2883): Sending thumbnail event: 286, 163
...
16:38:42 INFO - 04-29 16:32:23.843 W/ActivityManager( 1402): Timeout executing service: ServiceRecord{4159cd90 org.mozilla.fennec/org.mozilla.gecko.background.healthreport.upload.HealthReportUploadService}
16:38:42 INFO - 04-29 16:32:23.874 I/Process ( 1402): Sending signal. PID: 2883 SIG: 3
16:38:42 INFO - 04-29 16:32:24.117 I/Process ( 1402): Sending signal. PID: 1402 SIG: 3
16:38:42 INFO - 04-29 16:32:24.117 I/dalvikvm( 1402): threadid=3: reacting to signal 3
16:38:42 INFO - 04-29 16:32:24.132 I/dalvikvm( 1402): Wrote stack traces to '/data/anr/traces.txt'
16:38:42 INFO - 04-29 16:32:24.148 I/Process ( 1402): Sending signal. PID: 1602 SIG: 3
16:38:42 INFO - 04-29 16:32:24.148 I/dalvikvm( 1602): threadid=3: reacting to signal 3
16:38:42 INFO - 04-29 16:32:24.164 I/dalvikvm( 1602): Wrote stack traces to '/data/anr/traces.txt'
16:38:42 INFO - 04-29 16:32:24.164 I/Process ( 1402): Sending signal. PID: 1500 SIG: 3
16:38:42 INFO - 04-29 16:32:24.164 I/dalvikvm( 1500): threadid=3: reacting to signal 3
16:38:42 INFO - 04-29 16:32:24.164 I/dalvikvm( 1500): Wrote stack traces to '/data/anr/traces.txt'
16:38:42 INFO - 04-29 16:32:25.351 I/Process ( 1402): Sending signal. PID: 1894 SIG: 3
16:38:42 INFO - 04-29 16:32:25.351 I/dalvikvm( 1894): threadid=3: reacting to signal 3
16:38:42 INFO - 04-29 16:32:25.359 I/dalvikvm( 1894): Wrote stack traces to '/data/anr/traces.txt'
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): ANR in org.mozilla.fennec
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): Reason: Executing service org.mozilla.fennec/org.mozilla.gecko.background.healthreport.upload.HealthReportUploadService
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): Load: 0.01 / 0.16 / 0.15
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): CPU usage from 18771ms to 0ms ago with 99% awake:
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 2.2% 1894/com.mozilla.SUTAgentAndroid: 1.9% user + 0.3% kernel / faults: 46 minor
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 0.5% 1291/surfaceflinger: 0.2% user + 0.2% kernel
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 0.2% 1402/system_server: 0% user + 0.1% kernel / faults: 2 minor
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 0.1% 4/kworker/0:0: 0% user + 0.1% kernel
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 0% 325/sync_supers: 0% user + 0% kernel
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 0% 2872/app_process: 0% user + 0% kernel
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 0.3% TOTAL: 0.2% user + 0% kernel + 0% iowait
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): CPU usage from 917ms to 1472ms later:
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 5.4% 1402/system_server: 0% user + 5.4% kernel
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 5.4% 1416/ActivityManager: 0% user + 5.4% kernel
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 1.8% 1894/com.mozilla.SUTAgentAndroid: 1.8% user + 0% kernel
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 3.6% 1961/RedirOutputThre: 3.6% user + 0% kernel
16:38:42 INFO - 04-29 16:32:25.367 E/ActivityManager( 1402): 5% TOTAL: 2% user + 3% kernel
Comment 2•11 years ago
|
||
(Tweaking summary to reduce mis-stars later :-))
Summary: Intermittent Android 4.0 TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output...following testUITelemetry → Intermittent Android 4.0 *robocop-5* TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output...following testUITelemetry
Comment 3•11 years ago
|
||
Well, if the machines don't have network access to the FHR upload endpoints, then the upload service will time out. That's expected. The uploader will run unless you've opted out of upload, but it shouldn't be running soon after an install…
Document generation might block one of the browser threads, but we shouldn't be blocking in that way while we time out on upload.
Comment 4•11 years ago
|
||
(In reply to Richard Newman [:rnewman] from comment #3)
> Well, if the machines don't have network access to the FHR upload endpoints,
> then the upload service will time out. That's expected.
We should be preffing off FHR rather than relying on network blocks; filed bug 1004682.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 9•11 years ago
|
||
There is a slight variation on Comment 1 in some runs:
https://tbpl.mozilla.org/php/getParsedLog.php?id=38998335&tree=Try&full=1#error0
10:27:13 INFO - 05-03 10:25:21.906 W/ActivityManager( 1403): Timeout executing service: ServiceRecord{415218c0 org.mozilla.fennec/org.mozilla.gecko.background.announcements.AnnouncementsService}
10:27:13 INFO - 05-03 10:25:21.937 I/Process ( 1403): Sending signal. PID: 2344 SIG: 3
10:27:13 INFO - 05-03 10:25:22.249 I/Process ( 1403): Sending signal. PID: 1403 SIG: 3
10:27:13 INFO - 05-03 10:25:22.249 I/dalvikvm( 1403): threadid=3: reacting to signal 3
10:27:13 INFO - 05-03 10:25:22.289 I/dalvikvm( 1403): Wrote stack traces to '/data/anr/traces.txt'
10:27:13 INFO - 05-03 10:25:22.296 I/Process ( 1403): Sending signal. PID: 1572 SIG: 3
10:27:13 INFO - 05-03 10:25:22.296 I/dalvikvm( 1572): threadid=3: reacting to signal 3
10:27:13 INFO - 05-03 10:25:22.296 I/dalvikvm( 1572): Wrote stack traces to '/data/anr/traces.txt'
10:27:13 INFO - 05-03 10:25:22.296 I/Process ( 1403): Sending signal. PID: 1502 SIG: 3
10:27:13 INFO - 05-03 10:25:22.304 I/dalvikvm( 1502): threadid=3: reacting to signal 3
10:27:13 INFO - 05-03 10:25:22.312 I/dalvikvm( 1502): Wrote stack traces to '/data/anr/traces.txt'
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): ANR in org.mozilla.fennec
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): Reason: Executing service org.mozilla.fennec/org.mozilla.gecko.background.announcements.AnnouncementsService
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): Load: 0.0 / 0.04 / 0.08
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): CPU usage from 16707ms to 0ms ago:
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 1.1% 1897/com.mozilla.SUTAgentAndroid: 1% user + 0.1% kernel / faults: 62 minor
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 0.1% 1403/system_server: 0.1% user + 0% kernel / faults: 1 minor
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 0% 1292/surfaceflinger: 0% user + 0% kernel
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 0% 1359/kworker/0:2: 0% user + 0% kernel
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 0% 2001/com.android.gallery3d: 0% user + 0% kernel / faults: 7 minor
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 0% 2016/com.svox.pico: 0% user + 0% kernel / faults: 33 minor
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 0% 2333/app_process: 0% user + 0% kernel
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 0.3% TOTAL: 0.2% user + 0% kernel + 0% iowait
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): CPU usage from 1020ms to 1562ms later:
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 7.4% 1403/system_server: 0% user + 7.4% kernel / faults: 2 minor
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 7.4% 1417/ActivityManager: 0% user + 7.4% kernel
10:27:13 INFO - 05-03 10:25:23.507 E/ActivityManager( 1403): 4.2% TOTAL: 3.1% user + 1% kernel
Notice org.mozilla.gecko.background.announcements.AnnouncementsService instead of org.mozilla.gecko.background.healthreport.upload.HealthReportUploadService.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 20•11 years ago
|
||
I was not able to determine the cause of the ANRs, so took a closer look at the test itself. Surprise -- most of the test is not running!
"let type = m[0]" results in type being undefined, so the rest of the test does not check anything: All of the checks for expected events and sessions simply do not run.
Correcting for type turned up a couple of other oversights (s/method/reason, add TEST_PREFIX to sessions), presumably never caught because that code never ran?
Correcting those problems sometimes gave me a clean run on Android 4.0; in the intermittent failed cases, we received "home.1" as a session.
If I allow for "home.1", the test seems to pass consistently on Android 4.0 and Android 4.2 x86, with no sign of ANRs or other problems: https://tbpl.mozilla.org/php/getParsedLog.php?id=39223869&tree=Try&full=1.
But on Android 2.2 and Android 2.3, there are *still* frequent failures on session because both "home.1" and "homepanel.1.<guid>" are received: https://tbpl.mozilla.org/php/getParsedLog.php?id=39225984&tree=Try&full=1.
Is there a better way to handle "home.1" and "homepanel.1.xxx>, or shall I continue to hack?
Attachment #8418920 -
Flags: feedback?(rnewman)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 25•11 years ago
|
||
*shakes fist at sola*
Comment 26•11 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #20)
Thanks for doing the sleuthing!
> "let type = m[0]" results in type being undefined, so the rest of the test
> does not check anything: All of the checks for expected events and sessions
> simply do not run.
Why so? That bit of code didn't change from when I wrote this, and it worked then... each measurement should be
["event", ...]
so m[0] should be correct, no?
> Correcting those problems sometimes gave me a clean run on Android 4.0; in
> the intermittent failed cases, we received "home.1" as a session.
The test is racing with the real app's UI telemetry.
> Is there a better way to handle "home.1" and "homepanel.1.xxx>, or shall I
> continue to hack?
That set will extend over time, so we shouldn't hard-code to accept those. The right thing to do is to -- as with events -- filter out sessions that don't start with the test prefix, and add the test prefix to the sessions the test creates.
Assignee | ||
Comment 27•11 years ago
|
||
(In reply to Richard Newman [:rnewman] from comment #26)
> (In reply to Geoff Brown [:gbrown] from comment #20)
>
> > "let type = m[0]" results in type being undefined, so the rest of the test
> > does not check anything: All of the checks for expected events and sessions
> > simply do not run.
>
> Why so? That bit of code didn't change from when I wrote this, and it worked
> then... each measurement should be
>
> ["event", ...]
>
> so m[0] should be correct, no?
I'm sure you know this code better than I do, but I read it as:
- measurements is the return value of getUIMeasurements() (filtered, but of the same form), which looks to me like an array of objects:
eg. measurements = [
{type: "session", name: "blah", reason: "blah", ...},
{type: "event", action: "blah", method: "blah", ...},
...
]
- m = measurements[i], an object
eg. m = {type: "session", name: "blah", reason: "blah" ...}
-> m.type == "session" / m[0] == ??
Comment 28•11 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #27)
> of the same form), which looks to me like an array of objects:
Good catch! I think events started life as arrays, and ended up as objects... and the test continued to pass, so wasn't fixed.
Damn you, dynamic languages.
Comment 29•11 years ago
|
||
Comment on attachment 8418920 [details] [diff] [review]
work in progress
Review of attachment 8418920 [details] [diff] [review]:
-----------------------------------------------------------------
::: mobile/android/base/tests/testUITelemetry.js
@@ +79,5 @@
> if (type == "event") {
> let [type, action, method, sessions, extras] = expected[i];
> do_check_eq(m.action, action);
> do_check_eq(m.method, method);
> + do_check_session_array(m.sessions, sessions);
I'd prefer simply:
do_check_array_eq(m.sessions.filter(TEST_REGEX.test), sessions);
Attachment #8418920 -
Flags: feedback?(rnewman) → feedback+
Updated•11 years ago
|
Status: NEW → ASSIGNED
Hardware: ARM → All
Assignee | ||
Comment 30•11 years ago
|
||
Incorporated your suggestion for sessions -- thanks!
Latest try run looks very solid and resolves bug 993813 as a bonus:
https://tbpl.mozilla.org/?tree=Try&rev=98387b9caf43
Attachment #8418920 -
Attachment is obsolete: true
Attachment #8419883 -
Flags: review?(rnewman)
Comment 31•11 years ago
|
||
Comment on attachment 8419883 [details] [diff] [review]
update testUITelemetry
Review of attachment 8419883 [details] [diff] [review]:
-----------------------------------------------------------------
::: mobile/android/base/tests/testUITelemetry.js
@@ +68,5 @@
> do_check_eq(m.action, action);
> do_check_eq(m.method, method);
> + // might receive real sessions in addition to the test ones -- remove the real ones
> + let test_sessions = m.sessions.filter(function(s) {
> + return TEST_REGEX.test(s);
You can just use TEST_REGEX.test as the argument.
Attachment #8419883 -
Flags: review?(rnewman) → review+
Assignee | ||
Comment 32•11 years ago
|
||
Assignee | ||
Comment 33•11 years ago
|
||
Backed-out: https://hg.mozilla.org/integration/mozilla-inbound/rev/c80cc079cbc8 for a silly mistake. Will correct when I am more alert.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 35•11 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 41•11 years ago
|
||
(In reply to TBPL Robot from comment #40)
> RyanVM
> https://tbpl.mozilla.org/php/getParsedLog.php?id=39379587&tree=Mozilla-Inbound
> Android 4.0 Panda mozilla-inbound opt test robocop-5 on 2014-05-09 12:04:31
It seems my change does not eliminate the ANR + time out -- more work to do here.
Whiteboard: [leave open]
Comment 42•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [leave open]
Target Milestone: --- → Firefox 32
Assignee | ||
Updated•11 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 45•11 years ago
|
||
(mcMerge now looks primarily for the keyword "leave-open" rather than the whiteboard value; though don't know why the backward compat support for the whiteboard didn't kick in)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 48•11 years ago
|
||
We got the test running better, but the shutdown hang persists. I don't know why.
I won't be able to get back to this before May 22nd.
Assignee | ||
Updated•11 years ago
|
Assignee: gbrown → nobody
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 51•11 years ago
|
||
Another possible dup of bug 924622 (landed 05/23)?
Assignee: nobody → gbrown
Comment 52•10 years ago
|
||
Let's call the shutdown hang fixed by bug 924622 & the test cleanup fixed by the comment 35 landing.
As such think we're all done here :-)
(Tweaking summary to avoid future TBPL false positives)
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Summary: Intermittent Android 4.0 *robocop-5* TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output...following testUITelemetry → Intermittent Android 4.0 robocop-5 shutdown hang following testUITelemetry
Comment 53•10 years ago
|
||
status-firefox30:
--- → unaffected
status-firefox31:
--- → fixed
status-firefox32:
--- → fixed
status-firefox-esr24:
--- → unaffected
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•