Permanent failure during test_pb_notification_ipc.js | command timed out: 1200 seconds without output, attempting to kill

RESOLVED INCOMPLETE

Status

()

Core
Document Navigation
RESOLVED INCOMPLETE
6 years ago
2 years ago

People

(Reporter: emorley, Unassigned)

Tracking

15 Branch
x86
Windows XP
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [purple][leave open])

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

6 years ago
Filing this so I have something to point the tree closure message at.

Earlier there was what appears to have been some infra issues, so a number of windows builds failed, requiring retriggers (which are still running, ETA 1 hour+ until tests complete) before we know the correct regression range for this. Tbh, I can't see that it's anything other than CPG related sadly.

The range we have so far is:
hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f5a3a7b9c6b0&tochange=400c2b30015d

If it is indeed CPG, bholley/luke/jst have asked that CPG not be backed out if we can help it:

bholley_mobile: Really though, we want to avoid a backout if at all possible
bholley_mobile: Cpg touches lots of stuff. Every two days it spends outside the tree something else lands that breaks with it
bholley_mobile: This has happened a bunch of times already
bholley_mobile: So i (and jst) think we should bias towards disabling tests and fixing them in a followup
bholley_mobile: I promise to be super responsive on that front
bholley_mobile: Im just taking today away from the laptop to unwind a bit
bholley_mobile: There are also lots of things like sec bugs blocked on it
bholley_mobile: And ionmonkey
edmorley: I'll file a bug to point the tree closure at and try disabling the test that is causing the suite to abort, hopefully that will be it

--

Similar to bug 717448, except perma-purple and mostly happens during test_pb_notification_ipc.js. Both Win opt and WinXP opt are perma-purple with this.


Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:07:06 PDT for push 400c2b30015d

slave: talos-r3-xp-074

https://tbpl.mozilla.org/php/getParsedLog.php?id=11419727&tree=Mozilla-Inbound

{
TEST-INFO | C:\talos-slave\test\build\xpcshell\tests\docshell\test\unit\test_privacy_transition.js | running test ...
TEST-PASS | C:\talos-slave\test\build\xpcshell\tests\docshell\test\unit\test_privacy_transition.js | test passed (time: 219.000ms)
TEST-INFO | C:\talos-slave\test\build\xpcshell\tests\docshell\test\unit_ipc\test_pb_notification_ipc.js | running test ...

command timed out: 1200 seconds without output, attempting to kill
SIGKILL failed to kill process
using fake rc=-1
program finished with exit code -1

remoteFailed: [Failure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: exceptions.RuntimeError: SIGKILL failed to kill process
]
<snip>
<p class="error">exceptions.RuntimeError: SIGKILL failed to kill process</p>
Traceback (most recent call last):
Failure: exceptions.RuntimeError
}
(Reporter)

Comment 1

6 years ago
Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:07:25 PDT for push 400c2b30015d
slave: talos-r3-w7-040
https://tbpl.mozilla.org/php/getParsedLog.php?id=11419750&tree=Mozilla-Inbound
test_writer_starvation.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:11:18 PDT for push 12d1d626759c
slave: talos-r3-xp-021
https://tbpl.mozilla.org/php/getParsedLog.php?id=11419829&tree=Mozilla-Inbound
test_writer_starvation.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:13:57 PDT for push 12d1d626759c
slave: talos-r3-w7-079
https://tbpl.mozilla.org/php/getParsedLog.php?id=11419893&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:39:41 PDT for push a6a335cd2c94
slave: talos-r3-xp-039
https://tbpl.mozilla.org/php/getParsedLog.php?id=11420444&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:38:05 PDT for push a6a335cd2c94
slave: talos-r3-w7-061
https://tbpl.mozilla.org/php/getParsedLog.php?id=11420464&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound pgo test xpcshell on 2012-05-03 07:24:32 PDT for push de5745bce8bc
slave: talos-r3-xp-013
https://tbpl.mozilla.org/php/getParsedLog.php?id=11429138&tree=Mozilla-Inbound
*** test_0000_bootstrap_svc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:43:13 PDT for push de5745bce8bc
slave: talos-r3-xp-037
https://tbpl.mozilla.org/php/getParsedLog.php?id=11420501&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:52:47 PDT for push de5745bce8bc
slave: talos-r3-w7-021
https://tbpl.mozilla.org/php/getParsedLog.php?id=11420814&tree=Mozilla-Inbound
test_pb_notification_ipc.js
Depends on: 723541
Summary: Permanent failure during test_pb_notification_ipc.js or test_writer_starvation.js | command timed out: 1200 seconds without output, attempting to kill → Permanent failure during test_pb_notification_ipc.js or test_writer_starvation.js or test_0000_bootstrap_svc.js | command timed out: 1200 seconds without output, attempting to kill
I'm not sure why this is related to bug 723541?
(Reporter)

Comment 3

6 years ago
Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:00:26 PDT for push 074c8fb332a8
slave: talos-r3-xp-074
https://tbpl.mozilla.org/php/getParsedLog.php?id=11422366&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:00:12 PDT for push 074c8fb332a8
slave: talos-r3-w7-050
https://tbpl.mozilla.org/php/getParsedLog.php?id=11422383&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:55:07 PDT for push cda84ca70452
slave: talos-r3-xp-050
https://tbpl.mozilla.org/php/getParsedLog.php?id=11424139&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:55:05 PDT for push cda84ca70452
slave: talos-r3-w7-045
https://tbpl.mozilla.org/php/getParsedLog.php?id=11424164&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:42:44 PDT for push d7271f499b8b
slave: talos-r3-xp-061
https://tbpl.mozilla.org/php/getParsedLog.php?id=11423785&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:42:36 PDT for push d7271f499b8b
slave: talos-r3-w7-076
https://tbpl.mozilla.org/php/getParsedLog.php?id=11423821&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 05:27:42 PDT for push d9525fcfdd8b
slave: talos-r3-xp-007
https://tbpl.mozilla.org/php/getParsedLog.php?id=11425245&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 05:27:41 PDT for push d9525fcfdd8b
slave: talos-r3-w7-041
https://tbpl.mozilla.org/php/getParsedLog.php?id=11425287&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 05:37:27 PDT for push 47ececd6cb72
slave: talos-r3-xp-004
https://tbpl.mozilla.org/php/getParsedLog.php?id=11425515&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 05:37:30 PDT for push 47ececd6cb72
slave: talos-r3-w7-052
https://tbpl.mozilla.org/php/getParsedLog.php?id=11425565&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 06:16:17 PDT for push 94b06a04f17b
slave: talos-r3-xp-021
https://tbpl.mozilla.org/php/getParsedLog.php?id=11426619&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 06:16:12 PDT for push 94b06a04f17b
slave: talos-r3-w7-070
https://tbpl.mozilla.org/php/getParsedLog.php?id=11426634&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 06:41:44 PDT for push 02b78dbc753f
slave: talos-r3-xp-012
https://tbpl.mozilla.org/php/getParsedLog.php?id=11427396&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 06:40:16 PDT for push 02b78dbc753f
slave: talos-r3-w7-072
https://tbpl.mozilla.org/php/getParsedLog.php?id=11427453&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 07:17:15 PDT for push 94913b445e72
slave: talos-r3-xp-017
https://tbpl.mozilla.org/php/getParsedLog.php?id=11428455&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 07:17:11 PDT for push 94913b445e72
slave: talos-r3-w7-052
https://tbpl.mozilla.org/php/getParsedLog.php?id=11428464&tree=Mozilla-Inbound
test_pb_notification_ipc.js
(Reporter)

Comment 4

6 years ago
(In reply to Brian R. Bondy [:bbondy] from comment #2)
> I'm not sure why this is related to bug 723541?

Because of:

> Rev3 WINNT 5.1 mozilla-inbound pgo test xpcshell on 2012-05-03 07:24:32 PDT
> for push de5745bce8bc
> slave: talos-r3-xp-013
> https://tbpl.mozilla.org/php/getParsedLog.php?id=11429138&tree=Mozilla-
> Inbound
> *** test_0000_bootstrap_svc.js

Out of the ~24 failures, all bar two have been in test_pb_notification_ipc.js, presume the others may just have been the other known intermittent purples occurring by coincidence.
I think it i just coincidence but adding myself to the CC list
(Reporter)

Comment 6

6 years ago
Ok, retriggers finally come through: range confirmed as CPG:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=ac00c792933e&tochange=bed8c4e3dfdf
(Reporter)

Comment 7

6 years ago
Created attachment 620727 [details] [diff] [review]
Temporarily disable test_pb_notification_ipc.js
Attachment #620727 - Flags: review?(luke)
(Reporter)

Comment 8

6 years ago
Going to call the test_writer_starvation.js and test_0000_bootstrap_svc.js instances coincidence/already filed intermittent purple for now.
No longer depends on: 717448, 723541
Summary: Permanent failure during test_pb_notification_ipc.js or test_writer_starvation.js or test_0000_bootstrap_svc.js | command timed out: 1200 seconds without output, attempting to kill → Permanent failure during test_pb_notification_ipc.js | command timed out: 1200 seconds without output, attempting to kill
Comment on attachment 620727 [details] [diff] [review]
Temporarily disable test_pb_notification_ipc.js

Stealing review request, r=jst.
Attachment #620727 - Flags: review?(luke) → review+
(Reporter)

Comment 10

6 years ago
Many thanks :-)

Disable test:
https://hg.mozilla.org/integration/mozilla-inbound/rev/90d25e0f6c68
Whiteboard: [purple] → [purple] [leave open]
(Reporter)

Comment 11

6 years ago
Seems to have worked :-D 
(I was kinda worried yet more failures were hiding behind the aborted run)

https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=90d25e0f6c68

Comment 12

6 years ago
A locally build (non-pgo) opt xpcshell does not reproduce, I'll try the binary try is using.

Comment 13

6 years ago
Ah, it does repro 100% with a PGO xpcshell.

Comment 14

6 years ago
Woohoo, I think I found the culprit: conservative GC.  Changing a
  do_timeout(0, Components.utils.forceGC)
to
  Components.utils.schedulePreciseGC(function(){})
fixes the problem.

Comment 15

6 years ago
Created attachment 620883 [details] [diff] [review]
re-enable and fix test
Assignee: nobody → luke
Status: NEW → ASSIGNED
Attachment #620883 - Flags: review?(bzbarsky)

Updated

6 years ago
Attachment #620883 - Flags: review?(bzbarsky) → review+

Updated

6 years ago
Whiteboard: [purple] [leave open] → [purple]
Oh, fun, it was permafailing on *Windows* PGO before; now it's permafailing on Linux PGO.
(Reporter)

Comment 22

6 years ago
(In reply to Phil Ringnalda (:philor) from comment #19)
> Oh, fun, it was permafailing on *Windows* PGO before; now it's permafailing
> on Linux PGO.

Backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9c2c2046677f
Target Milestone: mozilla15 → ---

Updated

6 years ago
No longer blocks: 438871
(Reporter)

Comment 24

6 years ago
> Comment on attachment 620883 [details] [diff] [review]
> re-enable and fix test

Merged the re-enable, since inbound was backing up:
https://hg.mozilla.org/mozilla-central/rev/8d220661ef24

The latest backout will be merged to m-c shortly.

Comment 25

6 years ago
<meme>run all the GCs</mem>

On the bright side, this is definitely an issue in how the test is forcing the temporary DocShell to get GC'd, definitely not a CPG bug.  bz/jst: is there a canonical way to ensure the DocShell schedulePreciseGC?  (jdm was expressing doubt that schedulePreciseGC worked on xpcshell, but I didn't understand.)
I'm not sure what the question is, exactly....

Comment 27

6 years ago
How can that test reliably kill that DocShell (the one created at the beginning of destroy_transient_docshell which you can see in the patch context).
I have no idea...  I'd think that a precise GC should do it.
I think ehsan meant https://hg.mozilla.org/mozilla-central/rev/9c2c2046677f

I'm stumped here as to why this docshell wouldn't be cleaned up now... :(
My doubt has to do with the implementation of schedulePreciseGC, which will not actually perform the GC until no JS is running at all. In xpcshell, that never occurs. In tests where nothing happens until the callback occurs, this means the test never completes; in this case, it probably just means that a regular gc ends up being triggered on some platforms and the docshell is destroyed.
Luke, is there a way for us to truly force a full GC to happen exactly when we need one, whether there's JS on the stack or not? IOW, if the JS engine APIs exist to do that we can expose that (if it's not already exposed), and call that here to make this deterministic. I'm happy to help write that patch if those APIs exist.

Comment 34

6 years ago
The problem with a synchronous GC is that it would have to do conservative stack scanning which is the source of non-determinacy here.  schedulePreciseGC would be great if it worked.
(Reporter)

Updated

6 years ago
Severity: blocker → normal

Updated

6 years ago
Assignee: luke → nobody
Disabled the non-ipc version in https://hg.mozilla.org/integration/mozilla-inbound/rev/fb17ffb3bf77 because another unrelated change caused it to start timing out, only on 10.5 debug. Bye-bye test, see you when schedulePreciseGC from xpcshell works.
(Reporter)

Updated

5 years ago
Attachment #620727 - Attachment is obsolete: true
(Reporter)

Updated

3 years ago
Status: ASSIGNED → NEW
(Reporter)

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.