Open Bug 751575 Opened 12 years ago Updated 11 months ago

Permanent failure during test_pb_notification_ipc.js | command timed out: 1200 seconds without output, attempting to kill

Categories

(Core :: DOM: Navigation, defect)

15 Branch
x86
Windows XP
defect

Tracking

()

REOPENED

People

(Reporter: emorley, Unassigned)

References

Details

(Keywords: leave-open, Whiteboard: [purple][leave open])

Attachments

(1 file, 1 obsolete file)

Filing this so I have something to point the tree closure message at.

Earlier there was what appears to have been some infra issues, so a number of windows builds failed, requiring retriggers (which are still running, ETA 1 hour+ until tests complete) before we know the correct regression range for this. Tbh, I can't see that it's anything other than CPG related sadly.

The range we have so far is:
hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f5a3a7b9c6b0&tochange=400c2b30015d

If it is indeed CPG, bholley/luke/jst have asked that CPG not be backed out if we can help it:

bholley_mobile: Really though, we want to avoid a backout if at all possible
bholley_mobile: Cpg touches lots of stuff. Every two days it spends outside the tree something else lands that breaks with it
bholley_mobile: This has happened a bunch of times already
bholley_mobile: So i (and jst) think we should bias towards disabling tests and fixing them in a followup
bholley_mobile: I promise to be super responsive on that front
bholley_mobile: Im just taking today away from the laptop to unwind a bit
bholley_mobile: There are also lots of things like sec bugs blocked on it
bholley_mobile: And ionmonkey
edmorley: I'll file a bug to point the tree closure at and try disabling the test that is causing the suite to abort, hopefully that will be it

--

Similar to bug 717448, except perma-purple and mostly happens during test_pb_notification_ipc.js. Both Win opt and WinXP opt are perma-purple with this.


Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:07:06 PDT for push 400c2b30015d

slave: talos-r3-xp-074

https://tbpl.mozilla.org/php/getParsedLog.php?id=11419727&tree=Mozilla-Inbound

{
TEST-INFO | C:\talos-slave\test\build\xpcshell\tests\docshell\test\unit\test_privacy_transition.js | running test ...
TEST-PASS | C:\talos-slave\test\build\xpcshell\tests\docshell\test\unit\test_privacy_transition.js | test passed (time: 219.000ms)
TEST-INFO | C:\talos-slave\test\build\xpcshell\tests\docshell\test\unit_ipc\test_pb_notification_ipc.js | running test ...

command timed out: 1200 seconds without output, attempting to kill
SIGKILL failed to kill process
using fake rc=-1
program finished with exit code -1

remoteFailed: [Failure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: exceptions.RuntimeError: SIGKILL failed to kill process
]
<snip>
<p class="error">exceptions.RuntimeError: SIGKILL failed to kill process</p>
Traceback (most recent call last):
Failure: exceptions.RuntimeError
}
Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:07:25 PDT for push 400c2b30015d
slave: talos-r3-w7-040
https://tbpl.mozilla.org/php/getParsedLog.php?id=11419750&tree=Mozilla-Inbound
test_writer_starvation.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:11:18 PDT for push 12d1d626759c
slave: talos-r3-xp-021
https://tbpl.mozilla.org/php/getParsedLog.php?id=11419829&tree=Mozilla-Inbound
test_writer_starvation.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:13:57 PDT for push 12d1d626759c
slave: talos-r3-w7-079
https://tbpl.mozilla.org/php/getParsedLog.php?id=11419893&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:39:41 PDT for push a6a335cd2c94
slave: talos-r3-xp-039
https://tbpl.mozilla.org/php/getParsedLog.php?id=11420444&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:38:05 PDT for push a6a335cd2c94
slave: talos-r3-w7-061
https://tbpl.mozilla.org/php/getParsedLog.php?id=11420464&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound pgo test xpcshell on 2012-05-03 07:24:32 PDT for push de5745bce8bc
slave: talos-r3-xp-013
https://tbpl.mozilla.org/php/getParsedLog.php?id=11429138&tree=Mozilla-Inbound
*** test_0000_bootstrap_svc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:43:13 PDT for push de5745bce8bc
slave: talos-r3-xp-037
https://tbpl.mozilla.org/php/getParsedLog.php?id=11420501&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 02:52:47 PDT for push de5745bce8bc
slave: talos-r3-w7-021
https://tbpl.mozilla.org/php/getParsedLog.php?id=11420814&tree=Mozilla-Inbound
test_pb_notification_ipc.js
Depends on: 723541
Summary: Permanent failure during test_pb_notification_ipc.js or test_writer_starvation.js | command timed out: 1200 seconds without output, attempting to kill → Permanent failure during test_pb_notification_ipc.js or test_writer_starvation.js or test_0000_bootstrap_svc.js | command timed out: 1200 seconds without output, attempting to kill
I'm not sure why this is related to bug 723541?
Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:00:26 PDT for push 074c8fb332a8
slave: talos-r3-xp-074
https://tbpl.mozilla.org/php/getParsedLog.php?id=11422366&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:00:12 PDT for push 074c8fb332a8
slave: talos-r3-w7-050
https://tbpl.mozilla.org/php/getParsedLog.php?id=11422383&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:55:07 PDT for push cda84ca70452
slave: talos-r3-xp-050
https://tbpl.mozilla.org/php/getParsedLog.php?id=11424139&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:55:05 PDT for push cda84ca70452
slave: talos-r3-w7-045
https://tbpl.mozilla.org/php/getParsedLog.php?id=11424164&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:42:44 PDT for push d7271f499b8b
slave: talos-r3-xp-061
https://tbpl.mozilla.org/php/getParsedLog.php?id=11423785&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 04:42:36 PDT for push d7271f499b8b
slave: talos-r3-w7-076
https://tbpl.mozilla.org/php/getParsedLog.php?id=11423821&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 05:27:42 PDT for push d9525fcfdd8b
slave: talos-r3-xp-007
https://tbpl.mozilla.org/php/getParsedLog.php?id=11425245&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 05:27:41 PDT for push d9525fcfdd8b
slave: talos-r3-w7-041
https://tbpl.mozilla.org/php/getParsedLog.php?id=11425287&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 05:37:27 PDT for push 47ececd6cb72
slave: talos-r3-xp-004
https://tbpl.mozilla.org/php/getParsedLog.php?id=11425515&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 05:37:30 PDT for push 47ececd6cb72
slave: talos-r3-w7-052
https://tbpl.mozilla.org/php/getParsedLog.php?id=11425565&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 06:16:17 PDT for push 94b06a04f17b
slave: talos-r3-xp-021
https://tbpl.mozilla.org/php/getParsedLog.php?id=11426619&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 06:16:12 PDT for push 94b06a04f17b
slave: talos-r3-w7-070
https://tbpl.mozilla.org/php/getParsedLog.php?id=11426634&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 06:41:44 PDT for push 02b78dbc753f
slave: talos-r3-xp-012
https://tbpl.mozilla.org/php/getParsedLog.php?id=11427396&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 06:40:16 PDT for push 02b78dbc753f
slave: talos-r3-w7-072
https://tbpl.mozilla.org/php/getParsedLog.php?id=11427453&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 5.1 mozilla-inbound opt test xpcshell on 2012-05-03 07:17:15 PDT for push 94913b445e72
slave: talos-r3-xp-017
https://tbpl.mozilla.org/php/getParsedLog.php?id=11428455&tree=Mozilla-Inbound
test_pb_notification_ipc.js

Rev3 WINNT 6.1 mozilla-inbound opt test xpcshell on 2012-05-03 07:17:11 PDT for push 94913b445e72
slave: talos-r3-w7-052
https://tbpl.mozilla.org/php/getParsedLog.php?id=11428464&tree=Mozilla-Inbound
test_pb_notification_ipc.js
(In reply to Brian R. Bondy [:bbondy] from comment #2)
> I'm not sure why this is related to bug 723541?

Because of:

> Rev3 WINNT 5.1 mozilla-inbound pgo test xpcshell on 2012-05-03 07:24:32 PDT
> for push de5745bce8bc
> slave: talos-r3-xp-013
> https://tbpl.mozilla.org/php/getParsedLog.php?id=11429138&tree=Mozilla-
> Inbound
> *** test_0000_bootstrap_svc.js

Out of the ~24 failures, all bar two have been in test_pb_notification_ipc.js, presume the others may just have been the other known intermittent purples occurring by coincidence.
I think it i just coincidence but adding myself to the CC list
Attachment #620727 - Flags: review?(luke)
Going to call the test_writer_starvation.js and test_0000_bootstrap_svc.js instances coincidence/already filed intermittent purple for now.
No longer depends on: 717448, 723541
Summary: Permanent failure during test_pb_notification_ipc.js or test_writer_starvation.js or test_0000_bootstrap_svc.js | command timed out: 1200 seconds without output, attempting to kill → Permanent failure during test_pb_notification_ipc.js | command timed out: 1200 seconds without output, attempting to kill
Comment on attachment 620727 [details] [diff] [review]
Temporarily disable test_pb_notification_ipc.js

Stealing review request, r=jst.
Attachment #620727 - Flags: review?(luke) → review+
Many thanks :-)

Disable test:
https://hg.mozilla.org/integration/mozilla-inbound/rev/90d25e0f6c68
Whiteboard: [purple] → [purple] [leave open]
Seems to have worked :-D 
(I was kinda worried yet more failures were hiding behind the aborted run)

https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=90d25e0f6c68
A locally build (non-pgo) opt xpcshell does not reproduce, I'll try the binary try is using.
Ah, it does repro 100% with a PGO xpcshell.
Woohoo, I think I found the culprit: conservative GC.  Changing a
  do_timeout(0, Components.utils.forceGC)
to
  Components.utils.schedulePreciseGC(function(){})
fixes the problem.
Assignee: nobody → luke
Status: NEW → ASSIGNED
Attachment #620883 - Flags: review?(bzbarsky)
Attachment #620883 - Flags: review?(bzbarsky) → review+
Whiteboard: [purple] [leave open] → [purple]
Comment on attachment 620883 [details] [diff] [review]
re-enable and fix test

r=me, fwiw
Oh, fun, it was permafailing on *Windows* PGO before; now it's permafailing on Linux PGO.
(In reply to Phil Ringnalda (:philor) from comment #19)
> Oh, fun, it was permafailing on *Windows* PGO before; now it's permafailing
> on Linux PGO.

Backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9c2c2046677f
Target Milestone: mozilla15 → ---
No longer blocks: 438871
> Comment on attachment 620883 [details] [diff] [review]
> re-enable and fix test

Merged the re-enable, since inbound was backing up:
https://hg.mozilla.org/mozilla-central/rev/8d220661ef24

The latest backout will be merged to m-c shortly.
<meme>run all the GCs</mem>

On the bright side, this is definitely an issue in how the test is forcing the temporary DocShell to get GC'd, definitely not a CPG bug.  bz/jst: is there a canonical way to ensure the DocShell schedulePreciseGC?  (jdm was expressing doubt that schedulePreciseGC worked on xpcshell, but I didn't understand.)
I'm not sure what the question is, exactly....
How can that test reliably kill that DocShell (the one created at the beginning of destroy_transient_docshell which you can see in the patch context).
I have no idea...  I'd think that a precise GC should do it.
I think ehsan meant https://hg.mozilla.org/mozilla-central/rev/9c2c2046677f

I'm stumped here as to why this docshell wouldn't be cleaned up now... :(
My doubt has to do with the implementation of schedulePreciseGC, which will not actually perform the GC until no JS is running at all. In xpcshell, that never occurs. In tests where nothing happens until the callback occurs, this means the test never completes; in this case, it probably just means that a regular gc ends up being triggered on some platforms and the docshell is destroyed.
Luke, is there a way for us to truly force a full GC to happen exactly when we need one, whether there's JS on the stack or not? IOW, if the JS engine APIs exist to do that we can expose that (if it's not already exposed), and call that here to make this deterministic. I'm happy to help write that patch if those APIs exist.
The problem with a synchronous GC is that it would have to do conservative stack scanning which is the source of non-determinacy here.  schedulePreciseGC would be great if it worked.
Severity: blocker → normal
Assignee: luke → nobody
Disabled the non-ipc version in https://hg.mozilla.org/integration/mozilla-inbound/rev/fb17ffb3bf77 because another unrelated change caused it to start timing out, only on 10.5 debug. Bye-bye test, see you when schedulePreciseGC from xpcshell works.
Attachment #620727 - Attachment is obsolete: true
Status: ASSIGNED → NEW
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE

Re-opening as the test got re-disabled, and hence it is still disabled today and should be investigated at some stage.

Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: INCOMPLETE → ---
Depends on: 1766230
Severity: normal → S3

The leave-open keyword is there and there is no activity for 6 months.
:kmag, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(kmaglione+bmo)

The test is still disabled

Flags: needinfo?(kmaglione+bmo)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: