Closed
Bug 847558
Opened 12 years ago
Closed 11 years ago
on amazon ec2 vm's browser_dbg_createChrome.js hangs (timed out) on opt builds in _MD_WaitUnixProcess or nsProcess::Kill
Categories
(DevTools :: Debugger, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
Firefox 31
People
(Reporter: jmaher, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 1 obsolete file)
3.17 KB,
patch
|
Details | Diff | Splinter Review | |
1.06 KB,
patch
|
armenzg
:
review+
past
:
feedback+
|
Details | Diff | Splinter Review |
we had fixed browser_dbg_createChrome.js to work on our VM images, now we are experiencing a crash while running the test. This doesn't happen in debug builds, only in opt builds.
Here is a link to the failing testlog:
https://tbpl.mozilla.org/php/getParsedLog.php?id=20299323&tree=Cedar
You can always look for browser-chrome test failures on Ubuntu machines on the Cedar branch via tbpl:
https://tbpl.mozilla.org/?tree=Cedar
If there are suggestions for debugging this, working around this, or fixing it, I can reproduce this while working on a VM.
Updated•12 years ago
|
Depends on: 860349
Summary: on amazon ec2 vm's browser_dbg_createChrome.js crashes on opt builds → on amazon ec2 vm's browser_dbg_createChrome.js hangs (timed out) on opt builds
Comment 1•12 years ago
|
||
landed a potential fix on m-c for bug 860349. It needs a merge / landing.
Comment 2•12 years ago
|
||
leave this open for a bit, but first run doesn't have this error in it.
Updated•12 years ago
|
Assignee: nobody → past
Status: NEW → ASSIGNED
Priority: -- → P1
Comment 3•12 years ago
|
||
Added some more logging around the critical part of the code and I expect to get a better understanding of the failure when it reaches Cedar:
https://hg.mozilla.org/integration/fx-team/rev/b63a15eddd36
Updated•12 years ago
|
Whiteboard: [leave open]
Comment 4•12 years ago
|
||
Can we please land a change on mozilla-central to disable the test while we fix it on Cedar?
This will help me move load from Fedora machines to Ubuntu faster and re-purpose the rev3 minis as Win7 to help with wait times.
Thanks in advance!
Reporter | ||
Comment 5•12 years ago
|
||
if we disable this test, we can get the browser chrome tests running (verified 5 runs in a row on my loaner ubuntu test slave). Once there, we can actually test the fixes on try server which is much faster turnaround time than m-c -> cedar.
:past, please speak up if this doesn't sound good.
Attachment #745440 -
Flags: superreview?(past)
Attachment #745440 -
Flags: review?(armenzg)
Reporter | ||
Comment 6•12 years ago
|
||
Attachment #745440 -
Attachment is obsolete: true
Attachment #745440 -
Flags: superreview?(past)
Attachment #745440 -
Flags: review?(armenzg)
Attachment #745517 -
Flags: superreview?(past)
Attachment #745517 -
Flags: review?(armenzg)
Reporter | ||
Updated•12 years ago
|
Attachment #745517 -
Flags: superreview?(past) → feedback?(past)
Comment 7•12 years ago
|
||
Comment on attachment 745517 [details] [diff] [review]
disable test for linux only (1.1)
Review of attachment 745517 [details] [diff] [review]:
-----------------------------------------------------------------
I'm OK with it, but would it be too much to do a merge with the current patch first, since it's already in (or even a transplant)? I'm on vacation currently, otherwise I would have merged it myself.
Attachment #745517 -
Flags: feedback?(past) → feedback+
Comment 8•12 years ago
|
||
Comment 9•12 years ago
|
||
Comment on attachment 745517 [details] [diff] [review]
disable test for linux only (1.1)
Review of attachment 745517 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks!
Attachment #745517 -
Flags: review?(armenzg) → review+
Comment 10•12 years ago
|
||
Comment 11•12 years ago
|
||
It looks like trying to kill the browser debugger process blocks for some reason:
http://mxr.mozilla.org/mozilla-central/source/browser/devtools/debugger/DebuggerUI.jsm#498
nsIProcess::Kill is here:
http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsProcessCommon.cpp#557
Can anyone see what is special about our Ubuntu VMs regarding this code?
Reporter | ||
Comment 12•12 years ago
|
||
ted: any ideas on Comment 11 ?
Comment 13•12 years ago
|
||
Can we view a stack trace?
Reporter | ||
Comment 14•12 years ago
|
||
here is what happens:
https://tbpl.mozilla.org/php/getParsedLog.php?id=22945243&tree=Cedar&full=1#error0
We see it timeout, then we kill the process and get a stack trace. I am not sure if that stack trace is very useful though.
Comment 15•12 years ago
|
||
_MD_WaitUnixProcess is waiting [1] for WaitPidDaemonThread() which is waiting for notification that the child has terminated [2].
[1] http://hg.mozilla.org/mozilla-central/annotate/eeb89381c7a3/nsprpub/pr/src/md/unix/uxproces.c#l824
[2] http://hg.mozilla.org/mozilla-central/annotate/eeb89381c7a3/nsprpub/pr/src/md/unix/uxproces.c#l612
I wondered about bug 705543, but I don't see evidence of the IcedTea java plugin being loaded.
Perhaps something else is handling SIGCHLD (bug 705543 comment 8).
There are a number of warning messages in the log like
WARNING: waitpid failed pid:2149 errno:10: file ../../../ipc/chromium/src/base/process_util_posix.cc, line 260
That may be due to bug 227246 issue 2, or perhaps something else is calling wait() or waitpid(-1,,).
Depends on: 678369
Comment 16•12 years ago
|
||
I believe the log from comment 14 is wrong, it belongs to a failure in a different test. Here is one of the few with the additional logging for this bug before the test was disabled:
https://tbpl.mozilla.org/php/getParsedLog.php?id=22707338&full=1&branch=cedar#error0
Comment 17•12 years ago
|
||
That looks like a similar problem, with one extra level of indirection.
The main thread is waiting for the nsProcess::Monitor() thread
2 libnspr4.so!PR_JoinThread [ptthread.c:95e8754a36dc : 580 + 0xd]
rip = 0x00007fca3dbeb90e rsp = 0x00007fffd81a87d0
rbp = 0x00007fffd81a87d8
Found by: stack scanning
3 libxul.so!nsProcess::Kill() [nsProcessCommon.cpp:95e8754a36dc : 581 + 0x8]
rip = 0x00007fca3acd118d rsp = 0x00007fffd81a87f0
Found by: stack scanning
which is in _MD_WaitUnixProcess:
Thread 37
1 libnspr4.so!PR_WaitCondVar [ptsynch.c:95e8754a36dc : 385 + 0xb]
rip = 0x00007fca3dbe5b2e rsp = 0x00007fc9ff5fedd0
Found by: stack scanning
2 libnspr4.so!_MD_WaitUnixProcess [uxproces.c:95e8754a36dc : 824 + 0xd]
rip = 0x00007fca3dbee304 rsp = 0x00007fc9ff5fedf0
Found by: stack scanning
3 libxul.so!nsProcess::Monitor(void*) [nsProcessCommon.cpp:95e8754a36dc : 264 + 0x8]
rip = 0x00007fca3acd13ff rsp = 0x00007fc9ff5fee20
Found by: stack scanning
Thread 32
1 libnspr4.so!_pr_poll_with_poll [ptio.c:95e8754a36dc : 3920 + 0xe]
rip = 0x00007fca3dbe6a7f rsp = 0x00007fca0cbfdb20
Found by: stack scanning
8 libnspr4.so!WaitPidDaemonThread [uxproces.c:95e8754a36dc : 612 + 0x11]
rip = 0x00007fca3dbede22 rsp = 0x00007fca0cbfddb0
Found by: stack scanning
Updated•12 years ago
|
Summary: on amazon ec2 vm's browser_dbg_createChrome.js hangs (timed out) on opt builds → on amazon ec2 vm's browser_dbg_createChrome.js hangs (timed out) on opt builds in _MD_WaitUnixProcess or nsProcess::Kill
Reporter | ||
Comment 18•11 years ago
|
||
we haven't been running this for 6+ months, should we work on fixing this?
Comment 19•11 years ago
|
||
I wouldn't bother starting the tests running again if no one is working on fixing this.
The priority for fixing this depends on whether the r3 slaves are wanted for something else, though they also seem a bit slower.
If I were permitted time for this, I'd fix bug 678369, which would either make this problem go away or turn into a different problem.
Running strace -e trace=process -e trace=signal (or -e trace=process,signal if that is the right usage) might be helpful to identify which tests are potentially involved. That could detect when and if something else stealing the SIGCHLD handler, but, if that is not happening, we'd really need stacks to know whether something else is using wait or waitpid incorrectly.
It would also be interesting to know whether this tests hangs when run on its own, or narrow down which previous test has caused the problem.
Comment 20•11 years ago
|
||
FTR the r3 minis are not wanted. We're just waiting to move tests off them so we can decommission them.
Comment 21•11 years ago
|
||
I'm not actually working on this.
Assignee: past → nobody
Status: ASSIGNED → NEW
Priority: P1 → P3
Comment 22•11 years ago
|
||
should we close this?
Reporter | ||
Updated•11 years ago
|
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Resolution: FIXED → INCOMPLETE
Whiteboard: [leave open]
Comment 23•11 years ago
|
||
Sounds like bug 850101 is a priority again.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Comment 24•11 years ago
|
||
The test was enabled again in all platforms here (bug 918507):
https://hg.mozilla.org/mozilla-central/rev/54e8f6d1410a
It seems to work so far, so I'm closing this bug.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 25•11 years ago
|
||
Probably fixed by bug 943174.
Depends on: bad-waitpid
Target Milestone: --- → Firefox 31
Updated•7 years ago
|
Product: Firefox → DevTools
You need to log in
before you can comment on or make changes to this bug.
Description
•