Closed
Bug 924771
Opened 11 years ago
Closed 8 years ago
Some Mac builds always hit test_browserElement_oop_CloseFromOpener.html,test_browserElement_oop_CloseApp.html,test_browserElement_oop_ExposableURI.html | application timed out after 330 seconds with no output
Categories
(Core :: IPC, defect)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
e10s | - | --- |
People
(Reporter: cbook, Unassigned)
References
()
Details
(Keywords: intermittent-failure)
Rev4 MacOSX Lion 10.7 mozilla-inbound debug test mochitest-2 on 2013-10-08 14:57:10 PDT for push 7cd3fde42755 slave: talos-r4-lion-067 https://tbpl.mozilla.org/php/getParsedLog.php?id=28844636&tree=Mozilla-Inbound TEST-UNEXPECTED-FAIL | /tests/dom/browser-element/mochitest/test_browserElement_oop_CloseFromOpener.html | application timed out after 330 seconds with no output
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 10•11 years ago
|
||
Markus, this sounds familiar to issues you've run into lately.
Flags: needinfo?(mstange)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 33•11 years ago
|
||
Yes, it does sound very familiar! Thanks for letting me know.
Flags: needinfo?(mstange)
Comment 34•11 years ago
|
||
Josh, can you find somebody who can look into this? I think the important part in the logs is this one: > WARNING: parent WaitForMessage() failed: 0x10004003 (ipc/rcv) timed out: file ../../../ipc/glue/GeckoChildProcessHost.cpp, line 677 and shortly after that: > WARNING: child WaitForMessage() failed: file ../../../toolkit/xre/nsEmbedFunctions.cpp, line 361 The relevant code was added by you in bug 592951. This issue seems extremely timing-dependent. It's reproducible 100% of the time on 10.7 debug mochitest-2 when pushing inbound revision http://hg.mozilla.org/integration/mozilla-inbound/rev/17411d6192e4 to try. But whenever I add any instrumentation logging around it, it no longer occurs. (I've made about 10 pushes to try now, trying to debug this...) This bug is keeping me from landing the patches in bug 886999 which would fix OMTC crashes and would allow us to enable OMTC by default on 10.6. Unfortunately, I haven't been able to reproduce it locally. I've tried on multiply machines, both with 10.6 and 10.7, with the exact builds that were produced by the try builders. Maybe you have an idea where to look and or what could be going wrong?
Blocks: 886999
Flags: needinfo?(joshmoz)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 40•11 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #34) > Maybe you have an idea where to look and or what could be going wrong? I don't recall anything about that code really, I'd have to start all over again to get into it.
Flags: needinfo?(joshmoz)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 45•11 years ago
|
||
Now it's holding one or both of https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=f45b84411aae hostage too, and inbound is closed while we wait for someone to decide whether to back out even more patches because this test is far too fragile, or to just disable the test on OS X.
Severity: normal → blocker
Comment 46•11 years ago
|
||
Gah, and it wasn't actually that push that started it up, either.
Comment 47•11 years ago
|
||
In the log, we have: 08:19:38 INFO - WARNING: child WaitForMessage() failed: file ../../../toolkit/xre/nsEmbedFunctions.cpp, line 361 08:19:38 INFO - WARNING: NS_ENSURE_SUCCESS(rv, 1) failed with result 0x80004005: file ../../../ipc/app/MozillaRuntimeMain.cpp, line 117 Which means this function is failing somehow <http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/chrome/common/mach_ipc_mac.mm#237>. This debugging patch should help figure out what is causing the failures: https://hg.mozilla.org/integration/mozilla-inbound/rev/5373095214d9
Component: DOM → IPC
Comment 48•11 years ago
|
||
Fascinating, this episode was brought to you by https://hg.mozilla.org/integration/mozilla-inbound/rev/dc2b71e57211 and needs-clobber - failed 60-80% while that was in and unclobbered, and zero of twenty runs when it relanded with a clobber.
Severity: blocker → normal
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 51•11 years ago
|
||
16:49:50 INFO - 3020 INFO TEST-START | /tests/dom/browser-element/mochitest/test_browserElement_oop_CloseFromOpener.html 16:49:50 INFO - ++DOMWINDOW == 61 (0x13f6e0c08) [serial = 2029] [outer = 0x115ce0528] 16:49:50 INFO - [Child 1280] WARNING: NS_ENSURE_TRUE(mDocShell) failed: file ../../../../embedding/browser/webBrowser/nsWebBrowser.cpp, line 366 16:49:50 INFO - [Child 1280] WARNING: NS_ENSURE_TRUE(domWindow) failed: file ../../../../embedding/browser/webBrowser/nsDocShellTreeOwner.cpp, line 86 16:49:50 INFO - WaitForMessage returned 268451843 16:49:50 INFO - [Child 1281] WARNING: NS_ENSURE_TRUE(mDocShell) failed: file ../../../../embedding/browser/webBrowser/nsWebBrowser.cpp, line 366 16:49:50 INFO - [Parent 1262] WARNING: parent WaitForMessage() failed: 0x10004003 (ipc/rcv) timed out: file ../../../ipc/glue/GeckoChildProcessHost.cpp, line 677 16:49:50 INFO - [Child 1281] WARNING: NS_ENSURE_TRUE(domWindow) failed: file ../../../../embedding/browser/webBrowser/nsDocShellTreeOwner.cpp, line 86 16:49:50 INFO - --DOMWINDOW == 2 (0x1062f4e98) [serial = 2] [outer = 0x1057d8378] [url = about:blank] 16:49:50 INFO - WARNING: XPCOM objects created/destroyed from static ctor/dtor: file ../../../xpcom/base/nsTraceRefcntImpl.cpp, line 138 16:49:50 INFO - WARNING: XPCOM objects created/destroyed from static ctor/dtor: file ../../../xpcom/base/nsTraceRefcntImpl.cpp, line 138 16:49:50 INFO - WARNING: XPCOM objects created/destroyed from static ctor/dtor: file ../../../xpcom/base/nsTraceRefcntImpl.cpp, line 138 16:49:50 INFO - WARNING: XPCOM objects created/destroyed from static ctor/dtor: file ../../../xpcom/base/nsTraceRefcntImpl.cpp, line 138 16:49:50 INFO - --DOMWINDOW == 2 (0x10d10d108) [serial = 2] [outer = 0x1068e3a98] [url = about:blank] 16:49:51 INFO - WARNING: child WaitForMessage() failed: file ../../../toolkit/xre/nsEmbedFunctions.cpp, line 361 16:49:51 INFO - WARNING: NS_ENSURE_SUCCESS(rv, 1) failed with result 0x80004005: file ../../../ipc/app/MozillaRuntimeMain.cpp, line 117 16:49:51 INFO - ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /var/folders/nq/80zh_y2524l6058rrskcpmyw00000w/T/tmpdhJRG1/runtests_leaks.log 16:49:51 INFO - WaitForMessage returned 268451843 16:49:51 INFO - nsStringStats 16:49:51 INFO - => mAllocCount: 1 16:49:51 INFO - => mReallocCount: 0 16:49:51 INFO - => mFreeCount: 1 16:49:51 INFO - => mShareCount: 0 16:49:51 INFO - => mAdoptCount: 0 16:49:51 INFO - => mAdoptFreeCount: 0 16:49:55 INFO - [Child 1280] WARNING: NS_ENSURE_TRUE(inBrowser) failed: file ../../../../embedding/browser/webBrowser/nsDocShellTreeOwner.cpp, line 82 16:49:55 INFO - [Child 1280] WARNING: NS_ENSURE_TRUE(mDocShell) failed: file ../../../../embedding/browser/webBrowser/nsWebBrowser.cpp, line 366 16:49:55 INFO - [Child 1280] WARNING: NS_ENSURE_TRUE(mDocShell) failed: file ../../../../embedding/browser/webBrowser/nsWebBrowser.cpp, line 366 16:49:55 INFO - [Child 1280] WARNING: NS_ENSURE_TRUE(domWindow) failed: file ../../../../embedding/browser/webBrowser/nsDocShellTreeOwner.cpp, line 86 16:49:55 INFO - [Child 1280] WARNING: NS_ENSURE_TRUE(inBrowser) failed: file ../../../../embedding/browser/webBrowser/nsDocShellTreeOwner.cpp, line 82 16:49:56 INFO - [Child 1281] WARNING: NS_ENSURE_TRUE(inBrowser) failed: file ../../../../embedding/browser/webBrowser/nsDocShellTreeOwner.cpp, line 82 16:49:56 INFO - [Child 1281] WARNING: NS_ENSURE_TRUE(mDocShell) failed: file ../../../../embedding/browser/webBrowser/nsWebBrowser.cpp, line 366 16:49:56 INFO - [Child 1281] WARNING: NS_ENSURE_TRUE(mDocShell) failed: file ../../../../embedding/browser/webBrowser/nsWebBrowser.cpp, line 366 16:49:56 INFO - [Child 1281] WARNING: NS_ENSURE_TRUE(domWindow) failed: file ../../../../embedding/browser/webBrowser/nsDocShellTreeOwner.cpp, line 86 16:49:56 INFO - [Child 1281] WARNING: NS_ENSURE_TRUE(inBrowser) failed: file ../../../../embedding/browser/webBrowser/nsDocShellTreeOwner.cpp, line 82 16:55:26 WARNING - TEST-UNEXPECTED-FAIL | /tests/dom/browser-element/mochitest/test_browserElement_oop_CloseFromOpener.html | application timed out after 330 seconds with no output
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 55•11 years ago
|
||
Markus: I think it's permaorange again, and I'm certainly not going to do anything about that other than disable the test, so this would be an awesome time to land the OpenGL crash fix.
Comment 56•11 years ago
|
||
Too late, just another run of five in a row, not actually permaorange.
Comment 57•11 years ago
|
||
Test disabled on all platforms in https://hg.mozilla.org/integration/mozilla-inbound/rev/dbf01bda0a79 - if someone develops an interest in seeing it running on not-Mac again before someone develops an interest in fixing it on Mac, they can test the nested ifdefs and reenable it for not-Mac.
Whiteboard: [test disabled]
Updated•11 years ago
|
Whiteboard: [test disabled] → [test disabled][leave open]
Comment 58•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/5373095214d9 https://hg.mozilla.org/mozilla-central/rev/dbf01bda0a79
Comment 59•11 years ago
|
||
FWIW, in this log <https://tbpl.mozilla.org/php/getParsedLog.php?id=29384876&tree=Mozilla-Inbound> this error was triggered: <https://hg.mozilla.org/integration/mozilla-inbound/rev/5373095214d9#l1.37> 18:59:45 INFO - WaitForMessage returned 268451843 Which is mach_msg failing with MACH_RCV_TIMED_OUT (0x10004003). I'm not sure what that means. I backed out the debugging patch: https://hg.mozilla.org/integration/mozilla-inbound/rev/eb4a385bd96c
Comment 60•11 years ago
|
||
Whoops, I totally missed the activity here over the last days! In the logs, before WaitForMessage fails in the child, it fails in the parent: [Parent 1265] WARNING: parent WaitForMessage() failed: 0x10004003 (ipc/rcv) timed out: file ../../../ipc/glue/GeckoChildProcessHost.cpp, line 677 When I talked to jdm about this yesterday, our hypothesis was that we may be failing to launch the new child process, time out in the parent process waiting for it, and hang the parent process (no idea how). Then the remaining child processes from the preceding tests time out waiting for the parent process. (The child error messages come from child processes with PIDs that were already present before the test_browserElement_oop_CloseFromOpener.html test started.)
Comment 61•11 years ago
|
||
Interesting theory. Do you think there is anything special about test_browserElement_oop_CloseFromOpener.html which would trigger that kind of bug?
Comment 62•11 years ago
|
||
No idea, sorry. It was pretty much a guess in the dark. We just assumed that the problem is not in WaitForMessage itself or in its direct caller because that code has been around for so long.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•11 years ago
|
Summary: Intermittent TEST-UNEXPECTED-FAIL | /tests/dom/browser-element/mochitest/test_browserElement_oop_CloseFromOpener.html | application timed out after 330 seconds with no output → Intermittent test_browserElement_oop_CloseFromOpener.html | application timed out after 330 seconds with no output
Comment 64•11 years ago
|
||
We had just enough of a gap (one of the OOM closures?) before the next one started that I didn't realize bug 925200 was just the next test inheriting the exact same thing. Now we're on to a third test.
Summary: Intermittent test_browserElement_oop_CloseFromOpener.html | application timed out after 330 seconds with no output → Some Mac builds always hit test_browserElement_oop_CloseFromOpener.html,test_browserElement_oop_CloseApp.html,test_browserElement_oop_ExposableURI.html | application timed out after 330 seconds with no output
Comment 66•11 years ago
|
||
Disabled test_browserElement_oop_ExposableURI.html in https://hg.mozilla.org/integration/mozilla-inbound/rev/d1689276f118
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 70•10 years ago
|
||
Mass tracking-e10s flag change. Filter bugmail on "2be0fcce-e36a-4e2c-aa80-0e3d33eb5406".
tracking-e10s:
--- → +
Updated•10 years ago
|
Comment 71•8 years ago
|
||
Try confirms these work without timing out now.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Whiteboard: [test disabled][leave open]
Comment 72•8 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/e0516ed5cd30 bug 930449 - Re-enable some OOP browser-element tests that no longer timeout.
Reporter | ||
Comment 73•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/e0516ed5cd30
You need to log in
before you can comment on or make changes to this bug.
Description
•