Closed Bug 1142693 Opened 5 years ago Closed 5 years ago

Intermittent Mac OS X mochitest failures with "pipe error: Message too long" | application crashed [@ mozalloc_abort(char const*)][@ mozilla::ipc::MessageChannel::OnChannelErrorFromLink()]

Categories

(Core :: IPC, defect)

x86_64
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
e10s + ---
firefox38 --- unaffected
firefox39 --- fixed
firefox40 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: RyanVM, Assigned: jld)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure)

Attachments

(1 file, 1 obsolete file)

12:10:23 INFO - 2008 INFO TEST-START | toolkit/components/aboutmemory/tests/test_memoryReporters2.xul
12:10:23 INFO - ++DOMWINDOW == 75 (0x1230a4500) [pid = 2382] [serial = 4331] [outer = 0x124d14400]
12:10:23 INFO - ++DOCSHELL 0x11ee0fd00 == 15 [pid = 2382] [id = 1088]
12:10:23 INFO - ++DOMWINDOW == 76 (0x1292f1400) [pid = 2382] [serial = 4332] [outer = 0x0]
12:10:23 INFO - ++DOMWINDOW == 77 (0x1292f1800) [pid = 2382] [serial = 4333] [outer = 0x1292f1400]
12:10:23 INFO - ++DOCSHELL 0x11f076500 == 16 [pid = 2382] [id = 1089]
12:10:23 INFO - ++DOMWINDOW == 78 (0x1295d1400) [pid = 2382] [serial = 4334] [outer = 0x0]
12:10:23 INFO - ++DOMWINDOW == 79 (0x129782400) [pid = 2382] [serial = 4335] [outer = 0x1295d1400]
12:10:23 INFO - ++DOCSHELL 0x12122e600 == 17 [pid = 2382] [id = 1090]
12:10:23 INFO - ++DOMWINDOW == 80 (0x12980e400) [pid = 2382] [serial = 4336] [outer = 0x0]
12:10:23 INFO - ++DOMWINDOW == 81 (0x12a0e0400) [pid = 2382] [serial = 4337] [outer = 0x12980e400]
12:10:24 INFO - [2409] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2409] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2409] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2409] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2409] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /var/folders/vf/m1cfx9wd41d6z5qv92fbkzy800000w/T/tmpK4uXsD.mozrunner/runtests_leaks_tab_pid2409.log
12:10:24 INFO - [Parent 2382] WARNING: No docshells for remote frames!: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/base/nsFrameLoader.cpp, line 513
12:10:24 INFO - [Parent 2382] WARNING: No docshells for remote frames!: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/base/nsFrameLoader.cpp, line 513
12:10:24 INFO - [Parent 2382] WARNING: pipe error: Message too long: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 728
12:10:24 INFO - [Child 2409] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1589
12:10:24 INFO - [Parent 2382] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 20
12:10:24 INFO - #01: mozilla::ipc::ProcessLink::OnChannelError() [xpcom/glue/Monitor.h:36]
12:10:24 INFO - #02: IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) [ipc/chromium/src/chrome/common/ipc_channel_posix.cc:854]
12:10:24 INFO - #03: event_base_loop [ipc/chromium/src/third_party/libevent/event.c:1355]
12:10:24 INFO - #04: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) [ipc/chromium/src/base/message_pump_libevent.cc:332]
12:10:24 INFO - #05: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:234]
12:10:24 INFO - #06: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:508]
12:10:24 INFO - #07: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:173]
12:10:24 INFO - #08: ThreadFunc [ipc/chromium/src/base/platform_thread_posix.cc:39]
12:10:24 INFO - #09: libsystem_pthread.dylib + 0x3268
12:10:24 INFO - #10: libsystem_pthread.dylib + 0x31e5
12:10:24 INFO - [Child 2409] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1589
12:10:24 INFO - Hit MOZ_CRASH() at /builds/slave/m-in-osx64-d-00000000000000000/build/src/memory/mozalloc/mozalloc_abort.cpp:33
12:10:24 INFO - [2410] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2410] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2410] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2410] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2410] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /var/folders/vf/m1cfx9wd41d6z5qv92fbkzy800000w/T/tmpK4uXsD.mozrunner/runtests_leaks_tab_pid2410.log
12:10:24 INFO - [Parent 2382] WARNING: No docshells for remote frames!: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/base/nsFrameLoader.cpp, line 513
12:10:24 INFO - [Parent 2382] WARNING: No docshells for remote frames!: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/base/nsFrameLoader.cpp, line 513
12:10:24 INFO - [2411] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2411] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2411] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2411] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - [2411] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 120
12:10:24 INFO - ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /var/folders/vf/m1cfx9wd41d6z5qv92fbkzy800000w/T/tmpK4uXsD.mozrunner/runtests_leaks_tab_pid2411.log
12:10:24 INFO - [Parent 2382] WARNING: No docshells for remote frames!: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/base/nsFrameLoader.cpp, line 513
12:10:24 INFO - [Parent 2382] WARNING: No docshells for remote frames!: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/base/nsFrameLoader.cpp, line 513
12:10:24 INFO - ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
12:10:24 INFO - ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
12:10:24 INFO - [Parent 2382] WARNING: NS_ENSURE_TRUE(mCallback->DoLoadMessageManagerScript(aURL, aRunInGlobalScope)) failed: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/base/nsFrameMessageManager.cpp, line 462
12:10:24 INFO - 2009 INFO TEST-UNEXPECTED-FAIL | toolkit/components/aboutmemory/tests/test_memoryReporters2.xul | uncaught exception - NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIFrameScriptLoader.loadFrameScript] at chrome://mochitests/content/chrome/toolkit/components/aboutmemory/tests/test_memoryReporters2.xul:44
12:10:24 INFO - JavaScript error: chrome://mochitests/content/chrome/toolkit/components/aboutmemory/tests/test_memoryReporters2.xul, line 44: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIFrameScriptLoader.loadFrameScript]
12:10:24 INFO - ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
12:10:24 INFO - ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
12:10:24 INFO - [Parent 2382] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/events/ContentEventHandler.cpp, line 110
12:10:24 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv
12:10:24 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv
12:10:24 INFO - ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv
12:10:24 INFO - ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
12:10:24 INFO - [Parent 2382] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 20
12:10:24 INFO - [Parent 2382] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 20
12:10:24 INFO - [Parent 2382] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 20
12:10:24 INFO - [Child 2410] WARNING: '!compMgr', file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/glue/nsComponentManagerUtils.cpp, line 63
12:10:24 INFO - [Child 2410] WARNING: NS_ENSURE_TRUE(svc) failed: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/ipc/nsIContentChild.cpp, line 31
12:10:24 INFO - [Child 2410] WARNING: Error constructing actor PJavaScriptChild: file ./PContentChild.cpp, line 985
12:10:24 INFO - [Parent 2382] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/events/ContentEventHandler.cpp, line 110
12:10:24 INFO - [Child 2411] WARNING: '!compMgr', file /builds/slave/m-in-osx64-d-00000000000000000/build/src/xpcom/glue/nsComponentManagerUtils.cpp, line 63
12:10:24 INFO - [Child 2411] WARNING: NS_ENSURE_TRUE(svc) failed: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/dom/ipc/nsIContentChild.cpp, line 31
12:10:24 INFO - [Child 2411] WARNING: Error constructing actor PJavaScriptChild: file ./PContentChild.cpp, line 985
12:10:24 INFO - 2010 INFO MEMORY STAT vsize after test: 3709063168
12:10:24 INFO - 2011 INFO MEMORY STAT residentFast after test: 580915200
12:10:24 INFO - 2012 INFO MEMORY STAT heapAllocated after test: 184610360
12:10:24 INFO - [Child 2410] WARNING: preserving unexpected JS escape sequence: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/modules/libpref/prefread.cpp, line 393
12:10:24 INFO - [Child 2411] WARNING: preserving unexpected JS escape sequence: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/modules/libpref/prefread.cpp, line 393
12:10:24 INFO - 2013 INFO TEST-UNEXPECTED-ERROR | toolkit/components/aboutmemory/tests/test_memoryReporters2.xul | This test left crash dumps behind, but we weren't expecting it to!
12:10:24 INFO - TEST-INFO took 1153ms
12:10:24 INFO - 2014 INFO TEST-OK | toolkit/components/aboutmemory/tests/test_memoryReporters2.xul | 
12:10:26 INFO - 2015 INFO TEST-UNEXPECTED-FAIL | toolkit/components/aboutmemory/tests/test_memoryReporters2.xul | undefined assertion name - Result logged after SimpleTest.finish() 
12:29:55 WARNING - PROCESS-CRASH | Main app process exited normally | application crashed [@ mozalloc_abort(char const*)]
12:29:55 INFO - Crash dump filename: /var/folders/vf/m1cfx9wd41d6z5qv92fbkzy800000w/T/tmpK4uXsD.mozrunner/minidumps/C3DDECAC-F0AA-4C9B-AE56-AD046FBC2D1F.dmp
12:29:55 INFO - Operating system: Mac OS X
12:29:55 INFO - 10.10.2 14C109
12:29:55 INFO - CPU: amd64
12:29:55 INFO - family 6 model 42 stepping 7
12:29:55 INFO - 8 CPUs
12:29:55 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
12:29:55 INFO - Crash address: 0x0
12:29:55 INFO - Thread 2 (crashed)
12:29:55 INFO - 0 libmozglue.dylib!mozalloc_abort(char const*) [mozalloc_abort.cpp:7529425ef21f : 33 + 0x0]
12:29:55 INFO - rbx = 0x00007fff77545c50 r12 = 0x000000010a2035f0
12:29:55 INFO - r13 = 0x0000000000000003 r14 = 0x000000010400104c
12:29:55 INFO - r15 = 0x00000001004d3290 rip = 0x0000000100034c91
12:29:55 INFO - rsp = 0x000000010a203570 rbp = 0x000000010a203580
12:29:55 INFO - Found by: given as instruction pointer in context
12:29:55 INFO - 1 XUL!Abort [nsDebugImpl.cpp:7529425ef21f : 469 + 0x4]
12:29:55 INFO - rbx = 0x00007fff77545c50 r12 = 0x000000010a2035f0
12:29:55 INFO - r13 = 0x0000000000000003 r14 = 0x000000010400104c
12:29:55 INFO - r15 = 0x00000001004d3290 rip = 0x00000001004d32f9
12:29:55 INFO - rsp = 0x000000010a203590 rbp = 0x000000010a203590
12:29:55 INFO - Found by: call frame info
12:29:55 INFO - 2 XUL!NS_DebugBreak [nsDebugImpl.cpp:7529425ef21f : 426 + 0x4]
12:29:55 INFO - rbx = 0x00007fff77545c50 r12 = 0x000000010a2035f0
12:29:55 INFO - r13 = 0x0000000000000003 r14 = 0x000000010400104c
12:29:55 INFO - r15 = 0x00000001004d3290 rip = 0x00000001004d3086
12:29:55 INFO - rsp = 0x000000010a2035a0 rbp = 0x000000010a203a10
12:29:55 INFO - Found by: call frame info
12:29:55 INFO - 3 XUL!mozilla::ipc::MessageChannel::OnChannelErrorFromLink() [MessageChannel.cpp:7529425ef21f : 1589 + 0x1f]
12:29:55 INFO - rbx = 0x0000000106527490 r12 = 0x000000010658b170
12:29:55 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000000001
12:29:55 INFO - r15 = 0x000000010650a170 rip = 0x00000001008e8da6
12:29:55 INFO - rsp = 0x000000010a203a20 rbp = 0x000000010a203a30
12:29:55 INFO - Found by: call frame info
12:29:55 INFO - 4 XUL!mozilla::ipc::ProcessLink::OnChannelError() [MessageLink.cpp:7529425ef21f : 405 + 0x8]
12:29:55 INFO - rbx = 0x000000010650a160 r12 = 0x000000010658b170
12:29:55 INFO - r13 = 0x0000000000000001 r14 = 0x000000010651d3c0
12:29:55 INFO - r15 = 0x000000010650a170 rip = 0x00000001008eae88
12:29:55 INFO - rsp = 0x000000010a203a40 rbp = 0x000000010a203a60
12:29:55 INFO - Found by: call frame info
12:29:55 INFO - 5 XUL!IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) [ipc_channel_posix.cc:7529425ef21f : 846 + 0x9]
12:29:55 INFO - rbx = 0x000000010658b000 r12 = 0x000000010658b170
12:29:55 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000000003
12:29:55 INFO - r15 = 0x0000000106596000 rip = 0x00000001008b983a
12:29:55 INFO - rsp = 0x000000010a203a70 rbp = 0x000000010a203af0
12:29:55 INFO - Found by: call frame info
12:29:55 INFO - 6 XUL!event_base_loop [event.c:7529425ef21f : 1350 + 0x13]
12:29:55 INFO - rbx = 0x000000010658b000 r12 = 0x000000010658b170
12:29:55 INFO - r13 = 0x0000000000000001 r14 = 0x00000001065171b0
12:29:55 INFO - r15 = 0x000000010652c500 rip = 0x00000001008ae50d
12:29:55 INFO - rsp = 0x000000010a203b00 rbp = 0x000000010a203ba0
12:29:55 INFO - Found by: call frame info
12:29:55 INFO - 7 XUL!base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) [message_pump_libevent.cc:7529425ef21f : 332 + 0xd]
12:29:55 INFO - rbx = 0x000000010650e758 r12 = 0x000000010650e740
12:29:55 INFO - r13 = 0x000000010a203be0 r14 = 0x000000010a203cd8
12:29:55 INFO - r15 = 0x000000010650e740 rip = 0x00000001008b410e
12:29:55 INFO - rsp = 0x000000010a203bb0 rbp = 0x000000010a203c30
12:29:55 INFO - Found by: call frame info
12:29:55 INFO - 8 XUL!MessageLoop::RunInternal() [message_loop.cc:7529425ef21f : 233 + 0x7]
12:29:55 INFO - rbx = 0x000000010a203cd8 r12 = 0x000000010a203dd8
12:29:55 INFO - r13 = 0x00000000000008ff r14 = 0x000000010a203cd8
12:29:55 INFO - r15 = 0x00000001065159b8 rip = 0x00000001008c1c8f
12:29:55 INFO - rsp = 0x000000010a203c40 rbp = 0x000000010a203c70
12:29:55 INFO - Found by: call frame info
12:29:55 INFO - 9 XUL!MessageLoop::Run() [message_loop.cc:7529425ef21f : 226 + 0x4]
12:29:55 INFO - rbx = 0x0000000106515990 r12 = 0x000000010a203dd8
12:29:55 INFO - r13 = 0x00000000000008ff r14 = 0x000000010a203cd8
12:29:55 INFO - r15 = 0x00000001065159b8 rip = 0x00000001008c1b90
12:29:55 INFO - rsp = 0x000000010a203c80 rbp = 0x000000010a203ca0
12:29:55 INFO - Found by: call frame info
12:29:55 INFO - 10 XUL!base::Thread::ThreadMain() [thread.cc:7529425ef21f : 170 + 0x7]
12:29:55 INFO - rbx = 0x0000000106515990 r12 = 0x000000010a203dd8
12:29:55 INFO - r13 = 0x00000000000008ff r14 = 0x000000010a203cd8
12:29:55 INFO - r15 = 0x00000001065159b8 rip = 0x00000001008ca8d7
12:29:55 INFO - rsp = 0x000000010a203cb0 rbp = 0x000000010a203ee0
12:29:55 INFO - Found by: call frame info
Duplicate of this bug: 1142961
Summary: Intermittent test_memoryReporters2.xul | application crashed [@ mozalloc_abort(char const*)][@ mozilla::ipc::MessageChannel::OnChannelErrorFromLink()] → Intermittent test_aboutmemory5.xul,test_dumpGCAndCCLogsToFile.xul,test_memoryReporters2.xul | application crashed [@ mozalloc_abort(char const*)][@ mozilla::ipc::MessageChannel::OnChannelErrorFromLink()]
Jim, this is pretty frequent on 10.10. Can you help find an owner?
Flags: needinfo?(jmathies)
Marking for e10s triage due to this stack:

12:10:24 INFO - [Child 2409] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1589
12:10:24 INFO - [Parent 2382] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 20
12:10:24 INFO - #01: mozilla::ipc::ProcessLink::OnChannelError() [xpcom/glue/Monitor.h:36]
12:10:24 INFO - #02: IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) [ipc/chromium/src/chrome/common/ipc_channel_posix.cc:854]
12:10:24 INFO - #03: event_base_loop [ipc/chromium/src/third_party/libevent/event.c:1355]
12:10:24 INFO - #04: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) [ipc/chromium/src/base/message_pump_libevent.cc:332]
12:10:24 INFO - #05: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:234]
12:10:24 INFO - #06: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:508]
12:10:24 INFO - #07: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:173]
12:10:24 INFO - #08: ThreadFunc [ipc/chromium/src/base/platform_thread_posix.cc:39]
12:10:24 INFO - #09: libsystem_pthread.dylib + 0x3268
12:10:24 INFO - #10: libsystem_pthread.dylib + 0x31e5
12:10:24 INFO - [Child 2409] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1589
12:10:24 INFO - Hit MOZ_CRASH() at /builds/slave/m-in-osx64-d-00000000000000000/build/src/memory/mozalloc/mozalloc_abort.cpp:33
tracking-e10s: --- → ?
Flags: needinfo?(jmathies)
From comment #0:

> 12:10:24 INFO - [Parent 2382] WARNING: pipe error: Message too long: file /builds/slave/m-in-osx64-d-00000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 728

I think this is where the failure starts.  A stack from this point (and/or other info about the message in question) might help.
See Also: → 1143242
Blocks: e10s-tests
See Also: → 1150572
See Also: → 1150619
comment 42 is reasonably conclusive that this is a regression from
https://hg.mozilla.org/integration/fx-team/rev/3a3ffe5df28e

Combined with the complex of related bugs, this appears to be one of the top intermittent failures right now.  Is there a quick fix, or should the above be backed out?
Flags: needinfo?(dao)
(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #44)
> comment 42 is reasonably conclusive that this is a regression from
> https://hg.mozilla.org/integration/fx-team/rev/3a3ffe5df28e
> 
> Combined with the complex of related bugs, this appears to be one of the top
> intermittent failures right now.  Is there a quick fix, or should the above
> be backed out?

The first failures logged in this bug were way before that patch landed, and I don't see what part of that rather straightforward patch could reasonably have caused this crash either. Backing it out would most likely just hide the underlying bug until the next innocent patch comes along.
Flags: needinfo?(dao)
(In reply to Dão Gottwald [:dao] from comment #50)
> The first failures logged in this bug were way before that patch landed,

That doesn't mean every failure that looks the same has the same underlying cause.  There was clearly a regression when the patch landed.

> and
> I don't see what part of that rather straightforward patch could reasonably
> have caused this crash either. Backing it out would most likely just hide
> the underlying bug until the next innocent patch comes along.

We allocate responsibility for passing tests to the person trying to land code, because it's the only way to allocate that responsibility fairly across people working on the project.  Attempting to allocate such work to "owners" would mean that people who have been working on the project a long time would either have to spend all of their time fixing test failures or that we'd just start randomly disabling old (and often important) tests.


That said, it's possible this is related to low-memory conditions we've been seeing recently.  Is there a way that your patch would have increased memory usage?  Could you see which part of the patch is responsible in some useful way that will narrow the problem down?
Flags: needinfo?(dao)
And, to be clear, just because it's the responsibility of the person landing code doesn't mean that they have to do all the work -- but it does mean that they're responsible for reporting the problem to others in a useful way and driving that report to resolution.
(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #51)
> (In reply to Dão Gottwald [:dao] from comment #50)
> > The first failures logged in this bug were way before that patch landed,
> 
> That doesn't mean every failure that looks the same has the same underlying
> cause.  There was clearly a regression when the patch landed.

There's pretty much no way how that patch could have introduced a new "underlying cause", though.

> That said, it's possible this is related to low-memory conditions we've been
> seeing recently.  Is there a way that your patch would have increased memory
> usage?  Could you see which part of the patch is responsible in some useful
> way that will narrow the problem down?

It reduced the number of images packaged in omni.jar, so if anything I'd expect it to reduce memory usage. It was also 95% irrelevant for OS X. (Look for "osx" to find the part relevant for OS X.)

(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #52)
> And, to be clear, just because it's the responsibility of the person landing
> code doesn't mean that they have to do all the work -- but it does mean that
> they're responsible for reporting the problem to others in a useful way and
> driving that report to resolution.

My best guess is that somebody familiar with those aboutmemory tests (CC'ing njn), memory allocation (CC'ing mh and njn) and/or IPC (CC'ing mccr8) will need to look into this.
Flags: needinfo?(dao) → needinfo?(n.nethercote)
(In reply to Dão Gottwald [:dao] from comment #53)
> My best guess is that somebody familiar with those aboutmemory tests (CC'ing
> njn), memory allocation (CC'ing mh and njn) and/or IPC (CC'ing mccr8) will
> need to look into this.

njn is on PTO, adding erahm, who might have ideas. Specifically, could this be related to the size of the about:memory data sent from the child process? (no idea how they are being sent, but I could see that being related)
Flags: needinfo?(erahm)
Judging from comment 13 it looks like bug 1151597 might fix this. Jed does that sound about right?
Flags: needinfo?(erahm) → needinfo?(jld)
Depends on: 1051567
See Also: → 1136302
erahm and I think we've figured this out: it's a duplicate of https://crbug.com/29225, a Chromium bug filed in December 2009 about a case where Mac OS X sendmsg() fails with EMSGSIZE instead of EAGAIN when trying to nonblockingly pass fds over a socket whose buffer is almost full.

https://codereview.chromium.org/460102 has some additional info.  Apparently the socket will poll as writable if there's at least 1 byte free, but sending a fd will fail unless there's at least 16 bytes (empirically in 2009; may have changed), so treating EMSGSIZE as EAGAIN means the sending thread will spin the message loop until the receiver makes progress.  Chromium appears to still do this as of 2015, so I'm assuming it's not a significant problem in practice.

The observant reader will notice that all of the failures are OS X 10.10 debug mochitests that use remote browsers (except comment #28 and comment #55, which look mis-starred), and that many of them aren't about:memory related.  I wonder if there's something about 10.10, and/or about debug builds, that explains why this spent 5+ years not breaking for us.

The patch looks simple enough to backport.
Assignee: nobody → jld
Flags: needinfo?(n.nethercote)
Flags: needinfo?(jld)
Summary: Intermittent test_aboutmemory5.xul,test_dumpGCAndCCLogsToFile.xul,test_memoryReporters2.xul | application crashed [@ mozalloc_abort(char const*)][@ mozilla::ipc::MessageChannel::OnChannelErrorFromLink()] → Intermittent Mac OS X mochitest failures with "pipe error: Message too long" | application crashed [@ mozalloc_abort(char const*)][@ mozilla::ipc::MessageChannel::OnChannelErrorFromLink()]
(In reply to Jed Davis [:jld] {UTC-7} from comment #71)
> erahm and I think we've figured this out: it's a duplicate of
> https://crbug.com/29225, a Chromium bug filed in December 2009 about a case
> where Mac OS X sendmsg() fails with EMSGSIZE instead of EAGAIN when trying
> to nonblockingly pass fds over a socket whose buffer is almost full.
> 
> https://codereview.chromium.org/460102 has some additional info.  Apparently
> the socket will poll as writable if there's at least 1 byte free, but
> sending a fd will fail unless there's at least 16 bytes (empirically in
> 2009; may have changed), so treating EMSGSIZE as EAGAIN means the sending
> thread will spin the message loop until the receiver makes progress. 
> Chromium appears to still do this as of 2015, so I'm assuming it's not a
> significant problem in practice.
> 
> The observant reader will notice that all of the failures are OS X 10.10
> debug mochitests that use remote browsers (except comment #28 and comment
> #55, which look mis-starred), and that many of them aren't about:memory
> related.  I wonder if there's something about 10.10, and/or about debug
> builds, that explains why this spent 5+ years not breaking for us.
> 
> The patch looks simple enough to backport.

This is great news!
My tests show we get a few different errors depending on the situation.

On my 10.10.2 machine:
 - the internal buffer size is 8KiB
 - the size needed to send an FD is 16 bytes
 - that leaves 8176 bytes for regular messages

Situations:
  - Internal buffer *full*
    - Send a message w/ an FD, get EMSGSIZE
    - Send a message w/o an FD, get EWOULDBLOCK
  - Internal buffer full enough an FD would not fit
    - Send a message w/ an FD, get EMSGSIZE
    - Send a message w/o an FD, larger than available buffer, get EWOULDBLOCK
      Note this is crazy, we should have sent at least *some* bytes
    - Send a message w/o an FD, smaller than available buffer, no error
  - Internal buffer full enough an FD *would* fit, but a message would not
    - Send a message w/ and FD, get EWOULDBLOCK
    - Send just an FD, all is well
    - Send a message w/o an FD that small enough, all is well
    - Send a message w/o an FD, larger than available buffer, get EWOULDBLOCK

My conclusion: we probably have to handle EWOULDBLOCK as well as EMSGSIZE.
(In reply to Eric Rahm [:erahm] from comment #73)
> My conclusion: we probably have to handle EWOULDBLOCK as well as EMSGSIZE.

It looks like we already are:

#define EAGAIN          35              /* Resource temporarily unavailable */
#define EWOULDBLOCK     EAGAIN          /* Operation would block */
Attached patch Patch (obsolete) — Splinter Review
Attachment #8589970 - Flags: review?(bent.mozilla)
Comment on attachment 8589970 [details] [diff] [review]
Patch

As discussed on irc we're going to try to yield the thread in the EMSGSIZE case.
Attachment #8589970 - Flags: review?(bent.mozilla)
Restructured the patch to add a sched_yield in the EMSGSIZE case.  Trying: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f06819d35300 (also tested locally on 10.10 and on Linux).
Attachment #8589970 - Attachment is obsolete: true
Attachment #8591013 - Flags: review?(bent.mozilla)
Comment on attachment 8591013 [details] [diff] [review]
Patch [v2, with sched_yield]

Review of attachment 8591013 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8591013 - Flags: review?(bent.mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/16ad00f89299
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Thanks! Can you please nominate this for Aurora uplift when you get a chance? :)
Flags: needinfo?(jld)
Comment on attachment 8591013 [details] [diff] [review]
Patch [v2, with sched_yield]

Approval Request Comment
[Feature/regressing bug #]: Bug has been present since 2009, but seems to have only affected us on OS X 10.10.
[User impact if declined]: Intermittent failures ("sad tab") in e10s mode on OS X.
[Describe test coverage new/current, TreeHerder]: No specific tests, but it's been stable on m-c for a few days now, and the intermittent test failures that aren't happening anymore are cases where this code has received coverage.
[Risks and why]: Low — if this patch matters we were already going to (incorrectly) kill the child process, and the fix is similar to what Chromium has been doing for 5+ years.
[String/UUID change made/needed]: None.
Flags: needinfo?(jld)
Attachment #8591013 - Flags: approval-mozilla-aurora?
Comment on attachment 8591013 [details] [diff] [review]
Patch [v2, with sched_yield]

Approved for uplift since it's been stable on m-c so far.
Attachment #8591013 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Excellent detective work, jld and erahm!
You need to log in before you can comment on or make changes to this bug.