Closed
Bug 1153094
Opened 10 years ago
Closed 10 years ago
Crash on WebRTC mochitest shutdown
Categories
(Core :: Memory Allocator, defect)
Tracking
()
RESOLVED
FIXED
mozilla40
Tracking | Status | |
---|---|---|
firefox40 | --- | fixed |
People
(Reporter: drno, Assigned: jya)
References
Details
Attachments
(1 file)
52.70 KB,
patch
|
cpearce
:
review+
|
Details | Diff | Splinter Review |
With a self compiled Nightly from today (238479:fec90cbfbaad) on Mac OSX 10.10.3 when executing this command:
./mach mochitest --debugger lldb dom/media/tests/mochitest/test_peerConnection_basicAudio.html
I get the following stack trace:
Hit MOZ_CRASH() at /Users/nohlmeier/src/mozilla-central/memory/mozjemalloc/jemalloc.c:1610
Process 3588 stopped
* thread #8: tid = 0x16521e, 0x000000010001f6ea libmozglue.dylib`jemalloc_crash + 58 at jemalloc.c:1610, name = 'Socket Thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x000000010001f6ea libmozglue.dylib`jemalloc_crash + 58 at jemalloc.c:1610
1607 MOZ_NEVER_INLINE static void
1608 jemalloc_crash()
1609 {
-> 1610 MOZ_CRASH();
1611 }
1612
1613 #if defined(MOZ_JEMALLOC_HARD_ASSERTS)
(lldb) bt
* thread #8: tid = 0x16521e, 0x000000010001f6ea libmozglue.dylib`jemalloc_crash + 58 at jemalloc.c:1610, name = 'Socket Thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x000000010001f6ea libmozglue.dylib`jemalloc_crash + 58 at jemalloc.c:1610
frame #1: 0x000000010001ccd7 libmozglue.dylib`arena_dalloc + 113 at jemalloc.c:4651
frame #2: 0x000000010001cc66 libmozglue.dylib`arena_dalloc(ptr=<unavailable>, offset=<unavailable>) + 886 at jemalloc.c:4719
frame #3: 0x000000010001a400 libmozglue.dylib`free_impl(ptr=0x000000011bc8b918) + 96 at replace_malloc.c:201
frame #4: 0x000000010001ab82 libmozglue.dylib`zone_free_definite_size(zone=0x000000010004d4b8, ptr=0x000000011bc8b918, size=16) + 146 at replace_malloc.c:359
frame #5: 0x0000000101db6cd5 XUL`nsTArrayInfallibleAllocator::Free(aPtr=0x000000011bc8b918) + 21 at nsTArray.h:188
frame #6: 0x0000000101db88fd XUL`nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::~nsTArray_base(this=0x000000011bc8b910) + 77 at nsTArray-inl.h:22
frame #7: 0x0000000101db9ae2 XUL`nsTArray_Impl<unsigned char, nsTArrayInfallibleAllocator>::~nsTArray_Impl(this=0x000000011bc8b910) + 34 at nsTArray.h:790
frame #8: 0x0000000101dbd9b5 XUL`nsTArray<unsigned char>::~nsTArray(this=0x000000011bc8b910) + 21 at nsTArray.h:1801
frame #9: 0x0000000101dbd995 XUL`mozilla::DataBuffer::~DataBuffer(this=0x000000011bc8b910) + 21 at MediaData.h:425
frame #10: 0x0000000101dbd975 XUL`mozilla::DataBuffer::~DataBuffer(this=0x000000011bc8b910) + 21 at MediaData.h:425
frame #11: 0x0000000102f522fd XUL`nsAutoPtr<mozilla::DataBuffer>::~nsAutoPtr(this=0x0000000116fcf670) + 45 at nsAutoPtr.h:74
frame #12: 0x0000000102f4f605 XUL`nsAutoPtr<mozilla::DataBuffer>::~nsAutoPtr(this=0x0000000116fcf670) + 21 at nsAutoPtr.h:73
frame #13: 0x0000000102f52148 XUL`mozilla::runnable_args_m_1<mozilla::RefPtr<mozilla::MediaPipeline::PipelineTransport>, nsresult (mozilla::MediaPipeline::PipelineTransport::*)(nsAutoPtr<mozilla::DataBuffer>), nsAutoPtr<mozilla::DataBuffer> >::Run(this=0x000000011bd75900) + 184 at runnable_utils_generated.h:122
frame #14: 0x0000000101f02a7a XUL`nsThread::ProcessNextEvent(this=0x00000001003319a0, aMayWait=true, aResult=0x0000000116fcf88e) + 2042 at nsThread.cpp:866
frame #15: 0x0000000101f5e1d7 XUL`NS_ProcessNextEvent(aThread=0x00000001003319a0, aMayWait=true) + 151 at nsThreadUtils.cpp:265
frame #16: 0x00000001020a6ca6 XUL`nsSocketTransportService::Run(this=0x0000000100347bf0) + 1014 at nsSocketTransportService2.cpp:851
frame #17: 0x00000001020a794c XUL`non-virtual thunk to nsSocketTransportService::Run(this=0x0000000100347c08) + 28 at nsSocketTransportService2.cpp:927
frame #18: 0x0000000101f02a7a XUL`nsThread::ProcessNextEvent(this=0x00000001003319a0, aMayWait=false, aResult=0x0000000116fcfc5e) + 2042 at nsThread.cpp:866
frame #19: 0x0000000101f5e1d7 XUL`NS_ProcessNextEvent(aThread=0x00000001003319a0, aMayWait=false) + 151 at nsThreadUtils.cpp:265
frame #20: 0x0000000102591090 XUL`mozilla::ipc::MessagePumpForNonMainThreads::Run(this=0x0000000117581780, aDelegate=0x0000000100326690) + 656 at MessagePump.cpp:339
frame #21: 0x0000000102504875 XUL`MessageLoop::RunInternal(this=0x0000000100326690) + 117 at message_loop.cc:233
frame #22: 0x0000000102504785 XUL`MessageLoop::RunHandler(this=0x0000000100326690) + 21 at message_loop.cc:226
frame #23: 0x000000010250472d XUL`MessageLoop::Run(this=0x0000000100326690) + 45 at message_loop.cc:200
frame #24: 0x0000000101f00fe9 XUL`nsThread::ThreadFunc(aArg=0x00000001003319a0) + 329 at nsThread.cpp:364
frame #25: 0x0000000101b6b2b1 libnss3.dylib`_pt_root(arg=0x0000000100398530) + 449 at ptthread.c:212
frame #26: 0x00007fff84106268 libsystem_pthread.dylib`_pthread_body + 131
frame #27: 0x00007fff841061e5 libsystem_pthread.dylib`_pthread_start + 176
frame #28: 0x00007fff8410441d libsystem_pthread.dylib`thread_start + 13
This assertion is likely just checking for memory corruption.
Comment 2•10 years ago
|
||
Primary suspicion would be on bug 1101651, which landed Thursday morning on inbound, due to the timeframe. Of course that may have exposed an existing bug. Regression testing seems feasible given high crash frequency for jib (and nils?)
There are some changes to non-standalone; I note that some of the "image_" fiddling is no longer in MOZILLA_INTERNAL_API; though that shouldn't matter to mochitests. However, I've only looked at MediaPipeline.cpp/h thus far.
The crash is almost certainly in the cleanup for SendRtpPacket_s() or SendRtcpPacket_s()
Blocks: 1101651
Comment 3•10 years ago
|
||
Likely this isn't an allocator bug - but you never know, and IIRC jemalloc3 landed recently
Reporter | ||
Comment 4•10 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #3)
> Likely this isn't an allocator bug - but you never know, and IIRC jemalloc3
> landed recently
Agreed.I overlooked the MediaPipeline in the stack trace and initially thought this is JS related.
Another observation: if I add a 10s sleep to my tests (for other purposes) this crash happens in the middle of the call.
Yes I'll try regression or ASAN build.
Reporter | ||
Comment 5•10 years ago
|
||
Revision 238432 right before bug 1101651 seems broken.
Revision 238000 seems to be fine.
Comment 7•10 years ago
|
||
The range 238216:238432 includes these in media/webrtc:
Bug 1151139: Simplify how we choose which streams to gather stats from.
and in dom/media (mostly MSE-related)
Bug 1148571 - Dispatch AudioSink notifications asynchronously. r=jww
Bug 1150539: log getUserMedia constraints used in MediaManager:5. r=jesup
Bug 1150539 - fix getUserMedia constraint default to aPrefs.mFPS, not aPrefs.mMinFPS.
Bug 1149494 - Part 1. Add a listener directly to the unblocked input stream that reports the s
Bug 1149494 - Part 2. Add mochitest. r=jesup
Bug 1151656 - (bunch of patches)
Bug 1152236: OMX codec should use AnnexB as input format.
Bug 1094764 - Implement AudioContext.suspend and friends. (backed out after getting to m-c for mulet failures)
Bug 1152359 - Fix doppler shift calculation formula
Bug 1150853: Part1. Create MediaRawData object type.
(2 more parts)
Bug 1149616 - Fix the calculation of slots count in Read/WriteBuffer
Most interesting would be bug 1149494 (pehrsons), though none of these stand out.
Could be a chunking/timing change exposed something, and RTP data is hanging around to be processed "too long" in shutdown.
Comment 8•10 years ago
|
||
Also in mtransport:
Bug 1150966: Check whether |streams_| is null on stats methods in NrIceMediaStream.
Bug 935838 - Add per app network traffic statistics to the UDP socket. r=sicking, r=mayhemer
Comment 9•10 years ago
|
||
Please do not refer to revision numbers, they only apply to local clones, and are not guaranteed to mean the same thing for everyone. For instance, 238216:238432 contains none of the bugs you refer in comment 7 in my local clone.
Reporter | ||
Comment 10•10 years ago
|
||
I'm guessing/hope that the full revison plus hash string is better then just the revision...
238216:27d9481e341b = works
238432:6f7d0ef4912a = broken
BTW Jesup we are not talking about a shutdown problem as I tried to indicate in c4. If a test runs long enough it kills the browser mid call.
Comment 11•10 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #10)
> I'm guessing/hope that the full revison plus hash string is better then just
> the revision...
>
> 238216:27d9481e341b = works
> 238432:6f7d0ef4912a = broken
Those match my clone of mozilla-central (used for the hg log entries above:
hg log -r 27d9481e341b:6f7d0ef4912a media/mtransport (or dom/media, etc)
Comment 12•10 years ago
|
||
The mtransport ones seem very unlikely to directly cause this:
Bug 1150966 Bug 935838
Overall, I'm suspecting chunking/order/timing changes triggering a latent bug. Further regression runs that isolate the changeset where things changed likely will make this clear, though may not help track down the failure directly.
mt's ASAN results would also point in this direction. (Unfortunately, as ASAN was a good hope -- but ASAN is slow enough to throw timings off a lot.)
Reporter | ||
Comment 13•10 years ago
|
||
238216:27d9481e341b = works
238324:5b615a235096 = broken
Comment 14•10 years ago
|
||
As I noted in the other bug, bug 1150853 is the culprit.
Assignee | ||
Comment 16•10 years ago
|
||
DataBuffer (as defined in MediaData.h) is a ref counted object, can't assign it to a nsAutoPtr.
AFAIK, that object shouldn't be used outside the mp4 demuxer.
Could be that DataBuffer introduce a naming conflict and that name was used elsewhere (did check though)
Assignee | ||
Comment 17•10 years ago
|
||
backtraceframe #10: 0x0000000101dbd975 XUL`mozilla::DataBuffer::~DataBuffer(this=0x000000011bc8b910) + 21 at MediaData.h:425
indicates that it's using the wrong DataBuffer type (the one define in MediaData.h) and not the one define in media/mtransport/databuffer.h
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → jyavenard
I'm guessing the compiler never sees both classes together, so it doesn't complain and generates calls to a "mozilla::DataBuffer::~DataBuffer" (that exists in mtransport).
But then the linker probably just drops one of them, so you end up calling the wrong one.
I could reproduce the same issue with a toy program:
data1.h:
void data1();
data1.cpp:
#include "data1.h"
struct Data
{
public:
Data() { printf("Data1()\n"); };
~Data() { printf("~Data1()\n"); };
};
void data1()
{
Data data;
}
data2.h:
void data2();
data2.cpp:
#include "data2.h"
struct Data
{
public:
Data() { printf("Data2()\n"); };
~Data() { printf("~Data2()\n"); };
};
void data1()
{
Data data;
}
main.cpp:
#include "data1.h"
#include "data2.h"
int main() {
data1();
data2();
return 0;
}
When run on my machine, I get:
Data2()
~Data2()
Data2()
~Data2()
So the Data struct in data2.cpp takes over the world!
You'll need to rename things I'm afraid...
Typo above: data2.cpp defines "void data2()" of course.
Assignee | ||
Comment 20•10 years ago
|
||
rename the classes LargeDataBuffer into MediaLargeByteBuffer and DataBuffer into MediaByteBuffer
Attachment #8592584 -
Flags: review?(cpearce)
Comment 21•10 years ago
|
||
Comment on attachment 8592584 [details] [diff] [review]
Rename LargeDataBuffer and DataBuffer object
Review of attachment 8592584 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/media/mediasource/SourceBuffer.h
@@ +162,1 @@
> uint32_t aLength,
Can you fix the hanging indent here?
Attachment #8592584 -
Flags: review?(cpearce) → review+
Assignee | ||
Comment 22•10 years ago
|
||
Comment 23•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
You need to log in
before you can comment on or make changes to this bug.
Description
•