Open Bug 1065892 Opened 10 years ago Updated 2 years ago

###!!! ASSERTION: Wrong size for this Shmem!: 'Error', file ../../../gecko/ipc/glue/Shmem.cpp, line 459

Categories

(Core :: IPC, defect)

ARM
Gonk (Firefox OS)
defect

People

(Reporter: jwwang, Unassigned)

Found this error while enabling content/media/test/* on B2G emulator debug.
https://tbpl.mozilla.org/php/getParsedLog.php?id=47787527&tree=Try&full=1

07:30:12     INFO -  177 INFO TEST-START | /tests/content/media/test/test_bug465498.html
07:30:13     INFO -  [Child 768] WARNING: Failed to retarget HTML data delivery to the parser thread.: file ../../../gecko/parser/html/nsHtml5StreamParser.cpp, line 947
07:30:27     INFO -  [Parent 693] ###!!! ASSERTION: Wrong size for this Shmem!: 'Error', file ../../../gecko/ipc/glue/Shmem.cpp, line 459
07:30:27     INFO -  UNKNOWN [libxul.so +0x00586DC8]
07:30:27     INFO -  UNKNOWN [libxul.so +0x0063090A]
07:30:27     INFO -  UNKNOWN [libxul.so +0x0057D076]
07:30:27     INFO -  UNKNOWN [libxul.so +0x00581E82]
07:30:27     INFO -  UNKNOWN [libxul.so +0x00581F58]
07:30:27     INFO -  UNKNOWN [libxul.so +0x0034B842]
07:30:27     INFO -  UNKNOWN [libxul.so +0x005781AC]
07:30:27     INFO -  UNKNOWN [libxul.so +0x0056BF78]
07:30:27     INFO -  UNKNOWN [libxul.so +0x0056F52A]
07:30:27     INFO -  UNKNOWN [libxul.so +0x0057017C]
07:30:27     INFO -  UNKNOWN [libxul.so +0x0056EF04]
07:30:27     INFO -  UNKNOWN [libxul.so +0x0056F07A]
07:30:27     INFO -  UNKNOWN [libxul.so +0x0056F092]
07:30:27     INFO -  UNKNOWN [libxul.so +0x0057288E]
07:30:27     INFO -  UNKNOWN [libxul.so +0x00562814]
07:30:27     INFO -  __thread_entry+0x00000034 [libc.so +0x00012E4C]
07:30:27     INFO -  pthread_create+0x000000B8 [libc.so +0x0001299C]
07:30:55     INFO -  [Child 768] WARNING: stride not available, assuming width: file ../../../../gecko/content/media/omx/OmxDecoder.cpp, line 586
07:30:55     INFO -  [Child 768] WARNING: slice height not available, assuming height: file ../../../../gecko/content/media/omx/OmxDecoder.cpp, line 591
07:30:55     INFO -  [Child 768] WARNING: rotation not available, assuming 0: file ../../../../gecko/content/media/omx/OmxDecoder.cpp, line 610
07:30:56     INFO -  [Child 768] WARNING: stride not available, assuming width: file ../../../../gecko/content/media/omx/OmxDecoder.cpp, line 586
07:30:56     INFO -  [Child 768] WARNING: slice height not available, assuming height: file ../../../../gecko/content/media/omx/OmxDecoder.cpp, line 591
07:30:56     INFO -  [Child 768] WARNING: rotation not available, assuming 0: file ../../../../gecko/content/media/omx/OmxDecoder.cpp, line 610
07:31:31     INFO -  178 INFO TEST-OK | /tests/content/media/test/test_bug465498.html | took 79522ms

I can also reproduce this issue locally. Here is the GDB backtrace:
#0  mozilla::ipc::Shmem::OpenExisting (aDescriptor=<value optimized out>, aId=<value optimized out>, aProtect=true) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/glue/Shmem.cpp:459
#1  0x40e00882 in mozilla::layers::PImageBridgeParent::OnMessageReceived (this=0x4025b800, __msg=<value optimized out>) at /media/jwwang/DATA/codebase/b2gemu/objdir-central2/ipc/ipdl/PImageBridgeParent.cpp:472
#2  0x40d4cff6 in mozilla::ipc::MessageChannel::DispatchAsyncMessage (this=0x4025b830, aMsg=...) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/glue/MessageChannel.cpp:1233
#3  0x40d51e02 in mozilla::ipc::MessageChannel::DispatchMessage (this=0x4025b830, aMsg=...) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/glue/MessageChannel.cpp:1115
#4  0x40d51ed8 in mozilla::ipc::MessageChannel::OnMaybeDequeueOne (this=<value optimized out>) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/glue/MessageChannel.cpp:1098
#5  0x40b1b7c2 in DispatchToMethod<FdWatcher, void (FdWatcher::*)()> (this=<value optimized out>) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/tuple.h:383
#6  RunnableMethod<FdWatcher, void (FdWatcher::*)(), Tuple0>::Run (this=<value optimized out>) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/task.h:307
#7  0x40d4812c in mozilla::ipc::MessageChannel::RefCountedTask::Run (this=<value optimized out>) at ../../dist/include/mozilla/ipc/MessageChannel.h:411
#8  mozilla::ipc::MessageChannel::DequeueTask::Run (this=<value optimized out>) at ../../dist/include/mozilla/ipc/MessageChannel.h:428
#9  0x40d3bef8 in MessageLoop::RunTask (this=0x461ffdd4, task=0x47310580) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/message_loop.cc:357
#10 0x40d3f4aa in MessageLoop::DeferOrRunPendingTask (this=0x4025b830, pending_task=<value optimized out>) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/message_loop.cc:365
#11 0x40d400fc in MessageLoop::DoWork (this=0x461ffdd4) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/message_loop.cc:443
#12 0x40d3ee84 in base::MessagePumpDefault::Run (this=0x45cfae40, delegate=0x461ffdd4) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/message_pump_default.cc:34
#13 0x40d3effa in MessageLoop::RunInternal (this=0x461ffdd4) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/message_loop.cc:229
#14 0x40d3f012 in MessageLoop::RunHandler (this=0x461ffdd4) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/message_loop.cc:222
#15 MessageLoop::Run (this=0x461ffdd4) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/message_loop.cc:196
#16 0x40d4280e in base::Thread::ThreadMain (this=0x44a4b760) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/thread.cc:168
#17 0x40d32794 in ThreadFunc (closure=0x1) at /media/jwwang/DATA/codebase/mozilla-central2/ipc/chromium/src/base/platform_thread_posix.cc:39
#18 0x40076e4c in __thread_entry (func=0x40d3278d <ThreadFunc>, arg=0x44a4b760, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#19 0x4007699c in pthread_create (thread_out=<value optimized out>, attr=0xbeb595e0, start_routine=0x40d3278d <ThreadFunc>, arg=0x44a4b760) at bionic/libc/bionic/pthread.c:357
#20 0x00000000 in ?? ()
(gdb) p size
$1 = 115240
(gdb) p header->mSize
$2 = 0
(gdb) p header
$3 = (mozilla::ipc::Header *) 0x4711f000
(gdb) p * header
$4 = {mSize = 0, mUnsafe = 0, mMagic = '\000' <repeats 191 times>}

It is strange the content of |header| is all zeros.
It is unpleasant to see so many of these errors in Try logs. However, they don't seem to cause any problems for the media mochitests.
Hi Kyle,
Do you have any clue about this memory error?
Flags: needinfo?(khuey)
No.  Try Benoit?
Flags: needinfo?(khuey) → needinfo?(bjacob)
Hrm, I could have got it wrong when I added this assertion, and now it's a bit fuzzy, sorry. Maybe those two 'sizes' do not have to be equal after all? :bent or :bsmedberg, who were my reviewers back then, would know.
Flags: needinfo?(bjacob) → needinfo?(bent.mozilla)
Per comment 0:

(gdb) p * header
$4 = {mSize = 0, mUnsafe = 0, mMagic = '\000' <repeats 191 times>}
It is strange the content of |header| is all zeros.
Someone just needs to debug this.
Flags: needinfo?(bent.mozilla)
https://tbpl.mozilla.org/php/getParsedLog.php?id=49234284&full=1&branch=try#error0

It also happens on Ubuntu e10s. I guess a remote iframe/tab is the key...
This creates a tremendous amount of logspam in B2G debug mochitest runs. Would be really nice if somebody could look into this as it makes going through logs pretty painful.
I ran the mochitests on my locally built emulator; there is a chance that the received shmem has already been deallocated...
See Also: → 1127046
I managed to reproduce this under rr. In my case, the Shmem is created and destroyed on the child side before the parent side has had time to receive the creation message and deserialize it. As a result, the mSize field in the header is zeroed out before the parent side maps the buffer and checks that the header matches the message, hence the assertion.
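
To make the ordering concrete, here is a minimal single-process sketch of the race described above. It is not the Gecko code: `Header` is a simplified stand-in that borrows the field names from the GDB output in comment 0, and `ChildCreate`, `ChildDestroy`, `ParentSizeMatches`, `RunScenario`, and the magic value are all hypothetical names made up for illustration. The model assumes, as described in this comment, that destroying the segment leaves the header zeroed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified header stored at the start of a segment.
// Field names follow the GDB output in comment 0; the layout does not.
struct Header {
  uint32_t mSize;
  uint32_t mUnsafe;
  char mMagic[16];
};

static const char kMagic[16] = "example-magic";  // hypothetical value

// A "segment" is modelled as a byte buffer visible to both sides.
using Segment = std::vector<unsigned char>;

// Child side: allocate a segment and fill in its header.
Segment ChildCreate(uint32_t aSize) {
  Segment seg(sizeof(Header) + aSize, 0);
  Header h{};
  h.mSize = aSize;
  std::memcpy(h.mMagic, kMagic, sizeof(kMagic));
  std::memcpy(seg.data(), &h, sizeof(h));
  return seg;
}

// Child side: destroying the segment is modelled as zeroing the header
// (assumption taken from the description in this comment).
void ChildDestroy(Segment& aSeg) {
  std::memset(aSeg.data(), 0, sizeof(Header));
}

// Parent side: compare the size carried by the creation message with
// the size recorded in the header. This mirrors the failing check.
bool ParentSizeMatches(const Segment& aSeg, uint32_t aExpectedSize) {
  Header h;
  std::memcpy(&h, aSeg.data(), sizeof(h));
  return h.mSize == aExpectedSize;
}

// Returns true if the parent's check passes for the given ordering.
bool RunScenario(bool aChildDestroysFirst) {
  const uint32_t size = 115240;  // value printed by GDB in comment 0
  Segment seg = ChildCreate(size);
  // The creation message carrying `size` is now "in flight".
  if (aChildDestroysFirst) {
    ChildDestroy(seg);  // child is done before the parent looks
  }
  return ParentSizeMatches(seg, size);
}
```

In this sketch, `RunScenario(false)` models the parent handling the creation message while the segment is still alive, and `RunScenario(true)` models the ordering seen under rr, where the parent reads a header with `mSize == 0` and the comparison fails.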

I don't know whether the race can lead to serious bugs: when this happens with layer textures, the parent side is never going to receive anything that uses the shmem.

TextureClient/Host is designed to prevent this problem but it relies on the idea that the texture data is really serialized in TextureData::Serialize. In the case of shmem, a message is sent under the hood as soon as the shmem is allocated and there is no mechanism in place to prevent this race from happening at the IPDL level.

TextureClient could use a "lower level" shmem type that doesn't do this kind of messaging under the hood, since all of the proper handshakes are done at the TextureClient/Host level; that would prevent the race. But considering that I haven't seen this warning associated with serious bugs, I think there are more important things to spend time on at this point.
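
As a rough illustration of what a handshake buys here, the sketch below defers the release of a segment until the peer has mapped it. This is only one hypothetical shape for such a handshake, not how TextureClient/Host or IPDL actually implement it; `SegmentHeader`, `HandshakeSegment`, and all of its methods are invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical simplified header; only the size field matters here.
struct SegmentHeader {
  uint32_t mSize;
};

// Hypothetical segment whose owner defers destruction until the peer
// has mapped it, so the peer never observes a zeroed header.
class HandshakeSegment {
 public:
  explicit HandshakeSegment(uint32_t aSize)
      : mBytes(sizeof(SegmentHeader) + aSize, 0) {
    SegmentHeader h{aSize};
    std::memcpy(mBytes.data(), &h, sizeof(h));
  }

  // Owner side: ask for destruction. The segment is released right
  // away only if the peer has already mapped it.
  void RequestDestroy() {
    if (mPeerMapped) {
      Release();
    } else {
      mDestroyPending = true;
    }
  }

  // Peer side: validate the header against the expected size. Mapping
  // doubles as the acknowledgement that unblocks a pending destroy.
  bool PeerMap(uint32_t aExpectedSize) {
    SegmentHeader h;
    std::memcpy(&h, mBytes.data(), sizeof(h));
    const bool ok = (h.mSize == aExpectedSize);
    mPeerMapped = true;
    if (mDestroyPending) {
      Release();
    }
    return ok;
  }

  bool Released() const { return mReleased; }

 private:
  // Zero the header and mark the segment as gone.
  void Release() {
    std::memset(mBytes.data(), 0, sizeof(SegmentHeader));
    mReleased = true;
  }

  std::vector<unsigned char> mBytes;
  bool mPeerMapped = false;
  bool mDestroyPending = false;
  bool mReleased = false;
};
```

With this shape, calling `RequestDestroy()` before `PeerMap()` leaves the header intact until the peer has validated it, at the cost of keeping the segment alive slightly longer.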
See Also: → 1235980
I found a way of reproducing this assertion pretty reliably in bug 1235980, FWIW.
Severity: normal → S3