Closed Bug 560630 Opened 14 years ago Closed 14 years ago

NotificationService for e10s fennec initialized and used in different threads.

Categories

(Core :: IPC, defect)

Other
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0a1+ ---

People

(Reporter: romaxa, Assigned: romaxa)

References

Details

Attachments

(1 file, 2 obsolete files)

On attempt to make full fix for bug 553904, we tried to add notification service initialization into MozillaChildThread, but this is causing some strange race condition problem.

We should try to fix it.
Ok ,there is no race with MozillaChildThread, because it is not available...

But bug 553904 is reproducible again... seems after fixing bug 528146
#0  NotificationService::HasKey (map=..., source=...)
    at mozilla/ipc/chromium/src/chrome/common/notification_service.cc:21
#1  0xb734dd3c in NotificationService::Notify (this=0x0, type=..., source=..., details=...)
    at mozilla/ipc/chromium/src/chrome/common/notification_service.cc:88
#2  0xb7341cfb in Run (this=0xae90d5e0)
    at mozilla/ipc/chromium/src/chrome/common/child_process_host.cc:44
#3  0xb7321112 in MessageLoop::RunTask (this=0xb2cff2b8, task=0xae90d5e0)
    at mozilla/ipc/chromium/src/base/message_loop.cc:336
#4  0xb7321942 in MessageLoop::DeferOrRunPendingTask (this=0xb2cff2b8, pending_task=...)
    at mozilla/ipc/chromium/src/base/message_loop.cc:344
#5  0xb7323797 in MessageLoop::DoWork (this=0xb2cff2b8)
---Type <return> to continue, or q <return> to quit---
    at mozilla/ipc/chromium/src/base/message_loop.cc:444
#6  0xb7355e9c in base::MessagePumpLibevent::Run (this=0xb2d2e130, delegate=0xb2cff2b8)
    at mozilla/ipc/chromium/src/base/message_pump_libevent.cc:309
#7  0xb73217e9 in MessageLoop::RunInternal (this=0xb2cff2b8)
    at mozilla/ipc/chromium/src/base/message_loop.cc:216
#8  0xb732181c in MessageLoop::RunHandler (this=0xb2cff2b8)
    at mozilla/ipc/chromium/src/base/message_loop.cc:199
#9  0xb73218aa in MessageLoop::Run (this=0xb2cff2b8)
    at mozilla/ipc/chromium/src/base/message_loop.cc:173
#10 0xb7336811 in base::Thread::ThreadMain (this=0xb2d2d348)
    at mozilla/ipc/chromium/src/base/thread.cc:156
#11 0xb735699f in ThreadFunc (closure=0xb2d2d348)
    at mozilla/ipc/chromium/src/base/platform_thread_posix.cc:26
#12 0xb77281bf in start_thread () from /lib/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#13 0xb4b4411e in clone () from /lib/libc.so.6
Blocks: 528146
And NotificationService created in different thread...
#0  NotificationService::current ()
    at mozilla/ipc/chromium/src/chrome/common/notification_service.cc:15
#1  0xb741cced in NotificationService (this=0xbfe5c6ec)
    at mozilla/ipc/chromium/src/chrome/common/notification_service.cc:32
#2  0xb67981d5 in XRE_InitChildProcess (aArgc=2, aArgv=0xbfe5d124, aProcess=GeckoProcessType_Content)
    at mozilla/toolkit/xre/nsEmbedFunctions.cpp:340
#3  0x08048fcb in main (argc=0, argv=0x0)
    at mozilla/ipc/app/MozillaRuntimeMain.cpp:87
Summary: Race condition happens when notification service added to MozillaChildThread → NotificationService for e10s fennec initialized and used in different threads.
Attachment #445543 - Flags: review?(bent.mozilla)
Here is the backtrace of notification task creation.
Breakpoint 1, ChildNotificationTask (this=0xb0f5a8b0, type=...)
    at ipc/chromium/src/chrome/common/child_process_host.cc:41
41	      printf(">>>>>>Func:%s::%d pid:%i, thre:'%i':'%s'\n", __FUNCTION__, __LINE__, getpid(), PlatformThread::CurrentId(), IntToString(PlatformThread::CurrentId()).c_str());
(gdb) bt
#0  ChildNotificationTask (this=0xb0f5a8b0, type=...)
    at ipc/chromium/src/chrome/common/child_process_host.cc:41
#1  ChildProcessHost::Notify (this=0xb0f5a8b0, type=...)
    at ipc/chromium/src/chrome/common/child_process_host.cc:134
#2  0xb7be0568 in ChildProcessHost::ListenerHook::OnChannelConnected (this=0xb0f5a8d4, peer_pid=26991)
    at ipc/chromium/src/chrome/common/child_process_host.cc:215
#3  0xb7bfff42 in IPC::Channel::ChannelImpl::ProcessIncomingMessages (this=0xb0f5b000)
    at ipc/chromium/src/chrome/common/ipc_channel_posix.cc:546
#4  0xb7c0043d in IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking (this=0xb0f5b000, fd=19)
    at ipc/chromium/src/chrome/common/ipc_channel_posix.cc:---Type <return> to continue, or q <return> to quit---
722
#5  0xb7bf4266 in base::MessagePumpLibevent::OnLibeventNotification (fd=19, flags=2, context=0xb0f5b000)
    at ipc/chromium/src/base/message_pump_libevent.cc:210
#6  0xb7ba57d7 in event_process_active (base=0xb360e400, flags=<value optimized out>)
    at ipc/chromium/src/third_party/libevent/event.c:385
#7  event_base_loop (base=0xb360e400, flags=<value optimized out>)
    at ipc/chromium/src/third_party/libevent/event.c:522
#8  0xb7bf44b0 in base::MessagePumpLibevent::Run (this=0xb362e130, delegate=0xb35ff2b8)
    at ipc/chromium/src/base/message_pump_libevent.cc:330
#9  0xb7bbfbf9 in MessageLoop::RunInternal (this=0xb35ff2b8)
    at ipc/chromium/src/base/message_loop.cc:218
#10 0xb7bbfc2c in MessageLoop::RunHandler (this=0xb35ff2b8)
    at ipc/chromium/src/base/message_loop.cc:200
#11 0xb7bbfcba in MessageLoop::Run (this=0xb35ff2b8)
---Type <return> to continue, or q <return> to quit---
Comment on attachment 445543 [details] [diff] [review]
No ideas where is better to initialize notification service

As I commented on IRC, the stack you've got here shows a task running on the wrong thread. I'm assuming the fix will be something unrelated to the notification service change here.
Attachment #445543 - Flags: review?(bent.mozilla) → review-
Here is the backtrace with GTK build (latest e10s + latest e10s fennec) with enabled ipc plugins.

Content process crashes

0xb70062e4 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, ObserverList<NotificationObserver, false>*>, std::_Select1st<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> > >::_M_begin (this=0x0)
    at /usr/include/c++/4.4/bits/stl_tree.h:488
488		  (this->_M_impl._M_header._M_parent);
(gdb) bt
#0  0xb70062e4 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, ObserverList<NotificationObserver, false>*>, std::_Select1st<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> > >::_M_begin (this=0x0)
    at /usr/include/c++/4.4/bits/stl_tree.h:488
#1  0xb7005add in std::_Rb_tree<unsigned int, std::pair<unsigned int const, ObserverList<NotificationObserver, false>*>, std::_Select1st<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> > >::find (this=0x0, __k=@0xb42fef4c)
    at /usr/include/c++/4.4/bits/stl_tree.h:1434
#2  0xb7005414 in std::map<unsigned int, ObserverList<NotificationObserver, false>*, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> > >::find (this=0x0, __x=@0xb42fef4c)
    at /usr/include/c++/4.4/bits/stl_map.h:674
#3  0xb700456c in NotificationService::HasKey (map=@0x0, source=@0xb42fefd8)
    at ipc/chromium/src/chrome/common/notification_service.cc:21
#4  0xb7004a4d in NotificationService::Notify (this=0x0, type={value = NotificationType::CHILD_PROCESS_HOST_CONNECTED}, 
    source=@0xb42ff04c, details=@0xb42ff048)
    at ipc/chromium/src/chrome/common/notification_service.cc:88
#5  0xb6ff189a in Run (this=0xad961be0)
    at ipc/chromium/src/chrome/common/child_process_host.cc:44
#6  0xb6fbc33e in MessageLoop::RunTask (this=0xb42ff1ec, task=0xad961be0)
    at ipc/chromium/src/base/message_loop.cc:336
#7  0xb6fbc3a7 in MessageLoop::DeferOrRunPendingTask (this=0xb42ff1ec, pending_task=@0xb42ff0cc)
    at ipc/chromium/src/base/message_loop.cc:344
#8  0xb6fbc77d in MessageLoop::DoWork (this=0xb42ff1ec)
---Type <return> to continue, or q <return> to quit---
    at ipc/chromium/src/base/message_loop.cc:444
#9  0xb7012300 in base::MessagePumpLibevent::Run (this=0xb43111f0, delegate=0xb42ff1ec)
    at ipc/chromium/src/base/message_pump_libevent.cc:309
#10 0xb6fbbe9b in MessageLoop::RunInternal (this=0xb42ff1ec)
    at ipc/chromium/src/base/message_loop.cc:216
#11 0xb6fbbe1b in MessageLoop::RunHandler (this=0xb42ff1ec)
    at ipc/chromium/src/base/message_loop.cc:199
#12 0xb6fbbdbf in MessageLoop::Run (this=0xb42ff1ec)
    at ipc/chromium/src/base/message_loop.cc:173
#13 0xb6fe0f47 in base::Thread::ThreadMain (this=0xb430d708)
    at ipc/chromium/src/base/thread.cc:156
#14 0xb701293e in ThreadFunc (closure=0xb430d708)
    at ipc/chromium/src/base/platform_thread_posix.cc:26
#15 0xb775c585 in start_thread (arg=0xb42ffb70) at pthread_create.c:300
#16 0xb57cd29e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
(gdb)
Here is the MOZ_IPC_MESSAGE_LOG=1 output before crash:

[time:1275204729564457][PContentProcessParent] Sending Msg_NotifyRemotePrefObserver([TODO])
[time:1275204729565702][PContentProcessParent] Sending Msg_NotifyRemotePrefObserver([TODO])
pldhash: for the table at address 0xb09c9498, the given entrySize of 52 probably favors chaining over double hashing.
[time:1275204729600517][PContentProcessParent] Sending Msg_NotifyRemotePrefObserver([TODO])
[time:1275204729606244][PContentProcessParent] Sending Msg_NotifyRemotePrefObserver([TODO])
Xlib:  extension "RANDR" missing on display ":0.0".
LoadPlugin() /usr/lib/mozilla/plugins/libflashplayer.so returned b44bf7e0
[time:1275204733696276][PPluginModuleParent] Sending Msg_NP_Initialize([TODO])
Attached patch This seems to work fine. (obsolete) — Splinter Review
Attachment #445543 - Attachment is obsolete: true
Attachment #448267 - Flags: review?(bent.mozilla)
Comment on attachment 448267 [details] [diff] [review]
This seems to work fine.

We should never be asking for the IO loop in the first place. Please just do:

 #ifdef CHROMIUM_MOZILLA_BUILD
   MessageLoop* loop = mozilla::ipc::ProcessChild::message_loop();
   loop->PostTask(
 #else
Attachment #448267 - Flags: review?(bent.mozilla) → review-
Assignee: nobody → romaxa
Attachment #448267 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #448276 - Flags: review?(bent.mozilla)
Comment on attachment 448276 [details] [diff] [review]
comment + include.

Looks good!
Attachment #448276 - Flags: review?(bent.mozilla) → review+
Comment on attachment 448276 [details] [diff] [review]
comment + include.

Er. Wait... I take that back. Does this work on a clobber build? I think we can't include ipc/glue headers in ipc/chromium...
Attachment #448276 - Flags: review+ → review-
? 
+#include "mozilla/ipc/ProcessChild.h"
 #include "mozilla/ipc/BrowserProcessSubThread.h"

BrowserProcessSubThread.h - this is also part of ipc/glue...
Comment on attachment 448276 [details] [diff] [review]
comment + include.

Ah, true... Maybe I'm just misremembering the way this works. Ok, let's give this a shot!
Attachment #448276 - Flags: review- → review+
Keywords: checkin-needed
http://hg.mozilla.org/mozilla-central/rev/704e77dcb303
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Lots of orange so I backed out:
http://hg.mozilla.org/mozilla-central/rev/b412ab506b1e
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
bent any ideas why it is working on linux, and does not work on win... is logic different there?
Ok, I sent patch to try server again... also need to check if ti still needed after latest changes
Ok, seems try server is happy with this patch now
And patch is still needed, this patch fixing next case:
take fennec with this patch applied (try build http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1281130098.1281134004.5041.gz):

enable plugins, enable remote plugins.
run 
./fennec -url "http://www.communitymx.com/content/source/E5141/wmodeopaque.htm"

without this patch you will get just crash, with this patch page works fine
tracking-fennec: --- → ?
tracking-fennec: ? → 2.0a1+
http://hg.mozilla.org/projects/electrolysis/rev/769df85cde9a
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: