NotificationService for e10s fennec initialized and used in different threads.

RESOLVED FIXED

Status

()

Core
IPC
RESOLVED FIXED
8 years ago
7 years ago

People

(Reporter: romaxa, Assigned: romaxa)

Tracking

Trunk
Other
Linux
Points:
---

Firefox Tracking Flags

(fennec2.0a1+)

Details

Attachments

(1 attachment, 2 obsolete attachments)

1.39 KB, patch
Ben Turner (not reading bugmail, use the needinfo flag!)
: review+
Details | Diff | Splinter Review
(Assignee)

Description

8 years ago
On attempt to make full fix for bug 553904, we tried to add notification service initialization into MozillaChildThread, but this is causing some strange race condition problem.

We should try to fix it.
(Assignee)

Comment 1

8 years ago
Ok ,there is no race with MozillaChildThread, because it is not available...

But bug 553904 is reproducible again... seems after fixing bug 528146
#0  NotificationService::HasKey (map=..., source=...)
    at mozilla/ipc/chromium/src/chrome/common/notification_service.cc:21
#1  0xb734dd3c in NotificationService::Notify (this=0x0, type=..., source=..., details=...)
    at mozilla/ipc/chromium/src/chrome/common/notification_service.cc:88
#2  0xb7341cfb in Run (this=0xae90d5e0)
    at mozilla/ipc/chromium/src/chrome/common/child_process_host.cc:44
#3  0xb7321112 in MessageLoop::RunTask (this=0xb2cff2b8, task=0xae90d5e0)
    at mozilla/ipc/chromium/src/base/message_loop.cc:336
#4  0xb7321942 in MessageLoop::DeferOrRunPendingTask (this=0xb2cff2b8, pending_task=...)
    at mozilla/ipc/chromium/src/base/message_loop.cc:344
#5  0xb7323797 in MessageLoop::DoWork (this=0xb2cff2b8)
---Type <return> to continue, or q <return> to quit---
    at mozilla/ipc/chromium/src/base/message_loop.cc:444
#6  0xb7355e9c in base::MessagePumpLibevent::Run (this=0xb2d2e130, delegate=0xb2cff2b8)
    at mozilla/ipc/chromium/src/base/message_pump_libevent.cc:309
#7  0xb73217e9 in MessageLoop::RunInternal (this=0xb2cff2b8)
    at mozilla/ipc/chromium/src/base/message_loop.cc:216
#8  0xb732181c in MessageLoop::RunHandler (this=0xb2cff2b8)
    at mozilla/ipc/chromium/src/base/message_loop.cc:199
#9  0xb73218aa in MessageLoop::Run (this=0xb2cff2b8)
    at mozilla/ipc/chromium/src/base/message_loop.cc:173
#10 0xb7336811 in base::Thread::ThreadMain (this=0xb2d2d348)
    at mozilla/ipc/chromium/src/base/thread.cc:156
#11 0xb735699f in ThreadFunc (closure=0xb2d2d348)
    at mozilla/ipc/chromium/src/base/platform_thread_posix.cc:26
#12 0xb77281bf in start_thread () from /lib/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#13 0xb4b4411e in clone () from /lib/libc.so.6
Blocks: 528146
(Assignee)

Comment 2

8 years ago
And NotificationService created in different thread...
#0  NotificationService::current ()
    at mozilla/ipc/chromium/src/chrome/common/notification_service.cc:15
#1  0xb741cced in NotificationService (this=0xbfe5c6ec)
    at mozilla/ipc/chromium/src/chrome/common/notification_service.cc:32
#2  0xb67981d5 in XRE_InitChildProcess (aArgc=2, aArgv=0xbfe5d124, aProcess=GeckoProcessType_Content)
    at mozilla/toolkit/xre/nsEmbedFunctions.cpp:340
#3  0x08048fcb in main (argc=0, argv=0x0)
    at mozilla/ipc/app/MozillaRuntimeMain.cpp:87
(Assignee)

Updated

8 years ago
Summary: Race condition happens when notification service added to MozillaChildThread → NotificationService for e10s fennec initialized and used in different threads.
(Assignee)

Comment 3

8 years ago
Created attachment 445543 [details] [diff] [review]
No ideas where is better to initialize notification service
Attachment #445543 - Flags: review?(bent.mozilla)
(Assignee)

Comment 4

8 years ago
Here is the backtrace of notification task creation.
Breakpoint 1, ChildNotificationTask (this=0xb0f5a8b0, type=...)
    at ipc/chromium/src/chrome/common/child_process_host.cc:41
41	      printf(">>>>>>Func:%s::%d pid:%i, thre:'%i':'%s'\n", __FUNCTION__, __LINE__, getpid(), PlatformThread::CurrentId(), IntToString(PlatformThread::CurrentId()).c_str());
(gdb) bt
#0  ChildNotificationTask (this=0xb0f5a8b0, type=...)
    at ipc/chromium/src/chrome/common/child_process_host.cc:41
#1  ChildProcessHost::Notify (this=0xb0f5a8b0, type=...)
    at ipc/chromium/src/chrome/common/child_process_host.cc:134
#2  0xb7be0568 in ChildProcessHost::ListenerHook::OnChannelConnected (this=0xb0f5a8d4, peer_pid=26991)
    at ipc/chromium/src/chrome/common/child_process_host.cc:215
#3  0xb7bfff42 in IPC::Channel::ChannelImpl::ProcessIncomingMessages (this=0xb0f5b000)
    at ipc/chromium/src/chrome/common/ipc_channel_posix.cc:546
#4  0xb7c0043d in IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking (this=0xb0f5b000, fd=19)
    at ipc/chromium/src/chrome/common/ipc_channel_posix.cc:---Type <return> to continue, or q <return> to quit---
722
#5  0xb7bf4266 in base::MessagePumpLibevent::OnLibeventNotification (fd=19, flags=2, context=0xb0f5b000)
    at ipc/chromium/src/base/message_pump_libevent.cc:210
#6  0xb7ba57d7 in event_process_active (base=0xb360e400, flags=<value optimized out>)
    at ipc/chromium/src/third_party/libevent/event.c:385
#7  event_base_loop (base=0xb360e400, flags=<value optimized out>)
    at ipc/chromium/src/third_party/libevent/event.c:522
#8  0xb7bf44b0 in base::MessagePumpLibevent::Run (this=0xb362e130, delegate=0xb35ff2b8)
    at ipc/chromium/src/base/message_pump_libevent.cc:330
#9  0xb7bbfbf9 in MessageLoop::RunInternal (this=0xb35ff2b8)
    at ipc/chromium/src/base/message_loop.cc:218
#10 0xb7bbfc2c in MessageLoop::RunHandler (this=0xb35ff2b8)
    at ipc/chromium/src/base/message_loop.cc:200
#11 0xb7bbfcba in MessageLoop::Run (this=0xb35ff2b8)
---Type <return> to continue, or q <return> to quit---
Comment on attachment 445543 [details] [diff] [review]
No ideas where is better to initialize notification service

As I commented on IRC, the stack you've got here shows a task running on the wrong thread. I'm assuming the fix will be something unrelated to the notification service change here.
Attachment #445543 - Flags: review?(bent.mozilla) → review-
(Assignee)

Comment 6

8 years ago
Here is the backtrace with GTK build (latest e10s + latest e10s fennec) with enabled ipc plugins.

Content process crashes

0xb70062e4 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, ObserverList<NotificationObserver, false>*>, std::_Select1st<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> > >::_M_begin (this=0x0)
    at /usr/include/c++/4.4/bits/stl_tree.h:488
488		  (this->_M_impl._M_header._M_parent);
(gdb) bt
#0  0xb70062e4 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, ObserverList<NotificationObserver, false>*>, std::_Select1st<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> > >::_M_begin (this=0x0)
    at /usr/include/c++/4.4/bits/stl_tree.h:488
#1  0xb7005add in std::_Rb_tree<unsigned int, std::pair<unsigned int const, ObserverList<NotificationObserver, false>*>, std::_Select1st<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> > >::find (this=0x0, __k=@0xb42fef4c)
    at /usr/include/c++/4.4/bits/stl_tree.h:1434
#2  0xb7005414 in std::map<unsigned int, ObserverList<NotificationObserver, false>*, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, ObserverList<NotificationObserver, false>*> > >::find (this=0x0, __x=@0xb42fef4c)
    at /usr/include/c++/4.4/bits/stl_map.h:674
#3  0xb700456c in NotificationService::HasKey (map=@0x0, source=@0xb42fefd8)
    at ipc/chromium/src/chrome/common/notification_service.cc:21
#4  0xb7004a4d in NotificationService::Notify (this=0x0, type={value = NotificationType::CHILD_PROCESS_HOST_CONNECTED}, 
    source=@0xb42ff04c, details=@0xb42ff048)
    at ipc/chromium/src/chrome/common/notification_service.cc:88
#5  0xb6ff189a in Run (this=0xad961be0)
    at ipc/chromium/src/chrome/common/child_process_host.cc:44
#6  0xb6fbc33e in MessageLoop::RunTask (this=0xb42ff1ec, task=0xad961be0)
    at ipc/chromium/src/base/message_loop.cc:336
#7  0xb6fbc3a7 in MessageLoop::DeferOrRunPendingTask (this=0xb42ff1ec, pending_task=@0xb42ff0cc)
    at ipc/chromium/src/base/message_loop.cc:344
#8  0xb6fbc77d in MessageLoop::DoWork (this=0xb42ff1ec)
---Type <return> to continue, or q <return> to quit---
    at ipc/chromium/src/base/message_loop.cc:444
#9  0xb7012300 in base::MessagePumpLibevent::Run (this=0xb43111f0, delegate=0xb42ff1ec)
    at ipc/chromium/src/base/message_pump_libevent.cc:309
#10 0xb6fbbe9b in MessageLoop::RunInternal (this=0xb42ff1ec)
    at ipc/chromium/src/base/message_loop.cc:216
#11 0xb6fbbe1b in MessageLoop::RunHandler (this=0xb42ff1ec)
    at ipc/chromium/src/base/message_loop.cc:199
#12 0xb6fbbdbf in MessageLoop::Run (this=0xb42ff1ec)
    at ipc/chromium/src/base/message_loop.cc:173
#13 0xb6fe0f47 in base::Thread::ThreadMain (this=0xb430d708)
    at ipc/chromium/src/base/thread.cc:156
#14 0xb701293e in ThreadFunc (closure=0xb430d708)
    at ipc/chromium/src/base/platform_thread_posix.cc:26
#15 0xb775c585 in start_thread (arg=0xb42ffb70) at pthread_create.c:300
#16 0xb57cd29e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
(gdb)
(Assignee)

Comment 7

8 years ago
Here is the MOZ_IPC_MESSAGE_LOG=1 output before crash:

[time:1275204729564457][PContentProcessParent] Sending Msg_NotifyRemotePrefObserver([TODO])
[time:1275204729565702][PContentProcessParent] Sending Msg_NotifyRemotePrefObserver([TODO])
pldhash: for the table at address 0xb09c9498, the given entrySize of 52 probably favors chaining over double hashing.
[time:1275204729600517][PContentProcessParent] Sending Msg_NotifyRemotePrefObserver([TODO])
[time:1275204729606244][PContentProcessParent] Sending Msg_NotifyRemotePrefObserver([TODO])
Xlib:  extension "RANDR" missing on display ":0.0".
LoadPlugin() /usr/lib/mozilla/plugins/libflashplayer.so returned b44bf7e0
[time:1275204733696276][PPluginModuleParent] Sending Msg_NP_Initialize([TODO])
(Assignee)

Comment 8

8 years ago
Message posted from here:
http://mxr.mozilla.org/projects-central/source/electrolysis/ipc/chromium/src/chrome/common/child_process_host.cc#130

And io_loop taken from here in this case:
http://mxr.mozilla.org/projects-central/source/electrolysis/ipc/chromium/src/chrome/common/child_process_host.cc#125
(Assignee)

Comment 9

8 years ago
Created attachment 448267 [details] [diff] [review]
This seems to work fine.
Attachment #445543 - Attachment is obsolete: true
Attachment #448267 - Flags: review?(bent.mozilla)
Comment on attachment 448267 [details] [diff] [review]
This seems to work fine.

We should never be asking for the IO loop in the first place. Please just do:

 #ifdef CHROMIUM_MOZILLA_BUILD
   MessageLoop* loop = mozilla::ipc::ProcessChild::message_loop();
   loop->PostTask(
 #else
Attachment #448267 - Flags: review?(bent.mozilla) → review-
(Assignee)

Comment 11

8 years ago
Created attachment 448276 [details] [diff] [review]
comment + include.
Assignee: nobody → romaxa
Attachment #448267 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #448276 - Flags: review?(bent.mozilla)
Comment on attachment 448276 [details] [diff] [review]
comment + include.

Looks good!
Attachment #448276 - Flags: review?(bent.mozilla) → review+
Comment on attachment 448276 [details] [diff] [review]
comment + include.

Er. Wait... I take that back. Does this work on a clobber build? I think we can't include ipc/glue headers in ipc/chromium...
Attachment #448276 - Flags: review+ → review-
(Assignee)

Comment 14

8 years ago
? 
+#include "mozilla/ipc/ProcessChild.h"
 #include "mozilla/ipc/BrowserProcessSubThread.h"

BrowserProcessSubThread.h - this is also part of ipc/glue...
Comment on attachment 448276 [details] [diff] [review]
comment + include.

Ah, true... Maybe I'm just misremembering the way this works. Ok, let's give this a shot!
Attachment #448276 - Flags: review- → review+
(Assignee)

Updated

8 years ago
Keywords: checkin-needed
(Assignee)

Comment 16

8 years ago
http://hg.mozilla.org/mozilla-central/rev/704e77dcb303
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Lots of orange so I backed out:
http://hg.mozilla.org/mozilla-central/rev/b412ab506b1e
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 18

8 years ago
Windows failure:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1275458817.1275459675.1988.gz

Another failure
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1275458819.1275459305.32562.gz
(Assignee)

Comment 19

8 years ago
bent any ideas why it is working on linux, and does not work on win... is logic different there?
(Assignee)

Comment 20

7 years ago
Ok, I sent patch to try server again... also need to check if ti still needed after latest changes
(Assignee)

Comment 21

7 years ago
Ok, seems try server is happy with this patch now
(Assignee)

Comment 22

7 years ago
And patch is still needed, this patch fixing next case:
take fennec with this patch applied (try build http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1281130098.1281134004.5041.gz):

enable plugins, enable remote plugins.
run 
./fennec -url "http://www.communitymx.com/content/source/E5141/wmodeopaque.htm"

without this patch you will get just crash, with this patch page works fine
tracking-fennec: --- → ?

Updated

7 years ago
tracking-fennec: ? → 2.0a1+
(Assignee)

Comment 23

7 years ago
http://hg.mozilla.org/projects/electrolysis/rev/769df85cde9a
Status: REOPENED → RESOLVED
Last Resolved: 8 years ago7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.