Closed
Bug 869167
Opened 11 years ago
Closed 7 years ago
Mac: Hang when ASan-built plugin is used in non-ASan build
Categories
(Core :: IPC, defect, P3)
Tracking
()
RESOLVED
INCOMPLETE
People
(Reporter: mwobensmith, Unassigned, Mentored)
Details
(Whiteboard: [lang=c++][good next bug])
Attachments
(4 files, 3 obsolete files)
1. Install Test.plugin from attachment in Mac plugins folder: ~/Library/Internet Plugins/ 2. Open bug file, which contains only an empty embed. Result: Hangs the browser Stack: https://crash-stats.mozilla.com/report/index/257aadcc-6adc-4881-a698-5e1f02130506
Reporter | ||
Comment 1•11 years ago
|
||
Reporter | ||
Comment 2•11 years ago
|
||
Obviously a super corner case, but talking to John Schoenick, he suggested I file it. We shouldn't be hanging here.
Comment 3•11 years ago
|
||
Indeed, worth looking into why we're hanging here - especially if it's an IPC fail relating to the plugin child failing to spawn properly (which could happen in other cases)
Reporter | ||
Comment 4•11 years ago
|
||
Also, the crash only happens when we force quit. This is a sample of the stack taken during the hang itself.
Comment 5•11 years ago
|
||
Could you also attach an all-thread gdb stack trace of the hang?
Reporter | ||
Comment 6•11 years ago
|
||
Steven, I can't seem to get a stack trace via gdb while it's running. Perhaps give it a go on your end and see if you can repro?
Comment 7•11 years ago
|
||
Comment on attachment 746044 [details]
ASan-built plugin, Mac
This is corrupt. All of its files are soft links to files that don't exist (except on the system where you build it).
One way around this is to do a universal build and then look in [objdir]...universal/test-package-stage/bin/plugins.
Comment 8•11 years ago
|
||
There are some regular asan builds maintained at https://people.mozilla.com/~choller/firefox/asan/ I grabbed the Test.plugin from one of these (its in the -tests.zip) and attached I tried this same scenario in linux -- there we crash in ScanPluginsDirectory, due to a segfault trying to dlopen() the asan plugin. This is less of an issue -- I'm more concerned that plugins doing something bad on startup can cause firefox to hand indefinitely waiting for a fubar plugin-container
Attachment #746044 -
Attachment is obsolete: true
Comment 9•11 years ago
|
||
This is only an indeterminate hang with debug builds, due to processLaunchTimeoutSecs being 0 there. We could possibly try to monitor the startup better though instead of having to time out on a crash on loading the module.
Priority: -- → P3
Comment 10•11 years ago
|
||
FWIW, this is hanging with the expected stack for plugin-container launch timeouts: [...] mozilla::ipc::GeckoChildProcessHost::SyncLaunch(this=0x000000011fc83f00, aExtraOpts=0x00007fff5fbfc6b8, aTimeoutMs=0, arch=PROCESS_ARCH_X86_64) + 339 at GeckoChildProcessHost.cpp:304 mozilla::plugins::PluginProcessParent::Launch(this=0x000000011fc83f00, timeoutMs=0) + 438 at PluginProcessParent.cpp:78 mozilla::plugins::PluginModuleParent::LoadModule(aFilePath=0x00000001127f15c8) + 163 at PluginModuleParent.cpp:90 GetNewPluginLibrary(aPluginTag=0x0000000112493e80) + 87 at nsNPAPIPlugin.cpp:395 nsNPAPIPlugin::CreatePlugin(aPluginTag=0x0000000112493e80, aResult=0x00007fff5fbfc9f0) + 152 at nsNPAPIPlugin.cpp:416 [...]
Comment 11•11 years ago
|
||
I did a quick check - we only create the crash reporter for the process after the child launched and loaded the plugin module: http://hg.mozilla.org/mozilla-central/annotate/b980d32c366f/dom/plugins/ipc/PluginModuleParent.cpp#l85 ... so we can't catch crashes during plugin module loading and instead time out on them. I wonder if we could just switch to starting the process, creating the crash reporter, then loading the plugin module. At least the Flash process injector shouldn't be affected by such a change.
Comment 12•11 years ago
|
||
Whether or not we have the crashreporter hooked up, SyncLaunch should immediately return if the launched process exits prematurely or crashes. I do support moving the DLL load after the synclaunch, but I don't think it's necessary nor the best way to fix this bug. From the sample, it appears that GeckoChildProcessHost::PerformAsyncLaunchInternal is not on the stack. This means that somehow it completed without notifying mMonitor at http://hg.mozilla.org/mozilla-central/annotate/e19d0885977c/ipc/glue/GeckoChildProcessHost.cpp#l758. And indeed, there are a bunch of early-returns starting at http://hg.mozilla.org/mozilla-central/annotate/e19d0885977c/ipc/glue/GeckoChildProcessHost.cpp#l666 which skip the monitor *and* don't set mProcessState = PROCESS_ERROR.
Component: Plug-ins → IPC
Keywords: helpwanted
Whiteboard: [mentor=bsmedberg][lang=c++][good next bug]
Comment 13•11 years ago
|
||
Hi I am interested in working on this bug,but it's my first time to work on with debug,can anybody guide me on how to get started with it?Thanks a lot.
Comment 14•11 years ago
|
||
Do you have a working debug build already? There are build instructions here: https://developer.mozilla.org/en-US/docs/Developer_Guide/Build_Instructions You could also get help with initial problems on the #introduction or #developer IRC channels: https://wiki.mozilla.org/IRC
Comment 15•11 years ago
|
||
I have a working debug build and would like to work on this bug. This is my first time, however. Could you please help me?
Comment 16•11 years ago
|
||
Great, have you been able to reproduce the hang with the test plugin from comment 8?
Comment 17•11 years ago
|
||
Yes, it hangs. I get this in my Terminal window: [...] ++DOMWINDOW == 16 (0x10cff0cb0) [serial = 20] [outer = 0x111fea8b0] WARNING: NS_ENSURE_TRUE(aURI) failed: file /users/andrey/FF/mozilla-central/caps/src/nsScriptSecurityManager.cpp, line 1913 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file /users/andrey/FF/mozilla-central/extensions/cookie/nsPermissionManager.cpp, line 979 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file /users/andrey/FF/mozilla-central/extensions/permissions/nsContentBlocker.cpp, line 250 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file /users/andrey/FF/mozilla-central/extensions/permissions/nsContentBlocker.cpp, line 212 WARNING: NS_ENSURE_TRUE(currentURI) failed: file /users/andrey/FF/mozilla-central/content/base/src/ThirdPartyUtil.cpp, line 96 For application/x-test found plugin Test.plugin WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /users/andrey/FF/mozilla-central/xpcom/base/nsTraceRefcntImpl.cpp, line 141 WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /users/andrey/FF/mozilla-central/xpcom/base/nsTraceRefcntImpl.cpp, line 141 dyld: lazy symbol binding failed: Symbol not found: ___asan_init Referenced from: /Library/Internet Plug-Ins/Test.plugin/Contents/MacOS/libnptest.dylib Expected in: flat namespace dyld: Symbol not found: ___asan_init Referenced from: /Library/Internet Plug-Ins/Test.plugin/Contents/MacOS/libnptest.dylib Expected in: flat namespace [...]
Comment 18•11 years ago
|
||
Next you could investigate what is happening here, e.g. by setting up debugging [1], breaking in GeckoChildProcessHost::PerformAsyncLaunchInternal() and see if it fails in there and exits that function before setting an error [2]. Also, feel free to ping us on IRC. [1] https://developer.mozilla.org/en-US/docs/Debugging_on_Mac_OS_X [2] http://hg.mozilla.org/mozilla-central/annotate/e19d0885977c/ipc/glue/GeckoChildProcessHost.cpp#l758
Comment 19•11 years ago
|
||
Thank you! Is it ok that I'm writing it here, or should I switch to IRC? I've done that. The function returns false value here [1], before the line 758. It waits for the message for 10 seconds, does not get it, and returns false. [1] http://hg.mozilla.org/mozilla-central/annotate/e19d0885977c/ipc/glue/GeckoChildProcessHost.cpp#l669
Comment 20•11 years ago
|
||
No need to switch to IRC, just another option :) So - per comment 12, those places (i.e. the early return you hit and the ones after it) also need the error handling that we have in [1]. If you add that, the permanent hang should go away. I'd personally prefer to move the part where this happens into a separate function so that the error handling doesn't need to be written 5 times. http://hg.mozilla.org/mozilla-central/annotate/e19d0885977c/ipc/glue/GeckoChildProcessHost.cpp#l757
Comment 21•11 years ago
|
||
Here is a patch. Could you please review it?
Attachment #759395 -
Flags: review?(georg.fritzsche)
Comment 22•11 years ago
|
||
Comment on attachment 759395 [details] [diff] [review] A patch Review of attachment 759395 [details] [diff] [review]: ----------------------------------------------------------------- Maybe it would be better to move the code in the |#ifdef MOZ_WIDGET_COCOA| into a separate function, something like MachSetupPorts()? Then we could just handle it returning false in one place. bsmedberg will most likely review this, but i'm happy to give feedback :) ::: ipc/glue/GeckoChildProcessHost.cpp @@ -770,5 @@ > OpenPrivilegedHandle(base::GetProcId(process)); > { > - MonitorAutoLock lock(mMonitor); > - mProcessState = PROCESS_CREATED; > - lock.Notify(); Note that this is using PROCESS_CREATED, not PROCESS_ERROR like the rest.
Attachment #759395 -
Flags: review?(georg.fritzsche)
Comment 23•11 years ago
|
||
Thank you for your feedback! I have fixed my patch.
Attachment #759395 -
Attachment is obsolete: true
Attachment #759467 -
Flags: review?
Updated•11 years ago
|
Attachment #759467 -
Flags: review? → review?(benjamin)
Updated•11 years ago
|
Assignee: nobody → andzaytsev
Comment 24•11 years ago
|
||
I have pulled the latest version of code from the repository, and the patch seems to stop working. Could anybody please confirm this?
Comment 25•11 years ago
|
||
Comment on attachment 759467 [details] [diff] [review] A new patch Review of attachment 759467 [details] [diff] [review]: ----------------------------------------------------------------- A bit of quick feedback on this from my end. ::: ipc/glue/GeckoChildProcessHost.cpp @@ +474,5 @@ > +GeckoChildProcessHost::MachSetupPorts(std::string mach_connection_name) > +{ > + // Wait for the child process to send us its 'task_t' data. > + const int kTimeoutMs = 10000; > + You have a few whitespaces left in the patch, easily spotted if you follow the "review" link for the patch. @@ +483,5 @@ > + if (err != KERN_SUCCESS) { > + std::string errString = StringPrintf("0x%x %s", err, mach_error_string(err)); > + LOG(ERROR) << "parent WaitForMessage() failed: " << errString; > + NotifyMonitor(); > + return false; Note that the function returns task_t, so you would want to return TASK_NULL or a nullptr here. @@ +713,5 @@ > #endif > false, &process, arch); > > #ifdef MOZ_WIDGET_COCOA > + task_t child_task = MachSetupPorts(mach_connection_name); Sorry, i realize now that my name suggestion wasn't so great - maybe MachGetChildTask() would be better? @@ +719,1 @@ > return false; I think you can just do the monitor notify and setting mProcessState here directly instead of multiple times in MachSetupPorts().
Comment 26•11 years ago
|
||
(In reply to Andrey Zaytsev from comment #24) > I have pulled the latest version of code from the repository, and the patch > seems to stop working. Could anybody please confirm this? Does the patch still apply or were any sections rejected? Maybe something changed in the parts your patch addresses?
Comment 27•11 years ago
|
||
Comment on attachment 759467 [details] [diff] [review] A new patch Review of attachment 759467 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/GeckoChildProcessHost.cpp @@ +470,5 @@ > + lock.Notify(); > +} > + > +task_t > +GeckoChildProcessHost::MachSetupPorts(std::string mach_connection_name) Also, this function should be in a |#ifdef MOZ_WIDGET_COCOA| as well - this will (and should) not compile for e.g. Windows.
Comment 28•11 years ago
|
||
Comment on attachment 759467 [details] [diff] [review] A new patch I don't think that "NotifyMonitor" is a good method name for this. Because this is specifically about an error, it should be "ErrorAndNotifyMonitor" or something more descriptive. Also, I wonder if it wouldn't make more sense for the caller of MachSetupPorts to call Notify, instead of doing it several times within that method?
Attachment #759467 -
Flags: review?(benjamin) → review-
Comment 29•11 years ago
|
||
Notifying monitor before returning false does not fix the hang anymore: MonitorAutoLock lock(mMonitor); mProcessState = PROCESS_ERROR; lock.Notify(); I have cloned the last revision from the repository and compiled it, adding those three lines before return false. Browser still hangs when I open the bug file.
Comment 30•11 years ago
|
||
Have you tried to debug it again? Does it still behave the same in PerformAsyncLaunchInternal? And where does it hang now? And with what callstack (e.g. enter "bt" in the Xcode output window)?
Comment 31•11 years ago
|
||
It is very strange: when I debug it under Xcode, it does not hang. It executes those 3 lines of code, exits the function and continue running. However, when I run it from my terminal it hangs.
Comment 32•11 years ago
|
||
Have you tried verifying that it runs through that code by adding some logging (e.g. just some printf()s to see what happens)? When you attach with Xcode after it hangs and pause it, does it still hang with the same stack as in comment 10?
Comment 33•11 years ago
|
||
I've added two printf()s: before [1], and inside 'if', before 'return false'. In Xcode both of them are executed, however when I run the browser with: [./mach run -P dbg], only the first one is executed. It seems to run differently without Xcode. Is it ok? [1] https://hg.mozilla.org/mozilla-central/annotate/e19d0885977c/ipc/glue/GeckoChildProcessHost.cpp#l663
Comment 34•11 years ago
|
||
It's possible that it runs differently without a debugger attached, but it would be interesting to find where it hangs in this case. Maybe you can find out where it exits that function? And if you attach with Xcode once it hangs, pause it and look at the callstack... does it still look like the hang in comment 10?
Comment 35•11 years ago
|
||
The problem was in a different function. There was an infinite loop. Should I still add NotifyMonitor to those ifs?
Attachment #759467 -
Attachment is obsolete: true
Attachment #763855 -
Flags: review?(benjamin)
Comment 36•11 years ago
|
||
(In reply to Andrey Zaytsev from comment #35) > The problem was in a different function. There was an infinite loop. Ah, that is expected because the pref dom.ipc.plugins.processLaunchTimeoutSecs is set to 0 in debug builds. However, we shouldn't hang indefinitely there in debug builds - at some point mProcessState should be set to PROCESS_ERROR or PROCESS_CONNECTED so that the while condition - (mProcessState < PROCESS_CONNECTED) - should evaluate to false. Have you found out where it leaves PerformAsyncLaunchInternal() in that scenario? Is mProcessState set properly there and the monitor notified?
Updated•11 years ago
|
Attachment #763855 -
Flags: review?(benjamin) → review+
Comment 37•11 years ago
|
||
So given a quick reading of this code, I don't understand why this patch is necessary. In the case of an infinite timeout, the monitor should only notify when the process has either been launched successfully or died, in which case mProcessState should be set. What is actually going on that causes this patch to fix something important?
Comment 38•11 years ago
|
||
The loop is infinite when: timeoutTicks == PR_INTERVAL_NO_TIMEOUT
Comment 39•11 years ago
|
||
It leaves PerformAsyncLaunchInternal() on the last line "return true". So everything works correctly there.
Comment 40•11 years ago
|
||
I don't understand why the loop is infinite. Are you saying that mProcessState is still < PROCESS_CONNECTED but the lock isn't being notified? Why isn't the lock being notified?
Comment 41•11 years ago
|
||
@bsmedberg could this be a dupe of bug 811636 ?
Comment 42•11 years ago
|
||
It looks like everything works fine in PerformAsyncLaunchInternal(). The loop should be fixed anyway, shouldn't it?
Updated•11 years ago
|
Keywords: helpwanted → checkin-needed
Comment 43•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/09274cc38453
Flags: in-testsuite-
Keywords: checkin-needed
Comment 44•11 years ago
|
||
Backed out for OSX test crashes. https://hg.mozilla.org/integration/mozilla-inbound/rev/0c3f3de7638b https://tbpl.mozilla.org/php/getParsedLog.php?id=26651998&tree=Mozilla-Inbound https://tbpl.mozilla.org/php/getParsedLog.php?id=26651962&tree=Mozilla-Inbound
Comment 45•11 years ago
|
||
And Windows.
Comment 46•11 years ago
|
||
So yeah, this turned out to be debug-only bustage on all platforms.
Assignee | ||
Updated•10 years ago
|
Mentor: benjamin
Whiteboard: [mentor=bsmedberg][lang=c++][good next bug] → [lang=c++][good next bug]
Updated•7 years ago
|
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
You need to log in
before you can comment on or make changes to this bug.
Description
•