Closed Bug 1305159 Opened 8 years ago Closed 8 years ago

[10.12] usage of deprecated NSAddImage causes lldb to stall forever

Categories

(Core :: XPCOM, defect, P1)

52 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- fixed
firefox52 --- fixed

People

(Reporter: drno, Assigned: drno)

References

Details

Attachments

(2 files)

When I execute commands like './mach run --debug' or './mach gtest --debug' lldb gets started, but after issuing the 'run' command the loading stall forever in the attached stack trace.
Attached file stack.txt
Stack trace when hitting CTRL+C while lldb stalls.
Current guess is that we still use NSAddImage, an API which has been deprecated since 2007. Trying to replace NSSAddImage() with the dlopen() code we have for Linux appears to load the libs, but fails somewhere else.
So if I take out the OSX code which tries to use NSAddImage() etc and let it simply use the Linux(?) code instead which uses dlopen() and dlsym() all libs appear to get loaded, but dlsym() fails to locate _NS_GetFrozenFunctions in the loaded XUL lib, even though nm clearly shows that XUL has that symbol.
The warnings output I get when compiling nsXPCOMGlue.cpp

0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSAddImage' is deprecated: first deprecated in macOS 10.5
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:96:29: warning: 'NSAddImage' is deprecated: first deprecated in macOS 10.5 [-Wdeprecated-declarations]
 0:11.94   LibHandleType libHandle = NSAddImage(aDependentLib,
 0:11.94                             ^
 0:11.94 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:223:34: note: 'NSAddImage' has been explicitly marked deprecated here
 0:11.94 extern const struct mach_header* NSAddImage(const char* image_name, uint32_t options) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);
 0:11.94                                  ^
 0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSLinkEditError' is deprecated: first deprecated in macOS 10.5
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:104:5: warning: 'NSLinkEditError' is deprecated: first deprecated in macOS 10.5 [-Wdeprecated-declarations]
 0:11.94     NSLinkEditError(&linkEditError, &errorNum, &fileName, &errorString);
 0:11.94     ^
 0:11.94 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:210:13: note: 'NSLinkEditError' has been explicitly marked deprecated here
 0:11.94 extern void NSLinkEditError(NSLinkEditErrors *c, int *errorNumber, const char** fileName, const char** errorString) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);
 0:11.94             ^
 0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSLookupSymbolInImage' is deprecated: first deprecated in macOS 10.5
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:122:11: warning: 'NSLookupSymbolInImage' is deprecated: first deprecated in macOS 10.5 [-Wdeprecated-declarations]
 0:11.94     sym = NSLookupSymbolInImage(aLibHandle, aSymbol,
 0:11.94           ^
 0:11.94 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:175:17: note: 'NSLookupSymbolInImage' has been explicitly marked deprecated here
 0:11.94 extern NSSymbol NSLookupSymbolInImage(const struct mach_header* image, const char* symbolName, uint32_t options) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);
 0:11.94                 ^
 0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSIsSymbolNameDefined' is deprecated: first deprecated in macOS 10.4
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:126:9: warning: 'NSIsSymbolNameDefined' is deprecated: first deprecated in macOS 10.4 [-Wdeprecated-declarations]
 0:11.94     if (NSIsSymbolNameDefined(aSymbol)) {
 0:11.94         ^
 0:11.94 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:169:17: note: 'NSIsSymbolNameDefined' has been explicitly marked deprecated here
 0:11.94 extern bool     NSIsSymbolNameDefined(const char* symbolName)                                                    __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_4,__IPHONE_NA,__IPHONE_NA);
 0:11.94                 ^
 0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSLookupAndBindSymbol' is deprecated: first deprecated in macOS 10.4
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:127:13: warning: 'NSLookupAndBindSymbol' is deprecated: first deprecated in macOS 10.4 [-Wdeprecated-declarations]
 0:11.94       sym = NSLookupAndBindSymbol(aSymbol);
 0:11.94             ^
 0:11.94 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:172:17: note: 'NSLookupAndBindSymbol' has been explicitly marked deprecated here
 0:11.94 extern NSSymbol NSLookupAndBindSymbol(const char* symbolName)                                                    __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_4,__IPHONE_NA,__IPHONE_NA);
 0:11.94                 ^
 0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSAddressOfSymbol' is deprecated: first deprecated in macOS 10.5
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:134:21: warning: 'NSAddressOfSymbol' is deprecated: first deprecated in macOS 10.5 [-Wdeprecated-declarations]
 0:11.94   return (NSFuncPtr)NSAddressOfSymbol(sym);
 0:11.95                     ^
 0:11.95 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:181:21: note: 'NSAddressOfSymbol' has been explicitly marked deprecated here
 0:11.95 extern void *       NSAddressOfSymbol(NSSymbol symbol) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);
 0:11.95                     ^
 0:11.96 cd legacydb; /Applications/Xcode.app/Contents/Developer/usr/bin/make libs
 0:12.00 6 warnings generated.
'./mach run' works as expected and even though it also calls NSAddImage() at the same place (verified via a printf) it does not hang as it does when running in lldb.
If I let dlsym() search for 'NS_GetFrozenFunctions' instead of '_NS_GetFrozenFunctions' it works. But then fails to locate the XRE functions next.
Marking it P1 because I assume this prevents every developer on 10.12 from using any 'mach' command with --debug.
Priority: -- → P1
Attachment #8794435 - Flags: review?(nfroyd)
Attachment #8794435 - Flags: review?(michelangelo)
Attachment #8794435 - Flags: review?(erahm)
Attachment #8794435 - Flags: review?(doug.turner)
Attachment #8794435 - Flags: review?(benjamin)
Marked all module peers, owner and emeritus for review in the hope that who ever gets to it first can quickly review this.
Summary: [10.12] firefox stalls forever when started in lldb → [10.12] usage of deprecated NSAddImage causes lldb to stall forever
Comment on attachment 8794435 [details]
Bug 1305159: replace NSAddImage with dlopen on OSX.

https://reviewboard.mozilla.org/r/80906/#review79702

I'm a fan of using more cross-platform-y APIs when possible...but does this actually solve all your problems?  Comment 7 makes it sound like more than this patch is needed.
Attachment #8794435 - Flags: review?(nfroyd) → review+
Comment on attachment 8794435 [details]
Bug 1305159: replace NSAddImage with dlopen on OSX.

You should also verify with your try run that OSX 10.10 debug M-2, M-e10s-4, and the Windows R-e10s crashes (R-e10s and JSReftest) aren't due to your patch via retriggers.
Attachment #8794435 - Flags: review?(erahm)
Attachment #8794435 - Flags: review?(doug.turner)
Attachment #8794435 - Flags: review?(benjamin)
In the past, it was required to use NSADDIMAGE_OPTION_MATCH_FILENAME_BY_INSTALLNAME so that dynamic XPCOM components (in particular libbrowsercomps) would find dependent libraries correctly.

Has that changed?

Or we could wait until we fold browsercomps back in...
(In reply to Nathan Froyd [:froydnj] from comment #11)
> I'm a fan of using more cross-platform-y APIs when possible...but does this
> actually solve all your problems?  Comment 7 makes it sound like more than
> this patch is needed.

If you look closely at the patch, the original code actually adds the leading '_' to all function names only (!) on OSX via the define of the underscore. So it looks like NSAddImage actually required the leading underscore, but non of the other API's for looking up the symbols need them. That's why I deleted that define all together.
(In reply to Nils Ohlmeier [:drno] from comment #14)
> (In reply to Nathan Froyd [:froydnj] from comment #11)
> > I'm a fan of using more cross-platform-y APIs when possible...but does this
> > actually solve all your problems?  Comment 7 makes it sound like more than
> > this patch is needed.
> 
> If you look closely at the patch, the original code actually adds the
> leading '_' to all function names only (!) on OSX via the define of the
> underscore. So it looks like NSAddImage actually required the leading
> underscore, but non of the other API's for looking up the symbols need them.
> That's why I deleted that define all together.

Ah, I see.  So when you wrote comment 7, you had only removed LEADING_UNDERSCORE from the lookup of NS_GetFrozenFunctions, but when you wrote the patch, you had figured out that LEADING_UNDERSCORE removal had to go a bit farther than that?  Nothing else indicated that you had fixed the XRE symbol lookup otherwise...
Flags: needinfo?(drno)
(In reply to Nathan Froyd [:froydnj] from comment #15)
> Ah, I see.  So when you wrote comment 7, you had only removed
> LEADING_UNDERSCORE from the lookup of NS_GetFrozenFunctions, but when you
> wrote the patch, you had figured out that LEADING_UNDERSCORE removal had to
> go a bit farther than that?  Nothing else indicated that you had fixed the
> XRE symbol lookup otherwise...

Exactly. In comment #7 I only had removed the LEADING_UNDERSCORE form the GetFrozenFunctionsFunc lookup in XPCOMGlueLoad(). When I started looking where the XRE symbols get the underscore added I figured out that it was caused by the define, which really only affected OSX.
Flags: needinfo?(drno)
(In reply to Nathan Froyd [:froydnj] from comment #12)
> You should also verify with your try run that OSX 10.10 debug M-2, M-e10s-4,
> and the Windows R-e10s crashes (R-e10s and JSReftest) aren't due to your
> patch via retriggers.

All of them turned at least partially green.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #13)
> In the past, it was required to use
> NSADDIMAGE_OPTION_MATCH_FILENAME_BY_INSTALLNAME so that dynamic XPCOM
> components (in particular libbrowsercomps) would find dependent libraries
> correctly.
> 
> Has that changed?
> 
> Or we could wait until we fold browsercomps back in...

So I manually ran Firefox with dlopen() and used the location bar and the search bar. Everything appears to behave normal. That is only on 10.12. I'll try to give it a spin on 10.8 as well.
So I manually tested this also on 10.9, which seems to be the latest OS X version supported by Nightly. Same result as with 10.12: location and search bar appear to working as expected.

In other words: Am I good to land this?
Flags: needinfo?(nfroyd)
Flags: needinfo?(benjamin)
(In reply to Nils Ohlmeier [:drno] from comment #20)
> So I manually tested this also on 10.9, which seems to be the latest OS X
> version supported by Nightly. Same result as with 10.12: location and search
> bar appear to working as expected.
> 
> In other words: Am I good to land this?

Via some search engine spelunking, I found:

http://opensource.apple.com/source/dyld/dyld-360.22/src/dyldAPIs.cpp

From that code (and all other versions available on opensource.apple.com), it looks like dlopen replicates the functionality of MATCH_FILENAME_BY_INSTALLNAME.  So I think we're good on that front.

The one wrinkle is that the Unix codepath would technically not be equivalent to the mac code we had before, as NSLookupSymbolInImage (called from GetSymbol) will only consider the particular library handle that we pass it, whereas dlsym will consider the image and any dependent images that would be loaded.  We'd need to use the (Mac-specific, I think) RTLD_FIRST flag to replicate NSLookupSymbolInImage's behavior, which doesn't seem like a bad idea.

But AFAICT from looking at the source code, using dlsym and using NSLookupSymbolInImage are completely equivalent here.  So please make the RTLD_FIRST change, and then if that passes testing, I think we're good to go.
Flags: needinfo?(nfroyd)
Yes, I think as long as Firefox is not broken, we no longer care about breaking other DLLs and this should be ok to land.
Flags: needinfo?(benjamin)
Looks like the remaining test failure are indeed intermittents. So I'm planing on landing this Thursday/tomorrow morning west coast time (so I can monitor how it's going).
You can land it now, and I'll monitor it.. This bug has been giving me grief for one week :)
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/b9b87a4102da
replace NSAddImage with dlopen on OSX. r=froydnj
Do you think it'd be ok to uplift this?
Flags: needinfo?(drno)
I would guess/assume so. As it would be good to get the ability to debug aurora and beta with macos Sierra. If it would be my call I would let the patch settle for some time on Nightly before uplifting. But I think it should be modules owners call if this too risky or not.
Flags: needinfo?(drno) → needinfo?(nfroyd)
https://hg.mozilla.org/mozilla-central/rev/b9b87a4102da
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
(In reply to Nils Ohlmeier [:drno] from comment #28)
> I would guess/assume so. As it would be good to get the ability to debug
> aurora and beta with macos Sierra. If it would be my call I would let the
> patch settle for some time on Nightly before uplifting. But I think it
> should be modules owners call if this too risky or not.

Uplifting would be OK by me.
Flags: needinfo?(nfroyd)
Comment on attachment 8794435 [details]
Bug 1305159: replace NSAddImage with dlopen on OSX.

Approval Request Comment
[Feature/regressing bug #]: The XPCOM code on OSX is/was using an API which has been deprecated since 2007. The original code is as old the first import from CVS.
[User impact if declined]: Developers on Mac OSX 10.12 can't start Firefox or any CPP unit test in lldb debugger.
[Describe test coverage new/current, TreeHerder]: No new tests, but every existing test which starts a Firefox binary on OSX should cover this.
[Risks and why]: There is a small risk that if something isn't working as expected the Firefox binary won't start anymore. But the new API's used through this patch are well known cross platform API's which should not cause problems. Further more I assume that something fundamental as startup failure would have gotten reported by now.
[String/UUID change made/needed]: N/A
Attachment #8794435 - Flags: approval-mozilla-aurora?
Assignee: nobody → drno
Comment on attachment 8794435 [details]
Bug 1305159: replace NSAddImage with dlopen on OSX.

This patch helps dev debug in macos. Take it in 51 aurora.
Attachment #8794435 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: