[10.12] usage of deprecated NSAddImage causes lldb to stall forever

RESOLVED FIXED in Firefox 51

Status

()

P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: drno, Assigned: drno)

Tracking

52 Branch
mozilla52
Points:
---

Firefox Tracking Flags

(firefox49 wontfix, firefox50 wontfix, firefox51 fixed, firefox52 fixed)

Details

Attachments

(2 attachments)

(Assignee)

Description

2 years ago
When I execute commands like './mach run --debug' or './mach gtest --debug' lldb gets started, but after issuing the 'run' command the loading stall forever in the attached stack trace.
(Assignee)

Comment 1

2 years ago
Created attachment 8794412 [details]
stack.txt

Stack trace when hitting CTRL+C while lldb stalls.
(Assignee)

Comment 2

2 years ago
Current guess is that we still use NSAddImage, an API which has been deprecated since 2007. Trying to replace NSSAddImage() with the dlopen() code we have for Linux appears to load the libs, but fails somewhere else.
(Assignee)

Comment 4

2 years ago
So if I take out the OSX code which tries to use NSAddImage() etc and let it simply use the Linux(?) code instead which uses dlopen() and dlsym() all libs appear to get loaded, but dlsym() fails to locate _NS_GetFrozenFunctions in the loaded XUL lib, even though nm clearly shows that XUL has that symbol.
(Assignee)

Comment 5

2 years ago
The warnings output I get when compiling nsXPCOMGlue.cpp

0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSAddImage' is deprecated: first deprecated in macOS 10.5
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:96:29: warning: 'NSAddImage' is deprecated: first deprecated in macOS 10.5 [-Wdeprecated-declarations]
 0:11.94   LibHandleType libHandle = NSAddImage(aDependentLib,
 0:11.94                             ^
 0:11.94 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:223:34: note: 'NSAddImage' has been explicitly marked deprecated here
 0:11.94 extern const struct mach_header* NSAddImage(const char* image_name, uint32_t options) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);
 0:11.94                                  ^
 0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSLinkEditError' is deprecated: first deprecated in macOS 10.5
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:104:5: warning: 'NSLinkEditError' is deprecated: first deprecated in macOS 10.5 [-Wdeprecated-declarations]
 0:11.94     NSLinkEditError(&linkEditError, &errorNum, &fileName, &errorString);
 0:11.94     ^
 0:11.94 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:210:13: note: 'NSLinkEditError' has been explicitly marked deprecated here
 0:11.94 extern void NSLinkEditError(NSLinkEditErrors *c, int *errorNumber, const char** fileName, const char** errorString) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);
 0:11.94             ^
 0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSLookupSymbolInImage' is deprecated: first deprecated in macOS 10.5
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:122:11: warning: 'NSLookupSymbolInImage' is deprecated: first deprecated in macOS 10.5 [-Wdeprecated-declarations]
 0:11.94     sym = NSLookupSymbolInImage(aLibHandle, aSymbol,
 0:11.94           ^
 0:11.94 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:175:17: note: 'NSLookupSymbolInImage' has been explicitly marked deprecated here
 0:11.94 extern NSSymbol NSLookupSymbolInImage(const struct mach_header* image, const char* symbolName, uint32_t options) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);
 0:11.94                 ^
 0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSIsSymbolNameDefined' is deprecated: first deprecated in macOS 10.4
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:126:9: warning: 'NSIsSymbolNameDefined' is deprecated: first deprecated in macOS 10.4 [-Wdeprecated-declarations]
 0:11.94     if (NSIsSymbolNameDefined(aSymbol)) {
 0:11.94         ^
 0:11.94 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:169:17: note: 'NSIsSymbolNameDefined' has been explicitly marked deprecated here
 0:11.94 extern bool     NSIsSymbolNameDefined(const char* symbolName)                                                    __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_4,__IPHONE_NA,__IPHONE_NA);
 0:11.94                 ^
 0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSLookupAndBindSymbol' is deprecated: first deprecated in macOS 10.4
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:127:13: warning: 'NSLookupAndBindSymbol' is deprecated: first deprecated in macOS 10.4 [-Wdeprecated-declarations]
 0:11.94       sym = NSLookupAndBindSymbol(aSymbol);
 0:11.94             ^
 0:11.94 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:172:17: note: 'NSLookupAndBindSymbol' has been explicitly marked deprecated here
 0:11.94 extern NSSymbol NSLookupAndBindSymbol(const char* symbolName)                                                    __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_4,__IPHONE_NA,__IPHONE_NA);
 0:11.94                 ^
 0:11.94 Warning: -Wdeprecated-declarations in /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp: 'NSAddressOfSymbol' is deprecated: first deprecated in macOS 10.5
 0:11.94 /Users/nohlmeier/src/mozilla-central/xpcom/glue/standalone/nsXPCOMGlue.cpp:134:21: warning: 'NSAddressOfSymbol' is deprecated: first deprecated in macOS 10.5 [-Wdeprecated-declarations]
 0:11.94   return (NSFuncPtr)NSAddressOfSymbol(sym);
 0:11.95                     ^
 0:11.95 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/mach-o/dyld.h:181:21: note: 'NSAddressOfSymbol' has been explicitly marked deprecated here
 0:11.95 extern void *       NSAddressOfSymbol(NSSymbol symbol) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_1,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);
 0:11.95                     ^
 0:11.96 cd legacydb; /Applications/Xcode.app/Contents/Developer/usr/bin/make libs
 0:12.00 6 warnings generated.
(Assignee)

Comment 6

2 years ago
'./mach run' works as expected and even though it also calls NSAddImage() at the same place (verified via a printf) it does not hang as it does when running in lldb.
(Assignee)

Comment 7

2 years ago
If I let dlsym() search for 'NS_GetFrozenFunctions' instead of '_NS_GetFrozenFunctions' it works. But then fails to locate the XRE functions next.
Comment hidden (mozreview-request)
(Assignee)

Comment 9

2 years ago
Marking it P1 because I assume this prevents every developer on 10.12 from using any 'mach' command with --debug.
Priority: -- → P1
(Assignee)

Updated

2 years ago
Attachment #8794435 - Flags: review?(nfroyd)
Attachment #8794435 - Flags: review?(michelangelo)
Attachment #8794435 - Flags: review?(erahm)
Attachment #8794435 - Flags: review?(doug.turner)
Attachment #8794435 - Flags: review?(benjamin)
(Assignee)

Comment 10

2 years ago
Marked all module peers, owner and emeritus for review in the hope that who ever gets to it first can quickly review this.
(Assignee)

Updated

2 years ago
Summary: [10.12] firefox stalls forever when started in lldb → [10.12] usage of deprecated NSAddImage causes lldb to stall forever

Comment 11

2 years ago
mozreview-review
Comment on attachment 8794435 [details]
Bug 1305159: replace NSAddImage with dlopen on OSX.

https://reviewboard.mozilla.org/r/80906/#review79702

I'm a fan of using more cross-platform-y APIs when possible...but does this actually solve all your problems?  Comment 7 makes it sound like more than this patch is needed.
Attachment #8794435 - Flags: review?(nfroyd) → review+
Comment on attachment 8794435 [details]
Bug 1305159: replace NSAddImage with dlopen on OSX.

You should also verify with your try run that OSX 10.10 debug M-2, M-e10s-4, and the Windows R-e10s crashes (R-e10s and JSReftest) aren't due to your patch via retriggers.
Attachment #8794435 - Flags: review?(erahm)
Attachment #8794435 - Flags: review?(doug.turner)
Attachment #8794435 - Flags: review?(benjamin)

Comment 13

2 years ago
In the past, it was required to use NSADDIMAGE_OPTION_MATCH_FILENAME_BY_INSTALLNAME so that dynamic XPCOM components (in particular libbrowsercomps) would find dependent libraries correctly.

Has that changed?

Or we could wait until we fold browsercomps back in...
(Assignee)

Comment 14

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #11)
> I'm a fan of using more cross-platform-y APIs when possible...but does this
> actually solve all your problems?  Comment 7 makes it sound like more than
> this patch is needed.

If you look closely at the patch, the original code actually adds the leading '_' to all function names only (!) on OSX via the define of the underscore. So it looks like NSAddImage actually required the leading underscore, but non of the other API's for looking up the symbols need them. That's why I deleted that define all together.
(In reply to Nils Ohlmeier [:drno] from comment #14)
> (In reply to Nathan Froyd [:froydnj] from comment #11)
> > I'm a fan of using more cross-platform-y APIs when possible...but does this
> > actually solve all your problems?  Comment 7 makes it sound like more than
> > this patch is needed.
> 
> If you look closely at the patch, the original code actually adds the
> leading '_' to all function names only (!) on OSX via the define of the
> underscore. So it looks like NSAddImage actually required the leading
> underscore, but non of the other API's for looking up the symbols need them.
> That's why I deleted that define all together.

Ah, I see.  So when you wrote comment 7, you had only removed LEADING_UNDERSCORE from the lookup of NS_GetFrozenFunctions, but when you wrote the patch, you had figured out that LEADING_UNDERSCORE removal had to go a bit farther than that?  Nothing else indicated that you had fixed the XRE symbol lookup otherwise...
Flags: needinfo?(drno)
(Assignee)

Comment 16

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #15)
> Ah, I see.  So when you wrote comment 7, you had only removed
> LEADING_UNDERSCORE from the lookup of NS_GetFrozenFunctions, but when you
> wrote the patch, you had figured out that LEADING_UNDERSCORE removal had to
> go a bit farther than that?  Nothing else indicated that you had fixed the
> XRE symbol lookup otherwise...

Exactly. In comment #7 I only had removed the LEADING_UNDERSCORE form the GetFrozenFunctionsFunc lookup in XPCOMGlueLoad(). When I started looking where the XRE symbols get the underscore added I figured out that it was caused by the define, which really only affected OSX.
Flags: needinfo?(drno)
(Assignee)

Comment 17

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #12)
> You should also verify with your try run that OSX 10.10 debug M-2, M-e10s-4,
> and the Windows R-e10s crashes (R-e10s and JSReftest) aren't due to your
> patch via retriggers.

All of them turned at least partially green.
(Assignee)

Comment 18

2 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #13)
> In the past, it was required to use
> NSADDIMAGE_OPTION_MATCH_FILENAME_BY_INSTALLNAME so that dynamic XPCOM
> components (in particular libbrowsercomps) would find dependent libraries
> correctly.
> 
> Has that changed?
> 
> Or we could wait until we fold browsercomps back in...

So I manually ran Firefox with dlopen() and used the location bar and the search bar. Everything appears to behave normal. That is only on 10.12. I'll try to give it a spin on 10.8 as well.
Duplicate of this bug: 1305612
(Assignee)

Comment 20

2 years ago
So I manually tested this also on 10.9, which seems to be the latest OS X version supported by Nightly. Same result as with 10.12: location and search bar appear to working as expected.

In other words: Am I good to land this?
Flags: needinfo?(nfroyd)
Flags: needinfo?(benjamin)
(In reply to Nils Ohlmeier [:drno] from comment #20)
> So I manually tested this also on 10.9, which seems to be the latest OS X
> version supported by Nightly. Same result as with 10.12: location and search
> bar appear to working as expected.
> 
> In other words: Am I good to land this?

Via some search engine spelunking, I found:

http://opensource.apple.com/source/dyld/dyld-360.22/src/dyldAPIs.cpp

From that code (and all other versions available on opensource.apple.com), it looks like dlopen replicates the functionality of MATCH_FILENAME_BY_INSTALLNAME.  So I think we're good on that front.

The one wrinkle is that the Unix codepath would technically not be equivalent to the mac code we had before, as NSLookupSymbolInImage (called from GetSymbol) will only consider the particular library handle that we pass it, whereas dlsym will consider the image and any dependent images that would be loaded.  We'd need to use the (Mac-specific, I think) RTLD_FIRST flag to replicate NSLookupSymbolInImage's behavior, which doesn't seem like a bad idea.

But AFAICT from looking at the source code, using dlsym and using NSLookupSymbolInImage are completely equivalent here.  So please make the RTLD_FIRST change, and then if that passes testing, I think we're good to go.
Flags: needinfo?(nfroyd)
Comment hidden (mozreview-request)

Comment 23

2 years ago
Yes, I think as long as Firefox is not broken, we no longer care about breaking other DLLs and this should be ok to land.
Flags: needinfo?(benjamin)
(Assignee)

Comment 24

2 years ago
Looks like the remaining test failure are indeed intermittents. So I'm planing on landing this Thursday/tomorrow morning west coast time (so I can monitor how it's going).
You can land it now, and I'll monitor it.. This bug has been giving me grief for one week :)

Comment 26

2 years ago
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/b9b87a4102da
replace NSAddImage with dlopen on OSX. r=froydnj
Do you think it'd be ok to uplift this?
Flags: needinfo?(drno)
(Assignee)

Comment 28

2 years ago
I would guess/assume so. As it would be good to get the ability to debug aurora and beta with macos Sierra. If it would be my call I would let the patch settle for some time on Nightly before uplifting. But I think it should be modules owners call if this too risky or not.
Flags: needinfo?(drno) → needinfo?(nfroyd)

Comment 29

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/b9b87a4102da
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox52: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
(In reply to Nils Ohlmeier [:drno] from comment #28)
> I would guess/assume so. As it would be good to get the ability to debug
> aurora and beta with macos Sierra. If it would be my call I would let the
> patch settle for some time on Nightly before uplifting. But I think it
> should be modules owners call if this too risky or not.

Uplifting would be OK by me.
Flags: needinfo?(nfroyd)
(Assignee)

Comment 31

2 years ago
Comment on attachment 8794435 [details]
Bug 1305159: replace NSAddImage with dlopen on OSX.

Approval Request Comment
[Feature/regressing bug #]: The XPCOM code on OSX is/was using an API which has been deprecated since 2007. The original code is as old the first import from CVS.
[User impact if declined]: Developers on Mac OSX 10.12 can't start Firefox or any CPP unit test in lldb debugger.
[Describe test coverage new/current, TreeHerder]: No new tests, but every existing test which starts a Firefox binary on OSX should cover this.
[Risks and why]: There is a small risk that if something isn't working as expected the Firefox binary won't start anymore. But the new API's used through this patch are well known cross platform API's which should not cause problems. Further more I assume that something fundamental as startup failure would have gotten reported by now.
[String/UUID change made/needed]: N/A
Attachment #8794435 - Flags: approval-mozilla-aurora?
(Assignee)

Updated

2 years ago
Assignee: nobody → drno

Updated

2 years ago
status-firefox51: --- → affected
(Assignee)

Updated

2 years ago
status-firefox49: --- → affected
status-firefox50: --- → affected
Comment on attachment 8794435 [details]
Bug 1305159: replace NSAddImage with dlopen on OSX.

This patch helps dev debug in macos. Take it in 51 aurora.
Attachment #8794435 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 33

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/bcf7dee34cd6
status-firefox51: affected → fixed
status-firefox49: affected → wontfix
status-firefox50: affected → wontfix
You need to log in before you can comment on or make changes to this bug.