Last Comment Bug 658995 - Trace malloc doesn't work on linux with standalone xpcom glue
: Trace malloc doesn't work on linux with standalone xpcom glue
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: All Linux
: -- normal (vote)
: mozilla7
Assigned To: Mike Hommey [:glandium]
:
Mentors:
Depends on:
Blocks: 552864
  Show dependency treegraph
 
Reported: 2011-05-23 07:50 PDT by Mike Hommey [:glandium]
Modified: 2011-06-17 07:39 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
part 1 - Properly divert memory allocation functions for trace malloc with standalone glue on Linux (4.04 KB, patch)
2011-05-23 08:01 PDT, Mike Hommey [:glandium]
dbaron: review+
Details | Diff | Splinter Review
part 2 - Use base::AtExitManager instead of atexit() (10.18 KB, patch)
2011-05-23 08:06 PDT, Mike Hommey [:glandium]
benjamin: review+
Details | Diff | Splinter Review
part 2 - Use static destructors instead of atexit() (5.31 KB, patch)
2011-05-24 04:44 PDT, Mike Hommey [:glandium]
benjamin: review+
Details | Diff | Splinter Review
part 1 - Properly divert memory allocation functions for trace malloc with standalone glue on Linux. (3.99 KB, patch)
2011-06-14 22:32 PDT, Mike Hommey [:glandium]
no flags Details | Diff | Splinter Review
part 2 - Use static destructors instead of atexit(). (5.95 KB, patch)
2011-06-14 22:33 PDT, Mike Hommey [:glandium]
no flags Details | Diff | Splinter Review
part 1 - Properly divert memory allocation functions for trace malloc with standalone glue on Linux. (4.27 KB, patch)
2011-06-16 00:43 PDT, Mike Hommey [:glandium]
no flags Details | Diff | Splinter Review

Description Mike Hommey [:glandium] 2011-05-23 07:50:35 PDT
This was discovered after bug 552864 landed. There are basically two problems:
- Some libxul functions are registered with atexit(), and that is bound to fail after libxul is unloaded. Somehow, this is not a problem on windows and mac.
- malloc/free/... functions aren't diverted with the ones from nsTraceMalloc.c because they are already resolved to libc's when libxul is loaded.
Comment 1 Mike Hommey [:glandium] 2011-05-23 08:01:20 PDT
Created attachment 534438 [details] [diff] [review]
part 1 - Properly divert memory allocation functions for trace malloc with standalone glue on Linux

This makes use of libc's malloc/free/... functions until libxul is loaded, and reverts to these after libxul is unloaded.
When libxul is loaded, use libxul's functions
Comment 2 Mike Hommey [:glandium] 2011-05-23 08:06:11 PDT
Created attachment 534440 [details] [diff] [review]
part 2 - Use base::AtExitManager instead of atexit()

Use base::AtExitManager it is possible to avoid problems that would arise with atexit() after libxul is unloaded.
The RegisterAtExitCallback is required in nsTraceMalloc.c because we can't call base::AtExitManager::RegisterCallback directly from C code.
We need to initialize an AtExitManager before NS_LogInit() in NS_InitXPCOM2 because NS_LogInit ends up trying to register a callback. We also add an AtExitManager in XRE_main because the profile lock registers a callback too, and it does so before NS_InitXPCOM2. Then, the one in NS_InitXPCOM2 becomes redundant, but we have a bunch of executables that do NS_InitXPCOM2 without ever calling XRE_main.
Comment 3 Mike Hommey [:glandium] 2011-05-23 08:11:35 PDT
Pushed to try:
http://tbpl.mozilla.org/?tree=Try&rev=a8522d54f80b
Comment 4 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2011-05-23 09:33:37 PDT
Under what conditions do we use standalone XPCOM glue?  i.e., what cases does this bug affect?

When does the AtExitManager stuff run (relative to when main() exits)?  For some of this, we really want it to run after main() is done, or at the very least after XRE_main() is done.

Casting function pointers is a sign that something is wrong.  Why do you need to do that in nsStandardURL.cpp in patch 2?
Comment 5 Mike Hommey [:glandium] 2011-05-23 09:48:21 PDT
(In reply to comment #4)
> Under what conditions do we use standalone XPCOM glue?  i.e., what cases
> does this bug affect?

bug 552864 switches Firefox to use the standlone XPCOM glue to allow preloading of the libraries (bug 	632404) for massive cold startup improvement.

> When does the AtExitManager stuff run (relative to when main() exits)?  For
> some of this, we really want it to run after main() is done, or at the very
> least after XRE_main() is done.

The way the AtExitManager stuff works, it runs the callbacks when destroying the AtExitManager instance. In the most common case, it means when exiting XRE_main.

> Casting function pointers is a sign that something is wrong.  Why do you
> need to do that in nsStandardURL.cpp in patch 2?

Because AtExitManager callbacks take a void * parameter we don't use. I could modify the callbacks to take one, though.
Comment 6 Mike Hommey [:glandium] 2011-05-23 09:51:44 PDT
(In reply to comment #5)
> The way the AtExitManager stuff works, it runs the callbacks when destroying
> the AtExitManager instance. In the most common case, it means when exiting
> XRE_main.

I'm wondering if using a static initializer wouldn't make more sense, actually...
Comment 7 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2011-05-23 10:51:02 PDT
Comment on attachment 534440 [details] [diff] [review]
part 2 - Use base::AtExitManager instead of atexit()

Ugh, more chromium code we'll have to rip out later :-(
Comment 8 Mike Hommey [:glandium] 2011-05-23 11:20:22 PDT
Apparently, MSVC doesn't like what chromium-config.mk adds:

nsDebugHelpWin32.cpp

e:/builds/moz2_slave/try-w32-dbg/build/tools/trace-malloc/lib/nsDebugHelpWin32.cpp(73) : error C2664: 'LoadLibraryW' : cannot convert parameter 1 from 'const char [12]' to 'LPCWSTR'

        Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast

e:/builds/moz2_slave/try-w32-dbg/build/tools/trace-malloc/lib/nsDebugHelpWin32.cpp(275) : error C2664: 'lstrcmpiW' : cannot convert parameter 1 from 'const char *' to 'LPCWSTR'

        Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
Comment 9 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-05-23 14:23:03 PDT
chromium-config.mk defined UNICODE ... you can just change those calls from ::Foo() to :FooA.
Comment 10 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2011-05-23 15:13:41 PDT
(In reply to comment #5)
> > Casting function pointers is a sign that something is wrong.  Why do you
> > need to do that in nsStandardURL.cpp in patch 2?
> 
> Because AtExitManager callbacks take a void * parameter we don't use. I
> could modify the callbacks to take one, though.

Please do.  That's much better than casting function pointers and making assumptions about calling conventions.
Comment 11 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2011-05-23 15:40:12 PDT
Comment on attachment 534438 [details] [diff] [review]
part 1 - Properly divert memory allocation functions for trace malloc with standalone glue on Linux

ok, r=dbaron
Comment 12 Mike Hommey [:glandium] 2011-05-24 04:44:18 PDT
Created attachment 534730 [details] [diff] [review]
part 2 - Use static destructors instead of atexit()

This one doesn't rely on base::AtExitManager at all. It relies on static destructors.
While this approach works very well, it's however not very obvious when reading the code what it is doing. I'd like to find a way to make that into nice templates or macros so that documentation could lie in a single location, but that might not be possible at all.
Comment 13 Mike Hommey [:glandium] 2011-05-24 05:06:16 PDT
Corresponding try build:
http://tbpl.mozilla.org/?tree=Try&rev=900ee2027179
Comment 14 Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] 2011-06-07 07:55:42 PDT
Comment on attachment 534730 [details] [diff] [review]
part 2 - Use static destructors instead of atexit()

Ugh, but ok.
Comment 15 Mike Hommey [:glandium] 2011-06-14 22:32:15 PDT
Created attachment 539435 [details] [diff] [review]
part 1 - Properly divert memory allocation functions for trace malloc with standalone glue on Linux.

Refreshed to apply before bug 632404
Comment 16 Mike Hommey [:glandium] 2011-06-14 22:33:22 PDT
Created attachment 539436 [details] [diff] [review]
part 2 - Use static destructors instead of atexit().

As will land (added comments)
Comment 18 Matt Brubeck (:mbrubeck) 2011-06-15 08:50:29 PDT
I merged this from m-i to m-c, but it will be backed out shortly while we investigate Talos regressions:
http://hg.mozilla.org/mozilla-central/rev/6e230988614f
http://hg.mozilla.org/mozilla-central/rev/6b23b52e68d6
Comment 19 Mike Hommey [:glandium] 2011-06-16 00:43:01 PDT
Created attachment 539739 [details] [diff] [review]
part 1 - Properly divert memory allocation functions for trace malloc with standalone glue on Linux.

I figured what went wrong with the leak reports: posix_memalign wasn't diverted. I thus added posix_memalign and cfree, which are the two functions that nsTraceMalloc.c exports and that we weren't. I don't think we use cfree from anywhere, but at least, we on par with what nsTraceMalloc.c diverts. The interdiff with the previous version is simple enough (and mostly copied from nsTraceMalloc.c) that I don't need review.

Note You need to log in before you can comment on or make changes to this bug.