Closed Bug 392009 Opened 13 years ago Closed 13 years ago

reduce code duplication between Linux and Windows trace-malloc

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dbaron, Assigned: brendan)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 6 obsolete files)

33.41 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
1.53 KB, patch
Details | Diff | Splinter Review
2.10 KB, patch
brendan
: review+
brendan
: approval1.9+
Details | Diff | Splinter Review
1.80 KB, patch
dbaron
: review-
Details | Diff | Splinter Review
17.80 KB, patch
brendan
: review+
brendan
: approval1.9+
Details | Diff | Splinter Review
We should reduce code duplication between Linux and Windows trace-malloc by using the MallocCallback, etc., functions that are currently used only on Windows for all platforms.

Note that there are a few things in the Linux ones that are not in the Windows ones, but should be (calls to NS_TraceStack), and we should probably also fix bug 392008 at the same time.
Blocks: aboutmemory
Blocks: 392008
No longer depends on: 392008
No longer blocks: aboutmemory
Mine, all mine I say! ;-)

/be
Assignee: dbaron → brendan
Attachment #277828 - Flags: review?(dbaron)
I reordered a big #if defined(XP_UNIX) && !defined(XP_MACOSX)...#endif #ifdef XP_MACOSX...#endif section to follow the standard-elsewhere XP_MACOSX first, then XP_UNIX, then XP_WIN32 test order. The first such #if/#elif chain ends with #else # error ... #endif. This makes the diff less readable, but only slightly.

/be
Attachment #277828 - Attachment is obsolete: true
Attachment #277921 - Flags: review?(dbaron)
Attachment #277828 - Flags: review?(dbaron)
I am trying to set up a Linux build, but if anyone with Linux could test this patch, I'd be grateful.

/be
Attachment #277921 - Attachment is obsolete: true
Attachment #277996 - Flags: review?(dbaron)
Attachment #277921 - Flags: review?(dbaron)
Now two threads may race through NS_TraceMallocDisable and/or NS_TraceMallocEnable and only one will unhook or hook as required. The other may race ahead and call malloc, etc., with tracing of that call enabled or disabled until the unhook or hook happens on the other thread. This seems good enough for MT scenarios, better than fighting to store 0 or 1 in tracing_enabled.

/be
Attachment #277996 - Attachment is obsolete: true
Attachment #277998 - Flags: review?(dbaron)
Attachment #277996 - Flags: review?(dbaron)
This makes the above patch compile on Linux.  Haven't tested yet.
dbaron, did you test that Linux patch? I'll apply an updated patch if that works for you.

/be
Status: NEW → ASSIGNED
Seems fine on Linux (except we might want to ignore one additional stack frame; not sure if that's the case on other platforms yet).  I got a crash on shutdown on Windows, though...
Never mind the crash on Windows on shutdown.  It was a typo in my review-comment-fix for bug 391141 that I've had in my tree for ages without testing it.

But on Windows we don't want to ignore an additional stack frame, so it's probably ok as-is, at least for now.
Seems fine on Mac as well, although there I'm getting two frames of stack more than needed.
Comment on attachment 277998 [details] [diff] [review]
weak MT consistency for tracing_enabled

r=dbaron with the build fixes in the patch I provided.  Sorry for the long delay here.
Attachment #277998 - Flags: review?(dbaron) → review+
Fixed, thanks dbaron:

tools/trace-malloc/lib/nsTraceMalloc.c 1.76
tools/trace-malloc/lib/nsTraceMallocCallbacks.h 1.5

If the skip parameters need tweaking, please feel free to do so and check in with rs=me.

/be
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
So it looks like this drastically changed the amount of leaks reported by leakstats (which processes the log generated by --trace-malloc) but didn't change what shows up in the log generated by --shutdown-leaks.
And I confirmed that the two numbers matched without the patch, and don't match with the patch.  (In my own build, which has the patches above, not what was checked in.)
Seems like the problem is that realloc(NULL, n) calls both the malloc callback and the realloc callback (and we're not suppressing one of them; not sure if we could suppress one).
This fixes the regression -- or at least almost all of it.  I still see a tiny difference in the numbers (I may have seen that before, too, though).
Attachment #282291 - Flags: review?(brendan)
Attachment #282291 - Flags: approval1.9?
I didn't see the tiny difference (a few thousand bytes) before.  Not sure what's up with that, yet, but I think the above patch is good.
Comment on attachment 282291 [details] [diff] [review]
patch to fix leak stats regression

>     start = PR_IntervalNow();
>+    /*
>+     * __libc_realloc(NULL, size) recurs into my_malloc_hook, so it's
>+     * important that we've incremented t->suppress_tracing here
>+     */

Blank line before major comment would be nice. r+a=me with that -- thanks for fixing this.

/be
Attachment #282291 - Flags: review?(brendan)
Attachment #282291 - Flags: review+
Attachment #282291 - Flags: approval1.9?
Attachment #282291 - Flags: approval1.9+
Landed above patch (with blank line and "." at end of sentence).
The numbers still match exactly on Mac.
... and on Windows (although they're much smaller there).
I expect the remainder of the discrepancy is related to races saving and restoring the hooks on different threads.  We need to acquire a lock (probably tmlock should be fine) before saving/restoring hooks -- and avoiding trying to re-enter the lock might be a little tricky.
dbaron, I didn't realize multiple threads were disabling and enabling (saving and restoring the hooks) -- how does that happen? Followup bug?

/be
The hooks themselves save and restore the hooks when calling the original function.  This can race with malloc on other threads.  (And things can also be a bit confusing if allocation functions call each other, which they do sometimes.)

I'd also note that the use of __libc_* isn't needed anymore since we're not overriding them, so __libc_malloc is now always the same as malloc.
Attached patch fix hook/unhook race (obsolete) — Splinter Review
I thought this would fix the remaining inconsistency, but it didn't.  I suspect it's needed anyway, though.
I can't figure out what the remaining problem is.

The leak numbers are still oscillating a good bit more randomly than they used to.

We could try the above patch, although something is still wrong.  I tend to think it's probably time to back out; I need to work on other things.
(What I do know is that the inconsistency comes from missing frees for pointers that are later reused.  leakstats counts that as a leak, but it doesn't show up in the sdleak.log since it overwrites the entry for that address.)
Now that I think about this, there's no reason I should expect that patch to work, since while it's holding the lock other threads won't hit the hook to wait to acquire it.
This should restore the consistent numbers. I hope it compiles -- I don't have a Linux box handy. Thanks for help compiling and testing.

/be
Attachment #282364 - Attachment is obsolete: true
Attachment #282480 - Flags: review?(dbaron)
Comment on attachment 282480 [details] [diff] [review]
linux uses weak-symbol data override and always traces malloc etc.

Didn't actually try it, but this isn't going to work.  It'll overflow the stack, since the hook API requires that a malloc hook cannot call malloc (which is the same as __libc_malloc) without unsetting itself as malloc-hook.  The problem has nothing to do with StartupHooker and ShutdownHooker; it's the race of the hook assignments in my_malloc_hook, etc., themselves.
Attachment #282480 - Flags: review?(dbaron) → review-
We need to go back to overriding malloc rather than using the hook APIs.  This gets back the correct numbers, and the overhead is pretty small (although not zero) when disabled, since it should hit the early tracing_enabled check (as it did before).
Attachment #282582 - Flags: review?(brendan)
Comment on attachment 282582 [details] [diff] [review]
fix hook/unhook race by going back to overriding malloc, etc.

Too bad we have to call PR_Initialized() -- I remember asking about it a while ago but I forget the details about why it is necessary.

/be
Attachment #282582 - Flags: review?(brendan)
Attachment #282582 - Flags: review+
Attachment #282582 - Flags: approval1.9+
One tiny tweak relative to the previous patch: tracing_enabled should default to 0 so that we don't do all the logging work before NS_TraceMallocStartup.  That used to be a good thing (we'd catch more mallocs), but if we're going to ship this by default we don't want to be doing that unless requested.  (This makes the behavior more similar to other platforms as well.)
Attachment #282744 - Flags: review?(brendan)
...er, and also fix the corresponding PR_ASSERT.
Attachment #282582 - Attachment is obsolete: true
Attachment #282744 - Attachment is obsolete: true
Attachment #282750 - Flags: review?(brendan)
Attachment #282744 - Flags: review?(brendan)
Attachment #282750 - Attachment is patch: true
Attachment #282750 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 282750 [details] [diff] [review]
fix hook/unhook race by going back to overriding malloc, etc.

Would like to assert that NSPR is initialized; maybe a later rev (or perhaps because I'm not using Linux these days, I don't care enough to follow up!).

/be
Attachment #282750 - Flags: review?(brendan)
Attachment #282750 - Flags: review+
Attachment #282750 - Flags: approval1.9+
OK, above patch checked in.
You need to log in before you can comment on or make changes to this bug.