Closed Bug 1615072 Opened 5 years ago Closed 5 years ago

Leak checking for non-default processes not working on OSX

Categories

(Core :: XPCOM, defect)

73 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla75
Tracking Status
firefox-esr68 --- unaffected
firefox73 --- wontfix
firefox74 + fixed
firefox75 + fixed

People

(Reporter: mccr8, Assigned: froydnj)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression, Whiteboard: [MemShrink:P1])

Attachments

(1 file)

As part of investigating a leak, I wrote a patch that adds an intentional leak to the main process and content processes. Except, when I run the browser on OSX, it only reports leaks for the main process.

It seems like the problem is that the names of the leak logs are not being set properly on OSX. The log spew contains lots of this:

### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to (some stuff)/runtests_leaks.log

Now, some of those are supposed to have file names more like runtests_leaks_tab_pid91111.log. I think the file names are messed up and every process is dumping its leak log onto a single file.

The good news here is that judging by logs on TreeHerder, things are working fine on Linux and Windows, so hopefully there's not some hideous backlog of leaks that have crept in since this broke.

I haven't figured out exactly what has gone wrong yet. Bug 1604084 is the last major change to this file, but at first glance it feels like the only thing that could cause this is XRE_IsParentProcess() somehow returning the wrong value so I don't know what might be going on here.

Also, it would be nice to have some regression testing for whatever the issue is. I'm not sure if the log file is being completely overwritten, or appended to. If it is the latter, I have a bug on file for that issue already. But from some local testing, maybe it is being overwritten?

It does look like the issue is that XRE_IsParentProcess() is always returning true...

According to my logging, it looks like XRE_SetProcessType() is getting called after InitLog(), so IsParentProcess is returning true.

It looks like we're calling MOZ_COUNT_CTOR in the ctor for nsTArray_base, before XRE_SetProcessType() is called.

Here's the stack, which does not really help me figure out what is going on:

#01: InitTraceLog()[/Users/andrewmccreight/mc/obj-dbg.noindex/toolkit/library/build/XUL +0xabe78]
#02: NS_LogCtor[/Users/andrewmccreight/mc/obj-dbg.noindex/toolkit/library/build/XUL +0xacf2a]
#03: mozilla::DeadlockDetector<mozilla::BlockingResourceBase>::Add(mozilla::BlockingResourceBase const*)[/Users/andrewmccreight/mc/obj-dbg.noindex/toolkit/library/build/XUL +0x14c38f]
#04: mozilla::RWLock::RWLock(char const*)[/Users/andrewmccreight/mc/obj-dbg.noindex/toolkit/library/build/XUL +0x155b10]
#05: __cxx_global_var_init.293[/Users/andrewmccreight/mc/obj-dbg.noindex/toolkit/library/build/XUL +0x171fa1c]

It looks like the string passed into RWLock is "rlbox", so maybe this is a regression from bug 1566288?

Flags: needinfo?(nfroyd)

The rlbox thing is behind an #ifdef XP_MACOSX, which would explain why this is OSX only.

Has Regression Range: --- → yes

[Tracking Requested - why for this release]: We might be leaking in content processes on OSX and not know it.

Sorry, I will fix this. I think I thought about the problem and/or thought about fixing it in a followup, but completely failed to do that.

Assignee: nobody → nfroyd
Flags: needinfo?(nfroyd)

This change is a little gross, because we don't totally control where
the statically initialized instances of RWLock live, due to its uses
in third-party libraries.

Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4b0bed287f1a don't trigger the deadlock detector from static initializers for RWLock; r=mccr8
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

Please nominate this for Beta approval when you get a chance.

Flags: needinfo?(nfroyd)

Comment on attachment 9126760 [details]
Bug 1615072 - don't trigger the deadlock detector from static initializers for RWLock; r=mccr8

Beta/Release Uplift Approval Request

  • User impact if declined: None. But our beta builds won't be able to detect leaks for OS X, which seems bad.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Changing code in a not-frequently used area of the code, and the code changes are very similar to ones we have elsewhere for e.g. StaticMutex.
  • String changes made/needed: None.
Flags: needinfo?(nfroyd)
Attachment #9126760 - Flags: approval-mozilla-beta?

Comment on attachment 9126760 [details]
Bug 1615072 - don't trigger the deadlock detector from static initializers for RWLock; r=mccr8

Fixes broken leak checking on macOS. Approved for 74.0b4.

Attachment #9126760 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Do you have a link to the original intentional leak patch, so I could do a try push with that?

(Also, holy speedy approval, Batman!)

Flags: needinfo?(continuation)

(In reply to Nathan Froyd [:froydnj] from comment #15)

Do you have a link to the original intentional leak patch, so I could do a try push with that?

You can add this to BackstagePass::BackstagePass() (I'm not sure the horrible pointer arithmetic is actually necessary):
NS_LogCtor(((char*)this) + 100, "WayBackstagePass", sizeof(*this));

Note that you'll still get leaks with this, but some should be "leakcheck | tab" in addition to having them "leakcheck | default" (which is how they show up in the main process). You could probably guard the NS_LogCtor call to only happen if we're not in the main process to cut down on the noise.

Flags: needinfo?(continuation)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: