Closed Bug 1145980 Opened 9 years ago Closed 9 years ago

LSAN leaks when Gecko 39 merges to Beta

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(firefox39 fixed, firefox40 verified)

VERIFIED FIXED
Firefox 40
Tracking Status
firefox39 --- fixed
firefox40 --- verified

People

(Reporter: RyanVM, Assigned: Dexter)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak)

According to a couple recent Trunk-as-Beta Try simulations the following leaks are reported in mochitest-5, mochitest-bc1/bc3, mochitest-dt, and mochitest-oth.

https://treeherder.mozilla.org/logviewer.html#?job_id=5803344&repo=try

23:15:56 INFO - SUMMARY: AddressSanitizer: 643246 byte(s) leaked in 2852 allocation(s).
23:15:56 INFO - TEST-INFO | Main app process: exit 0
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at AllocateProtoAndIfaceCache, xpc::CreateGlobalObject, XPCWrappedNative::WrapNewGlobal, nsXPConnect::InitClassesWithNewWrappedGlobal
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_realloc, zone, js::Nursery::reallocateSlots, ReallocateSlots
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, js::RegExpStatics::create, js::GlobalObject::getRegExpStatics, js::frontend::Parser
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, js::frontend::CreateScriptSourceObject, js::frontend::CompileScript
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_new, HashChildren, js::PropertyTree::insertChild, js::PropertyTree::getChild
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, zone, js::gc::GCRuntime::tryNewTenuredObject, JSObject::create
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, js::ForOfPIC::createForOfPICObject, js::GlobalObject::getOrCreateForOfPICObject, js::ForOfPIC::create
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, js::XDRScript, js::XDRState
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, js::Nursery::allocateHugeSlots, js::Nursery::reallocateSlots
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, _ZL17NewStringDeflatedILN2js7AllowGCE1EEP12JSFlatStringPNS0_16ExclusiveContextEPKDsm, js::NewString
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, runtime, JS_malloc, XPCConvert::NativeData2JS
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, _ZL17NewStringDeflatedILN2js7AllowGCE1EEP12JSFlatStringPNS0_16ExclusiveContextEPKDsm, NewStringCopyZ
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_calloc, zone, JSScript::makeTypes, ensureHasTypes
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, entryCount, js::Shape::hashify, js::NativeObject::toDictionaryMode
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at mozilla::dom::Promise::Create, mozilla::dom::Promise::Then, then, mozilla::dom::PromiseBinding::then_promiseWrapper
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at Alloc, nsAString_internal::MutatePrep, nsAString_internal::SetCapacity, SetLength
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, moveElementsToTenured, moveObjectToTenured
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_realloc, zone, ExtractWellSized, FinishStringFlat
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, js::NewStringCopyNDontDeflate, AtomizeAndCopyChars
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, SetWeakMapEntryInternal, WeakMap_set_impl
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, zone, js::ProxyObject::New
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_calloc, pod_calloc, AllocScriptData, js::CloneScript
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, moveSlotsToTenured, moveObjectToTenured
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_realloc, zone, js::Nursery::reallocateSlots, ReallocateElements
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, js::frontend::CreateScriptSourceObject, js::CloneScript
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, js::Nursery::allocateHugeSlots, AllocateElements
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at Alloc, nsAString_internal::MutatePrep, nsAString_internal::ReplacePrepInternal, ReplacePrep
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, _ZL17NewStringDeflatedILN2js7AllowGCE0EEP12JSFlatStringPNS0_16ExclusiveContextEPKDsm, AtomizeAndCopyChars
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, js::RegExpStatics::create, js::GlobalObject::getRegExpStatics, js::RegExpObjectBuilder::clone
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, zone, AllocChars, JSRope::flattenInternal
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, zone, js::Nursery::allocateSlots, AllocateSlots
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, entryCount, js::Shape::hashify, js::Shape::search
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at Alloc, nsAString_internal::MutatePrep, nsAString_internal::SetCapacity, Length
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, zone, js::Nursery::allocateSlots, AllocateElements
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at Alloc, nsAString_internal::MutatePrep, nsAString_internal::SetCapacity, nsAString_internal::SetLength
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, runtime, js::MapObject::create
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, js::TypeNewScript::make, js::ObjectGroup::defaultNewGroup
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, CreateFunctionPrototype, js::GlobalObject::resolveConstructor
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, pod_malloc, js::NewStringCopyNDontDeflate, NewStringCopyN
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_malloc, js::NativeIterator::allocateIterator, js::GlobalObject::initIteratorClasses, js::InitIteratorClasses
23:15:56 WARNING - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at js_pod_calloc, pod_calloc, AllocScriptData, JSScript::partiallyInit
Summary: LSAN leaks when Gecko 39 merges to Beeta → LSAN leaks when Gecko 39 merges to Beta
It looks like the JS engine is just failing to shut down properly.  I have no idea what could cause that.
Blocks: LSan
Are there leaks showing up in debug builds, or only ASan?  ASan builds are opt builds, so maybe somehow shutdown leakup mechanisms are breaking.  Is there some kind of #ifdef variable for things that shouldn't be enabled in beta or release?  I could try looking at those, in the JS engine.
ASAN only. I'm going to kick off some runs to try to bisect this further.
I've got it narrowed down to this merge from fx-team. Will keep going...
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=599b84826bf3&tochange=df3daecd381f
Though the giant FHR/Telemetry unification patches immediately stand out as suspicious.
Confirmed that it's from the FHR mega-push. Will continue bisecting down to a specific bug.
http://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=a70924d1aa0c&tochange=b83c3fa8c80f
Component: JavaScript Engine → Client: Desktop
Product: Core → Firefox Health Report
First leaking rev:
http://hg.mozilla.org/integration/fx-team/rev/a410838a0f98

Regression from bug 1131544.
Blocks: 1131544
Flags: needinfo?(gfritzsche)
Note the bug 1131544 contains two changesets. I bisected by bug number, so http://hg.mozilla.org/integration/fx-team/rev/f3d186d21ca1 could also be to blame.
Hm, Andrew, do you have any idea on how we could narrow that down?
The relevant changeset [0] is JS only and i can't really tell here what might cause this.

http://hg.mozilla.org/integration/fx-team/rev/f3d186d21ca1
Flags: needinfo?(gfritzsche) → needinfo?(continuation)
(In reply to Georg Fritzsche [:gfritzsche] from comment #9)
> Hm, Andrew, do you have any idea on how we could narrow that down?
> The relevant changeset [0] is JS only and i can't really tell here what
> might cause this.

Is there anything in the code you wrote that would be different on beta?  Or that would be different in opt builds?  Failing that, I would guess that osfile.jsm or Task.jsm isn't handling shutdown properly in either of those situations (probably the latter).
Flags: needinfo?(continuation)
I don't see any differences for beta or opt builds for the direct code changes here.
Is there any way we can trace this down further to more specific suspects?
The primary difference from a build standpoint between trunk and beta is whether the RELEASE_BUILD ifdef is set or not. So features that are enabled on trunk behind an |ifndef RELEASE_BUILD| won't be enabled on beta.
Maybe narrowing it down to a particular test would help?  Is beta doing mochitest run-by-dirs?  If it is, then you can see which directory it is running that is leaking.

That bc3 run is leaking twice, in two separate runs that end with:
  toolkit/mozapps/extensions/test/browser/test-window/browser_updateid.js

We seem to be running the same test multiple times.  I have no idea why that is.

Ryan, do you have a link to the entire try run so we can look at the failure logs for the other tests?

There's nothing obviously related to telemetry behind a RELEASE_BUILD flag.
One possibility is that the code is doing some work late in shutdown, after the shutdown GC runs.  That doesn't explain why it is LSan only, but I did see some code not too long ago that had some JS that ran when an nsITimer fired, and sometimes the timer would fire late in shutdown, and cause a leak.  The fix was to shut down the timer explicitly earlier.
Right, thanks, i currently suspect some shutdown or init code behaving differently.
I didn't find more time this week, so i will have to follow up here next week and might ping you on IRC :)
Alessio is investigating for now.
Assignee: nobody → alessio.placitelli
Depends on: 1152828
That's the offending c-set: http://hg.mozilla.org/integration/fx-team/rev/f3d186d21ca1

I was able to reduce the offending code to the one included in this c-set: https://hg.mozilla.org/try/rev/fdd449238afe . We're not adding anything, just removing stuff which shouldn't really be causing the leak.

LSAN still leaks: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1a041ba262af
Digging a bit deeper into this issue, I found out why bug 1131544 is triggering the leak: the changeset from part 1 is adding a missing comma [1] after |asyncPluginInit: Preferences.get(PREF_ASYNC_PLUGIN_INIT, false)|: probably a merge artefact from bug 1120362. Without the comma, no leak is happening. This explains why my c-set uncovered the leak.

Memory seems to be leaking on environment changes triggered during those tests:

- layout/generic/test/test_selection_underline.html
- toolkit/mozapps/extensions/test/browser/browser_bug557956.js

Further tracing down the issue reveals that |TelemetryPing.doPing| [2] is keeping the payload object around somewhere, which is causing the mochitests to leak. Bug 1139460 rewrote that part and probably fixed the issue as I'm not able to reproduce the issue locally anymore.

[1] - https://hg.mozilla.org/mozilla-central/diff/357e9b9efd18/toolkit/components/telemetry/TelemetrySession.jsm#l1.169
[2] - https://hg.mozilla.org/mozilla-central/annotate/94fa5538015c/toolkit/components/telemetry/TelemetryPing.jsm#l536
Indeed, I did a run on April 10 that appears to not have the leaks. Can you please request uplift on bug 1139460?
Status: NEW → RESOLVED
Closed: 9 years ago
Depends on: 1139460
Resolution: --- → FIXED
Target Milestone: --- → Firefox 40
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #20)
> Indeed, I did a run on April 10 that appears to not have the leaks. Can you
> please request uplift on bug 1139460?

Bug 1139460 will be uplifted to 39 along with the other Telemetry fixes.
Status: RESOLVED → VERIFIED
Moving needinfo (in case this is still an issue?).
Flags: needinfo?(gfritzsche) → needinfo?(alessio.placitelli)
(In reply to Georg Fritzsche [:gfritzsche] from comment #23)
> Moving needinfo (in case this is still an issue?).

I've been talking to RyanVM about this, and it seemed to be fixed two weeks ago. Ryan, is this still an issue?
Flags: needinfo?(alessio.placitelli) → needinfo?(ryanvm)
Aurora40 isn't leaking in production, so let's just call it good.
Flags: needinfo?(ryanvm)
s/Aurora40/Beta39 of course :P
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.