Closed Bug 932781 Opened 11 years ago Closed 11 years ago

Trunk trees closed due to OOMs from leaking hundreds of DOMWindows

Categories

(Firefox :: General, defect)

x86
Windows 7
defect
Not set
blocker

Tracking

()

RESOLVED FIXED
Firefox 28
Tracking Status
firefox27 + fixed
firefox28 + fixed

People

(Reporter: emorley, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak, Whiteboard: [see comment 11 for requirements for reopening][qa-])

Attachments

(1 obsolete file)

Still trying to piece together the full story from the IRC logs:

03:22	philor	khuey: remember when you fixed The OOM from storage threads? could you do that again, for whatever it is now, please?
03:23	khuey	philor: oh boy
03:23	khuey	philor: that was a long time ago ... don't really remember that
03:23	khuey	got a bug #?
03:24	romaxa	glandium: which revision did you use for https://bugzilla.mozilla.org/attachment.cgi?id=823804 ?
03:25	philor	khuey: it was 18/19, yeah, and I don't have any reason to believe it's storage again, but the current inbound closure, bug 929359, and bug 932159 where I hid asan browser-chrome are all about the way we're constantly on the edge of OOM now
03:25	khuey	oh, hmm
03:25	khuey	is this the thing were we weren't shutting down the threads properly?
03:26	philor	it was, yeah, we were keeping 300 threads alive through the life of the test
03:26	khuey	yeah
03:27	khuey	802239
03:27	khuey	I believe
03:27	khuey	firebot: bug 802239
03:27	firebot	Bug https://bugzilla.mozilla.org/show_bug.cgi?id=802239 maj, --, mozilla19, khuey, RESO FIXED, mozStorage leaks one thread per connection until shutdown
03:29	philor	"we're done with looking at OOM-destroyed runs" - so brave, so foolish, so unaware of his future
03:31	khuey	philor: and I guess ASAN isn't kind enough to print out thread stacks when it decides to kill the browser?
03:33	khuey	philor: there's also the question of why ASAN is trying to allocate 51 GB of memory ...
03:33	khuey	er, sorry
03:33	khuey	32
03:33	khuey	MB
03:33	khuey	ok that's more reasonable
03:34	khuey	although still a lot ...
03:35	philor	alas, no, asan doesn't want to print anything, because it doesn't know anything's wrong, all it sees is a mozapps/extensions/ test start, and then timeout, and then another start, and then timeout, etc.
03:37	khuey	philor: fun
03:37	khuey	philor: do we have any idea what's going on?
03:37	Hughman	khuey: that i can do
03:38	khuey	njn: is 932159 related to the OOM stuff you were looking at?
03:39	njn	khuey: ASAN? doesn't sound related
03:39	khuey	njn: well it's an OOM
03:39	njn	khuey: bug 929359 is the one I've looked at, it's windows debug mochi-2 failures
03:40	khuey	right
03:40	njn	khuey: and AFAICT, we've been slightly OOMing there for some time, and a minor change just pushed us over the edge... so it seems unlikely to affect elsewhere(?)
03:40	philor	I'd like to call the asan b-c OOM the leading indicator for b-c OOMs to come, but, the tree is closed for b-c OOM so I guess it's not really "to come"
03:41	* philor looks at what we backed out for it, double-takes
03:42	philor	every single bit of that is Qt
03:42	philor	we backed out a pure-Qt patch for Windows browser-chrome OOM?
03:42	njn	woo
03:42	* philor lights a candle, sacrifices a chicken, and looks for a goat
03:43	KWierso	philor: yeah
03:43	philor	people-waiting-for-inbound-to-open: go to bed or go get drunk, it ain't gonna happen
03:44	glandium	philor: it might in 22 minutes
03:45	glandium	romaxa: you can get 68c4d885d6dd from try
03:45	froydnj	I feel like that belongs on a meme somewhere
03:46	philor	if backing out that patch causes Win7 debug browser-chrome to not OOM, I can solve all of our testing capacity problems
03:46	philor	stop running tests, buy magic 8-balls
03:46	philor	!8ball did I pass tests?
03:46	firebot	philor: Of course.
03:46	philor	sweet, ship me
03:47	khuey	lol
03:47	glandium	philor: wait, it was ooming?
03:48	khuey	lol
03:50	philor	glandium: which was? the inbound Win7 debug b-c closure? (NS_ERROR_FAILURE) [nsIZipReader.open], one of our common OOM markers, yeah
03:50	philor	asan b-c? yeah
03:50	khuey	it's kind of impressively how thoroughly we die on oom
03:51	khuey	assertions, crashes, js exceptions
03:51	khuey	everything
03:51	philor	win7's m2 leak? yeah, it's OOM from the cycle collector
03:51	khuey	hmm
03:51	khuey	we do have 700 DOM windows open
03:51	khuey	that is not good
03:52	bz	That bug sounds like for some reason the popup is using the parent's refresh driver or something????
03:52	glandium	philor: asan b-c is hidden
03:52	philor	in fact, right after zipreader fails, "ASSERTION: Ran out of memory while building cycle collector graph"
03:52	bz	Oh, are these animated gifs?
03:52	bz	OK
03:52	khuey	we are clearly GCing
03:52	philor	glandium: yeah, I hid it because it got a case of the permaorange, decoder tried to fix it by making it use less memory, it's picked up another incomprehensible failure during the day today, I haven't looked at the new one
03:53	khuey	18:05:02 INFO - --DOMWINDOW == 686 (3DC59B38) [serial = 18590] [outer = 5D391770] [url = about:blank]
03:53	khuey	so wtf
03:53	khuey	how did we accumulate 700 windows
03:53	njn	khuey: it goes over 1000 sometimes
03:53	khuey	philor: is this bc thing windows 8 only?
03:53	bz	in a test run?
03:53	khuey	bz: yeah
03:53	philor	khuey: win7 only
03:53	bz	tests end up with tons of windows....
03:53	* bz is not quite sure how
03:53	bz	"not enough gc"
03:53	khuey	er
03:53	khuey	yeah
03:53	philor	as is the mochitest-2 thing
03:53	khuey	win7 only
03:54	njn	khuey: ~800 is the threshold for the win7 debug mochi-2 failure
03:54	roc	could some test be leaving a window open
03:54	njn	khuey: below that, we're usually ok, above that we fail
03:54	dolske	I will laugh if test are slow because they're swapping. :)
03:55	philor	yeah, maybe that's why Mac debug is surviving, but taking twice as long as it did on 18
03:55	philor	though it does, come to think of it, have a very frequent mochitest-2 shutdown timeout...
03:56	khuey	bz: 700 DOM windows that we don't optimize out of the CC graph seems like the kind of thing that would cause a CC oom
03:57	njn	philor: my ideas so far haven't worked, so now I'm trying two of them in tandem
03:57	njn	philor: oh, and I'm facing East this time
03:57	roc	Looks like on Mac we hit 638 DOMWindows
03:57	roc	in b-c
03:57	roc	or thereabouts
03:57	njn	roc: it probably varies significantly
03:57	philor	njn: pour a glass of rum for Jambo
03:57	njn	roc: on mochi-2, I've seen it range from ~600 to ~1050
03:57	roc	njn: regardless, this number is way way way too large
03:58	philor	the Mac m2 shutdown timeout I'm looking at happens just after we finish GCing 704 domwindows
03:58	njn	roc: indeed; I'd love it if someone fixed that instead of me trying to tweak my pldhash tweaks
03:58	dolske	http://i.imgur.com/cjzOhdO.jpg
03:58	roc	either there are a lot of windows actually open, in which case tests need to be fixed, or we have a leak, in which case code needs to be fixed
04:00	khuey	looks like the number skyrockets during the pb tests
04:01	roc	Linux 64 debug has 661 windows
04:01	roc	let's see if I can reproduce this
04:02	khuey	hmm, or maybe not
04:03	khuey	maybe it is just a slow increase
04:03	roc	actually Linux 64 debug peaks at over 900 windows in the log I looked at
04:05	khuey	roc: which one are you looking at?
04:05	khuey	oh
04:05	khuey	bc and oth are not the same thing anymore
04:05	khuey	I'm looking at the wrong type of log
04:06	njn	in mochi-2, test_Range-deleteContents.html seems to open lots of windows
04:06	roc	I looked through the mochitest and reftest logs for Linux64 debug. The window counts look reasonable except for mochitest-2 and browser-chrome
04:06	khuey	so almost all of these late windows are chrome://global/content/mozilla.xhtml
04:06	* khuey wonders wtf that is
04:07	khuey	is that really the about:mozilla page?
04:07	khuey	wtf
04:07	njn	test_Range-extractContents.html, too
04:07	khuey	aha
04:07	njn	khuey: you mentioned test_Range previously, IIRC
04:07	khuey	and a ton of devtools stuff
04:08	khuey	https://khuey.pastebin.mozilla.org/3378934
04:08	khuey	that's a successful bc run
04:08	khuey	notice all the devtools stuff not getting GCd until the end
04:08	njn	is the devtools pane being opened and then not closed?
04:09	khuey	well the number of windows seems to suggest no
04:09	khuey	since that would be 50 or more devtools pages
04:09	khuey	*panes
04:09	philor	and we're still orange on the tip
04:09	glandium	khuey: as well as about:mozilla and about:blank
04:09	khuey	glandium: yeah ...
04:09	philor	so the world isn't quite completely insane, that's nice
04:10	njn	in mochi-2 shutdown, iframe-between-tests.html shows up a lot
04:10	khuey	njn: the good news is if I can reproduce locally I can use my "dump the CC graph and then conditionally breakpoint on release" trick
04:11	glandium	heh, it *is* intermittent
04:11	njn	khuey: tons of Range-test-iframe.html#[blah blah blah]
04:11	glandium	2 greens out of 5 on m-i tip
04:13	njn	yah, those range tests look like they're killing mochi-2
04:13	philor	so we just disable Ms2ger/* and devtools/*, and we're good?
04:14	glandium	disable all the tests
04:14	philor	oh, yeah
04:14	philor	since we actually *did* disable devtools/* for the storage threads thing, and just wound up OOM elsewhere
04:16	njn	glandium: just disable the range tests, for mochi-2
04:20	roc	in b-c, some of the leaked windows are coming from browser/base/content/test/social/browser_social_activation.js
04:20	roc	I suspect via http://lxr.mozilla.org/mozilla-ce...ents/social/FrameWorker.jsm#195
04:22	glandium	roc: haha, social
04:22	* philor should have blamed markh from the start
04:23	khuey	njn: if I run the devtools tests locally I get assertions about extra shutdown CCs
04:23	* markh ducks
04:23	khuey	anyways, lunch time
04:23	roc	well
04:23	roc	devtools leaks a lot too
04:37	* njn finishes the shortest MemShrink report yet
04:51	njn	I like how somebody retriggered win7-mochi-2 twice in the most recent m-i push to get it to fail
04:52	njn	"oh, you think you can pass, hey? Take that! Er... take that! Yeah, thought so."
04:52	nigelb	njn: heh
04:53	nigelb	Isn't that normal? I mean, I've seen sheriffs do that.
05:06	philor	I like big retriggers
05:07	philor	thus the multiple tens I've got spread around right now
05:10	khuey	njn: so after running the inspector tests I have 96 surviving nsGlobalWindows
05:11	njn	khuey: is that higher or lower than expected?
05:11	khuey	njn: well if we weren't leaking we'd expect ~12
05:12	njn	khuey: I see
05:12	njn	khuey: is this relevant for the bc OOMs?
05:12	njn	khuey: because the mochi-2 OOMs seem to be range-related
05:12	khuey	njn: right
05:37	nigelb	philor: what's going on with inbound now? are we waiting for the tests to pass after the backout or something?
05:38	philor	nigelb: no, the backout did nothing, we're waiting to see whether any of my deeper and wider retriggers show anything
05:38	nigelb	Ah, ok
06:01	glandium	njn: do you know much about the gdbserver feature of valgrind?
06:01	njn	glandium: very little, unfortunately
06:01	njn	why do you ask?
06:02	glandium	njn: because sewardj is not around and because valgrind doesn't allow me to attach when it hits invalid reads
06:03	dholbert	KWierso, not that you should still be working, but on the off chance that you are: any idea about the prospects for inbound reopening? :)
06:03	njn	doesn't allow? how
06:03	njn	?
06:03	KWierso	dholbert: 302 philor :)
06:03	philor	dholbert: see above, where I recommended either going to bed or getting drunk, your choice
06:04	dholbert	philor, I suspected as much. :) sounds like a plan
06:04	philor	funzors, our first OOM in browser_dragdrop.js was October 1st
06:04	KWierso	I suggest both options
06:05	philor	this will make it more difficult to find a truly all-green push only on inbound to revert to
06:05	njn	philor: good thing we branched yesterday
06:05	* philor backspaces repeatedly
06:06	philor	yeah, that would rather be the story of our branching lives
06:06	Mook	may I suggest 9b2a99adc05e?
06:07	philor	heh
06:07	glandium	njn: it doesn't leave me time to do it
06:08	philor	bwc has absolutely no idea where I live, right?
06:08	philor	let's keep it that way
06:08	philor	because I'm starting to like the idea of backing him out for the second day in a row
06:09	glandium	njn: forget it, i'm dumb
06:14	philor	actually, the right parallel probably isn't the storage threads, it's the Win PGO linker memory
06:15	philor	somebody's going to have to fix this, it won't be quick, and in the meantime we can probably land approved patches that cause us to use less memory, and nothing else
06:16	philor	sweet, got one leak on m-c tip that's probably from a CC OOM
06:16	glandium	philor: so, can i land build system changes? :)
06:16	philor	glandium: no, you may never land another thing
06:16	glandium	boohoo
06:16	philor	no, on second thought, you can land build system changes that cause us to not build things
06:17	glandium	philor: i can land build system changes that cause us to not test things ;)
06:18	philor	leaked mozapps/extensions/ stuff, strike one
06:18	philor	ASSERTION: Extra shutdown CC
06:19	philor	and the stuff that appears to be killing asan, wonder what decoder's try parent actually was
06:21	philor	but a piddly 658 domwindows
06:23	philor	so, did the first thing on inbound above the last merge to central push us over the cliff?
06:23	philor	or did bwc?
06:24	philor	or are we already over the edge on central and running on air without realizing we're about to fall?
06:40	capella	the sky is falling?
06:43	njn	philor: I've got some ACME brand rockets in a wooden box over here
06:43	njn	just strap one of those to your back
06:43	philor	njn: coincidence, I've got some ACME roller skates!
06:49	philor	does anyone have a persuasive reason why m-c and fx-team should stay open?
06:49	khuey	no
06:50	bz	leprechauns?
06:50	khuey	bah no roc
06:50	philor	guess I'll file something with a crappy comment 0, to have something to point at in the closure message
06:51	bz	"leprechauns" sounds like a good comment 0
06:51	khuey	bz: are you in CA?
06:51	khuey	or just awake very late? :-P
06:51	bz	Latter
06:51	* bz was debugging some matlab code and rocking a baby to sleep
06:51	bz	She's finally asleep
06:51	bz	now to emulate her.... ;)
06:52	bz	The good news is my patches are green on try and have reviews
06:52	bz	The bad news....
06:52	glandium	bz: yeah, i have 10 build system patches waiting to land too
06:53	glandium	that are green on try
06:53	khuey	glandium: philor probably hasn't remembered to close b-s
06:54	bz	aurora is open, I bet!
06:54	philor	don't think it's even in treestatus
06:54	glandium	khuey: i don't want to have to merge b-s to m-c or m-i afterwards. I don't think it's in a state to be merged gracefully.
06:55	philor	yeah, aurora's oddly green, despite having every reason to have permaorange asan b-c, and nearly permaorange win7 debug m2 and b-c
06:55	bz	All these worries about whether the code will compile after merging....
06:55	khuey	glandium: have we even used b-s in the last 6 months?
06:55	bz_sleep	It's clearly bedtime. ;)
06:55	glandium	khuey: it was used last month
06:55	khuey	ok
06:56	glandium	khuey: i'd rather not pollute m-c with its history
06:56	khuey	oh what did we do to it?
06:57	glandium	khuey: probably nothing shocking, but i hate to merge separate branches that add nothing just for fun
06:58	glandium	that being said, i really don't see why build system only changes couldn't get approval to land on m-i
06:58	glandium	philor: ^
06:59	philor	glandium: if you want to take over, just say the word
06:59	glandium	philor: take over what?
06:59	philor	it's midnight, and I'm typing up a bug that I totally don't understand because nobody else will
07:00	philor	if you want to sheriff the trees, and retrigger enough to verify that you haven't fucked a fucked tree even worse than it's already fucked, because you just have to get something landed and out of your tree, take over
07:00	philor	all of it
07:00	philor	good night, fuck this
07:01	khuey	so I fixed one path where the devtools tests are leaking
07:01	khuey	but it didn't help and I can't find any more :-(
...
07:35	roc	khuey: did you get anywhere with the leak?
07:35	khuey	roc: well I fixed a couple things
07:35	khuey	still dealing with more ...
07:35	roc	cool
07:37	roc	maybe tests should fail if we hit 500 DOMWindows or something like that
07:38	khuey	roc: we really need to hueyfix chrome->chrome stuff
07:43	Tomcat|sheriffduty	so the leaks are stopping us from reopen inbound right
07:44	khuey	yes
07:44	Tomcat|sheriffduty	like bug 929359 and friends
07:44	Tomcat|sheriffduty	thx khuey was trying to get a picture what happened over hte night here
07:45	khuey	bad things :-)
07:45	* khuey is poking at some leaks now
...
10:21	Tomcat|sheriffduty	wow leaked 273 DOMWINDOW(s) on the last run from m-c before the closure
10:22	khuey	well that's cause the cycle collector started working
10:22	khuey	Tomcat|sheriffduty: ok, so your first mission is to disable the testRange tests from m2 again
Depends on: 929359
Depends on: 932159
Some bits from bug 929359:

(In reply to Nicholas Nethercote [:njn] from comment #72)
> So, it appears that we're really close to OOM (or even hit it, benignly) at
> shutdown on Windows Mochitest-2.  In fact, on successful runs you can see
> "Ran out of memory in ScanRoots".  The threshold appears to be about 800 DOM
> windows;  if we have more than that at shutdown, we probably fail.  If we
> have fewer, we probably succeed (though there's some noise around 790--800).
> 
> And my changes in bug 927705 may have pushed us into problem territory. 
> Part 2b is the obvious candidate, because it changed things so that we don't
> overload pldhashes when we can't grow them.
> 
> So I tried reverting that change and pushing to try, but I'm still getting
> failures most of the time.  Um.
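
The threshold described above can be checked mechanically against a test log. The following is a minimal sketch, not the harness's actual code: the function names are hypothetical, the default of 800 is taken from the comment above, and it assumes window-creation lines use a `++DOMWINDOW == N` form mirroring the `--DOMWINDOW == N` line quoted in the IRC log. It uses the peak reported count as a proxy for the count at shutdown.

```python
import re
from typing import Iterable

# Matches creation ("++") and destruction ("--") lines. The number after
# "==" is the live-window count the debug build reports on that line.
# The "++" form is an assumption; only a "--" line appears in this bug.
DOMWINDOW_RE = re.compile(r"(?:\+\+|--)DOMWINDOW == (\d+)")


def peak_window_count(lines: Iterable[str]) -> int:
    """Return the largest live-window count reported anywhere in the log."""
    peak = 0
    for line in lines:
        match = DOMWINDOW_RE.search(line)
        if match:
            peak = max(peak, int(match.group(1)))
    return peak


def exceeds_threshold(lines: Iterable[str], threshold: int = 800) -> bool:
    """True if the log ever reports more than `threshold` live windows."""
    return peak_window_count(lines) > threshold
```

A run whose log peaks above the threshold would be flagged as likely to fail at shutdown; the threshold itself is only the approximate figure observed on Win7 debug mochitest-2.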

(In reply to Nicholas Nethercote [:njn] from comment #112)
> I tried reverting all the patches from bug 927705 and I still get frequent
> failures:  https://tbpl.mozilla.org/?tree=Try&rev=40a150e43dbb.  Hmm.  Let's
> hope that my other idea (to resurrect bug 815467) has better luck.

(In reply to Nicholas Nethercote [:njn] from comment #113)
> Actually, those failures are the ones in bug 919856, which are slightly
> different to this bug's failures.  I suspect it's not a coincidence that we
> stopped getting those ones just before we started getting these ones.  Those
> ones weren't as common, though.

(In reply to Nicholas Nethercote [:njn] from comment #114)
> Resurrecting bug 815467 didn't really help: 
> https://tbpl.mozilla.org/?tree=Try&rev=bbae81bb572d.  Well, I got 11
> failures out of 21 runs, which is a slightly better rate than I got on
> previous runs.. could be real, could be noise.
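
Whether 11 failures out of 21 runs is "real" or "noise" can be gauged with a confidence interval on the failure rate. This is a minimal sketch using the standard Wilson score interval; the function name is hypothetical and the 95% level (z = 1.96) is an assumption, not something stated in the bug.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a failure rate observed as failures/runs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = failures / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half_width = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(11, 21)
print(f"failure rate plausibly between {low:.2f} and {high:.2f}")
```

With only 21 runs the interval is wide (roughly 0.32 to 0.72), so a modest change in failure rate between two try pushes cannot be distinguished from noise without many more retriggers.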

(In reply to Andrew McCreight [:mccr8] from comment #116)
> (In reply to Nicholas Nethercote [:njn] from comment #113)
> > Actually, those failures are the ones in bug 919856, which are slightly
> > different to this bug's failures.  I suspect it's not a coincidence that we
> > stopped getting those ones just before we started getting these ones.  Those
> > ones weren't as common, though.
> 
> Ah, good catch.  Yeah, I think if we had that behavior it would be an
> improvement from the orange factor perspective.  That one's only a month old.

(In reply to Nicholas Nethercote [:njn] from comment #117)
> Looking through the DOMWINDOW lines, it looks like the test_Range-foo tests
> are responsible for a lot of them.

(In reply to Nicholas Nethercote [:njn] from comment #118)
> I tried the ideas from comment 72 (undo part 2b of bug 927705) and comment
> 114 (resurrect bug 815467) in tandem:
> https://tbpl.mozilla.org/?tree=Try&rev=73bb40f72806
> 
> I still get 15 failures out of 21, but they're the failures from bug 919856
> ("where leaked URL list includes DOMCOre range tests") instead of the
> failures from this bug ("also leaked leaked 7 DOMWINDOW(s)").  Does that
> count as progress?
> 
> At this point it's clear that mochi-2's real problems predate bug 927705. 
> (Bug 919856's title even mentions the dom range tests...)

(In reply to Nicholas Nethercote [:njn] from comment #119)
> > At this point it's clear that mochi-2's real problems predate bug 927705. 
> 
> To steal shamelessly from philor on IRC:  we were already off the edge of
> the cliff, hanging in the air, legs spinning wildly, and bug 927705 just
> made us look down...

(In reply to Ed Morley [:edmorley UTC+1] from comment #123)
> > <khuey> Tomcat|sheriffduty: ok, so your first mission is to disable the testRange tests from m2 again
> 
> I don't think that would help.
> 
> We have leaks during both mochitest-2 and mochitest-browser-chrome that look
> to at least my naive eyes pretty similar:
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=29839989&tree=Mozilla-Central
> https://tbpl.mozilla.org/php/getParsedLog.php?id=29872321&tree=Mozilla-Central
Keywords: mlk
So one interesting question is what happened to our leak logging infrastructure ...
14:51:28 - khuey: did we stop running your cc analyzer stuff?
14:51:56 - ttaubert: hmm. not that I know of unless someone ripped that out?
14:52:03 - ttaubert: or unknowingly disabled it
14:53:09 - khuey: ttaubert: well if we were running it presumably we would have noticed before we started leaking 500 DOM windows to shutdown
14:53:43 - ttaubert: yeah... looks like we should look into why that's not running anymore
14:53:55 - khuey: indeed
...
14:54:31 - khuey: did we move the social crap OOP?
14:55:49 - khuey: edmorley|sheriffduty: idk what I can really do here other than try to fix a few bugs ...
14:55:55 - ttaubert: I thought that's all implemented with workers and stuff but idk
14:55:56 - khuey: not clear what we would back out
14:56:07 - khuey: no it uses the fake frame worker stuff
14:56:20 - khuey: none of the tests remove the frame from the hideen window ...
14:56:32 - khuey: so that leaks a ton
...
14:57:40 - khuey: edmorley|sheriffduty: no, ttaubert has special code to track down window leaks
14:57:49 - khuey: which apparently isn't running anymore ...
14:57:59 - khuey: it would have yelled when we were at 1
14:58:01 - khuey: instead of 500 ...
14:58:17 - ttaubert: still there at least http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#379
What platforms and tests are having problems?  Is this just Win7 debug M2 and BC?
If M2 is the problem, I'd wager test_Range* is the problem.  See bug 875585.  Bug 778011 has a list of these tests that are disabled on Android for OOMiness.  I can do a little analysis later today to find out what tests are leaking windows, if nobody has done that already.
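A first pass at that kind of analysis can be sketched as a tally of destroyed windows by URL. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the regular expression is modelled only on the single `--DOMWINDOW` line quoted in the IRC log earlier in this bug. It groups windows by URL rather than by test, so mapping URLs back to individual tests would still need the harness's test-start markers.

```python
import re
from collections import Counter
from typing import Iterable

# Modelled on: --DOMWINDOW == 686 (3DC59B38) [serial = 18590]
#              [outer = 5D391770] [url = about:blank]
DESTROYED_RE = re.compile(r"--DOMWINDOW == \d+ .*\[url = ([^\]]*)\]")


def destroyed_windows_by_url(lines: Iterable[str], strip_fragment: bool = True) -> Counter:
    """Count destroyed DOM windows per URL, optionally ignoring #fragments."""
    counts: Counter = Counter()
    for line in lines:
        match = DESTROYED_RE.search(line)
        if not match:
            continue
        url = match.group(1)
        if strip_fragment:
            # Collapse e.g. Range-test-iframe.html#... variants into one entry.
            url = url.split("#", 1)[0]
        counts[url] += 1
    return counts
```

Calling `destroyed_windows_by_url(open(path)).most_common(20)` on a saved log (with `path` being wherever the log was downloaded) would list the URLs accounting for the most windows, which is one way to see whether the range-test iframes dominate.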
Some analysis by njn in bug 929359 confirms that test_Range is involved, so somebody should disable those tests on Windows.
(In reply to Andrew McCreight [:mccr8] from comment #6)
> Some analysis by njn in bug 929359 confirms that test_Range is involved, so
> somebody should disable those tests on Windows.

See last quote in comment 1, I don't think this will help sadly.
It'll fix M2 ... we still have to fix Mbc of course.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #8)
> It'll fix M2 ... we still have to fix Mbc of course.

I mean more that unless the tests are inherently bad (which given they haven't changed recently seems unlikely to explain the recent regression), they are surely just exposing a platform issue, the same issue being tickled by other tests during mochitest-browser-chrome. As such, disabling the tests doesn't seem like the correct solution (and you know how much I normally like disabling misbehaving tests!).
(From #developers, for those watching this bug waiting for the trees to open):

16:05:54 - mccr8: what tests and platforms are OOMing?
16:12:25 - philor: counting things where we've just called it a timeout, maybe with a leak, but if you look it timed out just after opening the 605th domwindow, or only counting things that actually say they were OOM?
16:12:54 - mccr8: well, I'm trying to figure out what I should look at in order to fix so we can reopen the trees.
16:13:03 - philor: the very narrowest view possible would be "it's only about Win7 debug M2 and b-c"
16:13:29 - mccr8: ok, thanks.  for M2, we should just disable test_rangeFoo.
16:13:33 - mccr8: I can put together a patch.
16:13:39 - philor: but probably it's the reason ASan b-c was breaking, and maybe the reason we wound up disabling all of dom-level* on Mac
16:13:55 - mccr8: yeah b-c has some problems...
16:14:03 - philor: what are these tests that we'll be disabling doing wrong?
16:14:15 - khuey: leaking
16:14:20 - philor: how will we avoid writing new tests that do the same wrong thing?
16:14:25 - khuey: actually idk about range
16:14:29 - khuey: but bc stuff is leaking
16:14:43 - mccr8: they try a ton of permutations, but maybe they could be fixed to not hold onto things for so long, or something.
16:15:11 - mccr8: the people working on the tests already reduced the number of permutations, which bought us time, but apparently that just let us walk closer to the edge of doom...
Depends on: 932852
Depends on: 932880
Depends on: 932898
(In reply to Ed Morley [:edmorley UTC+1] from comment #3)
> 14:51:28 - khuey: did we stop running your cc analyzer stuff?
> 14:51:56 - ttaubert: hmm. not that I know of unless someone ripped that out?
> 14:52:03 - ttaubert: or unknowingly disabled it
> 14:53:09 - khuey: ttaubert: well if we were running it presumably we would
> have noticed before we started leaking 500 DOM windows to shutdown
> 14:53:43 - ttaubert: yeah... looks like we should look into why that's not
> running anymore

Filed bug 932898 for this.

Requirements to reopen the tree:
* Bug 932867 fixed (has patch, awaiting final review and landing).
* Bug 932880 fixed (needs owner).
* Bug 932898 fixed (needs owner).
** Bugs filed for any/all new leaks discovered once the analysis is working.
** Those bugs fixed, so bug 932898 can land without turning the tree orange (or else a way to whitelist these failures similar to how we did the first time around).
Whiteboard: [see comment 11 for requirements for reopening]
The stuff in comment 11 will only help bc, not m2.
Here's a possible fix for the M2 problem.
Any obstacles to 27 left here?  We'll be looking to get builds to QA today so it can be vetted for unthrottling on Friday.
Flags: needinfo?(emorley)
We wouldn't know whether someone introduced a new intermittent leak in Win7 mochitest-2, because it would just get covered up by the existing leak of the world, but that should be it, and with enough retriggering on the tip of aurora we can at least be sure nobody has introduced new non-intermittent leaks there.
Flags: needinfo?(emorley)
Hi, a little status update:

-> Mozilla-inbound/Central and Fx-Team trees are still closed.

-> A patch from bug 932898 has been pushed. This should fix the leaking tests, but it does not fix the missing leak analysis functionality.

-> The push for https://bugzilla.mozilla.org/show_bug.cgi?id=932880 still resulted in memory leaks.

-> A try server run with a backout for bug 927705 is underway -> https://tbpl.mozilla.org/?tree=Try&rev=687a285acbe0 to check if this fixes any of the current problems.

Discussion and fixing are also still continuing in #developers on IRC.
The Win7 debug m2 shutdown leak (bug 929359) seems to have been fixed for real by roc's patch in bug 933072 to make some tests not hold onto so many windows.
(In reply to Carsten Book [:Tomcat] from comment #17)
> -> A try server run with a backout for bug 927705 is underway ->
> https://tbpl.mozilla.org/?tree=Try&rev=687a285acbe0 to check if this fixes
> any of the current problems.

Sadly there are still browser-chrome failures present there:
https://tbpl.mozilla.org/php/getParsedLog.php?id=29922616&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=29922742&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=29922959&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=29923548&tree=Try
Depends on: 933551
The fix from bug 933226 seems to have "fixed" the bc failures, so I merged that around to the other branches and reopened things.

May god have mercy on my soul.
Well the tree is open.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 28
No longer depends on: 932852
Attachment #824783 - Attachment is obsolete: true
As and when the dependent bugs land, we'll get them backported to aurora/beta if affected; leaving status-firefox27 set to affected until the last of that is complete.
NI on :edmorley to confirm if anything else is needed here for Firefox 27 ?
Flags: needinfo?(emorley)
(In reply to bhavana bajaj [:bajaj] from comment #23)
> NI on :edmorley to confirm if anything else is needed here for Firefox 27 ?

All done :-)
Flags: needinfo?(emorley)
Whiteboard: [see comment 11 for requirements for reopening] → [see comment 11 for requirements for reopening][qa-]