Closed Bug 937997 Opened 11 years ago Closed 11 years ago

Trunk trees closed due to virtual address space fragmentation on Win 7 debug mochitest-BC (and M2?)

Categories

(Firefox :: General, defect)

x86
Windows 7
defect
Not set
blocker

Tracking

()

RESOLVED FIXED
Firefox 28

People

(Reporter: philor, Unassigned)

References

(Depends on 4 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [MemShrink])

Attachments

(11 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #932781 +++

When bug 920978 starts with "uncaught exception - NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIZipReader.open] at chrome://mochikit/content/chrome-harness.js:271" that means we're OOM.

When the stuff we're piling onto bug 935419 says "Ran out of memory while building cycle collector graph" that pretty clearly means we're OOM.

Yesterday, we backed out bug 936143 and bug 933882 and the rest of the stuff in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=ae6f2151610f for ASan OOM failures, though there were also non-ASan OOM failures while it was in.

Today it relanded in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=9f3212effb9f, seemed sort of okayish, and we merged it to mozilla-central (though not yet on to fx-team and b2g-inbound).

We're hitting lots of both bug 920978 starting with OOM nsIZipReader.open, and bug 935419 again.

We can say yet again that it's all shu's fault, back him out, and go back to frequently hitting those two OOM failures, but maybe not so frequently that we have to notice, or we can fix the underlying fact that we're constantly on the edge of OOM even when we don't go over.

mozilla-inbound, mozilla-central and fx-team are closed.
Win7 b-c retriggers running in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&tochange=9f3212effb9f&fromchange=475bd77c3400 to maybe see just how much we should try to scapegoat shu for being the cause of all our problems, though we've certainly had both of the comment 0 signs of OOM while he was out as well as while he's in.
I was thinking of disabling the assertion in bug 935419, but I hadn't gotten around to it yet.  It does seem to have kicked up in frequency today for some reason.

I just pushed a patch to try with some debug messages that maybe will help figure out what is going on (hopefully it will compile): https://tbpl.mozilla.org/?tree=Try&rev=50d39c93fc73
The thing is, the assertion tells us that we are OOM, right? And without the assertion, we are OOM, we just don't know it?
Could fragmentation be killing us here?
Flags: needinfo?(continuation)
This is very frustrating for me, because I don't feel like there's much I can do to remedy the situation.

For our memory people who are debugging, you can back out https://hg.mozilla.org/integration/mozilla-inbound/rev/c6981912ff87 to trigger OOMs even faster. That patch was papering over the guaranteed ASan OOMs.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #5)
> Could fragmentation be killing us here?

I suppose.  The CC is failing to grow a hash table, and the hash table is a big array.  I'm not sure how we could figure out what is fragmentation or not.

(In reply to Shu-yu Guo [:shu] from comment #6)
> This is very frustrating for me, because I don't feel like there's much I
> can do to remedy the situation.

Well, M2 almost certainly can't be the fault of your patch, or at least I assume there's no debugger stuff in it.
Flags: needinfo?(continuation)
> I suppose.  The CC is failing to grow a hash table, and the hash table is a
> big array.  I'm not sure how we could figure out what is fragmentation or
> not.

During the last round of OOMs, the CC hash table allocations that were failing were only 8 MB, i.e. pretty small.  I don't know if that's the case this time around.  Instrumented try runs would be illuminating!
So we changed how pldhashtable works, right? Do we end up requiring larger contiguous memory
areas or something?
Flags: needinfo?(n.nethercote)
> So we changed how pldhashtable works, right? Do we end up requiring larger
> contiguous memory areas or something?

Nope.  The only change of note was that if we tried to grow a table (because it reached 75% full) and failed, previously we would allow it to fill up to 100% full.  I briefly changed that to just fail at the 75% OOM mark (bug 927705) but then shortly after I changed it again to allow up to 97% full (bug 933074).  I chose 97% because performance drops drastically if you get too close to 100%.

Don't focus on the pldhash failure.  Focus on all the stuff that happens earlier that causes us to reach the point that a small allocation can fail.  That's how we fixed the last round of TBPL OOMs.
Flags: needinfo?(n.nethercote)
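For illustration, here is a minimal C++ sketch of the growth policy described above; the names, entry size, and structure are assumptions for the sketch, not the real PLDHashTable code:

  // Sketch of the load-factor policy: try to double the table at 75% full; if that
  // allocation fails, keep inserting into the existing storage up to a ~97% cap
  // (bug 933074) instead of failing immediately (bug 927705) or at 100% (older behavior).
  #include <cstddef>
  #include <cstdlib>

  struct Table {
    void** entries;      // placeholder entry storage; real entries are larger
    size_t capacity;     // always a power of two in the real table
    size_t entryCount;
  };

  bool EnsureRoom(Table& t) {
    if (t.entryCount * 4 < t.capacity * 3) {
      return true;                                   // below 75% full, nothing to do
    }
    size_t newCapacity = t.capacity * 2;
    void** bigger = static_cast<void**>(calloc(newCapacity, sizeof(void*)));
    if (bigger) {
      // Real code rehashes every live entry into the new storage; elided here.
      free(t.entries);
      t.entries = bigger;
      t.capacity = newCapacity;
      return true;
    }
    // Growth failed (OOM): tolerate up to ~97% full before reporting failure,
    // since probe performance collapses as the table approaches 100%.
    return t.entryCount * 100 < t.capacity * 97;
  }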
So last time (Bug 932781) we had a list of bugs that we needed to fix in order to reopen the trunk trees. I guess this time it's this bug and maybe also bug 935419.
I have a local build which dumps about:memory for all devtools tests, since that's where the GCs disappeared from.

What path should I dump the about:memory .json.gz's to so that if I push this to try, those files can be downloaded?
01:12:18     INFO -  out of memory with graph entry count 507904

That seems way too low to be hitting the pldhash cap.  So we must be actually running out of memory (presumably out of virtual address space)?
Depends on: 932898
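As a rough sanity check on why ~508K graph entries can still mean a multi-megabyte contiguous allocation, here is a small worked example; the slot size and load factor are assumptions, not the real cycle collector constants:

  // Hypothetical sizing: ~508K graph entries, grown at ~75% load, lands on a
  // 2^20-slot table; at an assumed 8 bytes per slot on a 32-bit build that is a
  // single ~8 MB contiguous allocation, consistent with the "only 8 MB" failing
  // allocations mentioned for the previous round of OOMs.
  #include <cstdint>
  #include <cstdio>

  int main() {
    const uint64_t entries = 507904;        // graph entry count from the log above
    const uint64_t capacity = 1u << 20;     // next power of two above entries / 0.75
    const uint64_t slotBytes = 8;           // assumption for a 32-bit build
    std::printf("table allocation ~= %llu MB\n",
                static_cast<unsigned long long>((capacity * slotBytes) >> 20));
    return 0;
  }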
On a browser-chrome run on my local machine with about:memory dumping for devtools tests, the highest RSS I saw is about 660 MB. That doesn't seem high enough to warrant going OOM.
Attached patch memdump-bc.patch (obsolete) — Splinter Review
Dump about:memory to /tmp for every bc test.
Attached patch memdump-bc.patch — Splinter Review
Dump about:memory to /tmp for every test in bc.
Attachment #831417 - Attachment is obsolete: true
Remove Cu.forceGC() from debugger tests to hit OOM faster.
Here is the list of browser-chrome tests from a local linux64 debug run, sorted by RSS. The numbers correspond to the index in the manifest. The errors are logs that weren't dumped correctly and had truncated JSON for some reason.

Now it's starting to look a little more like actual OOM territory. I will upload the .json.gz's for the top 3.
This is browser/devtools/shadereditor/test/browser_webgl-actor-test-09.js
This is browser/devtools/shadereditor/test/browser_webgl-actor-test-08.js
This is browser/devtools/shadereditor/test/browser_webgl-actor-test-14.js
It looks like the best lead so far is the WebGL tests: a large amount of heap-unclassified. What is that? Leaked shaders? Contexts?
Those are really high heap-unclassified numbers...is there any way you can build with DMD and get a DMD report right after each of those tests?  Maybe include the memory reporters in bug 915940 in those builds to try and understand what's using the graphics memory?
(In reply to Nathan Froyd (:froydnj) from comment #23)
> Those are really high heap-unclassified numbers...is there any way you can
> build with DMD and get a DMD report right after each of those tests?  Maybe
> include the memory reporters in bug 915940 in those builds to try and
> understand what's using the graphics memory?

I am falling asleep, so I won't be able to get to this for 8 hours.

Also, I don't know what DMD is or how to dump it from chrome JS. If you point me to instructions to how to build with it + how to dump it, I can collect that info.
(In reply to Shu-yu Guo [:shu] from comment #24)
> (In reply to Nathan Froyd (:froydnj) from comment #23)
> > Those are really high heap-unclassified numbers...is there any way you can
> > build with DMD and get a DMD report right after each of those tests?  Maybe
> > include the memory reporters in bug 915940 in those builds to try and
> > understand what's using the graphics memory?
> 
> I am falling asleep, so I won't be able to get to this for 8 hours.
> 
> Also, I don't know what DMD is or how to dump it from chrome JS. If you
> point me to instructions to how to build with it + how to dump it, I can
> collect that info.

Instructions for building, running, and analyzing with DMD can be found here: https://wiki.mozilla.org/Performance/MemShrink/DMD
Attached file sorted-timeline
Timeline of rss change
I haven't been able to convince my computer to run WebGL tests yet, and running tests locally suffers from bug 938163, which clutters logs, slows things down, and causes all sorts of other problems.  DMD isn't turning up any interesting stacks, either.  The largest, most interesting one has been:

Unreported: 4 blocks in stack trace record 1 of 4,216
 9,584,640 bytes (9,584,640 requested / 0 slop)
 1.53% of the heap (1.53% cumulative);  14.22% of unreported (14.22% cumulative)
 Allocated at
   replace_malloc (/home/froydnj/src/mozilla-central-official/memory/replace/dmd/DMD.cpp:1227) 0x7ffeafcf4260
   mozilla::gfx::SurfaceToPackedBGRA(mozilla::gfx::SourceSurface*) (/opt/build/froydnj/build-mc/content/canvas/src/../../../dist/include/mozilla/gfx/DataSurfaceHelpers.h:50) 0x7ffeabc26abe
   mozilla::dom::CanvasRenderingContext2D::GetImageBuffer(unsigned char**, int*) (/home/froydnj/src/mozilla-central-official/content/canvas/src/CanvasRenderingContext2D.cpp:1078) 0x7ffeabc240e7
   mozilla::dom::CanvasRenderingContext2D::GetInputStream(char const*, char16_t const*, nsIInputStream**) (/home/froydnj/src/mozilla-central-official/content/canvas/src/CanvasRenderingContext2D.cpp:1090) 0x7ffeabc1ccea
   ~nsPromiseFlatString (/opt/build/froydnj/build-mc/content/canvas/src/../../../dist/include/nsTPromiseFlatString.h:61) 0x7ffeabc28bb9
   ~nsCOMPtr (/opt/build/froydnj/build-mc/content/canvas/src/../../../dist/include/nsCOMPtr.h:469) 0x7ffeabc28d25
   mozilla::dom::HTMLCanvasElement::ExtractData(nsAString_internal&, nsAString_internal const&, nsIInputStream**) (/home/froydnj/src/mozilla-central-official/content/html/content/src/HTMLCanvasElement.cpp:395) 0x7ffeabc7a553
   mozilla::dom::HTMLCanvasElement::ToDataURLImpl(JSContext*, nsAString_internal const&, JS::Value const&, nsAString_internal&) (/home/froydnj/src/mozilla-central-official/content/html/content/src/HTMLCanvasElement.cpp:469) 0x7ffeabc7aeca
   toDataURL (/opt/build/froydnj/build-mc/dom/bindings/HTMLCanvasElementBinding.cpp:253) 0x7ffeac3ed6d0
   genericMethod (/opt/build/froydnj/build-mc/dom/bindings/HTMLCanvasElementBinding.cpp:606) 0x7ffeac3f06a1
   CallJSNative (/home/froydnj/src/mozilla-central-official/js/src/jscntxtinlines.h:220) 0x7ffeacd5933b
   Interpret (/home/froydnj/src/mozilla-central-official/js/src/vm/Interpreter.cpp:2505) 0x7ffeacd4c325
   RunScript (/home/froydnj/src/mozilla-central-official/js/src/vm/Interpreter.cpp:420) 0x7ffeacd58f97
   SendToGenerator (/home/froydnj/src/mozilla-central-official/js/src/jsiter.cpp:1654) 0x7ffeacc8f435
   NativeMethod<js::LegacyGeneratorObject, legacy_generator_next> (/home/froydnj/src/mozilla-central-official/js/src/jsiter.cpp:1813) 0x7ffeacc8fa00
   CallJSNative (/home/froydnj/src/mozilla-central-official/js/src/jscntxtinlines.h:220) 0x7ffeacd5933b
   js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) (/home/froydnj/src/mozilla-central-official/js/src/vm/Interpreter.cpp:513) 0x7ffeacd5b4ed
   js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) (/home/froydnj/src/mozilla-central-official/js/src/jsproxy.cpp:468) 0x7ffeacccb518
   js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) (/home/froydnj/src/mozilla-central-official/js/src/jswrapper.cpp:457) 0x7ffeacd17ecf
   js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) (/home/froydnj/src/mozilla-central-official/js/src/jsproxy.cpp:2657) 0x7ffeaccd2ef9
   proxy_Call (/home/froydnj/src/mozilla-central-official/js/src/jsproxy.cpp:3065) 0x7ffeaccd2fef
   CallJSNative (/home/froydnj/src/mozilla-central-official/js/src/jscntxtinlines.h:220) 0x7ffeacd5947d
   Interpret (/home/froydnj/src/mozilla-central-official/js/src/vm/Interpreter.cpp:2505) 0x7ffeacd4c325
   RunScript (/home/froydnj/src/mozilla-central-official/js/src/vm/Interpreter.cpp:420) 0x7ffeacd58f97

which appears about halfway through the test run.  Possibly a leak, not sure.  (Apologies for the bogus frames in there, I'm not quite sure what the problem is there...I have a suspicion that the linker is screwing with line information, need to go track that down.)

bug 922094 also could have some memory reporter love applied to it, we seem to be allocating quite a number of proto/iface caches.

None of these look significant enough to cause a leak.  I have to go put together some interview questions, but I'll return to coercing my machine into running WebGL tests later this afternoon.
The about:memory dumps look to be hovering at around 660MB, not nearly enough to cause issues.
Thanks to whoever retriggered my try push a bunch of times.  There were no M2 failures, which is odd.  There are a total of 3 failures on BC, all in chrome://mochitests/content/browser/toolkit/mozapps/extensions/test/browser/browser_discovery.js
  out of memory with graph entry count 126976
  out of memory with graph entry count 126976
  out of memory with graph entry count 507904

Those first two are particularly small.  The CC had succeeded many times prior to that with hundreds of thousands more things in the graph, so it does sound like a general problem with running out of memory, rather than the cycle collector in particular using a bizarrely huge amount of memory.

I'm still confused by why Win7 is showing problems while WinXP isn't.
Depends on: 938016
I'm currently waiting on an instrumented linux64 debug browser-chrome run that will hopefully dump DMD reports for the WebGL tests.
Depends on: 938310
Depends on: 938311
I filed bug 938310 and bug 938311 for the memory usage of the shader editor and tilt tests.  From shu's logs, these are the only BC tests that go above 1 GB of RSS, aside from a few tests after the shadereditor where the memory remains high.

The one consistent CC OOM we're hitting in BC is in toolkit/mozapps/extensions/test/browser/browser_discovery.js, which is not right after either of those tests.  It is near the end of the test sequence, and it is a little ways after the tilt tests, but it isn't obvious why we would consistently fail here.  According to shu's logs, RSS is only around 650 MB then, which isn't particularly high, so we're probably failing due to accumulated fragmentation or something.

bsmedberg said that with a minidump we could examine how bad the fragmentation is.
philor points out that the shadereditor tests aren't run on Linux (bug 931344), which would explain why we wouldn't see failures there, if that's a cause.

Also this:

[1:14pm] philor: "browser_tilt_gl08.js | Skipping tilt_gl08 because WebGL isn't supported on this hardware."
If you are talking about virtual memory fragmentation, do not forget about the (recently disabled) gpu-committed reporter, or simply GPU committed memory. GPU allocations on Windows take out a reservation in the calling process's virtual memory in case another program requires the GPU to page its memory out.

Since the discussion here has been about WebGL and other graphics tests, I think there is a high chance these allocations are also helping lead to this OOM (on Win7/8 at least; WinXP did things a little differently).
These failures are only happening on Win7, not XP.
Depends on: 938350
Try run with all Tilt and Shadereditor tests disabled:
  https://tbpl.mozilla.org/?tree=Try&rev=f99d4854b126
You can find the about:memory and DMD dumps for the top 3 browser-chrome tests sorted by RSS, starting w/ the shadereditor tests, here: http://rfrn.org/~shu/dmddump.tar

DMD of these 3 tests (browser_webgl-actor-test-10.js and others) shows that 70+% of the heap is unreported, and that most of it comes from the *GL driver* code. Here's an excerpt:

Unreported: 216 blocks in stack trace record 8 of 5,307
 10,616,832 bytes (10,468,224 requested / 148,608 slop)
 1.39% of the heap (73.40% cumulative);  1.51% of unreported (79.65% cumulative)
 Allocated at
   replace_memalign (/home/shu/moz/inbound/memory/replace/dmd/DMD.cpp:1302) 0x7f5c7dba913b
   replace_posix_memalign (/home/shu/moz/inbound/obj-x86_64-unknown-linux-gnu/memory/replace/dmd/../../../dist/include/replace_malloc.h:119) 0x7f5c7dba90a7
   _mesa_align_malloc (/usr/lib/libdricore9.2.2.so.1) 0x7f5c559fde21
   _mesa_vector4f_alloc (/usr/lib/libdricore9.2.2.so.1) 0x7f5c55a67930
   ??? (/usr/lib/libdricore9.2.2.so.1) 0x7f5c55a9ccfd
   _tnl_install_pipeline (/usr/lib/libdricore9.2.2.so.1) 0x7f5c55a9013b
   _tnl_CreateContext (/usr/lib/libdricore9.2.2.so.1) 0x7f5c55a8fc4c
   ??? (/usr/lib/xorg/modules/dri/i965_dri.so) 0x7f5c55f39142
   ??? (/usr/lib/xorg/modules/dri/i965_dri.so) 0x7f5c55f53bac
   ??? (/usr/lib/xorg/modules/dri/i965_dri.so) 0x7f5c55f3fb16
   ??? (/usr/lib/xorg/modules/dri/i965_dri.so) 0x7f5c55fd38b0
   ??? (/usr/lib/xorg/modules/dri/i965_dri.so) 0x7f5c55fd3a55
   ??? (/usr/lib/libGL.so.1) 0x7f5c70d0f877
   ??? (/usr/lib/libGL.so.1) 0x7f5c70cea233
   glXCreateNewContext (/usr/lib/libGL.so.1) 0x7f5c70cea4aa
   mozilla::gl::GLXLibrary::xCreateNewContext(_XDisplay*, __GLXFBConfigRec*, int, __GLXcontextRec*, int) (/home/shu/moz/inbound/gfx/gl/GLContextProviderGLX.cpp:591) 0x7f5c79515c76
   mozilla::gl::GLContextGLX::CreateGLContext(mozilla::gfx::SurfaceCaps const&, mozilla::gl::GLContextGLX*, bool, _XDisplay*, unsigned long, __GLXFBConfigRec*, bool, mozilla::gl::GLXLibrary::LibraryType, gfxXlibSurface*) (/home/shu/moz/inbound/gfx/gl/GLContextProviderGLX.cpp:798) 0x7f5c79516b3f
   mozilla::gl::CreateOffscreenPixmapContext(nsIntSize const&, mozilla::gl::GLXLibrary::LibraryType) (/home/shu/moz/inbound/gfx/gl/GLContextProviderGLX.cpp:1401) 0x7f5c795168e8
   mozilla::gl::GLContextProviderGLX::CreateOffscreen(nsIntSize const&, mozilla::gfx::SurfaceCaps const&, mozilla::gl::ContextFlags) (/home/shu/moz/inbound/gfx/gl/GLContextProviderGLX.cpp:1417) 0x7f5c7951659c
   nsRefPtr<mozilla::gl::GLContext>::assign_assuming_AddRef(mozilla::gl::GLContext*) (/home/shu/moz/inbound/obj-x86_64-unknown-linux-gnu/content/canvas/src/../../../dist/include/nsAutoPtr.h:872) 0x7f5c7837268f
   mozilla::dom::HTMLCanvasElement::UpdateContext(JSContext*, JS::Handle<JS::Value>) (/home/shu/moz/inbound/content/html/content/src/HTMLCanvasElement.cpp:802) 0x7f5c783f1811
   mozilla::dom::HTMLCanvasElement::GetContext(JSContext*, nsAString_internal const&, JS::Handle<JS::Value>, mozilla::ErrorResult&) (/home/shu/moz/inbound/content/html/content/src/HTMLCanvasElement.cpp:714) 0x7f5c783f3331
   mozilla::dom::HTMLCanvasElementBinding::getContext(JSContext*, JS::Handle<JSObject*>, mozilla::dom::HTMLCanvasElement*, JSJitMethodCallArgs const&) (/home/shu/moz/inbound/obj-x86_64-unknown-linux-gnu/dom/bindings/./HTMLCanvasElementBinding.cpp:203) 0x7f5c790956b2
   mozilla::dom::HTMLCanvasElementBinding::genericMethod(JSContext*, unsigned int, JS::Value*) (/home/shu/moz/inbound/obj-x86_64-unknown-linux-gnu/dom/bindings/./HTMLCanvasElementBinding.cpp:605) 0x7f5c79094c7c
Note that the above is on Linux, using the open source Intel drivers. If Win7's handling of this allocates more memory and it goes unreported, it is likely our culprit.
Depends on: 938411
New try run that hopefully actually compiles that disables the two test suites:
  https://tbpl.mozilla.org/?tree=Try&rev=defffd947938
Another patch that serializes an about:memory dump into the TBPL log so we can get some insight into memory on Windows when it fails:
  https://tbpl.mozilla.org/?tree=Try&rev=67343bf0c70f
The goal here would be to confirm that there is high heap-unclassified, like we see on Linux.

dmajor is working on getting things running on an actual Windows test slave, so we can get a minidump to analyze, or attach a debugger or something, and try to figure out what is going on.

Kyle is looking at making us clear out memory used by WebGL more aggressively in bug 938411, which might help avoid the OOMs.
Depends on: 923614
Depends on: 920978
Since Andrew's try push (https://tbpl.mozilla.org/?tree=Try&rev=defffd947938) showed that disabling the WebGL-related devtools tests doesn't fix the OOM, I'm going to rerun my instrumented build locally with those tests disabled as well, to see if there's anything new.
(In reply to Shu-yu Guo [:shu] from comment #37)
> Note that the above is on Linux, using the open source Intel drivers. If
> Win7's handling of this allocates more memory and it goes unreported, it is
> likely our culprit.

Win7 machines do use some amount of extra memory that goes unreported, very strongly dependent on which GPU drivers they're running.
Does it really make sense to keep the tree closed when we're pretty certain that backing out bug 933882 would make the problem go away? I actually think bug 933882 is really important and we need it to land soon. But I don't think it makes sense to hold up everyone else's work on that.

As far as I can tell, the only purpose served by keeping the tree closed is to keep up the pressure to ensure that leaks are fixed. But bug 933882 will have to land somehow eventually. That should serve as sufficient motivation in my opinion.
(In reply to Bill McCloskey (:billm) from comment #41)
> Does it really make sense to keep the tree closed when we're pretty certain
> that backing out bug 933882 would make the problem go away? I actually think
> bug 933882 is really important and we need it to land soon. But I don't
> think it makes sense to hold up everyone else's work on that.
> 
> As far as I can tell, the only purpose served by keeping the tree closed is
> to keep up the pressure to ensure that leaks are fixed. But bug 933882 will
> have to land somehow eventually. That should serve as sufficient motivation
> in my opinion.

That won't fix M2 though.
M2 isn't too badly off (see https://tbpl.mozilla.org/?tree=Try&rev=50d39c93fc73).  Note also that nobody has done any investigation of M2.  Its failures are rare enough that it may be difficult to investigate without landing some diagnostic code on an open tree.
As long as someone is willing to continue looking at this if we reopen so I can reland bug 933882. I don't think I can fix whatever the problem is by myself.
With all shadereditor and tilt tests disabled, the maximum RSS reached on linux64 debug is 822947840 bytes (about 785 MB). That doesn't seem that high; I'm kind of out of ideas.
Since it looks to be almost ready, let's see if Kyle's patch fixes the problem. If so, great. If not, then I think we should back out.
Try run that prints out RSS after every bc test: https://tbpl.mozilla.org/?tree=Try&rev=852720fd512c
My patch is unfortunately not almost ready.  Perhaps we should proceed with the backout.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #48)
> My patch is unfortunately not almost ready.  Perhaps we should proceed with
> the backout.

ok taking as sheriff-on-duty and working on the backout
I am very much uncomfortable with reopening - as philor so excellently phrased it when this issue first occurred, Shu's landing was merely the breeze that sent us over the OOM cliff on the edge of which we were already teetering. 

Until we resolve the root cause, every new intermittent hang/timeout is going to either be blamed on this issue (and thus potentially ignored, even if unrelated), or else result in another tree closure - in which case giving ourselves a false sense of security by reopening gains us little.

At the least, we should wait until: 
* The high heap-unclassified in bug 938310 is correctly identified (and a followup bug filed to add the necessary about:memory reporters).
* Either bug 938411 or similar fixed to improve the high RSS found in bug 938310 and dupes.
* We land something similar to parts of https://hg.mozilla.org/try/rev/50d39c93fc73#l1.13 so we assert in debug builds on OOM rather than having to remember which random browser-chrome tests fail when we reach OOM conditions (a la comment 0).
Note: To anyone trying to debug this, if using inbound, use 15c617927012 (pre comment 50 backout) rather than tip to increase the likelihood of reproducing.

dmajor - any luck using the machine borrowed from releng?
Flags: needinfo?(dmajor)
(In reply to Nathan Froyd (:froydnj) from comment #25)
> Instructions for building, running, and analyzing with DMD can be found
> here: https://wiki.mozilla.org/Performance/MemShrink/DMD

Just a note on this: for anyone trying to build a Windows 7 debug build with DMD, it will currently fail with a build error, but bug 938526 (and the patch in there) is supposed to fix this.
No longer depends on: 938310
Depends on: 938587
I did a ton of retriggers for M2 and BC on inbound tip to see where we're at.

(In reply to Ed Morley [:edmorley UTC+1] from comment #51)
> I am very much uncomfortable with reopening - as philor so excellently
> phrased it when this issue first occurred, Shu's landing was merely the
> breeze that sent us over the OOM cliff on the edge of which we were already
> teetering. 
> 
> Until we resolve the root cause, every new intermittent hang/timeout is
> going to either be blamed on this issue (and thus potentially ignored, even
> if unrelated), or else result in another tree closure - in which case giving
> ourselves a false sense of security by reopening gains us little.
> 
> At the least, we should wait until: 
> * The high heap-unclassified in bug 938310 is correctly identified (and a
> followup bug filed to add the necessary about:memory reporters).

Comment 36 identified the source of high heap-unclassified for the WebGL devtools tests.  I agree it would be good to have a followup bug about it.  We still don't really know what the situation looks like on Windows, in terms of heap-unclassified.  Windows could be better or worse.  (Relatedly, I filed bug 938574 to try to come up with a way to get about:memory dumps from TBPL without having to run things locally like shu was forced to do.)

> * Either bug 938411 or similar fixed to improve the high RSS found in bug
> 938310 and dupes.

Disabling the WebGL developer tools tests didn't seem to help much.  In addition, the failures in bug 938016 aren't happening anywhere near those tests.  (I need to spend some time looking at where the other failures are happening.)  So, while it might be nice to improve somehow, I don't think the WebGL devtools tests have too much effect on the health of the tree, and I removed the dependency.

> * We land something similar to parts of
> https://hg.mozilla.org/try/rev/50d39c93fc73#l1.13 so we assert in debug
> builds on OOM rather than having to remember which random browser-chrome
> tests fail when we reach OOM conditions (a la comment 0).

For that particular assert, we already do that:
  http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsCycleCollector.cpp#2157
It is just a matter of whether we assert immediately or later on.

The other common failure is bug 920978.  It looks like we're failing in the test harness.  I'm not sure what can be done to make it clearer what is failing there.  I filed bug 938581 to make the test fail immediately rather than timing out, though I think for our purposes here that doesn't matter too much, as TBPL makes it clear the nsIZipReader open is the underlying problem.  I filed bug 938587 for making the error message more useful.  It isn't clear to me what to do there, but maybe somebody who understands the code can figure something out.
No longer depends on: 938587
Depends on: 938587
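For reference, a minimal sketch of the "assert on OOM in debug builds" pattern being discussed, illustrating the "assert immediately or later on" distinction; this is purely illustrative and not the actual nsCycleCollector.cpp code:

  #include <cassert>
  #include <cstddef>
  #include <cstdlib>

  // Two places to surface an allocation failure: loudly at the allocation site
  // (so the orange points straight at the OOM), or via a recorded flag that a
  // later assertion checks, which is what "assert immediately or later on" means.
  static bool gRanOutOfMemory = false;

  void* AllocGraphStorage(size_t bytes, bool assertImmediately) {
    void* p = std::malloc(bytes);
    if (!p) {
      if (assertImmediately) {
        assert(false && "out of memory while building cycle collector graph");
      }
      gRanOutOfMemory = true;   // surfaced by a later assertion instead
    }
    return p;
  }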
Bug 923614 may be related, but in my extensive retriggering campaign, I haven't seen it more than once or twice.

My retriggering on tip hit bug 920978 once on bc, but that's the only symptom from this latest round of problems that I've seen.
No longer depends on: 923614
So, can we reopen now?
No - we still haven't found or fixed the root cause as I understand it
(In reply to comment #57)
> No - we still haven't found or fixed the root cause as I understand it

I'm not sure if I understand this correctly, but did the backouts help?  If so, what is the justification of keeping the tree closed before we investigate the source of the OOMs with the backed out patches applied?
Well, personally, I don't really see the value in holding the trees closed for longer.  I'll continue investigating Mochitest memory usage, but shu's patch altered how GC and the debugger interact, so it isn't too surprising that BC (the test suite that deals with the debugger) started running out of memory.  It didn't seem to affect anything else.

The only reason the assertion in bug 935419 is happening now is that I landed a patch to produce that assertion.  We're better off there than we were a month ago, as it was permaorange when it initially landed.
1 and 3 are fixed, and there's no evidence that the high RSS from the WebGL devtools is involved with any of the failures we've seen.
(by "fixed", I mean "have been addressed")
(Comment 60 was directed at comment 58, got mid-aired by comment 59)

Are we absolutely certain the issues in the first two paragraphs of comment 51 are not the case? This is now the second OOM tree closure in ~1 week and I find it unlikely that we're not just going to have a 3rd in a few days if we decide to plough on regardless.

The onus shouldn't be on people having to justify why to keep a broken tree closed, but on the others to justify why to insist on reopening it regardless.
(In reply to comment #63)
> (Comment 60 was directed at comment 58, got mid-aired by comment 59)
> 
> Are we absolutely certain the issues in the first two paragraphs of comment 51
> are not the case? This is now the second OOM tree closure in ~1 week and I find
> it unlikely that we're not just going to have a 3rd in a few days if we decide
> to plough on regardless.
> 
> The onus shouldn't be on people having to justify why to keep a broken tree
> closed, but on the others to justify why to insist on reopening it regardless.

What's your definition of a broken tree?  One that has bugs?  If yes, then we can never reopen the tree!  Otherwise, can we reopen now and close it again when the next symptom of the OOM shows up?
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #64)
> What's your definition of a broken tree?  One that has bugs?  If yes, then
> we can never reopen the tree!  Otherwise, can we reopen now and close it
> again when the next symptom of the OOM shows up?

I would like to think us above straw man arguments please.

I don't see anything in this bug so far that implies we've fixed this OOM - so it's not a case of waiting for the next one to show up. If that's not the case, please may someone summarise how the concerns in comment 0 and comment 51 are no longer the case and why?
Depends on: 938612
s/fixed this OOM/fixed our-dangerously-near-OOM state/
Depends on: 938682
It would be nice if we came up with criteria for when to reopen the tree here, then.
How do we know which tests are dangerously near OOM?  Is a memory high-water-mark reported for most tests?
I filed bug 938682 for making OOMs easier to distinguish from other oranges.
(In reply to Jesse Ruderman from comment #68)
> How do we know which tests are dangerously near OOM?  Is a memory
> high-water-mark reported for most tests?

It isn't clear any of these failures have anything to do with high water marks.  Bug 920978 and bug 938016 happen 20 minutes after the WebGL devtools tests, which are the points of highest memory usage in BC, judging by Linux64.

Interestingly, browser_dragdrop.js (bug 920978) happens right after browser_discovery.js (bug 938016), so something is clearly going wrong during those tests.  Why don't we see anything one test earlier or later?
(In reply to Ed Morley [:edmorley UTC+0] from comment #52)
> Note: To anyone trying to debug this, if using inbound, use 15c617927012
> (pre comment 50 backout) rather than tip to increase the likelihood of
> reproducing.
> 
> dmajor - any luck using the machine borrowed from releng?

No hits yet. I've only been able to do about 4 runs on the loaner so far -- my overnight runs got tripped up by a debugger issue. I am using Andrew's try run from https://tbpl.mozilla.org/?rev=50d39c93fc73&tree=Try.
Flags: needinfo?(dmajor)
> just a note on this. For anyone trying to build a Windows7 Debug Build with
> DMD this will fail with a build failure but Bug 938526 (and the patch in
> there) is supposed to fix this

Even with that patch applied, I don't think DMD will work on Windows.  Bug 819839 is open about that.  I suspect it wouldn't require many changes to get it working, but I'm not certain.
Depends on: 915940
FWIW I knocked together this script to help me visualize how the resident memory is changing as tests run in one of the logs philor pointed me to:

http://people.mozilla.org/~jwatt2/resident.html

The memory steps down to 7/8% of its peak at various points, which seems to correspond with the points in the log when there are additional "+++++++ RESIDENT" lines at the /start/ of a test load (in addition to the "+++++++ RESIDENT" lines that all tests get when they /finish/). (Why don't we dump out the resident memory at the start of all test loads?) Anyway, the thing that I found interesting is that even for later tests the resident memory is still dropping down to 7% of peak, which seems to indicate that some tests are just using a lot of memory rather than there actually being leaks...at least for this particular test run.
Since this seems to be Windows only, I wonder if this is caused by some odd Anti-Virus software interaction.
I doubt the test slaves have anti-virus installed.  My current theory is that the problem is Win7-only because it is 32-bit and has the GPU committed memory mentioned in comment 33, which is not present on XP, and the combination of the two is causing heap fragmentation, which is making large-ish allocations of a few megabytes fail towards the end of the run.
If the problem is really heap fragmentation, I'm not sure how we can fix it, besides splitting up bc.  Maybe forcing more GCs or something.
(In reply to Jonathan Watt [:jwatt] from comment #73)
> FWIW I knocked together this script to help me visualize how the resident
> memory is changing as tests run in one of the logs philor pointed me to:
> 
> http://people.mozilla.org/~jwatt2/resident.html
> 
> The memory steps down to 7/8% of its peak at various points, which seems to
> correspond with the points in the log when there are additional "+++++++
> RESIDENT" lines at the /start/ of a test load (in addition to the "+++++++
> RESIDENT" lines that all tests get when they /finish/). (Why don't we dump
> out the resident memory at the start of all test loads?) Anyway, the thing
> that I found interesting is that even for later tests the resident memory is
> still dropping down to 7% of peak, which seems to indicate that some tests
> are just using a lot of memory rather than there actually being leaks...at
> least for this particular test run.

Some of those ++++++++++ RESIDENT lines are from a test that spawns a new child process, and those sudden drops shouldn't be counted. The log doesn't print the process id (and probably should, but it's a pretty dirty hack).
(In reply to Andrew McCreight [:mccr8] from comment #76)
> If the problem is really heap fragmentation, I'm not sure how we can fix it,
> besides splitting up bc.

Bug 819963

Maybe we should speed this up if we think it'd resolve the problem?
That just takes us from "we have to fix an OOM failure that we can't figure out how to fix" to "we have to fix an incomprehensible leak that only happens when we run the last one-third of browser-chrome without having run the first two-thirds," https://tbpl.mozilla.org/php/getParsedLog.php?id=30572971&tree=Cedar
If the problem is really heap fragmentation, hiding the problem on tbpl by splitting bc won't make it go away on user machines. Because after all, if it happens on bc, there's no reason it wouldn't happen to a user. Has someone looked at crash stats to identify crashes that could be related to this?
Attached file vadumps-from-bc.zip
Virtual address space maps from !vadump at various points along a BC run on a Windows 7 releng slave. The numbers in the filenames are my estimate of how far along the run is.

I haven't dug into this too deeply yet. This stuff might be more easily interpreted using bsmedberg's graphing tools.

But from what I've seen so far: at 40% through the test, there are a good number of dozens-of-MB free VA regions, and a few hundreds-of-MB regions. At the 90% mark, the largest contiguous free virtual region is 40MB, and the next largest are 7MB, 5MB, 3MB. That could certainly make it difficult to perform large allocations.

(This is only looking at virtual memory regions; there may be mapped regions that are "available" from an allocator's perspective, though presumably allocators return huge chunks if they're not needed)
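For reference, here is a sketch of how one could measure the largest contiguous free VA block from inside the process with VirtualQuery, similar in spirit to the largestContiguousVMBlock number used in the later try-push instrumentation; this is an illustration, not the actual instrumentation code:

  #include <windows.h>
  #include <cstdio>

  // Walk the user-mode address space and track the largest MEM_FREE region.
  int main() {
    SIZE_T largestFree = 0;
    MEMORY_BASIC_INFORMATION info;
    char* addr = nullptr;
    while (VirtualQuery(addr, &info, sizeof(info)) == sizeof(info)) {
      if (info.State == MEM_FREE && info.RegionSize > largestFree) {
        largestFree = info.RegionSize;
      }
      char* next = static_cast<char*>(info.BaseAddress) + info.RegionSize;
      if (next <= addr) {
        break;                 // wrapped around the top of the address space
      }
      addr = next;
    }
    std::printf("largest contiguous free VA block: %lu MB\n",
                static_cast<unsigned long>(largestFree >> 20));
    return 0;
  }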
> At the 90% mark, the largest contiguous free virtual region is 40MB, and the
> next largest are 7MB, 5MB, 3MB.

This is smelling a lot like bug 859955.
(In reply to Mike Hommey [:glandium] from comment #80)
> If the problem is really heap fragmentation, hiding the problem on tbpl by
> splitting bc won't make it go away on user machines.

User machines don't open and close multiple pages a second continuously for 80 minutes.
(In reply to Andrew McCreight [:mccr8] from comment #75)
> I doubt the test slaves have anti-virus installed.  My current theory is the
> problem is Win7 only because it is 32-bit, and the GPU committed memory
> mentioned in comment 33 that is not in XP, and the combination of the two is
> causing heap fragmentation which is make large-ish allocations of a few
> megabytes fail towards the end of the run.

Well if the slaves do not have A/V software installed and running during tests, then exactly how is the A/V integration to scan downloaded files adequately tested?
(In reply to Bill Gianopoulos [:WG9s] from comment #84)
> (In reply to Andrew McCreight [:mccr8] from comment #75)
> > I doubt the test slaves have anti-virus installed.  My current theory is the
> > problem is Win7 only because it is 32-bit, and the GPU committed memory
> > mentioned in comment 33 that is not in XP, and the combination of the two is
> > causing heap fragmentation which is make large-ish allocations of a few
> > megabytes fail towards the end of the run.
> 
> Well if the slaves do not have A/V software installed and running during
> tests, then exactly how is the A/V integration to scan downloaded files
> adequately tested?

It was never adequately tested, is no longer called (see https://mail.mozilla.org/pipermail/firefox-dev/2013-August/000848.html for why), and should be deleted.
(In reply to Andrew McCreight [:mccr8] from comment #83)
> (In reply to Mike Hommey [:glandium] from comment #80)
> > If the problem is really heap fragmentation, hiding the problem on tbpl by
> > splitting bc won't make it go away on user machines.
> 
> User machines don't open and close multiple pages a second continuously for
> 80 minutes.

Sure, but you can easily imagine this sort of thing affecting desktop or phone users over a longer time period.
(In reply to Nicholas Nethercote [:njn] from comment #82)
> > At the 90% mark, the largest contiguous free virtual region is 40MB, and the
> > next largest are 7MB, 5MB, 3MB.
> 
> This is smelling a lot like bug 859955.

bsmedberg, does anything in vadumps-from-bc.zip stick out to you? I didn't see any of the extreme "waste 15/16ths" patterns that have come up before, although there are a few regions where it alternates 1 MB used / 1 MB free back and forth for a while.
Oops - should have set needinfo for comment 87
Flags: needinfo?(benjamin)
(In reply to Nicholas Nethercote [:njn] from comment #82)
> > At the 90% mark, the largest contiguous free virtual region is 40MB, and the
> > next largest are 7MB, 5MB, 3MB.
> 
> This is smelling a lot like bug 859955.

Indeed.
Summary: Trunk trees closed due to OOMs → Trunk trees closed due to virtual address space fragmentation on Win 7 debug mochitest-BC (and M2?)
1) I am perversely super duper excited that this is hitting machines that we have direct access to. Can anyone provide details on the hardware and software setup of these machines, especially graphics card/driver/version and Windows D2D/D3D library versions?
2) I'm happy to create some charts of the vadump data as I've done in the past. Is this machine running win32? The memory info appears to stop at 2 GB.
3) Visualizations are unlikely to actually help much, though. What we really want to have is the stack at the time the "leaked" shared-memory block is mapped into our process. I've tried to do this using breakpoint logging or hooking the VirtualAlloc/MapViewOfFile functions in the past and haven't had much success, but I'm hoping dmajor will be able to pull off the necessary miracles!
Flags: needinfo?(benjamin)
Depends on: 859955
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #90)
> 1) I am perversely super duper excited that this is hitting machines that we
> have direct access to. Can anyone provide details on the hardware and
> software setup of these machines, especially graphics card/driver/version
> and Windows D2D/D3D library versions?

CCing RelEng since I guess they have all this data.
Blocks: 857427
Attached file DxDiag.txt
DxDiag from my releng loaner
The patch in bug 859955 didn't help. It should be noted that this does show that somewhat mysterious memory usage can come from GPU memory usage on some driver/device combinations. So it could still be that something like the image cache is holding on to too many images or something like that.

We do expect to see problems with this more quickly on Windows 7; as explained in bug 859955, we use twice as much address space there for cached images on the affected driver/device combinations compared to WinXP.
OK, another interrupt with a probably stupid question.  Are these Win 7 tests running on a 32-bit or 64-bit Win 7 OS?  I realize they are running 32-bit Firefox, but the overwhelming majority of users actually running Windows 7 are doing so on 64-bit hardware running a 64-bit OS.  If that is not how our tests are running, then this might need a re-think.  However, since 32-bit Firefox is what is supported, 32-bit builds are what should be tested, just on a 64-bit OS.
The Win7 test slaves are 32bit.
Whiteboard: [MemShrink]
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #95)
> The Win7 test slaves are 32bit.

Well then, by the time Windows 7 came out, the systems being sold were all 64-bit with Windows 7 pre-installed.  If we are testing on 32-bit Windows 7, then we are not really testing anything real users actually run.

This is like testing 64-bit Windows XP, yet another thing no one ever really ran.
So, it seems we have no test for what real users are actually running, which is 32-bit Firefox under a 64-bit Windows 7 operating system.  But we have the tree closed over test failures on 32-bit Firefox on 32-bit Windows 7, which very few people run???  Sorry, but it seems we are concentrating on the wrong issue here.
Oh, and the reason this is significant is that the GPU memory will subtract from a 32-bit app's available memory on a 32-bit OS, but on a 64-bit OS it will be independent.
Is the info that dmajor provided in comment 92 sufficient?
For hardware specs it is better to ask arich or Q, even though I believe dmajor can obtain that by inspecting the device manager.
Flags: needinfo?(benjamin)
(In reply to Bill Gianopoulos [:WG9s] from comment #97)
> So, it seems we have no test on what real users are running which is 32-bit
> Firefox under 64-bit Windows 7 operating system.  But we have the tree
> closed on things very few run which is test failures on Firefox 32-bit on
> 32-bit Windows 7???  Sorry, but seems we are concentrating on the wrong
> issue here.

The operating system doesn't seem relevant here: what's relevant is how much address space the process can have.  And even though (IIRC) 32-bit processes have a larger usable address space under 64-bit windows, it looks likely that we'd be hitting address space fragmentation there too.  And whether that'd be right now or some point down the road, we still have something in the software that needs to be fixed.
(In reply to Nathan Froyd (:froydnj) from comment #100)
> (In reply to Bill Gianopoulos [:WG9s] from comment #97)
> > So, it seems we have no test on what real users are running which is 32-bit
> > Firefox under 64-bit Windows 7 operating system.  But we have the tree
> > closed on things very few run which is test failures on Firefox 32-bit on
> > 32-bit Windows 7???  Sorry, but seems we are concentrating on the wrong
> > issue here.
> 
> The operating system doesn't seem relevant here: what's relevant is how much
> address space the process can have.  And even though (IIRC) 32-bit processes
> have a larger usable address space under 64-bit windows, it looks likely
> that we'd be hitting address space fragmentation there too.  And whether
> that'd be right now or some point down the road, we still have something in
> the software that needs to be fixed.

Well, you missed my point: on a 32-bit OS the memory assigned to the graphics card makes less memory available to the application.  On a 64-bit OS it would not.

That's because the graphics memory needs to be available to both the graphics adapter and your app, since the app is trying to talk directly to the GPU to accelerate graphics.
Bill:  bad stuff is happening.  It needs to be fixed.  Please stop the distraction.
But if we have the tree closed for an issue in a scenario that real users do not hit (the number of 32-bit Win 7 OS users is fairly close to zero), while we are not running tests under 64-bit Win 7, which is what most Win 7 users actually run, that is a real issue.
You can raise it somewhere else.  Please leave this bug alone.
No longer blocks: 857427
Depends on: 939137
(In reply to Hugh Nougher [:Hughman] from comment #33)
> If you talk about virtual memory fragmentation, do not forget about the
> (recently disabled) gpu-committed reporter or simply just GPU committed
> memory. GPU allocations in windows take out reservation in the calling
> process's virtual memory in case another program requires the GPU to page
> its memory out.
> 
> So here you have been talking about WebGL and other graphics tests so think
> there is a high chance these allocation are also helping lead to this OOM
> (on Win7/8 at least. WinXP did things a little differently).

Just for the record, this is only true for a select subset of drivers/devices. On most up-to-date drivers (sadly not on some stock drivers), the driver will reserve physical memory directly and not actually map it into the calling process's address space.
Flags: needinfo?(benjamin)
In the 90.txt case, the largest available block of memory is 0x2d20000 bytes (47MB) large. There does not appear to be the huge number of identical mapped-memory blocks that I saw in bug 859955, but there are a large number of PRIVATE memory blocks that may represent graphics buffers. As dmajor mentioned, there is some significant VM fragmentation (1MB free, 1MB allocated) in several large blocks.
Attached image vmmap.png
Here's a vmmap timeline of a Win7 BC run on a releng machine. It seems that the VA fragmentation is itself just a symptom of high memory usage. Here are some things I noticed:

- Heap VA usage creeps up over time and almost never shrinks. Can't tell from this data whether that's due to long-held allocations or fragmentation or what.

- Private Data seems to have a ton of 1MB and 64KB blocks going in and out, but many of them stick around and accumulate. In yesterday's !vadumps we saw that the 1MB blocks are padded by 1MB free blocks, so really they consume 2MB of VA.

- The sawtooth is Mapped File data: half of the increase comes from tests.jar files that are cleaned up at the cliff, but the other half is 58MB worth of large font files (mincho etc) that stick around. I only see each font once, so it's not the duplication that came up in bug 617266.

- Image is pretty stable except for one uptick when D3D/EGL/Nvidia modules are loaded.

- I haven't gotten an actual OOM on this machine, but I'm told they happen almost at the end. That seems to coincide with the high point on this timeline. At peak usage, the largest available VA blocks in this run were 130MB, 23MB, 13MB.

Next I'm going to try getting some stacks on those heap and private allocations.
Attached file vmmap-data.zip
Here's the full vmmap log that generated the timeline above. I forgot to mention: I missed the first minute or two of the run.
Trees were reopened just before 7PM Pacific Time. Things seem to be looking okay so far.
So we will try to fix this while the tree is open?
So, just to ask yet another stupid question.  Is it known if this is an issue with the browser running out of memory, or is it a test harness out of memory issue?
No longer depends on: 932898
(In reply to Wes Kocher (:KWierso) from comment #110)
> Trees were reopened just before 7PM Pacific Time. Things seem to be looking
> okay so far.

Please can you give some more context on this?
We generally try to keep bugzilla up to date for both reasons for closure and reasons for openings, since not everyone is on IRC in US timezones...

I can only see a sentence or two on IRC in #developers:
http://logbot.glob.com.au/?c=mozilla%23developers&s=15+Nov+2013&e=18+Nov+2013#c814030

...guessing the conversations happened in #memshrink or similar (which aren't logged ) :-(

Nathan, could you give some context - since it seems like you requested that Wes reopen the tree?
Flags: needinfo?(nfroyd)
Flags: needinfo?(kwierso)
The postmortem for this tree closure/the issues in this bug has been started at:
https://etherpad.mozilla.org/LPgqYuvFJn
(In reply to Ed Morley [:edmorley UTC+0] from comment #114)
> s/is US/in US/
> 
> I can only see a sentence or two on IRC in #developers:
> http://logbot.glob.com.au/
> ?c=mozilla%23developers&s=15+Nov+2013&e=18+Nov+2013#c814030
> 
> ...guessing the conversations happened in #memshrink or similar (which
> aren't logged ) :-(
> 
> Nathan, could you give some context - since it seems like you requested that
> Wes reopen the tree?

My understanding of the situation was that we needed two things:

- We needed M-2 and M-bc to stop being so orange;
- We needed some semblance of instrumentation so we could tell what was going on with those tests, and that could drive further fixes.

smaug's fixes satisfied the orange-ness requirement, as far as I can see.

RyanVM had requested #2 as a prereq for opening the tree, since just saying "welp, tests are green again, let's reopen!" doesn't really move us to addressing the issues.  And I don't think it would have been reasonable to try to fix all the tests on a closed tree.

So once bug 939137 had landed and those two test suites looked greener, the tree was reopened.
Flags: needinfo?(nfroyd)
That's great - thank you :-)

Which specific bugs fixed #1 - just bug 938945?
(In reply to Ed Morley [:edmorley UTC+0] from comment #117)
> That's great - thank you :-)
> 
> Which specific bugs fixed #1 - just bug 938945?

That's correct.
(In reply to Ed Morley [:edmorley UTC+0] from comment #117)
> Which specific bugs fixed #1 - just bug 938945?

The major thing was backing out shu's patch.  smaug's patch probably helped with the M2 shutdown problem.
I pushed a try build today that relanded bug 933882 *and* took out forceGCs in the devtools/debugger test harness. This resulted in a bunch of "ran out of memory in CC" failures: https://tbpl.mozilla.org/?tree=Try&rev=53b2b69cadc6

Here's what the first one looks like under the viewer: http://people.mozilla.org/~sguo/mochimem/viewer.html?url=http%3A//ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/shu@rfrn.org-53b2b69cadc6/try-win32-debug/try_win7-ix-debug_test-mochitest-browser-chrome-bm73-tests1-windows-build1369.txt.gz&

You can see largestContiguousVMBlock falling pretty low: 19 MB.

For comparison, a Win7 debug BC run with the forceGCs left in stays fairly green, even after copious retriggering: https://tbpl.mozilla.org/?tree=Try&rev=d673d64f6021

For comparison, here's what the first green BC from that try overlaid with the OOM run looks like: http://people.mozilla.org/~sguo/mochimem/viewer.html?url=http%3A//ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/shu@rfrn.org-53b2b69cadc6/try-win32-debug/try_win7-ix-debug_test-mochitest-browser-chrome-bm73-tests1-windows-build1369.txt.gz&url=http%3A//ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/shu@rfrn.org-d673d64f6021/try-win32-debug/try_win7-ix-debug_test-mochitest-browser-chrome-bm74-tests1-windows-build1019.txt.gz&

The failing run has a dip in largestContiguousVMBlock starting in the middle of the tabview bugs that isn't present in the passing one. The failing run's dip from the debugger view-variable tests is also much larger than the dip in the passing run. Do some machines have bad allocators? What's going on there?
To clarify, what's confusing to me is why the memory patterns are different way before the debugger tests even kick in.
Depends on: 941837
Tree is open.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 28
Depends on: defrag