Last Comment Bug 767337 - Layers hardware acceleration causing high memory consumption when playing a game using canvas and js
: Layers hardware acceleration causing high memory consumption when playing a g...
Status: VERIFIED FIXED
[Memshrink:P2]
: crash
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: 12 Branch
: x86_64 Windows 7
: -- minor with 2 votes (vote)
: mozilla17
Assigned To: Nick Cameron [:nrc]
:
: Milan Sreckovic [:milan]
Mentors:
http://upsidedownturtle.com/boredbore...
: 785643 (view as bug list)
Depends on:
Blocks: 651858 775215
  Show dependency treegraph
 
Reported: 2012-06-22 05:58 PDT by nretrain
Modified: 2015-03-07 09:00 PST (History)
20 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
memory use snapshot (109.21 KB, image/jpeg)
2012-07-05 04:02 PDT, Loic
no flags Details
crash test (1.26 KB, patch)
2012-07-11 21:20 PDT, Nick Cameron [:nrc]
no flags Details | Diff | Splinter Review
patch (13.75 KB, patch)
2012-07-11 21:22 PDT, Nick Cameron [:nrc]
no flags Details | Diff | Splinter Review
crash test (1.26 KB, patch)
2012-07-11 21:24 PDT, Nick Cameron [:nrc]
roc: review+
Details | Diff | Splinter Review
patch (14.02 KB, patch)
2012-07-16 03:50 PDT, Nick Cameron [:nrc]
bas: review-
Details | Diff | Splinter Review
crash test (1.34 KB, patch)
2012-07-16 03:52 PDT, Nick Cameron [:nrc]
ncameron: review+
Details | Diff | Splinter Review
Bas's patch (1.03 KB, patch)
2012-07-18 09:50 PDT, Nick Cameron [:nrc]
roc: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta-
akeybl: approval‑mozilla‑esr10-
Details | Diff | Splinter Review
GC log (with Memchaser add-on) (31.59 KB, text/plain)
2012-07-26 15:03 PDT, Loic
no flags Details

Description nretrain 2012-06-22 05:58:19 PDT
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
Build ID: 20120420145725

Steps to reproduce:

windows 7
firefox 12

1) go to http://upsidedownturtle.com/boredboredbored/
2) play few minutes (2 or 3 is enough)


Actual results:

Firefox uses the all ram available (3,7Go) and stops running the current page


Expected results:

Play without a firefox crash
Comment 1 Loic 2012-06-22 06:38:25 PDT
Indeed, it appears only when the tab with the game is active, if you're using another tab, memory use stays 'correct'.
In addition, explicit memory is pretty low but private memory grows up very fast.

Log during a memory peak:

Other Measurements
139 (100.0%) -- js-compartments
├──137 (98.56%) ── system
└────2 (01.44%) ── user

0 (100.0%) -- low-memory-events
├──0 (100.0%) ── physical
└──0 (100.0%) ── virtual

2,085,204 B (100.0%) -- window-objects
├──1,360,448 B (65.24%) -- layout
│  ├────769,888 B (36.92%) ── style-sets
│  ├────381,048 B (18.27%) ── pres-shell
│  ├─────79,640 B (03.82%) ── rule-nodes
│  ├─────50,080 B (02.40%) ── frames
│  ├─────46,288 B (02.22%) ── style-contexts
│  ├─────19,488 B (00.93%) ── pres-contexts
│  ├─────13,536 B (00.65%) ── line-boxes
│  └────────480 B (00.02%) ── text-runs
├────433,960 B (20.81%) ── style-sheets
├────288,172 B (13.82%) -- dom
│    ├──188,856 B (09.06%) ── element-nodes
│    ├───69,600 B (03.34%) ── other
│    ├───26,196 B (01.26%) ── text-nodes
│    ├────3,520 B (00.17%) ── comment-nodes
│    └────────0 B (00.00%) ── cdata-nodes
└──────2,624 B (00.13%) ── property-tables

    9,329,792 B ── canvas-2d-pixel-bytes
   81,262,431 B ── explicit
        7,168 B ── gfx-d2d-surfacecache
   16,540,068 B ── gfx-d2d-surfacevram
   28,034,096 B ── gfx-d2d-vram-drawtarget
      155,272 B ── gfx-d2d-vram-sourcesurface
      220,160 B ── gfx-surface-image
              0 ── ghost-windows
   57,337,416 B ── heap-allocated
   61,456,384 B ── heap-committed
    4,106,778 B ── heap-committed-unused
          7.15% ── heap-committed-unused-ratio
      139,264 B ── heap-dirty
   20,252,550 B ── heap-unused
        4,288 B ── images-content-used-uncompressed
   22,020,096 B ── js-gc-heap
    5,757,312 B ── js-main-runtime-analysis-temporary
    8,064,712 B ── js-main-runtime-gc-heap-allocated
    6,971,704 B ── js-main-runtime-gc-heap-arena-unused
            0 B ── js-main-runtime-gc-heap-chunk-clean-unused
            0 B ── js-main-runtime-gc-heap-chunk-dirty-unused
   15,036,416 B ── js-main-runtime-gc-heap-committed
    6,971,704 B ── js-main-runtime-gc-heap-committed-unused
         86.44% ── js-main-runtime-gc-heap-committed-unused-ratio
    6,983,680 B ── js-main-runtime-gc-heap-decommitted
    1,034,828 B ── js-main-runtime-mjit
    3,732,928 B ── js-main-runtime-objects
    3,985,264 B ── js-main-runtime-scripts
    4,289,176 B ── js-main-runtime-shapes
    2,054,958 B ── js-main-runtime-strings
      294,528 B ── js-main-runtime-type-inference
              0 ── low-commit-space-events
  902,127,616 B ── private
  826,380,288 B ── resident
    4,910,648 B ── storage-sqlite
1,436,499,968 B ── vsize
Comment 2 Nicholas Nethercote [:njn] 2012-06-22 19:19:31 PDT
I'd guess it's something to do with hardware acceleration.  Can you try going to about:config and setting layers.acceleration.disabled to true and see if that makes a difference?
Comment 3 Loic 2012-06-23 06:08:45 PDT
(In reply to Nicholas Nethercote [:njn] from comment #2)
> I'd guess it's something to do with hardware acceleration.  Can you try
> going to about:config and setting layers.acceleration.disabled to true and
> see if that makes a difference?

1st test: I disabled completely HWA in advanced options in FF.
Result: no high use of memory, normal behavior.

2nd test: HWA was on but layers.acceleration.disabled = true.
Result: high use of memory as I reported in comment #1.
Comment 4 Nicholas Nethercote [:njn] 2012-06-24 07:32:32 PDT
> 1st test: I disabled completely HWA in advanced options in FF.
> Result: no high use of memory, normal behavior.
> 
> 2nd test: HWA was on but layers.acceleration.disabled = true.
> Result: high use of memory as I reported in comment #1.

Thanks, Loic.  So, HWA of layers is causing the problem.  Looks like this is either a layout or graphics problem, I've CC'd some relevant people.
Comment 5 Nicholas Nethercote [:njn] 2012-06-26 16:28:58 PDT
Nick, will you have a chance to look at this soon?
Comment 6 Nick Cameron [:nrc] 2012-06-26 17:01:18 PDT
Yes, I'll start today
Comment 7 Nick Cameron [:nrc] 2012-06-27 21:55:12 PDT
njn: sorry, didn't get onto this yesterday or today, hopefully I should be on it tomorrow or Monday, is that OK with you, timing-wise?
Comment 8 Nicholas Nethercote [:njn] 2012-06-27 22:55:13 PDT
> njn: sorry, didn't get onto this yesterday or today, hopefully I should be
> on it tomorrow or Monday, is that OK with you, timing-wise?

Sure!
Comment 9 Nick Cameron [:nrc] 2012-07-03 17:55:17 PDT
I confirmed the reported behaviour on FF 12, memory use rises until FF crashes (annoyingly taking my display driver with it). But this is resolved on version 13 (current release) - here memory rises but falls again with GC, admittedly with some pretty brutal GC pauses. No crash, no excessive memory use. Things are even better on 14, GC pauses are smaller and so is memory use, still no crashes. Works fine on nightly (16) too.
Comment 10 Loic 2012-07-05 04:02:38 PDT
Created attachment 639279 [details]
memory use snapshot

(In reply to Nick Cameron [:nrc] from comment #9)
> I confirmed the reported behaviour on FF 12, memory use rises until FF
> crashes (annoyingly taking my display driver with it). But this is resolved
> on version 13 (current release) - here memory rises but falls again with GC,
> admittedly with some pretty brutal GC pauses. No crash, no excessive memory
> use. Things are even better on 14, GC pauses are smaller and so is memory
> use, still no crashes. Works fine on nightly (16) too.

That's not really what I'm observing on Win 7 with the latest Nightly build.
During the game after 15-20 sec, Nightly's process hit the barrier of 3GB, my computer starts to freeze.
Comment 11 Nick Cameron [:nrc] 2012-07-05 04:09:07 PDT
(In reply to Loic from comment #10)
> Created attachment 639279 [details]
> memory use snapshot
> 
> (In reply to Nick Cameron [:nrc] from comment #9)
> > I confirmed the reported behaviour on FF 12, memory use rises until FF
> > crashes (annoyingly taking my display driver with it). But this is resolved
> > on version 13 (current release) - here memory rises but falls again with GC,
> > admittedly with some pretty brutal GC pauses. No crash, no excessive memory
> > use. Things are even better on 14, GC pauses are smaller and so is memory
> > use, still no crashes. Works fine on nightly (16) too.
> 
> That's not really what I'm observing on Win 7 with the latest Nightly build.
> During the game after 15-20 sec, Nightly's process hit the barrier of 3GB,
> my computer starts to freeze.

could you copy the graphics and Javascript sections from your about:config please? It is odd to have such different behaviour. Do you have the same behaviour from release or beta? Thanks!
Comment 12 Loic 2012-07-05 05:11:53 PDT
I guess you meant about:support.

I tested with a new Firefox profile each time.

+ Beta: FF14 eats the memory (close to 2.3-2.5GB), starts being unresponsive then crashes.
CR: https://crash-stats.mozilla.com/report/index/bp-fd3638c3-8d61-4d94-97a6-44f332120705

+ Aurora: Same behavior than FF14, but after being unresponsive a 1st time, FF15 returns control a 2nd time but it's almost impossible to move the mouse cursor. Then FF15 crashes.
CR: https://crash-stats.mozilla.com/report/index/bp-2f90de70-9dd5-496a-9864-76d0f2120705

+ Nightly: FF16 eats more memory (~3GB) than FF14/FF15 but becames unresponsive then returns control and it's smooth at 90% after that. No crash.
I think FF16 eats more memory so it's more smooth after the 1st hang.

I have to say the computer didn't really appreciate these tests, I had to reboot because applications were slow even if testing was over.

about:support for FF16:

Graphics
      
Adapter Description Intel(R) HD Graphics 3000
Vendor ID 0x8086
Device ID 0x0116
Adapter RAM Unknown
Adapter Drivers igdumd64 igd10umd64 igd10umd64 igdumd32 igd10umd32 igd10umd32
Driver Version 8.15.10.2696
Driver Date 3-19-2012
Adapter Description (GPU #2) NVIDIA GeForce GT 550M
Vendor ID (GPU #2) 0x10de
Device ID (GPU #2) 0x0df6
Adapter RAM (GPU #2) 2047
Adapter Drivers (GPU #2) nvd3dumx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um
Driver Version (GPU #2) 8.17.13.142
Driver Date (GPU #2) 5-15-2012
Direct2D Enabled true
DirectWrite Enabled true (6.1.7601.17789)
ClearType Parameters Gamma: 2200 Pixel Structure: RGB ClearType Level: 100 Enhanced Contrast: 50 
WebGL Renderer Google Inc. -- ANGLE (Intel(R) HD Graphics 3000) -- OpenGL ES 2.0 (ANGLE 1.0.0.1041)
GPU Accelerated Windows 1/1 Direct3D 10
AzureBackend direct2d      
      
JavaScript         
            
Incremental GC 1
Comment 13 Nick Cameron [:nrc] 2012-07-05 15:18:14 PDT
Thanks for the data Loic, and yes I meant about:support, sorry. So the only difference I see here is that you are using integrated graphics card (the Intel one). I use only discrete graphics (nvidia), so I rebooted with the integrated Intel GPU, and indeed, I get the same behaviour as you report - quickly escalating memory usage, leading to a crash. On Nightly I get the occaisional GC which slows things down, but not by much - I still get the crash, it looks like not one GC happens before I crash on beta (14).

So it looks like this is caused by some combination of HWA rendering and integrated graphics card, or possibly some issue with a specific Intel card or its driver (about:support shows "Intel(R) HD Graphics Family" for me, not sure if that is due to an older driver and we have the same card, or we have different cards).

At least I can recreate the bug now, I will look into it.

(If you feel the need, you could try running this on your machine with Optimus disabled and using your NVIDIA card, to confirm that this fixes the problem, but it is not really necessary)
Comment 14 Nick Cameron [:nrc] 2012-07-05 15:45:23 PDT
So, it seems that things aren't being leaked as such, when you pause the game or force a GC from about:memory, then the memory use goes down to reasonable levels. It just seems that GC isn't getting a chance to run or something. I have no idea why changing graphics cards or layers backends would affect this though.
Comment 15 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-05 19:23:45 PDT
We call JS_updateMallocCounter in nsCanvasRenderingContext2D(Azure) to tell JS that we've allocated non-JS-heap memory for canvas buffers that it needs to take account of in its GC-triggering heuristics so that GCs will be forced to happen so we can release these buffers.

It sounds like something related is happening here, maybe, and something is allocating a lot of video memory and not telling JS so we're not doing timely JS GCs.
Comment 16 Nick Cameron [:nrc] 2012-07-11 21:20:46 PDT
Created attachment 641343 [details] [diff] [review]
crash test
Comment 17 Nick Cameron [:nrc] 2012-07-11 21:22:51 PDT
Created attachment 641344 [details] [diff] [review]
patch
Comment 18 Nick Cameron [:nrc] 2012-07-11 21:24:42 PDT
Created attachment 641345 [details] [diff] [review]
crash test
Comment 19 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-11 22:28:46 PDT
Comment on attachment 641344 [details] [diff] [review]
patch

Review of attachment 641344 [details] [diff] [review]:
-----------------------------------------------------------------

You need a comment somewhere explaining why you're doing this.
Comment 20 Nick Cameron [:nrc] 2012-07-14 07:19:45 PDT
Try push: https://tbpl.mozilla.org/?tree=Try&rev=b2ca7fbf80b7
Comment 21 Nick Cameron [:nrc] 2012-07-16 03:50:35 PDT
Created attachment 642526 [details] [diff] [review]
patch

added a comment
Comment 22 Nick Cameron [:nrc] 2012-07-16 03:52:06 PDT
Created attachment 642527 [details] [diff] [review]
crash test

increased the number of drawn gradients (still doesn't actually cause a crash on Try, but it does when run locally with the right (wrong?) graphics card). Carrying r=roc.
Comment 23 Bas Schouten (:bas.schouten) 2012-07-16 07:34:15 PDT
Comment on attachment 642526 [details] [diff] [review]
patch

Review of attachment 642526 [details] [diff] [review]:
-----------------------------------------------------------------

So, I think this is the wrong solution. First of all this will create a global record without any intelligence which could definitely cause performance issues and pitfalls once we start re-using gradients more actively. More importantly once we get other backends that also cache information for GradientStops they'll need similar logic. The mistake of creating too many GradientStops here is really in the user of the API. As such I believe this sort of logic should go inside the canvas implementation. A dumb record there might be okay as web authors can then just improve their implementation, and it also can't interfere with UI responsiveness.
Comment 24 Nick Cameron [:nrc] 2012-07-16 18:48:12 PDT
I tried Bas's suggestion of removing the D3D10_RESOURCE_MISC_GDI_COMPATIBLE flag from the D3D10 Layer manager code. This prevented the crash and the explosive memory growth, but it did not cure the appalling GC pauses. The fix is therefore OK by itself if we assume GC will naturally improve. Otherwise, we could combine it with the cache limit patch, but with a higher limit. Opinions?
Comment 25 Bas Schouten (:bas.schouten) 2012-07-16 18:57:39 PDT
(In reply to Nick Cameron [:nrc] from comment #24)
> I tried Bas's suggestion of removing the D3D10_RESOURCE_MISC_GDI_COMPATIBLE
> flag from the D3D10 Layer manager code. This prevented the crash and the
> explosive memory growth, but it did not cure the appalling GC pauses. The
> fix is therefore OK by itself if we assume GC will naturally improve.
> Otherwise, we could combine it with the cache limit patch, but with a higher
> limit. Opinions?

I'm personally in favor of just removing D3D10_RESOURCE_MISC_GDI_COMPATIBLE and going from there.

I'm not 100% sure how I feel about the GC pauses.
Comment 26 Andrew McCreight [:mccr8] 2012-07-16 19:47:55 PDT
(In reply to Bas Schouten (:bas) from comment #25)
> I'm not 100% sure how I feel about the GC pauses.
What are the pauses like?  If you turn on javascript.options.mem.log then you will get a log of what the GC is doing.  If you copy a few of those entries here we may get some insight into what the GC is doing.

Maybe Bill or Gregor have some ideas how the GC performance could be improved?  roc's comment 15 about JS_updateMallocCounter sounds right to me, not that I know a huge amount about it.
Comment 27 Andrew McCreight [:mccr8] 2012-07-16 19:48:18 PDT
(In reply to Andrew McCreight [:mccr8] from comment #26)
> If you turn on javascript.options.mem.log then
> you will get a log of what the GC is doing.
(it will be logged in the error console)
Comment 28 Gregor Wagner [:gwagner] 2012-07-16 20:50:06 PDT
Hm I can't reproduce the OOM situation with a current nightly. After a few min I am still at around 220MB resident size.
Comment 29 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-16 21:24:21 PDT
Gregor: you need to have the particular Intel graphics hardware, and maybe some particular driver.

Nick: If you remove the D3D10_RESOURCE_MISC_GDI_COMPATIBLE but change nothing else on trunk (no JS_updateMallocCounter or other GC-related changes), does the problem just go away?

Bas: what are the consequences of removing D3D10_RESOURCE_MISC_GDI_COMPATIBLE?
Comment 30 Bas Schouten (:bas.schouten) 2012-07-17 00:03:47 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #29)
> Bas: what are the consequences of removing
> D3D10_RESOURCE_MISC_GDI_COMPATIBLE?

In reality, nothing. As a matter of fact I should have done this a long time ago and I should really do it in more places. We don't actually use this anywhere with Azure-D2D.
Comment 31 Nick Cameron [:nrc] 2012-07-17 04:32:01 PDT
Without the fix GC pauses were up to 9sec. With Bas's fix, with a debug build, initially there is no problem, but GC pauses get longer as you play the game, getting up to 1.5sec, but many shorter pauses frequent enough to make the game unplayable. With an optimised build, there is no noticeable GC (I played for about 7-8mins, for debug, it is unplayable after 2-3). The longest logged GC was about 25ms. With a limit on the number of gradients, there was no GC issue, even on debug.
Comment 32 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-17 05:19:45 PDT
How about taking the D3D10_RESOURCE_MISC_GDI_COMPATIBLE fix until/unless we have more data to justify taking the approach in your patch?
Comment 33 Nick Cameron [:nrc] 2012-07-17 05:46:01 PDT
sounds good to me
Comment 34 [PTO to Dec5] Bill McCloskey (:billm) 2012-07-17 07:49:14 PDT
I'm pretty confused here. First, are there enough calls to JS_updateMallocCounter right now? Does removing this D3D10_RESOURCE_MISC_GDI_COMPATIBLE flag just cover up another problem where we allocate a lot of memory and don't GC?

Also, do we need to do anything to fix GC pauses? Do the pauses only happen with the special graphics card, or with any card? Did the pauses ever happen with an optimized build? Measuring GC pauses in a debug build is pretty useless, since we do all sorts of extra heap walking after the GC finishes to verify correctness.
Comment 35 Nick Cameron [:nrc] 2012-07-17 09:00:42 PDT
(In reply to Bill McCloskey (:billm) from comment #34)
> I'm pretty confused here. First, are there enough calls to
> JS_updateMallocCounter right now?

No, I think we need more, the problem is that we have no idea how much memory is being used - it changes depending on the driver.

> Does removing this
> D3D10_RESOURCE_MISC_GDI_COMPATIBLE flag just cover up another problem where
> we allocate a lot of memory and don't GC?

I'm not exactly sure what removing that flag does. I think leads to much less memory being allocated.

> 
> Also, do we need to do anything to fix GC pauses? Do the pauses only happen
> with the special graphics card, or with any card? Did the pauses ever happen
> with an optimized build? Measuring GC pauses in a debug build is pretty
> useless, since we do all sorts of extra heap walking after the GC finishes
> to verify correctness.

Yes, we did get pauses on optimised builds without the fix (around 1/2 second), but only on specific drivers, and actually it usually crashed before we saw many.
Comment 36 [PTO to Dec5] Bill McCloskey (:billm) 2012-07-17 10:31:57 PDT
(In reply to Nick Cameron [:nrc] from comment #35)
> (In reply to Bill McCloskey (:billm) from comment #34)
> > I'm pretty confused here. First, are there enough calls to
> > JS_updateMallocCounter right now?
> 
> No, I think we need more, the problem is that we have no idea how much
> memory is being used - it changes depending on the driver.

Is that because the driver is the one allocating the memory?

It's not really that important that the number passed to JS_updateMallocCounter be correct. As long as you can make up a number that's close, it's better than nothing. All this call does is decrement a counter that starts at 100MB. When it reaches zero, we GC.

> Yes, we did get pauses on optimised builds without the fix (around 1/2
> second), but only on specific drivers, and actually it usually crashed
> before we saw many.

Could that have been due to paging?
Comment 37 Nick Cameron [:nrc] 2012-07-17 10:53:36 PDT
(In reply to Bill McCloskey (:billm) from comment #36)
> (In reply to Nick Cameron [:nrc] from comment #35)
> > (In reply to Bill McCloskey (:billm) from comment #34)
> > > I'm pretty confused here. First, are there enough calls to
> > > JS_updateMallocCounter right now?
> > 
> > No, I think we need more, the problem is that we have no idea how much
> > memory is being used - it changes depending on the driver.
> 
> Is that because the driver is the one allocating the memory?

I assume so.

> 
> It's not really that important that the number passed to
> JS_updateMallocCounter be correct. As long as you can make up a number
> that's close, it's better than nothing. All this call does is decrement a
> counter that starts at 100MB. When it reaches zero, we GC.
> 

The amount would vary from 16k to 200k, and possibly no memory at all if the gradient is never created. Usually in graphics memory, not main memory, not sure if that affects things.

> > Yes, we did get pauses on optimised builds without the fix (around 1/2
> > second), but only on specific drivers, and actually it usually crashed
> > before we saw many.
> 
> Could that have been due to paging?

Do you mean the pauses? They were GC pauses - logged at 0.5s.
Comment 38 Nick Cameron [:nrc] 2012-07-18 09:50:40 PDT
Created attachment 643422 [details] [diff] [review]
Bas's patch
Comment 39 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-18 09:55:32 PDT
Comment on attachment 643422 [details] [diff] [review]
Bas's patch

Review of attachment 643422 [details] [diff] [review]:
-----------------------------------------------------------------

Are you going to remove the other uses of this flag? I think Bas wanted to.
Comment 40 Nick Cameron [:nrc] 2012-07-18 12:05:26 PDT
Try push: https://tbpl.mozilla.org/?tree=Try&rev=0b44ab1599ed
Comment 42 Nick Cameron [:nrc] 2012-07-18 12:10:23 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #39)
> Comment on attachment 643422 [details] [diff] [review]
> Bas's patch
> 
> Review of attachment 643422 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Are you going to remove the other uses of this flag? I think Bas wanted to.

Yes: bug 775215
Comment 44 Nick Cameron [:nrc] 2012-07-24 17:24:49 PDT
Comment on attachment 643422 [details] [diff] [review]
Bas's patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: horrific memory leak leading to crash for some Intel graphics drivers 
Fix Landed on Version: 17
Risk to taking this patch (and alternatives if risky): Low, we never make use of the flag that is removed
String or UUID changes made by this patch: None
Comment 45 Loic 2012-07-24 18:09:11 PDT
I don't really know the purpose of this patch but with the latest Nightly, I got a crash:
https://crash-stats.mozilla.com/report/index/bp-a02d455f-4197-4b9f-b910-0b1252120725

I observed more or less the same behavior before the patch landed, maybe slowings appear later, anyway the memory use is still growing up and Firefox finishes to become unresponsive. My laptop fan runs more and the Intel chipset becomes hot.

Check the log of Memchaser add-on: resident memory is horribly high and CC time is horribly long
http://i.imgur.com/llf7I.png
Comment 46 Nick Cameron [:nrc] 2012-07-24 19:32:50 PDT
That is worrying, it sounds exactly like the pre-patch behaviour. I'll try with a nightly again when I re-config my graphics cards. The behaviour was definitely fixed for me when I tested last week.
Comment 47 Nick Cameron [:nrc] 2012-07-25 00:13:00 PDT
I just tried again with a nightly build, built from m-c, pulled 25/7/12, and the memory use was fine, never got above 150k (checked with task manager), and no crash.

Are you using Optimus? That is the only difference (that you can see a 2nd graphics card) I can see from your about:support dump further up. If you are, could you test with Optimus disabled and just the integrated graphics card?

If it is not that, then I am out of ideas. Bas, any clue?
Comment 48 Loic 2012-07-25 00:53:50 PDT
No, I don't use Optimus with Firefox because Firefox has blocked the GPU drivers (NVIDIA GeForce GT 550M / 8.17.13.142 / 5-15-2012) and as I want to have HWA enabled, I have selected the integrated graphics for Firefox in Nvidia config manager.
Comment 49 Nick Cameron [:nrc] 2012-07-25 01:51:56 PDT
Hmm, I have no idea then. You have more recent drivers than me, by a few months, and a different (but similar) device id. I don't see the information about GPU #2 (I've only seen this with Optimus on, which is why I thought of that, but you are right that you wouldn't get HWA with Optimus). Otherwise our info is identical.

Bas, roc: any ideas what could be causing Loic to still have problems when I don't?
Comment 50 Loic 2012-07-25 02:01:19 PDT
For the record, about:support > graphics: http://i.imgur.com/wgKJ2.png
Comment 51 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-25 14:56:27 PDT
Loic, can you try a build from the try-push in comment #20 and see if that fixes the problem for you?
Comment 52 Loic 2012-07-25 15:07:26 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #51)
> Loic, can you try a build from the try-push in comment #20 and see if that
> fixes the problem for you?

I can try but I'm not familiar with try-push. Is there a link to download a zipped try-build (standalone version)?
Comment 53 Nick Cameron [:nrc] 2012-07-25 15:29:07 PDT
(In reply to Loic from comment #52)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #51)
> > Loic, can you try a build from the try-push in comment #20 and see if that
> > fixes the problem for you?
> 
> I can try but I'm not familiar with try-push. Is there a link to download a
> zipped try-build (standalone version)?

Loic: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/ncameron@mozilla.com-b2ca7fbf80b7/try-win32/firefox-16.0a1.en-US.win32.installer.exe

(Opt Win32 build)
Comment 54 Loic 2012-07-25 16:14:19 PDT
(In reply to Nick Cameron [:nrc] from comment #53)
> (In reply to Loic from comment #52)
> > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #51)
> > > Loic, can you try a build from the try-push in comment #20 and see if that
> > > fixes the problem for you?
> > 
> > I can try but I'm not familiar with try-push. Is there a link to download a
> > zipped try-build (standalone version)?
> 
> Loic:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/ncameron@mozilla.
> com-b2ca7fbf80b7/try-win32/firefox-16.0a1.en-US.win32.installer.exe
> 
> (Opt Win32 build)

Good news! :)
I tried with this try-build (the zipped version, firefox-16.0a1.en-US.win32.zip) and the resident/private memories stay normal (~ 280-300 MB during 2-3 min of game).
Comment 55 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-25 16:15:55 PDT
So it sounds like we still need that patch.
Comment 56 Bas Schouten (:bas.schouten) 2012-07-25 17:09:14 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #55)
> So it sounds like we still need that patch.

I'd rather like to understand why this is still happening for Loic, to make sure only affected hardware gets this path.
Comment 57 Nick Cameron [:nrc] 2012-07-25 17:11:05 PDT
Bas, roc - as the only other Windows users I know of, could you try and reproduce this bug please?
Comment 58 Bas Schouten (:bas.schouten) 2012-07-25 17:11:28 PDT
(In reply to Nick Cameron [:nrc] from comment #49)
> Hmm, I have no idea then. You have more recent drivers than me, by a few
> months, and a different (but similar) device id. I don't see the information
> about GPU #2 (I've only seen this with Optimus on, which is why I thought of
> that, but you are right that you wouldn't get HWA with Optimus). Otherwise
> our info is identical.
> 
> Bas, roc: any ideas what could be causing Loic to still have problems when I
> don't?

Is there an intel driver version difference?
Comment 59 Nick Cameron [:nrc] 2012-07-25 17:15:31 PDT
(In reply to Bas Schouten (:bas) from comment #58)
> (In reply to Nick Cameron [:nrc] from comment #49)
> > Hmm, I have no idea then. You have more recent drivers than me, by a few
> > months, and a different (but similar) device id. I don't see the information
> > about GPU #2 (I've only seen this with Optimus on, which is why I thought of
> > that, but you are right that you wouldn't get HWA with Optimus). Otherwise
> > our info is identical.
> > 
> > Bas, roc: any ideas what could be causing Loic to still have problems when I
> > don't?
> 
> Is there an intel driver version difference?

Yes, we have slightly different hardware, Loic has 0x0116 and I have 0x0126. We have the same drivers, but he has a more up to date version (I guess they could have regressed, I'll try to update my drivers).
Comment 60 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-25 17:20:35 PDT
I don't have an Intel GPU, only Nvidia. I don't see the bug.
Comment 61 Bas Schouten (:bas.schouten) 2012-07-25 19:01:55 PDT
On my intel box:

Intel(R) HD Graphics 4000
0x0166
8.15.10.2725

A very subjective measurement resulted in the following:

Before the patch, at worst memory usage creeps up to ~500 MB, GC takes several seconds (5-10).
After the patch, at worst memory usage creeps up to ~400 MB, GC takes a couple of seconds (1-2).

Note I manually applied the patch to my local tree to make sure nothing else interfered.

In both cases the experience of the game is affected by the GC obviously. I can't get memory to go anywhere near 1GB on this machine.
Comment 62 Nick Cameron [:nrc] 2012-07-25 19:04:54 PDT
So, it looks like you weren't affected by the bug in the first place, which means it is only some Intel drivers which have the problem. We need more data...
Comment 63 Nick Cameron [:nrc] 2012-07-26 14:20:28 PDT
I updated my drivers and tried testing again with current nightly - still no bug.
Comment 64 Nick Cameron [:nrc] 2012-07-26 14:34:54 PDT
(In reply to Bas Schouten (:bas) from comment #61)
> On my intel box:
> 
> Intel(R) HD Graphics 4000
> 0x0166
> 8.15.10.2725
> 
> A very subjective measurement resulted in the following:
> 
> Before the patch, at worst memory usage creeps up to ~500 MB, GC takes
> several seconds (5-10).
> After the patch, at worst memory usage creeps up to ~400 MB, GC takes a
> couple of seconds (1-2).
> 
> Note I manually applied the patch to my local tree to make sure nothing else
> interfered.
> 
> In both cases the experience of the game is affected by the GC obviously. I
> can't get memory to go anywhere near 1GB on this machine.

Was that with a debug build? Otherwise those GC pauses are huge (and worrying).
Comment 65 [PTO to Dec5] Bill McCloskey (:billm) 2012-07-26 14:35:16 PDT
Could you guys post GC logs? Even a one second GC pause is terrible. I don't understand why this is happening. GC times should be related to the number of JS objects allocated, which should be independent of graphics hardware. When I run the game myself, it seems to allocate almost nothing in the JS heap.

When you run the game, do you also have tons of other tabs open? Do GCs normally take multiple seconds for you?
Comment 66 Loic 2012-07-26 15:03:33 PDT
Created attachment 646361 [details]
GC log (with Memchaser add-on)

(In reply to Bill McCloskey (:billm) from comment #65)
> Could you guys post GC logs? Even a one second GC pause is terrible.

See attachment.
 
> When you run the game, do you also have tons of other tabs open? Do GCs
> normally take multiple seconds for you?

No, the log has been recorded with the latest Nighlty, only one tab open, this one with the game. 
When I'm moving the game cursor with the arrow keys, resident use increases, sometimes memory use relaxes but after 1 or 2 min, resident use grows up to the max and Nightly starts to not respond.
Comment 67 Alex Keybl [:akeybl] 2012-07-26 15:03:52 PDT
Comment on attachment 643422 [details] [diff] [review]
Bas's patch

[Triage Comment]
It doesn't appear that we have enough data about user pain to take this past Aurora 16. Not worth the possibility of regression (even though we feel this is safe).
Comment 68 [PTO to Dec5] Bill McCloskey (:billm) 2012-07-26 15:16:52 PDT
Thanks, Loic! This clears up some of my confusion. All of the time spent in these GCs is in the end_callback phase. That's where we call destructors on C++ objects whose JS wrappers have been swept during the GC. So there are some Gecko objects involved here that are very expensive to clean up.

Bas, if you're able to reproduce the slow GCs, you should be able to figure out which destructors are hurting us by profiling. The stacks of interest will have XPCJSRuntime::GCCallback in them.
Comment 69 Nick Cameron [:nrc] 2012-07-26 15:35:55 PDT
We know that the GC pauses are caused by tidying up gradients. The patch which landed mostly fixes this, and GC is back down to reasonable levels (unless Bas was running an optimised build, in which case, more mystery). The only problem is why Loic still sees the old (bad) behaviour even with the patch.
Comment 70 [PTO to Dec5] Bill McCloskey (:billm) 2012-07-26 15:42:31 PDT
Oh, OK, thanks. I hadn't understood that.
Comment 71 Nick Cameron [:nrc] 2012-07-26 19:52:21 PDT
Aurora push for the simple fix: https://tbpl.mozilla.org/?tree=Mozilla-Aurora&rev=c24fa8d2d008
Comment 72 Scoobidiver (away) 2012-07-27 08:08:49 PDT
Landed in Aurora:
http://hg.mozilla.org/releases/mozilla-aurora/rev/c24fa8d2d008
Comment 73 Nick Cameron [:nrc] 2012-08-26 22:08:38 PDT
Loic: are you still getting memory problems from this game? (Sorry I've lost track of this bug for a while, if you are still having problems, lets reopen it).
Comment 74 Loic 2012-08-27 07:04:07 PDT
I tested with the latest Nightly: with HWA enabled, it's not fixed but if I disabled HWA, the resident memory use is normal (even if CPU charge growns up).
Probably an issue with Intel drivers.
Comment 75 Nick Cameron [:nrc] 2012-09-04 16:40:01 PDT
*** Bug 785643 has been marked as a duplicate of this bug. ***
Comment 76 Nick Cameron [:nrc] 2012-09-04 16:40:47 PDT
Re-opening because this is still a problem.
Comment 77 Nick Cameron [:nrc] 2012-09-20 15:10:32 PDT
We should land the patch which limits the number of cached gradients, but we should only do this where necessary. That depends on the driver, so we need to know which drivers are affected. Worst case, we can use the code for all Intel drivers, but it would be nice to be more precise because not all Intel drivers are affected, and there might be performance downsides to limiting caching unnecessarily.
Comment 78 Virtual_ManPL [:Virtual] - (ni? me) 2015-02-10 09:52:04 PST
Is it still the problem or we can close this bug as RESOLVED?
Comment 79 Nick Cameron [:nrc] 2015-02-10 14:17:54 PST
(In reply to Virtual_ManPL [:Virtual] from comment #78)
> Is it still the problem or we can close this bug as RESOLVED?

I'm afraid I don't know, sorry (I no longer work on graphics).
Comment 80 Virtual_ManPL [:Virtual] - (ni? me) 2015-02-10 14:27:42 PST
Thank you very much for reply.
Maybe Robert O'Callahan [:roc] will know.
Comment 81 Robert O'Callahan (:roc) (email my personal email if necessary) 2015-03-07 02:55:52 PST
If I understand correctly, only Loic could still reproduce this.
Comment 82 Loic 2015-03-07 04:18:34 PST
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #81)
> If I understand correctly, only Loic could still reproduce this.

I can't reproduce the memory use anymore when playing the game.

Note You need to log in before you can comment on or make changes to this bug.