Closed Bug 767337 Opened 12 years ago Closed 9 years ago

Layers hardware acceleration causing high memory consumption when playing a game using canvas and js

Categories

(Core :: Graphics, defect)

12 Branch
x86_64
Windows 7
defect
Not set
minor

Tracking

()

VERIFIED FIXED
mozilla17
Tracking Status
firefox16 --- fixed

People

(Reporter: nicolas.retrain, Assigned: nrc)

References

()

Details

(Keywords: crash, Whiteboard: [Memshrink:P2])

Attachments

(4 files, 4 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
Build ID: 20120420145725

Steps to reproduce:

windows 7
firefox 12

1) go to http://upsidedownturtle.com/boredboredbored/
2) play few minutes (2 or 3 is enough)


Actual results:

Firefox uses the all ram available (3,7Go) and stops running the current page


Expected results:

Play without a firefox crash
Severity: normal → minor
Keywords: crash
Indeed, it appears only when the tab with the game is active, if you're using another tab, memory use stays 'correct'.
In addition, explicit memory is pretty low but private memory grows up very fast.

Log during a memory peak:

Other Measurements
139 (100.0%) -- js-compartments
├──137 (98.56%) ── system
└────2 (01.44%) ── user

0 (100.0%) -- low-memory-events
├──0 (100.0%) ── physical
└──0 (100.0%) ── virtual

2,085,204 B (100.0%) -- window-objects
├──1,360,448 B (65.24%) -- layout
│  ├────769,888 B (36.92%) ── style-sets
│  ├────381,048 B (18.27%) ── pres-shell
│  ├─────79,640 B (03.82%) ── rule-nodes
│  ├─────50,080 B (02.40%) ── frames
│  ├─────46,288 B (02.22%) ── style-contexts
│  ├─────19,488 B (00.93%) ── pres-contexts
│  ├─────13,536 B (00.65%) ── line-boxes
│  └────────480 B (00.02%) ── text-runs
├────433,960 B (20.81%) ── style-sheets
├────288,172 B (13.82%) -- dom
│    ├──188,856 B (09.06%) ── element-nodes
│    ├───69,600 B (03.34%) ── other
│    ├───26,196 B (01.26%) ── text-nodes
│    ├────3,520 B (00.17%) ── comment-nodes
│    └────────0 B (00.00%) ── cdata-nodes
└──────2,624 B (00.13%) ── property-tables

    9,329,792 B ── canvas-2d-pixel-bytes
   81,262,431 B ── explicit
        7,168 B ── gfx-d2d-surfacecache
   16,540,068 B ── gfx-d2d-surfacevram
   28,034,096 B ── gfx-d2d-vram-drawtarget
      155,272 B ── gfx-d2d-vram-sourcesurface
      220,160 B ── gfx-surface-image
              0 ── ghost-windows
   57,337,416 B ── heap-allocated
   61,456,384 B ── heap-committed
    4,106,778 B ── heap-committed-unused
          7.15% ── heap-committed-unused-ratio
      139,264 B ── heap-dirty
   20,252,550 B ── heap-unused
        4,288 B ── images-content-used-uncompressed
   22,020,096 B ── js-gc-heap
    5,757,312 B ── js-main-runtime-analysis-temporary
    8,064,712 B ── js-main-runtime-gc-heap-allocated
    6,971,704 B ── js-main-runtime-gc-heap-arena-unused
            0 B ── js-main-runtime-gc-heap-chunk-clean-unused
            0 B ── js-main-runtime-gc-heap-chunk-dirty-unused
   15,036,416 B ── js-main-runtime-gc-heap-committed
    6,971,704 B ── js-main-runtime-gc-heap-committed-unused
         86.44% ── js-main-runtime-gc-heap-committed-unused-ratio
    6,983,680 B ── js-main-runtime-gc-heap-decommitted
    1,034,828 B ── js-main-runtime-mjit
    3,732,928 B ── js-main-runtime-objects
    3,985,264 B ── js-main-runtime-scripts
    4,289,176 B ── js-main-runtime-shapes
    2,054,958 B ── js-main-runtime-strings
      294,528 B ── js-main-runtime-type-inference
              0 ── low-commit-space-events
  902,127,616 B ── private
  826,380,288 B ── resident
    4,910,648 B ── storage-sqlite
1,436,499,968 B ── vsize
Whiteboard: [Memshrink]
I'd guess it's something to do with hardware acceleration.  Can you try going to about:config and setting layers.acceleration.disabled to true and see if that makes a difference?
(In reply to Nicholas Nethercote [:njn] from comment #2)
> I'd guess it's something to do with hardware acceleration.  Can you try
> going to about:config and setting layers.acceleration.disabled to true and
> see if that makes a difference?

1st test: I disabled completely HWA in advanced options in FF.
Result: no high use of memory, normal behavior.

2nd test: HWA was on but layers.acceleration.disabled = true.
Result: high use of memory as I reported in comment #1.
Component: Untriaged → Graphics
Product: Firefox → Core
QA Contact: untriaged → thebes
> 1st test: I disabled completely HWA in advanced options in FF.
> Result: no high use of memory, normal behavior.
> 
> 2nd test: HWA was on but layers.acceleration.disabled = true.
> Result: high use of memory as I reported in comment #1.

Thanks, Loic.  So, HWA of layers is causing the problem.  Looks like this is either a layout or graphics problem, I've CC'd some relevant people.
Summary: memory is growing up disproportionately when playing a game using canvas and js → Layers hardware acceleration causing high memory consumption when playing a game using canvas and js
Assignee: nobody → ncameron
Nick, will you have a chance to look at this soon?
Whiteboard: [Memshrink] → [Memshrink:P2]
Yes, I'll start today
njn: sorry, didn't get onto this yesterday or today, hopefully I should be on it tomorrow or Monday, is that OK with you, timing-wise?
> njn: sorry, didn't get onto this yesterday or today, hopefully I should be
> on it tomorrow or Monday, is that OK with you, timing-wise?

Sure!
I confirmed the reported behaviour on FF 12, memory use rises until FF crashes (annoyingly taking my display driver with it). But this is resolved on version 13 (current release) - here memory rises but falls again with GC, admittedly with some pretty brutal GC pauses. No crash, no excessive memory use. Things are even better on 14, GC pauses are smaller and so is memory use, still no crashes. Works fine on nightly (16) too.
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Attached image memory use snapshot
(In reply to Nick Cameron [:nrc] from comment #9)
> I confirmed the reported behaviour on FF 12, memory use rises until FF
> crashes (annoyingly taking my display driver with it). But this is resolved
> on version 13 (current release) - here memory rises but falls again with GC,
> admittedly with some pretty brutal GC pauses. No crash, no excessive memory
> use. Things are even better on 14, GC pauses are smaller and so is memory
> use, still no crashes. Works fine on nightly (16) too.

That's not really what I'm observing on Win 7 with the latest Nightly build.
During the game after 15-20 sec, Nightly's process hit the barrier of 3GB, my computer starts to freeze.
(In reply to Loic from comment #10)
> Created attachment 639279 [details]
> memory use snapshot
> 
> (In reply to Nick Cameron [:nrc] from comment #9)
> > I confirmed the reported behaviour on FF 12, memory use rises until FF
> > crashes (annoyingly taking my display driver with it). But this is resolved
> > on version 13 (current release) - here memory rises but falls again with GC,
> > admittedly with some pretty brutal GC pauses. No crash, no excessive memory
> > use. Things are even better on 14, GC pauses are smaller and so is memory
> > use, still no crashes. Works fine on nightly (16) too.
> 
> That's not really what I'm observing on Win 7 with the latest Nightly build.
> During the game after 15-20 sec, Nightly's process hit the barrier of 3GB,
> my computer starts to freeze.

could you copy the graphics and Javascript sections from your about:config please? It is odd to have such different behaviour. Do you have the same behaviour from release or beta? Thanks!
I guess you meant about:support.

I tested with a new Firefox profile each time.

+ Beta: FF14 eats the memory (close to 2.3-2.5GB), starts being unresponsive then crashes.
CR: https://crash-stats.mozilla.com/report/index/bp-fd3638c3-8d61-4d94-97a6-44f332120705

+ Aurora: Same behavior than FF14, but after being unresponsive a 1st time, FF15 returns control a 2nd time but it's almost impossible to move the mouse cursor. Then FF15 crashes.
CR: https://crash-stats.mozilla.com/report/index/bp-2f90de70-9dd5-496a-9864-76d0f2120705

+ Nightly: FF16 eats more memory (~3GB) than FF14/FF15 but becames unresponsive then returns control and it's smooth at 90% after that. No crash.
I think FF16 eats more memory so it's more smooth after the 1st hang.

I have to say the computer didn't really appreciate these tests, I had to reboot because applications were slow even if testing was over.

about:support for FF16:

Graphics
      
Adapter Description Intel(R) HD Graphics 3000
Vendor ID 0x8086
Device ID 0x0116
Adapter RAM Unknown
Adapter Drivers igdumd64 igd10umd64 igd10umd64 igdumd32 igd10umd32 igd10umd32
Driver Version 8.15.10.2696
Driver Date 3-19-2012
Adapter Description (GPU #2) NVIDIA GeForce GT 550M
Vendor ID (GPU #2) 0x10de
Device ID (GPU #2) 0x0df6
Adapter RAM (GPU #2) 2047
Adapter Drivers (GPU #2) nvd3dumx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um
Driver Version (GPU #2) 8.17.13.142
Driver Date (GPU #2) 5-15-2012
Direct2D Enabled true
DirectWrite Enabled true (6.1.7601.17789)
ClearType Parameters Gamma: 2200 Pixel Structure: RGB ClearType Level: 100 Enhanced Contrast: 50 
WebGL Renderer Google Inc. -- ANGLE (Intel(R) HD Graphics 3000) -- OpenGL ES 2.0 (ANGLE 1.0.0.1041)
GPU Accelerated Windows 1/1 Direct3D 10
AzureBackend direct2d      
      
JavaScript         
            
Incremental GC 1
Thanks for the data Loic, and yes I meant about:support, sorry. So the only difference I see here is that you are using integrated graphics card (the Intel one). I use only discrete graphics (nvidia), so I rebooted with the integrated Intel GPU, and indeed, I get the same behaviour as you report - quickly escalating memory usage, leading to a crash. On Nightly I get the occaisional GC which slows things down, but not by much - I still get the crash, it looks like not one GC happens before I crash on beta (14).

So it looks like this is caused by some combination of HWA rendering and integrated graphics card, or possibly some issue with a specific Intel card or its driver (about:support shows "Intel(R) HD Graphics Family" for me, not sure if that is due to an older driver and we have the same card, or we have different cards).

At least I can recreate the bug now, I will look into it.

(If you feel the need, you could try running this on your machine with Optimus disabled and using your NVIDIA card, to confirm that this fixes the problem, but it is not really necessary)
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: WORKSFORME → ---
So, it seems that things aren't being leaked as such, when you pause the game or force a GC from about:memory, then the memory use goes down to reasonable levels. It just seems that GC isn't getting a chance to run or something. I have no idea why changing graphics cards or layers backends would affect this though.
We call JS_updateMallocCounter in nsCanvasRenderingContext2D(Azure) to tell JS that we've allocated non-JS-heap memory for canvas buffers that it needs to take account of in its GC-triggering heuristics so that GCs will be forced to happen so we can release these buffers.

It sounds like something related is happening here, maybe, and something is allocating a lot of video memory and not telling JS so we're not doing timely JS GCs.
Attached patch crash test (obsolete) — Splinter Review
Attachment #641343 - Flags: review?(roc)
Attached patch patch (obsolete) — Splinter Review
Attachment #641344 - Flags: review?(roc)
Attached patch crash test (obsolete) — Splinter Review
Attachment #641343 - Attachment is obsolete: true
Attachment #641343 - Flags: review?(roc)
Attachment #641345 - Flags: review?(roc)
Comment on attachment 641344 [details] [diff] [review]
patch

Review of attachment 641344 [details] [diff] [review]:
-----------------------------------------------------------------

You need a comment somewhere explaining why you're doing this.
Attachment #641344 - Flags: review?(roc) → review?(bas.schouten)
Attached patch patch (obsolete) — Splinter Review
added a comment
Attachment #641344 - Attachment is obsolete: true
Attachment #641344 - Flags: review?(bas.schouten)
Attachment #642526 - Flags: review?(bas.schouten)
Attached patch crash testSplinter Review
increased the number of drawn gradients (still doesn't actually cause a crash on Try, but it does when run locally with the right (wrong?) graphics card). Carrying r=roc.
Attachment #641345 - Attachment is obsolete: true
Attachment #642527 - Flags: review+
Comment on attachment 642526 [details] [diff] [review]
patch

Review of attachment 642526 [details] [diff] [review]:
-----------------------------------------------------------------

So, I think this is the wrong solution. First of all this will create a global record without any intelligence which could definitely cause performance issues and pitfalls once we start re-using gradients more actively. More importantly once we get other backends that also cache information for GradientStops they'll need similar logic. The mistake of creating too many GradientStops here is really in the user of the API. As such I believe this sort of logic should go inside the canvas implementation. A dumb record there might be okay as web authors can then just improve their implementation, and it also can't interfere with UI responsiveness.
Attachment #642526 - Flags: review?(bas.schouten) → review-
I tried Bas's suggestion of removing the D3D10_RESOURCE_MISC_GDI_COMPATIBLE flag from the D3D10 Layer manager code. This prevented the crash and the explosive memory growth, but it did not cure the appalling GC pauses. The fix is therefore OK by itself if we assume GC will naturally improve. Otherwise, we could combine it with the cache limit patch, but with a higher limit. Opinions?
(In reply to Nick Cameron [:nrc] from comment #24)
> I tried Bas's suggestion of removing the D3D10_RESOURCE_MISC_GDI_COMPATIBLE
> flag from the D3D10 Layer manager code. This prevented the crash and the
> explosive memory growth, but it did not cure the appalling GC pauses. The
> fix is therefore OK by itself if we assume GC will naturally improve.
> Otherwise, we could combine it with the cache limit patch, but with a higher
> limit. Opinions?

I'm personally in favor of just removing D3D10_RESOURCE_MISC_GDI_COMPATIBLE and going from there.

I'm not 100% sure how I feel about the GC pauses.
(In reply to Bas Schouten (:bas) from comment #25)
> I'm not 100% sure how I feel about the GC pauses.
What are the pauses like?  If you turn on javascript.options.mem.log then you will get a log of what the GC is doing.  If you copy a few of those entries here we may get some insight into what the GC is doing.

Maybe Bill or Gregor have some ideas how the GC performance could be improved?  roc's comment 15 about JS_updateMallocCounter sounds right to me, not that I know a huge amount about it.
(In reply to Andrew McCreight [:mccr8] from comment #26)
> If you turn on javascript.options.mem.log then
> you will get a log of what the GC is doing.
(it will be logged in the error console)
Hm I can't reproduce the OOM situation with a current nightly. After a few min I am still at around 220MB resident size.
Gregor: you need to have the particular Intel graphics hardware, and maybe some particular driver.

Nick: If you remove the D3D10_RESOURCE_MISC_GDI_COMPATIBLE but change nothing else on trunk (no JS_updateMallocCounter or other GC-related changes), does the problem just go away?

Bas: what are the consequences of removing D3D10_RESOURCE_MISC_GDI_COMPATIBLE?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #29)
> Bas: what are the consequences of removing
> D3D10_RESOURCE_MISC_GDI_COMPATIBLE?

In reality, nothing. As a matter of fact I should have done this a long time ago and I should really do it in more places. We don't actually use this anywhere with Azure-D2D.
Without the fix GC pauses were up to 9sec. With Bas's fix, with a debug build, initially there is no problem, but GC pauses get longer as you play the game, getting up to 1.5sec, but many shorter pauses frequent enough to make the game unplayable. With an optimised build, there is no noticeable GC (I played for about 7-8mins, for debug, it is unplayable after 2-3). The longest logged GC was about 25ms. With a limit on the number of gradients, there was no GC issue, even on debug.
How about taking the D3D10_RESOURCE_MISC_GDI_COMPATIBLE fix until/unless we have more data to justify taking the approach in your patch?
sounds good to me
I'm pretty confused here. First, are there enough calls to JS_updateMallocCounter right now? Does removing this D3D10_RESOURCE_MISC_GDI_COMPATIBLE flag just cover up another problem where we allocate a lot of memory and don't GC?

Also, do we need to do anything to fix GC pauses? Do the pauses only happen with the special graphics card, or with any card? Did the pauses ever happen with an optimized build? Measuring GC pauses in a debug build is pretty useless, since we do all sorts of extra heap walking after the GC finishes to verify correctness.
(In reply to Bill McCloskey (:billm) from comment #34)
> I'm pretty confused here. First, are there enough calls to
> JS_updateMallocCounter right now?

No, I think we need more, the problem is that we have no idea how much memory is being used - it changes depending on the driver.

> Does removing this
> D3D10_RESOURCE_MISC_GDI_COMPATIBLE flag just cover up another problem where
> we allocate a lot of memory and don't GC?

I'm not exactly sure what removing that flag does. I think leads to much less memory being allocated.

> 
> Also, do we need to do anything to fix GC pauses? Do the pauses only happen
> with the special graphics card, or with any card? Did the pauses ever happen
> with an optimized build? Measuring GC pauses in a debug build is pretty
> useless, since we do all sorts of extra heap walking after the GC finishes
> to verify correctness.

Yes, we did get pauses on optimised builds without the fix (around 1/2 second), but only on specific drivers, and actually it usually crashed before we saw many.
(In reply to Nick Cameron [:nrc] from comment #35)
> (In reply to Bill McCloskey (:billm) from comment #34)
> > I'm pretty confused here. First, are there enough calls to
> > JS_updateMallocCounter right now?
> 
> No, I think we need more, the problem is that we have no idea how much
> memory is being used - it changes depending on the driver.

Is that because the driver is the one allocating the memory?

It's not really that important that the number passed to JS_updateMallocCounter be correct. As long as you can make up a number that's close, it's better than nothing. All this call does is decrement a counter that starts at 100MB. When it reaches zero, we GC.

> Yes, we did get pauses on optimised builds without the fix (around 1/2
> second), but only on specific drivers, and actually it usually crashed
> before we saw many.

Could that have been due to paging?
(In reply to Bill McCloskey (:billm) from comment #36)
> (In reply to Nick Cameron [:nrc] from comment #35)
> > (In reply to Bill McCloskey (:billm) from comment #34)
> > > I'm pretty confused here. First, are there enough calls to
> > > JS_updateMallocCounter right now?
> > 
> > No, I think we need more, the problem is that we have no idea how much
> > memory is being used - it changes depending on the driver.
> 
> Is that because the driver is the one allocating the memory?

I assume so.

> 
> It's not really that important that the number passed to
> JS_updateMallocCounter be correct. As long as you can make up a number
> that's close, it's better than nothing. All this call does is decrement a
> counter that starts at 100MB. When it reaches zero, we GC.
> 

The amount would vary from 16k to 200k, and possibly no memory at all if the gradient is never created. Usually in graphics memory, not main memory, not sure if that affects things.

> > Yes, we did get pauses on optimised builds without the fix (around 1/2
> > second), but only on specific drivers, and actually it usually crashed
> > before we saw many.
> 
> Could that have been due to paging?

Do you mean the pauses? They were GC pauses - logged at 0.5s.
Attached patch Bas's patchSplinter Review
Attachment #642526 - Attachment is obsolete: true
Attachment #643422 - Flags: review?(roc)
Comment on attachment 643422 [details] [diff] [review]
Bas's patch

Review of attachment 643422 [details] [diff] [review]:
-----------------------------------------------------------------

Are you going to remove the other uses of this flag? I think Bas wanted to.
Attachment #643422 - Flags: review?(roc) → review+
Blocks: 775215
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #39)
> Comment on attachment 643422 [details] [diff] [review]
> Bas's patch
> 
> Review of attachment 643422 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Are you going to remove the other uses of this flag? I think Bas wanted to.

Yes: bug 775215
https://hg.mozilla.org/mozilla-central/rev/4394691ba481
https://hg.mozilla.org/mozilla-central/rev/50a9cfae125a
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Comment on attachment 643422 [details] [diff] [review]
Bas's patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: horrific memory leak leading to crash for some Intel graphics drivers 
Fix Landed on Version: 17
Risk to taking this patch (and alternatives if risky): Low, we never make use of the flag that is removed
String or UUID changes made by this patch: None
Attachment #643422 - Flags: approval-mozilla-esr10?
Attachment #643422 - Flags: approval-mozilla-beta?
Attachment #643422 - Flags: approval-mozilla-aurora?
I don't really know the purpose of this patch but with the latest Nightly, I got a crash:
https://crash-stats.mozilla.com/report/index/bp-a02d455f-4197-4b9f-b910-0b1252120725

I observed more or less the same behavior before the patch landed, maybe slowings appear later, anyway the memory use is still growing up and Firefox finishes to become unresponsive. My laptop fan runs more and the Intel chipset becomes hot.

Check the log of Memchaser add-on: resident memory is horribly high and CC time is horribly long
http://i.imgur.com/llf7I.png
That is worrying, it sounds exactly like the pre-patch behaviour. I'll try with a nightly again when I re-config my graphics cards. The behaviour was definitely fixed for me when I tested last week.
I just tried again with a nightly build, built from m-c, pulled 25/7/12, and the memory use was fine, never got above 150k (checked with task manager), and no crash.

Are you using Optimus? That is the only difference (that you can see a 2nd graphics card) I can see from your about:support dump further up. If you are, could you test with Optimus disabled and just the integrated graphics card?

If it is not that, then I am out of ideas. Bas, any clue?
No, I don't use Optimus with Firefox because Firefox has blocked the GPU drivers (NVIDIA GeForce GT 550M / 8.17.13.142 / 5-15-2012) and as I want to have HWA enabled, I have selected the integrated graphics for Firefox in Nvidia config manager.
Hmm, I have no idea then. You have more recent drivers than me, by a few months, and a different (but similar) device id. I don't see the information about GPU #2 (I've only seen this with Optimus on, which is why I thought of that, but you are right that you wouldn't get HWA with Optimus). Otherwise our info is identical.

Bas, roc: any ideas what could be causing Loic to still have problems when I don't?
For the record, about:support > graphics: http://i.imgur.com/wgKJ2.png
Loic, can you try a build from the try-push in comment #20 and see if that fixes the problem for you?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #51)
> Loic, can you try a build from the try-push in comment #20 and see if that
> fixes the problem for you?

I can try but I'm not familiar with try-push. Is there a link to download a zipped try-build (standalone version)?
(In reply to Loic from comment #52)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #51)
> > Loic, can you try a build from the try-push in comment #20 and see if that
> > fixes the problem for you?
> 
> I can try but I'm not familiar with try-push. Is there a link to download a
> zipped try-build (standalone version)?

Loic: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/ncameron@mozilla.com-b2ca7fbf80b7/try-win32/firefox-16.0a1.en-US.win32.installer.exe

(Opt Win32 build)
(In reply to Nick Cameron [:nrc] from comment #53)
> (In reply to Loic from comment #52)
> > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #51)
> > > Loic, can you try a build from the try-push in comment #20 and see if that
> > > fixes the problem for you?
> > 
> > I can try but I'm not familiar with try-push. Is there a link to download a
> > zipped try-build (standalone version)?
> 
> Loic:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/ncameron@mozilla.
> com-b2ca7fbf80b7/try-win32/firefox-16.0a1.en-US.win32.installer.exe
> 
> (Opt Win32 build)

Good news! :)
I tried with this try-build (the zipped version, firefox-16.0a1.en-US.win32.zip) and the resident/private memories stay normal (~ 280-300 MB during 2-3 min of game).
So it sounds like we still need that patch.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #55)
> So it sounds like we still need that patch.

I'd rather like to understand why this is still happening for Loic, to make sure only affected hardware gets this path.
Bas, roc - as the only other Windows users I know of, could you try and reproduce this bug please?
(In reply to Nick Cameron [:nrc] from comment #49)
> Hmm, I have no idea then. You have more recent drivers than me, by a few
> months, and a different (but similar) device id. I don't see the information
> about GPU #2 (I've only seen this with Optimus on, which is why I thought of
> that, but you are right that you wouldn't get HWA with Optimus). Otherwise
> our info is identical.
> 
> Bas, roc: any ideas what could be causing Loic to still have problems when I
> don't?

Is there an intel driver version difference?
(In reply to Bas Schouten (:bas) from comment #58)
> (In reply to Nick Cameron [:nrc] from comment #49)
> > Hmm, I have no idea then. You have more recent drivers than me, by a few
> > months, and a different (but similar) device id. I don't see the information
> > about GPU #2 (I've only seen this with Optimus on, which is why I thought of
> > that, but you are right that you wouldn't get HWA with Optimus). Otherwise
> > our info is identical.
> > 
> > Bas, roc: any ideas what could be causing Loic to still have problems when I
> > don't?
> 
> Is there an intel driver version difference?

Yes, we have slightly different hardware, Loic has 0x0116 and I have 0x0126. We have the same drivers, but he has a more up to date version (I guess they could have regressed, I'll try to update my drivers).
I don't have an Intel GPU, only Nvidia. I don't see the bug.
On my intel box:

Intel(R) HD Graphics 4000
0x0166
8.15.10.2725

A very subjective measurement resulted in the following:

Before the patch, at worst memory usage creeps up to ~500 MB, GC takes several seconds (5-10).
After the patch, at worst memory usage creeps up to ~400 MB, GC takes a couple of seconds (1-2).

Note I manually applied the patch to my local tree to make sure nothing else interfered.

In both cases the experience of the game is affected by the GC obviously. I can't get memory to go anywhere near 1GB on this machine.
So, it looks like you weren't affected by the bug in the first place, which means it is only some Intel drivers which have the problem. We need more data...
I updated my drivers and tried testing again with current nightly - still no bug.
(In reply to Bas Schouten (:bas) from comment #61)
> On my intel box:
> 
> Intel(R) HD Graphics 4000
> 0x0166
> 8.15.10.2725
> 
> A very subjective measurement resulted in the following:
> 
> Before the patch, at worst memory usage creeps up to ~500 MB, GC takes
> several seconds (5-10).
> After the patch, at worst memory usage creeps up to ~400 MB, GC takes a
> couple of seconds (1-2).
> 
> Note I manually applied the patch to my local tree to make sure nothing else
> interfered.
> 
> In both cases the experience of the game is affected by the GC obviously. I
> can't get memory to go anywhere near 1GB on this machine.

Was that with a debug build? Otherwise those GC pauses are huge (and worrying).
Could you guys post GC logs? Even a one second GC pause is terrible. I don't understand why this is happening. GC times should be related to the number of JS objects allocated, which should be independent of graphics hardware. When I run the game myself, it seems to allocate almost nothing in the JS heap.

When you run the game, do you also have tons of other tabs open? Do GCs normally take multiple seconds for you?
(In reply to Bill McCloskey (:billm) from comment #65)
> Could you guys post GC logs? Even a one second GC pause is terrible.

See attachment.
 
> When you run the game, do you also have tons of other tabs open? Do GCs
> normally take multiple seconds for you?

No, the log has been recorded with the latest Nighlty, only one tab open, this one with the game. 
When I'm moving the game cursor with the arrow keys, resident use increases, sometimes memory use relaxes but after 1 or 2 min, resident use grows up to the max and Nightly starts to not respond.
Comment on attachment 643422 [details] [diff] [review]
Bas's patch

[Triage Comment]
It doesn't appear that we have enough data about user pain to take this past Aurora 16. Not worth the possibility of regression (even though we feel this is safe).
Attachment #643422 - Flags: approval-mozilla-esr10?
Attachment #643422 - Flags: approval-mozilla-esr10-
Attachment #643422 - Flags: approval-mozilla-beta?
Attachment #643422 - Flags: approval-mozilla-beta-
Attachment #643422 - Flags: approval-mozilla-aurora?
Attachment #643422 - Flags: approval-mozilla-aurora+
Thanks, Loic! This clears up some of my confusion. All of the time spent in these GCs is in the end_callback phase. That's where we call destructors on C++ objects whose JS wrappers have been swept during the GC. So there are some Gecko objects involved here that are very expensive to clean up.

Bas, if you're able to reproduce the slow GCs, you should be able to figure out which destructors are hurting us by profiling. The stacks of interest will have XPCJSRuntime::GCCallback in them.
We know that the GC pauses are caused by tidying up gradients. The patch which landed mostly fixes this, and GC is back down to reasonable levels (unless Bas was running an optimised build, in which case, more mystery). The only problem is why Loic still sees the old (bad) behaviour even with the patch.
Oh, OK, thanks. I hadn't understood that.
Loic: are you still getting memory problems from this game? (Sorry I've lost track of this bug for a while, if you are still having problems, lets reopen it).
I tested with the latest Nightly: with HWA enabled, it's not fixed but if I disabled HWA, the resident memory use is normal (even if CPU charge growns up).
Probably an issue with Intel drivers.
Re-opening because this is still a problem.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
We should land the patch which limits the number of cached gradients, but we should only do this where necessary. That depends on the driver, so we need to know which drivers are affected. Worst case, we can use the code for all Intel drivers, but it would be nice to be more precise because not all Intel drivers are affected, and there might be performance downsides to limiting caching unnecessarily.
Is it still the problem or we can close this bug as RESOLVED?
Flags: needinfo?(ncameron)
(In reply to Virtual_ManPL [:Virtual] from comment #78)
> Is it still the problem or we can close this bug as RESOLVED?

I'm afraid I don't know, sorry (I no longer work on graphics).
Flags: needinfo?(ncameron)
Thank you very much for reply.
Maybe Robert O'Callahan [:roc] will know.
Flags: needinfo?(roc)
If I understand correctly, only Loic could still reproduce this.
Flags: needinfo?(roc) → needinfo?(epinal99-bugzilla2)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #81)
> If I understand correctly, only Loic could still reproduce this.

I can't reproduce the memory use anymore when playing the game.
Flags: needinfo?(epinal99-bugzilla2)
Status: REOPENED → RESOLVED
Closed: 12 years ago9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: