Closed Bug 1081926 Opened 5 years ago Closed 5 years ago

memory leak on freeciv HTML5 game causing computer to be unusable

Categories

(Core :: Graphics, defect, critical)

32 Branch
x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla36
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 + fixed
firefox35 + fixed
firefox36 + fixed
firefox-esr31 --- unaffected
b2g-v2.0 --- affected
b2g-v2.0M --- affected
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: georgezhu, Assigned: mwu)

References

()

Details

(Keywords: crash, regression, Whiteboard: [MemShrink:P1])

Attachments

(3 files, 3 obsolete files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:32.0) Gecko/20100101 Firefox/32.0
Build ID: 20140924083558

Steps to reproduce:

1. Start Firefox (tested with Ubuntu repository's 32.0.3 and nightly 35.0a1)
2. Go to http://play.freeciv.org/webclient/?action=load&load=tutorial&scenario=true
3. Computer becomes unusable(see results)


Actual results:

My 4 GiB RAM is immediately filled up so my computer becomes unusable. The game never stops loading.
See collected data: http://pastebin.com/sXCd8qiJ

Shutting down Firefox by closing it or using killall becomes impossible because of the lack of memory. Video and audio stutters heavily and it's impossible to start a new program. The only way to regain control is by restarting the computer.

See reddit thread for original report: https://www.reddit.com/r/gamedev/comments/2j13cj/gamedev_please_review_freecivweb/cl7giu2


Expected results:

The game starts after a few seconds of loading and uses up ~138 MiB.

This is what happens if I visit the URL with chromium-browser. According to `freecivnet' in the reddit thread,this is also what happens with both Chrome and Firefox on Windows 8.
Hard to save a memory log, the memory use is really high and make the machine hang.

Regression range:
good=2014-10-01
bad=2014-10-02
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=14665b1de5ee&tochange=2399d1ae89e9
Keywords: regression
Whiteboard: [MemShrink]
Version: 32 Branch → 35 Branch
I assume that Loic tested on Windows like me. Please correct me if I'm wrong !
According to the reporter this happens with Fifrefox32 on Linux. The windows regression is a recent issue on trunk.

windows regression range:
good: 
http://hg.mozilla.org/mozilla-central/rev/165c3fd176ec

Bad
http://hg.mozilla.org/mozilla-central/rev/7f08e93cc2a5

so either bug 1075621 or bug 902952
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: Untriaged → Graphics
Flags: needinfo?(bas)
Product: Firefox → Core
I just tried Firefox trunk (f547cf19d104 and a couple of slightly older revisions) on Ubuntu 14.04 and failed to reproduce. I played for several turns, built a city, and RSS didn't go above ~400 MiB.

Getting memory reports (visit about:memory and click on "Measure and save...") would be really useful, though Loic says performance is so bad this is hard to do.
georgezhu, can you visit about:support in Firefox, click on "Copy text to clipboard" and then paste the results into a comment here? That will give us information about your graphics card, which may be relevant. Thank you.
Flags: needinfo?(georgezhu)
Attached image taskmanager
I can reproduce on windows7.
100% physical memory usage and gigantic virtual memory usage.
And finally, my pc is completely unresponsive. Reset PC....


Regression window(m-i)
Good:
https://hg.mozilla.org/integration/mozilla-inbound/rev/165c3fd176ec
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0 ID:20141001112720
Bad:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d954ed24e795
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0 ID:20141001113823

Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=165c3fd176ec&tochange=d954ed24e795

Regressed by: Bug 902952
Attached file memory.txt
about:memory (I was able to barely acquire it)
Graphics
--------

Adapter Description: ATI Radeon HD 4300/4500 Series
Adapter Drivers: aticfx64 aticfx64 aticfx32 aticfx32 atiumd64 atidxx64 atiumdag atidxx32 atiumdva atiumd6a atitmm64
Adapter RAM: 512
ClearType Parameters: Gamma: 2200 Pixel Structure: R ClearType Level: 50 Enhanced Contrast: 200
Device ID: 0x954f
Direct2D Enabled: true
DirectWrite Enabled: true (6.2.9200.16492)
Driver Date: 4-29-2013
Driver Version: 8.970.100.1100
GPU #2 Active: false
GPU Accelerated Windows: 1/1 Direct3D 11 (OMTC)
Subsys ID: 00000000
Vendor ID: 0x1002
WebGL Renderer: Google Inc. -- ANGLE (ATI Radeon HD 4300/4500 Series Direct3D9Ex vs_3_0 ps_3_0)
windowLayerManagerRemote: true
AzureCanvasBackend: direct2d
AzureContentBackend: direct2d 1.1
AzureFallbackCanvasBackend: cairo
AzureSkiaAccelerated: 0
OS: Linux → All
Thanks Alice!

All the non-graphics stuff in the memory report looks reasonable.  There are a few graphics measurements that are very high:

3,893.28 MB ── gfx-d2d-vram-source-surface
4,114.96 MB ── gpu-committed
  505.65 MB ── gpu-dedicated
1,315.55 MB ── gpu-shared
1,805.88 MB ── vsize-max-contiguous
[Tracking Requested - why for this release]: Hang/crash

umm,
I can reproduce on Firefox32 ubuntu12.04 32bit, but not Firefox31.
(PC crashes before killing browser. so, it is difficult to get about:memory and regression range)

So, Bug 902952 seems trigger something existing bug which is introduced in Firefox32.
OS: All → Linux
Version: 35 Branch → 32 Branch
Regression window(m-i)
Good:
https://hg.mozilla.org/integration/mozilla-inbound/rev/684d2cee3f2d
Mozilla/5.0 (X11; Linux i686; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140606102635
Bad:
https://hg.mozilla.org/integration/mozilla-inbound/rev/79624417d247
Mozilla/5.0 (X11; Linux i686; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140606104333
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=684d2cee3f2d&tochange=79624417d247

Regressed by: Bug 994081
Blocks: 994081
VMWare Player

Graphics
--------

Adapter Description: VMware, Inc. -- Gallium 0.4 on SVGA3D; build: RELEASE;
Device ID: Gallium 0.4 on SVGA3D; build: RELEASE;
Driver Version: 2.1 Mesa 8.0.4
GPU Accelerated Windows: 0/1 Basic
Vendor ID: VMware, Inc.
WebGL Renderer: VMware, Inc. -- Gallium 0.4 on SVGA3D; build: RELEASE;
windowLayerManagerRemote: false
AzureCanvasBackend: cairo
AzureContentBackend: cairo
AzureFallbackCanvasBackend: none
AzureSkiaAccelerated: 0
[Tracking Requested - why for this release]:
VM crashes. I cannot get crash sig.
Severity: normal → critical
Keywords: crash
Here's the info requested by Nicholas. I've also tried loading the site with all plugins disabled but the result was the same as before:

Application Basics
------------------

Name: Firefox
Version: 32.0.3
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:32.0) Gecko/20100101 Firefox/32.0

Crash Reports for the Last 3 Days
---------------------------------

All Crash Reports

Extensions
----------

Name: Adblock Plus
Version: 2.6.4
Enabled: true
ID: {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}

Name: HTTPS-Everywhere
Version: 4.0.1
Enabled: true
ID: https-everywhere@eff.org

Name: NoScript
Version: 2.6.9
Enabled: true
ID: {73a6fe31-595d-460b-a920-fcc0f8843232}

Name: Self-Destructing Cookies
Version: 0.4.6
Enabled: true
ID: jid0-9XfBwUWnvPx4wWsfBWMCm4Jj69E@jetpack

Name: Ubuntu Firefox Modifications
Version: 2.9
Enabled: true
ID: ubufox@ubuntu.com

Name: User Agent Switcher
Version: 0.7.3
Enabled: true
ID: {e968fc70-8f95-4ab9-9e79-304de2a71ee1}

Name: Form History Control
Version: 1.3.3.0
Enabled: false
ID: formhistory@yahoo.com

Graphics
--------

Adapter Description: ATI Technologies Inc. -- AMD Radeon R9 200 Series
Device ID: AMD Radeon R9 200 Series
Driver Version: 4.3.12798 Compatibility Profile Context 13.35.1005
GPU Accelerated Windows: 0/1 Basic
Vendor ID: ATI Technologies Inc.
WebGL Renderer: ATI Technologies Inc. -- AMD Radeon R9 200 Series
windowLayerManagerRemote: false
AzureCanvasBackend: cairo
AzureContentBackend: cairo
AzureFallbackCanvasBackend: none
AzureSkiaAccelerated: 0

Important Modified Preferences
------------------------------

accessibility.typeaheadfind.flashBar: 0
accessibility.warn_on_browsewithcaret: false
browser.cache.disk.capacity: 358400
browser.cache.disk.smart_size_cached_value: 358400
browser.cache.disk.smart_size.first_run: false
browser.cache.disk.smart_size.use_old_max: false
browser.cache.frecency_experiment: 4
browser.places.smartBookmarksVersion: 7
browser.sessionstore.upgradeBackup.latestBuildID: 20140924083558
browser.startup.homepage: https://duckduckgo.com/
browser.startup.homepage_override.buildID: 20140924083558
browser.startup.homepage_override.mstone: 32.0.3
browser.tabs.warnOnClose: false
dom.max_chrome_script_run_time: 40
dom.max_script_run_time: 40
dom.mozApps.used: true
extensions.lastAppVersion: 32.0.3
general.autoScroll: true
network.cookie.prefsMigrated: true
places.database.lastMaintenance: 1413225666
places.history.expiration.transient_current_max_pages: 100590
plugin.disable_full_page_plugin_for_types: application/pdf
plugin.importedState: true
plugin.state.flash: 0
plugins.notifyMissingFlash: false
privacy.sanitize.migrateFx3Prefs: true
storage.vacuum.last.index: 1
storage.vacuum.last.places.sqlite: 1413057448

JavaScript
----------

Incremental GC: true

Accessibility
-------------

Activated: false
Prevent Accessibility: 0

Library Versions
----------------

NSPR
Expected minimum version: 4.10.6
Version in use: 4.10.6

NSS
Expected minimum version: 3.16.5 Basic ECC
Version in use: 3.16.5 Basic ECC

NSSSMIME
Expected minimum version: 3.16.5 Basic ECC
Version in use: 3.16.5 Basic ECC

NSSSSL
Expected minimum version: 3.16.5 Basic ECC
Version in use: 3.16.5 Basic ECC

NSSUTIL
Expected minimum version: 3.16.5
Version in use: 3.16.5

Experimental Features
---------------------
Flags: needinfo?(georgezhu)
Thanks for the interest in fixing this bug! I'm the developer of the Freeciv HTML5 client, please let me know if you discover anything that I can do in the code to improve the memory usage or performance of this game.
There's a few issues here, but the big one is the CanvasImageCache doesn't really work well. It seems to generate different entries for the same image. Previously, that was less of an issue since the gfxASurface version of the image would store the optimized SourceSurface version of the image, but now we might generate new SourceSurfaces for every image that doesn't hit the CanvasImageCache. And in Freeciv's case, it misses CanvasImageCache a lot.

This is a particularly big problem on X11, where OptimizeSourceSurface doesn't expect the caller to pass in an already optimized image, so it ends up generating a new one. I have a patch for that that will go in a new bug.
Depends on: 1082745
It seems like there are two distinct problems triggered by FreeCiv: the Windows problem and the Linux problem. Should we split this bug in two to track the two problems separately?
The windows problem is being investigated in https://bugzilla.mozilla.org/show_bug.cgi?id=1082827 .

However, there's a core platform independent issue which I'm leaving this bug for.
Depends on: 1082827
Whiteboard: [MemShrink] → [MemShrink:P1]
mwu: with bug 1082745 fixed, I think the problem should be fixed for georgezhu, right?

Assuming that's right, if georgezhu is willing to try a Nightly build (see nightly.mozilla.org) then hopefully the problem will be fixed.
It looks like we're missing the cache because the canvas is different every time.
(In reply to Michael Wu [:mwu] from comment #20)
> It looks like we're missing the cache because the canvas is different every
> time.

Could you please elaborate on this? Does this mean that I can change the Javascript 
and canvas usage in Freeciv-web to work around the issue?
>Assuming that's right, if georgezhu is willing to try a Nightly build (see nightly.mozilla.org) then hopefully the problem will be fixed.

I've just tried the Nightly build and the problem was gone. I guess it's fixed now. Thanks all for looking into the problem!
This adds a second hashtable in the cache which allows us to do another lookup that ignores the canvas. Still need to clean it up a bit but I'm going to send it to try and see how that works out.
Assignee: nobody → mwu
(In reply to Andreas Røsdal from comment #21)
> (In reply to Michael Wu [:mwu] from comment #20)
> > It looks like we're missing the cache because the canvas is different every
> > time.
> 
> Could you please elaborate on this? Does this mean that I can change the
> Javascript 
> and canvas usage in Freeciv-web to work around the issue?

Basically it means that you should try to recycle canvases and draw images to the same one if possible. Don't know if that's reasonable for you to do though. It looks like the code is using canvas to split the tileset into individual tiles. It should be better to draw individual tiles directly out of the tileset image. Only issue is that the width of the tileset may make drawing out of the tileset image less efficient. You can fix that by making the tileset the width of the widest tile so the tileset ends up being a very tall column of tiles.

If you do fix it, it would be convenient to have the old one around so it's easy to test this bug. :)

FWIW I suspect there's other things that can be optimized - seems like this game shouldn't be eating so much CPU while sitting idle in a turn.
Now with less crashing, hopefully.
Attachment #8508229 - Attachment is obsolete: true
Attachment #8509002 - Attachment is obsolete: true
Attachment #8510525 - Flags: review?(matt.woodrow)
I think this also affects b2g since it uses skia as the canvas backend.
B2G is affected but since it has a limit on the canvas image cache memory usage, it doesn't get taken down. Fixing this can still improve performance if images fit within the image cache, however.
Comment on attachment 8510525 [details] [diff] [review]
Fallback on a simple lookup when the normal lookup fails, v3

Review of attachment 8510525 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry about the slow response on this.

How does this help? Wasn't the issue that a single image was being used on multiple Elements? This seems like it would only help the case where a single image element is being drawn to multiple canvases.
Yup - I misdiagnosed the issue the first time. What freeciv is doing is drawing a large image containing many tiles to many different canvases which each hold a tile. The canvas are then used to draw to the main canvas.
Comment on attachment 8510525 [details] [diff] [review]
Fallback on a simple lookup when the normal lookup fails, v3

Review of attachment 8510525 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, sounds good then.

Maybe add a comment/todo about using something specific to the image (rather than the element) as a key so we can solve that use case too, if it ever matters.
Attachment #8510525 - Flags: review?(matt.woodrow) → review+
Got a bit bitrotted, so I'm putting it up on try one more time - https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=2f6ec7f95a3f
See Also: → 1085267
Unbitrotted
Attachment #8510525 - Attachment is obsolete: true
Duplicate of this bug: 1084586
Duplicate of this bug: 1085267
Keywords: checkin-needed
See Also: 1085267
https://hg.mozilla.org/mozilla-central/rev/b365ed022337
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
mwu - Assuming this is good on m-c, what do you think about uplifting to 34 and 35, which are both marked as affected?
Flags: needinfo?(mwu)
Comment on attachment 8513888 [details] [diff] [review]
Fallback on a simple lookup when the normal lookup fails, v4

Approval Request Comment
[Feature/regressing bug #]: Regressed by bug 994081.
[User impact if declined]: Sites which render large images to multiple canvas run slowly and may cause the browser to run out of memory. This would only impact systems where the canvas and content backends are different. At the moment, that includes Windows 7+, starting with Gecko 35, and Android since Gecko 32. B2G is also technically affected but it has a hard limit on canvas image caching which prevents any out of memory issues.
[Describe test coverage new/current, TBPL]: No new tests. Possible to test, though the test would only work on platforms where content and canvas backends are different.
[Risks and why]: I think the risk is low, but the patch isn't trivial either. It's the most conservative fix I can think of, however.
[String/UUID change made/needed]: None.
Flags: needinfo?(mwu)
Attachment #8513888 - Flags: approval-mozilla-beta?
Attachment #8513888 - Flags: approval-mozilla-aurora?
Comment on attachment 8513888 [details] [diff] [review]
Fallback on a simple lookup when the normal lookup fails, v4

I understand from comment 43, that this fix should have no impact on desktop but will impact mobile. Let's get this into beta6 to test.

Beta+
Aurora+
Attachment #8513888 - Flags: approval-mozilla-beta?
Attachment #8513888 - Flags: approval-mozilla-beta+
Attachment #8513888 - Flags: approval-mozilla-aurora?
Attachment #8513888 - Flags: approval-mozilla-aurora+
Bas, please address your needinfos. Just clear them with a brief explanation if you don't have anything to add.
Flags: needinfo?(bas)
Depends on: 1096913
(In reply to Nicholas Nethercote [:njn] from comment #47)
> Bas, please address your needinfos. Just clear them with a brief explanation
> if you don't have anything to add.

I get a lot of them. I usually have something to add but I address them by priority rather than date (if I some day find the time to clear my queue, it would probably be better ;-)). I'd rather leave them open than say I have nothing to add when I might. So when I do answer a ni? I like the answer to be true rather than just an excuse to close it :-). Anyway, didn't mean to spam this bug with it, but I happened to see this when I found a regression this caused.
You need to log in before you can comment on or make changes to this bug.