Closed Bug 1102487 Opened 10 years ago Closed 8 years ago

Memory leak with canvas JS in releases after 26.0: ctx.arc followed by ctx.fill called in setInterval leaks memory in the X server process

Categories

(Core :: JavaScript Engine: JIT, defect)

27 Branch
x86_64
Linux
defect
Not set
normal

Tracking


RESOLVED WORKSFORME

People

(Reporter: bugzilla.mozilla, Unassigned)

References

Details

(Keywords: qawanted, Whiteboard: [MemShrink:P2])

Attachments

(3 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0
Build ID: 20141011074935

Steps to reproduce:

Run very specific JavaScript in any release >26 (latest tested version 33.1).

I have spent a couple of months trying to track this down, creating over 40 versions of the JavaScript and testing almost 30 different releases.

<canvas id="canvasId"></canvas>
<script>
	var context = document.getElementById('canvasId').getContext("2d");

	// Repeatedly draw and fill a small arc from a setInterval callback.
	setInterval(function(){
		context.beginPath();
		context.arc(0, 0, 10, 0, 2 * Math.PI, true);
		context.fill();
		context.closePath();
	}, 1);
</script>

If the code is called in a plain loop rather than via setInterval, the leak does not occur; if fill() is not called, the leak does not occur. Varying the parameters does not affect the leak, and increasing the interval only makes the leak take longer to show. Removing the closePath() does not affect the leak.
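
For comparison, here is a minimal sketch of the loop variant mentioned above, which does not leak (the structure and loop count are illustrative assumptions, not the exact test page used):

	var context = document.getElementById('canvasId').getContext("2d");

	// Same drawing calls as the repro above, but driven by a plain loop
	// instead of a setInterval callback; this variant did not leak.
	for (var i = 0; i < 100000; i++) {
		context.beginPath();
		context.arc(0, 0, 10, 0, 2 * Math.PI, true);
		context.fill();
		context.closePath();
	}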

I ran the code for approximately 30 minutes for each Firefox release and noted that the X process's memory usage increased by about 3% of system memory. When left overnight this would consume all system memory.

phenon ~ # ps -eo pmem,vsize,cmd| sort -k 1 -nr | head -5
 4.3 555668 /usr/bin/X -nolisten tcp :0 -auth /home/user/.serverauth.11848
 1.9 703344 ./firefox
 0.8 406872 /usr/bin/claws-mail
 0.6 149992 spamd child
 0.5 140232 spamd child

30mins later

phenon ~ # ps -eo pmem,vsize,cmd| sort -k 1 -nr | head -5
 7.1 785632 /usr/bin/X -nolisten tcp :0 -auth /home/user/.serverauth.11848
 1.9 679704 ./firefox
 0.8 406872 /usr/bin/claws-mail
 0.6 149992 spamd child
 0.5 140232 spamd child

close Firefox

phenon ~ # ps -eo pmem,vsize,cmd| sort -k 1 -nr | head -5
 7.0 765920 /usr/bin/X -nolisten tcp :0 -auth /home/user/.serverauth.11848
 0.8 406872 /usr/bin/claws-mail
 0.6 149992 spamd child
 0.5 140232 spamd child
 0.3 532508 gedit

Please note that I have not been able to replicate this leak on Windows or in a Linux virtual machine; however, it consistently occurs on my 64-bit Linux machine. Running any version earlier than 27 shows no leak.

Linux phenon 3.12.21-gentoo-r1 #1 SMP Sun Jun 22 22:06:54 BST 2014 x86_64 AMD Phenom(tm) II X4 850 Processor AuthenticAMD GNU/Linux
 - Sapphire ATI Radeon HD 6450 - 8G DDR3 



Actual results:

Memory was allocated to the X server process and was not released when Firefox was closed. Non-leaking Firefox versions (<27) not only released all memory on exit but also never allocated the amount of memory that the leaking versions did.


Expected results:

All memory released.

Mozilla Firefox-26.0 no leak
Mozilla Firefox-27.0 leak

Mozilla Firefox-26.0b5 no leak
Mozilla Firefox-26.0b8 no leak
Mozilla Firefox-26.0b10 no leak
Mozilla Firefox-27.0b1 leak
Mozilla Firefox-27.0b2 leak
Mozilla Firefox-27.0b4 leak
Mozilla Firefox-27.0b5 leak
about:memory logs from before and after the leak
njn, is this something you can help diagnose further?
Flags: needinfo?(n.nethercote)
Whiteboard: [MemShrink]
mccr8's much better at analyzing CC logs than me.

Memory measurements from about:memory when memory usage gets high (i.e. press the "measure and save" button) might be helpful.

Another question: does it still happen if you set layers.acceleration.disabled to true in about:config?
Flags: needinfo?(n.nethercote) → needinfo?(continuation)
Attached file memory-report.json.gz
"measure and save" memory report as requested. This was with "layers.acceleration.disabled" set true.
"layers.acceleration.disabled" set true still exhibited the memory leak.
Keywords: qawanted
Whiteboard: [MemShrink] → [MemShrink:P2]
Thanks for the detailed investigation so far. A couple more ideas:

- about:config has a "gfx.xrender.enabled" preference. You could try setting it to false. Before doing so, it might be a good idea to create a new profile to do it in, because I have found that pref can render Firefox unusable!

- There's a tool called mozregression (http://mozilla.github.io/mozregression/) which can be used to bisect regressions down to a particular Nightly build. A smaller regression range makes it much easier to identify the problem.
It sounds like this is some kind of graphics-y leak, so I don't think the CC logs will actually help.  The actual Firefox memory in the about:memory you attached looks reasonable.
Flags: needinfo?(continuation)
"gfx.xrender.enabled" set false, still exhibited the memory leak, even in a new profile.

I will have a look at mozregression, thanks.
I tried mozregression. It narrowed it down to
1018:58.69 LOG: MainThread Bisector INFO Last good revision: 6ecf0c4dfcbe
1018:58.69 LOG: MainThread Bisector INFO First bad revision: 725c36b5de1a
mozregression --good=2013-12-04 --bad=2013-12-05
but the next step of fetching and building did not work (just reams and reams of build output that ends with bisectRecurse errors over and over again).

Do you know how to get the individual check-ins, or any other about:config settings I could try?
The changesets listed give this range, but it includes changesets from Nov 27 to Dec 5. The dates you have listed are only Dec 4 to Dec 5, so I'm not sure what's going on.
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6ecf0c4dfcbe&tochange=725c36b5de1a
(In reply to Timothy Nikkel (:tn) from comment #11)
> The changesets listed give this range, but it includes changesets from Nov
> 27 to Dec 5. The dates you have listed are only Dec 4 to Dec 5, so I'm not
> sure what's going on.
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6ecf0c4dfcbe&tochange=725c36b5de1a

Nightly from the 4th gives:

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9688476c1544&tochange=725c36b5de1a

which would make more sense.
I talked to Julian over breakfast, and he said he might be interested in what's going on here. :-)
Flags: needinfo?(jseward)
Just tried 36.0b5 and the issue still exists. mozregression has been updated but is still unable to run; a bug had already been reported for that failure.

Is there any way to get a list of all the relevant commits that I can then manually pull and build one by one?
Assuming the 4th and 5th of December 2013 are accurate Nightly dates, the changesets in question are these:

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9688476c1544&tochange=725c36b5de1a

Keep in mind that because of merges, that history isn't entirely linear, but you should be able to narrow it down initially to which merge caused the regression, and then narrow it down further from there.
I have been working my way through the commits and found the issue is introduced after
1fe0178cd92d (Tue Dec 03 07:53:58 2013 +0000)
in the 40 commits of
71088609c1f3 (Tue Dec 03 09:14:36 2013 +0000).

I have tried a number of those commits; many would not build, but some did build and did not exhibit the issue. http://hg.mozilla.org/mozilla-central/rev/6787bcb8ea7e does.

Is it reasonable to assume that the issue is in that one commit or do I need to keep trying all the commits?

Can anyone see anything odd or interesting in that commit?
What was the last commit which built and did not exhibit the problem in your testing? FWIW, here's what should be the equivalent mozilla-inbound range:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=044c28763a8d&tochange=6787bcb8ea7e
Out of those 18 commits in the inbound range:
f8b57cbe128a - f047d2032ad7 failed to build, with the same error (http://pastebin.com/12jhQnby).
d6f47a333fe1, 04f25103c4a9, cfe7047da1a0 and 942d149a7c0c all built but did not exhibit the issue; then the next one, 6787bcb8ea7e, did.
The code in /js/src/jit/MIR.cpp (line 1424) that changed

	return !isTruncated();

to

	return !isTruncated() &&
	       (isUnsigned() || canBeDivideByZero() || canBeNegativeDividend());

is the culprit. I reverted it back to the original state and ran my test with no issues, then put the change back just to make sure, and the issue occurred again.

But I am unsure what to do now. I do not want to create a pull request to remove those additional conditions if other code or commits rely on them being there. This code is now over 14 months old.
Needinfo'ing per comment #19 - thanks for the detective work!
Flags: needinfo?(emanuel.hoogeveen)
Product: Firefox → Core
Indeed, great work! That would be bug 944963 then. Dan, over to you.
Blocks: 944963
Flags: needinfo?(emanuel.hoogeveen) → needinfo?(sunfish)
Given the amount of work bluenuht has done to find the regression range, I think we can bear to mark this as NEW :) I also updated the component to match bug 944963.

bluenuht, one last thing that might be helpful as we wait for Dan's needinfo is to try the latest Nightly and see if the problem still exists. I don't know what the Nightly situation is like for Linux, but you can also just build the latest mozilla-central to get something almost equivalent.
Status: UNCONFIRMED → NEW
Component: Untriaged → JavaScript Engine: JIT
Ever confirmed: true
I'm at a conference this week, but I do plan to investigate this when I'm back next week.
Ok, I'm confused. The culpable change identified in comment 19 is adding conditions to MMod::fallible(), making it return true in fewer cases. The only effect of this should be that the JIT creates fewer unused LSnapshot instances. This should mean that we allocate slightly less memory, if anything, so I don't currently understand how it could cause a leak.
I concur with Dan's analysis. In addition, all the JIT compiler allocations are made on another thread using a LifoAlloc, which is wiped when the compilation ends (no hand-made malloc).

So I do not see how the lines highlighted in comment 19 could play any role in introducing such a leak, especially in another process.

Can you set "javascript.options.ion" to false, and see if it can still be reproduced, and bisect again?
Do you want the good news or the bad?
The good news is I cannot replicate it any more and the bad news is I cannot replicate it any more.

I tried reverting to the bad commit again with 
hg update -r 6787bcb8ea7e
./mach clobber
./mach configure
./mach build

on the grounds that I would try each of the three conditions one at a time. But I could not get the leak, even after deleting and re-cloning. I then tried another, newer commit that I knew had failed, manually built it, and again saw no leak. I tried a known-bad binary release (28.0) and my current 31.5.0 binary, and neither leaked.

I have updated my Gentoo system, including the kernel and graphics drivers, in the last few weeks, so I can only conclude that that had something to do with it.

But just to recap, and for my sanity:
 The leak only occurred in Firefox; this machine has an uptime measured in weeks and months, and I would have noticed another application being involved.
 The leak only occurred when using the looped JavaScript for drawing circles.
 I could never replicate this issue on any other machine, including virtual machines.
 The leak exposed itself as memory that was allocated against the X process and was never returned on exiting Firefox.
 The leak was consistently reproducible with certain versions/builds of Firefox.

Before closing this issue I would love it if anyone could chime in on anything that could have caused this: kernel, binary graphics card driver, etc.

Thanks
Flags: needinfo?(sunfish)
No comments for two weeks, and I have been running my code for days at a time with no evidence that X memory is increasing.

Please feel free to close this, and thanks for all your help.
Flags: needinfo?(jseward)
Thanks for the investigation and for letting us know this is possibly resolved.
I'm going to close this. Please open a new bug if you encounter this again.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME