Closed Bug 712822 Opened 10 years ago Closed 10 years ago

Sudden large heap-unclassified (<100MB to >2GB in under 10 minutes) with no browser usage on saucelabs.com

Categories

(Core :: General, defect)

11 Branch
x86_64
Linux
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: developer, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak, Whiteboard: [MemShrink:P3])

Attachments

(5 files)

Attached file original about:memory
User Agent: Mozilla/5.0 (Ubuntu; X11; Linux x86_64; rv:11.0a1) Gecko/20111213 Firefox/11.0a1
Build ID: 20111213112436

Steps to reproduce:

I had 4 tabs open (an instance of Jenkins, saucelabs.com, and two internal sites).  With Firefox completely idle (I hadn't touched it for 10+ minutes), I noticed that my system was bogging down.  htop showed that Firefox was using a lot of memory.  I was able to pull up about:memory and it showed 3GB used with 2.5GB in heap-unclassified.

I've grabbed three about:memory?verbose dumps: one with the 4 tabs open (plus about:memory), one with just about:memory, and one after pressing "Minimize Memory Usage".

My enabled extensions are:
about:telemetry 0.9
Adblock Plus 2.0.1
Add-on Compatibility Reporter 1.0.1
Clear Cache 1.2
Cookie Monster 1.1.0
Firebug 1.9.0b4
HttpFox 0.8.10
JSONView 0.7
Morning Coffee 1.35

My enabled plugins are:
Java(TM) Plug-in 1.6.0_26
Shockwave Flash 11.1 r102
Marking as blocking 563700 as this has a large heap-unclassified usage.

I'm not sure where to start looking into this because I wasn't doing anything with the browser when it went from little memory usage to extremely high memory usage.
Blocks: DarkMatter
Whiteboard: [MemShrink]
The first step is to see if you can recreate it.  If you can, then try again with all addons disabled.  If you can't reproduce with addons disabled, then you want to bisect your set of addons to find which one is causing it.
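The add-on bisection suggested above can be sketched as a short loop. This is only an illustration: `bisectAddons` and the `reproduces` callback are hypothetical names, and in practice `reproduces` is a manual step (restart with only the given add-ons enabled and check about:memory). It also assumes exactly one add-on is responsible.

```javascript
// Hypothetical helper: narrow a list of add-ons down to the one that triggers a problem.
// `reproduces(enabled)` is an assumed callback that returns true when the problem
// still shows up with only the add-ons in `enabled` turned on.
function bisectAddons(addons, reproduces) {
  let candidates = addons.slice();
  while (candidates.length > 1) {
    // Test the first half of the remaining candidates.
    const half = candidates.slice(0, Math.ceil(candidates.length / 2));
    // Keep whichever half still shows the problem.
    candidates = reproduces(half) ? half : candidates.slice(half.length);
  }
  return candidates[0];
}
```

With nine add-ons enabled, this takes at most four rounds of restarting and checking.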
Keywords: mlk
I just attempted to duplicate it by opening the last 20-30 pages of my history and then closing all but about 8.  I then left it open all night.  The memory usage did increase, but nowhere near the extreme that I previously saw.
Unfortunately, without steps to reproduce, this bug is really unlikely to lead anywhere :(
Unless we can reproduce this, there's not much we can do.  :-/
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
Understood.  I haven't seen it since but when I do, I'll post some more information and see if I can reproduce it.
I just saw this again.  I was in a different "Workspace" on my box that didn't have firefox visible.  I started to notice the box slowing down and discovered Firefox was using 60% of available memory and was climbing.  It was completely unresponsive as well.

I decided to strace it and, for some reason, that made the browser a little more responsive.  I was able to open about:memory and noticed >2GB in heap-unclassified.  Clicking "Minimize Memory Usage" once didn't appear to change much, but clicking it again caused the memory to go away.  The browser is now back to how it was.

It almost feels like strace allowed the browser to get back on its feet, which doesn't seem possible.

I had 8 tabs open when it went berserk and I still have those 8 open (jira, websvn, jenkins, saucelabs, php.net, about:memory, 2 internal sites at the login page).  I don't have Firebug enabled on any of them.  I had the Web Console open on one of the internal site tabs.

Any ideas?  Would the strace output be useful or maybe a gdb backtrace when it is locked up?
This sounds like one of your extensions is acting up; that's likely what's causing the large heap-unclassified.

A gdb backtrace would be useful, and if it's spinning in JS, so would the output of 

  (gdb) call DumpJSStack()

I'm not sure what I'd do with the strace output, but it wouldn't hurt to attach it...
It looks like part or all of it is happening because of the saucelabs.com tab.  I've duplicated it on my Linux box and a Windows XP box running Nightly with empty profiles.  To duplicate it, you need to do the following:

1 - Login to saucelabs.com (they have a free account setup)
2 - Go to the "My Jobs" page (https://saucelabs.com/jobs)
3 - Execute some tests (see my next comment with the attached file on how to bypass this)
4 - Sit and wait for 30-60 minutes

I'm showing heap-unclassified over 400MB with just that tab open.

I'm going to file a bug report with them to indicate that they have a memory leak on their website but about:memory should have classified this memory.
Status: RESOLVED → UNCONFIRMED
Resolution: INCOMPLETE → ---
Attached file updated_jobs
To bypass the requirement of running tests, I made a simple PHP file that generates the output that the jobs page expects from its polling request.  To use:

1 - You need a web server that has ssl.
2 - Place the attached file so that it is reachable as https://<your_site>/rest/v1/<username>/updated_file
3 - Login to saucelabs and go to the jobs page
4 - Change your hosts file to point saucelabs.com to your web server
5 - Watch the jobs page until you start seeing new jobs showing up.  I usually open a new tab pointing directly to https://saucelabs.com/rest/v1/<username>/updated_file and refresh it until I start seeing output.  That lets me know it is working correctly.
6 - Sit and wait

Yes, these steps aren't the simplest, but unless you have a bunch of selenium tests and a lot of minutes in your sauce lab account that you want to spend, this works.
Thanks for narrowing this down and providing the test, Trev, that's enormously helpful.
Summary: Sudden large heap-unclassified (<100MB to >2GB in under 10 minutes) with no browser usage → Sudden large heap-unclassified (<100MB to >2GB in under 10 minutes) with no browser usage on saucelabs.com
Attached file Reduced test case
I made a reduced test case so that everything is in just one file.  I've discovered that you just need at least one test running for a long time and it will cause this.  And if I remember correctly, both cases where I saw the huge memory spikes were because of bad tests that would take >10 minutes to finish.

If I run this file for 5-10 minutes, heap-unclassified is >200M and is over half of all explicit memory.
Attachment #588665 - Attachment mime type: text/plain → text/html
Thanks again, Trev!  Just to clarify, are the new steps to reproduce as follows?

1 - Login to saucelabs.com (they have a free account setup)
2 - Go to the "My Jobs" page (https://saucelabs.com/jobs)
3 - Execute the reduced test case
4 - Sit and wait for 30-60 minutes

And what are the details of step 3?
Sorry, the new steps would be:

1 - Open the reduced test case attachment
2 - Wait for 10-20 minutes

The reduced test case has all the code from saucelabs in it and it doesn't need to do an ajax request either.
Excellent, that's much easier.  Thanks for simplifying the test case.
Trev, comment 4 mentioned add-ons.  We find that add-ons very frequently are the cause of high memory usage these days.  Have you tried your test after restarting in safe mode, which disables all add-ons?  Knowing if the problem still exists with add-ons disabled would be very helpful info for isolating the problem.
The saucelabs steps from comment 11 were duplicated without any add-ons.  It was a brand new profile.
I tried the steps from comment 11 in a new profile on Linux.  After 34 minutes my "explicit" memory usage is 70MB, only a little higher than when I started.
Try the test case in comment 14.  If I open it in a clean profile on Windows XP, after 10 minutes about:memory shows:

439.21 MB (100.0%) -- explicit
├──284.19 MB (64.71%) -- js
│  ├──234.22 MB (53.33%) -- compartment(https://bug712822.bugzilla.mozilla.org/attachment.cgi?id=588665)
│  │  ├──121.65 MB (27.70%) -- gc-heap
│  │  │  ├───99.01 MB (22.54%) -- objects
│  │  │  │   ├──80.39 MB (18.30%) -- non-function
│  │  │  │   └──18.62 MB (04.24%) -- function
│  │  │  ├───11.90 MB (02.71%) -- arena
│  │  │  │   ├──11.22 MB (02.55%) -- unused
│  │  │  │   └───0.68 MB (00.15%) -- (2 omitted)
│  │  │  ├────6.75 MB (01.54%) -- shapes
│  │  │  │    ├──5.33 MB (01.21%) -- dict
│  │  │  │    └──1.42 MB (00.32%) -- (2 omitted)
│  │  │  ├────3.91 MB (00.89%) -- strings
│  │  │  └────0.08 MB (00.02%) -- (2 omitted)
│  │  ├──104.00 MB (23.68%) -- object-slots
│  │  ├────3.30 MB (00.75%) -- shapes-extra
│  │  │    └──3.30 MB (00.75%) -- (4 omitted)
│  │  ├────2.90 MB (00.66%) -- string-chars
│  │  └────2.37 MB (00.54%) -- (5 omitted)
│  ├───26.40 MB (06.01%) -- xpconnect
│  ├────9.48 MB (02.16%) -- gc-heap-chunk-dirty-unused
│  ├────7.06 MB (01.61%) -- (8 omitted)
│  └────7.02 MB (01.60%) -- compartment([System Principal], 0x3578000)
│       ├──3.61 MB (00.82%) -- gc-heap
│       │  └──3.61 MB (00.82%) -- (7 omitted)
│       └──3.41 MB (00.78%) -- (8 omitted)
├──144.21 MB (32.83%) -- heap-unclassified
├────6.56 MB (01.49%) -- storage
│    ├──5.27 MB (01.20%) -- sqlite
│    │  └──5.27 MB (01.20%) -- (13 omitted)
│    └──1.29 MB (00.29%) -- (1 omitted)
└────4.25 MB (00.97%) -- (10 omitted)
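
For comparing dumps, the share of heap-unclassified can be recomputed from the raw sizes. The snippet below is a minimal sketch; `parseLine` is a hypothetical helper that only handles the `<size> MB (<percent>%) -- <name>` line shape shown above, not every about:memory format.

```javascript
// Parse one about:memory line into { mb, percent, name }; returns null if it does not match.
function parseLine(line) {
  const m = /([\d.]+) MB \(([\d.]+)%\) -- (.+)$/.exec(line);
  return m ? { mb: parseFloat(m[1]), percent: parseFloat(m[2]), name: m[3] } : null;
}

const explicit = parseLine("439.21 MB (100.0%) -- explicit");
const dark = parseLine("├──144.21 MB (32.83%) -- heap-unclassified");

// Recompute the share from the raw sizes rather than trusting the printed percentage.
const share = (dark.mb / explicit.mb) * 100;
console.log(dark.name + ": " + share.toFixed(2) + "% of explicit");
```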

One thing I've noticed is that if you only have completed tests, the memory doesn't grow.  You have to have tests in progress.  It looks like it is because of the following pieces of code on saucelabs.com:

    var videoLink = $("<a>");
    videoLink.html("Video");
    videoLink.attr("target", "_blank");
    videoLink.css("cursor", "pointer");
    videoLink.click(function(e) {
      jobsPage.showVideo(v);
    });

    var logLink = $("<a>");
    logLink.html("Log");
    logLink.attr("target", "_blank");
    logLink.css("cursor", "pointer");
    logLink.click(function(e) {
      jobsPage.showLog(v);
    });

    var sep = $("<span>");
    sep.css("color", "lightgray");
    sep.html(" | ");

They aren't cleaning up those elements from jQuery's cache when the test is in progress and so they stay around.  I made a small page that just created the videoLink element in a loop and the memory grew rapidly.  If I add videoLink.remove(), the memory stays constant.
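
The retention pattern described above can be modelled without a browser. This is a simplified sketch and not jQuery's actual internals: `cache`, `makeLink`, `removeLink`, and `poll` are hypothetical names, and it assumes the page rebuilds its links on every polling pass while a job is in progress.

```javascript
// Simplified model, NOT jQuery's real internals: a library-side cache that
// holds handler data for every element until it is explicitly removed.
const cache = new Map();
let nextId = 0;

function makeLink(onClick) {
  const link = { id: ++nextId };
  cache.set(link.id, { click: onClick }); // the cache entry keeps the closure alive
  return link;
}

function removeLink(link) {
  cache.delete(link.id); // analogous to calling videoLink.remove()
}

// One polling pass: build a link for an in-progress job, optionally cleaning it up.
function poll(job, cleanUp) {
  const link = makeLink(() => job.name);
  if (cleanUp) removeLink(link);
}

for (let i = 0; i < 1000; i++) poll({ name: "job" }, false);
const leaked = cache.size; // one retained entry per pass

cache.clear();
for (let i = 0; i < 1000; i++) poll({ name: "job" }, true);
const afterCleanup = cache.size; // nothing retained
```

Without the cleanup call, every pass leaves an entry (and the closure it references) reachable from the cache, which matches the growth seen only while tests are in progress.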
> They aren't cleaning up those elements from jQuery's cache when the test is in progress and so they 
> stay around.

So then this is a bug in the page, not in Gecko?
Er, I guess the high-ish heap-unclassified is a Gecko bug.  33% is not completely ridiculous, though.

Nick, do you think we likely have existing dark matter bugs on whatever this is (DOM node-related things?)?
> Nick, do you think we likely have existing dark matter bugs on whatever this
> is (DOM node-related things?)?

Hard to say, I haven't reproduced the problem yet.  If you held a gun to my head I'd guess it was orphaned DOM nodes.
Whiteboard: [MemShrink] → [MemShrink:P3]
Sauce Labs made a change to their website that appears to have removed this problem.  So, unless you want this bug open to track the dark matter, I think this bug can be closed.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME