Closed Bug 404645 Opened 17 years ago Closed 16 years ago

memory usage spikes dramatically while doing nothing with fresh profile

Categories: Toolkit :: Safe Browsing, defect, P2

Status: VERIFIED FIXED

People: Reporter: myk, Assigned: dcamp

Keywords: memory-footprint, regression

Attachments: 1 file

If I start the latest nightly (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b2pre) Gecko/2007112004 Minefield/3.0b2pre) with a fresh profile and then let it sit there with its default tabs open in the background and do nothing whatsoever with the application, then anywhere from a few seconds to a few minutes later, memory usage of the process will spike dramatically, going from tens to hundreds of MB resident memory (and even more virtual memory) and slowing my system to a crawl.

Strangely, this doesn't happen when I start the build with my normal (years-old) profile, although I do see it with a profile that is several months old but I haven't used very much.

Bug 404628 and bug 404638 have both been filed today citing similar problems, although they're both on Windows (as is bug 404605, which could be related).
Unable to reproduce using a tree pulled just now, --enable-optimize --disable-debug, with either a new profile or a somewhat old one. Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9b2pre) Gecko/2007112021 Minefield/3.0b2pre
Fetching malware/safebrowsing updates would fit the access pattern described, though we shouldn't be leaking the buffers or anything -- johnath?
Dave: might this be malware-db-update-channel stuff?
dcamp would know better than I what the fetch protocol looks like, but the reports in bug 402469 and my own testing seem to confirm that the malware DB update channel has sped up a great deal recently (day of the beta), so that what used to take far too long (hundreds of hours for a complete malware+phishing DB pull) now happens very quickly.  And the behaviour matches - a couple minutes after starting a new profile, one would expect to see the pull start.  As shaver says, even if it is that, hundreds of meg seems high...

This theory can be tested: if Myk still sees the described behaviour on new profiles, he should see that the spike happens just as $profile/urlclassifier3.sqlite starts growing, and ideally it should stop/revert at some later point when the urlclassifier db has steadied off at its full size (around 15-20M iirc).
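To line the spike up with the database growth as suggested above, something like the following could log both at once. A minimal sketch, assuming Linux: it reads VmSize/VmRSS from /proc/<pid>/status (reported there in kB). The function names are hypothetical, and the pid and profile path you pass to watch() are placeholders.

```python
import os
import time


def parse_vm_kb(status_text):
    """Return (VmSize, VmRSS) in kB from the text of a Linux /proc/<pid>/status file."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Values look like "\t  187868 kB"; keep only the number.
            fields[key] = int(value.split()[0])
    return fields.get("VmSize"), fields.get("VmRSS")


def sample(pid, db_path):
    """One sample: (VmSize kB, VmRSS kB, size of db_path in bytes, or None if absent)."""
    with open(f"/proc/{pid}/status") as f:
        vm_size, vm_rss = parse_vm_kb(f.read())
    db_size = os.path.getsize(db_path) if os.path.exists(db_path) else None
    return vm_size, vm_rss, db_size


def watch(pid, db_path, interval=5.0, count=60):
    """Print one CSV row per sample so memory growth can be matched to DB growth."""
    print("unix_time,vm_size_kb,vm_rss_kb,db_bytes")
    for _ in range(count):
        vm_size, vm_rss, db_size = sample(pid, db_path)
        print(f"{time.time():.0f},{vm_size},{vm_rss},{db_size}")
        time.sleep(interval)
```

For example, watch(12345, "/path/to/profile/urlclassifier3.sqlite") with the real Firefox pid and profile directory.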
Yes, after some minutes the download starts and finishes. That's when memory usage goes up uncontrollably:

elapsed time          vm size   vm rss
15:29:38:062695278    187868    44924
15:34:02:155084243    250732    108172
15:35:04:507754415    587740    402128

Only killing Minefield helps. Linux 2007112004 nightly, new profile. Happened to my old one too.
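Reading the samples above as wall-clock timestamps (HH:MM:SS plus a sub-second field, which this sketch drops) and assuming the sizes are in kB as in /proc, the resident-memory growth per interval works out as follows. The helper names are made up for illustration.

```python
# Samples from the comment above: (wall-clock time, VM size, VM RSS).
# Assumption: sizes are in kB, as /proc/<pid>/status reports them.
samples = [
    ("15:29:38", 187868, 44924),
    ("15:34:02", 250732, 108172),
    ("15:35:04", 587740, 402128),
]


def to_seconds(hms):
    """Convert "HH:MM:SS" to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def growth(samples):
    """Yield (seconds elapsed, RSS growth in MB) between consecutive samples."""
    for (t0, _, rss0), (t1, _, rss1) in zip(samples, samples[1:]):
        yield to_seconds(t1) - to_seconds(t0), (rss1 - rss0) / 1024


for seconds, mb in growth(samples):
    print(f"+{mb:.0f} MB RSS over {seconds} s")
```

That gives about 62 MB of RSS growth over the first 264 s and about 287 MB over the next 62 s, consistent with the "tens to hundreds of MB" jump in the original report.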
Based on the bug reports in comment #1, and seeing it on Windows myself, I think this is cross-platform.

On Win32, 
* there is plenty of both CPU & memory usage
* watching the files in the profile
 * urlclassifier3.sqlite steadily grows in size
 * the size of urlclassifier3.sqlite-journal fluctuates as data is written to the above file

So that looks like confirmation of comment #3.
OS: Linux → All
Hardware: PC → All
Flags: blocking-firefox3?
Component: General → Phishing Protection
QA Contact: general → phishing.protection
I don't think it happens just for new/fresh profiles because it is triggered with my regular profile or in -safe-mode (bug #404688).
On Kubuntu Gutsy with 2007112004, 512 MB RAM, 1 GB swap and a new profile, I've had to kill Minefield twice because it stalled the whole system on I/O by swapping like mad. By then urlclassifier3.sqlite had reached its final(?) size of ~16.8 MB. Now it seems to behave.
(In reply to comment #10)
> I don't think it happens just for new/fresh profiles because it is triggered
> with my regular profile or in -safe-mode (bug #404688).

Presumably Myk used that summary because he wanted to emphasize that it wasn't a problem with extensions or other modifications. A new profile is also a convenient way to force Firefox to retrieve the safe-browsing database. 

As a temporary workaround, disable the suspected forgery detection on the Security tab of the options/preferences. That's enough to prevent the problem with a new profile. (Should it be, though? Isn't there still a download for the attack site option?)
The effect is temporary, so it's not a hard B2 blocker, but definitely blocks final release (and would require, like, a 30pt relnote if we don't get it for B2)
Flags: blocking-firefox3? → blocking-firefox3+
Priority: -- → P2
In bug 404672 I've posted 3 samples taken with Mac OS X Activity Monitor, in case they're of any help... :)
If you're testing this on a POSIX system, use:

ulimit -m $((256*1024))
ulimit -v $((256*1024)) 

in a bash shell before launching Firefox so you don't risk virtually halting your system due to thrashing. This sets max RSS and VM to 256M (or you can go higher if you have more physical RAM; 512M is good). When doing this I've seen that it keeps trying to open a new website (the animation continues), but ffx seems to be hung while it's experiencing the sudden increase in memory usage.  
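The same limits can be applied from a launcher script. A sketch using Python's resource module, assuming a POSIX system; plain ulimit sets both the soft and the hard limit, so this does too. Note that on Linux only the -v limit (RLIMIT_AS) is actually enforced: RLIMIT_RSS, which ulimit -m sets, has had no effect since Linux 2.4.30. The helper names are made up for illustration.

```python
import resource
import subprocess

# 256 MB, matching $((256*1024)) above: ulimit counts in kB, setrlimit in bytes.
LIMIT_BYTES = 256 * 1024 * 1024


def _lower_limit(which, limit):
    """Set soft and hard limits (as plain ulimit does), never above the current hard limit."""
    _, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)
    resource.setrlimit(which, (limit, limit))


def limit_memory():
    # Runs in the child between fork and exec, like typing ulimit before launching.
    _lower_limit(resource.RLIMIT_AS, LIMIT_BYTES)   # ulimit -v: virtual address space
    _lower_limit(resource.RLIMIT_RSS, LIMIT_BYTES)  # ulimit -m: not enforced on modern Linux


def launch(argv, **popen_kwargs):
    """Start argv (e.g. ["firefox"]) with the memory limits applied to the child only."""
    return subprocess.Popen(argv, preexec_fn=limit_memory, **popen_kwargs)
```

Because the limits are set in the child, the shell or script that launches Firefox keeps its own limits.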
Would it be possible for Firefox to download and copy a prebuilt sqlite db file into the profile if none exists yet or if it is sufficiently out of date? This should use less memory, CPU and potentially less bandwidth. Why filling a few MB worth of data into the sqlite db uses so much RAM would still have to be investigated.
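A rough sketch of the seeding idea from the previous comment, for illustration only (this is not what was implemented in this bug). The prebuilt file location, the one-week staleness threshold and the function name are all hypothetical. It should only run while nothing has the database open, and a real version would also have to deal with a leftover urlclassifier3.sqlite-journal.

```python
import os
import shutil
import time

MAX_AGE_SECONDS = 7 * 24 * 3600  # hypothetical staleness threshold


def seed_classifier_db(profile_dir, prebuilt_path, max_age=MAX_AGE_SECONDS, now=None):
    """Copy a prebuilt DB into the profile if it is missing or older than max_age.

    Returns True if a copy was made. Only the file name urlclassifier3.sqlite
    comes from this bug; everything else is illustrative.
    """
    now = time.time() if now is None else now
    target = os.path.join(profile_dir, "urlclassifier3.sqlite")
    if os.path.exists(target) and now - os.path.getmtime(target) < max_age:
        return False
    # Copy under a temporary name, then rename, so an interrupted copy never
    # leaves a truncated database under the real name.
    tmp = target + ".tmp"
    shutil.copyfile(prebuilt_path, tmp)
    os.replace(tmp, target)
    return True
```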
Tony: can you find out when/what changes have been made to the SafeBrowsing protocol?

(see bug 402469 comment 11 for a report of the difference in fetch speed)
I could easily confirm the issue on the trunk with an existing profile from which I
removed the urlclassifier3.sqlite file. A few minutes after startup, Firefox
started to use over 500 MB of memory, and it kept climbing. Unchecking "Tell me if the
site I'm visiting is a suspected forgery" prevents this behavior.

The last time I tried to start up with a brand new profile was sometime last
week. Something must have changed since then.
Can we know in release builds whether a cycle-collector fault has happened?
This bug seemed to start with the 11/19 build or somewhere around there.
There is no way to know if a fault has happened. We will fix that for B2 by warning in the nsIConsoleService when we're faulting.
So we're using a transaction per update chunk added to the database.  I don't really know why yet, but these transactions are leaking or something.

The attached patch uses one transaction per entire update, which is still causing a bit of a spike, but more like ten megs than hundreds.
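To illustrate the difference between the two approaches (this is not the actual nsUrlClassifierDBService code), here is a sketch with Python's sqlite3 module and a hypothetical entries(chunk_id, hash) table. Per-chunk mode commits once per chunk, so each chunk normally pays for its own journal write and sync; whole-update mode commits once. As a later comment points out, the transaction count by itself shouldn't change memory use much.

```python
import sqlite3


def apply_update(conn, chunks, per_chunk=False):
    """Write an update's chunks to a hypothetical entries(chunk_id, hash) table.

    chunks is a list of chunks, each a list of (chunk_id, hash) rows.
    Returns the number of transactions committed. conn must be opened with
    isolation_level=None so the BEGIN/COMMIT here are the only transaction control.
    """
    # Either one batch per chunk (old behaviour) or one batch for the whole update.
    batches = [[chunk] for chunk in chunks] if per_chunk else [chunks]
    for batch in batches:
        conn.execute("BEGIN")
        try:
            for rows in batch:
                conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)
        except Exception:
            conn.execute("ROLLBACK")
            raise
        conn.execute("COMMIT")
    return len(batches)
```

With 50 chunks, per-chunk mode commits 50 times while whole-update mode commits once; the rows written are the same either way.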
Attachment #289707 - Flags: review?(tony)
(In reply to comment #25)
> This bug seemed to start with the 11/19 build or somewhere around there.
> 

No. 

This bug is also in Firefox 3.0 beta 1, which tens of thousands of people are
testing right now with fresh profiles, as recommended. It is freezing their systems
without warning.

Problem is (maybe) that nobody has tested this urlclassifier3.sqlite thing with
bigger urlclassifier data chunks, which Google now delivers?

https://bugzilla.mozilla.org/show_bug.cgi?id=402469

Mozilla/5.0 (Windows; U; Windows NT 5.1; fi; rv:1.9b1) Gecko/2007110904
Firefox/3.0b1 - Build ID: 2007110904 
In my case (using FF 3.0b1, Mac OS X 10.4.11 PPC G4, 512MB RAM), while building
the urlclassifier3.sqlite db, the whole system was taken over by thrashing. I
saw the VM as reported by top and System Profiler go up to 1GB. I watched it
for half an hour while the system was brought to its knees. I think
my db store (urlclassifier3.sqlite) was 13 to 14MB at one point.

I was trying to look at the active thread (using Apple's Thread Viewer app),
and it seemed to have a lot of nested sqlite3_mprintf() calls. Is that
normal?

One question I have: when the system is occupied like this by the site
classifier db maintenance, why doesn't Firefox quit when I close the browser
window and choose Quit from its normal menu? Instead it carries on with its
sqlite thrashing. Perhaps that is an indication that something isn't working
as planned. Thanks; Larry.
(In reply to comment #27)
> So we're using a transaction per update chunk added to the database.  I don't
> really know why yet, but these transactions are leaking or something.
> 
> The attached patch uses one transaction per entire update, which is still
> causing a bit of a spike, but more like ten megs than hundreds.

So this doesn't fix the underlying issue.  The number of transactions shouldn't impact memory usage, although it may cause excess disk IO.

Let's try to find what is actually leaking.
(In reply to comment #25)
> This bug seemed to start with the 11/19 build or somewhere around there.
> 

I noticed high CPU on 11/19 Hourly.
http://forums.mozillazine.org/viewtopic.php?p=3151360#3151360
(In reply to comment #28)
> (In reply to comment #25)
> > This bug seemed to start with the 11/19 build or somewhere around there.
> > 
> 
> No. 
> 
> This bug is also in Firefox 3.0 beta 1, which tens of thousands of people are
> testing right now with fresh profiles, as recommended. It is freezing their systems
> without warning.
> 
> Problem is (maybe) that nobody has tested this urlclassifier3.sqlite thing with
> bigger urlclassifier data chunks, which Google now delivers?
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=402469
> 
> Mozilla/5.0 (Windows; U; Windows NT 5.1; fi; rv:1.9b1) Gecko/2007110904
> Firefox/3.0b1 - Build ID: 2007110904 
> 

Well, it started happening on the 19th for me. And that was with an old profile.
cf pointed out that there hasn't been an update to this bug for a while, so I'll add a couple points about current status:

- Asa has summarized the situation in a blog post here: http://weblogs.mozillazine.org/asa/archives/2007/11/firefox_3_beta_1.html

- The problem has been mitigated for the moment by having the safebrowsing servers return backoff errors to beta clients to slow down their updates while we work with the google safebrowsing team to figure out what long term fixes are necessary.  For more about the safebrowsing protocol: http://code.google.com/p/google-safe-browsing/wiki/Protocolv2Spec

- We will still be able to test and debug the problem, since trunk builds can be identified with a different client ID than the (throttled for the moment) beta builds.
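For reference, client-side backoff of the kind the servers were relying on here can be as simple as the following. This is a generic exponential backoff with jitter; the function name, base delay and cap are placeholders, not the intervals the Safe Browsing v2 protocol spec actually prescribes (see the link above).

```python
import random


def next_delay(errors, base=60.0, cap=8 * 3600.0, rng=random.random):
    """Seconds to wait before the next update after `errors` consecutive failures.

    Generic exponential backoff with jitter; base and cap are illustrative
    values, not the schedule from the Safe Browsing protocol spec.
    """
    if errors <= 0:
        return 0.0
    delay = min(cap, base * 2 ** (errors - 1))
    # Spread clients out: wait between delay and 2 * delay, still capped.
    return min(cap, delay * (1 + rng()))
```

Doubling the delay on every failure means a client that keeps getting errors quickly drops to a few requests per day instead of hammering the servers.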
It seems that currently all updates for the trunk are being blocked, not just beta clients (Google doesn't discriminate yet). A new profile that I created this morning still hasn't received anything.
Assignee: nobody → dcamp
So we're writing to the database faster than the mozStorage async IO thread can keep up.  It ends up having to buffer a huge amount of writes, causing the memory spike.

I wrote a quick hack to throttle the async IO thread when buffering too much data.  Between that hack and the transaction fix, the spike is drastically reduced and the import is a whole lot faster.

Unfortunately the specific hack isn't a good idea, because it would end up blocking any mozStorage consumers on the main thread.  We probably need to do some throttling in the url-classifier.

There are two ways we can go about this:  either we can just choose a max amount of data we'll accept per update (and try to choose an amount that won't overwhelm the io thread), or we can expose information on the io thread's queue size and back off when we're starting to overwhelm it.  Or maybe some combination of the two.
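The two throttling options above could look roughly like this in a producer/consumer setup. A sketch only: a queue.Queue stands in for the mozStorage async IO thread's pending writes, the byte budget and queue-depth limit are made-up numbers, and the hypothetical feed_update() returns whatever it deferred so a later update can pick it up. It never blocks the feeding thread, which was the problem with the quick hack.

```python
import queue
import threading

MAX_BYTES_PER_UPDATE = 1 << 20  # option 1: hypothetical per-update byte budget (1 MB)
HIGH_WATER = 64                 # option 2: hypothetical limit on pending writes


def writer(q, written):
    """Stand-in for the async IO thread: drains pending writes until it sees None."""
    while True:
        item = q.get()
        if item is None:
            break
        written.append(item)


def feed_update(q, chunks):
    """Queue chunks for writing; return the chunks deferred to a later update.

    Stops early once this update has used its byte budget (option 1) or the
    writer has too much pending work (option 2). qsize() is approximate, which
    is fine for a soft limit like this.
    """
    sent = 0
    for i, chunk in enumerate(chunks):
        if sent + len(chunk) > MAX_BYTES_PER_UPDATE or q.qsize() >= HIGH_WATER:
            return chunks[i:]
        q.put(chunk)
        sent += len(chunk)
    return []
```

Combining both, as suggested above, caps the work per update while still backing off sooner if the IO thread is already behind.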
Attachment #289707 - Flags: review?(tony) → review+
Blocks: 403354
Blocks: 404619
Depends on: 406657
Between the transaction fix here and the caching fix in 403354, we have a much smaller impact on the async IO thread.

I've filed bug 406657 for getting information from the storage service to throttle updates.
Checked in the transaction patch:

Checking in src/nsUrlClassifierDBService.cpp;
/cvsroot/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierDBService.cpp,v  <--  nsUrlClassifierDBService.cpp
new revision: 1.39; previous revision: 1.38
done
I have the same problem with Firefox 2.0.0.9 on Windows XP: memory peaks at 500+ MB after using Google Maps for a few minutes. When I close the Google Maps tab, memory usage drops to 180 MB, which is still much more than I'm used to (56 MB is common for my Firefox installation). It seems like it isn't cleaning things up properly. I also get the impression that it gets worse after each update (we've had 4 or 5 in the last 2 weeks).
J.A. Oord, that's a different issue, this was about trunk only.
Please file a new bug about the memory issue you're seeing.
(In reply to comment #41)
> I have the same problem with Firefox 2.0.0.9 on Windows XP: memory peaks at
> 500+ MB after using Google Maps for a few minutes. When I close the Google Maps
> tab, memory usage drops to 180 MB, which is still much more than I'm used to
> (56 MB is common for my Firefox installation). It seems like it isn't cleaning
> things up properly. I also get the impression that it gets worse after each
> update (we've had 4 or 5 in the last 2 weeks).
> 

The recent updates are very unlikely to have made this worse and as noted earlier this is unlikely your issue. If you have easy steps to reproduce please file a new bug.
Now that async IO has been disabled, I think we can consider this bug fixed.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1a1pre) Gecko/2008060302 Minefield/3.1a1pre

The latest build doesn't seem to have this problem. Memory usage is at a normal level and doesn't spike with a newly created profile.

Can anyone else confirm?
Status: RESOLVED → VERIFIED
Product: Firefox → Toolkit