Bug 634642 Opened 13 years ago Closed 13 years ago

Slow gradual memory increase in Fennec

Categories

(Firefox for Android Graveyard :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(fennec-)

RESOLVED FIXED
Tracking Status
fennec - ---

People

(Reporter: mbrubeck, Assigned: azakai)

References

Details

(Keywords: memory-footprint, mobile, regression)

Attachments

(8 files)

Memory measurements are based on the "Pss" total from "dumpsys meminfo org.mozilla.fennec" running on a Galaxy Tab with Android 2.2.

On startup, Fennec trunk Android builds use around 45MB of RAM.

As I navigate in a single tab (without opening any new tabs), memory use increases by about 1MB every time I navigate to a new page.
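For anyone scripting the measurement described above, a minimal sketch in Python (assuming adb is on the PATH; the exact dumpsys output layout varies between Android versions, so the parsing here is an approximation that takes the last number on the first line mentioning Pss):

import re
import subprocess

def pss_total_kb(package="org.mozilla.fennec"):
    # Run "adb shell dumpsys meminfo <package>" and pull out a Pss total.
    # The output layout differs across Android versions; this sketch assumes
    # the Pss summary row has the total as its last numeric column.
    out = subprocess.run(
        ["adb", "shell", "dumpsys", "meminfo", package],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        if "Pss" in line:
            numbers = re.findall(r"\d+", line)
            if numbers:
                return int(numbers[-1])
    raise ValueError("no Pss line found in dumpsys output")

if __name__ == "__main__":
    print("Pss total: %d KB" % pss_total_kb())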
tracking-fennec: --- → ?
In comparison to the Galaxy Tab: on the Nexus One I start off at 47MB, and after navigating to a single page in the single open tab, memory usage shot up to 56MB (based on the Pss total).
Are people seeing this only on Android? Not on desktop?
I can reproduce this on desktop Linux too.  It looks like the RSS for the chrome process grows continually without bound, while the content process remains mostly flat long term (going up and down depending on which page is loaded).
On Linux desktop, I see the fennec process start at 54 MB. Then, with a combination of reloads and clicking on links, it rises to 85 MB.

However, it then drops to 74 MB eventually. And the process repeats.

So it doesn't increase without limit in my testing.
(In reply to comment #4)
> On Linux desktop, I see the fennec process start at 54 MB. Then, with a
> combination of reloads and clicking on links, it rises to 85 MB.
> 
> However, it then drops to 74 MB eventually. And the process repeats.
> 
> So it doesn't increase without limit in my testing.

Hmm, maybe I just didn't test long enough - I stopped when the memory was around 82 MB.

In a Linux desktop trunk nightly from 2011-01-16, I saw RSS for the chrome process grow to about 67 MB, then drop back to about 60 MB as I continued browsing.

Now I'm testing a Linux desktop trunk nightly from 2011-02-16. RSS for chrome reached 99 MB without stopping.  Then I had to leave for a bit, and while the browser was idle, it dropped back down to 76 MB.
(In reply to comment #5)
> Now I'm testing a Linux desktop trunk nightly from 2011-02-16. RSS for chrome
> reached 99 MB without stopping.  Then I had to leave for a bit, and while the
> browser was idle, it dropped back down to 76 MB.

In another test with the 2011-02-16 nightly, the memory reached 100 MB and then quickly dropped back down to 79 MB while I was still loading pages.  So it looks like the overall pattern has not changed, but the baseline or threshold has.
OS: Android → All
Hardware: ARM → All
Summary: Fennec memory usage increases over time and never goes back down → Fennec chrome process memory can grow up to ~100 MB while loading pages.
Attached file massif data
Massif data, running Fennec for 15 minutes, 25 page loads, 25 page reloads (google search pages). Nothing seems immediately odd, but maybe I missed something.
30 minutes of Fennec loading pages t1 and t2, each of which waits 2 seconds and then loads the other (2 seconds, since that is how long it takes to load a page in valgrind). So, 900 page loads. Memory doesn't appear to increase uncontrollably. It might be that more complicated pages would behave differently, though.

The pages used:

<html>
<body>
<script>
setTimeout(function() {
  window.location = 't2.html';
}, 2000);
</script>
Page 1!
</body>
</html>

etc.
Assignee: nobody → azakai
I had hoped to get some more profiling data, but massif crashes when trying to load most websites. In any case, I think the data above should be enough, and I'm out of ideas for what else to test.

That our memory use rises by 30MB or so before stabilizing is not great; it's probably a combination of fragmentation and arena allocations, but it isn't too bad.
Assignee: azakai → nobody
Ok, got massif to work again (I forgot --smc-check before).

Here is 45 minutes of automatically loading google and yahoo, alternating, in an iframe. There might be a slight gradual increase, but it is very hard to say.
Child process for previous comment.

Seems stable (perhaps a tiny overall increase... not sure), except for a huge spike of 8MB or so (snapshot 25). Looks like that is due to

  26.04% (7,340,032B) 0x5BC0B7D: gfxImageSurface::gfxImageSurface(gfxIntSize const&, gfxASurface::gfxImageFormat)

which doesn't appear in any other snapshot. I'm not sure why it is absent from all the others, or why it appears at such a large size in this one case. wesj had a theory that perhaps one of the loads of yahoo.com loaded a different image (they do change between loads sometimes) that was exceptional in some way.
Isn't jemalloc supposed to handle fragmentation? Or is the fragmentation outside of jemalloc? I could see a win here by properly managing the memory we allocate for layers (assuming it is allocated outside of jemalloc).
jemalloc allocates memory in a way that reduces fragmentation compared to other allocators. All of our memory is (or at least very much should be) allocated through jemalloc, including these layers.
jemalloc doesn't allocate buffers for layers. That's all ashmem.
Attached file 7-hour massif: parent
7 hours of massif, loading pages A and B in alternation, each with an iframe containing mozilla's mobile site or twitter's mobile site.

7 hours, with about 7 seconds to load a page (as this is in valgrind), gives about 3,600 page loads, 100% CPU all the while.

This file is the parent process.
Attached file 7-hour massif: child
Child data.
Looks like there is a slow, gradual increase. Perhaps 2MB over the whole 7 hours.

I will work on the logs to figure out what is changing here.
Assignee: nobody → azakai
tracking-fennec: ? → 2.0-
If we are only growing 2MB over 7 hours, I think we shouldn't block on this.
Attached file test page
Test page used in the data I will post next.
Attached file massdiff data
I wrote a script to diff massif snapshots, http://hg.mozilla.org/users/azakai_mozilla.com/patches/file/tip/massdiff.py

Ran the script on massif data from the test page in the previous comment, over 1 hour, which included 528 pageloads (4147 seconds, or 7.85 seconds per pageload).

The massdiff output here is of snapshots 33 and 66, out of 95 (just to avoid anything special near start or near shutdown).

It looks like some significant things are:

 JS:
   * 1,048,576 from AllocateArena in jsgc.cpp
   * 151,248 from js_malloc
   * 27,072 from js_NativeGetInline->XPCConvert->moz_malloc

 HTTP:
   * 76,000 from HttpChannelParent::RecvAsyncOpen->nsStringBuffer::Realloc
   * Later some more in nsHttpChannel::OnStartRequest etc.


Definitely looks like we have a slow gradual increase in memory use. I estimate 8K per page load. (This doesn't take into account memory fragmentation, which would make things worse.) Of course some increase might be expected, due to session history storing URLs and such.
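For context, a simplified sketch of the snapshot-diffing idea (the real massdiff.py linked above parses ms_print output and diffs the full allocation trees; this toy version, with an illustrative file name in the usage comment, only compares the mem_heap_B totals massif records for two snapshots in a raw output file):

import sys

def snapshot_heap_totals(path):
    # Parse a massif.out file into {snapshot_number: total_heap_bytes},
    # using the "snapshot=" and "mem_heap_B=" lines that massif writes.
    totals = {}
    current = None
    with open(path) as f:
        for line in f:
            if line.startswith("snapshot="):
                current = int(line.split("=", 1)[1])
            elif line.startswith("mem_heap_B=") and current is not None:
                totals[current] = int(line.split("=", 1)[1])
    return totals

if __name__ == "__main__":
    # usage: python heap_diff.py massif.out.<pid> 33 66
    path, a, b = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    totals = snapshot_heap_totals(path)
    print("snapshot %d: %d B, snapshot %d: %d B, delta: %+d B"
          % (a, totals[a], b, totals[b], totals[b] - totals[a]))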
Summary: Fennec chrome process memory can grow up to ~100 MB while loading pages. → Slow gradual memory increase in Fennec
Math error in last bit - the estimate should be 10.8K per page load.
njn, if you have time I would appreciate your thoughts on both the massdiff script (diff for massif snapshots) linked to in comment #20, and the data posted in that comment (specifically the JS memory increase).
Hmm, the 1MB JS increase is not gradual at all. It appears at that size in snapshot 50 (almost exactly in the middle of the run), and stays at exactly that size.

Here is the full stack. Looks like it's coming from an async messagemanager message.

 04.95% (1,048,576B) 0x5DC7A05: js::gc::Arena<JSObject_Slots8>* AllocateArena<JSObject_Slots8>(JSContext*, unsigned int) (jsgc.cpp:479)
  04.95% (1,048,576B) 0x5DCBEE3: bool RefillTypedFreeList<JSObject_Slots8>(JSContext*, unsigned int) (jsgc.cpp:1160)
   04.95% (1,048,576B) 0x5DBFBE6: RefillFinalizableFreeList(JSContext*, unsigned int) (jsgc.cpp:1185)
    04.95% (1,048,576B) 0x5D4B5BE: JSObject* NewFinalizableGCThing<JSObject>(JSContext*, unsigned int) (jsgcinlines.h:127)
     04.95% (1,048,576B) 0x5D48BDD: js_NewGCObject(JSContext*, js::gc::FinalizeKind) (jsgcinlines.h:139)
      04.95% (1,048,576B) 0x5D5D5EA: JSObject* js::detail::NewObject<false, false>(JSContext*, js::Class*, JSObject*, JSObject*, js::gc::FinalizeKind) (jsobjinlines.h:1081)
       04.95% (1,048,576B) 0x5D5D43D: JSObject* js::NewArray<false>(JSContext*, unsigned int, JSObject*) (jsarray.cpp:3061)
        04.95% (1,048,576B) 0x5D5C61D: js::NewDenseEmptyArray(JSContext*, JSObject*) (jsarray.cpp:3076)
         04.95% (1,048,576B) 0x5E1000F: js_BeginJSONParse(JSContext*, js::Value*, bool) (json.cpp:688)
          04.95% (1,048,576B) 0x5D41D96: JS_BeginJSONParse (jsapi.cpp:5641)
           04.95% (1,048,576B) 0x4ADC8DD: nsFrameMessageManager::ReceiveMessage(nsISupports*, nsAString_internal const&, int, nsAString_internal const&, JSObject*, InfallibleTArray<nsString>*, JSContext*) (nsFrameMessageManager.cpp:387)
            04.95% (1,048,576B) 0x4ADD1ED: nsFrameMessageManager::ReceiveMessage(nsISupports*, nsAString_internal const&, int, nsAString_internal const&, JSObject*, InfallibleTArray<nsString>*, JSContext*) (nsFrameMessageManager.cpp:477)
             04.95% (1,048,576B) 0x58C39C6: mozilla::dom::TabParent::ReceiveMessage(nsString const&, int, nsString const&, InfallibleTArray<nsString>*) (TabParent.cpp:607)
              04.95% (1,048,576B) 0x58C2CC5: mozilla::dom::TabParent::RecvAsyncMessage(nsString const&, nsString const&) (TabParent.cpp:309)
               04.95% (1,048,576B) 0x593B9B8: mozilla::dom::PBrowserParent::OnMessageReceived(IPC::Message const&) (PBrowserParent.cpp:569)
                04.95% (1,048,576B) 0x5948A9E: mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) (PContentParent.cpp:629)
                 04.95% (1,048,576B) 0x58F2295: mozilla::ipc::AsyncChannel::OnDispatchMessage(IPC::Message const&) (AsyncChannel.cpp:262)
                  04.95% (1,048,576B) 0x58FA8EC: mozilla::ipc::RPCChannel::OnMaybeDequeueOne() (RPCChannel.cpp:433)
                   04.95% (1,048,576B) 0x5900369: void DispatchToMethod<mozilla::ipc::RPCChannel, bool (mozilla::ipc::RPCChannel::*)()>(mozilla::ipc::RPCChannel*, bool (mozilla::ipc::RPCChannel::*)(), Tuple0 const&) (tuple.h:383)
                    04.95% (1,048,576B) 0x59002F5: RunnableMethod<mozilla::ipc::RPCChannel, bool (mozilla::ipc::RPCChannel::*)(), Tuple0>::Run() (task.h:307)
                     04.95% (1,048,576B) 0x58FC0BE: mozilla::ipc::RPCChannel::RefCountedTask::Run()
(In reply to comment #23)
> Hmm, the 1MB JS increase is not gradual at all. It appears at that size in
> snapshot 50 (almost exactly in the middle of the run), and stays at exactly
> that size.
> 
> Here is the full stack. Looks like it's coming from an async messagemanager
> message.
> 
>  04.95% (1,048,576B) 0x5DC7A05: js::gc::Arena<JSObject_Slots8>*
> AllocateArena<JSObject_Slots8>(JSContext*, unsigned int) (jsgc.cpp:479)

Welcome to the world of interpreting Massif's output when it's run on a big complex program like Firefox :/

Firefox has lots of custom allocators that are built on top of malloc.  This is one of them.  What happens is that Firefox makes lots of small requests from that custom allocator, and the custom allocator occasionally allocates a large block (a.k.a. a superblock) and then hands it out in small pieces on subsequent requests.  That's why you see a big jump.  Unfortunately, this behaviour obscures what's happening, because you only see one big allocation rather than lots of little allocations.  And the little allocations may be coming from more than one place.
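To make the superblock behaviour concrete, here is a toy sketch in Python (illustrative only, not Firefox's or jemalloc's actual allocator): thousands of small requests are carved out of occasional 1 MB blocks, so a heap profiler only ever records the stack that happened to trigger each 1 MB allocation.

SUPERBLOCK_SIZE = 1024 * 1024  # 1 MB, like the jsgc.cpp arena allocation above

class ToyArena:
    # Hands out small pieces of occasionally-allocated 1 MB superblocks.
    def __init__(self):
        self.superblocks = []        # each entry models one real allocation
        self.used = SUPERBLOCK_SIZE  # force a superblock on the first request

    def alloc(self, size):
        if self.used + size > SUPERBLOCK_SIZE:
            # This is the only point a heap profiler sees: one big allocation,
            # attributed to whichever small request happened to trigger it.
            self.superblocks.append(bytearray(SUPERBLOCK_SIZE))
            self.used = 0
        offset = self.used
        self.used += size
        return (self.superblocks[-1], offset)  # handle to the carved-out piece

arena = ToyArena()
for _ in range(10000):
    arena.alloc(64)  # 10,000 tiny requests...
print("superblocks allocated:", len(arena.superblocks))  # ...show up as one big block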

You should also try massif-visualizer: https://projects.kde.org/projects/extragear/sdk/massif-visualizer.  It's a GUI for Massif files.  It's especially useful if you're trying to analyze the graph, because it's much more precise than the ASCII art version.

You might also like to try Massif's --alloc-fn option.  You can use it to ignore malloc wrappers (e.g. huge_malloc), which can make the output easier to read.

And then there's Massif's --pages-as-heap=yes option, which lets you profile at the mmap level instead of the heap level.  It's more comprehensive (covers all allocated memory) but is even harder to interpret.

As for the massdiff.py script... I don't know Python, but it looks pretty reasonable.  Showing the old, new and diff values is interesting; I hadn't thought of doing that.  I'm surprised you chose to parse the output of ms_print rather than Massif's output files.  I would probably do it by writing a script that diffs Massif's output files and produces a diff that can be handled normally by ms_print.  That's how I did things with Cachegrind and cg_diff.
BTW, bug 632234 is a tracking bug for this sort of thing; it's possible that one of the pending fixes for another bug will help this one.
Blocks: mlk2.0
Thanks njn, that is very helpful.
Depends on: 641663
I did a lot more profiling here with massif/massdiff, and there don't appear to be any remaining problems. So, closing this.

We do, however, gradually increase in size over time, due to JS garbage piling up, caches filling, etc. But these things are not leaks, and the increase eventually peaks. We might tweak the prefs for such things, though (like bug 641663 did).
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
No longer blocks: mlk2.0
No longer blocks: mlk-fx5+