Closed Bug 516458 Opened 15 years ago Closed 15 years ago

Excessive memory consumption/OOM on http://bbarker.co.uk

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: chris.bainbridge, Assigned: dmandelin)

References

()

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.14) Gecko/2009090216 Ubuntu/9.04 (jaunty) Firefox/3.0.14
Build Identifier: firefox-3.7a1pre.en-US.linux-i686.tar.bz2

I'm filing this as a Javascript bug as it seems likely, but the fault may lie elsewhere.

Viewing this URL on a Thinkpad T42p with 1.5GB memory running Ubuntu Jaunty: http://bbarker.co.uk/animate/map.php?file=Edinburgh_to_Holy_Island_cycle.gpx

results in resident memory rising until desktop becomes unresponsive at around 1GB resident, usage is still increasing (highest I've seen is 1.2GB resident then top becomes unresponsive). At this point either firefox is killed, or xorg hangs. 

Confirmed with firefox-3.0.14 and firefox-3.7a1pre nightly.

I already reported this bug in Ubuntu: https://bugs.launchpad.net/ubuntu/+source/firefox-3.0/+bug/429476

Reproducible: Always

Steps to Reproduce:
1.go to http://bbarker.co.uk/animate/map.php?file=Edinburgh_to_Holy_Island_cycle.gpx
2. that's it
3.
Actual Results:  
resident memory increases, eventually firefox is killed or desktop hangs

Expected Results:  
Chromium uses about 80MB resident memory for the same page.
Firefox 3.5.3 with 32-bit Windows XP SP 3: Memory consumption rises to 200 MB, then goes down to 20-30 MB, but virtual memory size is higher than 700 MB.
Confirmed. For some reason the GC doesn't kick in and our RSS size grows out of control. We have a patch for much better GC scheduling. Either we land that, or we have to investigate how to better address this problem here with the current heuristics. I saw a JSON failure instead of a crash, so my suspicion would be a ton of strings being created.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: crash on viewing page → Excessive memory consumption/OOM on http://bbarker.co.uk
Depends on: 506125
I tried it on MacOS 10.5 with a debug build and it worked (mostly). When I first loaded the page, CPU went to 100% and FF became unresponsive for at least 10 seconds. During this time memory usage (Real Memory in Activity Monitor) went up to about 1GB but then it went back down to reasonable levels. I clicked "Continue" to the slow script dialog. Eventually the map with the track did come up, and then it worked normally.

I found that Safari had somewhat similar behavior: slow startup, but then worked normally. I tried Chrome on Windows and startup was pretty quick, and then it worked normally.

So what actually should be the bug here? Is it that memory usage goes up to ~1GB during the startup phase?
Do we GC during that 1GB phase? If we don't, its a bug. If we do, I am not sure its a bug.
In a run of this page from start until the animated map appears, this is the log of the times at which a GC was done:

2:214:GC t=0.175912
2:370:GC t=0.925824
2:595:GC t=1.469851
2:645:GC t=1.532591
2:2284:GC t=23.293265
2:10953:GC t=85.411238
2:11016:GC t=87.652958
2:11034:GC t=87.910751
2:11038:GC t=88.201925
2:11039:GC t=88.234485
2:11040:GC t=88.316330
2:11041:GC t=88.328484

As you can see, we go 20 seconds without any GC, and then 60 more after that until the next GC. Here are the last few calls to IsGCThresholdReached before the 23s GC:

IsGCThresholdReached t=18.488329 mb=7437487/33554432, gb=2427140/4390912
IsGCThresholdReached t=18.640668 mb=7439791/33554432, gb=2427140/4390912
IsGCThresholdReached t=18.869077 mb=7443295/33554432, gb=2427140/4390912
IsGCThresholdReached t=19.843841 mb=7446751/33554432, gb=2431236/4390912
IsGCThresholdReached t=19.930377 mb=7447423/33554432, gb=2431236/4390912
IsGCThresholdReached t=21.518867 mb=7450891/33554432, gb=2431236/4390912

'mb' is the mallocBytes (cur/max) and 'gb' is the gcBytes (cur/max). The last few calls just before the 85s GC are:

IsGCThresholdReached t=84.934737 mb=7226436/33554432, gb=7086000/34246720
IsGCThresholdReached t=84.935768 mb=7230504/33554432, gb=7090096/34246720
IsGCThresholdReached t=84.936678 mb=7234144/33554432, gb=7090096/34246720
IsGCThresholdReached t=84.937600 mb=7238212/33554432, gb=7094192/34246720
IsGCThresholdReached t=84.938453 mb=7241852/33554432, gb=7094192/34246720
IsGCThresholdReached t=84.942108 mb=7245272/33554432, gb=7098288/34246720

Whatever is causing the memory usage, it doesn't show up in the metrics used by IsGCThresholdReached, so we are not GC'ing.
Here is the key part of a trace-malloc snapshot with 1.4GB allocated. Almost all of the memory is created through nsXMLHttpRequest::ConvertBodyToText, which I believe is called to access the property |requestText|.

What I don't understand is why the nsAStrings returned by ConvertBodyToText are not freed in XPCWrappedNative::CallMethod after the call. Who would know about this?

1411376623 malloc
  1407026664 my_malloc_logger (pldhash.c:)
    1394617050 malloc_zone_malloc (/usr/lib/libSystem.B.dylib)
      1393036801 malloc (/usr/lib/libSystem.B.dylib)
        1376888066 nsAString_internal::MutatePrep(unsigned int, unsigned short**, unsigned int*) (pldhash.c:)
          1376865590 nsAString_internal::ReplacePrep(unsigned int, unsigned int, unsigned int) (pldhash.c:)
            1376853800 nsAString_internal::Assign(unsigned short const*, unsigned int) (pldhash.c:)
              1376771840 nsXMLHttpRequest::ConvertBodyToText(nsAString_internal&) (nsUnicharUtils.cpp:)
                1376771840 NS_InvokeByIndex_P (pldhash.c:)
                  1376771840 XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) (/Users/dmandelin/sources/tracemonkey/obj-tm/js/src/xpconnect/src/dom_quickst
ubs.h:)
                    1376771840 XPC_WN_GetterSetter(JSContext*, JSObject*, unsigned int, long*, long*) (/Users/dmandelin/sources/tracemonkey/obj-tm/js/src/xpconnect/src/dom_quickstu
bs.h:)
                      1376771840 js_Invoke (/Users/dmandelin/sources/tracemonkey/obj-tm/js/src/jsautokw.h:)
                        1376771840 js_InternalInvoke (/Users/dmandelin/sources/tracemonkey/obj-tm/js/src/jsautokw.h:)
                          1376771840 js_InternalGetOrSet (/Users/dmandelin/sources/tracemonkey/obj-tm/js/src/jsautokw.h:)
                            1376771840 js_NativeGet (/Users/dmandelin/sources/tracemonkey/obj-tm/js/src/jsautokw.h:)
                              1375691234 js_Interpret (/Users/dmandelin/sources/tracemonkey/obj-tm/js/src/jsautokw.h:)
                              1080606 js_GetPropertyHelper (/Users/dmandelin/sources/tracemonkey/obj-tm/js/src/jsautokw.h:)
Nasty. Sounds like bug 506125 would fix this very nicely.
So do we know if the data is leaked in the sense that we loose track of it without freeing it, or is it leaked in the sense that we are keeping track of it but think it's being used and so we don't free it. Or finally, are things leaking in the sense that if we only ran a GC it would get freed, but for some reason a GC isn't running.

cc'ing jst who is the resident expert on XPConnect string stuff :)
From my observation memory use drops back down. The problem is that we don't count this in our current cheesy memory pressure heuristics so we don't know its time to GC (which would free all this stuff). Hence my pointer to the actual working-set based heuristics bug.
(In reply to comment #8)
> So do we know if the data is leaked in the sense that we loose track of it
> without freeing it, or is it leaked in the sense that we are keeping track of
> it but think it's being used and so we don't free it. Or finally, are things
> leaking in the sense that if we only ran a GC it would get freed, but for some
> reason a GC isn't running.

I don't know the full answer yet. The memory does get freed eventually, so presumably we don't lose track of it. But I haven't yet figured out exactly what event causes the memory to get freed.

In particular, what I don't know is whether these DOM strings are (a) sent back to JS as wrapped DOM strings, or (b) converted to JS strings. If (a), then running js_GC would do the trick, and the problem is that the JS GC doesn't know it could free a lot of memory if it ran. If (b), then something weird is going on. So I guess it's probably (a).
The DOM strings are probably wrapped in JSStrings and I don't think the JS engine gets notified about that memory (JS_NewExternalString just sets the length using initFlat). The wrapping would happen in XPCStringConvert::ReadableToJSVal.
Attached patch Patch (obsolete) — Splinter Review
(In reply to comment #11)
> The DOM strings are probably wrapped in JSStrings and I don't think the JS
> engine gets notified about that memory (JS_NewExternalString just sets the
> length using initFlat). The wrapping would happen in
> XPCStringConvert::ReadableToJSVal.

Thanks! That's exactly what I needed to know. This patch accounts for the memory in the 'gcMallocBytes' statistic. It prevents the memory spike on this test--it doesn't go over 125MB or so on my machine with this site. The only thing I'm not sure about is if I need to take a lock or increment a thread-local counter instead. I was hoping Igor could educate me on that. :-)
Assignee: general → dmandelin
Status: NEW → ASSIGNED
Attachment #401579 - Flags: review?(igor)
Comment on attachment 401579 [details] [diff] [review]
Patch

>diff -r 7e985a173416 -r 892ea3ad67be js/src/jsapi.cpp
>--- a/js/src/jsapi.cpp	Fri Sep 18 14:27:07 2009 -0700
>+++ b/js/src/jsapi.cpp	Fri Sep 18 17:33:53 2009 -0700
>@@ -2626,6 +2626,7 @@
>     if (!str)
>         return NULL;
>     str->initFlat(chars, length);
>+    cx->runtime->gcMallocBytes += length * sizeof(jschar);
>     return str;

This is not thread-safe. Use cx->updateMallocCounter(size) here. r+ with that fixed. Also it would be nice to have a a diff with -p -U8.
Attachment #401579 - Flags: review?(igor) → review-
gcMallocBytes is just an approximate counter. Precision is not required here so collisions and loss of increments can be tolerated.
(In reply to comment #14)
> gcMallocBytes is just an approximate counter. Precision is not required here so
> collisions and loss of increments can be tolerated.

updateMallocCounter is the function to use here. Optimizing that function is for another bug.
Updated per review & pushed to TM as 9131d01c2d2a.
Attachment #401579 - Attachment is obsolete: true
Whiteboard: fixed-in-tracemonkey
That should be (length + 1) * sizeof(jschar), right? rs=me on followup.

/be
(In reply to comment #17)
> That should be (length + 1) * sizeof(jschar), right? rs=me on followup.

I intentionally left the 1 out because it's really just an approximation anyway so I figured I'd save a cycle (which I admit is equally pointless if not more). I suppose it does hurt accuracy notably if someone allocates a million short strings, so pushed to TM as ee059a50f204.
http://hg.mozilla.org/mozilla-central/rev/ee059a50f204
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: