Closed Bug 634444 Opened 13 years ago Closed 12 years ago

Long lines with many warnings from javascript.options.strict cause memory spikes when a console or Firebug is in use

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla16
Tracking Status
blocking2.0 --- .x+

People

(Reporter: whimboo, Assigned: n.nethercote)

References

()

Details

(Keywords: hang, regression, Whiteboard: [mlk][has patch][mozmill-endurance][MemShrink:P3] [advisory-tracking+])

Attachments

(2 files, 2 obsolete files)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b12pre) Gecko/20110209 Firefox/4.0b12pre ID:20110209030359

We run into a huge memory leak which can also freeze the system, just by performing the following steps:

1. Create a fresh profile and stick the attached users.js file into it
2. Install the latest alpha version of Firebug 1.7 (http://getfirebug.com/releases/firebug/1.7X/firebug-1.7X.0a10.xpi)
3. Open the given URL

Within seconds the memory will be filled-up completely and the system can freeze.  

This is a regression and happened between Beta 7 and Beta 8. I will nail down the regression range tomorrow morning.
Attached file user.js
Regressed between the builds 10120903 and 10121003:

PASS: http://hg.mozilla.org/mozilla-central/rev/11e328a49e0a
FAIL: http://hg.mozilla.org/mozilla-central/rev/a4544a4b3224

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=11e328a49e0a&tochange=a4544a4b3224

Could be related to the tracemonkey merge during that day. I will do a hg bisect tomorrow.
what advanced js settings are set?

user_pref("javascript.options.strict", true);
user_pref("pref.advanced.javascript.disable_button.advanced", false);

why are you using strict? This is a known problem, believe we have bugs on it
already.
Whiteboard: [mlk] → [mlk][dupeme]
If you use the strict option, you will get lots of warnings to the error console on pretty much any site.

If Firebug saves all error console output, you will get a "leak".
It only happens with all those three prefs set. Removing only one will not cause this leak. Even with GreaseMonkey not installed, I don't see how that pref has an influence here.
This is a regression from:

Bug 588648 - Don't copy chars when scanning. r=brendan.
author	Nicholas Nethercote <nnethercote@mozilla.com>
	Tue Dec 07 15:22:52 2010 -0800 (at Tue Dec 07 15:22:52 2010 -0800)
changeset 58990	2b9d805d77a1
parent 58989	25e9c434bb19
child 58991	4c18087c8bda
pushlog:	2b9d805d77a1
Blocks: 588648
(In reply to comment #5)
> It only happens with all those three prefs set.

What three prefs? I see two in comment 3.

> Removing only one will not
> cause this leak. Even with GreaseMonkey not installed, I don't see how that
> pref has an influence here.

What does "that pref" mean in the last sentence?

Nick, can you take and investigate?

/be
(In reply to comment #7)
> (In reply to comment #5)
> > It only happens with all those three prefs set.
> 
> What three prefs? I see two in comment 3.

See the user.js file I have attached to this bug.

> > Removing only one will not
> > cause this leak. Even with GreaseMonkey not installed, I don't see how that
> > pref has an influence here.
> 
> What does "that pref" mean in the last sentence?

You will see when checking the above mentioned attachment.
Blocking on investigation. We'll retriage once we find out what the problem is.
blocking2.0: ? → final+
Whiteboard: [mlk][dupeme] → [mlk][dupeme][hardblocker]
Assignee: general → nnethercote
Before that patch, in certain places we would parse JS files by reading them
and parsing them one chunk at a time, where a chunk was a few KB.  After
that patch, we instead read the whole file into memory before parsing.  This
made scanning much simpler and faster at the cost of higher (transient)
memory use. 

Compiler::compileScript() was the top-level function whose API changed.
It's called by the following functions:

- JS_CompileFile (affected)
- JS_CompileFileHandleForPrincipals (affected), which is called by:
  - JS_CompileFileHandleForPrincipalsVersion (affected)
  - JS_CompileFileHandle (affected)

- JS_CompileUCScriptForPrincipals (unaffected)
- EvalKernel (unaffected)

Of the affected functions, only JS_CompileFileHandleForPrincipalsVersion() 
is used in the browser, AFAICT, and only then if HAVE_PR_MEMMAP isn't
defined.

So I can't see off the top of my head what could be going wrong, but I've
written this down to (a) refresh the details in my own head, and (b) in case
it sparks an idea in someone else.

I'll now try to reproduce.
I can reproduce on Linux64, more or less.  Viewing that page without Firebug installed, process size topped out at ~800MB.  With Firebug, it topped out at ~3100MB, with ~2300 resident.  about:memory claimed only 256MB mapped, so it's not a heap problem.  The browser was responsive the whole time, maybe because I have 4GB of RAM.

But in the time it's taken me to write this comment, process size has dropped to 869MB.  So it looks like a big transient spike.  Time to pull out Massif.
Oh, wait, I tried again.  The big drop didn't occur until after I opened about:memory.  And this time about:memory said "memory mapped" was 2.0GB and "memory in use" was 1.9GB, but none of the individual counts below that were out of the ordinary.  So it *is* heap memory that's the problem.
I can reproduce with only this option set:

  user_pref("javascript.options.strict", true);

So I suspect this is a dup of bug 620218.  I don't know why bug 588648 would be the regression suspect, I'll double-check if it really is the regression.

(As bug 620818 says, setting that option is just asking for trouble, it's not clear to me why it exists.)
Summary: Huge memory leak with Firebug installed when advanced JS settings are set and the given URL gets opened → Huge memory spike with Firebug installed when advanced JS settings are set and the given URL gets opened
(In reply to comment #13)
> I can reproduce with only this option set:
> 
>   user_pref("javascript.options.strict", true);
> 
> So I suspect this is a dup of bug 620218.  I don't know why bug 588648 would be
> the regression suspect, I'll double-check if it really is the regression.
> 
> (As bug 620818 says, setting that option is just asking for trouble, it's not
> clear to me why it exists.)

Thanks for checking on this. From what I gather, that pref is considered harmful, and bug 620218 doesn't block, so neither should this bug.
blocking2.0: final+ → -
Whiteboard: [mlk][dupeme][hardblocker] → [mlk][dupeme]
Oh, I know what's happened.  Bug 588648 also changed error reporting a bit.  Previously, because we were dealing with chunks, there was a limit to how much of a line an error message could include.  But now the error message contains the entire line.  If you have very long lines and lots of errors, it's a recipe for high memory usage, especially since we call js_DeflateString() on the error message string, resulting in two copies of it (one made of jschars, the other made of chars).  

On the site in question, heaps of errors occurred on a line containing 122,895 chars, resulting in over 1G of chars (at 3 bytes per char!) being put into error messages.

As for why it only occurs when Firebug is installed... without Firebug, I think we do lots of big allocations but then free each one as soon as the error has been reported.  So there's lots of malloc churn, but no big spike.  With Firebug, the error reports are evidently held onto somehow.  Furthermore, if I disable Firebug but open the error console the spike also occurs.

It should be a pretty easy fix;  I just need to truncate the line if it's larger than a certain size.  I think it's reasonable to keep this as a hardblocker.
Summary: Huge memory spike with Firebug installed when advanced JS settings are set and the given URL gets opened → Errors in long lines cause memory spikes when a console is in use
(In reply to comment #14)
> 
> Thanks for checking on this. From what I gather, that pref is considered
> harmful, and bug 620218 doesn't block, so neither should this bug.

The question is whether we can get thousands of error messages in other situations.  I don't have a good sense of how likely that is, and thus whether this should block.
Just to clarify, the cause of this bug the interaction of the following things:

(1) Bug 588648 allowed error messages to contain entire lines (which can be very wrong).

(2) The presence of javascript.options.strict can lead to thousands of error messages.

(3) The presence of a console (be it the error console or Firebug) causes those error messages to be held onto.


(3) is controlled by the user.  (2) is a bad option, but large numbers of errors may occur due to other reasons.  (1) is stupid and can be fixed easily.
blocking2.0: - → final+
Whiteboard: [mlk][dupeme] → [mlk][hardblocker]
You could have some sort of minified JS (so it's all on one line; I'll bet that's what happens with the site in question) that has a bug in a setInterval it sets.  Then you'll get an error every 10ms or worse if there are multiple copies of the interval.
This patch adjusts the syntax error reporting to truncate long lines --
it'll show at most 2*WINDOW+1 chars of a line in an error message, where
WINDOW=100.  100 seems reasonable but I'm open to changing it.

The patch also fixes another defect that bug 588648 introduced, whereby
tp->begin.index could be set to -1, which became 4294967295 because it's
unsigned.  I found this because the new line truncation code uses
begin.index and when it was 4294967295 all hell broke loose.  This required
fiddling with the |adjust| parameter to getChar() for a couple of cases.  I
added an assertion in newToken() to prevent this defect from arising again.

I've confirmed that the big memory spike seen in the browser no longer
happens when javascript.options.strict is true and a console is present.
Attachment #513032 - Flags: review?(brendan)
Whiteboard: [mlk][hardblocker] → [mlk][hardblocker][has patch]
Comment on attachment 513032 [details] [diff] [review]
patch (against TM 62203:d7a8d64336ba)

Looks good, nits only. The "sline" name looks like "slime" but maybe that is fitting ;-).

>-TokenStream::findEOL()
>+TokenStream::findEOLorTrunc(jschar *tokptr, int max)

This isn't quite camelCaps but I see why "or" is all-lowercase. Still kind of overlong and unsightly. Just "findEOL" seems fine to me, better than trying to say more yet still chopping "ate" off.

> TokenStream::reportCompileErrorNumberVA(JSParseNode *pn, uintN flags, uintN errorNumber,
>                                         va_list ap)
> {
>     JSErrorReport report;
>     char *message;
>-    size_t linelength;
>-    jschar *linechars;
>-    jschar *linelimit;
>-    char *linebytes;

Blank line here if you keep the comment.

>+    /* "sline" is short for "shown part of line", because we might not show it all. */

Could use "wline" or "window" instead of "sline" -- would fit the WINDOW name better.

Digging the non-camelCaps localnames. Don't change (but see below).

>+    size_t slinelength;
>+    jschar *slinechars;
>+    jschar *slinelimit;
>+    jschar *slinebase;
>+    jschar *tokptr;
>+    char *slinebytes;

Blank line here couldn't hurt.

>+    /*
>+     * We only show a portion of the line around the erroneous token -- WINDOW

s/only show/show only/

r=me with some nit-picking; good fix.

/be
Attachment #513032 - Flags: review?(brendan) → review+
http://hg.mozilla.org/tracemonkey/rev/2d44fc834071
Whiteboard: [mlk][hardblocker][has patch] → [mlk][hardblocker][has patch], fixed-in-tracemonkey
Follow-up for assertion failure:
http://hg.mozilla.org/tracemonkey/rev/0d4b01278890
This needs to be backed out and not included in Firefox 4.0 -- see bug 635235 and bug 629858.  If Waldo doesn't get to it today I'll back it out on Monday morning, Melbourne time (Sunday afternoon, MV time).
I agree with backing this out.  (Although, to be fair, bug 635235 was pre-existing, and this only happened to trigger that concomitant bad behavior.)  I'll do the deed later tonight -- I have to push a patch to m-c, so I'm going to be around watching a tree anyway.

But really, hoary unchecked pointer arithmetic this close to release?  For error messages only?  This seems a quintessential non-hardblocker, at least fixed this way.  If excess memory consumption for long-line errors is truly a hardblocker concern (I don't think it is), let's just "null out" the right fields and not provide source for long-line errors.
> I don't think it is

It depends on how much it makes user's browsers crash, doesn't it?
(In reply to comment #25)
> 
> But really, hoary unchecked pointer arithmetic this close to release?  For
> error messages only?  This seems a quintessential non-hardblocker, at least
> fixed this way.  If excess memory consumption for long-line errors is truly a
> hardblocker concern (I don't think it is), let's just "null out" the right
> fields and not provide source for long-line errors.

I found that paragraph hard to understand, but I think you're saying this bug shouldn't be a hardblocker.  I agree, which is why we're backing it out;  I'm assuming/hoping a higher-up will change this to blocking2.x and remove the "[hardblocker]" marking in the whiteboard.
blocking2.0: final+ → .x
Whiteboard: [mlk][hardblocker][has patch], fixed-in-tracemonkey → [mlk][has patch], fixed-in-tracemonkey
I don't want to belabor it, but the point was that if we're talking something primarily affecting developers who use a console-like function, and the fix involves a bunch of unchecked pointer arithmetic (LOCAL_ASSERT-style checks might have avoided real badness), it seems reasonably clear that deferring is the right thing to do now.

http://hg.mozilla.org/tracemonkey/rev/3f71115041e8
Whiteboard: [mlk][has patch], fixed-in-tracemonkey → [mlk][has patch]
(In reply to comment #24)
> This needs to be backed out and not included in Firefox 4.0 -- see bug 635235
> and bug 629858.

I've confirmed that both of these were pre-existing bugs that this patch just exposed.  But we shouldn't include this patch in 4.0 just in case there are other similar latent bugs.
http://hg.mozilla.org/mozilla-central/rev/2d44fc834071
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
oops, this was backed out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 638034
Whiteboard: [mlk][has patch] → [mlk][has patch][mozmill-test-needed][mozmill-endurance]
I've replicated this issue with a Mozmill endurance test. The first report linked below is without Firebug for comparison.

Firefox 4.0b13pre (2.0b13pre, en-US, 20110303122430), without Firebug:
http://mozmill.blargon7.com/#/endurance/report/b5eb523b2dfa00631bd71a5a3c017205

Firefox 4.0b13pre (2.0b13pre, en-US, 20110303122430), with Firebug 1.7X.0a10:
http://mozmill.blargon7.com/#/endurance/report/b5eb523b2dfa00631bd71a5a3c017167
Whiteboard: [mlk][has patch][mozmill-test-needed][mozmill-endurance] → [mlk][has patch][mozmill-endurance]
Blocks: 653890
Whiteboard: [mlk][has patch][mozmill-endurance] → [mlk][has patch][mozmill-endurance][MemShrink]
No longer blocks: 653890
Whiteboard: [mlk][has patch][mozmill-endurance][MemShrink] → [mlk][has patch][mozmill-endurance][MemShrink:P3]
Summary: Errors in long lines cause memory spikes when a console is in use → Long lines with many warnings from javascript.options.strict cause memory spikes when a console or Firebug is in use
Blocks: 746644
Another page, where this problem can be reproduced, is:

http://www.winvistatips.com/

Seems to be the following script, which causes the problem:

http://www.google.com/uds/api/elements/1.0/5d91467acc9e309ea86f2bde0e977216/inputtools.js

With SeaMonkey 2.9.1 (Firefox 12) on Windows XP this causes the browser to completely hang up.
Attached patch patch, v2 (obsolete) — Splinter Review
This is a revived version of the old patch, which shows at most 121 chars in
an error message (the first char of the offending token, plus up to 60
chars before it, and up to 60 chars after it).

There's no test, unfortunately, because AFAIK you can't get the context in
an error message within JS code.  I'd love to hear how a test might be
written for this.

I tested on winvistatips.com (comment 37).  Without the patch (and with 
script option enabled and the error console open), the browser locks up.
With the patch there's a pause but then it works ok again fairly quickly.
Attachment #513032 - Attachment is obsolete: true
Attachment #623569 - Flags: review?(jwalden+bmo)
Blocks: 634800
(In reply to Nicholas Nethercote [:njn] from comment #38)
> There's no test, unfortunately, because AFAIK you can't get the context in
> an error message within JS code.  I'd love to hear how a test might be
> written for this.

Might be possible with xpcshell tests, or? 
https://developer.mozilla.org/en/Writing_xpcshell-based_unit_tests
Flags: in-testsuite?
Comment on attachment 623569 [details] [diff] [review]
patch, v2

Review of attachment 623569 [details] [diff] [review]:
-----------------------------------------------------------------

This is still doing a depressing amount of raw pointer arithmetic of the sort that made me leery over a year ago here, but it looks like it checks out.  Still I'm leery enough of this that I'd kind of like it if you held off landing this until after the next merge date, in a week and a few days.  I'd really really really like to see this converted to use StringBuffer or something that does all the bounds-checking assertions for us.  Maybe in a quick followup?

::: js/src/frontend/TokenStream.cpp
@@ +403,5 @@
>      return i == n;
>  }
>  
> +// Finds the next EOL, but stops once 'max' jschars have been scanned *past*
> +// the starting jschar.

This comment should go by the declaration, in the header.

@@ +416,2 @@
>              break;
> +        if (n > max)

Make this >= so that the |windowRadius| name isn't a lie, see below.

@@ +500,5 @@
> +        // Find EOL, or truncate at the back if necessary.
> +        const jschar *windowLimit = userbuf.findEOLIfNotTooFar(tokptr, windowRadius);
> +    
> +        size_t windowLength = windowLimit - windowBase;
> +        JS_ASSERT(windowLength <= windowRadius * 2 + 1);

Okay, um, that + 1 at the end kind of makes the windowRadius name a lie.  :-)  Make the change mentioned earlier and I think you can make this assert a more proper, obvious, mathematically consistent condition.

@@ +508,4 @@
>              warning = false;
>              goto out;
>          }
> +        PodCopy(windowChars, windowBase, windowLength);

Before the copy, let's do a last set of asserts that |linebase <= windowBase && windowBase < limit| and |linebase <= windowBase + windowChars && windowBase + windowChars < limit|.

@@ +515,5 @@
>              warning = false;
>              goto out;
>          }
>  
>          /* Unicode and char versions of the offending source line, without final \n */

This comment needs updating.

::: js/src/frontend/TokenStream.h
@@ +751,5 @@
>          static bool isRawEOLChar(int32_t c) {
>              return (c == '\n' || c == '\r' || c == LINE_SEPARATOR || c == PARA_SEPARATOR);
>          }
>  
> +        const jschar *findEOLIfNotTooFar(const jschar *p, int max);

findEOLMax?  I'd rather avoid phrase-y names when possible, just turns into a mouthful -- plus "not too far" is not exactly concise.  Also make max size_t, please.
Attachment #623569 - Flags: review?(jwalden+bmo) → review+
> Okay, um, that + 1 at the end kind of makes the windowRadius name a lie. 

The code shows the character, plus |windowRadius| characters before it, plus |windowRadius| characters after it.  We seem to be interpreting "radius" differently in this discrete context, perhaps I should use a different name?
Attached patch patch, v3Splinter Review
This version uses StringBuffer to construct |windowChars|.  I had to expose StringBuffer::extractWellSized().  I've left |windowRadius| unchanged for now, but I've addressed the other nits.
Attachment #623569 - Attachment is obsolete: true
Attachment #630046 - Flags: review?(jwalden+bmo)
Comment on attachment 630046 [details] [diff] [review]
patch, v3

Review of attachment 630046 [details] [diff] [review]:
-----------------------------------------------------------------

Better.  This is still too messy overall, but it's an improvement.

(In reply to Nicholas Nethercote [:njn] from comment #41)
> The code shows the character, plus |windowRadius| characters before it, plus
> |windowRadius| characters after it.  We seem to be interpreting "radius"
> differently in this discrete context, perhaps I should use a different name?

Oh, hm.  So you're showing an *odd* number of characters, then?  That seems more than a bit strange.  Let's show the even number and make radius mean half width, as everyone will definitely expect.  (Whether they would expect the "radius" of an odd width to be as you have it in this patch, I dunno.  Let's avoid the confusion entirely and use it in a way everyone will understand.)

::: js/src/frontend/TokenStream.cpp
@@ +498,5 @@
> +        JS_ASSERT(windowLength <= windowRadius * 2 + 1);
> +
> +        // Create the windowed strings.
> +        StringBuffer windowBuf(cx);
> +        if (!windowBuf.append(windowBase, windowLength) || !windowBuf.append((jschar)0)) {

Make that '\0', please.

@@ +503,4 @@
>              warning = false;
>              goto out;
>          }
> +        windowChars = windowBuf.extractWellSized();

You need to handle failure here.  (extractWellSized can OOM if the characters are stored inline and allocation fails, or if realloc to the smaller length fails [possible with some libc implementations].)

::: js/src/vm/StringBuffer.h
@@ +88,5 @@
>      JSAtom *finishAtom();
> +
> +    /*
> +     * Creates a raw string from the characters in this buffer.  You should
> +     * append a NUL char to the StringBuffer before calling this.

Well, it's not *necessarily* the case people should do that.  It'll depend on people's use cases.  How about having something like this instead:

Creates a raw string from the characters in this buffer.  The string is exactly the characters in this buffer: it is *not* null-terminated unless the last appended character was '\0'.
Attachment #630046 - Flags: review?(jwalden+bmo) → review+
> > +        if (!windowBuf.append(windowBase, windowLength) || !windowBuf.append((jschar)0)) {
> 
> Make that '\0', please.

Even though '\0' is not a jschar?
> > Make that '\0', please.
> 
> Even though '\0' is not a jschar?

Using '\0' caused an ambiguous overloading of append(), so I stuck with |(jschar)0|.
I addressed the other nits.

https://hg.mozilla.org/integration/mozilla-inbound/rev/451ed5328312
Target Milestone: --- → mozilla16
https://hg.mozilla.org/mozilla-central/rev/451ed5328312
Status: REOPENED → RESOLVED
Closed: 13 years ago12 years ago
Resolution: --- → FIXED
OS: Mac OS X → All
Hardware: x86_64 → All
Could we get this backported to the Aurora branch so we can have it in beta soon, or should it ride the train?
(In reply to Henrik Skupin (:whimboo) from comment #48)
> Could we get this backported to the Aurora branch so we can have it in beta
> soon, or should it ride the train?

The problematic code has been present in released code for over a year without causing too many real-world problems.  It should ride the trains.
Whiteboard: [mlk][has patch][mozmill-endurance][MemShrink:P3] → [mlk][has patch][mozmill-endurance][MemShrink:P3] [advisory-tracking+]
Blocks: 795352
You need to log in before you can comment on or make changes to this bug.