Last Comment Bug 89443 - Crash on large number of chained || operators
: Crash on large number of chained || operators
Status: VERIFIED FIXED
Included new logic for trailing AND'a
: crash, js1.5, relnote
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: x86 All
: -- critical (vote)
: ---
Assigned To: Kenton Hanson (gone)
: Phil Schwartau
Mentors:
http://www.hclib.org/kidlinks.proxy
Depends on:
Blocks: 79893
  Show dependency treegraph
 
Reported: 2001-07-05 12:10 PDT by Peter Murray
Modified: 2014-04-26 03:30 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
stacktrace (large -- 132k) (129.73 KB, text/plain)
2001-07-05 12:56 PDT, Chase Tingley
no flags Details
proposed fix (2.18 KB, patch)
2001-07-25 19:08 PDT, Brendan Eich [:brendan]
no flags Details | Diff | Splinter Review
Differences of jsopcode.c (2.00 KB, patch)
2001-08-08 11:54 PDT, Kenton Hanson (gone)
no flags Details | Diff | Splinter Review
Differences of jsinterp.c (543 bytes, patch)
2001-08-08 11:58 PDT, Kenton Hanson (gone)
no flags Details | Diff | Splinter Review
Differences of jsconfig.h (2.68 KB, patch)
2001-08-08 12:02 PDT, Kenton Hanson (gone)
no flags Details | Diff | Splinter Review
Differences of jsemit.c (4.02 KB, patch)
2001-08-08 12:22 PDT, Kenton Hanson (gone)
no flags Details | Diff | Splinter Review
Revised patch of jsemit.c (3.60 KB, patch)
2001-08-09 16:21 PDT, Kenton Hanson (gone)
no flags Details | Diff | Splinter Review

Description Peter Murray 2001-07-05 12:10:58 PDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.2) Gecko/20010628
BuildID:    2001062815

When loading a large (130KB) proxy autoconfig file, Mozilla crashes with an
application error.  The PAC file is used by a public library as a site selection
tool:  only the sites that appear in the PAC are accessible through the browser.
 All other sites redirect the user to a proxy server which tells the user that
access at their particular station is restricted to pre-approved sites.  The PAC
file consists of a large number of or'd dnsDomainIs() calls.  Eg:

  if (dnsDomainIs(host,".oclc.org") ||
      dnsDomainIs(host,".collegesource.com") ||
      dnsDomainIs(host,".grolier.com") ||
 [...]
      )
   return "DIRECT";
   else return "PROXY ...";

I am reporting this on behalf of Glenn Peterson at Hennepin County Library. 
Glen notes that in his testing a 22KB PAC file works, but a 36KB does not.


Reproducible: Always
Steps to Reproduce:
1. Use http://www.hclib.org/kidlinks.proxy as the Automatic Proxy Config URL
2. Relaunch browser
3. Wait about 10 seconds for application error
Comment 1 Chase Tingley 2001-07-05 12:37:38 PDT
Hennepin County 55405!

Confirming on Linux, 2001070206 -- it segfaults on startup every time after
setting the autoconfig pref.  Ugly.

Moving back to networking, as PAC stuff tends to live there.
Comment 2 Chase Tingley 2001-07-05 12:56:05 PDT
Created attachment 41293 [details]
stacktrace (large -- 132k)
Comment 3 Chase Tingley 2001-07-05 13:01:31 PDT
I got a stack trace using a linux cvs build from 20010705.

The trace is quite large, but is dominated by thte 1350 levels of js_EmitTree. 
The crash is:
0x400dbdba in _js_LookupProperty (cx=0x8233e08, obj=0x827b7a8, id=136891936, 
    objp=0xbfe020c8, propp=0xbfe020bc, file=0x40115aa0 "jsemit.c", line=548)
    at jsobj.c:2046

Comment 4 Chase Tingley 2001-07-05 13:19:41 PDT
Workaround: the PAC consists of a billion lines of this form:

|| dnsDomainIs(host, ".acceptableviewingmaterial.com")

Use sed/awk/perl/vim to convert these lines to the form:

if (dnsDomainIs(host, ".sanitizedcontent.com") { return "PROXY
proxy.hclib.org:80"; }

Since if any of the domains match you'll be returning the proxy, there's no need
to chain them all into a single megalithic if statement.  I tried this, and I
successfully started up without crashing.

Comment 5 Chase Tingley 2001-07-05 14:33:51 PDT
I can get xpcshell to segfault merely by loading this PAC file.  -> JS engine.
Comment 6 Phil Schwartau 2001-07-05 14:51:18 PDT
cc'ing Brendan on this one - 
Comment 7 benc@chuang.net 2001-07-06 05:36:24 PDT
This is not the most intelligent way of controlling internet access.

They should create a spoofing DNS server and map the offensive domains to a
single webserver to handle errors. Or they should set up access control on a
proxy server.

We need to declare a maximum PAC size that will be supported. The user is going
to need to find a sensible way to manage their network access. At some point,
the string size that expresses the desirable or undesirable domains is going to
exceed the practical limits of PAC. PAC was meant as a general routing and
network access control mechanism, not as a way of distributing large lists of
domains to many clients.
Comment 8 Peter Murray 2001-07-06 07:03:50 PDT
It may not be the most intellegent way, but it is arguably the easiest for a 
library with little technical expertise to accomplish it.  Creating a PAC file 
based on simple instructions, storing on the server, and configuring 
clients to use it is a very simple thing to do.

Granted, this is the largest PAC file I've seen.  But it should be handled 
more gracefully than it is now.  Although the execution of such a large 
JavaScript file may have an impact on client-side performance, I believe 
that is a choice the site makes for itself, not one imposed by the 
application.

The workaround to convert the lines into:

  if (dnsDomainIs(host, ".sanitizedcontent.com") return "PROXY
proxy.hclib.org:80";

...worked in my testing, although it almost doubled the size of the PAC file.  
This by itself shouldn't be a big problem since this application of a PAC is 
intended for workstations on a LAN -- the PAC will probably never be 
loaded over a slow link.  I'm waiting to hear back from Glenn to see if it 
works for him at his site.  Thanks for the suggestion!
Comment 9 Brendan Eich [:brendan] 2001-07-06 15:04:39 PDT
The crash is due to a stack overflow.  The code generator uses recursion to walk
parse trees that are right-heavy in the case of chained || operators.  See the
comment near the top of jsparse.h.

There is no easy fix, and I hate to WONTFIX a crash bug, but I bet we all have
better things to do than sweat a fix for this bug right now.  For example, the
proxy autoconfig function could be simplified quite a bit without changing its
meaning, like so:

var hclibProxyDomains = [
    ".asahi.com",
    ...
    "zoom.baton-rouge.la.us"
];

function FindProxyForURL(url, host) {
    if (isPlainHostName(host) ||
        dnsDomainIs(host, ".hennepin.lib.mn.us") ||
        dnsDomainIs(host, ".hclib.org"))
        return "DIRECT";

    for (var i = 0; i < hclibProxyDomains.length; i++)
        if (dnsDomainIs(host, hclibProxyDomains[i]))
            return "PROXY proxy.hclib.org:80";

    return "PROXY 172.16.100.20:80";
}

The function could be more efficient if it could avoid linear search by
isolating the domain suffix or suffixes of host, and looking them up in an
object that maps each string in hclibProxyDomains to true or some other "found"
property value.  Using JS objects as associative arrays gives O(1) lookup time
complexity.

/be
Comment 10 Phil Schwartau 2001-07-12 14:29:53 PDT
Reassigning to Kenton - 
Comment 11 Phil Schwartau 2001-07-12 14:36:52 PDT
Compare bug 90445, "Crash compiling bloated function while loading page", 
which also involves || operators ...
 
Comment 12 Phil Schwartau 2001-07-12 15:04:57 PDT
Testcase added to JS testsuite: 

                   js1_5/Regress/regress-89443.js
Comment 13 basic 2001-07-21 00:42:06 PDT
change summary from
PAC: browser crashes on large PAC files
to
PAC: crash on large number of chained '||'

is this still considered a PAC bug?
Comment 14 Phil Schwartau 2001-07-21 12:13:54 PDT
No, I believe this is purely a JavaScript Engine issue. 
Resummarizing to remove mention of the PAC file per se -
Comment 15 benc 2001-07-23 01:14:48 PDT
If there is no pre-existing JS bug, can you create a new one and let us be a
depends on you?
Comment 16 Morten Welinder 2001-07-25 10:49:01 PDT
(Showing my ignorance)

Why is this hard?  I.e., why doesn't the following work:

1. Reduce using ((A || B) || C) = (A || (B || C))
2. Evaluate using (A || B) = (A ? true : B)    [valid for js?]
   The evaluation of B would be a tail call.  Since destructors might
   interfere, a suitable goto could be used.
Comment 17 Brendan Eich [:brendan] 2001-07-25 19:04:42 PDT
Morten: thanks for the nudge -- of course you are right that the recursion is
tail recursion, which can be eliminated, but we need storage as well, because
all the JSOP_OR bytecodes must have as their immediate jump-offset operand the
offset from each JSOP_OR to the bytecode after the entire chained-|| expression.

This is not hard to do with back-patching, so I was wrong to say that there is
no easy fix.  Patch coming up.  I did not do likewise for JSOP_AND, because I
believe chained && expressions of such length as to threaten stack overflow to
be unheard of, and so not worth the added code.  I could be wrong....

/be
Comment 18 Brendan Eich [:brendan] 2001-07-25 19:08:10 PDT
Created attachment 43622 [details] [diff] [review]
proposed fix
Comment 19 Brendan Eich [:brendan] 2001-07-25 19:11:25 PDT
Looking for r= and sr= for 0.9.4.

/be
Comment 20 Brendan Eich [:brendan] 2001-07-25 19:25:15 PDT
sr=shaver (as brendan)
Comment 21 Phil Schwartau 2001-07-26 23:00:27 PDT
Ran the JS test suite on WinNT in the debug and optimized JS shell
with the patch applied. No regressions were introduced by the patch.
On the other hand, the testcase for this bug did not pass: 


Testcase js1_5/Regress/regress-89443.js failed 
   Expected exit code 0, got 253
   Testcase terminated with signal 0
   Complete testcase output was:
   Testcase produced no output!


http://lxr.mozilla.org/mozilla/source/js/tests/js1_5/Regress/regress-89443.js
Comment 22 Phil Schwartau 2001-07-26 23:06:03 PDT
Apologies for the indentation style in the testcase - the function
it contains is a direct copy from the URL reported above...

It's too big to fiddle with!
Comment 23 Brendan Eich [:brendan] 2001-07-27 00:18:19 PDT
Worksforme still -- can you find a core file, gdb the shell with it to get a
stack backtrace?

/be
Comment 24 Phil Schwartau 2001-07-27 11:07:25 PDT
My blunder!

I misused the command-line options for the test driver in such a way 
that it utilized a NON-patched JS shell I have in another directory.

WORKSFORME, too - the patch makes the testcase pass!
Comment 25 benc 2001-07-31 13:54:44 PDT
+relnote - is there a practical limit I can give for this? If not, can someone
pick a safe practical limit?
Comment 26 Brendan Eich [:brendan] 2001-07-31 14:44:51 PDT
benc: I don't know of a practical limit, because it depends on OS stack size
limits, admin-settable in some cases (see the limit command on modern BSD-ish
Unixes), and on how much is alreayd on the stack when JS starts nesting.

kenton: please feel free to grill me on details of this patch, questions about
the surrounding code, anything.  I'd like to get this into 0.9.4 and need code
review (r=) from a module owner/peer as well as shaver's sr=.

/be
Comment 27 benc 2001-08-01 09:37:37 PDT
Can you say it's safe up to, says 50?

That's a pretty large number to me...
Comment 28 Kenton Hanson (gone) 2001-08-02 15:49:46 PDT
I tested the patch with a an if statement with 707 ||'s that behaved well with 
the patch but failed without the patch.
It seems like a good time to consider the corresponding patch for trailing 
multiple &&'s as well as trailing combinations of ||'s and &&'s.  I will try to 
construct these patches to test my understanding of the current patch
Comment 29 Brendan Eich [:brendan] 2001-08-02 16:01:43 PDT
benc: 50 sounds safe, but I am guessing.

khanson: great!  With a little more work, we could probably unify the code for
the TOK_OR and TOK_AND cases.  Note that pn->pn_op contains JSOP_OR for TOK_OR
and JSOP_AND for TOK_AND, so we can use pn->pn_op as a parameter instead of
hardcoding JSOP_OR as the patch and existing case-code does.  Feel free to rip
out JS_BUG_SHORT_CIRCUIT while you are at it -- from all the files in which it
is tested or defined.

/be
Comment 30 Kenton Hanson (gone) 2001-08-08 11:54:14 PDT
Created attachment 45099 [details] [diff] [review]
Differences of jsopcode.c
Comment 31 Kenton Hanson (gone) 2001-08-08 11:58:16 PDT
Created attachment 45100 [details] [diff] [review]
Differences of jsinterp.c
Comment 32 Kenton Hanson (gone) 2001-08-08 12:02:02 PDT
Created attachment 45101 [details] [diff] [review]
Differences of jsconfig.h
Comment 33 Kenton Hanson (gone) 2001-08-08 12:15:04 PDT
Removed all references to JS_BUG_SHORT_CIRCUIT.  Applied the change from OR to 
AND code.  Tested changes.  Consolidated AND and OR code.  However, this code was 
much less clear, so I did not include these changes.  Including logic for 
combinations of trailing AND's and OR's seems a bit ambitious for this bug.  I am 
willing to pursue it.
Comment 34 Kenton Hanson (gone) 2001-08-08 12:22:22 PDT
Created attachment 45105 [details] [diff] [review]
Differences of jsemit.c
Comment 35 Morten Welinder 2001-08-08 12:28:28 PDT
I don't see why combinations of || and && is even interesting.  What kind of
"naturally-occurring" immensely deep expression are we worried about?  It would
have to be something like "A && (B || (C && (D || ... )))" should does not
strike me as likely.

Comment 36 Brendan Eich [:brendan] 2001-08-08 13:51:30 PDT
There are only 4 out of 29 non-comment/whitespace-only lines that differ between
the TOK_OR and TOK_AND cases, so it seems worthwhile to me to unify the two.  I
don't find this unification unclear:

 if (!js_EmitTree(cx, cg, pn->pn_left))
            return JS_FALSE;
        top = js_Emit3(cx, cg, pn->pn_op, 0, 0);
        if (top < 0)
            return JS_FALSE;
        jmp = top;
        pn2 = pn->pn_right;
        if (pn2->pn_type == pn->pn_type) {
            do {
                pn = pn2;
                if (!js_EmitTree(cx, cg, pn->pn_left))
                    return JS_FALSE;
                off = js_Emit3(cx, cg, pn2->pn_op, 0, 0);
                if (off < 0)
                    return JS_FALSE;
                SET_JUMP_OFFSET(CG_CODE(cg, jmp), off - jmp);
                jmp = off;
                pn2 = pn->pn_right;
            } while (pn2->pn_type == pn->pn_type);
        }
        if (!js_EmitTree(cx, cg, pn2))
            return JS_FALSE;
        for (off = CG_OFFSET(cg); top < jmp; top += tmp) {
            jsbytecode *pc = CG_CODE(cg, top);
            tmp = GET_JUMP_OFFSET(pc);
            CHECK_AND_SET_JUMP_OFFSET(cx, cg, pc, off - top);
        }
        CHECK_AND_SET_JUMP_OFFSET(cx, cg, CG_CODE(cg, jmp), off - jmp);
        break;

What am I missing?  Probably the lack of TOK_OR, JSOP_OR, etc. -- but a comment
could make up for that lack, surely?

/be
Comment 37 Brendan Eich [:brendan] 2001-08-08 13:58:46 PDT
Urk, I had a pn2 in that inline unified code fragment that stylistically ought
to be pn -- pn and pn2 are == at the point in question, but pn is the "current"
TOK_OR or TOK_AND node whose pn_op we want to emit:

 if (!js_EmitTree(cx, cg, pn->pn_left))
            return JS_FALSE;
        top = js_Emit3(cx, cg, pn->pn_op, 0, 0);
        if (top < 0)
            return JS_FALSE;
        jmp = top;
        pn2 = pn->pn_right;
        if (pn2->pn_type == pn->pn_type) {
            do {
                pn = pn2;
                if (!js_EmitTree(cx, cg, pn->pn_left))
                    return JS_FALSE;
                off = js_Emit3(cx, cg, pn->pn_op, 0, 0);
                if (off < 0)
                    return JS_FALSE;
                SET_JUMP_OFFSET(CG_CODE(cg, jmp), off - jmp);
                jmp = off;
                pn2 = pn->pn_right;
            } while (pn2->pn_type == pn->pn_type);
        }
        if (!js_EmitTree(cx, cg, pn2))
            return JS_FALSE;
        for (off = CG_OFFSET(cg); top < jmp; top += tmp) {
            jsbytecode *pc = CG_CODE(cg, top);
            tmp = GET_JUMP_OFFSET(pc);
            CHECK_AND_SET_JUMP_OFFSET(cx, cg, pc, off - top);
        }
        CHECK_AND_SET_JUMP_OFFSET(cx, cg, CG_CODE(cg, jmp), off - jmp);
        break;

I agree that combinations of && and || are invariable short enough that
recursion never threatens a real-world OS stack limit.

/be
Comment 38 Brendan Eich [:brendan] 2001-08-08 14:24:06 PDT
Finally getting caffeine, spotting if(C)do{...}while(C) silliness in my patch,
which should of course be simplified to while(C){...}.  Final unified case code:

 if (!js_EmitTree(cx, cg, pn->pn_left))
            return JS_FALSE;
        top = js_Emit3(cx, cg, pn->pn_op, 0, 0);
        if (top < 0)
            return JS_FALSE;
        jmp = top;
        pn2 = pn->pn_right;
        while (pn2->pn_type == pn->pn_type) {
            pn = pn2;
            if (!js_EmitTree(cx, cg, pn->pn_left))
                return JS_FALSE;
            off = js_Emit3(cx, cg, pn->pn_op, 0, 0);
            if (off < 0)
                return JS_FALSE;
            SET_JUMP_OFFSET(CG_CODE(cg, jmp), off - jmp);
            jmp = off;
            pn2 = pn->pn_right;
        }
        if (!js_EmitTree(cx, cg, pn2))
            return JS_FALSE;
        for (off = CG_OFFSET(cg); top < jmp; top += tmp) {
            jsbytecode *pc = CG_CODE(cg, top);
            tmp = GET_JUMP_OFFSET(pc);
            CHECK_AND_SET_JUMP_OFFSET(cx, cg, pc, off - top);
        }
        CHECK_AND_SET_JUMP_OFFSET(cx, cg, CG_CODE(cg, jmp), off - jmp);
        break;

/be
Comment 39 Kenton Hanson (gone) 2001-08-08 14:59:44 PDT
I agree with all the changes including the simplification of the "if do while" 
going to a "while".  
Comment 40 Kenton Hanson (gone) 2001-08-08 15:56:45 PDT
The while statement can be changed to

       while (pn2->pn_type == TOK_OR || pn2->pn_type == TOK_AND) {

to handle mixed && and || tails.
Comment 41 Brendan Eich [:brendan] 2001-08-08 16:10:22 PDT
khanson: righteous!  new jsemit.c patch?

/be
Comment 42 Brendan Eich [:brendan] 2001-08-08 16:11:07 PDT
Nominating for js1.5 and mozilla0.9.4.

/be
Comment 43 Kenton Hanson (gone) 2001-08-09 16:21:38 PDT
Created attachment 45315 [details] [diff] [review]
Revised patch of jsemit.c
Comment 44 Brendan Eich [:brendan] 2001-08-09 16:46:40 PDT
khanson: looks great, except for the ^M Mac line endings in the patch, and
except for this over-indented loop body:

+        while (pn2->pn_type == TOK_OR || pn2->pn_type == TOK_AND) {
+                pn = pn2;
+                if (!js_EmitTree(cx, cg, pn->pn_left))
+                    return JS_FALSE;
+                off = js_Emit3(cx, cg, pn->pn_op, 0, 0);
+                if (off < 0)
+                    return JS_FALSE;
+                SET_JUMP_OFFSET(CG_CODE(cg, jmp), off - jmp);
+                jmp = off;
+                pn2 = pn->pn_right;
+
+        }

Oh, and the extra newline before the closing brace seems unnecessary to me.  Not
sure if tabs existed in your version that made it look right in CodeWarrior, but
the patch has no tabs, and the loop body should indent only by four space (and
no tabs should get checked in -- beard can help with CW configuration to avoid
this, I think).

Pick these nits and sr=brendan@mozilla.org -- thanks.

/be
Comment 45 Brendan Eich [:brendan] 2001-08-13 12:13:01 PDT
Let's get this in for 0.9.4 -- who can r=?  New patch for jsemit.c needed?  I'm
ok just asking for the overindentation/extra-newline nits to be picked, without
seeing yet another patch, myself.

/be
Comment 46 rogerl (gone) 2001-08-14 13:14:14 PDT
r=rogerl
Comment 47 Brendan Eich [:brendan] 2001-08-17 03:34:10 PDT
I checked in khanson's patches (made a few whitespace/comment-only changes to
the jsemit.c one).  The JS testsuite passes except for the five (right, phil?)
known failures.  Thanks, all.

/be
Comment 48 Phil Schwartau 2001-08-28 13:15:58 PDT
Trying to verify this one. The JS testsuite passes except for the five 
known failures. The test driver passes the specific testcase for this bug. 
However, if I go into the JS shell and load this testcase manually,
I get this error: 

[//d/JS_TRUNK/mozilla/js/src/WINNT4.0_DBG.OBJ]./js

js> load('../../tests/js1_5/shell.js')
js> load('../../tests/js1_5/Regress/regress-89443.js')
InternalError: else statement too large


I'm guessing it comes from this in jsemit.c:

JSBool
js_SetJumpOffset(JSContext *cx, JSCodeGenerator *cg, jsbytecode *pc,
                 ptrdiff_t off)
{
    if (off < JUMP_OFFSET_MIN || JUMP_OFFSET_MAX < off) {
        ReportStatementTooLarge(cx, cg);
        return JS_FALSE;
    }
    SET_JUMP_OFFSET(pc, off);
    return JS_TRUE;
}


where JUMP_OFFSET_MAX is defined in jsopcode.h as:
#define JUMP_OFFSET_MAX         ((int16)0x7fff)  = 32,767


The test driver doesn't detect this internal error because the JS shell
gives exit code 0, exit signal 0 for the test...

Is this the expected behavior here - or is there something more 
to do on this bug? 
Comment 49 Kenton Hanson (gone) 2001-08-29 10:58:08 PDT
This is probably a caused by the fix to bug 90445.  Prior to fix 90445 the 
JumpOffset was not being range checked.
Comment 50 Brendan Eich [:brendan] 2001-08-29 12:00:48 PDT
Can the testcase be tweaked so as to avoid overflowing the jump offset operand
but still have so many chained || operators that it overflows typical process
stacks, when run against a version of js that lacks this bug's fix?

/be
Comment 51 Phil Schwartau 2001-08-29 12:03:57 PDT
OK, I will try tweaking the testcase - 
Comment 52 Phil Schwartau 2001-08-29 14:10:51 PDT
Testcase successfully tweaked. It now loads successfully by hand in the
current JS shell on WinNT and Linux. Yet it crashes on JS shells as of:

Recently            (01 Aug 2001)
Release Candidate 3 (11 May 2001)
Release Candidate 2 (14 Aug 2000)


Marking this bug VERIFIED FIXED
Comment 53 Phil Schwartau 2001-08-29 14:14:01 PDT
That leaves one question: should I file a new bug on the behavior
I observed above? That is, if a testcase triggers something like

             InternalError: else statement too large

shouldn't the JS exit code be non-0?

Comment 54 Brendan Eich [:brendan] 2001-08-30 10:56:03 PDT
Yes, any non-warning error detected by the compiler should cause an abnormal
termination of the shell.  Can you file a bug?  This should be easy if a bit
tedious to debug.

/be

Note You need to log in before you can comment on or make changes to this bug.