Closed Bug 112626 Opened 23 years ago Closed 22 years ago

String.split(regexp) crashes if regexp contains ()'s

Categories

(Core :: JavaScript Engine, defect)

All
Windows NT
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: c.schneider, Assigned: rogerl)

References

Details

(Keywords: crash, js1.5)

Attachments

(3 files)

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
BuildID:    2001112009

The following page causes a segmentation fault. Should be easy to find...

<HTML>
<HEAD>
<SCRIPT language="JavaScript">
	var _cs='2001-01-01';
	var curTime = _cs.split(/([- :])/);
</SCRIPT>
</HEAD>
<BODY>

</BODY>
</HTML>

Reproducible: Always
Steps to Reproduce:
1. call the webpage in description
2. boom
3.

Actual Results:  Segmentation Fault

Expected Results:  Calculate the correct result
worksforme, 2001-11-28 linux build.  No crash.
confirmend with win2k build 20011129..

I will try to get a stack trace but my debug doesn't crash (updating now)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: crash
it doesn't crash my debug build.
0.9.6 talkback ID: TB38670554Q
Stack trace from Talkback incident TB38670554Q:

Build ID 2001112012
Product ID MozillaBranch
Operating System Win32
Stack Trace

str_split [d:\builds\seamonkey\mozilla\js\src\jsstr.c, line 1846] 
js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 834] 
js_Interpret [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 2792] 
js_Execute [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 1014] 
JS_EvaluateUCScriptForPrincipals [d:\builds\seamonkey\mozilla\js\src\jsapi.c, 
                                                              line 3380] 
nsJSContext::EvaluateString 
[d:\builds\seamonkey\mozilla\dom\src\base\nsJSEnvironment.cpp, line 654] 
nsScriptLoader::EvaluateScript 
[d:\builds\seamonkey\mozilla\content\base\src\nsScriptLoader.cpp, line 576] 
nsScriptLoader::ProcessRequest 
[d:\builds\seamonkey\mozilla\content\base\src\nsScriptLoader.cpp, line 484] 
nsScriptLoader::ProcessScriptElement 
[d:\builds\seamonkey\mozilla\content\base\src\nsScriptLoader.cpp, line 428] 
nsHTMLScriptElement::SetDocument 
[d:\builds\seamonkey\mozilla\content\html\content\src\nsHTMLScriptElement.cpp, 
                                                                  line 159] 
nsGenericContainerElement::AppendChildTo 
[d:\builds\seamonkey\mozilla\content\base\src\nsGenericElement.cpp, line 3724] 
HTMLContentSink::ProcessSCRIPTTag 
[d:\builds\seamonkey\mozilla\content\html\document\src\nsHTMLContentSink.cpp, 
                                                                    line 5126] 
HTMLContentSink::AddLeaf 
[d:\builds\seamonkey\mozilla\content\html\document\src\nsHTMLContentSink.cpp, 
                                                                    line 3497] 
CNavDTD::AddLeaf [d:\builds\seamonkey\mozilla\htmlparser\src\CNavDTD.cpp, line 
                                                                         3774] 
CNavDTD::AddHeadLeaf [d:\builds\seamonkey\mozilla\htmlparser\src\CNavDTD.cpp, 
                                                                    line 3833] 
CNavDTD::HandleStartToken 
[d:\builds\seamonkey\mozilla\htmlparser\src\CNavDTD.cpp, line 1719] 
CNavDTD::HandleToken [d:\builds\seamonkey\mozilla\htmlparser\src\CNavDTD.cpp, 
                                                                     line 895] 
CNavDTD::BuildModel [d:\builds\seamonkey\mozilla\htmlparser\src\CNavDTD.cpp, 
                                                                     line 526] 
nsParser::BuildModel [d:\builds\seamonkey\mozilla\htmlparser\src\nsParser.cpp, 
                                                                    line 1983] 
nsParser::ResumeParse [d:\builds\seamonkey\mozilla\htmlparser\src\nsParser.cpp, 
                                                                    line 1847] 
nsParser::OnDataAvailable 
[d:\builds\seamonkey\mozilla\htmlparser\src\nsParser.cpp, line 2505] 
nsDocumentOpenInfo::OnDataAvailable 
[d:\builds\seamonkey\mozilla\uriloader\base\nsURILoader.cpp, line 260] 
nsFileChannel::OnDataAvailable 
[d:\builds\seamonkey\mozilla\netwerk\protocol\file\src\nsFileChannel.cpp, line 
                                                                          511] 
nsOnDataAvailableEvent::HandleEvent 
[d:\builds\seamonkey\mozilla\netwerk\base\src\nsStreamListenerProxy.cpp, line 
                                                                          203] 
PL_HandleEvent [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 591] 
PL_ProcessPendingEvents [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, 
                                                                     line 524] 
_md_EventReceiverProc [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 
                                                                         1072] 
nsAppShellService::Run 
[d:\builds\seamonkey\mozilla\xpfe\appshell\src\nsAppShellService.cpp, line 303] 
main1 [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsAppRunner.cpp, line 1316] 
main [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsAppRunner.cpp, line 1633] 
WinMain [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsAppRunner.cpp, line 1651] 
WinMainCRTStartup() 
KERNEL32.DLL + 0x17d08 (0x77e87d08) 
An even more-reduced testcase. This crashes current Mozilla trunk binaries:

                         '1,2'.split(/(,)/);
                          

The crash has something to do with the back-reference parentheses.
If I take those out, there is no crash.

As Matti said above, I cannot get any crash with my Mozilla debug
builds on WinNT, Linux. I also cannot crash in the debug or optimized
JS shell. I only crash with Mozilla binary builds.

Reassigning to Kenton; cc'ing Roger for his RegExp expertise -
Assignee: rogerl → khanson
Summary: split function crashes mozilla → String.split(regexp) crashes if regexp contains ()'s
Clarification: using Mozilla binaries 2001-11-26-xx on WinNT, Linux, Mac9.1
It's only my WinNT binary that crashes on this; Linux and Mac9.1 load fine.
OS: All → Windows NT
To try to provoke the crash just key this in the URL bar:

                javascript: '1,2'.split(/(,)/);

My debug WinNT build prints out 1,,,2 
My WinNT binary crashes.


For prettier output, try 

                javascript: ('1,2'.split(/(,)/)).toSource();

My debug WinNT build prints out ["1", ",", "2"]
My WinNT binary crashes.
Compare bug 104375:  "string.replace() placeholder '$1' not working"
Attached file Crash case for same
Test case for crash on win32 for string.split()
Attached another test case, note on this test case that it is trying to split 
up the list by the semi-colons (;).  If you do NOT put a semi-colon in the 
textarea, it won't crash.  If you do, however, it breaks.  Tested on the 
Nightly Win32 build 2001121003.
The heart of Chris' testcase is:

              function check_email(email)
              {
                var email_ar = new Array();
                email_ar = email.split(/(;)/);
              }


Am seeing all the same features as above: 

1. I do not crash on any platform besides Windows
2. I do not crash with a Windows debug build, only nighly binaries
3. I do not crash in the debug or optimized JS shell
4. I do not crash if there is no backreference in the regexp: 

                email.split(/(;)/)  <--- crashes 
                email.split(/;/)    <--- is fine
It seems like a MS compiler bug.  I am moving to 0.9.8.
Target Milestone: --- → mozilla0.9.8
Targeting 9.9.  I need help!
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Cc'ing potential windows compiler bug buddies.

/be
hrm, can someone add a printf for sep before http://lxr.mozilla.org/mozilla/source/js/src/jsstr.c#1846
[or try sep && sep-> at that line]

sep = &tmp and sep and tmp have the same scope, so it really should never be null, but perhaps something magical is happening

my cable provider was @home so i'm w/o net access for my win box, otherwise i'd love to play :-)
in the regexp case, i noticed that we don't initialize the lenght of the empty
tmp JSSubString...

1780 #if JS_HAS_REGEXPS
1781 if (JSVAL_IS_REGEXP(cx, argv[0])) {
1782 reobj = JSVAL_TO_OBJECT(argv[0]);
1783 re = (JSRegExp *) JS_GetPrivate(cx, reobj);
1784 sep = &tmp;
1785 1786 /* Set a magic value so we can detect a successful re match. */
1787 sep->chars = NULL;
 === AT THIS POINT tmp.length IS GARBAGE ===
1788 } else
1789 #endif

could we be ending up in a routine such as find_split(...) where the length is
important ?

it might be worthwhile to clear the length along with the buffer, ie:
    sep->length = 0;
    sep->chars  = NULL;

just a wild guess ;-)  unfortunately i can't try it cause i don't have any
release builds handy :-(

-- rick
If my bleary eyes don't deceive me, find_split still does not depend on
sep->length being valid in the re != NULL case.  But it's gnarly, and setting
tmp.length = 0 would remove all doubt.  I'm betting, though (or my bleary eyes
are betting) that it won't help.

/be
hey brendan,

i think you're right... i ran the crash under purify and didn't see anything bad 
:-(  

but i always tend to look for using uninitialized locals (etc) since they tend 
to show up as optimized-only bugs...  but maybe it is codegen :-(

-- rick
I'm looking into a crash that's occuring in a re-built regexp engine and I don't
see where the input string for the regexp statics is getting set - leaving all
of the regexp static substrings in danger of pointing into collected space.
Could something like this could be happening here?
rogerl: how is RegExp.input related to this bug?  I don't see any use of it in
the script, and it's not used internally (only regexp_exec_sub uses it if called
with no arguments).  It's also a GC root, so it shouldn't dangle.

/be
The regexp static paren substrings point into the input string. If that string
is no longer in use by the user code then the regexp static input string is the
only remaining reference. But in the case I'm seeing, the input field isn't
getting set and so the paren substrings are bogus (because a gc has collected
the original). The loop in 'str_split' is using those paren substrings. It's
only a guess - I have a completely different test case that's tripping up here
and it may yet be something stupid I've done (there is precedence).
In file jsregexp.c  the following consturct is found in line 2235

  for (num = 0; num < state.parenCount; num++) {
.
.
.
.
  }

  res->parenCount = num;  // line 2235

It is not clear what num will be outside the loop
Ignore the last comment.  Code behaves as author intended.
Moved to Mozilla 1.2 as a temporary holding place, bugs remaining on 0.9.9 are
being scrutinized, now that the milestone has passed.

If this must be fixed for MachV, then it needs to be assigned the "nsbeta1"
keyword.
Target Milestone: mozilla0.9.9 → mozilla1.2
I think this should be fixed for 1.0, but I'm sorry I can't help diagnose it
right now.  Adding keyword.

/be
Keywords: js1.5, mozilla1.0
nominated with the `nsbeta1' keyword, on the grounds that this is a crasher.
Keywords: nsbeta1
Will not make 1.0.  Important bug but not going to make 1.0
David Bradley was unable to reproduce this bug as of May 24, 2002.  I need to
isolate when the errant behavior disappeared, and examine what changes occurred.
Resolving as WORKSFORME. 

Recall how to reprocuce this bug from Comment #11. I am no longer
able to reproduce the crash. To be more precise: I have a series 
of Mozilla WinNT trunk binaries. Every one of them up to and 
including my 2002-04-22 binary crashes on the testcase. But
my next one, 2002-04-28,  does not crash, nor does any after that.

So something happened between 2002-04-22 and 2002-04-28, but 
a check in bonsai shows it had nothing to do with jsregexp.c.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WORKSFORME
Based on the report in Comment #28, marking Verified.

c.schneider@scram.de: if this problem still occurs for you 
with an up-to-date build, please reopen this bug; thanks -
Status: RESOLVED → VERIFIED
I bet this is a "dup" of bug 140070, based on Phil's build dates and a reading
of bonsai for mozilla/js/src in that period.

/be
I hate to be a contrarian, but I think my earlier wild guess might have been
substantive. Try this in the shell:

/b(a)/("cba")
RegExp.$1            // should be "a"
gc()
RegExp.$1            // should be "a", but often isn't.

The RegExp.input string is getting collected leaving the paren substring dangling.
I can't prove that's what happened in the original case, but it seems easiest to
re-open this than start another.
Status: VERIFIED → REOPENED
Resolution: WORKSFORME → ---
Comment on attachment 87414 [details] [diff] [review]
Set RegExp static input 

Gahhh.	How long has this hole been there?  sr=brendan@mozilla.org, but I hope
we don't extend the life of long strings arbitrarily, where we used to get by
with the weak refs from the parens arrays in cx->regExpStatics.

/be
Attachment #87414 - Flags: superreview+
Let's get this into the 1.1beta trunk and go for drivers approval for the 1.0
branch and js1.5 final.  Phil, can you link this and any other 1.5 bugs not yet
tracked up to the 1.5 metabug?  Thanks,

/be
Assignee: khanson → rogerl
Status: REOPENED → NEW
Keywords: mozilla1.0.1
Target Milestone: mozilla1.2alpha → mozilla1.0.1
Blocks: 149801
Have linked this bug to the JS1.5 meta bug 149801, along with other
bugs in the JS Engine component that have the "js1.5" keyword -
Just need r= to get this patch in 
Comment on attachment 87414 [details] [diff] [review]
Set RegExp static input 

r=rginda
Attachment #87414 - Flags: review+
Fix checked in to trunk.
Status: NEW → ASSIGNED
i sent drivers an approval request for the 1_0 branch,
in it i noted that Comment 32 indicated a correctness issue which should 
probably have an entry in the js regression test suite.

per mozilla seamonkey practices, this bug should be resolved fixed, since it is 
fixed on trunk. however, I'm not sure how that works when there's another 
branch (?) [js1.5] that cares about a fix -- I do need to know, since I have a 
nubus bug fix that I'd like in js1.5.
timeless: thanks. In JS Engine the custom is to leave a bug in the
Assigned state until it is checked into both trunk and branch.
Only then do we mark it Fixed. For example, this is the pattern that
Brendan always follows -
Attachment #87414 - Flags: approval+
please checkin to the 1.0.1 branch. once there, remove the "mozilla1.0.1+"
keyword and add the "fixed1.0.1" keyword.
Fix checked in to 1.0.1 branch.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Keywords: adt1.0.1
Flags: testcase?
Checking in regress-112626.js;
/cvsroot/mozilla/js/tests/js1_5/String/regress-112626.js,v  <--  regress-112626.js
initial revision: 1.1
Flags: testcase? → testcase+
verified fixed 1.9 20060818
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: