Closed Bug 51954 Opened 24 years ago Closed 24 years ago

Random crashes in id lookup

Tracking

()

Status:

VERIFIED FIXED

Milestone:

M18

People

(Reporter: pschwartau, Assigned: brendan)

References

Details

(Keywords: crash, Whiteboard: [nsbeta3+][rtm++] FIX IN TRUNK, READY FOR RTM)

Attachments

(3 files)

Linux stack trace - 24 years ago Phil Schwartau 2.12 KB, text/plain		Details
stack of free location of future garbage 24 years ago John Bandhauer 2.01 KB, text/plain		Details
proposed fix 24 years ago Brendan Eich [:brendan] 3.76 KB, patch		Details \| Diff \| Splinter Review

Phil Schwartau

Reporter

Description

•

24 years ago

Using JS shell built 2000-09-08 on WinNT, Linux, and MacOS9.

The above testcase now crashes on Linux and Mac. 
On WinNT: doesn't crash, but takes 28 seconds to complete.


Here is the part of the testcase that seems to cause the crash,
with a debugging "print" added:


function addTestCase( t ) {
    for ( d = 0; d < TimeInMonth(MonthFromTime(t)); d+= msPerDay ) {
        t += d;
        testcases[tc++] = new TestCase( SECTION,
                                    "(new Date("+t+")).getDate()",
                                    DateFromTime(LocalTime(t)),
                                    (new Date(t)).getDate() );
    }
    print ("t is " + t);
    exit (3);
}


Here is what appeared in the console:


             t is -62132140800000


and then: CRASH! Will attach Linux stack trace below...

Phil Schwartau

Reporter

Comment 1

•

24 years ago

Attached file Linux stack trace - — Details

gerardok

Updated

•

24 years ago

Keywords: crash

Mike McCabe

Comment 2

•

24 years ago

I saw that the stack trace included global_resolve, which is defined in js.c. 
On a hunch, I did blame, and found code there bracketed in #ifdef
LAZY_STANDARD_CLASSES that uses the new lazy class initialization APIs.  (bug
46703)

If I comment out #define LAZY_STANDARD_CLASSES earlier in the file, the test
runs to completion.  It still takes a long time, so I don't think that's part of
the problem.

The trace also includes 'StdNameToAtom (cx=0x80d7198, name=0x80cd3fc "escape")'
... so it seems that we're trying to resolve 'escape' when we run into trouble.
'escape' may have gone away recently as a fix to 42221, so it seems likely that
this problem arises from an interaction between the fixes to 42221 and 46703.

CCing Brendan for a look.

Brendan Eich [:brendan]

Assignee

Comment 3

•

24 years ago

I've been running the testsuite successfully for a while, during and long after
the LAZY_STANDARD_CLASSES stuff was committed.

mccabe, which pointer is bad, and can you find out why?  I'll look too, in a
bit.

/be

Mike McCabe

Comment 4

•

24 years ago

Brendan -

Are you running the testsuite from the standalone JS shell?  It's the missing
(or possibly not entirely gone) 'escape' function that I suspect.

Brendan Eich [:brendan]

Assignee

Comment 5

•

24 years ago

Yes, I'm running js/tests/jsDriver.pl -e smdebug -L lc2 lc3 -t and getting no 
crashes.  At least I did two or three days ago.  How do you make it crash?  And 
did you debug around a bit to see what the bad address was, and whence it came?

/be

Robert Ginda

Comment 6

•

24 years ago

I'm getting an assert failure now...

[rginda@swamp tests] perl jsDriver.pl -e smdebug -l ecma/Date/15.9.5.10-2.js -f
/dev/null -k
-#- Executing 1 test(s).
*-* Testcase ecma/Date/15.9.5.10-2.js failed:
Expected exit code 0, got 134
Testcase terminated with signal 0
Complete testcase output was:
Assertion failure: first, at jscntxt.c:85

-#- Wrote results to '/dev/null'.
-#- 1 test(s) failed

Robert Ginda

Comment 7

•

24 years ago

ok, I guess I needed to make clean.  I don't see any failure in the date
testcase anymore, but instead, five other segfaults show up under the ecma
suite.  I'll go discuss this with phil and try to sort it out.

Mike McCabe

Comment 8

•

24 years ago

I'm testing on Linux with the standalone js engine.  Executing
'load("../jsref.js");load("15.9.5.10-2.js")' in the tests/ecma/Date directory
gets me a crash.  I think it has to do with overflowing the GC heap before a GC
can occur, but I'm still looking.

Brendan, sorry for the poorly founded suppositions earlier.

Mike McCabe

Comment 9

•

24 years ago

I'm still seeing this.  It looks like it was introduced with the
LAZY_STANDARD_CLASSES checkin of 8/19/2000.  (Verified via before-and-after
checkouts.)

I'm still using the same recipe to reproduce.  It looks like something elusive,
as making the testcase generate fewer objects (it uses many) or using the proper
"shell.js" test init file rather than "jsref.js" hides the problem.

In the JS shell, load("../jsref.js"); load("15.9.5.10-2.js"); foo; (where foo is
undefined) crashes.  The crash occurs when the engine tries to look up "foo" via
js.c's global_resolve; this runs into the JS_ResolveStandardClass code.  This
bombs when trying to atomize some of the standard class names to compare to
"foo", as the hashtable entries for some standard names ("escape") point to
garbage.  Just why, I don't know.

Mike McCabe

Updated

•

24 years ago

Summary: Testcase mozilla/js/tests/ecma/Date/15.9.5.10-2.js failing → Random crashes in id lookup

Brendan Eich [:brendan]

Assignee

Comment 10

•

24 years ago

Try purify.

Any GC'ing going on?  A breakpoint will tell.

/be

John Bandhauer

Comment 11

•

24 years ago

mccabe and I spent some time looking at this. Purify did not gather any extra 
info - it crashed in the same place but no error or warnings preceeded that.

What we *did* do was find the hashtable entry that contains a garbage string. We 
set a data breakpoint for that data changing and let it rip. We then got a stack 
trace for the location where that JSString - that *should* be rooted in the atom 
table - is free'd. This is before the point when it will later crash. This free 
happens in a finalization in gc. The *really* interesting thing is that this is 
a gc invoked from the "last ditch" gc in js_AllocGCThing!

This links us to the problem in the marking phase of the same sort of "last 
ditch" gc we see in bug 53123.

I'll attach the stack of the gc location.

John Bandhauer

Comment 12

•

24 years ago

Attached file stack of free location of future garbage — Details

Brendan Eich [:brendan]

Assignee

Comment 13

•

24 years ago

Oh My Gawd.

This is an ancient bug in atom mark/sweep GC.  Patch coming up.  I can't believe 
this wasn't found sooner.  The gist of it is that only pinned atoms have their 
keys marked, and therefore not swept.  Worse, we rely on marking pinned atoms 
(not their keys, mind you) to keep the pinned atoms (again, not their keys) from 
being swept.  Argh!

/be

Keywords: patch, review

Brendan Eich [:brendan]

Assignee

Comment 14

•

24 years ago

Wait, it's not as bad as I said -- I was on goof gas.  The problem is that the 
atom mark function wants to mark only pinned atoms, and it was doing that, but 
it was failing to mark all atom's keys.  This too seems right: some atoms are 
garbage, and so are their keys.  If a garbage atom has a non-garbage key, the 
key will be marked otherwise (via a live object's strong ref to that string 
jsvsal, say).

Now, in the "last ditch" case where js_AllocGCThing calls js_GC, we won't sweep 
any atoms (the GC_KEEP_ATOMS gcflag is passed to js_GC) because atoms are never 
rooted (they aren't jsvals, too bad; old old design decision).  But note well!  
Atoms are not swept, but their keys may be, because we have failed to mark those 
keys (such as the string "escape"), and no other ref may exist to the key.

Patch still coming up, but this isn't as dire as I thought in my goof-gas state: 
it afflicts only the last-ditch GC attempt in js_AllocGCThing.

/be

Brendan Eich [:brendan]

Assignee

Comment 15

•

24 years ago

Attached patch proposed fix — Details — Splinter Review

Brendan Eich [:brendan]

Assignee

Comment 16

•

24 years ago

mccabe, jband: you guys deserve the credit for leading this horse to water and 
making him drink.  Gimme some testing, r= and a= love, ok?

beard: please get me nsbeta3++ approval once the r's and a's are lined up.

/be

Assignee: rogerl → brendan

Severity: normal → major

Keywords: nsbeta3, rtm

Priority: P3 → P2

Target Milestone: --- → M18

Brendan Eich [:brendan]

Assignee

Updated

•

24 years ago

Status: NEW → ASSIGNED

Mike McCabe

Comment 17

•

24 years ago

r=mccabe, looks good and makes sense.  The testcase now works OK.

Beard is out of town, so another ++er is needed.

Brendan Eich [:brendan]

Assignee

Comment 18

•

24 years ago

Looking for a=jband, need trudelle to play beard here.  Peter, this is not a 
topcrash bug that we know of, but it should be rtm++ at least.  I'm concerned 
that it and 53123-reopen (which is topcrash) are accounting for other topcrash 
bugs that haven't been diagnosed.  I'd like to get the fix in for nsbeta3 too, 
therefore -- less noise for the remaining bug signal to rise above.

/be

John Bandhauer

Comment 19

•

24 years ago

a=jband. This looks right to me. The crasher test now runs ok. I can imagine 
this bug causing some havoc.

Peter Trudelle

Comment 20

•

24 years ago

Only PDT can double-plus, but I'll give it an nsbeta3+/rtm+ to get on their
radar, and some persuasive whiteboard verbiage. To get it into nsbeta3, someone
will have to plead the merits to PDT, and given the lateness, it will have to be
on the basis of known horrible effect on the user (i.e. P1), preferably combined
with a simple/safe/well-tested fix. I don't see the evidence here yet.  Noise
argument can also be used as a reason *not* to change nsbeta3 bits unnecessarily.

Whiteboard: [nsbeta3+][rtm+] FIX READY TO LAND

Brendan Eich [:brendan]

Assignee

Comment 21

•

24 years ago

"Noise argument can also be used as a reason *not* to change nsbeta3 bits 
unnecessarily."

Certainly, and that's the PDT's call.  But fix-in-hand should mean something.  
The fix here is not going to make things worse.

/be

Brendan Eich [:brendan]

Assignee

Comment 22

•

24 years ago

Fix checked into trunk.

/be

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Brendan Eich [:brendan]

Assignee

Comment 23

•

24 years ago

Oops, I should keep this open so the trunk changes get onto the Netscape 6
branch after PR3, for RTM.

/be

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Whiteboard: [nsbeta3+][rtm+] FIX READY TO LAND → [nsbeta3+][rtm+] FIX IN TRUNK, READY FOR RTM

Brendan Eich [:brendan]

Assignee

Updated

•

24 years ago

Status: REOPENED → ASSIGNED

Phil Peterson

Comment 24

•

24 years ago

PDT marking [rtm++]

Whiteboard: [nsbeta3+][rtm+] FIX IN TRUNK, READY FOR RTM → [nsbeta3+][rtm++] FIX IN TRUNK, READY FOR RTM

Brendan Eich [:brendan]

Assignee

Comment 25

•

24 years ago

Fix is in the branch too, for RTM.

/be

Brendan Eich [:brendan]

Assignee

Comment 26

•

24 years ago

Forgot to close this!

/be

Status: ASSIGNED → RESOLVED

Closed: 24 years ago → 24 years ago

Resolution: --- → FIXED

Phil Schwartau

Reporter

Comment 27

•

24 years ago

Marking Verified -

Status: RESOLVED → VERIFIED

Rahul Chauhan

Comment 28

•

22 years ago

*** Bug 174871 has been marked as a duplicate of this bug. ***

You need to log in before you can comment on or make changes to this bug.