Closed Bug 357388 Opened 13 years ago Closed 13 years ago

js_SweepScopeProperties can leave a JSScopeProperty with dangling parent pointer

Categories

(Core :: JavaScript Engine, defect, P1, critical)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9alpha1

People

(Reporter: gavin.reaney, Assigned: brendan)

Details

(Keywords: crash, fixed1.8.0.9, fixed1.8.1.1)

Attachments

(3 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7

When sweeping up a property tree node in js_SweepScopeProperties the children of the swept node are reparented by calls to InsertPropertyTreeChild. When this call completes, the child should either have no parent, or their original grandparent becomes their new parent.

There is a possibility that the call to InsertPropertyTreeChild can fail in OOM situations, which will leave the children with the old parent pointer (which points to a dead parent that's now on the free list).

In some cases the parent and child may be in different arenas, so it is also possible that the child could then refer to memory that has been returned to the OS (assuming the dead parent's arena has no live nodes).

Following is a snippit of stack trace from a crash in picsel's embedding of the engine (the code is roughly equivalent to what's in FIREFOX_1_5_0_7_RELEASE). This crash is only reproducable when carrying out particular patterns of memory squeeze testing. This shows a case where a child holds a pointer into an arena that has been freed back to the OS:

First-chance exception in browser-app.exe: 0xC0000005: Access Violation.

RemovePropertyTreeChild(JSRuntime * 0x02763a5c, JSScopeProperty * 0x01c109bc) line 670 + 6 bytes
js_SweepScopeProperties(JSRuntime * 0x02763a5c) line 1556 + 13 bytes
js_GC(JSContext * 0x012256e0, unsigned int 2) line 2025 + 9 bytes
js_ForceGC(JSContext * 0x012256e0, unsigned int 2) line 1697 + 13 bytes
js_DestroyContext(JSContext * 0x012256e0, int 2) line 261 + 11 bytes
JS_DestroyContext(JSContext * 0x012256e0) line 1054 + 11 bytes

        kids = parent->kids;
        if (KIDS_IS_CHUNKY(kids)) {
            list = chunk = KIDS_TO_CHUNK(kids);
            chunkp = &list;

            do {
                for (i = 0; i < MAX_KIDS_PER_CHUNK; i++) {
                    if (chunk->kids[i] == child) {   <-- crash here

child               is 0x01c109bc
child->parent       is 0x0275b250
child->parent->kids is 0xA9A9A9A9

All fields of 'child->parent' have been set to A9A9A9A9 (set by the fortified memory allocation shell to indicate that this memory has been released by a call to free).

I haven't got around to reproducing a similar crash in the JS stand alone shell. Assuming my understanding is correct, I'm hoping that the above description makes some degree of sense.

In terms of fixing the problem, two scenarios need to be catered for that I can see:

1) Failure to insert the child into 'propertyTreeHash'.

I think we can simply set child->parent to NULL in this case and live with the child not being in the hash table at all.

2) Failure to insert the child into the grandparents list of kids.

I think we could reuse the kids memory chunk that will now be unused in the property node that is about to be swept (this would ensure that the insertion cannot fail).


Reproducible: Always
Severity: major → critical
Keywords: crash
A slightly ragged test case that causes the js shell to crash in windows or linux if we force malloc failure when reparenting the kids during GC.

In order to force the malloc failure apply the patch I will upload next and then run this test script.

The scenario is as described in a previous comment. A parent pointer in a child refers to an arena that has been free'd already.
Use this in conjunction with attachment 243063 [details].
This might explain an elusive topcrash.  Thanks for finding it!  How did you find it, btw?

/be
Assignee: general → brendan
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → mozilla1.9alpha
(In reply to comment #3)
> This might explain an elusive topcrash.  Thanks for finding it!  How did you
> find it, btw?
> 
> /be
> 

By running some tests which cause deliberate malloc failures (replacing malloc/free with a different allocator is the starting point for such testing). It was just luck that I happened to have the correct scope tree structure that would lead to a crash when malloc failed.

One thing I'm curious about. InsertPropertyTreeChild and RemovePropertyTreeChild have this comment:

/* NB: Called with the runtime lock held. */

During sweeping the lock isn't taken (as far as I can tell). Is there some other higher level locking that prevents multiple threads from modifying the property tree?

The speculative fix I suggested in the bug description is based on the assumption that removing a swept node from it's parent (or the root hash table) will leave a space available that won't be taken by another thread....
(In reply to comment #4)
> (In reply to comment #3)
> > This might explain an elusive topcrash.  Thanks for finding it!  How did you
> > find it, btw?
> 
> By running some tests which cause deliberate malloc failures (replacing
> malloc/free with a different allocator is the starting point for such testing).
> It was just luck that I happened to have the correct scope tree structure that
> would lead to a crash when malloc failed.

Thanks (I should have read comment 0 more carefully, but knowing it was just luck is helpful).

> One thing I'm curious about. InsertPropertyTreeChild and
> RemovePropertyTreeChild have this comment:
> 
> /* NB: Called with the runtime lock held. */
> 
> During sweeping the lock isn't taken (as far as I can tell). Is there some
> other higher level locking that prevents multiple threads from modifying the
> property tree?

Yes, the GC runs only when all threads are no longer in requests, so the comment is not telling the whole story.

> The speculative fix I suggested in the bug description is based on the
> assumption that removing a swept node from it's parent (or the root hash table)
> will leave a space available that won't be taken by another thread....

Right, GC is non-concurrent mark/sweep.  I'll try a patch when I have more time (feel free to try one sooner if you have time).  Thanks,

/be
OK, based on the previous comments here's a patch for the bug. The basis of this fix is two fold:

- make InsertPropertyTreeChild infallible when inserting kids into a non-NULL parent (when sweeping)
- deal with failure when inserting kids into the root property hash table (when sweeping)
Attachment #243927 - Flags: review?(brendan)
Gavin, thanks so much for this excellent work.  You can use bugzilla's interdiff to see SpiderMonkey "house style" conformance changes in this attachment diff'ed against your last one.  If you could have a look to verify that this differs only cosmetically, and give a thumbs up, I'll check it into the trunk and nominate for 1.8.1.1.  Thanks again,

/be
Attachment #243927 - Attachment is obsolete: true
Attachment #243956 - Flags: review+
Attachment #243927 - Flags: review?(brendan)
Attachment #243956 - Attachment is obsolete: true
Attachment #243958 - Flags: review+
Attachment #243956 - Flags: review+
(In reply to comment #8)
> Created an attachment (id=243958) [edit]
> one more comment style police action
> 

Thanks Brendan. Checked the interdiff and I'm happy that the changes are cosmetic only.
Fixed on trunk:

$ cvs ci -m"Fix from Gavin Reaney <gavin@picsel.com> for OOM-handling bugs in js_SweepScopeProperties (357388, r=me)." jsscope.c
Checking in jsscope.c;
/cvsroot/mozilla/js/src/jsscope.c,v  <--  jsscope.c
new revision: 3.49; previous revision: 3.48
done

Thanks again,

/be
Status: NEW → RESOLVED
Closed: 13 years ago
Flags: blocking1.8.1.1?
Flags: blocking1.8.0.9?
Resolution: --- → FIXED
Comment on attachment 243958 [details] [diff] [review]
one more comment style police action

Nominating for 1.8.[01].x as I suspect this is a fix for a topcrash.

/be
Attachment #243958 - Flags: approval1.8.1.1?
Attachment #243958 - Flags: approval1.8.0.9?
Is it likely that a bug that happens only in OOM would be a topcrasher?  Or is comment 0 incorrect in stating that it happens only in OOM situations?
(In reply to comment #13)
> Is it likely that a bug that happens only in OOM would be a topcrasher?

It's impossible to compute or estimate odds ratios, AFAICT.  We have too little data and lots of volunteer effect.  For instance, just some folks running an addon with runaway allocation bugs, or even helping test a DoS concatenate to infinity case, could generate enough talkback to get into topcrash territory.  Or so I think.  Dbaron may have better ideas.

> Or is
> comment 0 incorrect in stating that it happens only in OOM situations?

No, that's correct.

/be
I think it's unlikely that an OOM-handling bug would be a topcrasher, although I'm not sure if I've looked at memory stats for those crashes.
Flags: in-testsuite-
Flags: blocking1.8.1.1?
Flags: blocking1.8.1.1+
Flags: blocking1.8.0.9?
Flags: blocking1.8.0.9+
Comment on attachment 243958 [details] [diff] [review]
one more comment style police action

approved for 1.8/1.8.0 branch, a=dveditz for drivers
Attachment #243958 - Flags: approval1.8.1.1?
Attachment #243958 - Flags: approval1.8.1.1+
Attachment #243958 - Flags: approval1.8.0.9?
Attachment #243958 - Flags: approval1.8.0.9+
Checking in jsscope.c;
/cvsroot/mozilla/js/src/jsscope.c,v  <--  jsscope.c
new revision: 3.45.20.3; previous revision: 3.45.20.2
done

Checking in jsscope.c;
/cvsroot/mozilla/js/src/jsscope.c,v  <--  jsscope.c
new revision: 3.45.28.2; previous revision: 3.45.28.1
done

/be
You need to log in before you can comment on or make changes to this bug.