Last Comment Bug 831626 - Switch cx->enumerators from a stack to a weak list
: Switch cx->enumerators from a stack to a weak list
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla21
Assigned To: David Anderson [:dvander]
:
:
Mentors:
Depends on: 838835
Blocks: 806820 830654
  Show dependency treegraph
 
Reported: 2013-01-16 18:18 PST by David Anderson [:dvander]
Modified: 2013-02-08 17:36 PST (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
fixed
+
fixed
fixed


Attachments
fix (27.69 KB, patch)
2013-01-17 18:07 PST, David Anderson [:dvander]
wmccloskey: review+
bajaj.bhavana: approval‑mozilla‑aurora+
bajaj.bhavana: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description David Anderson [:dvander] 2013-01-16 18:18:57 PST
cx->enumerators is a balanced stack of active iterator objects. In practice, it is really difficult to keep it correctly balanced, because the JS engine has so many different ways of leaving execution modes, and the try-note-iter mechanism itself is fairly delicate.

Since the enumerator list is unrooted, when it becomes unbalanced, its items can become garbage collected and later crash. This is likely behind topcrash bug 806820.

Instead, cx->enumerators should just be a doubly-linked list. Adding and removing items would still be fast, but we wouldn't require any LIFO ordering. The GC could then automatically remove entries which were not live.
Comment 1 Alex Keybl [:akeybl] 2013-01-17 14:26:11 PST
Tracking for FF19 given this may be a fix we uplift to resolve bug 806820.
Comment 2 David Anderson [:dvander] 2013-01-17 18:07:58 PST
Created attachment 703713 [details] [diff] [review]
fix

Passes jit-tests, sending to try. The only concern I have is around generators, which are only used in chrome code. Generators had their own enumerator list, and it was swapped with cx->enumerators when the generator ran. That means deleted properties could have shown up in a resumed generator, if the property was deleted when the generator was yielded. Now, the deleted property will be suppressed.

It's not the semantic change that worries me too much, but that if a generator yields in a for-in loop, then never resumes, it'll never remove its entry off compartment->enumerators. So that could leave extra stuff dangling until the next GC, but it should be rare.
Comment 3 David Anderson [:dvander] 2013-01-17 23:34:01 PST
Comment on attachment 703713 [details] [diff] [review]
fix

https://tbpl.mozilla.org/?tree=Try&rev=7172debda82e
Comment 4 Bill McCloskey (:billm) 2013-01-18 10:58:10 PST
Comment on attachment 703713 [details] [diff] [review]
fix

Review of attachment 703713 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsiter.cpp
@@ +1102,5 @@
>                      /*
> +                     * Root the iterobj. This loop can GC, so we want to make sure that if
> +                     * the GC removes any elements from the list, it won't remove this one.
> +                     */
> +                    RootedObject iterobj(cx, ni->iterObj());

I think it would be better to make this an AutoObjectRooter for now.

@@ +1103,5 @@
> +                     * Root the iterobj. This loop can GC, so we want to make sure that if
> +                     * the GC removes any elements from the list, it won't remove this one.
> +                     */
> +                    RootedObject iterobj(cx, ni->iterObj());
> +                    (void)iterobj;

In that case, this line should be unnecessary, I think.

::: js/src/jsiter.h
@@ +30,5 @@
>  
>  struct NativeIterator
>  {
> +    HeapPtrObject obj;                  // Object being iterated.
> +    HeapPtrObject iterObj_;             // Internal iterator object.

We never mark iterObj_, so I think it should just be a raw pointer.
Comment 5 Robert Kaiser 2013-01-22 06:20:55 PST
dvander, could we land this on trunk? Given that bug 806820 is still a notable concern even on 18, I'd like to verify on trunk that this fix helps it, and then see this uplifted to aurora and beta - there's even still the possibility out there that we might think about a 18.0.2 just for this one.
Comment 6 Alex Keybl [:akeybl] 2013-01-22 10:13:19 PST
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #5)
> dvander, could we land this on trunk? Given that bug 806820 is still a
> notable concern even on 18, I'd like to verify on trunk that this fix helps
> it, and then see this uplifted to aurora and beta - there's even still the
> possibility out there that we might think about a 18.0.2 just for this one.

Yep, let's get this onto Nightly/Aurora this week, and land no later than next Tuesday (in preparation for b4).
Comment 7 Bill McCloskey (:billm) 2013-01-24 11:21:11 PST
David's try run looked good, so I pushed this. And I just realized that the author isn't set properly. Whoops.

https://hg.mozilla.org/integration/mozilla-inbound/rev/09ab58c800a1
Comment 8 Ryan VanderMeulen [:RyanVM] 2013-01-24 18:10:07 PST
https://hg.mozilla.org/mozilla-central/rev/09ab58c800a1
Comment 9 Scoobidiver (away) 2013-01-26 10:31:15 PST
Great! There are no crashes in 21.0a1/20130125 and above.
It should be uplifted to Aurora and Beta.
Comment 10 Scoobidiver (away) 2013-01-26 10:32:38 PST
and maybe a 18.0.2 considering its very high volume.
Comment 11 juan becerra [:juanb] 2013-01-28 10:15:38 PST
Added Marcia as she was one of the few ones who experienced the related Facebook problem.
Comment 12 David Anderson [:dvander] 2013-01-28 12:51:43 PST
Comment on attachment 703713 [details] [diff] [review]
fix

[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: random crashes with slow script dialog
Testing completed (on m-c, etc.): yes
Risk to taking this patch (and alternatives if risky): details in comment #2, but we haven't seen fallout so I'm no longer worried
String or UUID changes made by this patch:
Comment 13 bhavana bajaj [:bajaj] 2013-01-28 13:01:03 PST
Comment on attachment 703713 [details] [diff] [review]
fix

Approving on Aurora/Beta as this crash is very high across all the channels. Lets hope the crash volume goes down with this and we get more testing .

Please land no later than EOD today/early tomorrow morning on beta to make sure this gets into 19.0b4 going to build tomorrow.Thank you.
Comment 14 David Anderson [:dvander] 2013-01-28 13:25:22 PST
https://hg.mozilla.org/releases/mozilla-beta/rev/12cc07fac263
Comment 15 David Anderson [:dvander] 2013-01-28 13:42:46 PST
https://hg.mozilla.org/releases/mozilla-aurora/rev/7b8f0863a844
Comment 16 David Anderson [:dvander] 2013-01-31 18:28:30 PST
https://hg.mozilla.org/releases/mozilla-release/rev/92d3a5aa0c7d
Comment 17 Andreas Gal :gal 2013-02-01 14:13:37 PST
As the original author of this code, I am wondering whether this fix really is a fix, or whether its papering over the real problem: a missing CloseIterator somewhere. With this change, along that path we won't mark the iterator as inactive, and thus we won't be able to reuse it any more, and we are potentially accumulating a long list of pretty expensive iterator allocations. David, what do you think?
Comment 18 bhavana bajaj [:bajaj] 2013-02-01 16:03:27 PST
:dvander,

There were no reliable STR in bug 806820(fixed by patch in this bug), so its hard for QA to verify anything other than looking at the crash-stats for volume going down.

Was wondering if you could please help QA with some hints regarding what area should they be checking for any possible regressions or any other pointers you may have for  testing/verifying this patch ?
Comment 19 David Anderson [:dvander] 2013-02-01 22:22:48 PST
(In reply to Andreas Gal :gal from comment #17)
> As the original author of this code, I am wondering whether this fix really
> is a fix, or whether its papering over the real problem: a missing
> CloseIterator somewhere.

Yes, we're papering over a problem, but only in the sense that the problem was unnecessary. Given that (1) we had no reasonable leads (2) the crash was pervasive and (3) the invariants required were subtle and complicated with 3 execution modes, it made more sense to completely eliminate the crash risk.

> With this change, along that path we won't mark the
> iterator as inactive, and thus we won't be able to reuse it any more, and we
> are potentially accumulating a long list of pretty expensive iterator
> allocations. David, what do you think?

The long list was indeed a concern, but the underlying bug is rare, and the list should get wiped out by GC. We are still marking the iterators as inactive, and not doing so is still a bug, but only a performance one rather than a GC violation.
Comment 20 David Anderson [:dvander] 2013-02-01 22:29:16 PST
(In reply to bhavana bajaj [:bajaj] from comment #18)
> :dvander,
> 
> There were no reliable STR in bug 806820(fixed by patch in this bug), so its
> hard for QA to verify anything other than looking at the crash-stats for
> volume going down.

That's probably our best bet, lacking STR. The goal of this bug was to eliminate the possibility of the crash ever happening, so if the crash reports stop, it's probably fixed.

> Was wondering if you could please help QA with some hints regarding what
> area should they be checking for any possible regressions or any other
> pointers you may have for  testing/verifying this patch ?

Jesse and Bill had a test case that would crash (involving the slow script dialog), but Bill said it only repro'd once and he estimated the reproducibility at 1/100.
Comment 21 Andreas Gal :gal 2013-02-04 09:04:07 PST
I am ok with this approach, but we should have a follow-up bug on file for the actual bug that we still need to fix (CloseIterator missing). David, what do you think?
Comment 22 Andreas Gal :gal 2013-02-06 14:06:15 PST
dvander?
Comment 23 David Anderson [:dvander] 2013-02-06 14:10:21 PST
(In reply to Andreas Gal :gal from comment #21)
> I am ok with this approach, but we should have a follow-up bug on file for
> the actual bug that we still need to fix (CloseIterator missing). David,
> what do you think?

Sure, we could, but I don't see it being actively investigated as (1) it would be very low impact/priority (2) we never had any actionable leads in the first place.

I did take out the assert that would catch an unbalanced enumerator stack. I'll add that assert back in, to make sure we don't accidentally regress the state of things.
Comment 24 David Anderson [:dvander] 2013-02-08 17:36:56 PST
filed bug 839726 to get rid of SuppressDeletedProperty and compartment->enumerators, that's probably the best path forward

Note You need to log in before you can comment on or make changes to this bug.