Closed Bug 1140773 Opened 5 years ago Closed 5 years ago

Intermittent js1_5/Array/regress-474529.js | (args: "--baseline-eager --no-fpu")

Categories

(Core :: JavaScript: GC, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox37 --- unaffected
firefox38 --- unaffected
firefox39 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: philor, Assigned: jimb)

References

Details

(6 keywords)

Attachments

(2 files)

No description provided.
Semi-frequent on Windows SM runs.
This seems to now be failing reliably (or nearly reliably) on inbound, maybe due to:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5e45fba743aa
Flags: needinfo?(jimb)
I'll take a look.
Flags: needinfo?(jimb)
I can reproduce.
Seems like it doesn't reproduce under rr. Going primitive.
It seems like jemalloc's more aggressive poisoning is showing up a pre-existing failure.

The generational GC nursery's FreeHugeSlotsTask is trying to iterate over a HashSet whose table has been freed. The crash occurs when the test exits (the test itself runs to completion), so I wonder if it isn't a race between the main thread deleting the JSRuntime and a previously-started off-thread task.
I think the problem is that FreeHugeSlotsTask relies on its base class's destructor to do the join() that ensures the task has completed, but the base class's destructor runs after that for the FreeHugeSlotsTask's members, including slots_. As a result, destructing the nursery's FreeHugeSlotsTask deletes slots_, freeing its hash table, before actually checking that the FreeHugeSlotsTask is not in the midst of freeing things.
GCRuntime also inherits from GCParallelTask and owns things with interesting destructors. However, since ~JSRuntime -> gc.finish() -> allocTask.cancel(CancelAndWait) -> join(), that case is probably okay.
Err, I meant, "BackgroundAllocTask also inherits"... GCRuntime owns the only instance of BackgroundAllocTask that I know of.
UpdateCellPointersTask also inherits from GCParallelTask, but doesn't have interesting destructors.
When I apply this patch, the failures stop, and the story seems plausible.
Assignee: nobody → jimb
Status: NEW → ASSIGNED
Attachment #8581242 - Flags: review?(jcoppeard)
To sheriffs, I'd like to emphasize that, although the patch from bug 1134039 (changeset 5e45fba743aa, 2015-3-19) made this failure much more common, it is merely the messenger, not the cause. The cause would be bug 1122640.
Group: core-security
Component: JavaScript Engine → JavaScript: GC
Comment on attachment 8581242 [details] [diff] [review]
Ensure that GCParallelTask subclasses properly join at the start of their destructor.

Review of attachment 8581242 [details] [diff] [review]:
-----------------------------------------------------------------

Stealing review so Jim doesn't have to get back out over the weekend.

Good catch on the race. I wonder if it's worth it to make GCParallelTask templated instead, so that it takes a task implementation and constructs it, instead of this fix (which is fine as a fix in the meantime), which requires boilerplate.

::: js/src/gc/Nursery.cpp
@@ +41,5 @@
>  {
>      explicit FreeHugeSlotsTask(FreeOp *fop) : fop_(fop) {}
>      bool init() { return slots_.init(); }
>      void transferSlotsToFree(HugeSlotsSet &slotsToFree);
> +    ~FreeHugeSlotsTask() { join(); }

Should these destructors have the override keyword?

::: js/src/vm/HelperThreads.cpp
@@ +746,5 @@
> +    // destructors run after those for derived classes' members, so a join in a
> +    // base class can't ensure that the task is done using the members. All we
> +    // can do now is check that someone has previously stopped the task.
> +    AutoLockHelperThreadState helperLock;
> +    MOZ_ASSERT(state == NotStarted);

Probably should surround the lock with #ifdef DEBUG to avoid superfluous locking for the sake of an assert.
Attachment #8581242 - Flags: review?(jcoppeard) → review+
Address review comments:
https://hg.mozilla.org/integration/mozilla-inbound/rev/45366e6959e2
Flags: in-testsuite-
OS: Linux → All
Hardware: x86 → All
Target Milestone: --- → mozilla39
Forgot to include these in original patch. The original patch is correct as it stands, these just make it nicer.
Duplicate of this bug: 1140655
Duplicate of this bug: 1139792
(In reply to Jim Blandy :jimb from comment #28)
> I think the problem is that FreeHugeSlotsTask relies on its base class's
> destructor to do the join() that ensures the task has completed, but the
> base class's destructor runs after that for the FreeHugeSlotsTask's members,
> including slots_.

Ouch.  Thanks for fixing this.
Group: core-security
You need to log in before you can comment on or make changes to this bug.