Closed Bug 1139456 Opened 5 years ago Closed 5 years ago

Intermittent bug-1138390.js:20:3 Error: Assertion failed: got "mark", expected "none" (code 3, args "")

Categories

(Core :: JavaScript Engine, defect)

x86
Windows 8
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox37 --- unaffected
firefox38 --- fixed
firefox39 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: cbook, Assigned: sfink)

References

()

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

WINNT 5.2 mozilla-central leak test spidermonkey_tier_1-generational build

https://treeherder.mozilla.org/logviewer.html#?job_id=1107457&repo=mozilla-central

TEST-UNEXPECTED-FAIL | js\src\jit-test\tests\gc\bug-1138390.js | c:\builds\moz2_slave\m-cen_w32-d_sm-ggc-00000000000\src\js\src\jit-test\tests\gc\bug-1138390.js:20:3 Error: Assertion failed: got "mark", expected "none" (code 3, args "")
For what it's worth, I can reproduce the issue with 100% rate when running with --tbpl and JS_GC_ZEAL=14. Hope that can help?
Flags: needinfo?(jcoppeard)
Please take a look at this one.

The test has code like:

    startgc(0);
    var g = newGlobal();
    assertEq("mark", gcstate());

newGlobal() can finish the GC, maybe? In that case, it wouldn't be hard to fix.

In any case, please reverse the order of those assertions as it's assertEq(actual, expected).
Flags: needinfo?(terrence)
I was not able to reproduce on linux, which is odd of this is purely a scheduling issue.
Flags: needinfo?(terrence)
Even with the --tbpl option and JS_GC_ZEAL=14, as specified in comment 53?
Flags: needinfo?(jcoppeard) → needinfo?(terrence)
(i can still reproduce with --tbpl JS_GC_ZEAL=14 on linux32, but I need to repeat a few times before it shows up now)
I can reproduce this locally.
Flags: needinfo?(terrence)
Yes, the problem is in the JS_GC_ZEAL=14. That makes us do compacting GCs, which have the side effect of setting the zone's malloc trigger threshold to current size * 1.5, which is a pretty small number. Here, it drops us from a trigger of 47185920 (47MB) down to a trigger of only 245760 (245KB). During newGlobal(), we hit that trigger and do a full GC, which throws us out of the incremental mark state, and therefore breaks the test.
This will fix the immediate problem. Remaining issues:

(1) We can still go bi-stable in a realistic scenario if we start compacting during idle times or something. That seems problematic.

(2) The threshold setup here seems wrong to me. Going to 150% of the previous retained size doesn't seem to be to be exceptional enough to jank by forcing the current incremental GC to finish. If you do an idle compaction, then start doing something that requires growing a bunch, we should be willing to let iGC take care of it until we really start pushing on available memory.

But those are separate issues, and #2 doesn't scare me all that much because at least the nursery will cover the case where we're merely allocating at a fast rate, not increasing our retained size. I was going to file bugs on these, but I kind of feel like there's part of a more general tuning effort that we should start on once the allocation logic is adequately disentangled.
Attachment #8576878 - Flags: review?(terrence)
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Comment on attachment 8576878 [details] [diff] [review]
Do not let compaction set the alloc threshold to unreasonably small sizes

Review of attachment 8576878 [details] [diff] [review]:
-----------------------------------------------------------------

Agreed in full.
Attachment #8576878 - Flags: review?(terrence) → review+
(In reply to Steve Fink [:sfink, :s:] from comment #114)
Thanks for picking this up!
Comment on attachment 8576878 [details] [diff] [review]
Do not let compaction set the alloc threshold to unreasonably small sizes

Review of attachment 8576878 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ +1734,5 @@
>                                             JSGCInvocationKind gckind,
>                                             const GCSchedulingTunables &tunables)
>  {
>      size_t base = gckind == GC_SHRINK
> +                ? Max(lastBytes, tunables.minEmptyChunkCount() * ChunkSize)

Note it's possible minEmptyChunkCount might be set to zero, although it looks like it's set to one in current configurations.
https://hg.mozilla.org/mozilla-central/rev/f7be20a39a8a
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Bug 1138390 is going to be landing on Aurora, so please nominate this as well.
Flags: needinfo?(sphink)
Comment on attachment 8576878 [details] [diff] [review]
Do not let compaction set the alloc threshold to unreasonably small sizes

Approval Request Comment
[Feature/regressing bug #]: Bug 1138390 I guess?
[User impact if declined]: intermittent orange and annoyed sheriffs
[Describe test coverage new/current, TreeHerder]: on inbound for a few days
[Risks and why]: this only affects compacting GCs, and can only delay them a bit. Low risk.
[String/UUID change made/needed]: none
Flags: needinfo?(sphink)
Attachment #8576878 - Flags: approval-mozilla-aurora?
Attachment #8576878 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.