Closed Bug 1137780 Opened 5 years ago Closed 5 years ago

CRASH: browser/devtools/performance/test/browser_perf-overview-render-03.js | application crashed [@ js::jit::JitcodeGlobalTable::releaseEntry(void*, JSRuntime*)]

Categories

(DevTools :: Performance Tools (Profiler/Timeline), defect)

37 Branch
x86
macOS
defect
Not set

Tracking

(firefox39 fixed)

RESOLVED FIXED
Firefox 39
Tracking Status
firefox39 --- fixed

People

(Reporter: jsantell, Assigned: shu)

References

Details

Attachments

(2 files, 3 obsolete files)

Assignee: nobody → kvijayan
I can't repro this locally, but the crashlogs:

    https://treeherder.mozilla.org/logviewer.html#?job_id=48822&repo=gum

    20:23:14 INFO - 0 XUL!js::jit::JitcodeGlobalTable::releaseEntry(void*, JSRuntime*)
    [JitcodeMap.cpp:2858249e736f : 528 + 0x0] 

Point to an assert:
    void
    JitcodeGlobalTable::releaseEntry(void *startAddr, JSRuntime *rt)
    {
        mozilla::DebugOnly<JitcodeGlobalEntry *> entry = lookupInternal(startAddr);
        mozilla::DebugOnly<uint32_t> gen = rt->profilerSampleBufferGen();
        mozilla::DebugOnly<uint32_t> lapCount = rt->profilerSampleBufferLapCount();
        MOZ_ASSERT(entry);
  >>>>  MOZ_ASSERT_IF(gen != UINT32_MAX, !entry->isSampled(gen, lapCount));
        removeEntry(startAddr, rt);
    }

I'm thinking this assert might be too strict.  Basically, it's ensuring that an entry doesn't get removed from the JitcodeGlobalTable when it should have been kept alive due to it being potentially referenced from the circular buffer (i.e. when its generation >= (profilerGeneration - lapCount))

This assert is getting triggered, but I can't repro it locally, and it seems like there is a special corner case where it could trigger badly.
To recap, current hypothesis is that lookupForSampler needs a read barrier on any found entries' JitCode and stored TypeGroups, as the sampler could've sampled frames between incremental slices.

The simplest fix is probably to mark during the beginning of the sweep phase, instead of the marking phase. This effectively makes JitcodeGlobalTable marking non-incremental.
So there were 2 bugs here.

I suspect the crash is mostly due to this first reason. I don't know what
TraceRuntime is, but igc calls markRuntime with MarkRuntime. As a result,
JitcodeGlobalTable wasn't even getting marked half the time! I found this out
by logging the gcNumber during JitcodeGlobalTable mapping and noticing gaps.
What is TraceRuntime intended for, and why do we only mark the JitRuntime under
TraceRuntime?

Second, I still think this needs read barriers as we discussed on Vidyo. So I
implemented them, but it is some of the scariest code I've written. Could some
careful looking through, Terrence.
Attachment #8570855 - Flags: review?(terrence)
Assignee: kvijayan → shu
Status: NEW → ASSIGNED
Previous approach didn't work on OS X and Windows due to the sampler having its
own thread.
Attachment #8570959 - Flags: review?(terrence)
Attachment #8570855 - Attachment is obsolete: true
Attachment #8570855 - Flags: review?(terrence)
(In reply to Shu-yu Guo [:shu] from comment #3)
> Created attachment 8570855 [details] [diff] [review]
> Fix JitcodeGlobalTable marking and add read barriers
> 
> So there were 2 bugs here.
> 
> I suspect the crash is mostly due to this first reason. I don't know what
> TraceRuntime is, but igc calls markRuntime with MarkRuntime. As a result,
> JitcodeGlobalTable wasn't even getting marked half the time! I found this out
> by logging the gcNumber during JitcodeGlobalTable mapping and noticing gaps.
> What is TraceRuntime intended for, and why do we only mark the JitRuntime
> under
> TraceRuntime?

The actual condition is:
        if (traceOrMark == TraceRuntime || rt->atomsCompartment()->zone()->isCollecting()) {

So you were only getting called during minor collections done on behalf of a non-incremental or incremental GC containing the atoms zone.

The difference between Tracing and Marking is that tracing calls a callback for every edge and mark sets the mark bit. |markRuntime| in particular is a dumping ground for special cases and weird semantics. In this particular "special" case, JitRuntime::Mark seems to be a complete misnomer: the actual function appears to directly iterate the heap cells (red flag 1) of the atoms zone (red flag 2) and trace all JitCode (red flag 3). What in the world is JitCode doing in the atoms zone? No idea! I'd *guess* someone dumped it there when they didn't have a zone handy and we happen to not mark it because it's held live elsewhere but needs to be traced because.... This code is almost certainly either wrong or so poorly designed that it's equivalent to wrong.

> Second, I still think this needs read barriers as we discussed on Vidyo. So I
> implemented them, but it is some of the scariest code I've written. Could
> some
> careful looking through, Terrence.

Looks like there's still at least one bug, I'll see if I can spot it for you.

(In reply to Shu-yu Guo [:shu] from comment #2)
> To recap, current hypothesis is that lookupForSampler needs a read barrier
> on any found entries' JitCode and stored TypeGroups, as the sampler could've
> sampled frames between incremental slices.
> 
> The simplest fix is probably to mark during the beginning of the sweep
> phase, instead of the marking phase. This effectively makes
> JitcodeGlobalTable marking non-incremental.

Anything that's getting marked by markRuntime is going to be non-incremental; is there more marking than what happens in JitRuntime::MarkJitRuntimeGlobalTable?
Comment on attachment 8570959 [details] [diff] [review]
Fix JitcodeGlobalTable marking and add read barriers.

Review of attachment 8570959 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/JitcodeMap.cpp
@@ +420,5 @@
> +    // The entry holds its pointers weakly. They are marked only if they are
> +    // sampled. Like other weak pointers, a read barrier is required in case
> +    // an entry that was not marked in the root marking phase is sampled.
> +    Zone *zone = entry->baseEntry().zoneFromAnyThread();
> +    if (zone->needsIncrementalBarrier() && !rt->isHeapBusy()) {

Typically, readBarrier does this check.

@@ +421,5 @@
> +    // sampled. Like other weak pointers, a read barrier is required in case
> +    // an entry that was not marked in the root marking phase is sampled.
> +    Zone *zone = entry->baseEntry().zoneFromAnyThread();
> +    if (zone->needsIncrementalBarrier() && !rt->isHeapBusy()) {
> +        AutoSaveBarrierTracerDetails autoDetails(zone);

No, no, no. If you need this, then you are already hosed. The tracing details in this tracer are only valid while marking some thing. i.e. someone else's mark word and mask may be on stack, so if you update the mark bitmap here there's the possibility that your work will get overwritten when you resume.

::: js/src/vm/Stack.cpp
@@ +1744,5 @@
> +        TlsPerThreadData.set(&rt->mainThread);
> +        savedPrevOwnerThread_ = rt->ownerThread_;
> +        rt->ownerThread_ = currentThread;
> +    }
> +#endif

You shouldn't need any of this. Currently JitCode::readBarrier is calling TenuredCell::readBarrier, i.e. the fully generic form. If you implement JitCode::readBarrier, you should be able to specialize it to only do the things you need and to do them in a manner that doesn't need TLS.

Moreover, it looks like you aren't even using ReadBarriered, just doing the dispatch yourself, so you could even pass along all the relevant structures you already have access to.
Attachment #8570959 - Flags: review?(terrence) → review-
The enum will be used in sweeping.
Attachment #8570959 - Attachment is obsolete: true
Restructures how JitcodeGlobalTable is marked.

1) Marking of entries is done at the start of the SWEEP_MARK phase, to avoid
needing read barriers.

2) Since stored TypeSet::Type and a possibly associated JSFunction can be moved
by compacting GC, the table is now also swept. This means that for all JitCode
that are IsAboutToBeFinalized, their hasBytecodeMap_ can be true despite not
having any entry in the JitcodeGlobalTable. I'm okay with this, since I imagine
touching the contents of IsAboutToBeFinalized things is user error.
Attachment #8571740 - Flags: review?(kvijayan)
Attachment #8571741 - Flags: review?(terrence)
Comment on attachment 8571741 [details] [diff] [review]
Fix marking JitcodeGlobalTable.

Review of attachment 8571741 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Marking.cpp
@@ +936,5 @@
> +    } else {
> +        isAboutToBeFinalized = false;
> +    }
> +    return isAboutToBeFinalized;
> +}

These definitions are not really performance sensitive; they should be with TypeInference and use the external marking interfaces.

::: js/src/jsgc.cpp
@@ +4282,5 @@
>  
>      gc->incrementalState = SWEEP;
>      {
>          gcstats::AutoPhase ap1(gc->stats, gcstats::PHASE_SWEEP);
> +        gc->markJitcodeGlobalTable();

The phase nesting is incorrect here: you need to add another auto phase around this specific call that enters PHASE_SWEEP_MARK.
Attachment #8571741 - Flags: review?(terrence) → review+
Attachment #8572543 - Flags: review?(kvijayan)
Attachment #8572543 - Flags: review?(kvijayan) → review+
https://hg.mozilla.org/mozilla-central/rev/1e95c3785aa8
https://hg.mozilla.org/mozilla-central/rev/33e37e4feb3f
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 39
Depends on: 1140643
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.