Closed Bug 1272604 Opened 4 years ago Closed 4 years ago

Add a zeal mode to check the heap after a moving GC

Categories

(Core :: JavaScript: GC, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: jonco, Assigned: jonco)

References

Details

Attachments

(1 file)

No description provided.
Summary: Check → Add a zeal mode to check the heap after a moving GC
As I think Terrence mentioned at one of our GC meetings, it would be useful to have the option to walk the entire heap after a moving GC to check that there are no stale pointers anywhere.  This would help when making changes to the GC or when debugging fuzz bugs involving missing barriers.

This patch adds a zeal mode to do this.  Stale pointers are printed along with the path the tracer took to get there, which should give some context for debugging.
Assignee: nobody → jcoppeard
Attachment #8752193 - Flags: review?(terrence)
Comment on attachment 8752193 [details] [diff] [review]
bug1272604-check-heap-after-moving-gc

Review of attachment 8752193 [details] [diff] [review]:
-----------------------------------------------------------------

This is awesome! I can't wait until the fuzzers start using this.

::: js/src/gc/Marking.cpp
@@ +181,5 @@
>      MOZ_ASSERT(trc);
>      MOZ_ASSERT(thing);
>  
> +    if (!trc->checkEdges())
> +        return;

Oh! Clever! This is where I got tripped up last time I tried to write a heap corruption detector.

::: js/src/jsgc.cpp
@@ +7582,5 @@
> +        for (int index = parentIndex; index != -1; index = stack[index].parentIndex) {
> +            const WorkItem& parent = stack[index];
> +            cell = parent.thing.asCell();
> +            fprintf(stderr, "  from %s %p %s edge\n",
> +                   GCTraceKindToAscii(cell->getTraceKind()), cell, name);

Alignment is one space off here.

@@ +7629,5 @@
> +    MOZ_ASSERT(rt->isHeapCollecting());
> +    CheckHeapTracer tracer(rt);
> +    if (!tracer.init() || !tracer.check())
> +        fprintf(stderr, "OOM checking heap\n");
> +}

Would it work to put all of this new code in gc/Verifier.cpp instead?
Attachment #8752193 - Flags: review?(terrence) → review+
(In reply to Pulsebot from comment #5)

The patch resulted in a hazard analysis failure because the analysis thinks that markRuntime can GC:

GC Function: _ZN2js2gc22CheckHeapAfterMovingGCEP9JSRuntime$void js::gc::CheckHeapAfterMovingGC(JSRuntime*)
    uint8 CheckHeapTracer::check()
    void js::gc::GCRuntime::markRuntime(JSTracer*, uint32)
    FieldCall: struct js::gc::Callback<void (*)(JSTracer*, void*)>.op

I fixed this by adding a suppression to this path.

I can see why it thinks this, but I don't understand why this would be a problem here and not in the other places where this is also called as part of a GC.
Flags: needinfo?(sphink)
Depends on: 1273494
(In reply to Jon Coppeard (:jonco) from comment #6)
> (In reply to Pulsebot from comment #5)
> 
> The patch resulted in a hazard analysis failure because the analysis thinks
> that markRuntime can GC:
> 
> GC Function: _ZN2js2gc22CheckHeapAfterMovingGCEP9JSRuntime$void
> js::gc::CheckHeapAfterMovingGC(JSRuntime*)
>     uint8 CheckHeapTracer::check()
>     void js::gc::GCRuntime::markRuntime(JSTracer*, uint32)
>     FieldCall: struct js::gc::Callback<void (*)(JSTracer*, void*)>.op
> 
> I fixed this by adding a suppression to this path.
> 
> I can see why it thinks this, but I don't understand why this would be a
> problem here and not in the other places where this is also called as part
> of a GC.

Hm, good question. The analysis before your changes *does* consider markRuntime to be a GC call. It just doesn't lead to any hazards.

Callers who get to markRuntime through one of the recognized GC entry points (collect() and minorGCImpl()) don't matter, because... well, because those GC and you'd better not have anything rooted across them.

Callers who get to markRuntime through other paths... let's see, there are 6 callers. These are not within the GC:

  void js::TraceRuntime(JSTracer*)
  void js::gc::GCRuntime::startVerifyPreBarriers()
  void js::gc::MarkingValidator::nonIncrementalMark()
  void js::gc::GCRuntime::updatePointersToRelocatedCells(JS::Zone*)

Glancing at these, my guess is that we're simply not holding anything live across these calls.

From your hazard, it does look like you're adding something unusual. You insert a CheckHeapAfterMovingGC into the nursery collection, and the analysis notices that the TenureCountCache is held live across it. I suspect most callers of markRuntime aren't going to be holding any specific pointers live (except for the true GC cases, and there they'll root them.)

So I think your fix is about right, though I'd probably put it at the very top of js::Nursery::collect to screen out the analysis. Nothing reported under there would be useful. Hm... I guess that logic should be in the analysis itself, since it knows what the entry points are.

I'll file a bug: bug 1273674. Except in writing that, I think I may disagree with myself. There *is* some use to reporting hazards contained entirely underneath a GC entry point, because it's a dangerous thing to be doing if you don't realize you're doing it. Which would suggest more narrowly applying AutoSuppressGCAnalysis, as you've done here. The question is whether it'd be better to annotate markRuntime instead of your function.

I'll leave it on the list of things to ponder for now. I'd be happy to know your opinion.
Flags: needinfo?(sphink)
Duplicate of this bug: 995648
Depends on: 1298842
Depends on: 1307361
Depends on: 1337414
You need to log in before you can comment on or make changes to this bug.