Closed Bug 1407593 Opened 2 years ago Closed 1 year ago

Embedding leaks make it harder to assert shutdown correctness

Categories

(Core :: JavaScript: GC, enhancement, P1)

55 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: jonco, Assigned: jonco)

References

Details

Attachments

(1 file, 1 obsolete file)

Sometimes the embedding still has live gray roots at shutdown because of a leak somewhere in the system.  If this happens that we don't free everything and we get assertion failures for assertions that that everything is cleaned up at shutdown.  

This leads to frequent bug reports (e.g. bug 1407505, bug 1402473).

The reason that we don't just clear all gray roots at shutdown like we do for internal roots is that this would cause some leaks to not get reported.  However these assertion failures also hide the real issue.

Ideally I think we should report leaks that involve JS sooner, and clear the roots on shutdown.
Priority: -- → P3
Andrew, I think I've mentioned this to you before.  Do you think this is feasible and if so what's the best way to proceed?  I don't really know how our leak checking works at the moment.
Flags: needinfo?(continuation)
Blocks: 1369712
Here's how XPCOM leak checking works:
- Classes opt into it by calling MOZ_COUNT_CTOR in their ctor and MOZ_COUNT_DTOR in their dtor, with a class name. (There's a slightly different mechanism for refcounted objects, but it doesn't matter for the present discussion.) nsTraceRefcnt.cpp keeps a table of the number of extant instances of each class.
- Very late in shutdown, nsTraceRefcnt::Shutdown() dumps out the table of various classes it saw, how many it saw, and how many remain. There is one file per process, with the pid in the file name.
- testing/mochitest/leaks.py is a script run by the test harness that checks to make sure there's a leak log for every process that we created, and it parses the table and prints out an error for every class where we had more than one instance remaining at shutdown.

In theory, I think your proposal is reasonable. You'd have to make sure that the test harness registers it as an error if there's a leak, and you'd want it to be non-fatal, so we can do the rest of the checking we do at shut down. There's some loss from not giving information about what exactly leaked, but I think that in practice if you are leaking JS you're going to get a completely useless list of leaking objects anyways.

In practice, you may need to debug a wide variety of "leaks" in your shutdown detector that we don't see with the XPCOM leak detector. The problem I am imagining is that some random place roots JS objects, but clears them out at some point in shutdown after when you want to add your check. But maybe it will be okay. Some of these leaks may be actual issues, so the work spent on debugging them won't be wasted, but some will likely be silly issues that you just have to fix to make your leak detector work.
Flags: needinfo?(continuation)
Are there any next steps here?  I am happy to disable the talos job on code coverage builds that is perma/failing to reduce the pressure to fix this bug.
Flags: needinfo?(jcoppeard)
(In reply to Andrew McCreight [:mccr8] from comment #2)
Andrew, thanks for the comments, that's really useful.

Joel, I sill want us to fix this.  But do disable the talos job if it's causing a lot of noise.
Flags: needinfo?(jcoppeard)
The bug 1408403 is still failing even after Joel disabled g2 job on ccov builds.
:jonco Any idea when this bug is going to be fixed?
Flags: needinfo?(jcoppeard)
:jonco any updates on this?
(In reply to Andrei Ciure[:andrei_ciure_] from comment #6)
I'll work on this next.
Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)
Attached patch bug1407593-check-js-leaks (obsolete) — Splinter Review
Here's what I have so far:

XPCOM logs any roots left at shutdown in a similar way to the existing leak tracing and then clears them.  I extended the existing leak checking scripts to process these logs too, again in a very similar way.

The JS engine now asserts that there are no roots left at shutdown.

To test this I wrote a patch to force a leak of a JSString (via a Value in a ModuleScript), which produced the following output:

    TEST-INFO | Main app process: exit 0
    runtests.py | Application ran for: 0:00:43.195848
    zombiecheck | Reading PID log: /tmp/tmphtVpxxpidlog
    ==> process 24368 launched child process 24396
    zombiecheck | Checking for orphan process with PID: 24396
    Stopping web server
    Stopping web socket server
    Stopping ssltunnel
    TEST-INFO | leakcheck | default process: leak threshold set at 0 bytes
    TEST-INFO | leakcheck | plugin process: leak threshold set at 0 bytes
    TEST-INFO | leakcheck | tab process: leak threshold set at 0 bytes
    TEST-INFO | leakcheck | geckomediaplugin process: leak threshold set at 20000 bytes
    TEST-INFO | leakcheck | gpu process: leak threshold set at 0 bytes

    == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 24368

         |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
         |                                      | Per-Inst   Leaked|   Total      Rem|
       0 |TOTAL                                 |       36       56| 1014320        1|
     398 |ModuleScript                          |       56       56|       1        1|

    nsTraceRefcnt::DumpStatistics: 1394 entries
    TEST-INFO | leakcheck | default process: leaked 1 ModuleScript
    19 ERROR TEST-UNEXPECTED-FAIL | leakcheck | default process: 56 bytes leaked (ModuleScript)
    20 ERROR TEST-UNEXPECTED-FAIL | leakcheck | default process: found leak of 1 JS GC things
      String: 1
    runtests.py | Running tests: end.
    Buffered messages finished
    SUITE-END | took 44s

Try is looking good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0c6d68bef5640d2ffe5be0ab8cd833ce4023529d&group_state=expanded

How does this look to you?
Attachment #8930152 - Flags: feedback?(continuation)
Comment on attachment 8930152 [details] [diff] [review]
bug1407593-check-js-leaks

Review of attachment 8930152 [details] [diff] [review]:
-----------------------------------------------------------------

This seems reasonable. Is there some reason to create a separate log rather than reporting the leaked objects to the leak checker through the normal means? Your loop could call MOZ_LOG_CTOR repeatedly instead. Adding an extra file seems like a lot of complexity.

::: dom/workers/RuntimeService.cpp
@@ +1012,5 @@
>      // The CC is shut down, and the superclass destructor will GC, so make sure
>      // we don't try to CC again.
>      mWorkerPrivate = nullptr;
> +
> +    CycleCollectedJSRuntime::Shutdown(cx);

Bug 1422316 is also filed on this missing shutdown. I don't know why it is like it is.

::: xpcom/base/nsTraceRefcnt.cpp
@@ +1373,5 @@
> +  aRuntime->TraceNativeGrayRoots(&tracer);
> +
> +  AutoTraceLogLock lock;
> +
> +  fprintf(gJSLeakLog, "Runtime %p leaked %" PRIu64 " JS GC things\n",

It would be nice if this included a descriptive string, too. EG worker or main thread. I don't think we ever log the address of a runtime, so this will only be useful for somebody with a debugger.
Attachment #8930152 - Flags: feedback?(continuation) → feedback+

Picking this up again as this continues to be a problem.

Using MOZ_LOG_CTOR makes things much easier, thanks.

What we want to happen is that GC things that are live at shutdown are reported but are finalized anyway. It might be nice to also clear embedding pointers to GC things at shutdown because otherwise these are left dangling pointing to freed memory.

My current issue is that finalizing preserved wrappers causes an assertion in the wrapper cache destructor.

Depends on: 1570666

Patch to report JS GC things that are live at shutdown as leaks by calling MOZ_LOG_CTOR for them. The JS engine now asserts there are no roots, and the CC clears any remaining JS holders (which will already have been reported). I used Unlink() for this because otherwise the wrapper cache asserts if we try and clear the contents of a preserved wrapper from a leaked GC thing's finalizer.

The only fly in the ointment is the test_ext_native_messaging.js xpcshell test, because this patch causes it to fail with an assertion here:

https://searchfox.org/mozilla-central/source/dom/events/EventListenerManager.cpp#138

I'm assuming this is connected to the fact that it leaks (bug 1570666).

Blocks: 1569655
Status: NEW → ASSIGNED
Priority: P3 → P1
Attachment #9082666 - Attachment description: Bug 1407593 - Report embedding leaks of JS GC things r=mccr8? → Bug 1407593 - Report embedding leaks of JS GC things r=mccr8,sfink?
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f2d3df5525bd
Report embedding leaks of JS GC things r=sfink,sfink?

Lando seems to have messed up the commit messaged. Filed bug 1573398.

Attachment #8930152 - Attachment is obsolete: true

\o/

(In reply to Jon Coppeard (:jonco) from comment #13)

Lando seems to have messed up the commit messaged. Filed bug 1573398.

I think the trailing ? in r=foo? confuses it. I've noticed it also doesn't remove it when there's a single reviewer.

Blocks: 1573503
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
Duplicate of this bug: 1569655
Regressions: 1574294
Regressions: 1573419
You need to log in before you can comment on or make changes to this bug.