Closed Bug 1407593 Opened 4 years ago Closed 2 years ago
Embedding leaks make it harder to assert shutdown correctness
47 bytes, text/x-phabricator-request
|Details | Review|
Sometimes the embedding still has live gray roots at shutdown because of a leak somewhere in the system. If this happens that we don't free everything and we get assertion failures for assertions that that everything is cleaned up at shutdown. This leads to frequent bug reports (e.g. bug 1407505, bug 1402473). The reason that we don't just clear all gray roots at shutdown like we do for internal roots is that this would cause some leaks to not get reported. However these assertion failures also hide the real issue. Ideally I think we should report leaks that involve JS sooner, and clear the roots on shutdown.
Andrew, I think I've mentioned this to you before. Do you think this is feasible and if so what's the best way to proceed? I don't really know how our leak checking works at the moment.
Here's how XPCOM leak checking works: - Classes opt into it by calling MOZ_COUNT_CTOR in their ctor and MOZ_COUNT_DTOR in their dtor, with a class name. (There's a slightly different mechanism for refcounted objects, but it doesn't matter for the present discussion.) nsTraceRefcnt.cpp keeps a table of the number of extant instances of each class. - Very late in shutdown, nsTraceRefcnt::Shutdown() dumps out the table of various classes it saw, how many it saw, and how many remain. There is one file per process, with the pid in the file name. - testing/mochitest/leaks.py is a script run by the test harness that checks to make sure there's a leak log for every process that we created, and it parses the table and prints out an error for every class where we had more than one instance remaining at shutdown. In theory, I think your proposal is reasonable. You'd have to make sure that the test harness registers it as an error if there's a leak, and you'd want it to be non-fatal, so we can do the rest of the checking we do at shut down. There's some loss from not giving information about what exactly leaked, but I think that in practice if you are leaking JS you're going to get a completely useless list of leaking objects anyways. In practice, you may need to debug a wide variety of "leaks" in your shutdown detector that we don't see with the XPCOM leak detector. The problem I am imagining is that some random place roots JS objects, but clears them out at some point in shutdown after when you want to add your check. But maybe it will be okay. Some of these leaks may be actual issues, so the work spent on debugging them won't be wasted, but some will likely be silly issues that you just have to fix to make your leak detector work.
Are there any next steps here? I am happy to disable the talos job on code coverage builds that is perma/failing to reduce the pressure to fix this bug.
(In reply to Andrew McCreight [:mccr8] from comment #2) Andrew, thanks for the comments, that's really useful. Joel, I sill want us to fix this. But do disable the talos job if it's causing a lot of noise.
The bug 1408403 is still failing even after Joel disabled g2 job on ccov builds. :jonco Any idea when this bug is going to be fixed?
:jonco any updates on this?
(In reply to Andrei Ciure[:andrei_ciure_] from comment #6) I'll work on this next.
Assignee: nobody → jcoppeard
Here's what I have so far: XPCOM logs any roots left at shutdown in a similar way to the existing leak tracing and then clears them. I extended the existing leak checking scripts to process these logs too, again in a very similar way. The JS engine now asserts that there are no roots left at shutdown. To test this I wrote a patch to force a leak of a JSString (via a Value in a ModuleScript), which produced the following output: TEST-INFO | Main app process: exit 0 runtests.py | Application ran for: 0:00:43.195848 zombiecheck | Reading PID log: /tmp/tmphtVpxxpidlog ==> process 24368 launched child process 24396 zombiecheck | Checking for orphan process with PID: 24396 Stopping web server Stopping web socket server Stopping ssltunnel TEST-INFO | leakcheck | default process: leak threshold set at 0 bytes TEST-INFO | leakcheck | plugin process: leak threshold set at 0 bytes TEST-INFO | leakcheck | tab process: leak threshold set at 0 bytes TEST-INFO | leakcheck | geckomediaplugin process: leak threshold set at 20000 bytes TEST-INFO | leakcheck | gpu process: leak threshold set at 0 bytes == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 24368 |<----------------Class--------------->|<-----Bytes------>|<----Objects---->| | | Per-Inst Leaked| Total Rem| 0 |TOTAL | 36 56| 1014320 1| 398 |ModuleScript | 56 56| 1 1| nsTraceRefcnt::DumpStatistics: 1394 entries TEST-INFO | leakcheck | default process: leaked 1 ModuleScript 19 ERROR TEST-UNEXPECTED-FAIL | leakcheck | default process: 56 bytes leaked (ModuleScript) 20 ERROR TEST-UNEXPECTED-FAIL | leakcheck | default process: found leak of 1 JS GC things String: 1 runtests.py | Running tests: end. Buffered messages finished SUITE-END | took 44s Try is looking good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0c6d68bef5640d2ffe5be0ab8cd833ce4023529d&group_state=expanded How does this look to you?
Attachment #8930152 - Flags: feedback?(continuation)
Comment on attachment 8930152 [details] [diff] [review] bug1407593-check-js-leaks Review of attachment 8930152 [details] [diff] [review]: ----------------------------------------------------------------- This seems reasonable. Is there some reason to create a separate log rather than reporting the leaked objects to the leak checker through the normal means? Your loop could call MOZ_LOG_CTOR repeatedly instead. Adding an extra file seems like a lot of complexity. ::: dom/workers/RuntimeService.cpp @@ +1012,5 @@ > // The CC is shut down, and the superclass destructor will GC, so make sure > // we don't try to CC again. > mWorkerPrivate = nullptr; > + > + CycleCollectedJSRuntime::Shutdown(cx); Bug 1422316 is also filed on this missing shutdown. I don't know why it is like it is. ::: xpcom/base/nsTraceRefcnt.cpp @@ +1373,5 @@ > + aRuntime->TraceNativeGrayRoots(&tracer); > + > + AutoTraceLogLock lock; > + > + fprintf(gJSLeakLog, "Runtime %p leaked %" PRIu64 " JS GC things\n", It would be nice if this included a descriptive string, too. EG worker or main thread. I don't think we ever log the address of a runtime, so this will only be useful for somebody with a debugger.
Attachment #8930152 - Flags: feedback?(continuation) → feedback+
Attachment #9082666 - Attachment description: Bug 1407593 - Report embedding leaks of JS GC things r=mccr8? → Bug 1407593 - Report embedding leaks of JS GC things r=mccr8,sfink?
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/f2d3df5525bd Report embedding leaks of JS GC things r=sfink,sfink?
Attachment #8930152 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.