Open Bug 988872 Opened 10 years ago Updated 2 years ago

DOMWorker threads are frequently not ended by the end of xpcom-shutdown-threads

Categories

(Core :: DOM: Workers, defect, P3)

defect

Tracking

()

People

(Reporter: jesup, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

With the patch from bug 988464, we frequently see 1-3 DOMWorker threads still active after xpcom-shutdown-threads (which is after xpcom-shutdown).  One run with an earlier version of the patch:

https://tbpl.mozilla.org/?tree=Try&rev=8ac1c136a95f

See https://wiki.mozilla.org/XPCOM_Shutdown 

xpcom-shutdown-threads: "Any non-main-thread event queues should be finished by the module which owns them. All threads running XPCOM code should be joined. Any special services that might be required for component loader shutdown should be obtained and cached by the end of this notification. Initial garbage collection should be performed at this stage."
I'm kinda skeptical... We spin the event loop waiting for all worker threads in response to the 'xpcom-shutdown-threads' notification already:

https://mxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2292

In fact it's a frequent intermittent orange to hang on shutdown here.
Ah, it looks like we were letting the thread manager take care of idle threads if we don't have any live workers.
A quick glance at ::Shutdown() and ::ShutdownIdleThreads() implies to me that this behavior hasn't changed, and so we can expect this may still be an issue once we land NS_ASSERTIONs on nsThread leakage.  Kyle, is there someone available to deal with this, or was it fixed and just wasn't obvious to a trivial once-over?
Flags: needinfo?(khuey)
Doesn't http://hg.mozilla.org/mozilla-central/annotate/d590b9601ba8/dom/workers/RuntimeService.cpp#l2107 take care of this?  Are we ending up in a situation where we have no live workers but do have idle threads, so we don't execute that code?
Flags: needinfo?(khuey)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4)
> Doesn't
> http://hg.mozilla.org/mozilla-central/annotate/d590b9601ba8/dom/workers/
> RuntimeService.cpp#l2107 take care of this?  Are we ending up in a situation
> where we have no live workers but do have idle threads, so we don't execute
> that code?

Well, that appears unchanged (more-or-less) from when bent wrote comment 2, so my bet would be that yes, we have no live workers, but do/did have idle ones (at least when the phase of the moon is right).  We'll see if this repeats with current code.  I'll likely throw a bunch of retriggers at some of the tests to probe for new intermittents before landing.
Adjusting the condition to enter that block seems like a no-brainer.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> Adjusting the condition to enter that block seems like a no-brainer.

Yeah, though I'm not sure what bent's comment really meant....  I don't see that the test to enter that block is based on active workers, just top-level workers.  However, I'll also note that the 

    mDomainMap.EnumerateRead(AddAllTopLevelWorkersToArray, &workers);
    if (!workers.IsEmpty()) {

appears to do little useful - workers is never used again, and all we do in there is clear out mIdleThreadArray & shut-them-down, and spin the event loop waiting for mDomainMap to be empty (so let's hope they're all in mIdleThreadArray).

Not being sure of how the top-level-worker part is used, it seems like it could skip this if there were no top-level workers but there *are* idle workers... if that's really possible, then yes we should adjust it.  However, the "spin until mDomainMap is empty" implies that all workers *must* be in the idle array, or that if somehow they're not, shutting down the ones in the idle array will clear them out (or maybe that they've been told to shutdown before Cleanup() was called, and we just need to spin until they're dead).  I note that on xpcom-shutdown if calls Shutdown(), which does Kill() on all top-level workers - however I don't know what Kill() on a worker does exactly (move to idle?  cause it to die and send a "remove-me-from-worker-lists" event to main?)

I suppose we could also add an assertion that mDomainMap is empty after this part of Cleanup, or an assertion that mIdleThreadArray is empty after this (or both).  Cleanup() is called from xpcom-thread-shutdown.
FYI, adjusting the condition to enter the block had no affect (which matches the analysis in comment 7).  
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7de1f0c0c1f4
DOM Worker threads are leaking in bunches of xpcshell tests.  They didn't leak when I ran those specific tests locally in xpcshell-test

Kyle, any other ideas?
Flags: needinfo?(khuey)
Other than "debug it and see what's going on"?  no.
Flags: needinfo?(khuey)
Blocks: 988464
No longer depends on: 988464
I tried a local xpcshell testrun with the if (!workers.IsEmpty()) short-circuited (to "if (1)"), and didn't see any leaks in xpcshell tests (though I prefer to use a better solution).  (line 1986 now)

Running a Try.
DOM Workers still leak in almost every mochitest - 1 to 8 of them depending on timing.
I suspect there are DOM Worker threads that aren't idle already at shutdown (and shutdown via mIdleThreadArray being Shutdown()), and while they're probably sent a Kill(), they may not end up calling NoteIdleThread() and calling Shutdown() before the world ends, and they're still alive at threadmanager-shutdown time.  

Probably it should use a ShutdownBlocker to allow all the DOM Worker threads to be cleaned up properly; once they've all been ShutDown(), we can release the blocker.
Priority: -- → P3
Depends on: 1434618
See Also: → 705178
Depends on: 1435343

"PERF" key word?

Depends on: 1633342
No longer depends on: 1633342
Depends on: 1633342
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.