Open Bug 928222 Opened 7 years ago Updated 3 years ago

remove nsThreadPool per-thread event queues

Categories

(Core :: XPCOM, defect)

x86_64
Linux
defect
Not set

Tracking

()

REOPENED
mozilla27

People

(Reporter: karlt, Assigned: karlt)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 1 obsolete file)

nsThreadPool doesn't need to use nsIThread for its threads.  The nsIThread
interface is not useful for threads running in an nsThreadPool.
The nsIEventTarget on the nsIThreadPool should be used for dispatching events,
not an interface on the individual threads, and the threads don't need an
nsEventQueue because they use the nsEventQueue on the nsThreadPool.

Shutdown of single event threads is easier than running nested event loops for
nsIThreads, avoiding the multilevel nested event loop situations when several
threads finish and are shutdown.
While the ThreadFunc is running, a nested event loop is still used in Shutdown() in case some consumers might need it and because that is the documented API.

This also simplifies thread creation, avoiding races that could mean there was
temporarily one or more extra threads.
Attachment #818830 - Flags: review?(benjamin)
Comment on attachment 818830 [details] [diff] [review]
part 2: remove nsThreadPool per-thread event queues

I'm going to have to delay reviewing this until Monday, because its multi-threaded code and I can't seem to get my head in the game today :-(
With changes in patch order from bug 926522, this needs a little merging, if anyone wants to apply.  The merged version, applying on top of parts 1 and 2 in bug 926522, is at https://hg.mozilla.org/try/rev/2efd29e87300

On the B2G emulator build, nullptr is '0' I assume, which means
mThreads.IndexOf(nullptr) gives
nsTArray.h:544: error: ISO C++ forbids comparison between pointer and integer

and mThreads.AppendElement(nullptr) gives
nsTArray.h:530: error: invalid conversion from 'int' to 'PRThread*'

I guess replacing nullptr with (nsThreadPool*)0 might be the
simplest option.

https://tbpl.mozilla.org/?tree=Try&rev=3be0734b8a83
Comment on attachment 818830 [details] [diff] [review]
part 2: remove nsThreadPool per-thread event queues

Things to worry about here:

* Does existing code that uses threadpools expect to be able to use NS_GetCurrentThread and then spin an nested event loop waiting on the main thread? That is probably pretty broken anyway, so if this is green on try we're probably ok.

I supposed the nested event loop in nsThreadPool::Shutdown is still necessary because the threadpool tasks may block on the main thread, and so we need to avoid joining the worker thread while there are still tasks running? That kinds sucks, I was hoping to avoid that nested event loop altogether.
Attachment #818830 - Flags: review?(benjamin) → review+
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #4)
> Things to worry about here:
> 
> * Does existing code that uses threadpools expect to be able to use
> NS_GetCurrentThread and then spin an nested event loop waiting on the main
> thread? That is probably pretty broken anyway, so if this is green on try
> we're probably ok.

NS_GetCurrentThread should be fine because getting nsIThreadManager::currentThread will wrap the current thread in an nsIThread if it doesn't have one.  Try is OK with it.

> I supposed the nested event loop in nsThreadPool::Shutdown is still
> necessary because the threadpool tasks may block on the main thread, and so
> we need to avoid joining the worker thread while there are still tasks
> running? That kinds sucks, I was hoping to avoid that nested event loop
> altogether.

Yes, that was my reasoning, and yes it does, but that is the model we use when emptying the nsEventQueues on nsIThreads.

Let me know if you'd like me to add signalling instead and do a try run, but I'm concerned about changing the interface.

Even if the events on the pool threads don't wait for the main thread, existing shutdown() callers may not want to main thread to block until the events have all run.

Anyway, I now see that my assumption about mThreads.Length() being atomic enough is delicate because the length is in mHdr [1].  It works out because nsTArray::RemoveElement() doesn't reallocate to change mHdr, but this is too dependent on the implementation.  I'll change to use the mutex, and get review on those changes.

[1] http://hg.mozilla.org/mozilla-central/annotate/177bf37a49f5/xpcom/glue/nsTArray.h#l370
Attachment #818830 - Attachment description: patch → part 2: remove nsThreadPool per-thread event queues
Attachment #820789 - Flags: review?(benjamin)
Attachment #820789 - Flags: review?(benjamin) → review+
Comment on attachment 820790 [details] [diff] [review]
part 2 fold-in: access mThreads.Length() with the mutex held and cast nullptr for template selector

per IRC, let's use IndexOf<PRThread*> instead of (PRThread*)0
Attachment #820790 - Flags: review?(benjamin) → review+
Blocks: 926522
No longer depends on: 926522
https://hg.mozilla.org/mozilla-central/rev/95ec386c179c
https://hg.mozilla.org/mozilla-central/rev/242bb2279283
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Bug 930495, bug 930479, bug 930428 are reporting crashes on MacOS 10.6 and 10.7 in

libSystem.B.dylib + 0x39d6d
PR_CreateThread
XUL!nsThreadPool::PutEvent(nsIRunnable*) [nsThreadPool.cpp:31e679e3e4d7 : 92 + 0x25]
XUL!nsThreadPool::Dispatch(nsIRunnable*, unsigned int) [nsThreadPool.cpp:31e679e3e4d7 : 259 + 0xa]
XUL!nsAStreamCopier::Start(nsIInputStream*, nsIOutputStream*, nsIEventTarget*, void (*)(void*, tag_nsresult), void*, unsigned int, bool, bool, void (*)(void*, unsigned int)) [nsStreamUtils.cpp:31e679e3e4d7 : 437 + 0xb]
XUL!NS_AsyncCopy(nsIInputStream*, nsIOutputStream*, nsIEventTarget*, nsAsyncCopyMode, unsigned int, void (*)(void*, tag_nsresult), void*, bool, bool, nsISupports**, void (*)(void*, unsigned int)) [nsStreamUtils.cpp:31e679e3e4d7 : 586 + 0x37]
XUL!nsInputStreamTransport::OpenInputStream(unsigned int, unsigned int, unsigned int, nsIInputStream**) [nsStreamTransportService.cpp:31e679e3e4d7 : 109 + 0x40]
XUL!mozilla::net::nsHttpChannel::OpenCacheInputStream(nsICacheEntry*, bool) [nsHttpChannel.cpp:31e679e3e4d7 : 3434 + 0xe]

so backed out part 2:
http://hg.mozilla.org/mozilla-central/rev/5a9ac6fed6ff
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Duplicate of this bug: 930495
Duplicate of this bug: 930479
Duplicate of this bug: 930428
A new thread could sometimes exit before receiving an event if the idle count rose (and theoretically also if the idle time expired).

This patch essentially reverts the behavior on thread creation failure to what it was before attachment 818830 [details] [diff] [review]: the events are left in the queue.

We could empty the queue if there are no threads, but I'm trying to limit how much is changed in this bug.  The return code of nsThreadPool::PutEvent() is currently not returned from Dispatch().

This happens to work around the crashes, but I'll add another patch to avoid them for certain.
Attachment #824402 - Flags: review?(benjamin)
PR_CreateThread writes to objects deleted in PR_JoinThread after the new thread is created:
http://hg.mozilla.org/mozilla-central/annotate/19fd3388c372/nsprpub/pr/src/pthreads/ptthread.c#l500
This lead to crashes when PR_JoinThread was called on a different thread to PR_CreateThread.  Bug 932029 made this a little harder to diagnose.

With PR_UNJOINABLE_THREAD, NSPR ensures the objects are not deleted before PR_CreateThread has finished writing to them.

Updated documentation at https://developer.mozilla.org/en-US/docs/PR_JoinThread$compare?to=484741&from=203306

This patch also includes changes to avoid a potential hang in Shutdown() if called on a non-main thread.
Attachment #824412 - Flags: review?(benjamin)
Sorry about all the revisions.
Perhaps it is easier to look at the complete new version than the differences.
Attachment #818830 - Attachment is obsolete: true
Blocks: 936317
No longer blocks: 936317
Attachment #824402 - Flags: review?(benjamin) → review+
Comment on attachment 824412 [details] [diff] [review]
part 2 fold-in 3: use a detached thread to avoid NSPR create/join races and wake the correct thread during shutdown

So this basically works around an NSPR bug which only occurs if PR_CreateThread and PR_JoinThread are called on different threads but near the same time?

What if, instead of making all of these threads joinable, we made all of the threadpool threads unjoinable, and just used flags/monitors to verify that they weren't doing anything? It seems that would solve the same problem without the weirdness here of another thread whose only job is to shut down other threads.
Flags: needinfo?(karlt)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #18)
> So this basically works around an NSPR bug which only occurs if
> PR_CreateThread and PR_JoinThread are called on different threads but near
> the same time?

Yes, that's correct.

It also dispatches the ShutdownCompleteNotification to the thread that is waiting for an event in Shutdown().  The previous version could get stuck wait for an event on the Shutdown() thread, when there would be no events on that thread.

> What if, instead of making all of these threads joinable, we made all of the
> threadpool threads unjoinable, and just used flags/monitors to verify that
> they weren't doing anything? It seems that would solve the same problem
> without the weirdness here of another thread whose only job is to shut down
> other threads.

I'm sure there are other options here, and I'm happy to look into them.  One awkward thing with unjoinable threads is that we don't know when they have really finished.

However, I don't understand what you are proposing here.  This patch *is* making all threadpool threads unjoinable.  It is not adding another thread, but merely recording which thread called and is still waiting in Shutdown().

The tricky part is that Shutdown() should continue processing events on its thread and so can't simply wait for a monitor notify from the threadpool threads.

I'm happy to find a different solution that keeps joinable threads, if you prefer.
Flags: needinfo?(karlt)
Comment on attachment 824412 [details] [diff] [review]
part 2 fold-in 3: use a detached thread to avoid NSPR create/join races and wake the correct thread during shutdown

I apologize for the delay. Every time I've read this I've gotten confused with the multithreaded code here.

I see why you're keeping mShutdownThread now. It's not clear to me why you're using mShutdownThread as the signal to stop spinning the nested event loop. Couldn't we just break out of the nested event loop if (mThreads.IsEmpty()) ? Then ShutdownCompleteNotification::Run could be an empty method, and I wouldn't have to worry a lot about race conditions on the null-ness of mShutdown thread: I suspect that the current loop could exit early and leave an orphaned ShutdownCompleteNotification.

Style issues: I don't think we like nested class declarations: they are hard or impossible to set breakpoints on in many debuggers; please move it outside the method.

There's a couple if( that need to be un-snuggled.
Attachment #824412 - Flags: review?(benjamin) → review-
Once the work in bug 956899 moves to mfbt, there will be no need to work around nspr bugs.
Depends on: 956899
Do you think this is still worth addressing?
Flags: needinfo?(karlt)
I think this is still worth fixing.

https://bugzilla.mozilla.org/show_bug.cgi?id=956899#c143 implies that moving js/src/threading APIs to mfbt may be difficult, but perhaps using those APIs may still be an option even though they are in js/src.

I'm not likely to finish this off soon myself, though, so would be quite happy if someone would like to take it.
Flags: needinfo?(karlt)
You need to log in before you can comment on or make changes to this bug.