Open Bug 2009073 Opened 2 days ago Updated 1 day ago

nsThreadPool::Dispatch() violate the caller's expection when it returns an error

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

ASSIGNED

People

(Reporter: sinker, Assigned: sinker)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file, 1 obsolete file)

The caller' expects that the task will not be executed if nsThreadPool::Dispatch() return an error. However, it is not true.

nsThreadPool::PutEvent() is where nsThreadPool enqueues an event and creates threads for the thread pool.
With the current implementation, it adds the input event to the event queue mEvents before creating threads.
When it fails to create a new thread it will return an error. However, the input event is already in the event queue and not removed.
If there is already one or more threads in the pool, even fail to create a new thread, the existing threads will eventually
run the given event.

Previously, PutEvent() added events to the queue before attempting to
create threads. If thread creation failed after the event was queued,
the function would return an error, but existing threads would still
execute the event. This violates the caller's expectation: when
PutEvent() returns an error, the event should not be executed.

The fix restructures PutEvent() to ensure threads exist before adding
events to the queue:

  1. Handle NS_DISPATCH_AT_END early (special case that doesn't need
    thread creation or notification)
  2. Check if thread creation is needed (no idle threads, below limit)
  3. Create thread if necessary
  4. Only add event to queue if we have at least one thread
  5. Wake up idle thread after event is queued

On thread creation failure:

  • If no threads exist: Return error without queuing the event (critical)
  • If threads exist: Continue without error, existing threads will handle
    the event (non-critical performance issue)

This ensures the event is only queued when we have threads to process it,
maintaining the contract that error returns mean the event won't execute.

Assignee: nobody → thinker.li
Status: NEW → ASSIGNED
Assignee: thinker.li → nobody
Blocks: 1998669
Status: ASSIGNED → NEW
Assignee: nobody → thinker.li
Status: NEW → ASSIGNED

The caller' expects that the task will not be executed if nsThreadPool::Dispatch() return an error.

I am a bit worried that the caller's expectation is most of the times that Dispatch "just works" and that they do not really think about what to do on failure or assume "that can only fail during shutdown, so it's fine".

The most common user of thread pool dispatch is probably TaskQueue, which just forwards the error on dispatch.

Looking a bit around how people dispatch to TaskQueue, the best of error handling we can expect from callers is probably a surrounding MOZ_ALWAYS_SUCCEEDS or sometimes an explicit MOZ_DIAGNOSTIC_ASSERT(NS_SUCCEEDED(rv)); (or JS exceptions if we come through JS). However (void) seems to not be uncommon, too.

So I am fine with being more resilient when we can, but if a pool has no thread and can't get one from the OS, we should probably rather always crash than return an error (and then maybe crash, maybe hang).

(In reply to Jens Stutte [:jstutte] from comment #2)

So I am fine with being more resilient when we can, but if a pool has no thread and can't get one from the OS, we should probably rather always crash than return an error (and then maybe crash, maybe hang).

We should be able to degrade the service.
Since this api is to dispatch a task to a pool of threads, including the caller's thread in the pool for temporary is reasonable for some cases if not all cases. For my case, I dispatch a task to a thread pool to avoid IO from blocking the current thread. If the thread pool fails to have any thread, doing it in the caller's thread is acceptable. What do you think to have a function like DispatchMayCurrent()? This function could run a task on the current/caller's thread if it fails to create any thread.

Flags: needinfo?(jstutte)

I would probably first want to understand why we need to bother and how frequently this actually happens. We could start from a diagnostic assertion, maybe? To get some signal from Nightly.

Most thread pools in Firefox are probably mandatory to work in order to guarantee a working browser. If we create too many threads in general, we should audit the pool settings, as a few pools could eat many threads and not leave enough for others. But if the user quota for threads is just too low, we cannot really fight against it. We then may even have problems with normal thread creation, not only pools.

A DispatchMayCurrent (or maybe such a dispatch flag) sounds like something we should only do if we learn from signals from the wild that we really have an issue here. And in that case we should probably do more than just that, like automatic balancing of the total numbers of threads we use, but that would be a bigger effort, probably.

For my case, I dispatch a task to a thread pool to avoid IO from blocking the current thread.

That sounds like a classic use case for the NS_DispatchBackgroundTask with NS_DISPATCH_EVENT_MAY_BLOCK. If that fails to work, we are probably screwed enough to crash or hang, for sure.

Flags: needinfo?(jstutte)
Attachment #9536249 - Attachment description: Bug 2009073 - Fix nsThreadPool::PutEvent to prevent event execution after error return. r?#xpcom-reviewers → Bug 2009073 - Crash instead of returning error when thread creation fails in nsThreadPool::PutEvent(). r?#xpcom-reviewers
Pushed by thinker.li@gmail.com: https://github.com/mozilla-firefox/firefox/commit/c04c964cc11d https://hg.mozilla.org/integration/autoland/rev/2a3396166ffb Crash instead of returning error when thread creation fails in nsThreadPool::PutEvent(). r=xpcom-reviewers,jstutte
Pushed by csabou@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/90ce06af4580 https://hg.mozilla.org/integration/autoland/rev/13aa00007318 Revert "Bug 2009073 - Crash instead of returning error when thread creation fails in nsThreadPool::PutEvent(). r=xpcom-reviewers,jstutte" as requested by the developer.

When nsThreadPool::PutEvent() fails to create a new thread, the behavior
should depend on whether the pool already has existing threads:

  1. If the pool is empty (mThreads.Count() == 0):

    • This is a fatal condition because there are no threads available to
      process the queued event
    • Crash with MOZ_CRASH to make this failure explicit and debuggable
  2. If the pool is not empty:

    • The existing threads can still handle the queued event
    • Return NS_OK to allow the caller to continue normally
    • This provides graceful degradation when thread creation fails but
      the pool can still function

This change makes nsThreadPool more robust by only treating thread
creation failure as fatal when it truly prevents event processing.

Attachment #9536708 - Attachment is obsolete: true
Attachment #9536249 - Attachment description: Bug 2009073 - Crash instead of returning error when thread creation fails in nsThreadPool::PutEvent(). r?#xpcom-reviewers → Bug 2009073 - Only crash when thread creation fails in empty nsThreadPool. r?#xpcom-reviewers
Blocks: 2004022
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: