Image decoding threadpool size seems quite large

RESOLVED FIXED in Firefox 60

Status

()

enhancement
P3
normal
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: bzbarsky, Assigned: aosmond)

Tracking

(Depends on 1 bug, Blocks 1 bug)

unspecified
mozilla60
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox60 fixed)

Details

(Whiteboard: [MemShrink][gfx-noted], crash signature)

Attachments

(4 attachments)

It looks like our image decoding threadpool size is set to number of logical CPUs minus 1, _and_ preallocates all the threads.   So in my case, on a 4-core maching, I have 7 threads here.  On an 8-core machine I have 15 threads.

Do we ever actually end up using 15 threads effectively here?  Especially in a multiprocess world, with various other things contending for CPU timeslices across all the processes?

I don't think we have any other threadpools that have this sort of behavior....

This came up in bug https://bugzilla.mozilla.org/show_bug.cgi?id=1436179 because the memory overhead of those 15 threads ends up being about 600KB just for the profiler metadata.
Whiteboard: [MemShrink]
Seems like at a minimum we should let the threadpool deallocate threads down to some reasonable number of idle threads.
I imagine the typical load on the decoder threads is highly bursty; during pageloads it will spike to use (all?) available threads on many pages.  IIRC we don't decode all images for a page (especially if large) until they're made visible (or perhaps about to), so scrolling and the like can cause other bursts.  I think a max of all cores isn't *unreasonable*, though you can argue optimality.  Using all cores might limit desktop (or laptop) frequency maximums, though I don't see that as a major concern.  

Given the burstiness, it might make sense to allow the pool to keep some threads around (argue as to how many), and keep others temporarily (N seconds?)   However, anything that drops the threads will cause us to spend cycles (and time) recreating them repeatedly, hurting our perf and pageload latency (a little) and out power use (a very little).  These hits might be minimal, especially if we keep (say 4) threads alive, but requires checking.
(Assignee)

Comment 3

a year ago
I do have some patches that rework our threading a bit. One part spawns the one decoder thread always (and release asserts its creation) and the remaining threads on demand (if there are pending items in the queue, and we have capacity in our pool, attempt to spawn another). It never releases threads from the pool however. My own local testing suggests that it will save us from spawning more than 1 thread in the parent process (since it does limited decoding...) and 1/2 to 2/3s of the threads get spawned for content processes on the new tab page. I need to do some testing on an optimized build to see if that changes things.
(Assignee)

Updated

a year ago
Assignee: nobody → aosmond
Reducing it in the Master process would be a big win (doubly so until we can get the Profiler to stop allocating 32K+ per thread, just in case).

You might want to throw a pref in for how many to keep alive, and there definitely should be *some* hysteresis to dropping threads to avoid churn.  I'd also suggest playing with these parameters a timing something like page-down or End on heavy-image pages.

Things like speedometer (and probably talos page-load tests) won't show any issues here, or won't show them much, so other ideas are welcome.
(Assignee)

Comment 5

a year ago
This patch makes us spawn a new thread when we have more tasks than idle threads. It will always spawn one at the beginning.
Attachment #8950325 - Flags: review?(tnikkel)
(Assignee)

Comment 6

a year ago
This patch will shutdown threads after a configurable timeout. I set the default to 10 minutes, although that was not chosen for any particular reason other than "long." It will keep at least half of the maximum number of threads, rounded up -- only the excess threads can be shutdown due to being idle.
Attachment #8950327 - Flags: review?(tnikkel)
(Assignee)

Comment 7

a year ago
Not mandatory, just the FILO task ordering always seemed strange to me vs FIFO?
Attachment #8950328 - Flags: review?(tnikkel)
(Assignee)

Comment 9

a year ago
There is a related crash (see signature) where we cannot create all the desired threads on startup. These patches will fix the crash most likely (unless it is the first thread we fail to spawn), albeit incidentally. If we get hammered with requests, it will try to spawn a new thread and fail later -- and it will keep retrying to spawn that thread as long as the queue depth is greater than the number of idle threads. There will clearly be a runtime cost to this, but maybe I'm worrying over nothing, as a system which cannot spawn a thread will probably crash due to OOM or something? :)
Crash Signature: mozilla::image::DecodePool::DecodePool
Priority: -- → P3
Whiteboard: [MemShrink] → [MemShrink][gfx-noted]
(Assignee)

Updated

a year ago
Crash Signature: mozilla::image::DecodePool::DecodePool → [@ mozilla::image::DecodePool::DecodePool ]
Comment on attachment 8950325 [details] [diff] [review]
0001-Bug-1436247-Part-1.-Spawn-image-decoder-threads-on-d.patch, v1

Review of attachment 8950325 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/DecodePool.cpp
@@ +63,4 @@
>      , mShuttingDown(false)
> +  {
> +    MonitorAutoLock lock(mMonitor);
> +    bool rv = CreateThread();

Usually rv is for nsresult I thought? So just |success| would be okay here.

@@ +110,5 @@
>        // Drop any new work on the floor if we're shutting down.
>        return;
>      }
>  
> +    // If there are pending tasks, create more workers if and only if we have

It seems like this block would make more sense after we append to the queue? Or am I missing something?
Attachment #8950325 - Flags: review?(tnikkel) → review+
Attachment #8950329 - Flags: review?(tnikkel) → review+
Attachment #8950328 - Flags: review?(tnikkel) → review+
Attachment #8950327 - Flags: review?(tnikkel) → review+
(Assignee)

Comment 11

a year ago
(In reply to Timothy Nikkel (:tnikkel) from comment #10)
> Comment on attachment 8950325 [details] [diff] [review]
> 0001-Bug-1436247-Part-1.-Spawn-image-decoder-threads-on-d.patch, v1
> 
> Review of attachment 8950325 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: image/DecodePool.cpp
> @@ +63,4 @@
> >      , mShuttingDown(false)
> > +  {
> > +    MonitorAutoLock lock(mMonitor);
> > +    bool rv = CreateThread();
> 
> Usually rv is for nsresult I thought? So just |success| would be okay here.
> 

Yes, my bad. Will do.

> @@ +110,5 @@
> >        // Drop any new work on the floor if we're shutting down.
> >        return;
> >      }
> >  
> > +    // If there are pending tasks, create more workers if and only if we have
> 
> It seems like this block would make more sense after we append to the queue?
> Or am I missing something?

You are correct. What you are missing was how the check to decide to spawn a thread changed :). I will move it.

Comment 12

a year ago
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c32ead4e3525
Part 1. Spawn image decoder threads on demand, rather than at startup. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/3650631487c7
Part 2. Shutdown idle image decoder threads after the configured timeout. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ce2bfe462a0
Part 3. Process image decoding tasks in a more predictable order. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/858d629f761d
Part 4. Fix image/DecodePool.h inclusions. r=tnikkel
(Assignee)

Updated

a year ago
Depends on: 1421150
(Assignee)

Comment 14

a year ago
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=43f4fd3a4b65c5ee81b1a41fe518472f25b35055

This suggests part 3 is the problematic piece. It is interesting because I guess if you are a test, and you only have one image, it *is* to your advantage that the most recently requested image is decoded first...

I'm going to try landing parts 1, 2 and 4, and consider landing part 3 in the future once I get other machinery in place / tests fixed.
Flags: needinfo?(aosmond)

Comment 15

a year ago
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7c913713301f
Part 1. Spawn image decoder threads on demand, rather than at startup. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/e65b7fee0060
Part 2. Shutdown idle image decoder threads after the configured timeout. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/38557fb9f111
Part 3. Fix image/DecodePool.h inclusions. r=tnikkel

Comment 16

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/7c913713301f
https://hg.mozilla.org/mozilla-central/rev/e65b7fee0060
https://hg.mozilla.org/mozilla-central/rev/38557fb9f111
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
(Assignee)

Comment 17

a year ago
I'm a little worried here where I'm only seeing one image decoder thread on the latest nightly in my content processes. Not at all what I saw during my local dev testing. I need to investigate further.
Flags: needinfo?(aosmond)
(Assignee)

Comment 18

a year ago
My local builds (debug and optimized) have the expected behaviour. Maybe it is related to pgo.
(Assignee)

Comment 19

a year ago
User error. I had changed image.multithreaded_decoding.limit to 1 for testing in my nightly profile at some point, and completely forgot that I did so.
Flags: needinfo?(aosmond)
You need to log in before you can comment on or make changes to this bug.