Closed Bug 855923 Opened 12 years ago Closed 12 years ago

Investigate whether ~100 image decoding threads + beginTimePeriod(1) cause main thread jank

Categories

(Core :: Graphics: ImageLib, defect)

x86_64
Windows 7
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: mayhemer, Assigned: joe)

Details

(Keywords: perf)

Attachments

(1 file)

This may be a red herring, but worth to file anyway IMO. STR: - go to pinterest.com/all - scroll all the way down to load more and more images - UI gets stuck for really long time (seconds) (in optimized build) Optionally, to speed up your network: - apply network speedup patch(s) from bug 819734 - turn network pipelining on One of callstacks on the main thread during the hang (pause in VC++): nss3.dll!_PR_MD_WAIT_CV(_MDCVar * cv=0x13938694, _MDLock * lock=0x1393a76c, unsigned int timeout=4294967295) Line 250 C nss3.dll!_PR_WaitCondVar(PRThread * thread=0x020378f8, PRCondVar * cvar=0x13938620, PRLock * lock=0x1393a750, unsigned int timeout=4294967295) Line 173 C nss3.dll!PR_Wait(PRMonitor * mon=0x156b58b0, unsigned int ticks=4294967295) Line 152 + 0x1a bytes C xul.dll!nsThreadStartupEvent::Wait() Line 181 + 0xb bytes C++ xul.dll!nsThread::Init() Line 349 + 0x7 bytes C++ xul.dll!nsThreadManager::NewThread(unsigned int creationFlags=0, unsigned int stackSize=0, nsIThread * * result=0x0023d3b8) Line 215 + 0x7 bytes C++ xul.dll!nsThreadPool::PutEvent(nsIRunnable * event=0x156b3648) Line 90 C++ xul.dll!nsThreadPool::Dispatch(nsIRunnable * event=0x156b3648, unsigned int flags=0) Line 232 C++ > xul.dll!mozilla::image::RasterImage::DecodePool::RequestDecode(mozilla::image::RasterImage * aImg=0x0fef6fe8) Line 3635 + 0x9 bytes C++ xul.dll!mozilla::image::RasterImage::DecodeDoneWorker::Run() Line 3925 C++ xul.dll!nsThread::ProcessNextEvent(bool mayWait=true, bool * result=0x0023d447) Line 627 + 0x6 bytes C++ xul.dll!NS_ProcessNextEvent(nsIThread * thread=0x010407f0, bool mayWait=true) Line 238 + 0xd bytes C++ There were about 125 threads in the whole application where about 80 were image decoding threads. The biggest thread pool number were even more scary: 39571 ! Why so many threads? Is it possible that combination of fast thread switching setup by beginTimePeriod(1) and so many threads + the sync code to wait for each start up could cause a significant UI jank?
We definitely should bound the number of image decoding threads!
Assignee: nobody → joe
3516 if (gDecodingThreadLimit <= 0) { 3517 mThreadPool->SetThreadLimit(std::max(PR_GetNumberOfProcessors() - 1, 1)); 3518 } else { 3519 mThreadPool->SetThreadLimit(static_cast<uint32_t>(gDecodingThreadLimit)); 3520 } Is that not sufficient? Maybe there's a bug in the thread pool? Or maybe I should set a custom idle thread limit too?
gDecodingThreadLimit is by default -1, so you set the limit to the number of cores - 1. I checked PR_GetNumberOfProcessors() returns a sane number. (BTW, when it returns 0, we pass 0xffffffff to SetThreadLimit.) idleThreadTimeout is by default 60 seconds and you don't change it. Problem is that you leave idleThreadLimit at 1. Threads are then dying almost instantly, even though timeout is those 60 seconds. I think it's wasting. Use mThreadPool->SetIdleThreadLimit() to set the number of idle thread to leave, I've played with setting it to the same number as the overall thread limit. Works well. However, I still don't understand why I've seen active 125 threads at a single time... BTW, I cannot reproduce it!
Attached patch set idle limitSplinter Review
I originally had this, but I thought it would be unnecessary. I guess it isn't!
Attachment #739377 - Flags: review?(seth)
Attachment #739377 - Flags: feedback?(honzab.moz)
Comment on attachment 739377 [details] [diff] [review] set idle limit Review of attachment 739377 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, though it seems like this bug overall still needs more investigation.
Attachment #739377 - Flags: review?(seth) → review+
Comment on attachment 739377 [details] [diff] [review] set idle limit Review of attachment 739377 [details] [diff] [review]: ----------------------------------------------------------------- ::: image/src/RasterImage.cpp @@ +3521,2 @@ > if (gDecodingThreadLimit <= 0) { > + limit = std::max(PR_GetNumberOfProcessors() - 1, 1); Could we also fix a case PR_GetNumberOfProcessors() returns a 0 or even a negative number? Like limit = std::max(PR_GetNumberOfProcessors(), 2) - 1 or so...
Attachment #739377 - Flags: feedback?(honzab.moz) → feedback+
And BTW, could you please push also a talos-only try run whether it somehow affects performance?
(In reply to Honza Bambas (:mayhemer) from comment #7) > Comment on attachment 739377 [details] [diff] [review] > set idle limit > > Review of attachment 739377 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: image/src/RasterImage.cpp > @@ +3521,2 @@ > > if (gDecodingThreadLimit <= 0) { > > + limit = std::max(PR_GetNumberOfProcessors() - 1, 1); > > Could we also fix a case PR_GetNumberOfProcessors() returns a 0 or even a > negative number? > > Like limit = std::max(PR_GetNumberOfProcessors(), 2) - 1 or so... Sure, but is this something we really need to worry about? (In reply to Honza Bambas (:mayhemer) from comment #8) > And BTW, could you please push also a talos-only try run whether it somehow > affects performance? https://tbpl.mozilla.org/?tree=Try&rev=69ebe52158f6
I'm pushing this as-is; Honza, if you ever see this again, please report another bug! https://hg.mozilla.org/integration/mozilla-inbound/rev/bb028a52ffe3
This appears to cause a regression in the Tp5 Optimized "Private Bytes" memory counter. This regression showed up on Try, but it's not visible in the compare-talos link above because compare-talos doesn't know about Ubuntu: https://bitbucket.org/mconnor/compare-talos/issue/15/add-ubuntu-platforms I'm not sure why this would cause an Ubuntu-only regression (it seems to actually *improve* the same benchmark on Fedora). I retriggered some old changesets to verify whether it's actually caused by a code change or an infrastructure change. There was another suspicious Ubuntu-only "Private Bytes" regression shortly after the Ubuntu platform was first deployed, earlier this month...
It would make some sense that we're using more private bytes, since we actually have more threads in the common case.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Honza, do you think I should ask for this to be uplifted?
Flags: needinfo?(honzab.moz)
(In reply to Joe Drew (:JOEDREW! \o/) from comment #15) > Honza, do you think I should ask for this to be uplifted? Maybe b2g could benefit? I have nothing against, but only if there is at least some measurable benefit. Up to you, I'll be glad to see this in sooner :)
Flags: needinfo?(honzab.moz)
Heh, I'll just leave it as is unless we hear complaints. :)
Reading bug 856670 comment 38, it sounds like we probably should uplift this to Aurora. There's a stack overflow that's the #2 topcrash there, and there's some reason to believe that the patch in this bug may fix the problem.
Wow, I'd be surprised, but... sure.
Comment on attachment 739377 [details] [diff] [review] set idle limit [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 716140 User impact if declined: Possibly large numbers of threads, maybe some jank or crashes? Testing completed (on m-c, etc.): On m-c for a few days Risk to taking this patch (and alternatives if risky): Not terribly risky. String or IDL/UUID changes made by this patch: none
Attachment #739377 - Flags: approval-mozilla-aurora?
I spoke too soon. It looks like bug 854803 is what needed to get backported. Sorry.
Please confirm comment 21 by removing the approval nomination.
Comment on attachment 739377 [details] [diff] [review] set idle limit I'll go with what Bill says :)
Attachment #739377 - Flags: approval-mozilla-aurora?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: