Closed
Bug 855923
Opened 12 years ago
Closed 12 years ago
Investigate whether ~100 image decoding threads + beginTimePeriod(1) cause main thread jank
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
()
RESOLVED
FIXED
mozilla23
People
(Reporter: mayhemer, Assigned: joe)
Details
(Keywords: perf)
Attachments
(1 file)
1.17 KB,
patch
|
seth
:
review+
mayhemer
:
feedback+
|
Details | Diff | Splinter Review |
This may be a red herring, but worth to file anyway IMO.
STR:
- go to pinterest.com/all
- scroll all the way down to load more and more images
- UI gets stuck for really long time (seconds) (in optimized build)
Optionally, to speed up your network:
- apply network speedup patch(s) from bug 819734
- turn network pipelining on
One of callstacks on the main thread during the hang (pause in VC++):
nss3.dll!_PR_MD_WAIT_CV(_MDCVar * cv=0x13938694, _MDLock * lock=0x1393a76c, unsigned int timeout=4294967295) Line 250 C
nss3.dll!_PR_WaitCondVar(PRThread * thread=0x020378f8, PRCondVar * cvar=0x13938620, PRLock * lock=0x1393a750, unsigned int timeout=4294967295) Line 173 C
nss3.dll!PR_Wait(PRMonitor * mon=0x156b58b0, unsigned int ticks=4294967295) Line 152 + 0x1a bytes C
xul.dll!nsThreadStartupEvent::Wait() Line 181 + 0xb bytes C++
xul.dll!nsThread::Init() Line 349 + 0x7 bytes C++
xul.dll!nsThreadManager::NewThread(unsigned int creationFlags=0, unsigned int stackSize=0, nsIThread * * result=0x0023d3b8) Line 215 + 0x7 bytes C++
xul.dll!nsThreadPool::PutEvent(nsIRunnable * event=0x156b3648) Line 90 C++
xul.dll!nsThreadPool::Dispatch(nsIRunnable * event=0x156b3648, unsigned int flags=0) Line 232 C++
> xul.dll!mozilla::image::RasterImage::DecodePool::RequestDecode(mozilla::image::RasterImage * aImg=0x0fef6fe8) Line 3635 + 0x9 bytes C++
xul.dll!mozilla::image::RasterImage::DecodeDoneWorker::Run() Line 3925 C++
xul.dll!nsThread::ProcessNextEvent(bool mayWait=true, bool * result=0x0023d447) Line 627 + 0x6 bytes C++
xul.dll!NS_ProcessNextEvent(nsIThread * thread=0x010407f0, bool mayWait=true) Line 238 + 0xd bytes C++
There were about 125 threads in the whole application where about 80 were image decoding threads. The biggest thread pool number were even more scary: 39571 !
Why so many threads?
Is it possible that combination of fast thread switching setup by beginTimePeriod(1) and so many threads + the sync code to wait for each start up could cause a significant UI jank?
We definitely should bound the number of image decoding threads!
Assignee: nobody → joe
Assignee | ||
Comment 2•12 years ago
|
||
3516 if (gDecodingThreadLimit <= 0) {
3517 mThreadPool->SetThreadLimit(std::max(PR_GetNumberOfProcessors() - 1, 1));
3518 } else {
3519 mThreadPool->SetThreadLimit(static_cast<uint32_t>(gDecodingThreadLimit));
3520 }
Is that not sufficient? Maybe there's a bug in the thread pool? Or maybe I should set a custom idle thread limit too?
Reporter | ||
Comment 3•12 years ago
|
||
gDecodingThreadLimit is by default -1, so you set the limit to the number of cores - 1. I checked PR_GetNumberOfProcessors() returns a sane number. (BTW, when it returns 0, we pass 0xffffffff to SetThreadLimit.)
idleThreadTimeout is by default 60 seconds and you don't change it.
Problem is that you leave idleThreadLimit at 1. Threads are then dying almost instantly, even though timeout is those 60 seconds. I think it's wasting.
Use mThreadPool->SetIdleThreadLimit() to set the number of idle thread to leave, I've played with setting it to the same number as the overall thread limit. Works well.
However, I still don't understand why I've seen active 125 threads at a single time... BTW, I cannot reproduce it!
Assignee | ||
Comment 4•12 years ago
|
||
I originally had this, but I thought it would be unnecessary. I guess it isn't!
Attachment #739377 -
Flags: review?(seth)
Attachment #739377 -
Flags: feedback?(honzab.moz)
Assignee | ||
Comment 5•12 years ago
|
||
Comment 6•12 years ago
|
||
Comment on attachment 739377 [details] [diff] [review]
set idle limit
Review of attachment 739377 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good, though it seems like this bug overall still needs more investigation.
Attachment #739377 -
Flags: review?(seth) → review+
Reporter | ||
Comment 7•12 years ago
|
||
Comment on attachment 739377 [details] [diff] [review]
set idle limit
Review of attachment 739377 [details] [diff] [review]:
-----------------------------------------------------------------
::: image/src/RasterImage.cpp
@@ +3521,2 @@
> if (gDecodingThreadLimit <= 0) {
> + limit = std::max(PR_GetNumberOfProcessors() - 1, 1);
Could we also fix a case PR_GetNumberOfProcessors() returns a 0 or even a negative number?
Like limit = std::max(PR_GetNumberOfProcessors(), 2) - 1 or so...
Attachment #739377 -
Flags: feedback?(honzab.moz) → feedback+
Reporter | ||
Comment 8•12 years ago
|
||
And BTW, could you please push also a talos-only try run whether it somehow affects performance?
Assignee | ||
Comment 9•12 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #7)
> Comment on attachment 739377 [details] [diff] [review]
> set idle limit
>
> Review of attachment 739377 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: image/src/RasterImage.cpp
> @@ +3521,2 @@
> > if (gDecodingThreadLimit <= 0) {
> > + limit = std::max(PR_GetNumberOfProcessors() - 1, 1);
>
> Could we also fix a case PR_GetNumberOfProcessors() returns a 0 or even a
> negative number?
>
> Like limit = std::max(PR_GetNumberOfProcessors(), 2) - 1 or so...
Sure, but is this something we really need to worry about?
(In reply to Honza Bambas (:mayhemer) from comment #8)
> And BTW, could you please push also a talos-only try run whether it somehow
> affects performance?
https://tbpl.mozilla.org/?tree=Try&rev=69ebe52158f6
Assignee | ||
Comment 10•12 years ago
|
||
This doesn't seem to have affected performance much: http://perf.snarkfest.net/compare-talos/index.html?oldRevs=7712e67a7c1e,5bc732a49eae&newRev=69ebe52158f6&submit=true
Assignee | ||
Comment 11•12 years ago
|
||
I'm pushing this as-is; Honza, if you ever see this again, please report another bug!
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb028a52ffe3
Comment 12•12 years ago
|
||
This appears to cause a regression in the Tp5 Optimized "Private Bytes" memory counter. This regression showed up on Try, but it's not visible in the compare-talos link above because compare-talos doesn't know about Ubuntu:
https://bitbucket.org/mconnor/compare-talos/issue/15/add-ubuntu-platforms
I'm not sure why this would cause an Ubuntu-only regression (it seems to actually *improve* the same benchmark on Fedora). I retriggered some old changesets to verify whether it's actually caused by a code change or an infrastructure change. There was another suspicious Ubuntu-only "Private Bytes" regression shortly after the Ubuntu platform was first deployed, earlier this month...
Assignee | ||
Comment 13•12 years ago
|
||
It would make some sense that we're using more private bytes, since we actually have more threads in the common case.
Comment 14•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Assignee | ||
Comment 15•12 years ago
|
||
Honza, do you think I should ask for this to be uplifted?
Flags: needinfo?(honzab.moz)
Reporter | ||
Comment 16•12 years ago
|
||
(In reply to Joe Drew (:JOEDREW! \o/) from comment #15)
> Honza, do you think I should ask for this to be uplifted?
Maybe b2g could benefit? I have nothing against, but only if there is at least some measurable benefit. Up to you, I'll be glad to see this in sooner :)
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 17•12 years ago
|
||
Heh, I'll just leave it as is unless we hear complaints. :)
Reading bug 856670 comment 38, it sounds like we probably should uplift this to Aurora. There's a stack overflow that's the #2 topcrash there, and there's some reason to believe that the patch in this bug may fix the problem.
Assignee | ||
Comment 19•12 years ago
|
||
Wow, I'd be surprised, but... sure.
Assignee | ||
Comment 20•12 years ago
|
||
Comment on attachment 739377 [details] [diff] [review]
set idle limit
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 716140
User impact if declined: Possibly large numbers of threads, maybe some jank or crashes?
Testing completed (on m-c, etc.): On m-c for a few days
Risk to taking this patch (and alternatives if risky): Not terribly risky.
String or IDL/UUID changes made by this patch: none
Attachment #739377 -
Flags: approval-mozilla-aurora?
I spoke too soon. It looks like bug 854803 is what needed to get backported. Sorry.
Comment 22•12 years ago
|
||
Please confirm comment 21 by removing the approval nomination.
Assignee | ||
Comment 23•12 years ago
|
||
Comment on attachment 739377 [details] [diff] [review]
set idle limit
I'll go with what Bill says :)
Attachment #739377 -
Flags: approval-mozilla-aurora?
You need to log in
before you can comment on or make changes to this bug.
Description
•