855923 - Investigate whether ~100 image decoding threads + beginTimePeriod(1) cause main thread jank

Reporter

Description

•

12 years ago

This may be a red herring, but worth to file anyway IMO. STR: - go to pinterest.com/all - scroll all the way down to load more and more images - UI gets stuck for really long time (seconds) (in optimized build) Optionally, to speed up your network: - apply network speedup patch(s) from bug 819734 - turn network pipelining on One of callstacks on the main thread during the hang (pause in VC++): nss3.dll!_PR_MD_WAIT_CV(_MDCVar * cv=0x13938694, _MDLock * lock=0x1393a76c, unsigned int timeout=4294967295) Line 250 C nss3.dll!_PR_WaitCondVar(PRThread * thread=0x020378f8, PRCondVar * cvar=0x13938620, PRLock * lock=0x1393a750, unsigned int timeout=4294967295) Line 173 C nss3.dll!PR_Wait(PRMonitor * mon=0x156b58b0, unsigned int ticks=4294967295) Line 152 + 0x1a bytes C xul.dll!nsThreadStartupEvent::Wait() Line 181 + 0xb bytes C++ xul.dll!nsThread::Init() Line 349 + 0x7 bytes C++ xul.dll!nsThreadManager::NewThread(unsigned int creationFlags=0, unsigned int stackSize=0, nsIThread * * result=0x0023d3b8) Line 215 + 0x7 bytes C++ xul.dll!nsThreadPool::PutEvent(nsIRunnable * event=0x156b3648) Line 90 C++ xul.dll!nsThreadPool::Dispatch(nsIRunnable * event=0x156b3648, unsigned int flags=0) Line 232 C++ > xul.dll!mozilla::image::RasterImage::DecodePool::RequestDecode(mozilla::image::RasterImage * aImg=0x0fef6fe8) Line 3635 + 0x9 bytes C++ xul.dll!mozilla::image::RasterImage::DecodeDoneWorker::Run() Line 3925 C++ xul.dll!nsThread::ProcessNextEvent(bool mayWait=true, bool * result=0x0023d447) Line 627 + 0x6 bytes C++ xul.dll!NS_ProcessNextEvent(nsIThread * thread=0x010407f0, bool mayWait=true) Line 238 + 0xd bytes C++ There were about 125 threads in the whole application where about 80 were image decoding threads. The biggest thread pool number were even more scary: 39571 ! Why so many threads? Is it possible that combination of fast thread switching setup by beginTimePeriod(1) and so many threads + the sync code to wait for each start up could cause a significant UI jank?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 1

•

12 years ago

We definitely should bound the number of image decoding threads!

Assignee: nobody → joe

Joe Drew (not getting mail)

Assignee

Comment 2

•

12 years ago

3516 if (gDecodingThreadLimit <= 0) { 3517 mThreadPool->SetThreadLimit(std::max(PR_GetNumberOfProcessors() - 1, 1)); 3518 } else { 3519 mThreadPool->SetThreadLimit(static_cast<uint32_t>(gDecodingThreadLimit)); 3520 } Is that not sufficient? Maybe there's a bug in the thread pool? Or maybe I should set a custom idle thread limit too?

Honza Bambas (:mayhemer)

Reporter

Comment 3

•

12 years ago

gDecodingThreadLimit is by default -1, so you set the limit to the number of cores - 1. I checked PR_GetNumberOfProcessors() returns a sane number. (BTW, when it returns 0, we pass 0xffffffff to SetThreadLimit.) idleThreadTimeout is by default 60 seconds and you don't change it. Problem is that you leave idleThreadLimit at 1. Threads are then dying almost instantly, even though timeout is those 60 seconds. I think it's wasting. Use mThreadPool->SetIdleThreadLimit() to set the number of idle thread to leave, I've played with setting it to the same number as the overall thread limit. Works well. However, I still don't understand why I've seen active 125 threads at a single time... BTW, I cannot reproduce it!

Joe Drew (not getting mail)

Assignee

Comment 4

•

12 years ago

Attached patch set idle limit — Details — Splinter Review

I originally had this, but I thought it would be unnecessary. I guess it isn't!

Attachment #739377 - Flags: review?(seth)

Attachment #739377 - Flags: feedback?(honzab.moz)

Joe Drew (not getting mail)

Assignee

Comment 5

•

12 years ago

https://tbpl.mozilla.org/?tree=Try&rev=c88f73227b4a

Seth Fowler [:seth] [:s2h]

Comment 6

•

12 years ago

Comment on attachment 739377 [details] [diff] [review] set idle limit Review of attachment 739377 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, though it seems like this bug overall still needs more investigation.

Attachment #739377 - Flags: review?(seth) → review+

Honza Bambas (:mayhemer)

Reporter

Comment 7

•

12 years ago

Comment on attachment 739377 [details] [diff] [review] set idle limit Review of attachment 739377 [details] [diff] [review]: ----------------------------------------------------------------- ::: image/src/RasterImage.cpp @@ +3521,2 @@ > if (gDecodingThreadLimit <= 0) { > + limit = std::max(PR_GetNumberOfProcessors() - 1, 1); Could we also fix a case PR_GetNumberOfProcessors() returns a 0 or even a negative number? Like limit = std::max(PR_GetNumberOfProcessors(), 2) - 1 or so...

Attachment #739377 - Flags: feedback?(honzab.moz) → feedback+

Honza Bambas (:mayhemer)

Reporter

Comment 8

•

12 years ago

And BTW, could you please push also a talos-only try run whether it somehow affects performance?

Joe Drew (not getting mail)

Assignee

Comment 9

•

12 years ago

(In reply to Honza Bambas (:mayhemer) from comment #7) > Comment on attachment 739377 [details] [diff] [review] > set idle limit > > Review of attachment 739377 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: image/src/RasterImage.cpp > @@ +3521,2 @@ > > if (gDecodingThreadLimit <= 0) { > > + limit = std::max(PR_GetNumberOfProcessors() - 1, 1); > > Could we also fix a case PR_GetNumberOfProcessors() returns a 0 or even a > negative number? > > Like limit = std::max(PR_GetNumberOfProcessors(), 2) - 1 or so... Sure, but is this something we really need to worry about? (In reply to Honza Bambas (:mayhemer) from comment #8) > And BTW, could you please push also a talos-only try run whether it somehow > affects performance? https://tbpl.mozilla.org/?tree=Try&rev=69ebe52158f6

Joe Drew (not getting mail)

Assignee

Comment 10

•

12 years ago

This doesn't seem to have affected performance much: http://perf.snarkfest.net/compare-talos/index.html?oldRevs=7712e67a7c1e,5bc732a49eae&newRev=69ebe52158f6&submit=true

Joe Drew (not getting mail)

Assignee

Comment 11

•

12 years ago

I'm pushing this as-is; Honza, if you ever see this again, please report another bug! https://hg.mozilla.org/integration/mozilla-inbound/rev/bb028a52ffe3

Matt Brubeck (:mbrubeck)

Comment 12

•

12 years ago

This appears to cause a regression in the Tp5 Optimized "Private Bytes" memory counter. This regression showed up on Try, but it's not visible in the compare-talos link above because compare-talos doesn't know about Ubuntu: https://bitbucket.org/mconnor/compare-talos/issue/15/add-ubuntu-platforms I'm not sure why this would cause an Ubuntu-only regression (it seems to actually *improve* the same benchmark on Fedora). I retriggered some old changesets to verify whether it's actually caused by a code change or an infrastructure change. There was another suspicious Ubuntu-only "Private Bytes" regression shortly after the Ubuntu platform was first deployed, earlier this month...

Joe Drew (not getting mail)

Assignee

Comment 13

•

12 years ago

It would make some sense that we're using more private bytes, since we actually have more threads in the common case.

Ryan VanderMeulen [:RyanVM]

Comment 14

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/bb028a52ffe3

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla23

Joe Drew (not getting mail)

Assignee

Comment 15

•

12 years ago

Honza, do you think I should ask for this to be uplifted?

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Reporter

Comment 16

•

12 years ago

(In reply to Joe Drew (:JOEDREW! \o/) from comment #15) > Honza, do you think I should ask for this to be uplifted? Maybe b2g could benefit? I have nothing against, but only if there is at least some measurable benefit. Up to you, I'll be glad to see this in sooner :)

Flags: needinfo?(honzab.moz)

Joe Drew (not getting mail)

Assignee

Comment 17

•

12 years ago

Heh, I'll just leave it as is unless we hear complaints. :)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 18

•

12 years ago

Reading bug 856670 comment 38, it sounds like we probably should uplift this to Aurora. There's a stack overflow that's the #2 topcrash there, and there's some reason to believe that the patch in this bug may fix the problem.

Joe Drew (not getting mail)

Assignee

Comment 19

•

12 years ago

Wow, I'd be surprised, but... sure.

Joe Drew (not getting mail)

Assignee

Comment 20

•

12 years ago

Comment on attachment 739377 [details] [diff] [review] set idle limit [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 716140 User impact if declined: Possibly large numbers of threads, maybe some jank or crashes? Testing completed (on m-c, etc.): On m-c for a few days Risk to taking this patch (and alternatives if risky): Not terribly risky. String or IDL/UUID changes made by this patch: none

Attachment #739377 - Flags: approval-mozilla-aurora?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 21

•

12 years ago

I spoke too soon. It looks like bug 854803 is what needed to get backported. Sorry.

Alex Keybl [:akeybl]

Comment 22

•

12 years ago

Please confirm comment 21 by removing the approval nomination.

Joe Drew (not getting mail)

Assignee

Comment 23

•

12 years ago

Comment on attachment 739377 [details] [diff] [review] set idle limit I'll go with what Bill says :)

Attachment #739377 - Flags: approval-mozilla-aurora?