Some sp3 tests (especially NewsSite) spend an appreciable amount of time in RasterImage::NotifyDecodeComplete ("notify images")
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
()
People
(Reporter: mstange, Unassigned, NeedInfo)
References
(Blocks 1 open bug)
Details
(Whiteboard: [sp3])
NewsSite-Next spends 4% of its time in RasterImage::NotifyDecodeComplete
, NewsSite-Nuxt spends 1.1% in it.
I can't find the equivalent work in the Chrome profiles anywhere.
NewsSite-Next: https://share.firefox.dev/3UcDiqv (4753 samples)
NewsSite-Nuxt: https://share.firefox.dev/3Ytgw06 (1051 samples)
Furthermore, some time is spent in AsyncNotifyRunnable::Run
.
NewsSite-Next: https://share.firefox.dev/40b4hGy (345 samples)
NewsSite-Nuxt: https://share.firefox.dev/40azYju (309 samples)
If, somehow, we were able to eliminate all of this time completely, we would improve the overall sp3 score by 0.5%.
Updated•4 months ago
|
Comment 1•4 months ago
|
||
Nice find.
Wow, basically 3 seconds in addref is most of the first profile. Maybe our observers are more well behaved and we can avoid addref'ing now?
Comment 2•4 months ago
|
||
These profiles have some surprising results. Filtering stacks on "addref" gives us about 17% of all samples (seems reasonable). Of these samples 87% are AddRef'ing when sending image notifications. Seems too high for image notifications to be responsible for 87% of all addref time.
I tried to replicate this myself. I used the built in profiler on a Windows and a mac machine. I did 100 iterations of NewsSite-Next and NewsSite-Nuxt. In both cases addref'ing for image notifications take much much less of the time.
https://share.firefox.dev/4f8tgP4
https://share.firefox.dev/4eSmTQc
So something is different about the profiles that I'm doing and the ones posted in the first comment here. Maybe it's a difference of the hardware being used? Perhaps its a difference between the built in profiler and whatever profiler was used for the profiles in comment 0?
Markus, do you have any thoughts about that difference?
Comment 3•4 months ago
|
||
In fact, just addref'ing to send image notification is fully 15% (72k of 483k samples) of the entire profile from the first profile in comment 0 (after expanding to show all samples). If that figure is accurate then is there something that makes these specific virtual addref calls more expensive then other virtual addref calls? Or is that the price we pay for all virtual addref calls?
Reporter | ||
Comment 4•4 months ago
|
||
That is very curious and I don't have an answer at the moment. I'll try to see if it's a machine issue by comparing a samply profile with a Gecko profiler profile.
Comment 5•4 months ago
|
||
What is the hardware that that profile was done on?
I will also try using samply to see if I get different results.
Comment 6•4 months ago
|
||
I tried on a very old under powered machine (with the theory that slower memory access, smaller cpu caches, and less intelligent pre-fetching might cause slower virtual function calls) and as a percent of total time the image notifications work is actually less there.
Comment 7•4 months ago
|
||
I tried samply on macos, it's results are in line with what I see from the built in profiler.
![]() |
||
Updated•4 months ago
|
Comment 8•4 months ago
|
||
I tried the built in profiler on what should be an identical machine to what was used for the profile in comment 0
https://share.firefox.dev/3AkUB1R
The sum of time spent in stacks that include IProgressObserver or imgINotificationObserver (which double counts a lot) is about 3%. This is inline with what I've been seeing locally.
Comment 9•4 months ago
|
||
And a profile with samply
https://share.firefox.dev/3YOHPlp
Sum of IProgressObserver or imgINotificationObserver stacks is about 4%. This number is 30% for the profile in comment 1.
Comment 10•4 months ago
|
||
And a samply profile with the recommended command line from Markus
/c/samply/samply/target/release/samply.exe record --browsers -r 4000 --breakpad-symbol-server https://symbols.mozilla.org/ ETW_ENABLED=1 JIT_OPTION_enableICFramePointers=true JIT_OPTION_onlyInlineSelfHosted=true JIT_OPTION_emitInterpreterEntryTrampoline=true "C:\Program Files\Firefox Nightly\firefox.exe" --wait-for-browser
https://share.firefox.dev/3UC08YI
Similar result, about 4% of time in IProgressObserver or imgINotificationObserver.
Comment 11•4 months ago
|
||
Working with Markus, it seems the difference was the iteration count. I was using something around 40. Markus' profile used 200 iterations. Doing 200 iterations I can reproduce a similar looking profile.
Comment 12•4 months ago
•
|
||
Some observations.
The number of notifications per iteration stays the same whether we are doing 20 iters or 200 iters. So the time taken to do a single notification increases with the total number of iterations. The time taken to do notifications seems to monotonically increase as we do iterations.
It takes approx 20000 ms to do 10000 discard notifications, 2ms per notification (almost all of the time is spent doing addref). This is the average over all 200 iters, so it's probably at least 4ms per notification/addref towards the end of the iterations.
All of the discard notifications are for the same RasterImage object (makes sense since the test uses the same image repeatedly as a stand in). So this makes it less likely (though still possible) that it is long memory accesses to fetch something that is not in any cache (due to memory contention/thrashing elsewhere). Perhaps the long cycle collections that bug 1900470 is about plays a role here. Towards the end of the iterations I can see that we are basically alternating a long CC, followed by a long time in these notifications.
Samply seems to be more reliable to trigger the issue. Using the built in profiler the effect doesn't seem as exaggerated.
Comment 13•4 months ago
|
||
And the problem is not limited to Windows/the specific model the original profile was on. I can reproduce on a mac as well.
Comment 14•4 months ago
|
||
Doing 40 iterations, then manually reloading the page and repeating 5 times (so 200 total iters) doesn't show the same amount of time spent addref during these image notifications. So that sort of feels like it is something about putting pressure on the memory somehow.
Comment 15•3 months ago
|
||
This comment https://bugzilla.mozilla.org/show_bug.cgi?id=1927919#c3 sounds like it could be quite relevant here as well. Posting so I remember.
Comment 16•2 months ago
|
||
The evidence right now is pointing towards the overhead of the sampling at 4000Hz by the profiler is skewing the impact of this code because it is cache bound towards the end of the profile. That seems more like an S3.
Description
•