OSXDisplay::EnableVsync hangs in CVDisplayLinkCreateWithActiveCGDisplays
Categories
(Core :: Widget: Cocoa, defect, P2)
Tracking
()
Performance Impact | high |
People
(Reporter: florian, Unassigned)
References
(Depends on 1 open bug)
Details
(Keywords: perf:responsiveness, Whiteboard: [bhr:CVDisplayLinkCreateWithActiveCGDisplays])
In BHR reports from the 20191207 nightly, we see lots[*] of hangs in:
CVDisplayLinkCreateWithActiveCGDisplays CoreVideo
OSXVsyncSource::OSXDisplay::EnableVsync() XUL
http://queze.net/bhr/test/#row=39&filter=CVDisplayLinkCreateWithActiveCGDisplays
- We have a total of 23,515 hang reports, for a total of 66,610s.
Comment 1•5 years ago
|
||
Did these start with the 20191207 nightly? And exactly which one are you referring to -- https://ftp.mozilla.org/pub/firefox/nightly/2019/12/2019-12-07-09-26-17-mozilla-central/ or https://ftp.mozilla.org/pub/firefox/nightly/2019/12/2019-12-07-21-55-34-mozilla-central/?
Updated•5 years ago
|
Comment 2•5 years ago
|
||
I can't find any of these at crash-stats.mozilla.com, even searching on the "proto signature" (the two hits I do see are irrelevant):
Reporter | ||
Comment 3•5 years ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #1)
Did these start with the 20191207 nightly?
No, we already had some on the 20191206 nightly, but at a much lower volume. I don't know the exact build id. What made me notice this today is that it's in the second position on my BHR dashboard (http://queze.net/bhr/test/ This ranks the hang stacks by how much user time was wasted). This is more than 6% of the total reported hang time, which seems a lot given it's OS X only.
Reporter | ||
Comment 4•5 years ago
|
||
By 'hangs' for BHR we mean "a period of more than 128ms during which the event loop was blocked".
Comment 5•5 years ago
•
|
||
Thanks for the info. Before I can do anything here, I really do need the build id for the nightly on which these "hangs" increased dramatically.
Markus, do you have any idea which patch might have triggered this problem?
Comment 6•5 years ago
|
||
No, I don't know if that's even the case. I just CC'd you in case you might have ideas what could be going on.
I agree, we need a better "regression range" to be able to find out what's going on here.
Comment 7•5 years ago
•
|
||
What is BHR? Is there any documentation on it? Where does its data come from? How far back does it go?
CVDisplayLinkCreateWithActiveCGDisplays() is a macOS system call, for which there are no obvious alternatives. We probably have to keep using it. And if the reported "hangs" aren't noticeable in everyday use, or even if they are and aren't too annoying, we may not need to do anything about them.
The one case where further examination might be worthwhile is if a number of users are very badly effected. Is there any way to find this out from the BHR data?
Updated•5 years ago
|
Comment 8•5 years ago
|
||
CVDisplayLinkCreateWithActiveCGDisplays() is a macOS system call, for which there are no obvious alternatives. We probably have to keep using it.
That said, we do call this method much more often than Safari or Chrome (see bug 1201401 comment 128). So at some point Mozilla may want to refactor the code that uses CVDisplayLinkCreateWithActiveCGDisplays() to call it less often. That's not something I want to do, though :-)
Reporter | ||
Comment 9•5 years ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #7)
What is BHR?
BHR is Background Hang Reporter.
Is there any documentation on it?
I'm not sure; I don't know any.
Where does its data come from?
It comes from Telemetry, from our Nightly users (it's disabled for beta and release). Whenever the main thread is unresponsive for 128+ms, we capture a stack and send it through telemetry.
How far back does it go?
Telemetry servers have quite a bit of history, but it's not in an easy to access form. I don't have easy access to data for previous days (it's something I would like to add at some point).
The one case where further examination might be worthwhile is if a number of users are very badly effected. Is there any way to find this out from the BHR data?
I don't have an easy way to find out right now, but the data exists on the servers as for every hang report a stack + the hang duration is sent to the telemetry servers.
Comment 10•5 years ago
|
||
From looking at the hang data from comment 0, it appears a "typical" hang is an average of 2-4s, which is a pretty long and user-observable hang
Comment 11•5 years ago
|
||
it appears a "typical" hang is an average of 2-4s
That surprises me. While working on bug 1201401 I hooked CVDisplayLinkCreateWithActiveCGDisplays() as part of my research into the bug. I'd have noticed such a long hang, but didn't. Florian talked about the problem having got much worse on 2019-12-07. We may have wonky data. Let's see how things change over time.
Comment 12•5 years ago
|
||
Here, "typical" means "typical among hangs that already take more than 128ms", or in other words: Calls to this function are either fast or really slow.
Maybe this happens when connecting or disconnecting external monitors? During those transitions, the window server is often blocked for a second or more, so any function that waits for a reply from the window server will show up as hanging, and this function might just be the first of those that we call when this happens. But that still doesn't explain why its hang rate has increased recently. Unless this is just new data from all the users that would have crashed in the past.
Comment 13•5 years ago
|
||
The priority flag is not set for this bug.
:spohl, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•5 years ago
|
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Updated•3 years ago
|
Comment 15•3 years ago
|
||
And with that I mean, I'll take a look at rewriting our vsync code. I'd like to stop using CVDisplayLinkCreateWithActiveCGDisplays altogether, and instead switch to per-window vsync where we create the display link for the correct (virtual) screen. We should also switch to a model where we keep the display link alive and disable+reenable it, rather than destroying and recreating it, which is what we do at the moment.
Updated•3 years ago
|
Comment 16•2 years ago
|
||
The Performance Priority Calculator has determined this bug's performance priority to be P1.
Platforms: macOS
Impact on browser UI: Renders browser effectively unusable
[x] Bug affects multiple sites
[x] Multiple reporters
Comment 17•2 years ago
|
||
The severity field for this bug is set to S3. However, the Performance Impact
field flags this bug as having a high impact on the performance.
:mstange, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact
flag to ?
?
For more information, please visit auto_nag documentation.
Comment 18•2 years ago
|
||
I haven't gotten a chance to work on bug 1759232 more, but I don't think there's an alternative for this particular bug. This is definitely still worth fixing properly.
Comment 19•8 months ago
|
||
:sphol can you review this bug and determine if the severity matches the performance impact? Same message as comment 17 - a High impact would translate to a S2. If you feel this is an S3, can you comment on the mismatch between our calculations and yours and we can reassess the impact?
Comment 20•8 months ago
|
||
Markus is better positioned to evaluate this.
Comment 21•7 months ago
|
||
I don't see CVDisplayLinkCreateWithActiveCGDisplays
on http://queze.net/bhr/test/ at the moment, so maybe this is lower priority now.
Reporter | ||
Comment 22•7 months ago
|
||
(In reply to Markus Stange [:mstange] from comment #21)
I don't see
CVDisplayLinkCreateWithActiveCGDisplays
on http://queze.net/bhr/test/ at the moment, so maybe this is lower priority now.
It's not in the top 50 hangs, but if you search for it, you see it has 225 hangs reported for a total of 128s on the 20240417 Nightly builds.
Comment 23•7 months ago
|
||
Oh! I missed that. Ok, then the high priority on this bug is probably still adequate.
Comment 24•7 months ago
|
||
:spohl based on comment 23 this still seems to be a High impact, which we suggest be associated with an S2 bug. Do you agree with the assessment or are there other factors which would keep this bug from being marked S2?
Comment 25•6 months ago
|
||
Adding NI for attention to comment 24
Updated•6 months ago
|
Description
•