Closed Bug 1446553 Opened 3 years ago Closed 3 years ago

Sometimes gfxPlatform is not initialized when we're in nsWindow::Create()

Categories

(Core :: Graphics, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: snorp, Assigned: snorp)

References

Details

(Whiteboard: [gfx-noted])

Attachments

(1 file)

In bug 1401455, we ran into the problem where using gfxVars crashes in nsWindow::Create() because gfxPlatform had not initialized it yet. Some questions around this:

1) Is it ok to use gfxVars without first checking if gfxPlatform is initialized?
2) Should gfxVars fail gracefully or should we just make sure it's available in expected places?
Whiteboard: [gfx-noted]
Comment on attachment 8959713 [details]
Bug 1446553 - Init gfxPlatform before checking if WebRender will be used

https://reviewboard.mozilla.org/r/228536/#review235996

::: commit-message-b84f9:2
(Diff revision 2)
> +
> +MozReview-Commit-ID: IzjHxnQ6ayk

What is the problem that this is fixing?
Have you observed the first window in typical usage being created before gfxPlatform is initialized?
Would initializing gfxPlatform earlier resolve this problem?

::: widget/gtk/nsWindow.cpp:3646
(Diff revision 2)
> -        // before the widget is realized.
> -        if (mIsX11Display && useWebRender) {
> +        // before the widget is realized. We cannot use gfxVars to determine
> +        // if WebRender will be used, because that is not reliable until
> +        // the Compositor is started, which is too late.
> +        if (mIsX11Display) {

gfxPlatform::Init() calls InitWebRenderConfig(), which conditionally calls SetUseWebRender(true).
GPUProcessManager may subsequently disable this, but that doesn't prevent calling UseWebRender() before the compositor is started.

Calling UseWebRender() to check whether it might be used would be better than just assuming that it will be used.
Attachment #8959713 - Flags: review?(karlt)
Duplicate of this bug: 1448298
Comment on attachment 8959713 [details]
Bug 1446553 - Init gfxPlatform before checking if WebRender will be used

https://reviewboard.mozilla.org/r/228536/#review235996

> gfxPlatform::Init() calls InitWebRenderConfig(), which conditionally calls SetUseWebRender(true).
> GPUProcessManager may subsequently disable this, but that doesn't prevent calling UseWebRender() before the compositor is started.
> 
> Calling UseWebRender() to check whether it might be used would be better than just assuming that it will be used.

Right, but gfx folks are not up for changing the init order here. And the value could change anyway, so why rely on it at all? There is little downside to preparing for WR even if we don't use it. I guess there could be more memory usage due to the unused depth buffer, but my guess is that gets allocated on-demand anyway. I can try to check.
Jumping in since I was discussing this with snorp on IRC...

(In reply to Karl Tomlinson (:karlt) from comment #3)
> What is the problem that this is fixing?
> Have you observed the first window in typical usage being created before
> gfxPlatform is initialized?

Yes, this is the problem as I understand it (see comment 0, and that's why the fix in place at the moment checks to see if gfxPlatform is initialized).

> Would initializing gfxPlatform earlier resolve this problem?

Probably, but I don't know if that's a good idea as moving this init earlier can introduce seemingly-random timing changes. In the past I've seen this cause startup performance to be impacted. I don't want to rule this out as an option completely but I'd prefer looking at other options first.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> (In reply to Karl Tomlinson (:karlt) from comment #3)
> > What is the problem that this is fixing?
> > Have you observed the first window in typical usage being created before
> > gfxPlatform is initialized?
> 
> Yes, this is the problem as I understand it (see comment 0, and that's why
> the fix in place at the moment checks to see if gfxPlatform is initialized).

Thank you for explaining the assumptions being made here.

I assume comment 0 refers to
https://bugzilla.mozilla.org/show_bug.cgi?id=1401455#c71

The analysis is somewhat superficial, but if there are few mochitest failures,
then I assume the issue is specific to migration, or opening windows without a
profile or something.  This is not a typical window.

It is not clear that there is any real problem, beyond web render not being
used for the migration window.  It is not clear why a shipping configuration
should suffer so that an in-progress configuration can be supported on the
migration window.

> > Would initializing gfxPlatform earlier resolve this problem?
> 
> Probably, but I don't know if that's a good idea as moving this init earlier
> can introduce seemingly-random timing changes. In the past I've seen this
> cause startup performance to be impacted. I don't want to rule this out as
> an option completely but I'd prefer looking at other options first.

gfxPlatform must be initialized before drawing to the first window.
Perhaps initializing on creation of the window may cause some timing changes,
but there is no clear extra work required.

dlopen()ing libGL.so OTOH is clearly extra work.

James reply to 
https://reviewboard.mozilla.org/r/228536/#comment300398
(which for some reason was not recorded in bugzilla)
suggests this became a problem only recently,
perhaps due to a change in the timing of gfxPlatform initialization so that it
now happens after the window has been created.  Restoring the order to the
previous order does not sound like it would be anything to fear.
Comment on attachment 8959713 [details]
Bug 1446553 - Init gfxPlatform before checking if WebRender will be used

https://reviewboard.mozilla.org/r/228536/#review235996

> Right, but gfx folks are not up for changing the init order here. And the value could change anyway, so why rely on it at all? There is little downside to preparing for WR even if we don't use it. I guess there could be more memory usage due to the unused depth buffer, but my guess is that gets allocated on-demand anyway. I can try to check.

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #5)
> And the value could change anyway, so why rely on it at all?

It won't change in the common case of the shipping default configuration.

> There is little downside to preparing for WR even if we don't use it.

How can you make that assertion when doing so requires loading a library in a
process that will not otherwise use the library?

> I guess there could be more memory usage due to the unused depth buffer, but
> my guess is that gets allocated on-demand anyway. I can try to check.

If the depth buffer is allocated on demand, then why would the depth buffer
need to be requested at window creation?
Comment on attachment 8959713 [details]
Bug 1446553 - Init gfxPlatform before checking if WebRender will be used

https://reviewboard.mozilla.org/r/228536/#review236502
Attachment #8959713 - Flags: review?(karlt)
(In reply to Karl Tomlinson (:karlt) from comment #7)
> The analysis is somewhat superficial, but if there are few mochitest
> failures,
> then I assume the issue is specific to migration, or opening windows without
> a
> profile or something.  This is not a typical window.
> 
> It is not clear that there is any real problem, beyond web render not being
> used for the migration window.  It is not clear why a shipping configuration
> should suffer so that an in-progress configuration can be supported on the
> migration window.

Agreed. I didn't realize this codepath dlopen's libGL.so and therefore has a perf impact. I thought it was just passing a different set of flags.

> James reply to 
> https://reviewboard.mozilla.org/r/228536/#comment300398
> (which for some reason was not recorded in bugzilla)
> suggests this became a problem only recently,
> perhaps due to a change in the timing of gfxPlatform initialization so that
> it
> now happens after the window has been created.  Restoring the order to the
> previous order does not sound like it would be anything to fear.

Fair enough. I'd be interested in knowing when this change in timing got introduced, but I don't know how reliably it happens and how easy it would be to bisect. But yeah, I guess we can just do the gfxPlatform initialization earlier and keep an eye out for talos/telemetry regressions.
(In reply to Karl Tomlinson (:karlt) from comment #8)
> Comment on attachment 8959713 [details]
> Bug 1446553 - Unconditionally prepare a WebRender-ready window on Linux/X11
> 
> https://reviewboard.mozilla.org/r/228536/#review235996
> 
> > Right, but gfx folks are not up for changing the init order here. And the value could change anyway, so why rely on it at all? There is little downside to preparing for WR even if we don't use it. I guess there could be more memory usage due to the unused depth buffer, but my guess is that gets allocated on-demand anyway. I can try to check.
> 
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #5)
> > And the value could change anyway, so why rely on it at all?
> 
> It won't change in the common case of the shipping default configuration.
> 
> > There is little downside to preparing for WR even if we don't use it.
> 
> How can you make that assertion when doing so requires loading a library in a
> process that will not otherwise use the library?

Are you worried about the address space occupied or the time to load the lib and make a call? Or something else?

If it's a startup perf regression you're concerned about, unconditionally calling the GLX function takes ~60-70ms on my machine. Calling gfxPlatform::GetPlatform() takes 100-150ms.

> 
> > I guess there could be more memory usage due to the unused depth buffer, but
> > my guess is that gets allocated on-demand anyway. I can try to check.
> 
> If the depth buffer is allocated on demand, then why would the depth buffer
> need to be requested at window creation?

There is a difference between requesting access to a thing and actually using that thing. I'm just guessing that it's lazily allocated, I'm sure it differs based on drivers/hardware. I couldn't really see much of a difference here locally.

Anyway I don't really care what we do here, I just want WebRender to be able to work. If you want the early gfx init, that's fine by me.
Duplicate of this bug: 1448941
@karlt: FYI the fact that snorp is hitting this all the time is a result of bug 1447719 which landed last week. I believe it changed things so that the window is now somehow initialized sooner, before what would have been the first gfxPlatform init call. So moving the gfxPlatform init call sooner is probably the right fix here, although it might take away some of the perf improvements reported in bug 1447719.
Blocks: 1447719
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #11)
> Are you worried about the address space occupied or the time to load the lib
> and make a call? Or something else?

I'm concerned about the time from start-up to drawing something useful.

I'm not concerned with address space.  The extra real memory used by a shared
library load should also not be significant, but I don't know how much
initialization the library performs, and the load may cost significantly
more if there are still implementations that do not use position independent
code.

> If it's a startup perf regression you're concerned about, unconditionally
> calling the GLX function takes ~60-70ms on my machine.

That's a big cost if startup times are of the order of 200-300ms.
It's significant even if startup times are of the order of 400-500ms.

I'm expecting that setting up for WR should not add anything to the startup
time for builds that are not expected to use WR.

> Calling gfxPlatform::GetPlatform() takes 100-150ms.

It is disturbing that GetPlatform() takes this long, but GetPlatform will need
to be called before anything useful can be displayed, and so it is a price
that we need to pay at some point.

nsWindow::OnExposeEvent() calls GetLayerManager() calls CreateCompositor()
calls GetPlatform().

I'm hoping that calling it earlier is not going to add to the total startup time.
If for some reason total startup time is changed, then we need to look for another solution.

> Anyway I don't really care what we do here, I just want WebRender to be able
> to work. If you want the early gfx init, that's fine by me.

Opening libGL is not an option because of the impact on startup.

I would have expected that calling GetPlatform() earlier would have little
effect on the time to show something useful, and so I suggest trying that
approach.
Flags: needinfo?(karlt)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #14)
> @karlt: FYI the fact that snorp is hitting this all the time is a result of
> bug 1447719 which landed last week. I believe it changed things so that the
> window is now somehow initialized sooner, before what would have been the
> first gfxPlatform init call. So moving the gfxPlatform init call sooner is
> probably the right fix here, although it might take away some of the perf
> improvements reported in bug 1447719.

Thank you for finding that out.

I'm somewhat puzzled how that change made such a difference to ts_paint
because ts_paint measures time to load and paint tspaint_test.html.  I wonder
how such a chunk of work could be skipped when browser.startup.blankWindow is
false.

gfxPlatform would need to be initialized before tspaint_test.html can be
painted, and so if the benchmark is working correctly, then it needs to
include the time to initialize gfxPlatform.  The huge change in results with
browser.startup.blankWindow makes me wonder whether it is working correctly.

I also suspect the changes for bug 1447719 may not be producing the desired
result because the window appears black, not white as would be expected for
about:blank, and so I wonder whether even about:blank is really being painted.
(In reply to Karl Tomlinson (:karlt) from comment #16)

> I'm somewhat puzzled how that change made such a difference to ts_paint
> because ts_paint measures time to load and paint tspaint_test.html.  I wonder
> how such a chunk of work could be skipped when browser.startup.blankWindow is
> false.
> 
> gfxPlatform would need to be initialized before tspaint_test.html can be
> painted, and so if the benchmark is working correctly, then it needs to
> include the time to initialize gfxPlatform.  The huge change in results with
> browser.startup.blankWindow makes me wonder whether it is working correctly.

It's very likely that ts_paint is broken, because it reported a huge improvement on Linux and no change on Windows. Telemetry reports a 50+% win on both Linux and Windows.

Like for many talos tests, it's non-obvious what it measures, and it's pretty clear that it doesn't match what real users experience (eg. for users cold startup matters a lot, and Talos excludes that case to reduce noise).

> I also suspect the changes for bug 1447719 may not be producing the desired
> result because the window appears black, not white as would be expected for
> about:blank, and so I wonder whether even about:blank is really being
> painted.

A few people have reported the blank window being black. This seems to happen mostly when hardware acceleration is disabled (bug 1448146). I suspect the black paint there is us painting a buffer that has been initialized to 0 and could be initialized to 1 instead, but I really don't know this code. Any help here would be appreciated.


About the cost of various things discussed above:

nsBaseWidget::CreateCompositor is very expensive (at least on Windows) when it creates a GPU process (see https://perfht.ml/2pLpOCW for a dramatically slow example).
I would like to avoid creating the GPU process to show the blank window for 2 reasons: 1. It's sometimes very slow (especially when antivirus software is installed on the machine, and insists on scanning the process at startup) 2. Starting GPU-enabled stuff before we had a chance to run the SanityTest that's supposed to disable GPU usage when its driver is crashy sounds like a recipe for startup crashes on release users with odd driver versions. I'll probably need help to figure out how we can draw about:blank without graphic acceleration and turn it on later.

gfxPlatform::GetPlatform is typically called before we paint in the profiles I've seen with browser.startup.blankWindow enabled, eg https://perfht.ml/2pOt5RP
The things that have a visible cost in gfxPlatform::GetPlatform are:
- stuff related to color profiles
- gfxWindowsPlatform::CreatePlatformFontList Not visible in the above profile, but this does a lot of main thread I/O, and on cold startup if the disk is busy, this can take *several seconds*, see bug 1358927 comment 22. I'm hoping it may be possible to skip initializing the list of fonts to paint about:blank.
Thank you for helpful comments, Florian.

(In reply to Florian Quèze [:florian] from comment #17)
> nsBaseWidget::CreateCompositor is very expensive (at least on Windows) when
> it creates a GPU process (see https://perfht.ml/2pLpOCW for a dramatically
> slow example).
> I would like to avoid creating the GPU process to show the blank window for
> 2 reasons: 1. It's sometimes very slow (especially when antivirus software
> is installed on the machine, and insists on scanning the process at startup)

That should be doable, on Linux at least.

> 2. Starting GPU-enabled stuff before we had a chance to run the SanityTest
> that's supposed to disable GPU usage when its driver is crashy sounds like a
> recipe for startup crashes on release users with odd driver versions.

Bug 1401455 (on Linux) was presumed to be due to not selecting the right kind
of window for use in drawing with webrender (GPU).  With the right kind of
window, we could first draw without the GPU and then switch to the GPU, but
apparently we need to query the graphics system for the right kind of window,
which would be a cost.  Linux has a different sanity test, and so that may not
be such an issue.

This is not a problem we need to care about right now because this will only
happen when web render is enabled, which is not the shipping configuration,
and I doubt Linux will be shipping this configuration soon.

> gfxPlatform::GetPlatform is typically called before we paint in the profiles
> I've seen with browser.startup.blankWindow enabled, eg
> https://perfht.ml/2pOt5RP

That's great to know thanks.

> The things that have a visible cost in gfxPlatform::GetPlatform are:
> - stuff related to color profiles
> - gfxWindowsPlatform::CreatePlatformFontList Not visible in the above
> profile, but this does a lot of main thread I/O, and on cold startup if the
> disk is busy, this can take *several seconds*, see bug 1358927 comment 22.
> I'm hoping it may be possible to skip initializing the list of fonts to
> paint about:blank.

I suspect a rethink of gfxPlatform initialization to lazily initialize smaller
chunks may help.  e.g. about:blank should not need fonts.  However, layout can
be very text focused and it may need major changes to avoid accessing fonts.
Blocks: 1401455
Comment on attachment 8959713 [details]
Bug 1446553 - Init gfxPlatform before checking if WebRender will be used

https://reviewboard.mozilla.org/r/228536/#review238708

::: widget/gtk/nsWindow.cpp:3646
(Diff revision 3)
>          if (Preferences::GetBool("mozilla.widget.use-argb-visuals", false))
>              useAlphaVisual = true;
>  
> -        bool useWebRender = gfxPlatform::Initialized() &&
> -            gfx::gfxVars::UseWebRender() &&
> +        // Ensure gfxPlatform is initialized, since that is what initializes
> +        // gfxVars, used below.
> +        Unused << gfxPlatform::GetPlatform();

GetPlatform is not MOZ_MUST_USE, and so Unused should not be required, but I assume it doesn't hurt either.
Attachment #8959713 - Flags: review?(karlt) → review+
(In reply to Florian Quèze [:florian] from comment #17)

> I would like to avoid creating the GPU process to show the blank window for
> 2 reasons: 1. It's sometimes very slow (especially when antivirus software
> is installed on the machine, and insists on scanning the process at startup)
> 2. Starting GPU-enabled stuff before we had a chance to run the SanityTest
> that's supposed to disable GPU usage when its driver is crashy sounds like a
> recipe for startup crashes on release users with odd driver versions. I'll
> probably need help to figure out how we can draw about:blank without graphic
> acceleration and turn it on later.

I filed bug 1450293 for this.
Pushed by jwillcox@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/38dc14aea22e
Init gfxPlatform before checking if WebRender will be used r=karlt
https://hg.mozilla.org/mozilla-central/rev/38dc14aea22e
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Assignee: nobody → snorp
You need to log in before you can comment on or make changes to this bug.