Closed Bug 1511726 Opened 6 years ago Closed 5 years ago

webrender_bindings::program_cache: shader-cache: Shader disk cache is not supported

Categories

(Core :: Graphics: WebRender, defect)

Platform: Unspecified FreeBSD
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla66
Tracking Status
firefox-esr60 --- unaffected
firefox63 --- unaffected
firefox64 --- unaffected
firefox65 --- disabled
firefox66 --- fixed

People

(Reporter: jbeich, Assigned: jbeich)

References

Details

(Keywords: regression)

Attachments

(1 file)

$ pkg info -x mesa
mesa-demos-8.4.0
mesa-dri-18.3.0.rc5
mesa-libs-18.3.0.rc5

$ ./mach bootstrap
$ ./mach build
$ MOZ_ACCELERATED=1 MOZ_WEBRENDER=1 ./mach run about:blank
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: shader-cache: Shader disk cache is not supported (t=0.51388)
[GFX1-]: shader-cache: Shader disk cache is not supported
ERROR 2018-12-02T18:06:21Z: webrender_bindings::program_cache: shader-cache: Shader disk cache is not supported
<Quit>
$ du -sh ~/.cache/mesa_shader_cache
218K    /home/foo/.cache/mesa_shader_cache
Reverting mozilla-central changeset 1ccc107e3483 makes the error go away.
Flags: needinfo?(bobbyholley)
Looks like use_disk_cache requires prof_path, which is only set when both of these prefs are true:
- gfx.webrender.program-binary=true
- gfx.webrender.program-binary-disk=true

https://searchfox.org/mozilla-central/rev/2edebf41a2b2/gfx/webrender_bindings/RenderThread.cpp#745-747

This suggests that the error!() line conflates "not supported" with "disabled".
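A minimal Rust sketch of the gating described above, assuming the two prefs map directly onto whether a profile path is handed to the program cache. `ShaderCachePrefs` and `disk_cache_path` are hypothetical names for illustration, not the actual RenderThread.cpp or webrender_bindings code.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical snapshot of the two prefs named above.
struct ShaderCachePrefs {
    program_binary: bool,      // gfx.webrender.program-binary
    program_binary_disk: bool, // gfx.webrender.program-binary-disk
}

/// Returns a profile path only when both prefs are true. Otherwise no path
/// is passed on, so the disk cache cannot be used (assumed behavior).
fn disk_cache_path(prefs: &ShaderCachePrefs, profile_dir: &Path) -> Option<PathBuf> {
    if prefs.program_binary && prefs.program_binary_disk {
        Some(profile_dir.to_path_buf())
    } else {
        None
    }
}

fn main() {
    let profile = Path::new("/tmp/profile-default");
    let defaults = ShaderCachePrefs { program_binary: false, program_binary_disk: false };
    let flipped = ShaderCachePrefs { program_binary: true, program_binary_disk: true };

    // With default prefs there is no path, so use_disk_cache is false:
    // the cache is disabled rather than unsupported.
    println!("defaults: use_disk_cache = {}", disk_cache_path(&defaults, profile).is_some());
    println!("flipped:  use_disk_cache = {}", disk_cache_path(&flipped, profile).is_some());
}
```

In this model, `None` means "disabled", which is a different condition from a cache that was requested but failed to initialize.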
I'm not sure I understand the bug. Is it just that we're printing a message to the console that is arguably inaccurate?

I think we'll likely want to get the shader cache working on any platform where we ship WebRender, so I think it's useful to warn when it isn't working. I'm certainly happy to take a patch that replaces "not supported" with "not available", if you think that's better.

> /home/foo/.cache/mesa_shader_cache

To be clear, the mesa shader cache is different from the WR shader cache.
Flags: needinfo?(bobbyholley)
Actually gfxCriticalNote shouldn't crash, so maybe that's not the problem.
Attached patch v0
Like this?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=45dd5fae521c7b783086d7081b0ac04d471bd3e8

(In reply to Bobby Holley (:bholley) from comment #3)
> want to get the shader cache working on any platform where we ship WebRender

Why is the shader cache only enabled on Windows, then?

https://searchfox.org/mozilla-central/rev/cfaa5a1d48d6/modules/libpref/init/all.js#922-923

> happy to take a patch that replaces "not supported" with "not available", if you think that's better.

Only use_disk_cache=true combined with a cache failure (aka "not supported") needs an error!(); use_disk_cache=false (aka "disabled") does not. Or am I missing something?
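The distinction proposed here can be sketched in Rust. This is a hypothetical model, not the attached patch: `DiskCacheState`, `init_disk_cache`, and `error_message` are illustrative names, and the failure case is reduced to a missing profile path. Only the "enabled but failed" state produces an error-level message; "disabled" stays silent.

```rust
use std::path::PathBuf;

/// Hypothetical outcome of setting up the shader disk cache.
#[derive(Debug, PartialEq)]
enum DiskCacheState {
    /// use_disk_cache = false: the prefs are off, nothing to report.
    Disabled,
    /// use_disk_cache = true and a cache directory is available.
    Ready(PathBuf),
    /// use_disk_cache = true, but the cache could not be set up.
    Failed(String),
}

fn init_disk_cache(use_disk_cache: bool, prof_path: Option<PathBuf>) -> DiskCacheState {
    if !use_disk_cache {
        return DiskCacheState::Disabled;
    }
    match prof_path {
        Some(dir) => DiskCacheState::Ready(dir.join("shader-cache")),
        None => DiskCacheState::Failed("no profile path".to_string()),
    }
}

/// Only a genuine failure deserves an error!()-style message.
fn error_message(state: &DiskCacheState) -> Option<String> {
    match state {
        DiskCacheState::Disabled | DiskCacheState::Ready(_) => None,
        DiskCacheState::Failed(reason) => Some(format!(
            "shader-cache: Shader disk cache is not supported ({})",
            reason
        )),
    }
}

fn main() {
    let states = vec![
        init_disk_cache(false, None),
        init_disk_cache(true, Some(PathBuf::from("/tmp/profile-default"))),
        init_disk_cache(true, None),
    ];
    for state in &states {
        match error_message(state) {
            Some(msg) => eprintln!("ERROR: {}", msg),
            None => println!("{:?}: no error logged", state),
        }
    }
}
```

Under this split, the default configuration from the original report would log nothing, which is the behavior the landed commit title ("Don't try to use shader disk cache if disabled") describes.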

> To be clear, the mesa shader cache is different from the WR shader cache.

Indeed. After flipping the prefs, <profile-dir>/shader-cache is populated:

$ echo 'pref("gfx.webrender.program-binary", true);' >>$MOZ_OBJDIR/tmp/profile-default/user.js
$ echo 'pref("gfx.webrender.program-binary-disk", true);' >>$MOZ_OBJDIR/tmp/profile-default/user.js
$ MOZ_ACCELERATED=1 MOZ_WEBRENDER=1 ./mach run about:blank
<Quit>
$ ls $MOZ_OBJDIR/tmp/profile-default/shader-cache
1797b97650e9574b3f79782ec27557dbc9492cd415920ca10ce7396bc4ca18f2
473d518d4f9446405dda8977f1b34358aa32d51fb814959cbfff65e0c4869f0d
4f88248b3b51cfeea056bd69b7e05e3b6b7c1f1cfdee04330e874b9c269602f5
5ce8f18cfb27a9f91190a2fb8da0c2c772c1bcdb639c7c4c280c0a66979f99de
8b757c7cbcaf3d0df9a4f80fdd6949503db4c8f499085276a0071c8c0d9e5bfb
9a83a70a35d9e2eaf91de1c361bee74940029f6b28fd0c6dabfb58993a8fea30
b836be82bf29c86b278f8d6f32d7ee021d09e1676ef0ead6faca535444b6e560
c67f9b0947e0e5703407f110d91923050a58d90cf07b873299ea11215068f347
ede4b5092e40fa7fe079e4211b4af64cd3e19033815a6c1c924b26b3655af87f
fbf97f06b9fffa12180e2461c9aa51f8a1a59b7c983c21ea497220c300818fef
Attachment #9029376 - Flags: feedback?(bobbyholley)
Attachment #9029376 - Flags: feedback?(bobbyholley) → feedback?(sotaro.ikeda.g)
(In reply to Jan Beich from comment #6)
> Created attachment 9029376
> v0
> 
> Like this?
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=45dd5fae521c7b783086d7081b0ac04d471bd3e8
> 
> (In reply to Bobby Holley (:bholley) from comment #3)
> > want to get the shader cache working on any platform where we ship WebRender
> 
> Why is the shader cache only enabled on Windows, then?

When I tested locally on Linux (Ubuntu) and Mac, I did not see any performance improvement. If there is data showing that it improves performance, it should be enabled.

> 
> https://searchfox.org/mozilla-central/rev/cfaa5a1d48d6/modules/libpref/init/all.js#922-923
> 
> > happy to take a patch that replaces "not supported" with "not available", if you think that's better.
> 
> Only use_disk_cache=true + cache failure (aka "not supported") needs an
> error!() but not use_disk_cache=false (aka "disabled"). Or am I missing
> something?

Yes, it is better!
Attachment #9029376 - Flags: feedback?(sotaro.ikeda.g) → feedback+
(In reply to Sotaro Ikeda [:sotaro] from comment #7)
> When I tested locally on Linux (Ubuntu) and Mac, I did not see any performance
> improvement. If there is data showing that it improves performance, it should be
> enabled.

On Mac, the drivers I tested don't support shader serialization (i.e. glProgramBinary returns null). When we ship on Mac, we'll need to test whether WR is a startup regression. If it is, we may not be able to fix it on OpenGL (i.e. we may need to ship Mac on gfx-rs).
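A rough sketch of the capability check this implies: before relying on a disk cache, confirm that the driver can actually serialize linked programs. It assumes the relevant signals are the count reported for GL_NUM_PROGRAM_BINARY_FORMATS and the length of a binary fetched with glGetProgramBinary. The GL calls are left out so the decision logic runs standalone, and `DriverBinarySupport` / `can_cache_program_binaries` are hypothetical names.

```rust
/// Hypothetical summary of what a GL driver reports about program binaries.
/// Real code would query GL_NUM_PROGRAM_BINARY_FORMATS and call
/// glGetProgramBinary on a linked program; here both are plain inputs.
struct DriverBinarySupport {
    num_program_binary_formats: i32,
    sample_binary_len: usize,
}

/// A disk cache is only useful if the driver advertises at least one binary
/// format and actually hands back a non-empty binary.
fn can_cache_program_binaries(support: &DriverBinarySupport) -> bool {
    support.num_program_binary_formats > 0 && support.sample_binary_len > 0
}

fn main() {
    // Illustrative values only, not measurements from any specific driver.
    let no_serialization = DriverBinarySupport { num_program_binary_formats: 0, sample_binary_len: 0 };
    let with_serialization = DriverBinarySupport { num_program_binary_formats: 1, sample_binary_len: 4096 };

    println!("no serialization: disk cache usable = {}", can_cache_program_binaries(&no_serialization));
    println!("with serialization: disk cache usable = {}", can_cache_program_binaries(&with_serialization));
}
```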
Bobby, we have a user report about a crash with the annotation "Shader disk cache is not supported" (https://crash-stats.mozilla.com/report/index/87849bab-d29e-4ab3-b2e2-efa8a0181204#tab-metadata), and the user reported that the crashes started a couple of days ago. Do you think this is caused by this bug? Thanks
Flags: needinfo?(bobbyholley)
(In reply to Pascal Chevrel:pascalc from comment #9)
> Bobby, we have a user report about a crash with this annotation "Shader disk
> cache is not supported"
> https://crash-stats.mozilla.com/report/index/87849bab-d29e-4ab3-b2e2-efa8a0181204#tab-metadata
> and the user reported that crashes started a
> couple days ago, do you think this is caused by this bug? Thanks

No, I don't think so. We already weren't using the disk cache on Linux, so the error here is probably spurious.

That said, it looks like this user force-enabled WebRender, which isn't yet a supported configuration on Linux. We should ask the user to stop force-enabling WR and see if the problem goes away.
Flags: needinfo?(bobbyholley)
Comment on attachment 9029376
v0

We could also get rid of the null check in wr_try_load_shader_from_disk while we're at it. r=me with that fix.
Attachment #9029376 - Flags: review+
Not sure how it relates, but I've started seeing the same error message in the past few days, and it seems to be related to some WebRender thread that, at some point, starts eating a lot of CPU, slowing down my system and spamming syslog with errors like this one (translated from French):
> Dec  7 17:45:44 portable-alex firefox-bin[7488]: message repeated 55 times: [ Failed to measure available space: The specified location is not supported]
(In reply to Alexandre LISSY :gerard-majax from comment #13)
> Not sure how it relates, but I've started seeing the same error message
> in the past few days, and it seems to be related to, at some point, some
> WebRender thread eating a lot of CPU, slowing down my system, and
> spamming with errors like:
> > Dec  7 17:45:44 portable-alex firefox-bin[7488]: message repeated 55 times: [ Failed to measure available space: The specified location is not supported]

And yes, I've force-enabled WR on Linux, so I should just deal with it, but still, there's something going on.
(In reply to Alexandre LISSY :gerard-majax from comment #13)
> Not sure how it relates, but I've started seeing the same error message
> in the past few days, and it seems to be related to

Can you clarify what leads you to believe the message is related? I'm pretty sure behavior on non-Windows hasn't changed, aside from displaying the message when we decide not to use the disk cache, whereas we previously did this silently.

> at some point, some
> WebRender thread eating a lot of CPU, slowing down my system, and
> spamming with errors like:
> > Dec  7 17:45:44 portable-alex firefox-bin[7488]: message repeated 55 times: [ Failed to measure available space: The specified location is not supported]

If you could post a profiler link, that would be helpful!


(In reply to Alexandre LISSY :gerard-majax from comment #14)
> And yes, I've force-enabled WR on Linux, so I should just deal with it, but
> still, there's something going on.

We definitely still want to know about Linux issues, especially if you can help track them down. If the fix is complicated we may not be able to work on it immediately, but we'll be shipping on Linux eventually so the reports are still important.
(In reply to Bobby Holley (:bholley) from comment #15)
> (In reply to Alexandre LISSY :gerard-majax from comment #13)
> > Not sure how it relates, but I've started seeing the same error message
> > in the past few days, and it seems to be related to
> 
> Can you clarify what leads you to believe the message is related? I'm pretty
> sure behavior on non-Windows hasn't changed, aside from displaying the
> message when we decide not to use the disk cache, whereas we previously did
> this silently.

My suspicion comes from the combination of behaviors. Judging from the Rust code flow, the error message seems linked to cache_path not being set up properly, since I'm on an unsupported platform. Combined with the syslog spam, my guess was that something might be checking available space on cache_path with an empty or unexpected value, which would trigger both the high CPU usage and the syslog spam.

> 
> > at some point, some
> > WebRender thread eating a lot of CPU, slowing down my system, and
> > spamming with errors like:
> > > Dec  7 17:45:44 portable-alex firefox-bin[7488]: message repeated 55 times: [ Failed to measure available space: The specified location is not supported]
> 
> If you could post a profiler link, that would be helpful!

Sure, but since the browser is barely usable once this starts, and it starts at random, that's going to be complicated to get :)

> 
> 
> (In reply to Alexandre LISSY :gerard-majax from comment #14)
> > And yes, I've force-enabled WR on Linux, so I should just deal with it, but
> > still, there's something going on.
> 
> We definitely still want to know about Linux issues, especially if you can
> help track them down. If the fix is complicated we may not be able to work
> on it immediately, but we'll be shipping on Linux eventually so the reports
> are still important.

I can surely understand that.
(In reply to Alexandre LISSY :gerard-majax from comment #16)
> (In reply to Bobby Holley (:bholley) from comment #15)
> > (In reply to Alexandre LISSY :gerard-majax from comment #13)
> > > Not sure how it relates, but I've started seeing the same error message
> > > in the past few days, and it seems to be related to
> > 
> > Can you clarify what leads you to believe the message is related? I'm pretty
> > sure behavior on non-Windows hasn't changed, aside from displaying the
> > message when we decide not to use the disk cache, whereas we previously did
> > this silently.
> 
> My suspicion comes from the combination of behavior: error message, looking
> at the Rust code flow, seems to have some link with the cache_path not being
> set up properly, since I'm on an unsupported platform, and with the spamming
> of syslog, my take was that maybe it was checking available space on the
> cache_path with some empty / unexpected value, thus triggering the high CPU
> usage and the syslog spamming.

The Rust code doesn't do any checking for available disk space, though I guess it's possible that the system does so somewhere under the hood. I wouldn't expect any of that code to run if gfx.webrender.program-binary-disk isn't set, which it shouldn't be on Linux.

> 
> > 
> > > at some point, some
> > > WebRender thread eating a lot of CPU, slowing down my system, and
> > > spamming with errors like:
> > > > Dec  7 17:45:44 portable-alex firefox-bin[7488]: message repeated 55 times: [ Failed to measure available space: The specified location is not supported]
> > 
> > If you could post a profiler link, that would be helpful!
> 
> Sure, but the browser being hardly usable when it starts, and since it does
> start at random, it's going to be complicated to get that :)

Maybe you could run it under perf and then import the result into perf.html? That's supported now [1]. :-)

[1] https://mail.mozilla.org/pipermail/firefox-dev/2018-October/006845.html
(In reply to Bobby Holley (:bholley) from comment #17)
> (In reply to Alexandre LISSY :gerard-majax from comment #16)
> > (In reply to Bobby Holley (:bholley) from comment #15)
> > > (In reply to Alexandre LISSY :gerard-majax from comment #13)
> > > > Not sure how it relates, but I've started seeing the same error message
> > > > in the past few days, and it seems to be related to
> > > 
> > > Can you clarify what leads you to believe the message is related? I'm pretty
> > > sure behavior on non-Windows hasn't changed, aside from displaying the
> > > message when we decide not to use the disk cache, whereas we previously did
> > > this silently.
> > 
> > My suspicion comes from the combination of behavior: error message, looking
> > at the Rust code flow, seems to have some link with the cache_path not being
> > set up properly, since I'm on an unsupported platform, and with the spamming
> > of syslog, my take was that maybe it was checking available space on the
> > cache_path with some empty / unexpected value, thus triggering the high CPU
> > usage and the syslog spamming.
> 
> The Rust code doesn't do any checking for available disk space, though I
> guess it's possible that the system does so somewhere under the hood. I
> wouldn't expect any of that code to run if gfx.webrender.program-binary-disk
> isn't set, which it shouldn't be on Linux.

As far as I can tell, I don't have anything set under gfx.webrender.program*, so your assumption holds.

> 
> > 
> > > 
> > > > at some point, some
> > > > WebRender thread eating a lot of CPU, slowing down my system, and
> > > > spamming with errors like:
> > > > > Dec  7 17:45:44 portable-alex firefox-bin[7488]: message repeated 55 times: [ Failed to measure available space: The specified location is not supported]
> > > 
> > > If you could post a profiler link, that would be helpful!
> > 
> > Sure, but the browser being hardly usable when it starts, and since it does
> > start at random, it's going to be complicated to get that :)
> 
> Maybe you could run with perf, and then import that into perf.html?
> That's supported now [1]. :-)
> 
> [1] https://mail.mozilla.org/pipermail/firefox-dev/2018-October/006845.html

Oh nice. When it reproduces, I'll check whether I can profile with the Gecko Profiler, and if not, I'll fall back to that!
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d3f7202b48cd
Don't try to use shader disk cache if disabled. r=bholley
Assignee: nobody → jbeich
https://hg.mozilla.org/mozilla-central/rev/d3f7202b48cd
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66