Open Bug 1479372 Opened 6 years ago Updated 2 years ago

images being replaced with grey rectangles; looks like gfx/platform issues (Spin Off from Bug 1237654)

Categories

(Core :: Graphics, defect, P3)

Tracking Status
firefox-esr60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- fix-optional
firefox66 --- fix-optional

People

(Reporter: pevar, Unassigned)

Details

(Keywords: multiprocess, regression, regressionwindow-wanted, Whiteboard: [gfx-noted])

Attachments

(6 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Firefox/60.0
Build ID: 20180720133932

Steps to reproduce:

Spun off from bug 1237654 at the request of :Gijs.


Images are being replaced with grey rectangles, or with just a plain dark/black background where the image should be. This looks like a gfx/platform issue.


Actual results:

https://bugzilla.mozilla.org/show_bug.cgi?id=1237654#c40

This started when multi-process was enabled by default; disabling it (browser.tabs.remote.autostart set to false) fixes it.
Summary: Spin Off from Bug 1237654 → images being replaced with grey rectangles; looks like gfx/platform issues (Spin Off from Bug 1237654)
Has Regression Range: --- → irrelevant
Has STR: --- → yes
Component: Untriaged → Tabbed Browser
Depends on: 1442573, 1237654
Priority: -- → P2
Summary: images being replaced with grey rectangles; looks like gfx/platform issues (Spin Off from Bug 1237654) → Spin Off from Bug 1237654
Summary: Spin Off from Bug 1237654 → images being replaced with grey rectangles; looks like gfx/platform issues (Spin Off from Bug 1237654)
Version: 35 Branch → unspecified
Status: UNCONFIRMED → NEW
Has Regression Range: irrelevant → no
Has STR: yes → ---
Component: Tabbed Browser → ImageLib
Ever confirmed: true
Keywords: singleprocess
Priority: P2 → --
Product: Firefox → Core
For a screencast of this issue see:

https://bug1237654.bmoattachments.org/attachment.cgi?id=8995803

esp. around e.g. 6 seconds in.

I don't really know if this is something imagelib needs to investigate or gfx or compositor folks or..., but either way it looks broken.

From the reporter:

> sessionstore.jsonlz4 file by u614034

This is https://bugzilla.mozilla.org/attachment.cgi?id=8959923&action=edit .

> without disabling the animation settings,

i.e. toolkit.cosmeticAnimations.enabled.

> closing tabs causes flashes but not spinners,
> ctrl+tabs sometimes causes flash.
> 
> with cosmetic animation disabled
> closing tabs causes flashes always,
> ctrl+tabs is instant.

The pref influences tab closing behavior - presumably the animation delay gives us more time to do whatever it is we're doing that we're not doing as quickly in the cases where this breaks.

What I don't understand is why there's any difference here between tab closing or tab switching, and particularly how tab switching would be better with the animation disabled than with it enabled.

(In reply to Jason Mechelynck from comment #41)
> posted support info, maybe you guys can fix it soon as you can see how it's
> breaking the work flow

Can you post the support info here? I don't see it here on either of the 2 other bugs...
Flags: needinfo?(pevar)
Attached file aboutsupport.txt
Flags: needinfo?(pevar)
Attached the about:support info for all the laptops showing the issue,
with WebRender both enabled and disabled.
Flags: needinfo?(gijskruitbosch+bugs)
Going to forward this to folks who know about gfx/imglib.
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(aosmond)
Jason, just to confirm, you see this in all of the about:support configurations you attached? Direct3D (about:support.txt), WebRender, and HWA disabled? The screencast from comment 1 gives me some ideas of what went wrong (the buffer data was cleared, reminiscent of bug 1380649, but probably (?) not the same) but the actual backend may be relevant.
Flags: needinfo?(pevar)
Priority: -- → P3
Whiteboard: [gfx-noted]
(In reply to Andrew Osmond [:aosmond] from comment #8)
> Jason, just to confirm, you see this in all of the about:support
> configurations you attached? Direct3D (about:support.txt), WebRender, and
> HWA disabled? The screencast from comment 1 gives me some ideas of what went
> wrong (the buffer data was cleared, reminiscent of bug 1380649, but probably
> (?) not the same) but the actual backend may be relevant.

Hi there,
yes, it can be seen on all of them, but a bit less when HWA is off.
The screencast is from the Direct3D configuration and the other is with WebRender. The only way this does not happen is when
using the workaround from the other bug, browser.tabs.remote.autostart=false.

In the other bug you mentioned, there seems to be something about a buffer being cleared.
Can the buffer or cache sizes be set very high by the user, for images and for Firefox in general, for both RAM and disk? Two of the laptops have plenty of RAM, so I would like Firefox to cache as much as possible and keep it.
Flags: needinfo?(pevar)
It happens on AMD, NVIDIA and Intel hardware alike.
image.mem.surfacecache.size_factor and image.mem.surfacecache.max_size_kb control the size of the image cache.  You can think of their relationship like:

cache size in KB = min( total physical memory in KB / size_factor, max_size_kb )

I'm not sure if the cache size is the problem though, because the main process and each content process gets its own cache. So with e10s you get a lot more potential memory to work with than non-e10s. If increasing the size on non-e10s (or shrinking the cache size on e10s) changed the behaviour, that would certainly be an interesting data point.
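
As a rough illustration of the relationship above, here is a minimal Python sketch. The function name `surface_cache_size_kb` and the example figures (a 16 GB machine, a size_factor of 4, a max_size_kb of 1048576) are assumptions chosen for illustration; the real computation lives in imagelib and may apply further limits.

```python
def surface_cache_size_kb(total_physical_kb: int, size_factor: int, max_size_kb: int) -> int:
    """Approximate the surface cache size in KB, following the formula above."""
    if size_factor <= 0:
        raise ValueError("size_factor must be positive")
    # The cache gets a fraction of physical memory, capped at max_size_kb.
    return min(total_physical_kb // size_factor, max_size_kb)


# Hypothetical machine with 16 GB of RAM, expressed in KB.
total_kb = 16 * 1024 * 1024
print(surface_cache_size_kb(total_kb, size_factor=4, max_size_kb=1048576))
```

With these assumed inputs the cap is the binding term, so lowering size_factor alone would not change the result. Since the main process and each content process get their own cache, the value is a per-process figure.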
(In reply to Andrew Osmond [:aosmond] from comment #11)
> image.mem.surfacecache.size_factor=4 and image.mem.surfacecache.max_size_kb=1048576
> control the size of the image cache.  You can think of their relationship
> like:
> 
> cache size in KB = min( total physical memory in KB / size_factor,
> max_size_kb )
> 
> I'm not sure if the cache size is the problem though, because the main
> process and each content process gets its own cache. So with e10s you get a
> lot more potential memory to work with than non-e10s. If increasing the size
> on non-e10s (or shrinking the cache size on e10s) changed the behaviour,
> that would certainly be an interesting data point.

So what should image.mem.surfacecache.size_factor and image.mem.surfacecache.discard_factor
be set to on a laptop with 32 GB of RAM?

And what about image.mem.surfacecache.max_size_kb and
image.mem.surfacecache.min_expiration_ms on the same machine?

I would like to test whether increasing the cache size to the maximum helps.
Is it like 1048576 / 4 = cache for images?
Does setting image.mem.surfacecache.size_factor to 2 mean more cache?

I have no idea how these work.
If the size_factor is 4, then 32 GB / 4 = 8 GB, so if max_size_kb is smaller than that, it will use max_size_kb. I'd recommend changing max_size_kb to 2097152. I forgot that it clamps the size to fit into a 32-bit unsigned integer, so 2GB is the practical limit right now. That at least doubles what you have now, so hopefully that is enough to make a meaningful difference if it is indeed related.

If you don't mind, the other thing you can try for me is run Firefox with image logging turned on, e.g. set the environment variable MOZ_LOG to "imgRequest:5". You can then run firefox and redirect its output to a file. You should see lots of logs. Note that this will log the URLs of the images loaded into the browser into the output file, so be mindful of what you visit/share. If you prefer you can send it to me directly. It would be very useful to have the screencast collected at the same time so I can line up the logs with what is happening on screen. This will give me a sense of whether or not it had to reload the data from the network, etc.
image.mem.surfacecache.min_expiration_ms=360000
image.mem.surfacecache.size_factor=1
image.mem.surfacecache.max_size_kb=3145728

I set these in Firefox 63 beta, 64-bit.

I can see a bit of improvement, but need to test it more.

(In reply to Andrew Osmond [:aosmond] from comment #14)
> If the size_factor is 4, then 32 GB / 4 = 8 GB, so if max_size_kb is smaller
> than that, it will use max_size_kb. I'd recommend changing max_size_kb to
> 2097152. I forgot that it clamps the size to fit into a 32-bit unsigned
> integer, so 2GB is the practical limit right now. That at least doubles what
> you have now, so hopefully that is enough to make a meaningful difference if
> it is indeed related.
> 
> If you don't mind, the other thing you can try for me is run Firefox with
> image logging turned on, e.g. set the environment variable MOZ_LOG to
> "imgRequest:5". You can then run firefox and redirect its output to a file.
> You should see lots of logs. Note that this will log the URLs of the images
> loaded into the browser into the output file, so be mindful of what you
> visit/share. If you prefer you can send it to me directly. It would be very
> useful to have the screencast collected at the same time so I can line up
> the logs with what is happening on screen. This will give me a sense of
> whether or not it had to reload the data from the network, etc.

No problem.
But how do I do that? Right now I only have the Windows 10 laptop;
the Linux one is with my sister for the next few days, and
it looks like the instructions (set the environment variable MOZ_LOG to
"imgRequest:5", then run Firefox and redirect its output to a file)
are for Linux?

If this is available on Windows, can you guide me through it?

Sorry, I am not very good at these things, but I am still willing to help.
forgot to NI?
(In reply to Andrew Osmond [:aosmond] from comment #14)
> If the size_factor is 4, then 32 GB / 4 = 8 GB, so if max_size_kb is smaller
> than that, it will use max_size_kb. I'd recommend changing max_size_kb to
> 2097152. 

Set it but not a lot of difference

>I forgot that it clamps the size to fit into a 32-bit unsigned
> integer, so 2GB is the practical limit right now.

Why is that? Can't it be changed so users can set 4 GB or whatever they want?

> If you don't mind, the other thing you can try for me is run Firefox with
> image logging turned on, e.g. set the environment variable MOZ_LOG to
> "imgRequest:5". You can then run firefox and redirect its output to a file.
> You should see lots of logs. Note that this will log the URLs of the images
> loaded into the browser into the output file, so be mindful of what you
> visit/share. If you prefer you can send it to me directly. It would be very
> useful to have the screencast collected at the same time so I can line up
> the logs with what is happening on screen. This will give me a sense of
> whether or not it had to reload the data from the network, etc.

Can you provide more info on this
(In reply to Jason Mechelynck from comment #17)
> (In reply to Andrew Osmond [:aosmond] from comment #14)
> > If the size_factor is 4, then 32 GB / 4 = 8 GB, so if max_size_kb is smaller
> > than that, it will use max_size_kb. I'd recommend changing max_size_kb to
> > 2097152. 
> 
> Set it but not a lot of difference
> 

Okay, that's about what I expected. Thanks for trying.

> >I forgot that it clamps the size to fit into a 32-bit unsigned
> > integer, so 2GB is the practical limit right now.
> 
> Why is that? Can't it be changed so users can set 4 GB or whatever they
> want?
> 

I don't think there is a reason. The code just hasn't been updated for the modern world.

> > If you don't mind, the other thing you can try for me is run Firefox with
> > image logging turned on, e.g. set the environment variable MOZ_LOG to
> > "imgRequest:5". You can then run firefox and redirect its output to a file.
> > You should see lots of logs. Note that this will log the URLs of the images
> > loaded into the browser into the output file, so be mindful of what you
> > visit/share. If you prefer you can send it to me directly. It would be very
> > useful to have the screencast collected at the same time so I can line up
> > the logs with what is happening on screen. This will give me a sense of
> > whether or not it had to reload the data from the network, etc.
> 
> Can you provide more info on this

I think these instructions should work:

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Logging_HTTP_activity_by_manually_setting_environment_variables

Just use set MOZ_LOG=imgRequest:5 instead of what the wiki suggests.
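
For anyone who would rather script this than type it into a command prompt, here is a minimal Python sketch of the same idea: set MOZ_LOG in the environment, start Firefox, and send its output to a file. The function names and the Firefox path in the example call are assumptions, and the sketch does not guarantee that output from every Firefox process ends up in the file.

```python
import os
import subprocess


def build_logging_env(base_env, modules="imgRequest:5"):
    """Return a copy of base_env with MOZ_LOG set to the given module list."""
    env = dict(base_env)
    env["MOZ_LOG"] = modules
    return env


def run_firefox_with_image_logging(firefox_path, log_path):
    """Launch Firefox with image logging and redirect its output to log_path."""
    env = build_logging_env(os.environ)
    with open(log_path, "wb") as log_file:
        # Send both stdout and stderr to the file, like a shell redirect.
        result = subprocess.run(
            [firefox_path], env=env, stdout=log_file, stderr=subprocess.STDOUT
        )
    return result.returncode


# Example call (the path is an assumption; adjust it for your install):
# run_firefox_with_image_logging(r"C:\Program Files\Mozilla Firefox\firefox.exe", "log.txt")
```

As noted for the manual approach, the resulting file will contain the URLs of images loaded during the session, so check it before sharing.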
Going to try once now

Can you provide an email address to mail you the info?
(In reply to Jason Mechelynck from comment #19)
> Going to try once now
> 
> Can you provide an email address to mail you the info?

aosmond@mozilla.com
Don't forget the screencast, if possible :).
Attached file log.txt
First run; if you need something specific, let me know.

https://drive.google.com/open?id=1CNs5H2eMlibquId8_WGasRNHFlpi6BgH
(In reply to Jason Mechelynck from comment #22)
> Created attachment 8996029 [details]
> log.txt

Hm, there were no images loaded over http in that log, just stuff for the UI. It seems like it only captured it from the main process, and not the content processes. That's strange, must be specific to Windows. I'm not sure if there is a workaround.

(In reply to Jason Mechelynck from comment #23)
> First run; if you need something specific, let me know.
> 
> https://drive.google.com/open?id=1CNs5H2eMlibquId8_WGasRNHFlpi6BgH

This doesn't have the grey rectangle issue. I see flickering obviously, but nothing gets stuck on grey like it was before.

I can at least reproduce showing the grey temporarily now. It shows up after decoding the metadata from an image, but before any surface is produced by imagelib. This would be something outside of imagelib, perhaps one of the texture clients is getting misinitialized.
Component: ImageLib → Graphics
Flags: needinfo?(aosmond)
(In reply to Andrew Osmond [:aosmond] from comment #24)
> I can at least reproduce showing the grey temporarily now. It shows up after
> decoding the metadata from an image, but before any surface is produced by
> imagelib. This would be something outside of imagelib, perhaps one of the
> texture clients is getting misinitialized.

Isn't that the color that is behind an image loaded at the top level? It's defined in the image document and can be seen while the image is loading. Or am I misremembering?
(In reply to Timothy Nikkel (:tnikkel) from comment #25)
> (In reply to Andrew Osmond [:aosmond] from comment #24)
> > I can at least reproduce showing the grey temporarily now. It shows up after
> > decoding the metadata from an image, but before any surface is produced by
> > imagelib. This would be something outside of imagelib, perhaps one of the
> > texture clients is getting misinitialized.
> 
> Isn't that the color that is behind an image loaded at the top level? It's
> defined in the image document and can be seen while the image is loading. Or
> am I misremembering?

It is different between images in an image document, but when I do see it, it is suspiciously like #e5e5e5 (gray). That said, I don't think there is a security concern here since it is just a pixel buffer, but still, it seems worrisome.
Is anyone landing a patch for this in 63?
Updating tracking flags as we get closer to the 64 release.
Since this is triaged and has a priority set, marking this fix-optional to remove it from regression triage. 
Happy to still take a patch in nightly.
Severity: normal → S3