Closed Bug 1609672 Opened 4 years ago Closed 2 years ago

Images sometimes missing

Categories

(Core :: Layout: Images, Video, and HTML Frames, defect, P2)

All
macOS
defect

RESOLVED WORKSFORME

People

(Reporter: mstange, Unassigned)

References

Details

(Keywords: steps-wanted, testcase-wanted)

I don't have steps to reproduce, unfortunately.

Today, I ran into a problem where the screenshot track in a profile (https://perfht.ml/309OGGf) didn't display all of the images when moving the mouse through the track; at some positions, the enlarged screenshot box was just empty.
Vicky reported a similar problem to me yesterday: In a large Google Slide document, an image with a graph wouldn't render after edits were made elsewhere in the document.

It seems like a really intermittent issue. Maybe based on the memory taken up by images overall?
It also seems like a recent regression, maybe around January 12/13.

Andrew, have there been any changes recently that could have caused something like this?

Flags: needinfo?(aosmond)

This is not the only recent report where people are not seeing images they expected to see. Bug 1338652 may be related. I can't think of any changes, and going through the recent history of image-related code doesn't yield any serious candidates. It is very worrisome. It does not appear to be related to volatile memory as it has been in the past, but thus far everyone hasn't been using WebRender.

Can you confirm whether or not you were using WR at the time?

Flags: needinfo?(aosmond) → needinfo?(mstange)

Both Vicky and I were not using WR.

Flags: needinfo?(mstange)

I believe this happened to me today with WebRender enabled. The download icon on an attachment did not render correctly; however, hovering over the image URL in the inspector showed that it had decoded correctly. Toggling the URL on and off resolved the problem.

Oh and this happened on Linux, which confirms that volatile memory is not the problem. That isn't implemented on Linux at all.

Priority: -- → P2
See Also: → 1338652

So I can repro this easily with image.mem.surfacecache.max_size_kb=100 and image.mem.surfacecache.min_expiration_ms=0... Just switching tabs on GMail back and forth causes a bunch of icons to disappear.

Does that ring a bell Andrew? I suspect that means that either some images are not getting locked correctly, or that the locking mechanism for images is not working as expected.

Flags: needinfo?(aosmond)

That being said, with those two prefs I see similar behavior in 70... But maybe some heuristics around that have changed?

This is another bug about missing images: bug 1606262

See Also: → 1606262

If the image cache is set really small, I would expect us to miss many images. For reference, 100 KB can fit a single 160x160 image. I've landed patches in bug 1610381 which expose more of the image/request state. This will include whether we hit errors during decoding or while loading the initial request, any progress events, and whether the surface is incomplete, as well as making many of these conditions "notable" so they are included in the memory report, even if the amount of memory consumed is small. We also now track all error paths when inserting a surface into the cache, just in case we drop something as a result of a gracefully handled out-of-memory condition. This should allow us to gather even anonymous memory reports and quickly see if any images are stuck in a bad state.
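As a rough check of the cache-size figure above, the following sketch computes the decoded size of one surface. It assumes 4 bytes per pixel and 1 KB = 1024 bytes; both are assumptions made for illustration, the function name is hypothetical, and real surfaces may carry extra overhead such as row padding.

```python
def decoded_surface_bytes(width, height, bytes_per_pixel=4):
    """Approximate size of one decoded surface, ignoring padding and metadata."""
    return width * height * bytes_per_pixel

# image.mem.surfacecache.max_size_kb=100, assuming 1 KB = 1024 bytes
cache_limit_bytes = 100 * 1024

one_image = decoded_surface_bytes(160, 160)
print(one_image, cache_limit_bytes)                          # 102400 102400
print(one_image <= cache_limit_bytes)                        # True
print(decoded_surface_bytes(161, 161) <= cache_limit_bytes)  # False
```

Under these assumptions a single 160x160 surface fills the whole cache, so any second icon on the page would have to evict the first.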

Once bug 1610381 has made its way into a build, I would appreciate the following from anyone encountering the issue:

  1. When you see a tab in this state, open a new window; do not tab away from the problematic tab.
  2. In the new window, go to about:memory, and save a report to share. It is useful even if anonymized, although for anyone willing to send me directly the complete report, I will be grateful.
  3. If you sent the complete report, please use inspector to find the URI of the missing image so I can line it up.

Extra credit if you flip on the image.mem.debug-reporting pref. I don't think it is necessary, but it does cause us to do things like dump every frame for an animated image, provide extra info for WebRender and mark all images as notable (handy just in case the image you are missing is still not notable despite the changes in bug 1610381).

I'll comment with the build ID once my changes are in.

Depends on: 1610381
Flags: needinfo?(aosmond)

An alternative to giving me a full report would be to collect the report as in comment 8, but rather than save, you can just do measure / show memory report immediately. From inspector figure out the name of the relevant image, and use the filter on the right hand side to copy/paste the whole tree that matches the URI. I would also appreciate copying/pasting anything that matches "imagelib-" as well.
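For anyone who prefers to trim a report offline, here is a minimal sketch of the same filtering idea. It assumes the report entries are already available as a list of path strings; the paths below are made up for illustration, and `matching_paths` is a hypothetical helper, not part of about:memory. Matching on a distinctive fragment of the URI (such as the file name) avoids depending on how the report escapes the rest of the URI.

```python
def matching_paths(paths, uri_fragment, extra_marker="imagelib-"):
    """Return the report paths that mention the URI fragment or the extra marker."""
    return [p for p in paths if uri_fragment in p or extra_marker in p]

# Hypothetical paths, not taken from a real report.
paths = [
    "explicit/images/content/raster/used/image(64x64, example-icon.png)/source",
    "explicit/js-non-window/runtime",
    "imagelib-example-counter",
]

for path in matching_paths(paths, "example-icon.png"):
    print(path)
```

This prints the first and third paths and drops the unrelated one.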

Nightly builds 20200121215203 and later should now contain the extended memory reporting. Would you be able to follow the steps in comment 8 or 9 next time this happens and send me a report? Thanks.

Flags: needinfo?(mconca)
Flags: needinfo?(awagner)

I am running beta as my main instance. I am happy to test on beta once it's there!

Flags: needinfo?(awagner)

(In reply to Andreas Wagner [:TheOne] [use NI] from comment #11)

> I am running beta as my main instance. I am happy to test on beta once it's there!

Is there any way to use Nightly for a few days? Once that change is on Beta, Beta will be on release, which means we'd have shipped the problem to our users, which is not great.

Alternatively, maybe Andrew can uplift the patch for the next beta, but...

Flags: needinfo?(awagner)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #12)

> (In reply to Andreas Wagner [:TheOne] [use NI] from comment #11)
>
> > I am running beta as my main instance. I am happy to test on beta once it's there!
>
> Is there any way to use Nightly for a few days? Once that change is on Beta, Beta will be on release, which means we'd have shipped the problem to our users, which is not great.
>
> Alternatively, maybe Andrew can uplift the patch for the next beta, but...

I'm optimistic on its upliftability, although I'm not sure if relman is willing to accept. Stay tuned.

(In reply to Andrew Osmond [:aosmond] from comment #10)

> Nightly builds 20200121215203 and later should now contain the extended memory reporting. Would you be able to follow the steps in comment 8 or 9 next time this happens and send me a report? Thanks.

I think I ran across this on LinkedIn just now. Report sent in email.

Flags: needinfo?(mconca)

The identified image which is having trouble has the entry:

0.06 MB (100.0%) -- explicit
└──0.06 MB (100.0%) -- images/uncached/raster/used/progress=10f/image(720x720, ** PID SNIP 1 **)
   ├──0.05 MB (99.09%) ── source
   └──0.00 MB (00.91%) -- (2 tiny)
      ├──0.00 MB (00.61%) -- locked
      │  ├──0.00 MB (00.30%) ── surface(552x552)/decoded-heap
      │  └──0.00 MB (00.30%) ── surface(607x607)/decoded-heap
      └──0.00 MB (00.30%) ── unlocked/surface(720x720)/decoded-heap

The progress flags indicate it thinks it has fully decoded an opaque image. The surface data does not look like it got allocated, despite there being an entry in the cache; I'm not sure how to interpret that. The source data for the image seems suspiciously small for the 720x720 size, which leads me to believe we erroneously think we got all the data for the image when we did not. We did get enough for metadata decoding, however, which is why we know the size.
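To put the "does not look like it got allocated" observation in numbers, this sketch estimates how large each listed surface would be if its pixels were fully allocated. It assumes 4 bytes per pixel and that the report's "MB" means 1024 * 1024 bytes; both are assumptions, and the helper name is hypothetical.

```python
MIB = 1024 * 1024  # assumption: the report's "MB" is 1024 * 1024 bytes

def expected_decoded_mb(width, height, bytes_per_pixel=4):
    """Rough decoded size in MB, ignoring padding and per-surface overhead."""
    return width * height * bytes_per_pixel / MIB

for w, h in [(552, 552), (607, 607), (720, 720)]:
    print(f"surface({w}x{h}): about {expected_decoded_mb(w, h):.2f} MB expected")
# surface(552x552): about 1.16 MB expected
# surface(607x607): about 1.41 MB expected
# surface(720x720): about 1.98 MB expected
```

Each surface is reported as 0.00 MB, so the gap is large under these assumptions. The numbers only show the scale of the mismatch; they do not say why the report shows so little.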

Also in the report, I noticed this:

├──0.07 MB (27.20%) -- images/uncached/raster/unused/err/progress=30f
│ ├──0.07 MB (26.38%) ── image(0x0, ** PID SNIP 2 **)/source
│ └──0.00 MB (00.82%) ── image(0x0, ** PID SNIP 3 **)/source

Supposedly these raster images fully decoded themselves, but we failed to get even the metadata.
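The two progress values can be compared bit by bit. The sketch below assumes the `progress=` value is a hexadecimal bitmask (an assumption based on how it is printed); the meaning of each bit is defined in the image library source and is not reproduced here.

```python
def set_bits(value):
    """Return the positions of the bits set in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]

first = int("10f", 16)   # progress=10f from the first entry
second = int("30f", 16)  # progress=30f from the err entries

print(set_bits(first))           # [0, 1, 2, 3, 8]
print(set_bits(second))          # [0, 1, 2, 3, 8, 9]
print(set_bits(first ^ second))  # [9]
```

The only difference between the two values is bit 9, which lines up with the `err` element appearing only in the second path.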

There are two problems:

  1. Why does the image cache have fully decoded surfaces that don't actually have a surface?

  2. What happened to the source data?

For the latter, I think the chain of events is something like:

  1. We got all the source data, and successfully decoded.
  2. We tossed our decoded surfaces after being away from the page.
  3. We had to revalidate the content when we visited the page again, which caused us to redownload the source data. But this time it got truncated early.
  4. We try to redecode, but now we can't get any actual pixels out. (???? On the image cache state ????)
Flags: needinfo?(awagner)

I don't believe bug 1611127 is the sole cause, because it is Windows only, but it is something that came up as a result of my investigations.

See Also: → 1611127

If the surface is optimized, it will not show anything in the memory reports. We could easily add something to make this clear.

I think there is a remote possibility OMTP users can be affected by the new bug filed / patch attached. I'm not hopeful, but there is a chance, so I will ask for vigilance after it lands :).

URL: 1612207
See Also: → 1612207

Andreas, if you disable OMTP, does the problem go away? You can flip the layers.omtp.enabled pref to false and restart. If it does, then I hopefully have an incoming fix as part of bug 1612207.

Flags: needinfo?(awagner)

I haven't seen this problem in beta for at least a week. Would you still like me to flip that pref on beta and keep using Gmail?

Flags: needinfo?(awagner) → needinfo?(aosmond)

Ah, never mind. We'll let the changes ride the train and see if anyone still sees it in that case.

Flags: needinfo?(aosmond)
See Also: → 1612589
Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 5 See Also bugs.
:emilio, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(emilio)

I presume per the above this hasn't kept happening. But please reopen otherwise?

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(emilio) → needinfo?(awagner)
Resolution: --- → WORKSFORME

I can still reproduce this in 106 beta.

Status: RESOLVED → REOPENED
Flags: needinfo?(awagner)
Resolution: WORKSFORME → ---

Apologies, bug mixup. This should be fine.

Status: REOPENED → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME