Closed Bug 1126490 Opened 5 years ago Closed 5 years ago

Failure to display images after imgFrame::mOptSurface is cleared

Categories

(Core :: ImageLib, defect)

x86_64
Windows 8.1
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla38
Tracking Status
firefox36 + fixed
firefox37 + fixed
firefox38 --- verified

People

(Reporter: bas.schouten, Assigned: seth)

References

Details

Crash Data

Attachments

(4 files, 4 obsolete files)

4.29 KB, patch
Details | Diff | Splinter Review
3.38 KB, patch
Details | Diff | Splinter Review
4.47 KB, patch
Details | Diff | Splinter Review
3.36 KB, patch
Details | Diff | Splinter Review
When due to a GPU reset an optimized surface becomes invalid and is cleared. (https://hg.mozilla.org/mozilla-central/file/08e41ea36f6d/image/src/imgFrame.cpp#l1017) we fail to display the image since we've already tossed out all the existing copies of the surface data (https://hg.mozilla.org/mozilla-central/file/08e41ea36f6d/image/src/imgFrame.cpp#l502), in this situation we need to somehow trigger a re-decode.
Component: Graphics → ImageLib
Here's the patch. If it's not clear, as long as we have a DrawableFrameRef
imgFrame::Draw can only return false in a situation like this, because other
than GPU driver crashes and the like, nothing can release an imgFrame's surface
while a DrawableFrameRef is alive.

imgFrame::GetSurface() is slightly more subtle because, even if we have a
DrawableFrameRef, it could still return null if the single-color optimization
has been applied. We do handle that case in CopyFrame(), though. (And
GetFrameInternal() calls CopyFrame(), so we don't need to handle it separately
there.)

Unfortunately, to maintain the invariants the code expects I think we are forced
to sync decode animated images when this happens. This should be a very rare
circumstance, so it shouldn't be a big problem in practice. Still, we really
need to enable animated image discarding...
Attachment #8555508 - Flags: review?(tnikkel)
Assignee: nobody → seth
Status: NEW → ASSIGNED
Attachment #8555508 - Flags: review?(tnikkel) → review+
Duplicate of this bug: 1123544
Changing commit message to reflect the addition of part 2.
Attachment #8555508 - Attachment is obsolete: true
This is a patch analogous to that in part 1, but for VectorImage. It should
resolve the issue reported in bug 1123544, which seems to be the VectorImage
version of the issue we're fixing in this bug.

As I mentioned in bug 1123544, both of these patches are in some sense temporary
fixes, in that DrawableFrameRef should guarantee that these issues are
impossible. Unfortunately, that can't be done in an efficient way without some
refactoring, and such a patch would not be upliftable, so in this bug we'll take
an uglier but upliftable approach.
Attachment #8558747 - Flags: review?(dholbert)
New try job including part 2 here:

https://tbpl.mozilla.org/?tree=Try&rev=f33c3c57e5b9
Comment on attachment 8558747 [details] [diff] [review]
(Part 2) - Recover from loss of surfaces in VectorImage

>diff --git a/image/src/VectorImage.cpp b/image/src/VectorImage.cpp
[...]
>   // Draw.
>   if (frameRef) {
>     RefPtr<SourceSurface> surface = frameRef->GetSurface();
>+    if (!surface) {
>+      RecoverFromLossOfSurfaces();
>+      CreateSurfaceAndShow(params);
>+      return NS_OK;
>+    }
>+
>     nsRefPtr<gfxDrawable> svgDrawable =
>       new gfxSurfaceDrawable(surface, ThebesIntSize(frameRef->GetSize()));
>     Show(svgDrawable, params);
>   } else {
>     CreateSurfaceAndShow(params);
>   }
> 
>   return NS_OK;

Maybe it'd be better to share the existing "CreateSurfaceAndShow()" codepath here? (the idea being: if we lose our surface, we just bail out via RecoverFromLossOfSurfaces, and then we proceed as if we never had a surface in the first place)

Seems like we could do that pretty easily by just moving the "found a surface" case to be an early-return, and then handle missing-surfaces (due to loss or just never-being-present) after that.

e.g.:
  if (frameRef) {
    RefPtr<SourceSurface> surface = frameRef->GetSurface();
    if (surface) {
      nsRefPtr<gfxDrawable> svgDrawable =
        new gfxSurfaceDrawable(surface, ThebesIntSize(frameRef->GetSize()));
      Show(svgDrawable, params);
      return NS_OK;
    }

    RecoverFromLossOfSurfaces();
  }
  CreateSurfaceAndShow(params);
  return NS_OK;

What you have is fine, too, though, and is maybe better than my suggestion if the special-case for RecoverFromLossOfSurfaces is going to go away (which it sounds like it is).

So, r=me either way.
Attachment #8558747 - Flags: review?(dholbert) → review+
Thanks for the review! I like the early-return approach; I'll switch the patch around to use it.
Attachment #8558741 - Attachment is obsolete: true
Ack. Actually it needed even more rebasing than that.
Attachment #8558825 - Attachment is obsolete: true
Switched part 2 to use an early return, and added more comments.
Attachment #8558747 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/3f55b5618d6e
https://hg.mozilla.org/mozilla-central/rev/7ee38d92f251
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Blocks: 1122643
Comment on attachment 8558832 [details] [diff] [review]
(Part 1) - Recover when catastrophic circumstances cause us to lose frames in RasterImage

Approval Request Comment
[Feature/regressing bug #]: Unknown. Possibly bug 1060869, but the regression range is dubious.
[User impact if declined]: On Windows: black images after a GPU driver crash or upgrade. On Android: black images periodically because some unknown factor makes us lose our surfaces.
[Describe test coverage new/current, TreeHerder]: On m-c.
[Risks and why]: Low risk.
[String/UUID change made/needed]: None.

Note that a manual rebase will be required to uplift this to 36.
Attachment #8558832 - Flags: approval-mozilla-beta?
Comment on attachment 8558833 [details] [diff] [review]
(Part 2) - Recover from loss of surfaces in VectorImage

(We need part 2 as well.)
Attachment #8558833 - Flags: approval-mozilla-beta?
I should've stated explicitly, but this is a fix for bug 1122643, which is tracked.
Seth, I guess aurora (37) is affected too, right?
Flags: needinfo?(seth)
Yes, 37 is affected too. I'd like to uplift this to Aurora a little later, though, because if I uplift some other things (that I wanted to uplift anyway) 37 and 38 can use the same version of the patch. 36 is unavoidably going to need a manually rebased version of the patch, though.
Flags: needinfo?(seth)
Here's a try job of these patches on top of 36 tip.

https://tbpl.mozilla.org/?tree=Try&rev=0357ed919a57
Comment on attachment 8559729 [details] [diff] [review]
(Part 1) - Recover when catastrophic circumstances cause us to lose frames in RasterImage. (FF36 Backport)

Transferring the approval request to the backported patches.
Attachment #8559729 - Flags: approval-mozilla-beta?
Attachment #8558832 - Flags: approval-mozilla-beta?
Attachment #8559735 - Flags: approval-mozilla-beta?
Attachment #8558833 - Flags: approval-mozilla-beta?
Attachment #8559729 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8559735 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Seth Fowler [:seth] from comment #18)
> Yes, 37 is affected too. I'd like to uplift this to Aurora a little later,
> though, because if I uplift some other things (that I wanted to uplift
> anyway) 37 and 38 can use the same version of the patch. 36 is unavoidably
> going to need a manually rebased version of the patch, though.

Could you make sure those are all tracked and get uplifted really soon? Given that the high-ranking mozilla::gfx::GetCairoSurfaceForSourceSurface crash bug 1123544 is duplicated to this and bug 1122643 is a really bad experience as well, we really should not expose Dev Edition to this for too long.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #24)
> Could you make sure those are all tracked and get uplifted really soon?
> Given that the high-ranking mozilla::gfx::GetCairoSurfaceForSourceSurface
> crash bug 1123544 is duplicated to this and bug 1122643 is a really bad
> experience as well, we really should not expose Dev Edition to this for too
> long.

Yes, I'm aiming to start requesting uplift for those things today. They were blocked on bug 1125490 landing, and it's been on inbound for 9 hours now, so things look promising that it will stick this time.
Crash Signature: [@ mozilla::gfx::GetCairoSurfaceForSourceSurface(mozilla::gfx::SourceSurface*, bool, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&)]
(In reply to Seth Fowler [:seth] from comment #25)
> Yes, I'm aiming to start requesting uplift for those things today. They were
> blocked on bug 1125490 landing, and it's been on inbound for 9 hours now, so
> things look promising that it will stick this time.

ping?
Flags: needinfo?(seth)
It's not clear to me if everything needed to potentially fix bug 1122643 is on Beta right now. We just shipped Beta 8 so we're getting really close to end-game on 36.
(In reply to Aaron Train [:aaronmt] from comment #27)
> It's not clear to me if everything needed to potentially fix bug 1122643 is
> on Beta right now. We just shipped Beta 8 so we're getting really close to
> end-game on 36.

It is. The discussion above is about *Aurora*.
I was able to reproduce the crash from bug 1123544 during GPU driver update with tagesschau.de open, using Nightly from January 27 on Windows 8.1 x64. I did not get any crash while doing the same with the  latest Nightly 38, while 36 Beta 10 crashes with signature from bug 1116812. Socorro (http://bit.ly/1zqsWAp) shows no more crashes on Nightly 38 after February 4, and only 2 crashes after this landed in 36 Beta 7 (36.0b9), so it seems the crashing part of the bug is now OK.

I was able to reproduce white images during GPU driver update with heise.de open, using Nightly from January 27 on Windows 8.1 x64. The issue did not show on the latest Nightly 38 (could not verify on 36 Beta 10 due to bug 1116812).

I'm marking this as verified for 38, but will hold off for doing so on 36, since it's affected by crash from bug 1116812.
Status: RESOLVED → VERIFIED
Seth, you mention in comment 18 and comment 25 that some blockers for bug 1125490 should get uplifted to 37 so I wanted to make sure that happens before we accept this for approval on aurora. 

It sounds like we are pretty well covered for 38 and 36 but may need to go back and do some cleanup to get this working on 37.   Should bug 1125490 and some of its blocking bugs get nomimated for aurora uplift?
Liz, I missed telling you about this one earlier. I've already reviewed the list of bugs that require uplift with Seth. He should be nomming the dependencies and handling the uplift himself this week.
Comment on attachment 8558832 [details] [diff] [review]
(Part 1) - Recover when catastrophic circumstances cause us to lose frames in RasterImage

Approval Request Comment
[Feature/regressing bug #]: Unknown.
[User impact if declined]: Failure to draw images after a GPU driver reset or after we hit a bug that causes us to lose our volatile buffers.
[Describe test coverage new/current, TreeHerder]: On m-c for some time. We currently can't write tests for this but I plan to add support for testing the volatile buffers issue at least.
[Risks and why]: Very low risk.
[String/UUID change made/needed]: None.
Flags: needinfo?(seth)
Attachment #8558832 - Flags: approval-mozilla-aurora?
Attachment #8558833 - Flags: approval-mozilla-aurora?
Comment on attachment 8558832 [details] [diff] [review]
(Part 1) - Recover when catastrophic circumstances cause us to lose frames in RasterImage

This fix has already landed on 36 and 38 and been verified. Taking this on 37 for consistency. Aurora+ (Seth had verbal approval from me to land before approval was added to the bug.)
Attachment #8558832 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8558833 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.