Closed Bug 1089214 Opened 5 years ago Closed 5 years ago

crash in mozilla::layers::D3D9SurfaceImage::EnsureSynchronized()

Categories

(Core :: Graphics: Layers, defect, critical)

x86
Windows NT
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla36
Tracking Status
firefox35 --- verified
firefox36 --- verified

People

(Reporter: jesup, Assigned: pehrsons)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-264f1d35-b95a-436e-9711-65d542141011.
=============================================================

Null-deref crash in D3D9SurfaceImage::EnsureSynchronized() (typically from GetTextureClient() from UpdateImage/UpdateImageInternal()).

Thread 0 was in gdi, called from mozilla::gfx::DataSourceSurfaceD2DTarget::EnsureMapped().

About 200 crash reports per week

Changes ready to land in bug 879717 require a new test, but our test shows about 50% failure rate even without the changes we're landing, so this would block us landing an important change.

See https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=00462507ee04 (note about 50% of the oranges end in a crash - it's permaorange since we don't have the bug 879717 changes in the Try).
Bas, isn't this your field of expertise? Do you have the time to take a look?
Flags: needinfo?(bas)
I can reproduce the crash by going to http://html5demos.com/video-canvas
1 press play then on 4,6,8 seconds pause and unpause, then change speed of video to ludicous and wait till video has played then press play again it should crash
This is from the video code. Chris Pearce is the right person here.
Assignee: nobody → cpearce
Status: NEW → ASSIGNED
Flags: needinfo?(bas)
Chris, will you have time to look at this anytime soon?
Flags: needinfo?(cpearce)
Chris is pretty occupied at the moment.

Milan, could take a look at this and prioritize/assign as appropriate?
Flags: needinfo?(cpearce) → needinfo?(milan)
Looks like a nullptr deref, but only a few lines after we check for nullptr.

I guess we're calling EnsureSynchronized from both the main-thread and image bridge thread, and that's causing us to get mQuery cleared while we're using it.

We can probably just store mQuery in a local RefPtr variable to stop this happening, assuming d3d9 queries are threadsafe.

Nical might want to look at this, since it appears to be async-video related.
Crash, but not topcrash, what's the urgency for this?  It doesn't show up in the current "urgent stability issues to consider" that QA maintains for the graphics team.
Flags: needinfo?(milan)
My only concern is that it's a blocker for bug 879717 - a test there reproduces it at a ~50% rate.

Since you are busy in other areas though, I can follow up on Matt's comment and see what results that yields us.
Per comment 6. (Thanks for the analysis Matt!)

Seems to work.
try before: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=0e55ba93595a
try after: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=10a081c5129a

The log parsing is behind at the moment but checking the build dirs there were 4 of 5 crashes on the runs before the patch. All other test runs are taking significantly longer so at least they have not crashed.

nical, are you the man for review here?
Assignee: cpearce → pehrsons
Attachment #8529536 - Flags: review?(nical.bugzilla)
Not sure what's up with treeherder today, but tbpl have parsed the results:
before patch: https://tbpl.mozilla.org/?tree=Try&rev=0e55ba93595a
after patch: https://tbpl.mozilla.org/?tree=Try&rev=10a081c5129a

They're all orange but the crash is fixed. The other oranges are due to the other patches (needed in order to trigger the crash), so we should be good here.
Attachment #8529536 - Flags: review?(nical.bugzilla) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/163b23f8ac7c
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
I am sorry if I did the wrong thing to reply to this bug.

Why is this bug going to be fixed in 36 and not 35? The reason why I ask is I get this bug mainly 1 to 4 times a day and it is driving me want to go back in a few version before this bug.
nical/Milan: Is this a reasonable candidate for uplift to beta?  (I would have nominated it for Aurora after it landed if I'd though about it.)

Simple patch. well-baked
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(milan)
(In reply to Randell Jesup [:jesup] from comment #15)
> nical/Milan: Is this a reasonable candidate for uplift to beta?  (I would
> have nominated it for Aurora after it landed if I'd though about it.)
> 
> Simple patch. well-baked

Looks reasonable to me.
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8529989 [details] [diff] [review]
Avoid using null query in D3D9SurfaceImage::EnsureSynchronized (carries r=nical)

Approval Request Comment
[Feature/regressing bug #]: unknown

[User impact if declined]: Crashes on D3D9 devices randomly until 36 goes to release.

[Describe test coverage new/current, TBPL]: Not testable in automation.  Baking on nightly/aurora for considerable time now

[Risks and why]: Low risk - holds a reference while using a resource instead of a raw pointer
 
[String/UUID change made/needed]: none
Attachment #8529989 - Flags: approval-mozilla-beta?
Attachment #8529989 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(milan)
You need to log in before you can comment on or make changes to this bug.