crash in mozilla::layers::D3D9SurfaceImage::EnsureSynchronized()

VERIFIED FIXED in Firefox 35

Status

()

Core
Graphics: Layers
--
critical
VERIFIED FIXED
3 years ago
3 years ago

People

(Reporter: jesup, Assigned: pehrsons)

Tracking

({crash})

Trunk
mozilla36
x86
Windows NT
crash
Points:
---

Firefox Tracking Flags

(firefox35 verified, firefox36 verified)

Details

(crash signature)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-264f1d35-b95a-436e-9711-65d542141011.
=============================================================

Null-deref crash in D3D9SurfaceImage::EnsureSynchronized() (typically from GetTextureClient() from UpdateImage/UpdateImageInternal()).

Thread 0 was in gdi, called from mozilla::gfx::DataSourceSurfaceD2DTarget::EnsureMapped().

About 200 crash reports per week

Changes ready to land in bug 879717 require a new test, but our test shows about 50% failure rate even without the changes we're landing, so this would block us landing an important change.

See https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=00462507ee04 (note about 50% of the oranges end in a crash - it's permaorange since we don't have the bug 879717 changes in the Try).
(Assignee)

Comment 1

3 years ago
Bas, isn't this your field of expertise? Do you have the time to take a look?
Flags: needinfo?(bas)

Comment 2

3 years ago
I can reproduce the crash by going to http://html5demos.com/video-canvas
1 press play then on 4,6,8 seconds pause and unpause, then change speed of video to ludicous and wait till video has played then press play again it should crash
This is from the video code. Chris Pearce is the right person here.
Assignee: nobody → cpearce
Status: NEW → ASSIGNED
Flags: needinfo?(bas)
(Assignee)

Comment 4

3 years ago
Chris, will you have time to look at this anytime soon?
Flags: needinfo?(cpearce)
(Assignee)

Comment 5

3 years ago
Chris is pretty occupied at the moment.

Milan, could take a look at this and prioritize/assign as appropriate?
Flags: needinfo?(cpearce) → needinfo?(milan)
Looks like a nullptr deref, but only a few lines after we check for nullptr.

I guess we're calling EnsureSynchronized from both the main-thread and image bridge thread, and that's causing us to get mQuery cleared while we're using it.

We can probably just store mQuery in a local RefPtr variable to stop this happening, assuming d3d9 queries are threadsafe.

Nical might want to look at this, since it appears to be async-video related.
Crash, but not topcrash, what's the urgency for this?  It doesn't show up in the current "urgent stability issues to consider" that QA maintains for the graphics team.
Flags: needinfo?(milan)
(Assignee)

Comment 8

3 years ago
My only concern is that it's a blocker for bug 879717 - a test there reproduces it at a ~50% rate.

Since you are busy in other areas though, I can follow up on Matt's comment and see what results that yields us.
(Assignee)

Comment 9

3 years ago
Created attachment 8529536 [details] [diff] [review]
Avoid using null query in D3D9SurfaceImage::EnsureSynchronized

Per comment 6. (Thanks for the analysis Matt!)

Seems to work.
try before: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=0e55ba93595a
try after: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=10a081c5129a

The log parsing is behind at the moment but checking the build dirs there were 4 of 5 crashes on the runs before the patch. All other test runs are taking significantly longer so at least they have not crashed.

nical, are you the man for review here?
Assignee: cpearce → pehrsons
Attachment #8529536 - Flags: review?(nical.bugzilla)
(Assignee)

Comment 10

3 years ago
Not sure what's up with treeherder today, but tbpl have parsed the results:
before patch: https://tbpl.mozilla.org/?tree=Try&rev=0e55ba93595a
after patch: https://tbpl.mozilla.org/?tree=Try&rev=10a081c5129a

They're all orange but the crash is fixed. The other oranges are due to the other patches (needed in order to trigger the crash), so we should be good here.

Updated

3 years ago
Attachment #8529536 - Flags: review?(nical.bugzilla) → review+
(Assignee)

Comment 11

3 years ago
Created attachment 8529989 [details] [diff] [review]
Avoid using null query in D3D9SurfaceImage::EnsureSynchronized (carries r=nical)

Commit msg fix.
Attachment #8529536 - Attachment is obsolete: true
Attachment #8529989 - Flags: review+
(Assignee)

Updated

3 years ago
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/163b23f8ac7c
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/163b23f8ac7c
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36

Comment 14

3 years ago
I am sorry if I did the wrong thing to reply to this bug.

Why is this bug going to be fixed in 36 and not 35? The reason why I ask is I get this bug mainly 1 to 4 times a day and it is driving me want to go back in a few version before this bug.
(Reporter)

Comment 15

3 years ago
nical/Milan: Is this a reasonable candidate for uplift to beta?  (I would have nominated it for Aurora after it landed if I'd though about it.)

Simple patch. well-baked
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(milan)
(In reply to Randell Jesup [:jesup] from comment #15)
> nical/Milan: Is this a reasonable candidate for uplift to beta?  (I would
> have nominated it for Aurora after it landed if I'd though about it.)
> 
> Simple patch. well-baked

Looks reasonable to me.
Flags: needinfo?(nical.bugzilla)
(Reporter)

Comment 17

3 years ago
Comment on attachment 8529989 [details] [diff] [review]
Avoid using null query in D3D9SurfaceImage::EnsureSynchronized (carries r=nical)

Approval Request Comment
[Feature/regressing bug #]: unknown

[User impact if declined]: Crashes on D3D9 devices randomly until 36 goes to release.

[Describe test coverage new/current, TBPL]: Not testable in automation.  Baking on nightly/aurora for considerable time now

[Risks and why]: Low risk - holds a reference while using a resource instead of a raw pointer
 
[String/UUID change made/needed]: none
Attachment #8529989 - Flags: approval-mozilla-beta?
Attachment #8529989 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(milan)
https://hg.mozilla.org/releases/mozilla-beta/rev/166f199a7fb6
status-firefox35: --- → fixed
status-firefox36: --- → fixed
Socorro [1] shows zero crashes over the past two weeks for 35 Beta 6 & 8, 36 Aurora and 37 Nightly.

[1] - https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=14&signature=mozilla%3A%3Alayers%3A%3AD3D9SurfaceImage%3A%3AEnsureSynchronized%28%29#tab-sigsummary
Status: RESOLVED → VERIFIED
status-firefox35: fixed → verified
status-firefox36: fixed → verified
You need to log in before you can comment on or make changes to this bug.