Closed
Bug 966543
Opened 10 years ago
Closed 10 years ago
Crashes at MacIOSurface::GetDevicePixelWidth() with OMTC
Categories
(Core :: Graphics, defect)
Tracking
()
People
(Reporter: smichaud, Assigned: mattwoodrow)
Details
(Keywords: topcrash-mac)
Crash Data
Attachments
(3 files)
1.42 KB,
patch
|
nical
:
review+
|
Details | Diff | Splinter Review |
1.61 KB,
patch
|
nical
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
3.95 KB,
patch
|
nical
:
review+
|
Details | Diff | Splinter Review |
These are all crashes on a secondary compositor thread. The oldest ones I can find date from 2013-11: bp-4c458196-4734-45f9-8ffc-777392131114 bp-0d3aa9a6-8677-4d7a-af77-1d3202131121 These are the #2 topcrasher on the 27 branch, and the #9 topcrasher on the 29 branch. These crashes happen here: http://hg.mozilla.org/mozilla-central/annotate/540d85f60c57/gfx/2d/MacIOSurface.cpp#l299 and here: http://hg.mozilla.org/mozilla-central/annotate/540d85f60c57/gfx/layers/opengl/MacIOSurfaceTextureHostOGL.cpp#l97 I'd guess that MacIOSurfaceTextureHostOGL.mSurface is null. MacIOSurface::LookupSurface(), called in that class's constructor, can return null. Matt, I'm CCing you because these crashes happen in code you wrote. Here are a few recent examples: bp-1256c709-a51a-47f3-b064-fbc0d2140103 bp-6f4bc490-d0a3-48e2-ada4-16d5c2140105 bp-f6bd0adc-03b4-4421-a2ba-1bdd02140104
Reporter | ||
Updated•10 years ago
|
Reporter | ||
Updated•10 years ago
|
Crash Signature: [@ MacIOSurface::GetDevicePixelWidth() ]
Assignee | ||
Comment 1•10 years ago
|
||
I think this is most likely the issue. The ID we pass in the SurfaceDescriptor doesn't hold a reference to the surface, so it seems possible that the TextureClient was being destroyed (and the surface with it) before the TextureHost was created.
Attachment #8369261 -
Flags: review?(nical.bugzilla)
Comment 2•10 years ago
|
||
Comment on attachment 8369261 [details] [diff] [review] macio-crash Review of attachment 8369261 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/opengl/MacIOSurfaceTextureClientOGL.cpp @@ +75,5 @@ > + virtual void DeallocateSharedData(ISurfaceAllocator*) MOZ_OVERRIDE > + { > + mSurface = nullptr; > + } > + nit: white space
Attachment #8369261 -
Flags: review?(nical.bugzilla) → review+
Updated•10 years ago
|
Comment 3•10 years ago
|
||
Tracking for Fx27 as this is a 27 specific top mac crasher, but not something we can hold tomorrow's release for. In case we have chemspill/dot release, will consider this as a ride along if the patch sticks in Fx28 and is low risk.
Assignee | ||
Comment 4•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/2d0317f079e3
Comment 5•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/2d0317f079e3
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
I still see crashes with this in Firefox 30.0a1 after this landed. For example: https://crash-stats.mozilla.com/report/index/ebc9ad2f-beff-496e-919a-a93512140206
Reporter | ||
Comment 7•10 years ago
|
||
> bp-ebc9ad2f-beff-496e-919a-a93512140206
This is so far the only crash in a build with this bug's patch. I suspect there will be more. But it'll be a week before we can tell whether or not the frequency has changed.
Reporter | ||
Comment 8•10 years ago
|
||
> https://hg.mozilla.org/mozilla-central/rev/2d0317f079e3 There have been 6 crashes with this patch (on the 30 branch) in the week since it landed: bp-9bea4c69-d7ed-4f7b-8197-64d3c2140209 bp-ebc9ad2f-beff-496e-919a-a93512140206 bp-15a84672-e3e6-47d2-bf8f-9b8912140211 bp-bfcd2bf6-dcdd-450f-ab32-e6f9b2140210 bp-f855a4e2-027a-4db5-93f2-304f32140209 bp-4d7d68f1-0aa5-4d4f-be53-45d2f2140211 There were five of these crashes (on the 29 branch) in the week before this patch landed. So this patch has made no difference at all to these crashes.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 9•10 years ago
|
||
> I'd guess that MacIOSurfaceTextureHostOGL.mSurface is null.
I've confirmed this, by looking at XUL in Hopper Disassembler.
According to the raw crash logs, these crashes take place trying to read memory at offset 0x8, in MacIOSurface::GetDevicePixelWidth(). The assembly code for this method is very short, so I'll paste it in here. This method is called with a pointer to "this" (a MacIOSurface pointer) in the $rdi register, in this code snippet: mSurface->GetDevicePixelWidth(). mSurface is MacIOSurfaceTextureHostOGL.mSurface.
push rbp
mov rbp, rsp
mov rdi, qword [ds:rdi+0x8]
pop rbp
jmp qword [ds:__ZN15MacIOSurfaceLib6sWidthE] ; MacIOSurfaceLib::sWidth
Almost certainly the crashes happen at "line 3", whose pseudo code is something like this:
$rdi = *(rdi + 0x8)
In other words, MacIOSurface::GetDevicePixelWidth() is called with this == NULL.
Assignee | ||
Comment 10•10 years ago
|
||
Hrm, so it looks like DropTextureData is only called for TEXTURE_DEALLOCATE_CLIENT textures. Nico, I think what is happening here is that the texture host hasn't been created at the time when we destroy the texture client, because it's still sitting in the message queue. Creating the texture host is what takes a reference to the shared object, so we have a race condition. Do we have anything in place to protect against this? I think we could potentially have this issue for a few different texture types.
Flags: needinfo?(nical.bugzilla)
Comment 11•10 years ago
|
||
Hmm, I don't think we have anything to protect about destroying a TextureClient we just sent to the compositor before the TextureHost got created. I am actually surprised that this can happen at all. It seems like if we run into this we shouldn't have shared the TextureClient in the first place. We could give the TextureData to the TextureChild even when TEXTURE_DEALLOCATE_CLIENT is not set, and tell the TextureChild to forget the texture in this case, instead of deallocating it (since the texture will be deallocated on the host side).
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 12•10 years ago
|
||
Don't we need a synchronous transaction for that though? I can't even do an extra AddRef on the client side, and skip the AddRef when receiving on the host side since we might be passing cross-process. MacIOSurface is a bit weird in that it's a c++ object (which we AddRef/Release) but wrapping a system level object which we can't manually refcount. There's no way to guarantee the id number we pass through IPDL stays valid without keeping the c++ object in the client alive. I think we need an asynchronous version of the delete message so we can get notified when the host has processed all the messages.
Comment 13•10 years ago
|
||
Assigning to Matt since it looks like that's who's doing the work of finding the fix here. Matt, we still have a couple more weeks of Beta where slightly more speculative fixes can be tried to see if they impact the crash volume.
Assignee: nobody → matt.woodrow
Assignee | ||
Comment 14•10 years ago
|
||
Probably wallpapering over the problem a bit, but this should stop people from crashing. If my guess about the race condition is correct, then we're only trying to render this for a single frame anyway, so not a big loss.
Attachment #8376870 -
Flags: review?(nical.bugzilla)
Assignee | ||
Comment 15•10 years ago
|
||
Just puts all the **Host code together instead of mixed through the file.
Attachment #8376871 -
Flags: review?(nical.bugzilla)
Comment 16•10 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #12) > Don't we need a synchronous transaction for that though? TextureClient/Host async shutdown goes as follows: * client side sends the RemoveTexture message * host side receives it, deallocates the shared data (or not depending on the TextureFlags) * host sends __delete__ So if we need to keep the data alive, we can pass it to the TextureParent, and the latter will be deleted when receiving __delete__ asynchronously. > I think we need an asynchronous version of the delete message so we can get > notified when the host has processed all the messages. We already sort of have this if we overload Recv__delete__ on PTextureChild (which we don't currently do but we should).
Comment 17•10 years ago
|
||
Comment on attachment 8376870 [details] [diff] [review] Check for null mSurface Review of attachment 8376870 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/opengl/MacIOSurfaceTextureHostOGL.cpp @@ +95,5 @@ > > gfx::SurfaceFormat > MacIOSurfaceTextureHostOGL::GetFormat() const { > + if (!mSurface) { > + return gfx::SurfaceFormat::R8G8B8A8; Should we return SurfaceFormat::UNKNOWN here?
Attachment #8376870 -
Flags: review?(nical.bugzilla) → review+
Updated•10 years ago
|
Attachment #8376871 -
Flags: review?(nical.bugzilla) → review+
Assignee | ||
Comment 18•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/48cd98deecd7 https://hg.mozilla.org/integration/mozilla-inbound/rev/6bd01c3b4945
Comment 19•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/48cd98deecd7 https://hg.mozilla.org/mozilla-central/rev/6bd01c3b4945
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Comment 20•10 years ago
|
||
Flagging for verification based on crash-stats. We'll need to give this a few days for data collection.
Reporter | ||
Comment 21•10 years ago
|
||
Comment on attachment 8376871 [details] [diff] [review] Move some code around There haven't been any of these crashes in builds that contain this patch (starting with firefox-2014-02-18-03-02-02-mozilla-central). So it looks like this bug is now actually fixed!
Comment 22•10 years ago
|
||
We have just seen this crash on Mac OS X 10.7.5 (x86_64) on Nightly de https://crash-stats.mozilla.com/report/index/1b0e40f2-e96e-4e3e-b752-12f2e2140224 https://crash-stats.mozilla.com/report/index/0fb87fbf-7fdb-40bd-ae63-75d3b2140224
Reporter | ||
Comment 23•10 years ago
|
||
Those crashes were in the firefox-2014-02-09-03-02-03-mozilla-central nightly. This bug's actual fix only landed as of the firefox-2014-02-18-03-02-02-mozilla-central nightly.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Comment 24•10 years ago
|
||
I looked at the data for Firefox 30.0a1 in the last week and there are 5 crashes reported. However all of these crashes occur with a build ID of 20140217030202 or earlier. Since the fix landed on February 17 and would have appeared first in the February 18 Nightly builds, I think we can call this fixed. Source: https://crash-stats.mozilla.com/report/list?signature=MacIOSurface%3A%3AGetDevicePixelWidth%28%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&version=Firefox%3A30.0a1&hang_type=any&date=2014-02-24+20%3A00%3A00&range_value=1#tab-reports
status-firefox30:
--- → verified
Assignee | ||
Comment 25•10 years ago
|
||
Comment on attachment 8376870 [details] [diff] [review] Check for null mSurface [Approval Request Comment] Bug caused by (feature/regressing bug #): Unknown User impact if declined: Occasional crashes with plugins/webgl on OSX. Testing completed (on m-c, etc.): Been on m-c for a week, confirmed it fixes the issue using crashstats. Risk to taking this patch (and alternatives if risky): Very low risk, just adds null checks. String or IDL/UUID changes made by this patch: None
Attachment #8376870 -
Flags: approval-mozilla-beta?
Attachment #8376870 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
Attachment #8376870 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 26•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/79b269235c03 https://hg.mozilla.org/releases/mozilla-aurora/rev/b8e64e9f9e66
Updated•10 years ago
|
Attachment #8376870 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•10 years ago
|
Comment 27•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/cf9032d58612 https://hg.mozilla.org/releases/mozilla-beta/rev/f5e8807d2a48
Comment 28•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/cf9032d58612 https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/f5e8807d2a48
Updated•10 years ago
|
status-b2g-v1.3:
--- → fixed
Comment 29•10 years ago
|
||
I just checked crash-stats and I'm seeing 0 crashes with Firefox 28.0b7 and 0 crashes with Firefox 29.0a2 after 2014-02-25. I'm calling this verified fixed based on these stats.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Updated•10 years ago
|
status-b2g-v1.3T:
--- → fixed
status-b2g-v1.4:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•