Closed
Bug 1145106
Opened 10 years ago
Closed 9 years ago
crash in atidxx32.dll@0x196f7
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: bmaris, Unassigned)
Details
(Keywords: crash)
Crash Data
Attachments
(3 files)
3.11 MB,
image/gif
|
Details | |
1000 bytes,
patch
|
bas.schouten
:
feedback+
|
Details | Diff | Splinter Review |
2.59 KB,
patch
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-003957b4-d7d5-4adc-91d1-4b2302150319.
=============================================================
STR:
1. Start Aurora
2. Change loop.server pref to https://loop-dev.stage.mozaws.net/v0
2. Click Hello icon
3. Click Start a conversation
4. Copy conversation link and past it in the same tab
5. Select 'Share my tabs' from 'Share your screen' button
6. Click 'Got it' from the infobar
Notes:
1. Only reproduced on Windows platform
2. It may not crash from the first try.
Another crash from a different machine:
bp-286443d4-97be-40e4-9f67-81de82150319
Updated•10 years ago
|
Flags: needinfo?(mreavy)
Comment 1•10 years ago
|
||
These crashes seem to be deep in graphics drivers, with no obvious connection to Hello/WebRTC (though perhaps this was indirectly requested by the tab-sharing's "render page to canvas" request. I think this needs to go over to graphcs/layers (though I'm keeping it in WebRTC:audio/video for now). The other possibility is that there's some ownership issue with the data being manipulated. Possibly related to Bug 1145102.
Milan -- I'd like someone with the right knowledge (I believe it'd be someone on your team) to look at this bug and Bug 1145102 together to help analyze how concerning these are and what we can do to fix them. They are impacting the stability of tab sharing which are very important to Hello and WebRTC. (Tab sharing for Hello is scheduled to ship to release in Fx38.) Thanks.
Rank: 10
Component: WebRTC → WebRTC: Audio/Video
Flags: needinfo?(mreavy) → needinfo?(milan)
Priority: -- → P1
Comment 2•10 years ago
|
||
Start with Bas, just because of the Windows expertise.
Flags: needinfo?(milan) → needinfo?(bas)
Comment 3•10 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #2)
> Start with Bas, just because of the Windows expertise.
There does not appear to be any obvious connection between the two crashes mentioned in comment 0 from the crashes themselves. (if I understand correctly both were triggered with the same STR?) Other than that both are invalid memory reads at a seemingly not ridiculous address on the heap.
The first crash however, does have an interesting stack as well as an interesting crash address. It appears to me that the crash address may very well be related to the image data being uploaded. I'll attempt to catch this in a debugger, but in case someone beats me to it, it'd be interesting to know where the address where the read access violation occurs has any relation to the address of the image data that we're trying to upload.
That would explain this crashing in a variety of locations as depending on the driver version the upload would occur in different locations inside of the DLL. If my suspicion is correct this would not be a driver bug but possibly more of an issue in our image bridge code. Nical, what do you think?
Flags: needinfo?(bas) → needinfo?(nical.bugzilla)
Comment 4•10 years ago
|
||
In the first report, an ImageContainer contains a TextureClient that has been used on the main thread and has a D2D texture which, I assume, is using the main thread's D2D device. in this stack we are on the ImageBridge thread, which uses a different D2D device for its texture uploads (as of recently).
So my guess is that this crash comes from trying to copy from a texture into another with the textures not belonging to the same D2D device.
I remember reviewing the patch that introduced the separate D2D device for the ImageBridge thread but I don't have it handy. We could either try to back the patch out, or read the texture back on the cpu and reupload it. The later would be sorta horrible but safe. The risk with the former is that we landed the patch to avoid a deadlock inside of D2D if you do certain things with a D2D device from several threads. IIRC we landed that patch and another one which was attempting to work around the deadlock in a different way, so maybe we can just back it out. Matt, what do you think (as you wrote both the patches for the deadlock issue)?
Flags: needinfo?(nical.bugzilla) → needinfo?(matt.woodrow)
Comment 5•10 years ago
|
||
My changes only affected uploads for D3D9SurfaceImage, so this crash will still be using the 'content' d3d device for uploading.
It looks like the main thread is within d2d as well, so we're probably because both threads are using the same device at the same time.
We could try using SetMultithreadProtected on the d3d device (see the gfxWindowsPlatform changes in bug 875247).
Alternatively we could create and upload the TextureClient when we construct the Image (on the main-thread) rather than waiting for the ImageBridge thread to do it.
Flags: needinfo?(matt.woodrow)
Reporter | ||
Comment 6•10 years ago
|
||
Here are some more crashes with a slightly different signature.
Attached is a gif showing how Firefox behaves before crashes (ugly sight...). It`s quite easy for me to crash now.
bp-f9f1fb7c-2428-4022-bfef-efb892150423
bp-e280c757-710d-4817-9c09-d01d92150423
Comment 7•10 years ago
|
||
Is there a try build we can produce with some of these suggestions, perhaps more instrumentation (I'd like to know if any TDRs are happening, for instance), and pass it to Bogdan to test with?
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(bas)
Comment 8•10 years ago
|
||
Sure, here's a patch for the first suggestion.
Try builds: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5b9f787a0bb1
The second suggestion I had doesn't actually make much sense, because we allocate the CairoImage off the main thread as well.
The other solution would be to have a separate d3d device for the image bridge thread (we could probably use the existing 'media' one), and have TextureClientD3D11::AllocateSurface choose the appropriate one given the current thread.
That might be preferable, since setting MultiThreadProtected on the content device might hurt performance.
Flags: needinfo?(matt.woodrow)
Comment 9•10 years ago
|
||
Test for option 2.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=715263e65af1
Comment 10•10 years ago
|
||
Working try builds:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1b6324cbf4a
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e11d0a5bebbe
Bogdan, could you please try those builds and see if the crashes are fixed?
Flags: needinfo?(bogdan.maris)
Updated•10 years ago
|
Flags: needinfo?(nical.bugzilla)
Reporter | ||
Comment 11•10 years ago
|
||
1) Build https://treeherder.mozilla.org/#/jobs?repo=try&revision=e11d0a5bebbe:
- Windows 7 64-bit (AMD Radeon HD6450) - machine where I originally crashed with this signature
bp-b7eae424-d5ef-40c0-a2fc-01a3d2150424
We also tested on a second machine:
- Windows 7 64-bit (NVidia Geforce 210) - machine where it originally crashed with signature from bug 1154271
bp-41721af3-829b-45c5-9536-8817f2150424
2) Build https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1b6324cbf4a:
Looks good with both machines from above, was unable to crash the build.
Could this build contain the fix for bug 1154271 as well, and bug 1154771 (I don`t have setup to verify this crash)?
Flags: needinfo?(bogdan.maris) → needinfo?(matt.woodrow)
Comment 12•10 years ago
|
||
Bogdan, could you clarify comment 11 a bit? Did you mean that build 1) still crashes for you and build 2) doesn't?
Flags: needinfo?(bogdan.maris)
Reporter | ||
Comment 13•10 years ago
|
||
Yes
Build 1) - I can still crash Firefox
Build 2) - Works fine, unable to crash Firefox
Flags: needinfo?(bogdan.maris)
Comment 14•10 years ago
|
||
Bogdan, do you have any idea why the stacks changed between comment #1 and comment #6?
Bug 1154271 and bug 1154771 both have very similar stacks to comment #1, where the crash is coming from a thread with CreateLayeredChild on the stack.
The stacks in comment #6 and comment #11 seem different and I'm not sure why.
Bas: The newer crashes aren't obviously on the ImageBridge thread (unlike the earlier ones), and avoiding using the content d3d device isn't fixing the crash. I guess that means that yet another thread has access to our content device, but I can't find anywhere that would happen.
Flags: needinfo?(matt.woodrow) → needinfo?(bogdan.maris)
Comment 15•10 years ago
|
||
It's possible that a gecko code change caused the different stacks, if so, getting a regression range would be really helpful.
Reporter | ||
Comment 16•10 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #14)
> Bogdan, do you have any idea why the stacks changed between comment #1 and
> comment #6?
>
> Bug 1154271 and bug 1154771 both have very similar stacks to comment #1,
> where the crash is coming from a thread with CreateLayeredChild on the stack.
>
> The stacks in comment #6 and comment #11 seem different and I'm not sure why.
Using the initial build where I got the signature from comment 1 ([@ atidxx32.dll@0x196f7]), now I get the signature from comment 6 ([@ atidxx32.dll@0x49299e ]).
bp-20eba869-da76-4309-8ec4-2a2a42150428
Since mozregression was of no help here, I did a manual regression and tested builds from the first time I got this crash until nightly from 2015.04.23.
Here are the results:
2015.03.19 - [@ atidxx32.dll@0x49299e ]
bp-38d4d3af-5c0b-4d3b-9c21-ae3b52150428
2015.03.20 - [@ atidxx32.dll@0x49299e ]
bp-af376f30-9bde-493f-9182-9cb712150428
2015.03.28 - [@ atidxx32.dll@0x49299e ]
bp- 97dacb01-c60d-4698-a020-277402150428
2015.04.06 - [@ atidxx64.dll@0x57d7bb ]
bp-9223dfa9-7853-43fa-9eed-6305f2150428
2015.04.23 - [@ atidxx32.dll@0x49299e ]
Latest Nightly:
2015.04.28 - [@ atidxx32.dll@0x49299e ]
bp-20825854-6c12-4af9-9fc5-c08bd2150428
Note:
- I did not actually tested every build from 2015.03.18 to 2015.04.28 and Today, I bisected just as mozregression would.
I don`t know what has changed but I can only get the signature from comment 6 on almost all builds, see above. I hope this helps...
Flags: needinfo?(bogdan.maris) → needinfo?(matt.woodrow)
Reporter | ||
Comment 17•10 years ago
|
||
I encountered the first signature again using latest Nightly and 38.0.5beta1 build 2:
- [@ atidxx32.dll@0x196f7 ]
bp-17baf221-d1f7-4122-a5c9-76d582150512
bp-88a80823-de88-4d1f-bf20-8332d2150512
bp-40a112bb-d067-46fb-90a9-fae702150512
Sometimes I get other signatures:
- [@ afhook32.dll@0x54de ] bp-3c07bf93-f9b4-45de-8e07-16db32150512
- [@ atidxx32.dll@0x509db0 ] bp-aaef48b2-87fc-4d57-8a01-c29c72150512
Steps:
1. Make a tab sharing call.
2. Send and Open link in other PC.
2. Resize Firefox from first PC.
Comment 18•10 years ago
|
||
It looks like our D2DFactory is created with multithreading, so I'm not sure what else could be causing this.
I guess we should just go ahead with marking the D3D device as multithreaded unless Bas has a better idea.
Flags: needinfo?(matt.woodrow)
Updated•10 years ago
|
backlog: --- → webRTC+
Comment 19•10 years ago
|
||
-> GFX, because it is. Waiting on bas to reply to matt I guess.
backlog: webRTC+ → ---
Rank: 10
Component: WebRTC: Audio/Video → Graphics
Priority: P1 → --
Updated•10 years ago
|
Attachment #8596969 -
Flags: review?(bas)
Comment 20•10 years ago
|
||
Comment on attachment 8596962 [details] [diff] [review]
multi-thread-protect
Review of attachment 8596962 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +2015,5 @@
>
> mD3D11ContentDevice->SetExceptionMode(0);
>
> + nsRefPtr<ID3D10Multithread> multi;
> + mD3D1ContentDevice->QueryInterface(__uuidof(ID3D10Multithread), getter_AddRefs(multi));
I'm assuming this is a typo, and it should be mD3D11ContentDevice?
Attachment #8596962 -
Flags: feedback?(bas)
Comment 21•10 years ago
|
||
Yes, it's spelt correctly in the try push :)
Comment 22•10 years ago
|
||
Comment on attachment 8596969 [details] [diff] [review]
Choose d3d11 device based on thread
This patch didn't fix the crash. I'm not sure why.
Attachment #8596969 -
Flags: review?(bas)
Comment 23•10 years ago
|
||
Comment on attachment 8596962 [details] [diff] [review]
multi-thread-protect
Review of attachment 8596962 [details] [diff] [review]:
-----------------------------------------------------------------
If this helps let's do it, although we really need to fix the fact we have this bug in the first place, there's no way this will totally fix things.
Attachment #8596962 -
Flags: feedback?(bas) → feedback+
Comment 24•9 years ago
|
||
Is the patch still needed? It looks like there have been zero reports of this crash in at least 28 days.
This got landed as a part of bug 1176363.
Updated•9 years ago
|
Flags: needinfo?(bas)
Comment 26•9 years ago
|
||
I suspect this was actually fixed by that bug.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•