Closed Bug 900248 Opened 11 years ago Closed 11 years ago

Reconstruct d3d9 device when mSwapChain->PrepareForRendering() fails

Categories

(Core :: Graphics: Layers, defect)

x86_64
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox28 --- fixed

People

(Reporter: nrc, Assigned: nrc)

References

Details

(Whiteboard: [qa-])

Attachments

(11 files, 8 obsolete files)

18.20 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
3.89 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
24.02 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
28.77 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
32.43 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
3.53 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
4.34 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
9.90 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
13.13 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
5.97 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
1.68 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
No description provided.
Assignee: nobody → ncameron
Attachment #792524 - Flags: review?(matt.woodrow)
Attachment #792525 - Flags: review?(bas)
Attachment #792525 - Attachment description: Rebuild the d3d9 device when it goes away → Patch 2: Rebuild the d3d9 device when it goes away
Attachment #792529 - Flags: review?(bas)
Attachment #792524 - Flags: review?(matt.woodrow) → review+
Comment on attachment 792529 [details] [diff] [review] Patch 4: making an effect might fail Review of attachment 792529 [details] [diff] [review]: ----------------------------------------------------------------- Why does creating an affect fail? If we understand that this is fine with me :P It seems bad though.
Attachment #792529 - Flags: review?(bas) → review+
(In reply to Bas Schouten (:bas.schouten) from comment #5) > Comment on attachment 792529 [details] [diff] [review] > Patch 3: making an effect might fail > > Review of attachment 792529 [details] [diff] [review]: > ----------------------------------------------------------------- > > Why does creating an affect fail? If we understand that this is fine with me > :P It seems bad though. It fails because our texture cannot be created and we don't allocate a texture. So we don't know what format our texture is and so the switch in CreateTexturedEffect hits the default and we don't have an Effect. An alternative is to just set a dummy texture format and create an effect for it (in theory it should never get used). But I feel better doing things this way and not worrying about effects built around non-existent textures.
Attachment #795246 - Flags: review?(bas)
Attachment #792525 - Attachment is obsolete: true
Attachment #792525 - Flags: review?(bas)
Attachment #795247 - Flags: review?(bas)
Attachment #792529 - Attachment description: Patch 3: making an effect might fail → Patch 4: making an effect might fail
Just rebased slightly
Attachment #792530 - Attachment is obsolete: true
Attachment #792530 - Flags: review?(bas)
Attachment #795249 - Flags: review?(bas)
Attachment #795246 - Flags: review?(bas) → review+
Comment on attachment 795249 [details] [diff] [review] Patch 5: don't expect device() to succeed Review of attachment 795249 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/d3d9/CompositorD3D9.cpp @@ +210,5 @@ > return; > } > > + IDirect3DDevice9* d3d9Device = device(); > + MOZ_ASSERT(d3d9Device, "We should be able to get a device now"); As far as I can tell you're still expecting it to succeed? It's just triggering an Assert in debug builds now. I sort of like this change though.
Attachment #795249 - Flags: review?(bas) → review+
Comment on attachment 795247 [details] [diff] [review] Patch 3: rebuild d3d9 patch (v2, don't kill SYSTEMMEM textures) Review of attachment 795247 [details] [diff] [review]: ----------------------------------------------------------------- I'm sure there's a good reason, but I'm a little unclear as to why we need to change this much (gfxWindowsPlatform for example), when we -already- need to be able to recreate our device currently for D3D9, could you explain this in the bug? ::: gfx/layers/d3d9/DeviceManagerD3D9.cpp @@ -705,5 @@ > > hr = mDevice->Reset(&pp); > ++mDeviceResetCount; > > - if (hr == D3DERR_DEVICELOST) { Can we just remove this? I'd like to understand more of this.
(In reply to Bas Schouten (:bas.schouten) from comment #10) > Comment on attachment 795249 [details] [diff] [review] > Patch 5: don't expect device() to succeed > > Review of attachment 795249 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: gfx/layers/d3d9/CompositorD3D9.cpp > @@ +210,5 @@ > > return; > > } > > > > + IDirect3DDevice9* d3d9Device = device(); > > + MOZ_ASSERT(d3d9Device, "We should be able to get a device now"); > > As far as I can tell you're still expecting it to succeed? It's just > triggering an Assert in debug builds now. I sort of like this change though. In this case if we fail to get a device, we should never get past the first return guard
(In reply to Bas Schouten (:bas.schouten) from comment #11) > Comment on attachment 795247 [details] [diff] [review] > Patch 3: rebuild d3d9 patch (v2, don't kill SYSTEMMEM textures) > > Review of attachment 795247 [details] [diff] [review]: > ----------------------------------------------------------------- > > I'm sure there's a good reason, but I'm a little unclear as to why we need > to change this much (gfxWindowsPlatform for example), when we -already- need > to be able to recreate our device currently for D3D9, could you explain this > in the bug? > Most of the code for actually recreating the device does not change. The changes in gfxWindowsPlatform is because, previously, the LayerManager had the only reference to a device. Now gfxPlatform keeps a reference (because we need to get to it from main and compositor threads) so when we destroy the device we need to drop the reference in gfxWindowsPlatform. We need to invalidate everything to cause it to be redrawn after resetting the device, there is currently no way to post such a message from the compositor to the main thread (obv, that is not an issue without a compositor thread). Most of the small changes in random files are due to this. > ::: gfx/layers/d3d9/DeviceManagerD3D9.cpp > @@ -705,5 @@ > > > > hr = mDevice->Reset(&pp); > > ++mDeviceResetCount; > > > > - if (hr == D3DERR_DEVICELOST) { > > Can we just remove this? I'd like to understand more of this. Removing this code was needed to make resetting work correctly. Without removing it when we lock and reset we end up waiting an age. I'm not sure why that isn't an issue with main thread compositing. Removing the code didn't seem to have any negative effects.
Do you need anything more than the info in comment 13 for a re-review?
Flags: needinfo?(bas)
(In reply to Nick Cameron [:nrc] from comment #14) > Do you need anything more than the info in comment 13 for a re-review? Nope, I've just been on Vacation. I'll have this reviewed before my EOD thursday.
Flags: needinfo?(bas)
(In reply to Nick Cameron [:nrc] from comment #13) > (In reply to Bas Schouten (:bas.schouten) from comment #11) > > ::: gfx/layers/d3d9/DeviceManagerD3D9.cpp > > @@ -705,5 @@ > > > > > > hr = mDevice->Reset(&pp); > > > ++mDeviceResetCount; > > > > > > - if (hr == D3DERR_DEVICELOST) { > > > > Can we just remove this? I'd like to understand more of this. > > Removing this code was needed to make resetting work correctly. Without > removing it when we lock and reset we end up waiting an age. I'm not sure > why that isn't an issue with main thread compositing. Removing the code > didn't seem to have any negative effects. I think this code solves some exotic multi-monitor problem. Note you should -only- reset on the same thread as where you create the device. That's a D3D9 behavioral restriction.
(In reply to Bas Schouten (:bas.schouten) from comment #16) > (In reply to Nick Cameron [:nrc] from comment #13) > > (In reply to Bas Schouten (:bas.schouten) from comment #11) > > > ::: gfx/layers/d3d9/DeviceManagerD3D9.cpp > > > @@ -705,5 @@ > > > > > > > > hr = mDevice->Reset(&pp); > > > > ++mDeviceResetCount; > > > > > > > > - if (hr == D3DERR_DEVICELOST) { > > > > > > Can we just remove this? I'd like to understand more of this. > > > > Removing this code was needed to make resetting work correctly. Without > > removing it when we lock and reset we end up waiting an age. I'm not sure > > why that isn't an issue with main thread compositing. Removing the code > > didn't seem to have any negative effects. > > I think this code solves some exotic multi-monitor problem. Note you should > -only- reset on the same thread as where you create the device. That's a > D3D9 behavioral restriction. I tested as much as I could think of on multiple monitors - moving monitor to monitor, docking to the various sides of both monitors, using the window across the two monitors - everything was OK. I am on Win7, although everything else seems to be dependent on the d3d version, not win version. I wonder if creating textures on the main thread is causing me to create the device there. We should only be creating on the compositor thread. I'll investigate tomorrow.
(In reply to Bas Schouten (:bas.schouten) from comment #16) > (In reply to Nick Cameron [:nrc] from comment #13) > > (In reply to Bas Schouten (:bas.schouten) from comment #11) > > > ::: gfx/layers/d3d9/DeviceManagerD3D9.cpp > > > @@ -705,5 @@ > > > > > > > > hr = mDevice->Reset(&pp); > > > > ++mDeviceResetCount; > > > > > > > > - if (hr == D3DERR_DEVICELOST) { > > > > > > Can we just remove this? I'd like to understand more of this. > > > > Removing this code was needed to make resetting work correctly. Without > > removing it when we lock and reset we end up waiting an age. I'm not sure > > why that isn't an issue with main thread compositing. Removing the code > > didn't seem to have any negative effects. > > I think this code solves some exotic multi-monitor problem. Note you should > -only- reset on the same thread as where you create the device. That's a > D3D9 behavioral restriction. So, you were spot on - if I only create and reset on the compositor thread, I can leave this code in and everything is OK. So, I'll do that. Interdiff patch coming up...
Attachment #807563 - Flags: review? → review?(bas)
This patch undoes the removal of the code from device manager d3d9, but adds notification that we have reset the device (because we have). It also makes a minor change to CompositorD3D9::BeginFrame - after the device is reset we skip compositing because we need to wait for the content thread to regenerate all our textures - this improves the visual effect because we never see a blank window.
Attachment #807571 - Flags: review?(bas)
Attachment #807571 - Flags: review?(bas) → review+
Attachment #807563 - Flags: review?(bas) → review+
(In reply to Bas Schouten (:bas.schouten) from comment #15) > (In reply to Nick Cameron [:nrc] from comment #14) > > Do you need anything more than the info in comment 13 for a re-review? > > Nope, I've just been on Vacation. I'll have this reviewed before my EOD > thursday. Can I please get this review? I realise you've been on vacation, but the patch has been here for a month now without a proper review. This is the last thing blocking turning on Windows OMTC, so it would be awesome if you could bump it up your todo list. Thanks!
Flags: needinfo?(bas)
Attachment #795246 - Attachment is obsolete: true
Attachment #811634 - Flags: review?(bas)
Flags: needinfo?(bas)
Updated to use device manager's list of textures
Attachment #811635 - Flags: review?(bas)
Comment on attachment 811634 [details] [diff] [review] Release based on a list on the device manager. Review of attachment 811634 [details] [diff] [review]: ----------------------------------------------------------------- I'm a lot happier about the new method. ::: gfx/layers/d3d9/DeviceManagerD3D9.cpp @@ +854,5 @@ > + return nullptr; > + } > + > + if (aPool != D3DPOOL_SYSTEMMEM) { > + MOZ_ASSERT(aTextureHost, "We need a texture host to track so we can release the texture."); I believe you can make this == D3DPOOL_DEFAULT, I do not believe you need to track Managed textures but I might be wrong. ::: gfx/layers/d3d9/TextureD3D9.cpp @@ +316,3 @@ > data, > surf->Stride(), > + IntSize(tileRect.width, tileRect.height), You should now just be able to use tileRect.size()! There's a couple of more things like this, let's fix them while we're at it :) It'll make life better! @@ +359,5 @@ > MOZ_ASSERT(aImage.type() == SurfaceDescriptor::TYCbCrImage); > > YCbCrImageDataDeserializer yuvDeserializer(aImage.get_YCbCrImage().data().get<uint8_t>()); > > gfxIntSize gfxCbCrSize = yuvDeserializer.GetCbCrSize(); Use ToIntSize() from gfx2DGlue.h and avoid this temporary. @@ +361,5 @@ > YCbCrImageDataDeserializer yuvDeserializer(aImage.get_YCbCrImage().data().get<uint8_t>()); > > gfxIntSize gfxCbCrSize = yuvDeserializer.GetCbCrSize(); > gfxIntSize size = yuvDeserializer.GetYSize(); > mSize = IntSize(size.width, size.height); Same here. @@ +367,3 @@ > mStereoMode = yuvDeserializer.GetStereoMode(); > > + mTextures[0] = DataToTexture(gfxWindowsPlatform::GetPlatform()->GetD3D9DeviceManager(), It's probably better to store this on the stack in a temp deviceManager, both for readability and avoiding needless work.
Attachment #795247 - Attachment is obsolete: true
Attachment #795247 - Flags: review?(bas)
I totally forgot to upload this, stupid jet-lag.
Attachment #811634 - Attachment is obsolete: true
Attachment #811634 - Flags: review?(bas)
Attachment #812950 - Flags: review?(bas)
Attachment #812950 - Attachment is obsolete: true
Attachment #812950 - Flags: review?(bas)
Attachment #827278 - Flags: review?(bas)
Updated so it works with LayerManagerD3D9 - change to how device manager is created in LayerManagerD3D9 and gfxWindowsPlatform::GetD3D9DeviceManager only.
Attachment #807563 - Attachment is obsolete: true
Comment on attachment 827278 [details] [diff] [review] patch 2: Release based on a list on the device manager. Review of attachment 827278 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/d3d9/DeviceManagerD3D9.h @@ +180,5 @@ > + > + /** > + * Creates a texture using our device. > + * If needed, we keep a record of the new texture, so the texture can be > + * released. Add a comment that we need a texturesource in that case. @@ +274,5 @@ > > /* Our vertex declaration */ > nsRefPtr<IDirect3DVertexDeclaration9> mVD; > > + /* We maintain a doubly linked list of all d3d9 texture hosts which host What's wrong with an std::list of TextureSource pointers? :)
Attachment #827280 - Flags: review+
(In reply to Bas Schouten (:bas.schouten) from comment #28) > Comment on attachment 827278 [details] [diff] [review] > patch 2: Release based on a list on the device manager. > > Review of attachment 827278 [details] [diff] [review]: > ----------------------------------------------------------------- > > @@ +274,5 @@ > > > > /* Our vertex declaration */ > > nsRefPtr<IDirect3DVertexDeclaration9> mVD; > > > > + /* We maintain a doubly linked list of all d3d9 texture hosts which host > > What's wrong with an std::list of TextureSource pointers? :) Explained over irc, that we need access to the nodes for O(1) removal. Bas says we can keep hold of the iterators which is essentially access to the nodes of the list. But then we are getting complicated again...
Whilst investigating some questions from Bas I discovered an intermittent crash with these patches when locking and unlocking the screen. There seems to be a number of reasons that happened, some of which relate to review questions. The most important one is that we cannot rely on destruction of the device manager to release texture resources. If there is more than one compositor around (which happens when we have more than one window, but also (and only sometimes, thus the intermitent-ness) with only one window (I'm not sure why). So we need to directly release the textures when we detect a problem with the device. This is doable, but brings up another issue (see below). However, we do need to release textures in the managed pool, not before we release the device, but before we use the next one. That's because you can only use a texture with the device it was created with, otherwise we get an error. Technically, if we only reset the device rather than create a new one, we don't need to do this. But atm, we don't expose that information outside the DeviceManager. Which relates back to the question which is going to end this comment. We do need to invalidate the content side. This makes the period when we render black much shorter (without invalidation, we can be waiting forever, sometimes). I assume this is because we need to refill the new managed textures that we recreate. I didn't investigate any further. So, the issue I have is in VerifyReadyForRendering, we can fail to reset in two ways - either we are optimistic and think we can get away resetting again later, so we return false, but don't need the DeviceManager to die, or we think things are catastrophically bad and we should kill the device and thus the DeviceManager itself. In both cases, we return false, so there is no difference to the outside world. Unfortunately, it is the outside world that has to kill the DeviceManager and release the textures in the second case. But, we might also have to do this if the swap chain fails to initialise, which the DeviceManager doesn't know about, so we can't move everything inside the DeviceManager. Also, if the first case happens more than a few times in a row, we probably want to try destroying the device completely, rather than just waiting around. We don't actually do this, despite keeping a count (mDeviceResetCount), which looks like it could be used for this, but is not (it is only used by LayerManagerD3D9). The easy solution is to just get rid of the code which is optimistic, that might cause appearance regressions with multiple screens on XP. It does not cause anything bad to happen on Windows 7. That is what I did originally and patch 7 undoes that. I don't know how many users we have with multiple screens on XP and Vista, but presumably it would be better not to cause them issues (the issue would be their browser turning black for a few seconds when it moves screen, which is not very pretty, but not the end of the world). Another solution would be to return some flag, rather than true or false to the compositor so it can determine whether we got a reset and optimism or real failure, then add logic to deal with that (give up after a number of retries, release textures sometimes, etc.). It is probably a better solution, but adds a little complexity. I don't want to wait for the next paint to do this like we did for LayerManagerD3D9. I think that now our invalidation is better, we could be waiting a long time. So I prefer to keep trying to composite until we succeed, but that means if we are waiting a long time for the device, then we block the main thread (since we are in a sync transaction) until we get that. We are never worse than main thread compositing and this is an exceptional circumstance, so I think that is OK.
Comment on attachment 807571 [details] [diff] [review] patch 7: Undo removal of earlier code. As discussed with Bas on irc, removing this code after all because it complicates the reset code and the savings are pretty minimal (it only affects Windows XP users when the add or remove a monitor from their system, we can afford a few moments of black window in that case).
Attachment #807571 - Attachment is obsolete: true
Don't use managed pool d3d9 textures. They don't buy us much and their use when resetting the device is a pain - unlike default pool textures, they don't need releasing before reset, but they do need releasing if we recreate the device. That makes things complicated. Also, removing means one less path in the texture code. And they were a bad idea anyway - they got removed in D3D9Ex.
Attachment #8333551 - Flags: review?(bas)
For anyone else (or future me) who has to come and tackle this problem (and because it is not covered by automated tests) here are the actions I found useful for testing: 1 - minimise and un-minimise the window 2 - ctrl+alt+delete and then cancel 3 - do 2 many times in rapid succession 4 - open two windows and do 2 5 - minimise the window, then do 2, then restore the window (this turned out to be particularly annoying) 6 - unplug a monitor whilst the window is on the monitor being unplugged I tried a few other things which weren't interesting: maximising the window moving the window from one monitor to another moving the window so it is across two monitors do 4 from the above list with one window on each monitor changing tabs before and after 2
Attachment #8333551 - Flags: review?(bas) → review+
Try push: https://tbpl.mozilla.org/?tree=Try&rev=28737e0c1c89 (the Win7 R failures are fixed by another patch not in that queue)
Comment on attachment 8334292 [details] [diff] [review] patch 10: an extra safety check in TextureD3D9 and some refactoring Review of attachment 8334292 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/d3d9/TextureD3D9.h @@ +198,5 @@ > } > > protected: > gfx::IntRect GetTileRect(uint32_t aID) const; > + void Reset() I'd prefer to just have this in the implementation file.
Attachment #8334292 - Flags: review?(bas) → review+
Comment on attachment 8334290 [details] [diff] [review] patch 8: handle resetting and recreating devices better Review of attachment 8334290 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/d3d9/DeviceManagerD3D9.cpp @@ +720,5 @@ > pp.Windowed = TRUE; > pp.PresentationInterval = D3DPRESENT_INTERVAL_DEFAULT; > pp.hDeviceWindow = mFocusWnd; > > + // if we got this far, we knoe !SUCCEEDEED(hr), that means hr is one of typo: know
Attachment #8334290 - Flags: review?(bas) → review+
Attachment #827278 - Flags: review?(bas) → review+
Comment on attachment 8334291 [details] [diff] [review] patch 9: better resets - logic for CompositorD3D9 Review of attachment 8334291 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/composite/LayerManagerComposite.cpp @@ +169,5 @@ > NS_WARNING("Call on destroyed layer manager"); > return; > } > > + mIsCompositorReady = true; This is weird... if mIsCompositorReady is false.. we'll never get to this point anyway if I understand correctly.. @@ +198,5 @@ > + if (!mIsCompositorReady) { > + return; > + } > + mIsCompositorReady = false; > + nit: whitespace
(In reply to Bas Schouten (:bas.schouten) from comment #40) > Comment on attachment 8334291 [details] [diff] [review] > patch 9: better resets - logic for CompositorD3D9 > > Review of attachment 8334291 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: gfx/layers/composite/LayerManagerComposite.cpp > @@ +169,5 @@ > > NS_WARNING("Call on destroyed layer manager"); > > return; > > } > > > > + mIsCompositorReady = true; > > This is weird... if mIsCompositorReady is false.. we'll never get to this > point anyway if I understand correctly.. > mIsCompositorReady should be false outside of a transaction and starts that way, we only check it when we end the transaction. If we got to this stage then we are ready to composite and so can change it to true.
I can't repro the situation, so this is guess work. But I think it duplicates the logic of the old code.
Attachment #8335793 - Flags: review?(bas)
To answer the do we always need to invalidate everything question - I think there _is_ a more efficient solution where we only need to re-composite. Textures used by SYSTEMMEM and shmem texture client/hosts should be up to date and still valid. Therefore we just need to re-upload and re-composite (note that just compositing is not enough though). I'm not sure about the DIB texture client/hosts - Bas thinks we can wrap those around SYSTEMMEM rather than around whatever they use currently (which is up to Cairo). Then we need to add a mechanism to trigger a complete re-upload of the layer tree from the compositor (I guess we just need to make the call to the layer manager and then down the layer tree, but I think there is no facility for that at the moment), then re-composite. That should be more efficient than the whole invalidate the world approach. But this is not really perf-critical code (Windows just generally chugs along for a minute or two anyway at this stage) so I vote to do that as a follow-up.
Attachment #8335793 - Flags: review?(bas) → review+
Comment on attachment 811635 [details] [diff] [review] Patch 3: rebuild d3d9 patch (v2, don't kill SYSTEMMEM textures) Review of attachment 811635 [details] [diff] [review]: ----------------------------------------------------------------- It doesn't make me too happy, but it'll have to do for now I suppose.
Attachment #811635 - Flags: review?(bas) → review+
Comment on attachment 8334291 [details] [diff] [review] patch 9: better resets - logic for CompositorD3D9 Review of attachment 8334291 [details] [diff] [review]: ----------------------------------------------------------------- All this stuff makes my eyes bleed, but that's D3D9 for us :).
Attachment #8334291 - Flags: review?(bas) → review+
Blocks: 941886
Filed bug 941886 as a follow-up to avoid the InvalidateAll business.
Depends on: 942819
Since this landed, Nightlies go completely black on my Windows XP machine after the screen locks. So. Hit ctrl-alt-del, hit cancel, black Firefox which redraws in pieces as parts of the UI are used. Is really rather annoying. ehoogeveen offered this bug up as a suspect. Nightly from 22nd worked, nightly from 23rd on (so far on 27th), all broken. Hoping maybe bug #941886 will fix this?
Blocks: 944087
(In reply to nemo from comment #49) > Since this landed, Nightlies go completely black on my Windows XP machine > after the screen locks. > So. Hit ctrl-alt-del, hit cancel, black Firefox which redraws in pieces as > parts of the UI are used. > Is really rather annoying. > ehoogeveen offered this bug up as a suspect. Nightly from 22nd worked, > nightly from 23rd on (so far on 27th), all broken. > > Hoping maybe bug #941886 will fix this? GAH! That is really bad, it is exactly what this bug is meant to fix and I thought I managed to avoid that situation. 941886 won't fix this, the fix needs more invalidation, not less. Filed bug 944087.
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: