Closed Bug 1015718 Opened 10 years ago Closed 10 years ago

[OMTC] "Search" flickers in Addon-Manager

Categories

(Core :: Graphics: Layers, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla34
Tracking Status
firefox33 --- verified
firefox34 --- verified

People

(Reporter: elbart, Assigned: nical)

References

Details

Attachments

(4 files, 1 obsolete file)

Nightly 2014-05-24

STR:
- Open Addon-Manager.
- Search for anything in the "Search all add-ons"-textbox and hit enter.

AR:
"Search" opens, but the label flickers during the animation.

ER:
"Search opens without flickering.

Does not happen when OMTC or HWA is disabled.
Blocks: 899785
Regression range:

Last good revision: 1417d180a1d8 (2014-04-01)
First bad revision: 4941a2ac0786 (2014-04-02)
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1417d180a1d8&tochange=4941a2ac0786

Last good revision: c6e7e5965fb8
First bad revision: ba93b2f34a3f
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c6e7e5965fb8&tochange=ba93b2f34a3f
> Last good revision: c6e7e5965fb8
> First bad revision: ba93b2f34a3f
> Pushlog:
> https://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?fromchange=c6e7e5965fb8&tochange=ba93b2f34a3f

sO ONE OF Bug 982413, Bug 989904, Bug 989883
Flags: needinfo?(nical.bugzilla)
It looks like the same issue as bug 1017395. I haven't been able to reproduce these issues so far, unfortunately. Looks like it happens with intel drivers on Windows 7 (but not 8) specifically.
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
The text and the panels in the accordion on www.mozilla.org are also flickering with OMTC on, and the white flicker started during the same regression-range as this bug.
See Also: → 1016408
FWIW, when using a build off mozilla-central-debug, the following WARNING is shown every time the flickering is happening:

> WARNING: Failed to open shared texture: file c:\builds\moz2_slave\m-cen-w32-d-000000000000000000\build\gfx\layers\d3d11/TextureD3D11.cpp, line 333

Nightly debug 2014-07-07
At the moment I don't have a computer that reproduces the bug, so this is a little bit of a shot in the dark.

I spent some time comparing the old TextureClientD3D11 and the new one, and the only difference that I have found so far in how we use D3D/DXGI/D2D is that old textures were limited to BGRX, BGRA and Alpha surfaces. This patch makes sure we are using one of the above 3 formats.

If someone that can reproduce the bug can test with this patch it'd be awesome!
Could you create a debug try-build with these changes for me to test? I don't think I'm proficient enough in building Firefox with custom changes myself.
(In reply to Elbart from comment #7)
> Could you create a debug try-build with these changes for me to test? I
> don't think I'm proficient enough in building Firefox with custom changes
> myself.

There we go: https://tbpl.mozilla.org/?tree=Try&rev=7e00954e3f3f
Thanks a lot for testing!
The try-build failed. Maybe because

> switch aFormat {

should be

> switch (aFormat) {

?
Thanks. The flicker and warning are still being triggered, though.
TextureClientD3D11::ToSurfaceDescriptor gets a HANDLE for the ID3D10Texture2D which doesn't hold any sort of reference to the texture object.

DXGITextureHostD3D11::Lock is what calls OpenSharedResource and converts the HANDLE into a ID3D11Texture2D and takes a reference to the underlying texture object.

It seems like a pretty clear race here, and if we destroy the TextureClient before we lock the TextureHost (as is likely to happen during animations) then OpenSharedResource will fail and we'll get flickering/blackness.

Do we guarantee that the TextureHost is constructed before the TextureClient is destroyed? If so, then we can move the OpenSharedResource call from Lock into the ctor. If we don't then we'll need something more involved to prevent this from racing.
(In reply to Matt Woodrow (:mattwoodrow) from comment #12)
> TextureClientD3D11::ToSurfaceDescriptor gets a HANDLE for the
> ID3D10Texture2D which doesn't hold any sort of reference to the texture
> object.

Gosh! I have already fixed this bug in two different TextureClient implems. I was convinced it was something that had changed with the new textures.

> 
> Do we guarantee that the TextureHost is constructed before the TextureClient
> is destroyed?

No, but we should put something in place to ensure this, it's not very hard and it would help with this issue in the different implementations.
In the other cases, it's often a reference counted object in a same-process scenario that we pass around as a unitptr_t. so we addref it before sending and release it when receiving on the other side. The other handle-based implementations are either deeply integrated into IPDL (shmem) or have some centralized (SharedBufferManager) logic and are created from the Parent side so they don't have this issue.
I don't know exactly how DXGI works but we may end up making all of the allocations on the parent side at some point for security reasons, so we won't have the problem in its current form.

> If so, then we can move the OpenSharedResource call from Lock
> into the ctor. If we don't then we'll need something more involved to
> prevent this from racing.

The issue is that we don't know for sure that the compositor is using the same device as the one returned by gfxPlatform, do we? Are all widgets using the same device? If so, is it a constraint we want to rely on? (open question, really, if it's a sane thing to rely on it'll greatly simplify things).
(In reply to Nicolas Silva [:nical] from comment #15)
> The issue is that we don't know for sure that the compositor is using the
> same device as the one returned by gfxPlatform, do we? Are all widgets using
> the same device? If so, is it a constraint we want to rely on? (open
> question, really, if it's a sane thing to rely on it'll greatly simplify
> things).

Err, nevermind I just checked and it's actually always the same device (at least at the moment).
And we definitely should deserialize the handle when creating the TextureHost rather than in Lock.
Depends on: 1039535
Attachment #8456662 - Flags: review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #14)
> Compile failed, trying again:
> https://tbpl.mozilla.org/?tree=Try&rev=2238f547bfeb

Crashes right after the warning appears, and the crashreporter isn't fetching it:

  Problem Event Name:	APPCRASH
  Application Name:	firefox.exe
  Application Version:	33.0.0.5309
  Application Timestamp:	53c62b8b
  Fault Module Name:	xul.dll
  Fault Module Version:	33.0.0.5309
  Fault Module Timestamp:	53c62ac6
  Exception Code:	c0000005
  Exception Offset:	007fd694
  OS Version:	6.1.7601.2.1.0.256.1
  Locale ID:	3079
  Additional Information 1:	0a9e
  Additional Information 2:	0a9e372d3b4ad19135b953a78882e789
  Additional Information 3:	0a9e
  Additional Information 4:	0a9e372d3b4ad19135b953a78882e789
Oops, that's my fault. I'll push a fixed version soon, once the tree opens again.

The fact that we still get the error suggests that we need to ensure the TextureHost is fully constructed before the TextureClient is destroyed. I'd expect this patch to reduce the frequency of the failed rendering, but not eliminate it in this case.
(In reply to Matt Woodrow (:mattwoodrow) from comment #18)
> Oops, that's my fault. I'll push a fixed version soon, once the tree opens
> The fact that we still get the error suggests that we need to ensure the
> TextureHost is fully constructed before the TextureClient is destroyed. I'd
> expect this patch to reduce the frequency of the failed rendering, but not
> eliminate it in this case.

Yes, this will be bug 1039535.
(In reply to Matt Woodrow (:mattwoodrow) from comment #19)
> https://tbpl.mozilla.org/?tree=Try&rev=00bce23a617a

Warning is popping up, and I can't say that it's shown not as often as before.
Keep the D3D texture alive until the end of the TextureClient's shutdown handshake
Attachment #8458644 - Flags: review?(bas)
Fixed a mistake from the previous version.

What's puzzling, though, is that without the patches, even when inserting sleep calls just before the deserialization to make it very likely that the TextureClient will be destroyed before deserialization. I still don't see any flickering in the addon manager nor when playing html5 videos.
Attachment #8458644 - Attachment is obsolete: true
Attachment #8458644 - Flags: review?(bas)
Attachment #8458668 - Flags: review?(bas)
See Also: 1016408
(In reply to Nicolas Silva [:nical] from comment #24)
> try push: https://tbpl.mozilla.org/?tree=Try&rev=4b369933022e

No flickering, no warnings.
(In reply to Elbart from comment #26)
> (In reply to Nicolas Silva [:nical] from comment #24)
> > try push: https://tbpl.mozilla.org/?tree=Try&rev=4b369933022e
> 
> No flickering, no warnings.

Awesome! Thanks again for testing!
Nice nical! Feel free to land patch as well if you think we need it.
Attachment #8458668 - Flags: review?(bas) → review+
(In reply to Nicolas Silva [:nical] from comment #29)
> landed directly to central so that it gets to aurora today:
> https://hg.mozilla.org/mozilla-central/rev/02dd0a2e01e8
> https://hg.mozilla.org/mozilla-central/rev/93b9f79b978d

and thats caused a bustage like https://tbpl.mozilla.org/php/getParsedLog.php?id=44337989&tree=Mozilla-Central
(In reply to Nicolas Silva [:nical] from comment #24)
> try push: https://tbpl.mozilla.org/?tree=Try&rev=4b369933022e

Aren't try servers running with -WError?
Comment on attachment 8453868 [details] [diff] [review]
Make sure the surface format is BGRX, BGRA or A8 in TextureClientD3D11

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Windows OMTC blocker - random black flashes during rendering when OMTC is enabled.
[Describe test coverage new/current, TBPL]: 
[Risks and why]: low, but it's worth letting the patches bake in m-c for a day be sure. very easy to backout.
[String/UUID change made/needed]:
Attachment #8453868 - Flags: approval-mozilla-aurora?
Comment on attachment 8453868 [details] [diff] [review]
Make sure the surface format is BGRX, BGRA or A8 in TextureClientD3D11

wrong attachment, sorry.
Attachment #8453868 - Flags: approval-mozilla-aurora?
Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Blocking OMTC on Windows. Random black flashes when using OMTC on Windows.
[Describe test coverage new/current, TBPL]:
[Risks and why]: low risk, easy to back out, worth letting bake in m-c for a day just to be sure.
[String/UUID change made/needed]:
Attachment #8461396 - Flags: approval-mozilla-aurora?
Nightly 20140723 20140723030202 82df3654cd80: flickering
m-c-debug 20140723 20140722182256 82df3654cd80: flickering, warnings

Nightly 20140724 20140724030201 a91ec42d6a9c: no flickering
m-c-debug 20140724 20140723183607 757608ac7f19: no flickering, no warnings
And a surprising but sweet perf improvement on TResize:

Improvement: Mozilla-Inbound-Non-PGO - TResize - WINNT 6.1 (ix) - 8.93% decrease
Improvement: Mozilla-Inbound-Non-PGO - TResize - WINNT 6.2 x64 - 15.4% decrease

Unless the improvement comes from Bug 1031261

Thanks again Elbart for the testing you did here, it helped a lot.
QA Whiteboard: [qa+]
I will approve it during the week end.
(In reply to Nicolas Silva [:nical] from comment #40)
> Thanks again Elbart for the testing you did here, it helped a lot.

No, I have to thank you. :)
Attachment #8461396 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Couldn't reproduce this issue nor the one from comment 4 with Nightly 2014-05-24 on 2 different Windows 7 x64 machines (both AMD and Intel).

Elbart, could you please verify that this is fixed on latest Aurora too?
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/

Based on comment 39 marking as verified on 34 branch.
Status: RESOLVED → VERIFIED
Flags: needinfo?(elbart)
No bug in 33.0a2 of 2014-09-02.
Flags: needinfo?(elbart)
Many thanks, Elbart!
Based on comment 45, marking accordingly.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: