Closed Bug 1129360 Opened 6 years ago Closed 6 years ago

Intermittent topA-heightN-bottomA.html?margin_parent | image comparison (==), max difference: 255, number of differing pixels: 1000

Categories

(Core :: Layout, defect)

x86
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: cbook, Assigned: nical)

References

()

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

b2g_emulator_vm fx-team opt test reftest-15

https://treeherder.mozilla.org/logviewer.html#?job_id=1882420&repo=fx-team

21:04:37 INFO - REFTEST TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/reftests/position-dynamic-changes/vertical/topA-heightN-bottomA.html?margin_parent | image comparison (==), max difference: 255, number of differing pixels: 1000
I haven't been able to pinpoint the cause of this recent spike yet, but I'm working on it.
Down to this range on inbound:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=5fc092347f17&tochange=8977d94c9d9a&filter-searchStr=b2g_emulator_vm reftest-14

I already did a Try run of bug 1126903 backed out earlier today and the failures reproduced. So this strongly points to bug 1127289. Try run of that backed out:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2dc3042e954d
Flags: needinfo?(nical.bugzilla)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #18)
> Down to this range on inbound:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&fromchange=5fc092347f17&tochange=8977d94c9d9a&filter-
> searchStr=b2g_emulator_vm reftest-14
> 
> I already did a Try run of bug 1126903 backed out earlier today and the
> failures reproduced. So this strongly points to bug 1127289. Try run of that
> backed out:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=2dc3042e954d

Looks like I was wrong about bug 1127289.
this try push restores the line that has an effect on non-windows platforms:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=19e7f371e572
If it doesn't come out green let's back out 112729.
Flags: needinfo?(nical.bugzilla)
this fixes the intermittent failure.
Not certain why, but not all tiles are going through the UnlockTile call
that happens shortly after Validate. I suspect it has something to do with the weird POD-but-not-really-POD semantics of TileClient.
Assignee: nobody → nical.bugzilla
Attachment #8559799 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 8559799 [details] [diff] [review]
add back the Unlock() call that was removed in bug 1127289

Review of attachment 8559799 [details] [diff] [review]:
-----------------------------------------------------------------

If the patch fixed the problem, it is OK as short term fix. Lock/unlock handling in ClientTiledLayerBuffer is very hard to track by reading the code. There might be possibilities of the followings. The tiling code already ensured that the followings do not happen?

- TextureClient is recycled to TextureClientPool without unlocking it.
- TextureClient is already lock by different mode when checking IsLocked() like the following in ClientTiledLayerBuffer::ValidateTile().

       > if (!backBuffer->IsLocked()) {
Attachment #8559799 - Flags: review?(sotaro.ikeda.g) → review+
Flags: needinfo?(nical.bugzilla)
(In reply to Sotaro Ikeda [:sotaro] from comment #24)
> Comment on attachment 8559799 [details] [diff] [review]
> add back the Unlock() call that was removed in bug 1127289
> 
> Review of attachment 8559799 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> If the patch fixed the problem, it is OK as short term fix. Lock/unlock
> handling in ClientTiledLayerBuffer is very hard to track by reading the
> code.

I totally agree, I am trying to simplify this but it'll be easier when I understand what exactly is going on here.

> There might be possibilities of the followings. The tiling code
> already ensured that the followings do not happen?
> 
> - TextureClient is recycled to TextureClientPool without unlocking it.

Discard[Front/Back]Buffer checks whether the TextureClient is locked and unlocks it if so.

> - TextureClient is already lock by different mode when checking IsLocked()
> like the following in ClientTiledLayerBuffer::ValidateTile().
> 
>        > if (!backBuffer->IsLocked()) {

Interesting, I'm digging.
Flags: needinfo?(nical.bugzilla)
It might be better to add ASSERT checks in another bug.
I am not sure it will make a difference but this is more correct in principle.
Attachment #8559907 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 8559907 [details] [diff] [review]
Use OPEN_READ_WRITE rather than OPEN_WRITE when locking a tile's back buffer

Review of attachment 8559907 [details] [diff] [review]:
-----------------------------------------------------------------

Yhea, it could make more sense. It could fix the situation that this lock is held until ClientTiledLayerBuffer::ValidateTile().
Attachment #8559907 - Flags: review?(sotaro.ikeda.g) → review+
Is there something up getting this fix landed? I would have backed out yesterday if I'd known this wasn't landing ASAP.
Flags: needinfo?(nical.bugzilla)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #33)
> Is there something up getting this fix landed? I would have backed out
> yesterday if I'd known this wasn't landing ASAP.

I was hoping to get a better fix, but I am still waiting for try results on that side.
So I just landed the temporary fix:

https://hg.mozilla.org/integration/mozilla-inbound/rev/73b8ce222f9f
Flags: needinfo?(nical.bugzilla)
See Also: → 1130681
(In reply to Sotaro Ikeda [:sotaro] from comment #29)
> Comment on attachment 8559907 [details] [diff] [review]
> Use OPEN_READ_WRITE rather than OPEN_WRITE when locking a tile's back buffer
> 
> Review of attachment 8559907 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Yhea, it could make more sense. It could fix the situation that this lock is
> held until ClientTiledLayerBuffer::ValidateTile().

I moved this to a separate bug (bug 1131038) and landed it there.
This should be fixed by bug 1130681, but waiting a few more days before closing.
See Also: → 1133484
[Mass Closure] Closing bug as the WORKSFORME as the intermittent failure has not been seen for 45+ days If this has been closed and you feel that it should Not have been closed, please reopen and add [leave open] to the whiteboard.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.