1361918 - [meta] TextureClient::TryReadLock() shows up a lot in the 8-second BHR hang data

Reporter

Description

•

7 years ago

See <https://people-mozilla.org/~mlayzell/bhr/20170429/all.html>, and search for "TextureClient::TryReadLock()".  These are stack traces that are captured from periods of time when the main thread has been unresponsive (i.e., not processing events) for 8 seconds.  This doesn't necessarily mean that the stack trace was an 8 second hang itself, but that it was the tail of an 8 second execution where we didn't process any events from our event queue.

Obviously the paints happening here can be expensive, but then again 

The code here <https://searchfox.org/mozilla-central/rev/b0e1da2a90ada7e00f265838a3fafd00af33e547/gfx/layers/client/TextureClient.cpp#465> allows for a maximum of 500ms jank on the main thread each time it is called in the worst case, at least in theory.  That's really bad.  :-(

We know this has regressed LinkedIn (bug 1349696), and this has regressed Talos (bug 1339454).  This is now on beta ongoing to be shipped in 54.

Can we do something about this please Matt?  Do we have any options, such as reducing this timeout?  Or backing out bug 1325227?

Flags: needinfo?(matt.woodrow)

(no longer active)

Reporter

Updated

•

7 years ago

Depends on: 1339454

(no longer active)

Reporter

Updated

•

7 years ago

No longer depends on: 1339454

(no longer active)

Reporter

Comment 1

•

7 years ago

(Although the Talos regressions may be unrelated... but perhaps it's interesting to experiment with different values for the lock timeout on Talos and see if numbers change in a meaningful way?)

Matt Woodrow (:mattwoodrow)

Comment 2

•

7 years ago

(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #0)
> 
> We know this has regressed LinkedIn (bug 1349696), and this has regressed
> Talos (bug 1339454).  This is now on beta ongoing to be shipped in 54.

Bug 1349696 wasn't obviously a regression to me, and the newer profiles didn't show the read lock at all, just general GPU slowness.

There were a couple of talos regressions, but also a much larger number of improvements (both in number and magnitude).

> 
> Can we do something about this please Matt?  Do we have any options, such as
> reducing this timeout?  Or backing out bug 1325227?

As above, it's not obvious to me that anything is really worse than what we had previously (where we had an unbounded sync call blocking on the compositor), so backing out doesn't seem like a great choice at this point.

We probably don't want to be waiting 500ms per-lock, having that be the total accumulated wait for the paint would be much better.

500ms may also be unnecessarily large, at least for the potential case where the compositor has forgotten to actually release the lock we're waiting on.
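A minimal sketch of the "total accumulated wait for the paint" idea, under stated assumptions: `std::timed_mutex` stands in for the cross-process read lock, and `PaintLockBudget` and `LockAllForPaint` are hypothetical names, not anything in TextureClient.cpp. Every lock attempt in one paint waits against a single shared deadline, so the worst case is roughly one budget per paint rather than one timeout per lock.

```cpp
#include <chrono>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical helper (not Gecko code): one wait budget shared by every
// lock attempt made during a single paint.
class PaintLockBudget {
 public:
  using Clock = std::chrono::steady_clock;

  explicit PaintLockBudget(std::chrono::milliseconds aTotal)
      : mDeadline(Clock::now() + aTotal) {}

  // Time left in the budget, clamped at zero once the deadline has passed.
  std::chrono::milliseconds Remaining() const {
    const auto now = Clock::now();
    if (now >= mDeadline) {
      return std::chrono::milliseconds(0);
    }
    return std::chrono::duration_cast<std::chrono::milliseconds>(mDeadline -
                                                                 now);
  }

  // Waits no later than the shared deadline. After the deadline this
  // degrades to a single non-blocking attempt.
  bool TryLock(std::timed_mutex& aLock) const {
    return aLock.try_lock_until(mDeadline);
  }

 private:
  Clock::time_point mDeadline;
};

// Tries to take every lock needed for one paint within aTotalBudget.
// On failure, releases whatever was already acquired and returns false.
// On success, the caller owns all locks and must unlock them.
bool LockAllForPaint(const std::vector<std::timed_mutex*>& aLocks,
                     std::chrono::milliseconds aTotalBudget) {
  PaintLockBudget budget(aTotalBudget);
  for (std::size_t i = 0; i < aLocks.size(); ++i) {
    if (!budget.TryLock(*aLocks[i])) {
      for (std::size_t j = 0; j < i; ++j) {
        aLocks[j]->unlock();
      }
      return false;
    }
  }
  return true;
}
```

With a per-lock timeout, N contended locks can block for up to N times the timeout. With the shared deadline above, the first contended lock can consume the whole budget and later attempts fail fast, which also limits the cost of a lock the compositor never releases.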

I'm pretty busy with retained-dl stuff at the moment unfortunately, but I can try take a look at this soon.

Flags: needinfo?(matt.woodrow)

Ethan Lin[:ethlin]

Updated

•

7 years ago

Whiteboard: [qf:p1][bhr] → [qf:p1][bhr][gfx-noted]

(no longer active)

Reporter

Comment 3

•

7 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> (In reply to :Ehsan Akhgari (super long backlog, slow to respond) from
> comment #0)
> > 
> > We know this has regressed LinkedIn (bug 1349696), and this has regressed
> > Talos (bug 1339454).  This is now on beta ongoing to be shipped in 54.
> 
> Bug 1349696 wasn't obviously a regression to me, and the newer profiles
> didn't show the read lock at all, just general GPU slowness.

Yeah sorry, I tested that myself in the end and it turned out to not be an issue.

> There were a couple of talos regressions, but also a much larger number of
> improvements (both in number and magnitude).

Fair enough.

> > 
> > Can we do something about this please Matt?  Do we have any options, such as
> > reducing this timeout?  Or backing out bug 1325227?
> 
> As above, it's not obvious to me that anything is really worse than what we
> had previously (where we had an unbounded sync call blocking on the
> compositor), so backing out doesn't seem like a great choice at this point.

Ugh, I was afraid it wouldn't be that easy...

> We probably don't want to be waiting 500ms per-lock, having that be the
> total accumulated wait for the paint would be much better.
> 
> 500ms may also be unnecessarily large, at least for the potential case where
> the compositor has forgotten to actually release the lock we're waiting on.
> 
> I'm pretty busy with retained-dl stuff at the moment unfortunately, but I
> can try take a look at this soon.

Thanks, I appreciate it.  I will keep an eye on the BHR data for this as we keep getting better data and will let you know if we find out any more information that looks related.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 4

•

7 years ago

Waiting to see if we get more data.

Naveed Ihsanullah [:naveed]

Updated

•

7 years ago

Whiteboard: [qf:p1][bhr][gfx-noted] → [qf:meta][bhr][gfx-noted]

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

7 years ago

Priority: -- → P3

Firefox Bug Husbandry Bot

Updated

•

7 years ago

Keywords: perf

Florian Quèze [:florian]

Updated

•

4 years ago

Whiteboard: [qf:meta][bhr][gfx-noted] → [qf:meta][bhr:TryReadLock][gfx-noted]

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

2 years ago

Performance Impact: --- → ?

Keywords: meta

Whiteboard: [qf:meta][bhr:TryReadLock][gfx-noted] → [bhr:TryReadLock][gfx-noted]

BugBot [:suhaib / :marco/ :calixte]

Updated

•

2 years ago

Summary: TextureClient::TryReadLock() shows up a lot in the 8-second BHR hang data → [meta] TextureClient::TryReadLock() shows up a lot in the 8-second BHR hang data

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

2 years ago

Performance Impact: ? → ---

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

BugBot [:suhaib / :marco/ :calixte]

Comment 5

•

6 months ago

The meta keyword is there, the bug doesn't depend on other bugs and there is no activity for 12 months.
:bhood, maybe it's time to close this bug?

Flags: needinfo?(bhood)

Bob Hood [:bhood]

Comment 6

•

6 months ago

Closing. OP is no longer active, and neither is this meta.

Status: NEW → RESOLVED

Closed: 6 months ago

Flags: needinfo?(bhood)

Resolution: --- → INACTIVE

Categories

(Core :: Graphics: Layers, defect, P3)

Tracking

()

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: meta, perf, Whiteboard: [bhr:TryReadLock][gfx-noted])

Security

(public)
