Open Bug 1376063 Opened 7 years ago Updated 2 years ago

nsDisplayList::SortByZOrder() is slow

Categories

(Core :: Layout, defect, P3)

defect

Tracking

()

Performance Impact low
Tracking Status
firefox57 --- fix-optional

People

(Reporter: MatsPalmgren_bugz, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

It shows up in the Facebook use case in bug 1375346.
Blocks: 1371668
Here's the profile from that bug, with this function's stack highlighted:
https://perf-html.io/public/41deec02b0a56b16199fd78bb07797ccfb14af9d/calltree/?hiddenThreads=&range=10.9997_13.1069&search=sortbyzorder&thread=3&threadOrder=0-2-3-1

It looks like there are two samples in SortByZOrder.

 * The first is in ZIndexForFrame() calling into StyleDisplay().  This has to be for the "IsAbsPosContainingBlock()" call.  I suspect we could optimize this away by using a frame state bit (or a few bits in combination) to implement the style-dependent parts of IsAbsPosContainingBlock()/IsFixedPosContainingBlock() ...



 * The second is in nsTArray::EnsureCapacity.  This has to be because nsDisplayList::Sort uses a local nsTArray to implement its sort algorithm, and it appends elements one at at time.  This is supposed to be reasonably efficient (IIRC nsTArray uses a doubling algorithm or something), but it's possible we could optimize it by calling nsDisplayList::Count() and allocating exactly that much space up-front in our nsTArray.  (This isn't guaranteed to be a win, because Count() takes linear time and *might* have imperfect memory locality, and the nsTArray append-one-by-one doubling algorithm also should have amortized linear time...)
Whiteboard: [qf] → [qf:p3]
(In reply to Daniel Holbert [:dholbert] from comment #1)
>  * The second is in nsTArray::EnsureCapacity.  This has to be because
> nsDisplayList::Sort uses a local nsTArray to implement its sort algorithm,
> and it appends elements one at at time.  This is supposed to be reasonably
> efficient (IIRC nsTArray uses a doubling algorithm or something), but it's
> possible we could optimize it by calling nsDisplayList::Count() and
> allocating exactly that much space up-front in our nsTArray.  (This isn't
> guaranteed to be a win, because Count() takes linear time and *might* have
> imperfect memory locality, and the nsTArray append-one-by-one doubling
> algorithm also should have amortized linear time...)

I added some logging to see how many elements we usually get there.  Many calls get 0.  many more get under 10.  Then there are a few outliers than can get more than a hundred.  Based on this measurement I'm inclined to add a simple preallocated buffer of 20 or so, which should be pretty small both for nsDisplayItem* and ZSortItem.
Depends on: 1385182
Priority: P2 → P3
Keywords: perf
Performance Impact: --- → P3
Whiteboard: [qf:p3]

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: MatsPalmgren_bugz → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.