Closed Bug 1428285 Opened 6 years ago Closed 6 years ago

stylo: PropertyDeclaration::id() shows up in profile

Categories

(Core :: CSS Parsing and Computation, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
Tracking Status
firefox59 --- affected

People

(Reporter: xidorn, Unassigned)

References

Details

(Keywords: perf)

In a profile for tpaint, I see PropertyDeclaration::id() shows up in it. It takes 3ms (30 samples) which consists 0.8% of the measured range. The profile I'm looking at is https://perfht.ml/2CsDCaI

It's a surprising one, and thus I checked the assembly of that method, it is something like:
> 00007FF816417202  movsxd      rax,dword ptr [rdx+rax*4]  
> 00007FF816417206  add         rax,rdx  
> 00007FF816417209  jmp         rax  
> 00007FF81641720B  mov         ax,1  
> 00007FF81641720F  jmp         style::properties::PropertyDeclaration::id+0B9Bh (07FF816417D3Bh)  
> 00007FF816417214  xor         eax,eax  
> 00007FF816417216  jmp         style::properties::PropertyDeclaration::id+0B9Bh (07FF816417D3Bh)  
> 00007FF81641721B  mov         ax,2  
> 00007FF81641721F  jmp         style::properties::PropertyDeclaration::id+0B9Bh (07FF816417D3Bh)  
> 00007FF816417224  mov         ax,3  
> 00007FF816417228  jmp         style::properties::PropertyDeclaration::id+0B9Bh (07FF816417D3Bh)  
> 00007FF81641722D  mov         ax,4  
> 00007FF816417231  jmp         style::properties::PropertyDeclaration::id+0B9Bh (07FF816417D3Bh)  
> 00007FF816417236  mov         ax,5
> ...

each property takes 9 bytes, and we have 326 properties, so this branch table would take ~2.9KB (actually the last few branches take only 6 bytes, and the total size is ~2.8KB), which doesn't look very friendly to the cache for this kind of hot function.

A branch table is probably the best thing the compiler can optimize it to at the moment, but it isn't the best form for this kind of function. Since PropertyDeclaration and LonghandId are both use the same list to generate, they should actually have matched tag value. Ideally, the compiler should recognize this pattern and optimize it to a single conditional jump and a single move (without the whole table). That should behave much better on a modern CPU than this branch table.

(I'm not familiar with compiler implementation, but based on what I learned about Rust before, I suppose this can probably be done with a optimization pass in MIR?)
Before the compiler becomes able to recognize this pattern and apply the optimization, I guess we can probably use mem::discriminant and mem::transmute to simulate that, and use test to catch compiler change which would break the invariant we need.
One difficulty is that this enum has not only on variant per property, but also a few like Custom(…) that behave differently.

I did rewrite PropertyDeclaration::id to use two separate `match` expressions, only one with many "branches" and make it as "uniform" as possible. I did see it compile to a lookup table (indexing into a compact array of static data) rather than a jump table at some point, maybe some compiler change regressed this particular case.

Maybe another approach could be to use https://github.com/rust-lang/rfcs/pull/2195 (when it’s accepted/implemented) to manipulate the discriminant directly. Transmuting the result of mem::discriminant is not OK, it is made opaque specifically because we cannot make assumptions about its exact value.
(In reply to Simon Sapin (:SimonSapin) from comment #2)
> One difficulty is that this enum has not only on variant per property, but
> also a few like Custom(…) that behave differently.

Ideally compiler can detect the common pattern and use a single if for those pattern and handle the special cases separately... that's probably hard to implement perfectly, though.

> I did rewrite PropertyDeclaration::id to use two separate `match`
> expressions, only one with many "branches" and make it as "uniform" as
> possible. I did see it compile to a lookup table (indexing into a compact
> array of static data) rather than a jump table at some point, maybe some
> compiler change regressed this particular case.

Look up table would take 600+ bytes, which still not that good, but should be more or less better than a jump table, I guess.

> Maybe another approach could be to use
> https://github.com/rust-lang/rfcs/pull/2195 (when it’s accepted/implemented)
> to manipulate the discriminant directly. Transmuting the result of
> mem::discriminant is not OK, it is made opaque specifically because we
> cannot make assumptions about its exact value.

It would be great to have this. Can we push Rust team for having that?

I know it is made opaque intentionally, but as far as we have a test to check everything works fine... I guess we can rely on it until the compiler changes... Ah, well, probably you wouldn't be happy because that would potentially make it easier for Rust changes to break Servo build...
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #3)
> (In reply to Simon Sapin (:SimonSapin) from comment #2)
> > One difficulty is that this enum has not only on variant per property, but
> > also a few like Custom(…) that behave differently.
> 
> Ideally compiler can detect the common pattern and use a single if for those
> pattern and handle the special cases separately... that's probably hard to
> implement perfectly, though.

By this, I mean the compiler can check that if there is a pattern of
> match xxx {
>   1 => 1 + A,
>   2 => 2 + A,
>   3 => 3 + A,
>   ...
>   100 => 100 + A,
>   101 => { /* something different */ }
> }
where A is a constant, than optimize it to something like
> match xxx {
>   n if n <= 100 => n + A,
>   101 => { /* something different */ }
> }
Keywords: perf
Priority: -- → P3
It seems nox is fixing this in servo/servo#19956.
Indeed, my patch more or less makes PropertyDeclaration::id a noop.
The PR landed.
This seems to have improved perf! See the PR in comment 5.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.