Closed Bug 1344136 Opened 7 years ago Closed 7 years ago

stylo: Parsing a single property (e.g. for an inline style set) has tons of overhead.

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox54 --- affected

People

(Reporter: bzbarsky, Unassigned)

References

Details

See bug 1344131 comment 2.  Of 7s spent under ServoDeclarationBlock_SetPropertyById at least 3s is spent on malloc, free, and spinlock goop.
Blocks: 1344131
It slightly bothers me that it's not obvious where the allocations are.  Is the return value of parse_one_declaration heap-allocated?  I guess I kind of have to dig into the impl of parse_one_declaration to tell.  But looks like it is (a new vec every time).  I don't know where the free calls inside parse_one_declaration are coming from, offhand...

For what it's worth, in Gecko this codepath (as called from inline style sets for non-shorthands, at least) avoids allocation/deallocation like the plague and does all its work in-place as much as it can.
> I don't know where the free calls inside parse_one_declaration are coming from, offhand

From the return value of Parser::new(input) being deallocated.  Under there it calls cssparser::parser::{{impl}}::new which calls alloc::heap::exchange_malloc.  And then we have to fre it on the way out.

Gecko recycles parsers.
For the allocation from creating the vec of declarations, we can probably use smallvec somehow.
parse_one_declaration only involves a Vec because the one declaration can be a shorthand, which expands to multiple PropertyDeclaration values (which represents either a non-custom longhand or a custom property, with the corresponding value).

We could avoid heap allocation here with a new many-variant enum, declared in Mako:

    enum Bikeshed {
        LonghandOrCustom(PropertyDeclaration),
        % for shorthand in data.shorthands:
            ${shorthand.camel_case}Shorthand(properties::shorthands::${shorthand.ident}::Longhands),
        % endif
    }

This enum would be quite large, but we only need it transiently during parsing, and hopefully we can rely on return value optimization so that we don’t copy/move it around.

… Having typed the above, I realize that that Vec is returned by parse_one_declaration. So any allocation that is freed *from* parse_one_declaration is not it.
The allocation freed from parse_one_declaration is the tokenizer.  I thought I'd commented in this bug when we figured it out...
bz, is there still more to do here after comment 3?
Flags: needinfo?(bzbarsky)
Priority: -- → P2
If I exclude the stuff from bug 1346974 from a tip profile, I still see the following suspicious bits:

1) malloc/free under glue::set_property but outside declaration_block::parse_one_declaration.  I haven't checked.  This is at least 13% of the time under Servo_DeclarationBlock_SetPropertyById.

2) properties::ParsedDeclaration::expand: about 12% of the time.  Not sure whether this is avoidable here.

3) 11% under glue::set_property doing Arc::drop_slow or something like that ("_$LT$alloc..arc..Arc$LT$T$GT$$GT$::drop_slow::h70e3e43ece7924ff" is the function name the profiler shows), which leads to more free().  Maybe this is part of the bug 1346974 bits?  I'm not sure.

4) 5% memmove under glue::set_property.

5) URI/principal addreffing under glue::set_property: about 3%.  I can hope bug 1346974 will help.

6) malloc_free under declaration_block::parse_one_declaration but outside ParsedDeclaration::parse: 13%.

So I don't know what all we fixed up above, but we still have tons of malloc traffic there.  Even with all the bits from bug 1346974 removed I think we're no faster than we were when I filed this bug, if I ran the numbers right.  :(
Flags: needinfo?(bzbarsky)
Emilio thinks that a lot of the malloc overhead here may be related to Boxed CSSErrorReporters, which he fixed in [1] (landing now). So probably worth doing a quick remeasure tomorrow once that gets merged into you tree.

[1] https://github.com/servo/servo/pull/15931
I can just pull autoland and build, too.  ;)

With Emilio's patch there, we're a bit faster.  Looking at the items from my comment 8:

1)  Still there.
2)  Still there.
3)  Seems to be gone.
4)  Still there.
5)  Still there.
6)  Still there.

So it helps, but there's more malloc/free: items 1 and 6 on the list.
Just fwiw, I just reprofiled it with xidorn's URI patches.  It's better, but we still have malloc/free/zone_free_definite_size called from declaration_block::parse_one_declaration.  Those are now about 25% of the time under Servo_DeclarationBlock_SetPropertyById.

xidorn and I just poked at this for a bit, and looks like https://github.com/servo/rust-cssparser/pull/124 is not in the cssparser in our tree.  Presumably we need to update it.  Oh, and https://github.com/servo/servo/pull/15830 did land, but that doesn't affect stylo, looks like, because that uses the vendored cssparser in the m-c tree.
>  but that doesn't affect stylo, looks like, because that uses the vendored cssparser in the m-c tree.

To clarify a bit, changes to Servo's Cargo.lock will not be reflected (unless accompanied with a lockfile change on the m-c side), but changes to a Cargo.toml in any Servo component used by stylo will be. I'll put up a patch for updating this.
Alright.  So now the CSS parser is updated, and things are a bit better.  At this point stylo is only 25% slower than gecko over here, if I'm measuring right.

I'm not seeing a ton of other systematically silly low-hanging fruit; marking this fixed as filed to avoid scope creep.  I will follow up in bug 1354566 on the remaining bits that are slow-ish.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.