stylo: Experiment with rayon parallel iterators instead of scope().




CSS Parsing and Computation
3 months ago
2 months ago


(Reporter: emilio, Assigned: bholley)


(Blocks: 3 bugs)

Firefox Tracking Flags

(Not tracked)




3 months ago
From a conversation with Niko:

18:49 <nmatsakis> the advantages of par iters for this case would be:
18:49 <nmatsakis> (1) stack allocation
18:49 <nmatsakis> (2) automatic chunking
18:49 <nmatsakis> (we make some efforts to reduce overheads here)
18:49 <nmatsakis> given that the bugzilla bug in question
18:49 <nmatsakis> is talking about the interaction w/ jemalloc
18:49 <nmatsakis> avoiding a malloc might be good
18:49 <nmatsakis> the only downside I can see is that it DOES mean that the depth of your stack
18:49 <nmatsakis> will be dependent on the depth of the input tree
18:49 <nmatsakis> we can allocate some big stacks, but that's a hazard
18:50 <nmatsakis> anyway, it seems like it'd be worth experimenting
18:51 <nmatsakis> I imagine that the automatic chunking etc *might* help to improve perf
18:52 <nmatsakis> ah
18:52 <nmatsakis> I forgot to mention
18:52 <nmatsakis> another advantage
18:52 <nmatsakis> using a scope like this
18:52 <nmatsakis> at least currently
18:52 <nmatsakis> means that there is a single global counter of "outstanding tasks"
18:52 <nmatsakis> if you use nested iterators, each will have its own counter,
18:52 <nmatsakis> which might mean less contention amongst threads
18:52 <nmatsakis> trying to inc/dec these counters
18:52 <nmatsakis> (Actually, nested iterators don't have any counters)
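To make the stack-depth hazard concrete, here is a minimal sequential sketch. It does not use rayon at all: the recursive function only stands in for the shape of nested parallel iterators, and the work-list function stands in for tasks spawned into a scope. `Node`, `chain`, `max_recursion_depth` and `worklist_traversal` are hypothetical names made up for this illustration, not stylo code.

```rust
use std::collections::VecDeque;

// Hypothetical tree type for illustration; not Servo's DOM.
struct Node {
    children: Vec<Node>,
}

/// Recursive traversal, shaped like nested iterators: every level of the
/// tree adds a stack frame. Returns the deepest recursion level reached,
/// counting the root call as level `depth`.
fn max_recursion_depth(node: &Node, depth: usize) -> usize {
    node.children
        .iter()
        .map(|child| max_recursion_depth(child, depth + 1))
        .max()
        .unwrap_or(depth)
}

/// Work-list traversal, shaped like tasks spawned into a scope: pending
/// nodes live in a heap-allocated queue, so there is no recursion.
/// Returns (nodes visited, peak queue length).
fn worklist_traversal(root: &Node) -> (usize, usize) {
    let mut queue: VecDeque<&Node> = VecDeque::new();
    queue.push_back(root);
    let mut visited = 0;
    let mut peak = queue.len();
    while let Some(node) = queue.pop_front() {
        visited += 1;
        queue.extend(node.children.iter());
        peak = peak.max(queue.len());
    }
    (visited, peak)
}

/// Builds a degenerate tree: a single chain containing `depth` nodes.
fn chain(depth: usize) -> Node {
    let mut node = Node { children: Vec::new() };
    for _ in 1..depth {
        node = Node { children: vec![node] };
    }
    node
}
```

On a chain of N nodes the recursive version reaches recursion level N, while the work-list version never recurses and keeps at most one pending node, at the cost of the heap-allocated queue. That is the trade-off described above: stack allocation versus a stack depth that follows the input tree.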

Seems like it'd be nice to try. If it works out nicely, we could also get rid of the separate sequential and parallel paths entirely, leaving a single function that abstracts over parallel/sequential iterators.
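A rough sketch of what "one function abstracting over parallel/sequential" could look like, using only the standard library. Everything here is an assumption made for illustration: `ChildDriver`, `Sequential`, `Parallel`, `traverse` and `count_nodes` are invented names, and the `Parallel` driver naively spawns one OS thread per child via `std::thread::scope`, which a real work-stealing pool such as rayon would not do.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical tree type for illustration; not Servo's DOM.
struct Node {
    children: Vec<Node>,
}

/// Hypothetical abstraction: how to run `f` over a node's children.
trait ChildDriver: Sync {
    fn for_each_child<F>(&self, children: &[Node], f: F)
    where
        F: Fn(&Node) + Sync;
}

struct Sequential;
impl ChildDriver for Sequential {
    fn for_each_child<F>(&self, children: &[Node], f: F)
    where
        F: Fn(&Node) + Sync,
    {
        children.iter().for_each(f);
    }
}

struct Parallel;
impl ChildDriver for Parallel {
    fn for_each_child<F>(&self, children: &[Node], f: F)
    where
        F: Fn(&Node) + Sync,
    {
        let f = &f; // share the closure by reference across threads
        thread::scope(|s| {
            for child in children {
                s.spawn(move || f(child));
            }
        }); // all spawned threads are joined before returning
    }
}

/// The single traversal: "styles" (here, counts) a node, then recurses
/// into its children through whichever driver was passed in.
fn traverse<D: ChildDriver>(driver: &D, node: &Node, styled: &AtomicUsize) {
    styled.fetch_add(1, Ordering::Relaxed);
    driver.for_each_child(&node.children, |child: &Node| {
        traverse(driver, child, styled)
    });
}

fn count_nodes<D: ChildDriver>(driver: &D, root: &Node) -> usize {
    let styled = AtomicUsize::new(0);
    traverse(driver, root, &styled);
    styled.load(Ordering::Relaxed)
}

/// Builds a complete tree with `depth` levels and `fanout` children per node.
fn full_tree(depth: usize, fanout: usize) -> Node {
    let children = if depth <= 1 {
        Vec::new()
    } else {
        (0..fanout).map(|_| full_tree(depth - 1, fanout)).collect()
    };
    Node { children }
}
```

Calling `count_nodes(&Sequential, &tree)` and `count_nodes(&Parallel, &tree)` runs the same `traverse` body in both modes. Note that the recursion still happens as the last step of processing a node, so this shape shares the stack-depth property mentioned in the chat log.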

Bobby, do you have time to try this as part of your perf work?

It doesn't seem like it should be very hard, and it could be well worth it. I can also try it if I find time.


3 months ago
Flags: needinfo?(bobbyholley)
Seems worth playing with once we get to the point where we're optimizing the parallel traversal. I think we need to sort out bug 1291355 first though.
Flags: needinfo?(bobbyholley)
This may help bug 1365682.
Assignee: nobody → bobbyholley
Blocks: 1365682
Priority: -- → P1
As discussed with Niko last week, we can't do this, because it forces all the parallelism to happen in tail calls, which means that we can't eagerly dispatch the children of already-processed children until we've styled the entire work unit.
Last Resolved: 2 months ago
Resolution: --- → WONTFIX
More precisely, achieving the same effect with `join` would require unbounded stack growth, which is probably not desirable.
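For readers trying to picture the ordering problem, here is a simplified sequential model of a single work unit. It is only one reading of the comments above, not the actual stylo traversal: `Event`, `process_eager`, `process_tail` and `sample_unit` are hypothetical names, and "dispatch" is just recorded in a log rather than handed to a thread pool.

```rust
#[derive(Debug, PartialEq)]
enum Event {
    Style(u32),
    Dispatch(Vec<u32>),
}

// Hypothetical tree type for illustration; not Servo's DOM.
struct Node {
    id: u32,
    children: Vec<Node>,
}

fn ids(nodes: &[Node]) -> Vec<u32> {
    nodes.iter().map(|n| n.id).collect()
}

/// Scope-like shape: a node's children can be handed off as soon as that
/// node is styled, while the rest of the unit is still being processed.
fn process_eager(unit: &[Node], log: &mut Vec<Event>) {
    for node in unit {
        log.push(Event::Style(node.id));
        if !node.children.is_empty() {
            log.push(Event::Dispatch(ids(&node.children)));
        }
    }
}

/// Tail-call shape: the parallel step comes last, so children are only
/// handed off once the whole unit has been styled.
fn process_tail(unit: &[Node], log: &mut Vec<Event>) {
    let mut pending = Vec::new();
    for node in unit {
        log.push(Event::Style(node.id));
        pending.extend(ids(&node.children));
    }
    if !pending.is_empty() {
        log.push(Event::Dispatch(pending));
    }
}

/// A two-node work unit: node 1 has child 3, node 2 has child 4.
fn sample_unit() -> Vec<Node> {
    vec![
        Node { id: 1, children: vec![Node { id: 3, children: vec![] }] },
        Node { id: 2, children: vec![Node { id: 4, children: vec![] }] },
    ]
}
```

With `sample_unit()`, the eager version logs the dispatch of node 3 before node 2 is styled, whereas the tail version logs both `Style` events first and dispatches 3 and 4 together at the end. Dispatching per node in the tail-call shape would instead mean blocking inside each node's call until its subtree finishes, which is where the stack growth comes from.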