From a conversation with Niko:

18:49 <nmatsakis> the advantages of par iters for this case would be:
18:49 <nmatsakis> (1) stack allocation
18:49 <nmatsakis> (2) automatic chunking
18:49 <nmatsakis> (we make some efforts to reduce overheads here)
18:49 <nmatsakis> given that the bugzilla bug in question
18:49 <nmatsakis> is talking about the interaction w/ jemalloc
18:49 <nmatsakis> avoiding a malloc might be good
18:49 <nmatsakis> the only downside I can see is that it DOES mean that the depth of your stack
18:49 <nmatsakis> will be dependent on the depth of the input tree
18:49 <nmatsakis> we can allocate some big stacks, but that's a hazard
18:50 <nmatsakis> anyway, it seems like it'd be worth experimenting
18:51 <nmatsakis> I imagine that the automatic chunking etc *might* help to improve perf
18:52 <nmatsakis> ah
18:52 <nmatsakis> I forgot to mention
18:52 <nmatsakis> another advantage
18:52 <nmatsakis> using a scope like this
18:52 <nmatsakis> at least currently
18:52 <nmatsakis> means that there is a single global counter of "outstanding tasks"
18:52 <nmatsakis> if you use nested iterators, each will have its own counter,
18:52 <nmatsakis> which might mean less contention amongst threads
18:52 <nmatsakis> trying to inc/dec these counters
18:52 <nmatsakis> (Actually, nested iterators don't have any counters)

Seems like it'd be nice to try. If it works out nicely, we could also get rid of the separate sequential and parallel paths completely, leaving a single function that abstracts over parallel and sequential iterators.

Bobby, do you have time to try this as part of your perf work? It shouldn't be very hard and could be well worth it. I can also try it if I find time.
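To make the stack-depth downside concrete, here is a minimal std-only sketch. It is a sequential stand-in, not the real traversal: the `for` loop marks where a nested parallel iterator would fan out, and `Node`, `build_tree`, and `traverse_nested` are hypothetical names introduced for illustration only.

```rust
/// Hypothetical stand-in for a DOM node: only the tree shape matters here.
pub struct Node {
    pub children: Vec<Node>,
}

/// Hypothetical helper: builds a tree with `depth` levels below the root,
/// each non-leaf node having `fanout` children.
pub fn build_tree(depth: usize, fanout: usize) -> Node {
    let children = if depth == 0 {
        Vec::new()
    } else {
        (0..fanout).map(|_| build_tree(depth - 1, fanout)).collect()
    };
    Node { children }
}

/// Visits `node` and recurses into its children.
/// Returns (nodes visited, deepest call depth reached).
///
/// Assumption: with nested parallel iterators the loop below would be the
/// parallel fan-out point, but each level would still be a nested call.
pub fn traverse_nested(node: &Node, depth: usize) -> (usize, usize) {
    let mut visited = 1;
    let mut deepest = depth;
    for child in &node.children {
        let (v, d) = traverse_nested(child, depth + 1);
        visited += v;
        deepest = deepest.max(d);
    }
    (visited, deepest)
}
```

Calling `traverse_nested(&build_tree(100, 1), 1)` on a degenerate chain reaches a call depth of one frame per tree level, which is the hazard Niko describes for very deep input trees.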
Seems worth playing with once we get to the point where we're optimizing the parallel traversal. I think we need to sort out bug 1291355 first though.
As discussed with Niko last week, we can't switch to parallel iterators: doing so forces all the parallelism to happen in tail calls, which means that we can't eagerly dispatch the children of already-processed children until we've styled the entire work unit.
More precisely, if we were to use `join`, achieving the same effect would require unbounded stack growth, which is probably not desirable.
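For contrast, the eager dispatch that a scope-style traversal allows can be sketched with std only. This is an illustration under stated assumptions, not the actual traversal code: a mutex-guarded queue and `std::thread::scope` stand in for the real thread pool, and `Node`, `build_tree`, `State`, and `traverse_with_queue` are hypothetical names. The `outstanding` field plays the role of the single counter of outstanding tasks mentioned above.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for a DOM node.
pub struct Node {
    pub children: Vec<Node>,
}

/// Hypothetical helper: `depth` levels below the root, `fanout` children each.
pub fn build_tree(depth: usize, fanout: usize) -> Node {
    let children = if depth == 0 {
        Vec::new()
    } else {
        (0..fanout).map(|_| build_tree(depth - 1, fanout)).collect()
    };
    Node { children }
}

struct State<'a> {
    queue: VecDeque<&'a Node>,
    outstanding: usize, // nodes queued or currently being processed
}

/// Visits every node using `workers` threads and returns the number visited.
/// Children are enqueued as soon as their parent is processed, and the worker
/// loop is not recursive, so stack depth does not grow with tree depth.
pub fn traverse_with_queue(root: &Node, workers: usize) -> usize {
    let state = Mutex::new(State { queue: VecDeque::from([root]), outstanding: 1 });
    let wakeup = Condvar::new();
    let visited = AtomicUsize::new(0);

    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // Take the next node, or exit once no work is left anywhere.
                let node = {
                    let mut st = state.lock().unwrap();
                    loop {
                        if let Some(n) = st.queue.pop_front() {
                            break n;
                        }
                        if st.outstanding == 0 {
                            return;
                        }
                        st = wakeup.wait(st).unwrap();
                    }
                };

                // "Style" the node; here we only count it.
                visited.fetch_add(1, Ordering::Relaxed);

                // Eagerly dispatch the children, then retire this node.
                let mut st = state.lock().unwrap();
                for child in &node.children {
                    st.queue.push_back(child);
                }
                st.outstanding += node.children.len();
                st.outstanding -= 1;
                drop(st);
                wakeup.notify_all();
            });
        }
    });

    visited.load(Ordering::Relaxed)
}
```

For example, `traverse_with_queue(&build_tree(3, 2), 4)` visits all 15 nodes. A `join`-based version would have to keep a frame alive per level to hand off grandchildren the same way, which is where the unbounded stack growth comes from.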