Waiting for a parallel task attributes time spent to that task

Status: RESOLVED INVALID
(Reporter: jonco, Assigned: jonco)
Attachments: (1 attachment)

Description (Assignee, a year ago)
It turns out that we record the time spent joining parallel tasks in the phase for the task we're waiting on.

This has the unfortunate consequence that when we join several tasks one after the other we get a really skewed view of what's happening: each task is attributed its own duration minus the time already spent waiting for previously joined tasks, clamped at zero.

I think we should either not do this and attribute all wait time to the current (parent) phase, or have a special phase for waiting.
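To make the skew concrete, here is a toy model of the current attribution scheme (hypothetical helper, not the actual gcstats code), assuming all tasks start together when the joins begin and are joined in a fixed order:

```cpp
#include <algorithm>
#include <vector>

// Toy model: tasks run in parallel and each finishes durationsMs[i] ms
// after the joins begin; we join them in order. The time charged to each
// task's phase is the extra wall-clock time spent blocked in its join:
// its duration minus the time already spent waiting on earlier joins,
// clamped at zero.
std::vector<int> attributedJoinTimesMs(const std::vector<int>& durationsMs) {
    std::vector<int> charged;
    int waitedMs = 0;  // total wall-clock time spent joining so far
    for (int d : durationsMs) {
        int wait = std::max(d - waitedMs, 0);
        charged.push_back(wait);
        waitedMs += wait;
    }
    return charged;
}
```

For two tasks of 1000 ms each, the first join is charged the full 1000 ms and the second is charged nothing, which is exactly the skew described above.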
Comment 1 (Assignee, a year ago)
Created attachment 8865479 [details] [diff] [review]
bug1362965-join-time

Patch to not attribute join time to the phase in question.

It would also be great to start recording the time spent off thread again so we can see which of these tasks is taking too long.  I'm not sure how we would do this though - maybe a parallel stats tree for off-thread times?
Assignee: nobody → jcoppeard
Attachment #8865479 - Flags: review?(sphink)
Comment 2

Comment on attachment 8865479 [details] [diff] [review]
bug1362965-join-time

Review of attachment 8865479 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ -5071,5 @@
>  
>  void
>  GCRuntime::joinTask(GCParallelTask& task, gcstats::Phase phase, AutoLockHelperThreadState& locked)
>  {
> -    gcstats::AutoPhase ap(stats(), task, phase);

Ugh.

So I agree that the current state is bad, in that it doesn't blame enough of the time on a slow-to-finish task.

But it seems like this change makes it worse. Now, if you can split something out into a parallel task, then it appears to be free (from the perspective of that task's phase). So if one of the parallel tasks ended up being really slow, then all we would know is that its parent is slow (eg PHASE_SWEEP_COMPARTMENTS).

Take for example a situation with 2 parallel tasks each lasting 1 second. Currently, all of the time gets blamed on the first task to be joined, and none of the time goes to the second task. With your change, neither task gets blamed for anything. In both cases, the parent task correctly has the proper total time. In the current state, the parent task also has the correct self time; with your change, its self time would be inflated. In that scenario, I think I prefer the current state.

Another example: say you have 2 tasks, one takes 1 second and one we just messed up something badly and it's taking 2 seconds. Will the stats output point us to the right thing? In the current state, if we happen to join the slower task first, the answer will be yes. If we join the faster task first, then both would be blamed for 1 second and the answer is no (though if the slow task took even longer, it would eventually show up). With your change, there's no information in any case, other than that the parent task is taking a really long time for some unknown reason.

Am I misunderstanding the situation? Because to me, it seems like we currently have some partial information that doesn't mean quite what you think it should, due to boiling down parallelism to single numbers. But with your change, we lose that partial information (but gain in simplicity).

If we wanted to improve the situation, the most accurate way to do this would be to record task.duration() separately. That would be nice, though we'd still have to worry about presenting things accurately.

Hm, I guess that would look like: gather and report exactly what we're reporting now, but also report task durations and "wait time" (which should be pretty close to max(task.duration)). Then parents' self times would be their total time minus nonparallel children's total times minus time waiting to join parallel children.
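A minimal sketch of that accounting, with entirely hypothetical names (nothing here is the real gcstats API):

```cpp
#include <numeric>
#include <vector>

// Hypothetical accounting for the scheme sketched above: a parent
// phase's self time is its total time, minus the total times of its
// non-parallel children, minus the time spent blocked joining its
// parallel children. Task durations would be reported separately.
struct PhaseTimesMs {
    int total;                    // wall-clock time in the parent phase
    std::vector<int> serialKids;  // non-parallel children's total times
    int joinWait;                 // time blocked joining parallel children
};

int selfTimeMs(const PhaseTimesMs& p) {
    int serial = std::accumulate(p.serialKids.begin(), p.serialKids.end(), 0);
    return p.total - serial - p.joinWait;
}
```

So a 5000 ms parent with 1500 ms of serial children and 2000 ms of join wait would report 1500 ms of self time, with the parallel tasks' own durations shown alongside rather than folded in.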

But whenever I think about this, I tend to think that the only thing that really matters is the main thread times, and ten parallel tasks that all finish within one second really are the same to us as one second of time on the main thread; we don't really care too much about the total work being done.

As a further thought, say you have fewer cores than parallel children, and end up running the entirety of your two slowest tasks on the same core. Then the wait time will be much more than max(task.duration). In terms of performance, only the wait time matters. In terms of figuring out what to optimize, task.duration is most useful.
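A quick sketch of that core-limited effect, using greedy list scheduling as a stand-in for however the helper threads actually get scheduled (hypothetical helper, purely illustrative):

```cpp
#include <algorithm>
#include <vector>

// Toy model: assign each task to whichever of `cores` workers frees up
// first, and return the makespan, which approximates the main thread's
// wait time. With fewer cores than tasks, the makespan can exceed
// max(task.duration) because slow tasks serialize on the same core.
int waitTimeMs(const std::vector<int>& durationsMs, int cores) {
    std::vector<int> finish(cores, 0);  // each worker's finish time
    for (int d : durationsMs) {
        auto soonest = std::min_element(finish.begin(), finish.end());
        *soonest += d;
    }
    return *std::max_element(finish.begin(), finish.end());
}
```

With two 2000 ms tasks and two cores the wait time equals max(task.duration); squeeze them onto one core and it doubles, even though no individual task got slower.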

Anyway, I think where I'm ending up is that unless I'm missing something, this patch is making things worse not better. And it would be best to record both wait time and task.duration. How to display those and record them in telemetry, I'm not entirely sure.
Attachment #8865479 - Flags: review?(sphink) → review-
Comment 3 (Assignee, a year ago)
(In reply to Steve Fink [:sfink] [:s:] from comment #2)
> Am I misunderstanding the situation? Because to me, it seems like we
> currently have some partial information that doesn't mean quite what you
> think it should, due to boiling down parallelism to single numbers. But with
> your change, we lose that partial information (but gain in simplicity).

No, everything you say is true.  My feeling was that inaccurate and misleading information is worse than no information here.  I guess this patch makes it misleading in a different way though.

How about I add a separate phase for waiting on these tasks?

> And it would be best to record
> both wait time and task.duration. How to display those and record them in
> telemetry, I'm not entirely sure.

Yeah.  I'll think more about this.
Comment 4

(In reply to Jon Coppeard (:jonco) from comment #3)
> (In reply to Steve Fink [:sfink] [:s:] from comment #2)
> > Am I misunderstanding the situation? Because to me, it seems like we
> > currently have some partial information that doesn't mean quite what you
> > think it should, due to boiling down parallelism to single numbers. But with
> > your change, we lose that partial information (but gain in simplicity).
> 
> No, everything you say is true.  My feeling was that inaccurate and
> misleading information is worse than no information here.  I guess this
> patch makes it misleading in a different way though.

Yes, ordinarily I would agree with avoiding misleading information. It's just that, if some parallel task is a big problem, it seems important that we have some way to figure out which one it is.

> How about I add a separate phase for waiting on these tasks?

Do you mean replacing all existing phases when you're waiting to join them with a new PHASE_WAIT or something? That would seem to have the same drawbacks as your original patch. Or do you mean to add in an additional layer around the set of joins? I guess that would make it more obvious where the weirdness in the hierarchy lives. That seems good. (I guess another way would be to use different phases with _PARALLEL_ in their names when you wait to join them.)
Comment 5 (Assignee, a year ago)
This will be fixed by bug 1309651.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → INVALID