Closed Bug 1540080 Opened 1 year ago Closed 1 year ago

JetStream 2 bomb-workers test much slower in Firefox than Chrome and Safari

Categories

(Core :: DOM: Workers, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: jandem, Assigned: baku)

References

(Blocks 1 open bug, )

Details

(Whiteboard: [qf:p2:responsiveness])

Attachments

(2 files)

Scores for this test are:

Firefox:  9.124  (First: 56.818, Worst:  0.985, Average: 13.574)
Chrome:  41.740  (First: 35.211, Worst: 43.764, Average: 47.192)
Safari:  80.033  (First: 73.529, Worst: 75.472, Average: 92.376)

It's interesting that our "First" number is really good, but not the rest. Do we have worker limits in place?

See URL for stand-alone test.

If I'm reading the code right, we do 80 iterations and there are ~26 workers per iteration. Maybe we have problems reusing workers?

baku, any thoughts?

Flags: needinfo?(amarchesini)

Open the testcase, watch the console logs. The first 26 workers finish fast, then the last one takes a while.

If "count" is set to 55, then the first 26 finish fast, then we wait a while, then 27-52 all finish fast, then we wait a while, then 53-55 finish fast.

I have not yet located where the magic number "26" appears in the worker code. Maybe it's hardware-dependent?

More info on that testcase: that only happens if I have already tried to run the full suite multiple times before (possibly from the same hostname, unclear). So something complicated is going on.

So at least on Mac, one interesting thing: we end up getting to ~700-800 threads when running this testcase. Chrome and Savfair don't go above 40.

So one thing that might be going on is that we don't have workers "give up" their threads quickly enough to reuse them for the next test, and then we end up constantly creating and joining threads...

So I tried to trace through what close() does on a worker. Looks like it bounces back and forth through a bunch of runnables.

In particular, I tried logging timestamps for various parts of the closing-down process, then looking at the gaps between them. As far as I can tell, it takes about 12ms for the very first close() call that happens to land in WorkerThreadPrimaryRunnable::FinishedRunnable::Run, with several bounces back and forth between the worker and parent thread. I suspect we may just end up not releasing worker threads fast enough, creating new ones, then possibly swamping the main thread event loop with all the close/cancel messages...

Marking [qf] because I could see this destroy perf on (maybe poorly written, but still) websites using workers.

Whiteboard: [qf]

I'll work on this bug. I think I know what the problem is:

When self.close() is called (with self == one of the worker globals), if we don't have a sync event loop, we cleanup the event queue and we dispatch a canceling runnable in order to start the shutdown at the end of the current JS execution.

If we have a sync event loop, we don't do that, and this means that the worker is dismissed only when the 'parent' object is cced/gced.

Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)

That patch seems to fix the main problem, yes. With it, our performance is in about the same ballpark as Chrome; both are much worse than Safari.

Priority: -- → P3
Whiteboard: [qf] → [qf:p2:responsiveness]
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b4a7aeb63c65
Execute the canceling runnable after self.close() even when we have sync event loops, r=asuth
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
You need to log in before you can comment on or make changes to this bug.