Worker object cycle collection is broken

RESOLVED FIXED in Firefox 28

Status

Product: Core
Component: DOM: Workers
Status: RESOLVED FIXED
Opened: 5 years ago
Last updated: 4 years ago
People

(Reporter: sole, Assigned: khuey)

Tracking

Keywords: regression
Version: unspecified
Target Milestone: mozilla29
Platform: x86 Mac OS X
Points: ---
Bug Flags: in-testsuite +

Firefox Tracking Flags

(firefox27 unaffected, firefox28 fixed, firefox29 fixed)

Details

Attachments

(2 attachments)

We've found an issue where repeatedly launching web workers on a page results in random delays before the worker actually starts running.

It happens more or less every 20th web worker, and the first frame that "freezes" is also somewhat random. On Firefox release it seems to be consistently the 11th, but on Nightly it is sometimes the first, other times the 2nd, the 11th... quite unpredictable.

I thought our Animated_GIF.js library was doing something wrong, but it also happens with gif.js. Here are two demos, where up to 200 GIFs will be generated. The text in each square is the time elapsed between generating that one and the previous one; if it takes more than a second it's painted RED because it's slow. You'll see there's a certain pattern to this:

* Using Animated_GIF - we're launching 2 workers per GIF so it freezes every 10th GIF:
http://people.mozilla.org/~spenades/test_cases/workers_every_10th/tests_stress.html

* Using gif.js - since it launches only 1 worker per GIF it freezes every 20th GIF:
http://people.mozilla.org/~spenades/test_cases/workers_every_20th/tests_stress.html

We are also seeing this behaviour in chat.meatspac.es where we've already filed a bug:
https://github.com/meatspaces/meatspace-chat/issues/103

This behaviour is not present in Chrome (it crashes in my stress test, but after many GIFs have been generated without lag/timeouts between them).

I have been looking at the network graph and the profiler but I can't really figure out what it is that is happening :-/
There is a hardcoded limit of 20 workers per origin to prevent DOSing the browser.

http://mxr.mozilla.org/mozilla-central/source/modules/libpref/src/init/all.js#100
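The queuing behavior this limit produces can be sketched with a simple per-origin slot counter. This is a hypothetical model for illustration, not Gecko's actual implementation (the real cap is a pref in the file linked above, `dom.workers.maxPerDomain` at the time, defaulting to 20): once the cap is reached, the next worker request waits until a running worker shuts down and frees a slot.

```javascript
// Hypothetical model of a per-origin worker cap. Once `max` workers are
// running, further requests queue until release() frees a slot.
class WorkerSlots {
  constructor(maxPerOrigin = 20) {
    this.max = maxPerOrigin;
    this.running = 0;
    this.queue = [];
  }

  // Resolves when a slot is available; the caller would start its worker then.
  acquire() {
    if (this.running < this.max) {
      this.running++;
      return Promise.resolve();
    }
    return new Promise((resolve) => this.queue.push(resolve));
  }

  // Called when a worker shuts down (or its object is GC'd): the freed slot
  // is handed directly to the oldest queued request, if any.
  release() {
    const next = this.queue.shift();
    if (next) {
      next();
    } else {
      this.running--;
    }
  }
}
```

If a page never terminates its workers and relies solely on GC to reclaim them, `release()` is effectively never called, which matches the symptom of roughly every 20th worker stalling.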
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → INVALID
(Reporter)

Comment 2

5 years ago
So if the workers are used and then go away, why are they not being GC'ed earlier? Because it looks to me that what's happening is it takes too long for them to be "restored into the pool". Sometimes it takes 7 seconds. Sometimes it takes A MINUTE. Sometimes IT NEVER HAPPENS.

The other browsers are fine. So something is wrong here. Reopening ;)
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Kyle, this problem doesn't happen in Chrome, and it affects sites like https://chat.meatspac.es which is heavily used.
Ok, I think there is actually a Gecko bug here related to GCing cycles involving workers.
Status: REOPENED → NEW
Summary: Random timeouts when repeatedly launching Web Workers → Worker object cycle collection is broken
This library never actually terminates the workers (unless abort is called), so the only way to shut them down is for the GIF object to be GC'd. If a library consumer causes a GIF object to be entrained, it will leak a running worker. That's not the issue here, but it is poor design.
(Reporter)

Comment 6

5 years ago
Hey Kyle

I added an explicit method to the library so you can ask it to release the workers once you don't want to use the parent object anymore, by calling terminate() on them. This makes the test pass without crashing, so maybe this confirms the GC issue?

Here's the new test: http://sole.github.io/Animated_GIF/tests_stress.html
Web workers shut down in one of three ways: when close() is called on the worker thread, when terminate() is called on the parent thread, or when the Worker object is no longer reachable on the parent thread. There's a bug in the implementation of the latter (having a grey JS wrapper should count as unreachable for these purposes). But the page should do one of the first two anyway, and not rely on the third, which is just there as a last resort.
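The two explicit shutdown paths above can be illustrated as follows. Since the real Worker constructor only exists in a browser, this sketch uses a hypothetical stub class so the lifecycle logic is visible outside one:

```javascript
// Stub standing in for the browser's Worker, to illustrate the two
// explicit shutdown paths (self.close() and worker.terminate()).
class StubWorker {
  constructor() {
    this.alive = true;
  }
  // Path 1: the worker script itself calls self.close().
  simulateSelfClose() {
    this.alive = false;
  }
  // Path 2: the parent thread calls worker.terminate().
  terminate() {
    this.alive = false;
  }
}

// A page wanting deterministic cleanup should terminate workers explicitly
// rather than relying on path 3 (GC of an unreachable Worker object).
function withWorker(fn) {
  const w = new StubWorker();
  try {
    return fn(w);
  } finally {
    w.terminate(); // always release the per-origin slot promptly
  }
}
```

With a real Worker, the `withWorker` pattern guarantees the per-origin slot is freed as soon as the work is done, regardless of when (or whether) the Worker object is collected.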
Assignee: nobody → khuey
status-firefox27: --- → unaffected
status-firefox28: --- → affected
status-firefox29: --- → affected
Keywords: regression
Created attachment 8368810 [details] [diff] [review]
Patch
Attachment #8368810 - Flags: review?(continuation)
Attachment #8368810 - Flags: review?(bent.mozilla)
Comment on attachment 8368810 [details] [diff] [review]
Patch

Review of attachment 8368810 [details] [diff] [review]:
-----------------------------------------------------------------

Hooray for code removal!

The cycle collector and binding generator changes look good to me.
Attachment #8368810 - Flags: review?(continuation) → review+
Comment on attachment 8368810 [details] [diff] [review]
Patch

Review of attachment 8368810 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerPrivate.cpp
@@ +3463,5 @@
> +  // down the thread.
> +
> +  if (!tmp->mBusyCount && !tmp->mMainThreadObjectsForgotten) {
> +    NS_CYCLE_COLLECTION_NOTE_EDGE_NAME(cb, "self");
> +    cb.NoteXPCOMChild(tmp);

I'd much prefer this to traverse an actual nsRefPtr<WorkerPrivateParent> called mSelfRef. Then we could set it to something in Constructor() and null it in the WorkerFinishedRunnable. It would at least be more easily searchable than looking for a stray AddRef/Release in here.
Attachment #8368810 - Flags: review?(bent.mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/3f052ca2795f
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
status-firefox29: affected → fixed
Comment on attachment 8368810 [details] [diff] [review]
Patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 928312
User impact if declined: Leaks - worker objects may never be GC'd properly; pages that depend on workers being reclaimed (e.g. sole's demos) may not work
Testing completed (on m-c, etc.): On m-c, has automated test.
Risk to taking this patch (and alternatives if risky): Low risk, well tested.  Backing out 928312 would be far more invasive
String or IDL/UUID changes made by this patch: None
Attachment #8368810 - Flags: approval-mozilla-beta?
Comment on attachment 8368810 [details] [diff] [review]
Patch

Switching the approval to aurora as Fx27 is unaffected here
Attachment #8368810 - Flags: approval-mozilla-beta? → approval-mozilla-aurora?

Updated

4 years ago
Attachment #8368810 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This has some non-trivial conflicts with Aurora that I'm not confident trying to resolve.
Flags: needinfo?(khuey)
Keywords: branch-patch-needed
https://hg.mozilla.org/releases/mozilla-aurora/rev/afad6b72ed8d
status-firefox28: affected → fixed
Keywords: branch-patch-needed