966646 - Use JS helper threads for GC background sweeping / allocation

Assignee

Description

•

11 years ago

Attached patch patch (obsolete) — Details — Splinter Review

This eliminates the last thread creation done other than for the JS helper thread pool, which is now per process. Doing this will allow DOM workers to share the use of these helper threads without causing any additional thread creation.

Attachment #8369092 - Flags: review?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 1

•

11 years ago

Comment on attachment 8369092 [details] [diff] [review] patch Review of attachment 8369092 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this looks nice. I'd just like to see a few changes. ::: js/src/jsgc.cpp @@ +1433,5 @@ > + if (rt->gcChunkPool.wantBackgroundAllocation(rt)) { > + maybeLock.unlock(); > + AutoLockWorkerThreadState lock; > + maybeLock.lock(rt); > + rt->gcHelperState.startBackgroundAllocationIfIdle(); I think we can improve this since we don't need to start the background allocation right away. Can you please make an auto class that gets pushed onto the stack when we enter this function (before the GC lock is taken)? When we decide we want background allocation, we would set a flag on the class. Then the destructor for the class would check the flag and, if set, take the locks in the correct order and call startBackgroundAllocationIfIdle(). @@ +2481,1 @@ > for (;;) { Can you please try simplifying this code? I don't think the loop is needed anymore. Also, we should be able to assert that the state is not IDLE at the beginning. And I think it would be a lot clearer to unconditionally set it to IDLE at the end, while also notifying on |done|. As far as I can tell, the only way the state can change when it's not set to IDLE is if we set it to CANCEL_ALLOCATING. So perhaps we could also assert that the state is unchanged at the end of each case (except for allowing CANCEL_ALLOCATING in the ALLOCATING case). ::: js/src/vm/Runtime.h @@ +1866,5 @@ > runtime = rt; > rt->lockGC(); > } > > + void unlock() { Should be able to remove this.

Attachment #8369092 - Flags: review?(wmccloskey)

Brian Hackett [Laid off!]

Assignee

Comment 2

•

11 years ago

Attached patch updated — Details — Splinter Review

Make the requested simplifications. This also makes triggering a background GC thread infallible, which required some other modifications, mainly putting the |state| and |thread| members of GCHelperState under the worker thread state lock.

Assignee: nobody → bhackett1024

Attachment #8369092 - Attachment is obsolete: true

Attachment #8372492 - Flags: review?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 3

•

11 years ago

Comment on attachment 8372492 [details] [diff] [review] updated Review of attachment 8372492 [details] [diff] [review]: ----------------------------------------------------------------- Look great, thanks. Please remember to make sure this compiles with !JS_THREADSAFE.

Attachment #8372492 - Flags: review?(wmccloskey) → review+

Chris Peterson [:cpeterson]

Comment 4

•

11 years ago

Brian: is your JS helper thread patch ready to land? Bill r+'d it two months ago. :)

Flags: needinfo?(bhackett1024)

OS: Linux → All

Hardware: x86_64 → All

Brian Hackett [Laid off!]

Assignee

Comment 5

•

11 years ago

I've tried quite a bit actually to get this into landable shape but the only problem is these persistent timeouts that only happen on b2g ICS emulator opt (but not debug) and which I don't really have any way I know of to debug. Which is pretty demoralizing and I don't have an ETA for landing this.

Flags: needinfo?(bhackett1024)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 6

•

11 years ago

Gregor, can you recommend anyone on the B2G side that could investigate here?

Flags: needinfo?(anygregor)

Gregor Wagner [:gwagner]

Comment 7

•

11 years ago

(In reply to Brian Hackett (:bhackett) from comment #5) > I've tried quite a bit actually to get this into landable shape but the only > problem is these persistent timeouts that only happen on b2g ICS emulator > opt (but not debug) and which I don't really have any way I know of to > debug. Which is pretty demoralizing and I don't have an ETA for landing > this. Do you have more info here? Is it always the same test? Is it across all mochitest suites?

Flags: needinfo?(anygregor)

Brian Hackett [Laid off!]

Assignee

Comment 8

•

11 years ago

(In reply to Gregor Wagner [:gwagner] from comment #7) > (In reply to Brian Hackett (:bhackett) from comment #5) > > I've tried quite a bit actually to get this into landable shape but the only > > problem is these persistent timeouts that only happen on b2g ICS emulator > > opt (but not debug) and which I don't really have any way I know of to > > debug. Which is pretty demoralizing and I don't have an ETA for landing > > this. > > Do you have more info here? Is it always the same test? Is it across all > mochitest suites? The timeouts/failures happen on all mochitest suites and marionette-webapi tests, everything else on all platforms works fine. Here's the last try run I did showing the problem: https://tbpl.mozilla.org/?tree=Try&rev=229eb8495595

Flags: needinfo?(anygregor)

Gregor Wagner [:gwagner]

Comment 9

•

11 years ago

Uh the parent process keeps crashing. That basically indicates that the parent is restarting: 13:05:44 INFO - 03-13 19:54:58.487 33 33 I ServiceManager: service 'media.resource_manager' died Debug build is fine? Maybe some horrible race condition?

Flags: needinfo?(anygregor)

Gregor Wagner [:gwagner]

Comment 10

•

11 years ago

Do you have a new version of the patch? I see merge conflicts. I want to push it to the pine tree where we have improved crash stack reporting. Maybe we can find a stack for the crash.

Brian Hackett [Laid off!]

Assignee

Comment 11

•

11 years ago

Here is a try run for a rebased version of the patch, which still has the same issues: https://tbpl.mozilla.org/?tree=Try&rev=ae189cf76bb7

Flags: needinfo?(anygregor)

Gregor Wagner [:gwagner]

Comment 12

•

11 years ago

(In reply to Brian Hackett (:bhackett) from comment #11) > Here is a try run for a rebased version of the patch, which still has the > same issues: > > https://tbpl.mozilla.org/?tree=Try&rev=ae189cf76bb7 I pushed to pine and it seems like the opt builds have problems with workers. https://tbpl.mozilla.org/?tree=Pine&showall=1&rev=21233bb8beaa

Flags: needinfo?(anygregor)

Gregor Wagner [:gwagner]

Comment 13

•

11 years ago

Meh seems like the worker bug is a regression from the merge :(

Gregor Wagner [:gwagner]

Comment 14

•

11 years ago

The device doesn't start with this patch as well but I couldn't get a stack yet. Should be the same problem as we see on the emulator.

Andrew McCreight (out until 2026) [:mccr8]

Comment 15

•

11 years ago

Brian, I'd try this again after the patch in bug 1006695 lands. Apparently Nuwa is unaware of JS helper threads, but was aware of the GCHelperThread, and I could imagine could cause some severe issues for this patch.

Andrew McCreight (out until 2026) [:mccr8]

Comment 16

•

11 years ago

Attached patch Use JS helper threads for GC background sweeping / allocation (rebased) — Details — Splinter Review

On the theory that bug 1006695 will fix the mysterious emulator problem, I've rebased the patch and pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=229eb8495595 There have been a few GC refactorings I had to rebase over, so who knows how well my patch works, but it at least compiles and runs a little locally.

Andrew McCreight (out until 2026) [:mccr8]

Comment 17

•

11 years ago

For good measure, here's an L64 debug run to make sure I didn't break everything: https://tbpl.mozilla.org/?tree=Try&rev=5544f49c29d7

Brian Hackett [Laid off!]

Assignee

Comment 18

•

11 years ago

Ack, I just did the exact same thing. Try competition! https://tbpl.mozilla.org/?tree=Try&rev=81ed8411b2dc

Alon Zakai (:azakai)

Updated

•

11 years ago

Blocks: 1014183

Alon Zakai (:azakai)

Updated

•

11 years ago

No longer blocks: 1014183

Brian Hackett [Laid off!]

Assignee

Comment 19

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/9c4c4356afce

Benjamin Bouvier [:bbouvier] (inactive)

Comment 20

•

11 years ago

This introduced a build warning in both gcc and clang: /js/src/jsapi-tests/../jsgc.h:1027:17: warning: inline function ‘void js::GCHelperState::startBackgroundAllocationIfIdle()’ used but never defined [enabled by default] The function is used in GCRuntime.h, that includes jsgc.h (which doesn't contain the body). Not sure what the best fix is, so letting Brian decide: delete the inline keyword; move the definition to jsgc.h (but then, JSRuntime* is incompletely declared and used => error); or any other nice solution.

Flags: needinfo?(bhackett1024)

Carsten Book [:Tomcat]

Comment 21

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/9c4c4356afce

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla32

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 22

•

11 years ago

Backed this out in https://hg.mozilla.org/mozilla-central/rev/2619a4def1b9 for being the likely cause of bug 1014458

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Updated

•

11 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Brian Hackett [Laid off!]

Assignee

Comment 23

•

11 years ago

This push should fix the interrupt lock assertion (which can be a deadlock in release mode), we just check whether the current thread owns the interrupt lock when starting background allocation (startBackgroundAllocationIfIdle is, not surprisingly, already a best-effort sort of thing). This should also fix the warning in comment 20 by removing the inline keyword. https://hg.mozilla.org/integration/mozilla-inbound/rev/c3780924e7b4

Flags: needinfo?(bhackett1024)

Ryan VanderMeulen [:RyanVM]

Comment 24

•

11 years ago

Note that we also suspect that this led to a large increase in B2G emulator test timeouts (like 2-3 per push). In case it matters, the emulator is a) slow and b) single core. CCing the other sheriffs to keep an eye on whether they spike again post-comment 23.

Brian Hackett [Laid off!]

Assignee

Comment 25

•

11 years ago

Did anything happen to performance numbers in non-emulator builds?

Ryan VanderMeulen [:RyanVM]

Comment 26

•

11 years ago

AFAIK, eideticker is the perf monitoring we do on B2G. wlach?

William Lachance (:wlach)

Comment 27

•

11 years ago

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #26) > AFAIK, eideticker is the perf monitoring we do on B2G. wlach? So eideticker is only part of the picture; we run many more tests and more frequently through b2gperf and 'make test-perf' which upload to http://datazilla.org/b2g. I'm not seeing any difference in the startup numbers reported by either that are attributable to this change, but of course there's more to performance than just that. I'll CC the fxos-perf team as a head's up.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 28

•

11 years ago

Backed out again in https://hg.mozilla.org/integration/mozilla-inbound/rev/5c8fac4197de for making emulator tests time out: https://tbpl.mozilla.org/php/getParsedLog.php?id=40238259&tree=Mozilla-Inbound

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

Depends on: 1014458

Brian Hackett [Laid off!]

Assignee

Comment 29

•

11 years ago

OK, I give up. This brings this bug back to square one, aka comment 5. There really isn't anything in this patch which should cause tests to time out more often. So an increase in the number of timeouts could be a problem with the patch, it could be a problem with the testing infrastructure, it could be a problem with how JS interacts with b2g, like the Nuwa thing that held this bug up for three months. But without any information about how or why the tests are failing (how is it this hard to get a stack?) and apparently without anyone on the b2g side able to invest the time to figure out what could be wrong, the prospects of anyone figuring this out seem poor.

Assignee: bhackett1024 → nobody

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → WONTFIX

Andrew Halberstadt [:ahal]

Updated

•

11 years ago

Depends on: 1015178

Andrew Halberstadt [:ahal]

Comment 30

•

11 years ago

We can fix the no stack problem pretty easily. I filed bug 1015178 for it.

Jan de Mooij [:jandem]

Comment 31

•

11 years ago

Nicolas, Gregor, can we get some help with the b2g timeouts? This is a pretty important bug and it's completely blocked on b2g.

Flags: needinfo?(nicolas.b.pierron)

Flags: needinfo?(anygregor)

Andrew Halberstadt [:ahal]

Comment 32

•

11 years ago

From https://bugzilla.mozilla.org/show_bug.cgi?id=1015200#c0 it looks like this was wrongfully backed out. Ed, can this be re-landed as is?

Status: RESOLVED → REOPENED

Flags: needinfo?(emorley)

Resolution: WONTFIX → ---

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 34

•

11 years ago

jsfunfuzz found a hard-to-reduce (but easy to reproduce) testcase with the patch in comment 21, so please do not reland this until Brian has taken a look. Thanks!

Flags: needinfo?(emorley) → needinfo?(bhackett1024)

Andrew McCreight (out until 2026) [:mccr8]

Comment 35

•

11 years ago

Gary, that's the same assertion as in bug 1014458. Can you reproduce that in the most recent landing of this, revision c3780924e7b4, which was intended to fix that assertion?

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 36

•

11 years ago

(In reply to Andrew McCreight [:mccr8] from comment #35) > Gary, that's the same assertion as in bug 1014458. Can you reproduce that > in the most recent landing of this, revision c3780924e7b4, which was > intended to fix that assertion? That assertion was fixed by the backout: https://bugzilla.mozilla.org/show_bug.cgi?id=1014458#c28 And yes, it went away when the backout landed, so I figure that the patch is the cause of the issue.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 37

•

11 years ago

See comment 23, Gary.

Andrew McCreight (out until 2026) [:mccr8]

Comment 38

•

11 years ago

It went away when for the initial backout (revision 2619a4def1b9), but didn't come back when bhackett relanded the patch (revision c3780924e7b4), so that may fix your test case, too.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Comment 39

•

11 years ago

(In reply to Bill McCloskey (:billm) from comment #37) > See comment 23, Gary. Wasn't comment 23 backed out in comment 28? Anyway, I'm a little confused due to the back and forth landings/backouts here. I guess we can recheck the testcase when all the required patches land again.

Brian Hackett [Laid off!]

Assignee

Updated

•

11 years ago

Flags: needinfo?(bhackett1024) → needinfo?(emorley)

Brian Hackett [Laid off!]

Assignee

Comment 40

•

11 years ago

(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #39) > Wasn't comment 23 backed out in comment 28? Yes, but this assertion should be fixed in the patch landed in c3780924e7b4, which was indeed backed out in comment 28. If the patch in c3780924e7b4 ever relands then the assertion should not come up.

Florian Bender

Comment 41

•

11 years ago

Are the patches attached to this bug up-to-date (seems like the code from the last push isn't attached here). Are you still working on this or should :mccr8 take over?

Andrew McCreight (out until 2026) [:mccr8]

Comment 42

•

11 years ago

In comment 29, bhackett said he gives up and unassigned himself, which I think implies he is not working on it. ;)

Florian Bender

Comment 43

•

11 years ago

Yeah but since then he calmed down and some other stuff happened. Also, there's some confusion now over what the current state is / what happened. Hence the question. If you're willing to take over the bug, please assign yourself, thanks! IMHO, this should be fixed properly instead of worked around (as in Bug 1014183) …

Nicolas B. Pierron [:nbp]

Comment 44

•

11 years ago

jandem: I think comment 32 highlights that this patch might not be the reason of the emulator timeouts. We should wait for edmorley to confirm if this patch can re-land or not. (thx ahal for noticing)

Flags: needinfo?(nicolas.b.pierron)

Ed Morley [:emorley]

Comment 45

•

11 years ago

I didn't see the failure rate that was responsible for backing this out, so can't tell if the failures seen in bug 1015200 are exactly the same, or only a subset of what was occurring before. How about two try pushes, one with inbound tip and one with inbound tip + the patches here, using something like trychooser "try: -b do -p emulator -u reftest,crashtest -t none" - and then I'll retrigger a load of jobs and we can directly compare?

Flags: needinfo?(emorley)

Andrew McCreight (out until 2026) [:mccr8]

Comment 46

•

11 years ago

without the patch: https://tbpl.mozilla.org/?tree=Try&rev=bd98ed92bf50 with the patch: https://tbpl.mozilla.org/?tree=Try&rev=0e427fc574ea

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 47

•

11 years ago

The number of timeouts went down! Can we land this?

Flags: needinfo?(emorley)

Andrew McCreight (out until 2026) [:mccr8]

Comment 48

•

11 years ago

I retriggered every one of those twice, just now, so we'll see if the picture changes with a bigger sample.

Andrew McCreight (out until 2026) [:mccr8]

Comment 49

•

11 years ago

With my retriggers, there is 1 out of 35 timeouts with Brian's patch, and 4 out of 35 without.

Ed Morley [:emorley]

Comment 50

•

11 years ago

I've requested a few more retriggers, but looking promising so far. These latest retriggers should be complete within 60-90 mins.

Flags: needinfo?(emorley)

Ed Morley [:emorley]

Comment 51

•

11 years ago

Looks good to land :-)

Jan de Mooij [:jandem]

Comment 52

•

11 years ago

(In reply to Ed Morley [:edmorley UTC+0] from comment #51) > Looks good to land :-) https://hg.mozilla.org/integration/mozilla-inbound/rev/93dce4b831f3 (Bug 1013663 made this not apply cleanly; pushed before it bitrots more.)

Flags: needinfo?(anygregor)

Tim Taubert [:ttaubert] (inactive)

Comment 53

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/93dce4b831f3

Assignee: nobody → bhackett1024

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Jesse Ruderman

Updated

•

11 years ago

Depends on: 1018533

Hannes Verschore [:h4writer]

Updated

•

11 years ago

Depends on: 1019224

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

Blocks: 1014183

patch 11 years ago Brian Hackett [Laid off!] 34.11 KB, patch		Details \| Diff \| Splinter Review
updated 11 years ago Brian Hackett [Laid off!] 40.96 KB, patch	billm : review+	Details \| Diff \| Splinter Review
Use JS helper threads for GC background sweeping / allocation (rebased) 11 years ago Andrew McCreight (out until 2026) [:mccr8] 41.21 KB, patch		Details \| Diff \| Splinter Review