Last Comment Bug 813867 - Report memory for web workers that use ctypes
: Report memory for web workers that use ctypes
Status: RESOLVED FIXED
[MemShrink:P2]
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: unspecified
: x86_64 Linux
: -- normal (vote)
: mozilla20
Assigned To: Ben Turner (not reading bugmail, use the needinfo flag!)
:
Mentors:
: 820894 820896 820898 820906 (view as bug list)
Depends on: 821493 827274
Blocks: DarkMatter B2GDarkMatter 820906
  Show dependency treegraph
 
Reported: 2012-11-20 22:00 PST by Nicholas Nethercote [:njn]
Modified: 2013-02-08 10:55 PST (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
wontfix
wontfix
wontfix
fixed
fixed
19+
fixed
affected


Attachments
partial, hacky patch (43.85 KB, patch)
2012-12-13 14:20 PST, Nicholas Nethercote [:njn]
no flags Details | Diff | Splinter Review
Very draft patch, v1 (11.75 KB, patch)
2012-12-13 17:41 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch, v1 (36.70 KB, patch)
2012-12-29 13:54 PST, Ben Turner (not reading bugmail, use the needinfo flag!)
jorendorff: review+
khuey: review+
justin.lebar+bug: feedback+
justin.lebar+bug: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
B2G about:memory with this patch applied (23.91 KB, text/plain)
2012-12-30 09:52 PST, Justin Lebar (not reading bugmail)
no flags Details
Patch for b2g-18 (29.29 KB, patch)
2013-01-17 13:06 PST, Ben Turner (not reading bugmail, use the needinfo flag!)
overholt: approval‑mozilla‑b2g18+
Details | Diff | Splinter Review

Description Nicholas Nethercote [:njn] 2012-11-20 22:00:04 PST
If a web workers uses ctypes its memory reporter gives up and does nothing.  This misses a non-trivial amount of memory for standard Firefox workers like esource:///modules/PageThumbsWorker.js and the async-file-io worker (whose name I can't remember now) -- just the JSRuntime for a worker is several 100s of KiBs.

The reason the memory reporter gives up is that some workers using ctypes do things like sit in blocking IO poll loops (we had this problem specifically with the LastPass extension, bug 679551). Our current implementation of memory reporters on workers blocks the main thread while we wait for the worker to respond to the memory request, and if the worker is blocked then the process will deadlock.

The way to fix this is with an asynchronous memory reporting API.  bent suggested one in bug 673323 comment 4.  Here's a tweaked version that would match the existing multi-reporter interface more closely:

  interface nsIMemoryAsyncMultiReporter : nsIMemoryMultiReporter {
    void cancelCollection();
  };

  interface nsIMemoryReporterManager  {
    ...

    void registerAsyncMultiReporter(in nsIMemoryAsyncMultiReporter reporter);
    void unregisterAsyncMultiReporter(in nsIMemoryAsyncMultiReporter reporter);

    void enumerateAsyncMultiReporters();
  };

You'd call CollectReports() as usual on the async reporter.  The difference is that if it hadn't completely within a certain time limit, it would be interrupted and cancelCollection() would be called.  Maybe collectReports() would be modified for async reporters by adding an extra arg that specifies the time limit.

I'm happy to implement this, but my understanding of timers and how this interrupting would work is very hazy.
Comment 1 Nicholas Nethercote [:njn] 2012-11-25 17:35:15 PST
I just tried experimenting with nsITimer::InitWithFuncCallback, but I don't think its powerful enough.  I guess it relies on control getting back to the event loop?  It certainly wasn't able to break out of an infinite loop.

So, what next?  Something signal-based, like setitimer()?  It's currently only used in Mozilla code within some 3rd-party code (rtc, skia, chromium) and some profiling code (jprof, tools/profiler/).
Comment 2 Justin Lebar (not reading bugmail) 2012-11-26 06:55:31 PST
> I just tried experimenting with nsITimer::InitWithFuncCallback, but I don't think its 
> powerful enough.  I guess it relies on control getting back to the event loop?

Yes.

> Something signal-based, like setitimer()?

Would this require us to audit all invocations of blocking functions in workers to ensure that they handle ERRINTR properly?

I don't understand this problem space, but would it be possible for us to spin up a new thread which tries to acquire a lock protecting the worker's JS runtime?  The worker releases this lock when it's blocked on I/O, and when it reaches the event loop.
Comment 3 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-11-26 07:03:56 PST
What's wrong with poking the OperationCallback like existing worker control messages do?
Comment 4 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-11-26 13:31:43 PST
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #3)
> What's wrong with poking the OperationCallback like existing worker control
> messages do?

That happens automatically as soon as you post a control runnable to the worker (as the memory reporter stuff does now).
Comment 5 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-11-26 13:34:54 PST
Right, so what's the problem here?

If you want a timer to cancel the runnable you receive the timer callback on the main thread and set a flag on the runnable telling it not to do anything if/when it finally gets to run.
Comment 6 Andrew McCreight [:mccr8] 2012-12-12 13:40:07 PST
*** Bug 820898 has been marked as a duplicate of this bug. ***
Comment 7 Andrew McCreight [:mccr8] 2012-12-12 13:41:26 PST
jlebar found 4 unreported workers on B2G. It looks like net_worker.js, ril_worker.js and wifi_worker.js all use ctypes. wifi_worker seems to get spawned twice, so it seems like this bug is probably responsible for the unreported worker related memory he is seeing.
Comment 8 Andrew McCreight [:mccr8] 2012-12-12 13:42:00 PST
*** Bug 820896 has been marked as a duplicate of this bug. ***
Comment 9 Nicholas Nethercote [:njn] 2012-12-12 14:14:54 PST
*** Bug 820894 has been marked as a duplicate of this bug. ***
Comment 10 Nicholas Nethercote [:njn] 2012-12-13 14:20:08 PST
Created attachment 692020 [details] [diff] [review]
partial, hacky patch

Here's what I have so far.  (Sorry, I should have assigned this to myself.)

Basic idea is to create a new kind of reporter, nsIAsyncMemoryMultiReporter.  Its interface probably needs to change, I was kind of feeling my way with this stuff.  I modified aboutMemory.js to handle this kind of reporter, and I'm more confident about that part.  I also have two instances, OkAsyncMemoryMultiReporter and LoopyAsyncMemoryMultiReporter that I was using for testing;  the latter has an infinite loop, and so it should time out.

IIRC a problem with this code in its current form is that the timer used for the async reporters always runs to completion, even if all the async reporters have finished.  That clearly needs fixing.

I'm not particularly attached to this code, and it uses lots of pieces of Gecko that I'm not familiar with.  If you can see a better way to do it then go right ahead!
Comment 11 Justin Lebar (not reading bugmail) 2012-12-13 17:18:25 PST
I'll post an ugly WIP in a minute, but this is from my phone:

> ├──11.59 MB (30.47%) -- workers/workers()
> │  ├───6.26 MB (16.45%) -- worker(resource://gre/modules/ril_worker.js, 43aec400)
> │  │   ├──2.37 MB (06.24%) -- runtime
> │  │   │  ├──0.79 MB (02.07%) ── script-sources
> │  │   │  ├──0.65 MB (01.71%) ── temporary
> │  │   │  ├──0.52 MB (01.36%) ── jaeger-code
> │  │   │  └──0.42 MB (01.09%) ++ (11 tiny)
> │  │   ├──2.22 MB (05.83%) -- compartment(web-worker)
> │  │   │  ├──1.31 MB (03.43%) -- type-inference
> │  │   │  │  ├──1.16 MB (03.04%) ── analysis-pool [2]
> │  │   │  │  └──0.15 MB (00.39%) ++ (2 tiny)
> │  │   │  ├──0.46 MB (01.20%) ++ (6 tiny)
> │  │   │  └──0.45 MB (01.19%) ++ gc-heap
> │  │   ├──1.41 MB (03.70%) -- gc-heap
> │  │   │  ├──1.38 MB (03.61%) ── unused-arenas
> │  │   │  └──0.03 MB (00.08%) ++ (3 tiny)
> │  │   └──0.26 MB (00.69%) ++ compartment(web-worker-atoms)
> │  ├───2.78 MB (07.30%) -- worker(resource://gre/modules/wifi_worker.js, 42bcb800)
> │  │   ├──1.75 MB (04.61%) -- gc-heap
> │  │   │  ├──1.72 MB (04.53%) ── unused-arenas
> │  │   │  └──0.03 MB (00.08%) ++ (3 tiny)
> │  │   ├──0.47 MB (01.25%) ++ compartment(web-worker)
> │  │   ├──0.45 MB (01.17%) ++ runtime
> │  │   └──0.10 MB (00.27%) ++ compartment(web-worker-atoms)
> │  ├───2.55 MB (06.71%) -- worker(resource://gre/modules/net_worker.js, 44780c00)
> │  │   ├──1.73 MB (04.55%) -- gc-heap
> │  │   │  ├──1.70 MB (04.47%) ── decommitted-arenas
> │  │   │  └──0.03 MB (00.08%) ++ (3 tiny)
> │  │   └──0.82 MB (02.17%) ++ (3 tiny)
Comment 12 Justin Lebar (not reading bugmail) 2012-12-13 17:21:50 PST
Unfortunately with a 1s timeout it looks like we're giving up on multiple workers.  Obviously that's not ideal.
Comment 13 Justin Lebar (not reading bugmail) 2012-12-13 17:22:48 PST
And I should say, it seems to take these two particular workers ~60s before they unblock and run my memory report runnable.
Comment 14 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-12-13 17:27:53 PST
Yeah, these workers spend a lot of time blocking on FDs.
Comment 15 Andrew McCreight [:mccr8] 2012-12-13 17:28:22 PST
Would it be useful to indicate in about:memory that there were workers that failed to report back? Assuming you can do that.
Comment 16 Justin Lebar (not reading bugmail) 2012-12-13 17:39:41 PST
(In reply to Andrew McCreight [:mccr8] from comment #15)
> Would it be useful to indicate in about:memory that there were workers that
> failed to report back? Assuming you can do that.

We currently report them with 0mb memory used, which causes them to be hidden.  We could change that, of course.

But really, I want to get memory info out of these blocked workers...
Comment 17 Justin Lebar (not reading bugmail) 2012-12-13 17:41:41 PST
Created attachment 692110 [details] [diff] [review]
Very draft patch, v1

This is very rough, but I'm posting it mostly in case anyone has feedback on
the general approach of:

* enqueue a control runnable to the worker, then
* spin a nested event loop waiting for the worker to execute this runnable.

If B2G workers are commonly blocked, getting memory reports only from the
workers that aren't currently blocked on an fd seems pretty lame to me.

I wonder if there's something better we can do here.
Comment 18 Justin Lebar (not reading bugmail) 2012-12-13 17:50:35 PST
> But really, I want to get memory info out of these blocked workers...

You've probably already considered this, but what if

* We have a mutex M.
* When a worker starts running, it acquires a lock on M.
* When the worker calls a ctypes function, we release M.  When the ctypes call completes, it acquires M.
* When we want to get a memory report from the worker, we 

  * try to acquire a lock on M, and
  * enqueue a worker control task to get the memory report, as we do currently

Whichever one completes first gives us the memory report, and we cancel the other one.

I waved my hands a bit about the synchronization (we'd probably have to use a condvar), but does the general idea work?
Comment 19 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-12-14 08:56:23 PST
(In reply to Justin Lebar [:jlebar] from comment #18)
> but does the general idea work?

It sounds basically like what we do already. Dispatching a control runnable short-circuits any running JS and immediately processes the runnable (using JS_TriggerOperationCallback and a condvar), so if the worker isn't blocked we'll report memory immediately.

If the worker is blocked then we can't do anything to unblock it, so the runnable will wait until the next point where the JS engine can bail out for the operation callback.
Comment 20 Justin Lebar (not reading bugmail) 2012-12-14 10:46:37 PST
> > but does the general idea work?
>
> It sounds basically like what we do already.

I see.  I didn't realize that the control runnable runs immediately if the worker isn't in ctypes, so presumably we could wait N ms for the control runnable to complete and that would be equivalent to what I posted.

bent, jorendorff, and I agreed that comment 18 is basically our best bet here.  We just need to have a conversation with sicking about relative priorities.
Comment 21 Nicholas Nethercote [:njn] 2012-12-19 17:22:34 PST
One of the motivations for this bug is that Firefox uses workers itself by default, e.g. osfile_async_worker.js and resource:///modules/PageThumbsWorker.js.

I wonder if it would be worth having a whitelist of workers that use ctypes that we know don't cause problems for the memory reporters...
Comment 22 Justin Lebar (not reading bugmail) 2012-12-19 17:37:06 PST
> I wonder if it would be worth having a whitelist of workers that use ctypes that we know 
> don't cause problems for the memory reporters...

Given that none of us is reviewing that worker code, and given that we run worker memory reporters during telemetry, I'm not sure I'd want to take that risk, myself.
Comment 23 Jonas Sicking (:sicking) No longer reading bugmail consistently 2012-12-19 19:29:00 PST
Nominating for blocking since we suspect there might be very high percentages of our main process being consumed by workers. Comment 11 indicates up towards 30% of all parent process memory if I understand correctly?
Comment 24 Justin Lebar (not reading bugmail) 2012-12-19 19:46:35 PST
(In reply to Jonas Sicking (:sicking) from comment #23)
> Comment 11 indicates up towards 30% of all parent process
> memory if I understand correctly?

Yes.  I'd go further than saying we suspect that a lot of our main-process memory is workers that use ctypes; I'm relatively sure that the report in comment 11 isn't bogus.  (It's also missing one worker, I think.)

But I don't think this is a blocker because

 * we don't have a clear path towards fixing this bug,
 * no user will notice whether we've fixed this bug or not, and
 * we can measure memory allocated in workers which use ctypes using DMD, although it's difficult and imprecise.  (And critically, DMD requires a special build, while about:memory does not.)

Having said that, I'd really like us to fix this bug, and if we can fix it before we release, that's even better.  This bug is important because if one of these workers starts using a lot of memory for whatever reason, we can't tell unless we're running a special DMD build.  High memory usage here is potentially catastrophic -- you may not be able to do anything to the phone until you reboot it, because the memory is tied up in the main process.  So if a tester or user hits a case where one of these workers is using a lot of memory, we have no way to understand what's going on without this bug fixed.

If that's scary enough to block, please go for it!
Comment 25 Doug Turner (:dougt) 2012-12-20 10:05:29 PST
What justin said.  As the bug is written up, i don't think we want to block for this.  It is important to understand if we are leaking or using too much memory before we ship v.1, but as for reporting this we can live with out it.
Comment 26 Jonas Sicking (:sicking) No longer reading bugmail consistently 2012-12-20 12:31:18 PST
Actually, I agree with that too. I actually intended that this be marked as a "softblocker", but it appears that it never got evaluated for that.

So I'll just move it to that state myself :) Feel free to renom, or minus, if you disagree.
Comment 27 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-12-29 13:54:42 PST
Created attachment 696539 [details] [diff] [review]
Patch, v1

Ok, this should work.

jorendorff, can you please look at the js/src changes? It's the basic ctypes callback API that we discussed on irc.

khuey, can you take a look at the dom/workers changes? We may need to sit down and go through it together, but the basic idea is to do all memory reporting on the main thread while the worker is blocked. There are now two signals that are flagged to make this all work: mMemoryReporterRunning is the way that the main thread tells the worker that a memory report is requested, and mBlockedForMemoryReporter is the way that the worker thread acknowledges that it is safe to proceed. That blocking can happen when the worker is waiting for events, when the worker is currently calling into ctypes, or if we trap it in the operation callback.

jlebar, I'd love it if you could test this on b2g!
Comment 28 Justin Lebar (not reading bugmail) 2012-12-30 09:28:39 PST
I had to change the patch as follows in order to get this to compile on my machine.  (I was getting "can't static_cast char[21] to void*".)  I think I found a version of gecko which doesn't BSoD on B2G; I'll post a memory report if it works this time.

diff -u b/dom/workers/WorkerPrivate.cpp b/dom/workers/WorkerPrivate.cpp
--- b/dom/workers/WorkerPrivate.cpp
+++ b/dom/workers/WorkerPrivate.cpp
@@ -1431,7 +1431,8 @@
     aCompartmentStats->extra1 = strdup(cJSPathPrefix.get());
 
     // This should never be used when reporting with workers (hence the "?!").
-    aCompartmentStats->extra2 = static_cast<void*>("explicit/workers/?!/");
+    static const char* bogusMemoryReporterPath = "explicit/workers/?!/";
+    aCompartmentStats->extra2 = const_cast<char*>(bogusMemoryReporterPath);
   }
 };
Comment 29 Justin Lebar (not reading bugmail) 2012-12-30 09:52:41 PST
Created attachment 696631 [details]
B2G about:memory with this patch applied

I took this memory dump right after rebooting the phone.  The first-run screen was showing.

We have 15% heap-unclassified in the main process, which is by far the lowest I've ever seen on B2G.
Comment 30 Justin Lebar (not reading bugmail) 2012-12-30 09:53:22 PST
Comment on attachment 696539 [details] [diff] [review]
Patch, v1

My only regret is that I have but one + to give to this patch.
Comment 31 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-12-30 10:54:29 PST
(In reply to Justin Lebar [:jlebar] (away 12/21-1/2) from comment #30)

Great! Thanks for checking :)
Comment 32 Nicholas Nethercote [:njn] 2012-12-30 12:26:57 PST
> My only regret is that I have but one + to give to this patch.

You'll get to aurora+ and b2g18+ it soon enough :)
Comment 33 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2012-12-31 18:47:27 PST
Comment on attachment 696539 [details] [diff] [review]
Patch, v1

Review of attachment 696539 [details] [diff] [review]:
-----------------------------------------------------------------

It looks pretty good, but I still want to talk about a few things.

::: dom/workers/WorkerPrivate.cpp
@@ +2850,5 @@
> +      // Wait until the worker actually blocks.
> +      while (!mBlockedForMemoryReporter) {
> +        mMemoryReportCondVar.Wait();
> +      }
> +    }

I'm not thrilled about blocking the main thread waiting for the operation callback to run here.  We should talk through the implications of this.

@@ +2868,1 @@
>    }

Are the JS people committed to allowing us to do this from another thread?

@@ +2886,5 @@
> +{
> +  // This may happen on the worker thread or the main thread.
> +  MutexAutoLock lock(mMutex);
> +
> +  NS_ASSERTION(mMemoryReporterAlive, "Must be alive!");

This assertion will fire if NS_RegisterMemoryMultiReporter fails.

@@ +2912,5 @@
> +    NS_WARNING("Failed to register memory reporter!");
> +    // No need to lock here since a failed registration means our memory
> +    // reporter can't start running. Just clean up.
> +    mMemoryReporterAlive = false;
> +    mMemoryReporter = nullptr;

So you should just not set mMemoryReporterAlive here.

@@ +2948,5 @@
> +  if (NS_FAILED(NS_UnregisterMemoryMultiReporter(memoryReporter))) {
> +    // If this fails then the memory reporter will probably never die and we'll
> +    // hang below waiting for it. If this ever happens we need to fix
> +    // NS_UnregisterMemoryMultiReporter.
> +    NS_ERROR("Failed to unregister memory reporter! Worker is going to hang!");

I think this should be a hard abort.  If this ever starts happening we want to see it in crashstats.
Comment 34 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-12-31 22:44:50 PST
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #33)
> I'm not thrilled about blocking the main thread waiting for the operation
> callback to run here.  We should talk through the implications of this.

It's what we do today already, though (just not for workers that use ctypes). The memory report runnable that we dispatch currently blocks the main thread until the operation callback processes the control runnable.

> Are the JS people committed to allowing us to do this from another thread?

I sure hope so! It only makes sense to call it from a different thread.

> This assertion will fire if NS_RegisterMemoryMultiReporter fails.

Yeah, can fix.

> I think this should be a hard abort.  If this ever starts happening we want
> to see it in crashstats.

Ok.

Anything else?
Comment 35 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2013-01-02 09:35:41 PST
(In reply to ben turner [:bent] from comment #34)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #33)
> > I'm not thrilled about blocking the main thread waiting for the operation
> > callback to run here.  We should talk through the implications of this.
> 
> It's what we do today already, though (just not for workers that use
> ctypes). The memory report runnable that we dispatch currently blocks the
> main thread until the operation callback processes the control runnable.

Bleh, ok.

> > Are the JS people committed to allowing us to do this from another thread?
> 
> I sure hope so! It only makes sense to call it from a different thread.

You should verify that with them :-P
Comment 36 Jason Orendorff [:jorendorff] 2013-01-03 08:29:05 PST
Comment on attachment 696539 [details] [diff] [review]
Patch, v1

Review of attachment 696539 [details] [diff] [review]:
-----------------------------------------------------------------

Oops, I marked this r+ but suddenly I have a few questions about GC worker threads.
Comment 37 Jason Orendorff [:jorendorff] 2013-01-03 08:52:03 PST
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #35)
> > > Are the JS people committed to allowing us to do this from another thread?
> > 
> > I sure hope so! It only makes sense to call it from a different thread.
> 
> You should verify that with them :-P

This is the point I'm not sure about.

This code treats the worker's whole JSRuntime as a monolithic non-thread-safe data structure and adds locking around it. That's a fine plan, and totally plausible... except that the JSRuntime does have some multithreading going on internally: a GC helper thread. I'm worried that causing the worker (i.e. the JSRuntime's home thread) to block and stop paying attention to its GC helper thread could lead to deadlock. But maybe only because I don't know much about that code.

I don't yet see what mechanism or API guarantee makes it safe. Two locking schemes (one in your new code, one inside the JS engine) that have no detailed knowledge of each other are both protecting the same data.
Comment 38 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2013-01-03 08:57:07 PST
Worker threads don't use the separate gc thread.
Comment 39 Jason Orendorff [:jorendorff] 2013-01-03 09:03:50 PST
Comment on attachment 696539 [details] [diff] [review]
Patch, v1

Transferring review to jonco. The GC helper thread code isn't terribly well-owned.
Comment 40 Jason Orendorff [:jorendorff] 2013-01-03 09:05:41 PST
Comment on attachment 696539 [details] [diff] [review]
Patch, v1

Switching back to r+ given comment 38. This is fine, go for it.
Comment 41 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2013-01-03 09:06:20 PST
Perhaps we should add some assertions that all the ancillary threads are turned off if we call this on a different thread though?
Comment 42 Andrew McCreight [:mccr8] 2013-01-03 09:47:47 PST
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #41)
> Perhaps we should add some assertions that all the ancillary threads are
> turned off if we call this on a different thread though?

That's probably a good idea. I could imagine that in the future we might want to turn on various JS helper threads for workers to get additional parallelism.
Comment 43 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-03 10:17:21 PST
(In reply to Andrew McCreight [:mccr8] from comment #42)
> That's probably a good idea. I could imagine that in the future we might
> want to turn on various JS helper threads for workers to get additional
> parallelism.

Anyone disagree that this can be done in a followup? I'd like to land this asap.
Comment 44 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-04 12:26:47 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b2611eed98a
Comment 45 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-04 12:30:31 PST
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #41)
> Perhaps we should add some assertions that all the ancillary threads are
> turned off if we call this on a different thread though?

Bug 826818.
Comment 46 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-04 12:35:25 PST
Comment on attachment 696539 [details] [diff] [review]
Patch, v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 679551 (kinda... We basically have never been able to gather memory reports from workers that use ctypes reliably)
User impact if declined: All workers that use ctypes will report 0 memory used.
Testing completed: tryserver was totally green
Risk to taking this patch (and alternatives if risky): It's possible that this could cause deadlocks if I goofed somewhere. No testing thus far has revealed anything so I guess we can't know for sure until we have some clean hang data.
String or UUID changes made by this patch: None
Comment 47 Justin Lebar (not reading bugmail) 2013-01-04 12:37:00 PST
Comment on attachment 696539 [details] [diff] [review]
Patch, v1

This is pretty critical stuff, IMO.
Comment 48 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-04 14:31:24 PST
https://hg.mozilla.org/releases/mozilla-aurora/rev/a1d1e72afe82
https://hg.mozilla.org/releases/mozilla-aurora/rev/d14b0128b5ae

(Needed a little merge fix)
Comment 49 Phil Ringnalda (:philor) 2013-01-05 16:30:09 PST
https://hg.mozilla.org/mozilla-central/rev/4b2611eed98a
Comment 50 Ryan VanderMeulen [:RyanVM] 2013-01-08 13:35:35 PST
Ben, is this (along with bug 827274 presumably) intended to land on b2g18 as well?
Comment 51 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-08 13:58:23 PST
(In reply to Ryan VanderMeulen from comment #50)
> Ben, is this (along with bug 827274 presumably) intended to land on b2g18 as
> well?

They don't apply cleanly (may require some other patches to be merged first) so I'll handle the uplift if it all gets approved. Thanks.
Comment 52 Nicholas Nethercote [:njn] 2013-01-16 01:39:05 PST
*** Bug 820906 has been marked as a duplicate of this bug. ***
Comment 53 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-17 13:06:33 PST
Created attachment 703563 [details] [diff] [review]
Patch for b2g-18

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 679551 (kinda... We basically have never been able to gather memory reports from workers that use ctypes reliably)
User impact if declined: All workers that use ctypes will report 0 memory used. This looks to be about 25% of the main process under normal conditions. If we don't take this patch on b2g-18 then we will not be able to diagnose memory problems in the very critical system workers (phone, wifi, netd) on the devices that we actually ship.
Testing completed: m-c now for a while, also m-a & m-b.
Risk to taking this patch (and alternatives if risky): Now that this has baked on three other branches for a while we know it's pretty safe as long as we also land the one additional fix (bug 827274).
String or UUID changes made by this patch: None
Comment 54 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-17 13:13:20 PST
Based on bug 827274 comment 12 I'm going to back this out of mozilla-beta.
Comment 55 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-17 14:00:51 PST
https://hg.mozilla.org/releases/mozilla-beta/rev/5851eefea110
Comment 56 Lukas Blakk [:lsblakk] use ?needinfo 2013-01-22 12:21:58 PST
Given that this is non-blocking and makes a pretty significant amount of change, we'll hold off on approving for branch landing until after v1.0.0 -- see https://wiki.mozilla.org/Release_Management/B2G_Landing which explains further but the options are either to land to the date branch now so that it gets merged into 1.0.1 after 1/25 or to wait and get b2g18 approval after 1/25 for direct landing to that branch.
Comment 57 Justin Lebar (not reading bugmail) 2013-02-01 11:08:55 PST
See bug 837187 as an example of the sort of bug report that is made difficult without this patch.
Comment 58 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-02-06 02:49:30 PST
https://hg.mozilla.org/releases/mozilla-b2g18/rev/bdf4eeccfb41
Comment 59 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-02-07 04:09:03 PST
I backed this out of b2g18 because it caused a perma-orange on linux pgo (only)... https://hg.mozilla.org/releases/mozilla-b2g18/rev/2035414073b7
Comment 60 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-02-08 10:55:05 PST
https://hg.mozilla.org/releases/mozilla-b2g18/rev/b88210291874

Note You need to log in before you can comment on or make changes to this bug.