Closed Bug 1309929 Opened 3 years ago Closed 3 years ago

Worker memory reporting happens off the main thread

Categories

(Core :: DOM: Workers, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: jonco, Assigned: jonco)

Details

Attachments

(2 files, 2 obsolete files)

I noticed when working on bug 1297769 that the memory reporting for workers doesn't happen on the worker's main thread.  What seems to happen is that an interrupt is requested, the interrupt callback blocks the thread and the memory reporting happens on a different thread.

I'd like to add an assertion that we don't traverse the JS heap on anything other than the runtime's main thread but it's failing for this case.

njn, do you know what it works like this and if it's possible to change it?
Flags: needinfo?(n.nethercote)
I think we used to not support async memory reporting, so maybe that is why it is like this. We do support it now (for e10s), so maybe it could be reworked.
Yes we have async reporters now (e10s or not). You'll want to use one of aysnc registry functions [1] (note these are thread safe, but your memory reporter will still be called on the main thread). You can see an example in MediaStreamGraph [2].

[1] http://searchfox.org/mozilla-central/rev/2142de26c16c05f23e543be4fa1a651c4d29604e/xpcom/base/nsIMemoryReporter.idl#223,233
[2] http://searchfox.org/mozilla-central/rev/2142de26c16c05f23e543be4fa1a651c4d29604e/dom/media/MediaStreamGraph.cpp#3458-3504
Yes, what mccr8 and erahm said.

The worker reporter is truly horrid so it would be lovely if it could be rewritten to be nicer.
Flags: needinfo?(n.nethercote)
Priority: -- → P2
Here's a patch to make this use async memory reporting.  How does this look?

The memory reporter dispatches a worker control runnable to the worker's main thread to do the memory reporting, and this dispatches another runnable back to the main main thread to finish things off.

WorkerPrivate::MemoryReporter::mWorkerPrivate is still protected by the mutex but I removed all the other synchronisation code which is no longer needed now we're not blocking the worker thread.

I had to move WorkerJSContextStats outside of the anonymous namespace because I was getting compiler warnings (something like "blah has a field blah whose type uses the anonymous namespace").  I don't  understand why this is.
Assignee: nobody → jcoppeard
Attachment #8804353 - Flags: feedback?(erahm)
Comment on attachment 8804353 [details] [diff] [review]
bug1309929-worker-memory-reporter

Review of attachment 8804353 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, much cleaner. I added some notes about some potential thread issues.

::: dom/workers/WorkerPrivate.cpp
@@ +1944,5 @@
> +
> +  ~WorkerJSContextStats()
> +  {
> +    for (size_t i = 0; i != zoneStatsVector.length(); i++) {
> +      delete static_cast<xpc::ZoneStatsExtras*>(zoneStatsVector[i].extra);

Not your fault, but wow this is awful.

@@ +2033,5 @@
>  private:
> +  class CollectReportsRunnable final : public MainThreadWorkerControlRunnable
> +  {
> +    RefPtr<nsIHandleReportCallback> mHandleReport;
> +    RefPtr<nsISupports> mHandlerData;

These should probably be nsCOMPtr.

It's not clear to me if this runnable is destructed on the main thread or not, but I'm reasonably sure handleReport and handleData do not have thread-safe ref counting and need to be released on the main thread.

I think the only way that would be an issue is if this request were queued up and the worker was shutdown (without processing the request).

@@ +2057,5 @@
> +
> +  class FinishCollectRunnable final : public Runnable
> +  {
> +    RefPtr<nsIHandleReportCallback> mHandleReport;
> +    RefPtr<nsISupports> mHandlerData;

These should probably be nsCOMPtr.

@@ +2209,5 @@
> +
> +  runnable->SetSuccess(
> +    aWorkerPrivate->CollectRuntimeStats(&runnable->mCxStats, mAnonymize));
> +
> +  NS_DispatchToMainThread(runnable.forget());

I think it might be safer to dispatch this in the dtor which would make sure EndReport is called even if |WorkerRun| is not -- it's possible this is a non-issue if |WorkerRun| is guaranteed to be called.
Attachment #8804353 - Flags: feedback?(erahm) → feedback+
(In reply to Eric Rahm [:erahm] from comment #5)
> These should probably be nsCOMPtr.

On second thought RefPtr is probably fine as you're not going to QI them.
> On second thought RefPtr is probably fine as you're not going to QI them.

Isn't the general rule to use RefPtr for nsFoo and nsCOMPtr for nsIFoo? In which case nsCOMPtr is better...
Here's an updated patch with comments addressed.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fb9dcddb598e1e7ce0dcc8dc77b4c87795f944df&selectedJob=29931696
Attachment #8804353 - Attachment is obsolete: true
Attachment #8805041 - Flags: review?(erahm)
Comment on attachment 8805041 [details] [diff] [review]
bug1309929-worker-memory-reporter v2

Review of attachment 8805041 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, just a few stylistic comments and maybe some further tweaking if you think it's worth it.

::: dom/workers/WorkerPrivate.cpp
@@ +2041,5 @@
> +      WorkerPrivate* aWorkerPrivate,
> +      nsIHandleReportCallback* aHandleReport,
> +      nsISupports* aHandlerData,
> +      bool aAnonymize,
> +      nsCString aPath);

nit: This could be const&

@@ +2044,5 @@
> +      bool aAnonymize,
> +      nsCString aPath);
> +
> +  private:
> +    bool WorkerRun(JSContext* aCx, WorkerPrivate* aWorkerPrivate);

nit: This should probably be marked override or final

@@ +2066,5 @@
> +    explicit FinishCollectRunnable(
> +      nsIHandleReportCallback* aHandleReport,
> +      nsISupports* aHandlerData,
> +      bool aAnonymize,
> +      nsCString aPath);

nit: This could be const&

@@ +2114,5 @@
> +
> +  RefPtr<CollectReportsRunnable> runnable;
> +
> +  {
> +    MutexAutoLock lock(mMutex);

I'm not sure if it's worth the effort but I think we could reduce the scope of this lock a bit. Basically:
 - Grab the lock
    - Check mWorkerPrivate
    - Copy the things we need (ScriptURL, Domain)
 - Do string building, update TryToMapAddon to take the URL and call it
 - Grab the lock?
    - Create the runnable

I'm fine either way here, we've already improved things a fair amount.

@@ +2159,5 @@
> +{
> +  AssertIsOnMainThread();
> +  mMutex.AssertCurrentThreadOwns();
> +
> +  if (mAlreadyMappedToAddon || !mWorkerPrivate) {

I don't really follow what's going on with |mAlreadyMappedToAddon|. It seems like that would prevent subsequent memory reports from mapping the add-on path.

Regardless, this is preexisting so maybe just worth a followup.
Attachment #8805041 - Flags: review?(erahm) → review+
(In reply to Eric Rahm [:erahm] from comment #9)
Cheers for the quick review.

I'm going to leave the mutex handling as is because to me it's more obviously correct the way it is.

I don't know what's going on with mAlreadyMappedToAddon either.  nmaier, do you know what this is doing?
Flags: needinfo?(maierman)
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b010bc739093
Use an async memory report for workers r=erahm
backed out for causing frequent failures like https://treeherder.mozilla.org/logviewer.html#?job_id=38373158&repo=mozilla-inbound
Flags: needinfo?(jcoppeard)
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fdf88420e469
Backed out changeset b010bc739093 for frequent failures in test_WebAudioMemoryReporting.html
Hmm, so this doesn't fail locally...  Eric, and idea how I might go about debugging this?
Flags: needinfo?(jcoppeard) → needinfo?(erahm)
Comment on attachment 8805041 [details] [diff] [review]
bug1309929-worker-memory-reporter v2

Review of attachment 8805041 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerPrivate.cpp
@@ +2032,5 @@
> +  class FinishCollectRunnable;
> +
> +  class CollectReportsRunnable final : public MainThreadWorkerControlRunnable
> +  {
> +    RefPtr<FinishCollectRunnable> mFinshCollectRunnable;

mFinishCollectRunnable

@@ +2121,5 @@
> +      // Returning NS_OK here will effectively report 0 memory.
> +      return NS_OK;
> +    }
> +
> +    nsCString path;

nsAutoCString

@@ +2126,5 @@
> +    path.AppendLiteral("explicit/workers/workers(");
> +    if (aAnonymize && !mWorkerPrivate->Domain().IsEmpty()) {
> +      path.AppendLiteral("<anonymized-domain>)/worker(<anonymized-url>");
> +    } else {
> +      nsCString escapedDomain(mWorkerPrivate->Domain());

nsAutoCString

@@ +2198,5 @@
> +  WorkerPrivate* aWorkerPrivate,
> +  nsIHandleReportCallback* aHandleReport,
> +  nsISupports* aHandlerData,
> +  bool aAnonymize,
> +  nsCString aPath)

const nsACString& aPath

@@ +2221,5 @@
> +WorkerPrivate::MemoryReporter::FinishCollectRunnable::FinishCollectRunnable(
> +  nsIHandleReportCallback* aHandleReport,
> +  nsISupports* aHandlerData,
> +  bool aAnonymize,
> +  nsCString aPath)

const nsACString& aPath
(In reply to Jon Coppeard (:jonco) from comment #14)
> Hmm, so this doesn't fail locally...  Eric, and idea how I might go about
> debugging this?

A timeout like that really shouldn't happen, we have a watchdog timer set to 50s so the finish callback should still be hit (about 4:10 before the timeout). There are other exceptions that could happen in the memory reporting code but I'd expect to see warnings (and this is a debug build). Also considering it's non-e10s there's much less that could go wrong.

I'll see if I can repro locally though.
Flags: needinfo?(erahm)
(In reply to Eric Rahm [:erahm] from comment #16)
Thanks!  I added some tracing and put it through try and it seems like nsMemoryReporterManager::EndReport isn't getting called enough times.
(In reply to Jon Coppeard (:jonco) from comment #17)
> (In reply to Eric Rahm [:erahm] from comment #16)
> Thanks!  I added some tracing and put it through try and it seems like
> nsMemoryReporterManager::EndReport isn't getting called enough times.

Was it for the worker reporter or the audio reporter? Either way we should get a bug on file for the timeout not working.
Attached patch bug1309929-fixSplinter Review
It turns out I wasn't calling EndReport when mWorkerPrivate was null in CollectReports.  The tests pass with this change.

(I looked at creating the FinishCollectRunnable first so that it would take care of this when it ran, but there was a dependency all the way back to WorkerPrivate that made this problematic).
Attachment #8806363 - Flags: review?(erahm)
Comment on attachment 8805041 [details] [diff] [review]
bug1309929-worker-memory-reporter v2

Review of attachment 8805041 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerPrivate.cpp
@@ +2048,5 @@
> +    bool WorkerRun(JSContext* aCx, WorkerPrivate* aWorkerPrivate);
> +
> +    ~CollectReportsRunnable()
> +    {
> +      NS_DispatchToMainThread(mFinshCollectRunnable.forget());

If you are being destructed on the main thread, do you really need another dispatch here?  Can't you just run the mFinishCollectRunnable->Run().

If its not on the main thread, then please use WorkerPrivate's DispatchToMainThread().

So something like:

  if (NS_IsMainThread()) {
    mFinishCollectRunnable->Run();
    return;
  }

  WorkerPrivate* workerPrivate = GetCurrentThreadWorkerPrivate();
  MOZ_ALWAYS_SUCCEEDS(
    workerPrivate->DispatchToMainThread(mFinishCollectRunnable.forget()));

@@ +2079,5 @@
> +  private:
> +    ~FinishCollectRunnable()
> +    {
> +      // mHandleReport and mHandlerData are released on the main thread.
> +      AssertIsOnMainThread();

If you require this then it would be safer to use non-threadsafe ref-counting.
(In reply to Ben Kelly [:bkelly] from comment #20)

> If you are being destructed on the main thread, do you really need another
> dispatch here?

It's not on the main thread.  I'll use WorkerPrivate::DispatchToMainThread instead.

> If you require this then it would be safer to use non-threadsafe
> ref-counting.

Which ref-counting types are non-threadsafe?  I couldn't see anything in the documentation.  I guess some of them assert this already?
Flags: needinfo?(bkelly)
> > If you are being destructed on the main thread, do you really need another
> > dispatch here?
> 
> It's not on the main thread.  I'll use WorkerPrivate::DispatchToMainThread
> instead.

It can be main thread.  You have no guarantee the initial RefPtr<> is dropped before the worker executes the code.  You really need to check for both thread possibilities.

Its an annoying side effect of the WorkerRunnable API that basically there is no way to avoid this race over which thread gets to destruct the runnable.

> > If you require this then it would be safer to use non-threadsafe
> > ref-counting.
> 
> Which ref-counting types are non-threadsafe?  I couldn't see anything in the
> documentation.  I guess some of them assert this already?

You are inheriting from Runnable which uses NS_DECL_THREADSAFE_ISUPPORTS:

  https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsThreadUtils.h#256

Normally this is what you want.  In this case, though, you specifically want to ensure this object is released on the main thread.

You can get AddRef() and Release() to assert on any other thread if you extend nsIRunnable yourself and use NS_DECL_ISUPPORTS instead.  This will catch the possibility of destruction on other threads even if Release() doesn't trigger the destructor.
Flags: needinfo?(bkelly)
As you can see now, this should get review from both somebody familiar with our memory reporters (like erahm) and a DOM: workers peer (like bkelly). ;)
(In reply to Ben Kelly [:bkelly] from comment #22)
> It can be main thread.  You have no guarantee the initial RefPtr<> is
> dropped before the worker executes the code.  You really need to check for
> both thread possibilities.

Ah right, I didn't realise that.  I'll update the patch and flag you for review.
I fixed the issue with CollectReportsRunnable's destructor being possibly called on either thread, as suggested.

I tried making FinishCollectRunnable inherit nsIRunable and using NS_DECL_ISUPPORTS, but this caused assertion failures in TaskQueue::EventTargetWrapper::Dispatch because is creates an nsCOMPtr for the runnable here:

http://searchfox.org/mozilla-central/source/xpcom/threads/TaskQueue.cpp#39
Attachment #8805041 - Attachment is obsolete: true
Attachment #8806770 - Flags: review?(bkelly)
Comment on attachment 8806770 [details] [diff] [review]
bug1309929-worker-memory-reporter v3

Review of attachment 8806770 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  Just a few nits.  I didn't review the parts specific to memory reporting.  I only looked at the runnable usage, etc.

Btw, I would love if all this stuff was in separate dom/workers/MemoryReporter.(h|cpp) files, but I won't make you do that here.

Thanks!

::: dom/workers/WorkerPrivate.cpp
@@ +2035,5 @@
> +
> +  class CollectReportsRunnable final : public MainThreadWorkerControlRunnable
> +  {
> +    RefPtr<FinishCollectRunnable> mFinishCollectRunnable;
> +    bool mAnonymize;

const

@@ +2042,5 @@
> +    CollectReportsRunnable(
> +      WorkerPrivate* aWorkerPrivate,
> +      nsIHandleReportCallback* aHandleReport,
> +      nsISupports* aHandlerData,
> +      bool aAnonymize,

const

@@ +2055,5 @@
> +        mFinishCollectRunnable->Run();
> +        return;
> +      }
> +
> +      WorkerPrivate* workerPrivate = GetCurrentThreadWorkerPrivate();

MOZ_ASSERT(workerPrivate)

@@ +2065,5 @@
> +  class FinishCollectRunnable final : public Runnable
> +  {
> +    nsCOMPtr<nsIHandleReportCallback> mHandleReport;
> +    nsCOMPtr<nsISupports> mHandlerData;
> +    bool mAnonymize;

const

@@ +2085,5 @@
> +      mSuccess = success;
> +    }
> +
> +  private:
> +    virtual ~FinishCollectRunnable()

This should not be virtual.  Ref counting is implemented in the base class and this class is final.
Attachment #8806770 - Flags: review?(bkelly) → review+
Comment on attachment 8806363 [details] [diff] [review]
bug1309929-fix

Review of attachment 8806363 [details] [diff] [review]:
-----------------------------------------------------------------

This change looks good. I'll leave the rest to bkelly.
Attachment #8806363 - Flags: review?(erahm) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5f1c931cdbc7
Use an async memory report for workers r=erahm r=bkelly
https://hg.mozilla.org/mozilla-central/rev/5f1c931cdbc7
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Flags: needinfo?(maierman)
You need to log in before you can comment on or make changes to this bug.