Label users of pipes (nsPipeInputStream)

RESOLVED WONTFIX

Status

()

enhancement
P3
normal
RESOLVED WONTFIX
2 years ago
2 years ago

People

(Reporter: billm, Assigned: bevis)

Tracking

(Blocks 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 15 obsolete attachments)

34.59 KB, application/zip
Details
nsPipeInputStream::AsyncWait is in the top 5 unlabeled runnables. We need to figure out where it's mainly used from an label those places.
Are you suggesting changing nsIAsyncInputStream API to support labeling?  I'm concerned with getting this to scale since we do lots of streams stuff all over, including using helpers like NS_AsyncCopy(), etc.  I guess I'd like to see the patch.
First I'd like to understand what the most common uses are (by frequency with which they occur at runtime).
Our IPCStream stuff probably uses it a lot now that we have it integrated in necko:

http://searchfox.org/mozilla-central/source/ipc/glue/IPCStreamSource.cpp#279
We're also about to start using it with ReadableStream.
See Also: → 1359706
API change makes not much sense just for the naming things.
I'll print some logs on all the callsite of nsIAsyncInputStream::Wait() locally first instead to figure out who is the high frequent one on the main thread of the content process.
Assignee: nobody → btseng
Posted patch Patch for Logging (obsolete) — Splinter Review
Posted file Logs for analysis (obsolete) —
After further breaking down to the log in comment 7(FB, GDoc, YouTube, NYTimes, Wiki, Yahoo are visited), 
1. All the unlabeled nsIAsyncInputStream::Wait() requests are found in nsPipeInputStream. 
   - Search the keywords of "could be unlabeled", you'll see that nsInputStreamPump::EnsureWaiting() is the entry point to call
     nsIAsyncInputStream::Wait() without nsIEventTarget or with nsIEventTarget which is not a SchedulerEventTarget instance. 
     This is fine if the eventTarget is actually aother background thread.
2. nsBaseChannel::BeginPumpingData() and mozilla::dom::workers::serviceWorkerScriptCache::(anonymous namespace)::CompareCache::ManageValueResult() are the 2 entry points to call nsInputStreamPump::AsyncRead without valid SchedulerEventTarget which could be when calling NS_NewInputStreamPump:
   - Search the keywords of "has no mLabeledMainThreadTarget" you can see only these 2 entry points printed in the attached log.
   - The frequency is pretty high in nsBaseChannel::BeginPumpingData() when loading web page in a new tab even though the 
     labeling was done in nsContentUtils::GetEventTargetByLoadInfo() which tries to return a SchedulerEventTarget when possible.
     However, we failed to get the nsGlobalWindow with the outerWindowId retrieved from nsILoadInfo.
   - The serviceWorkerScriptCache is obvious not labeled because the nsIEventTarget is not specified when calling NS_NewInputStreamPump().

:billm, is it reasonable that we failed to get the nsGlobalWindow from nsContentUtils::GetEventTargetByLoadInfo() in this frequency especially when loading from a new tab?

:bkelly, nsInputStreamPump::AsyncRead() in CompareCache::ManageValueResult() is called on the main thread. Do you have any suggestion to associate this nsInputStreamPump to a SchedulerEventTarget of the window?

BTW, you could check the change in nsDebugImpl.cpp and nsInputStreamPump::AsyncRead() and the log to know how I narrow down the unlabeled caller to nsBaseChannel::BeginPumpingData() and CompareCache::ManageValueResult().
Flags: needinfo?(wmccloskey)
Flags: needinfo?(bkelly)
(In reply to Bevis Tseng [:bevis][:btseng] from comment #8)
> BTW, you could check the change in nsDebugImpl.cpp and
> nsInputStreamPump::AsyncRead() and the log to know how I narrow down the
> unlabeled caller to nsBaseChannel::BeginPumpingData() and
> CompareCache::ManageValueResult().
1. Add logs to all callers and callees of nsIAsyncInputStream::Wait() and find out that nsInputStreamPump::EnsureWaiting() is the only one that could be unlabeled.
2. Add logs to all the callers and calles of nsInputStreamPump::AsyncRead() to figure out which caller is not labeled.
(In reply to Bevis Tseng [:bevis][:btseng] from comment #8)
> :bkelly, nsInputStreamPump::AsyncRead() in CompareCache::ManageValueResult()
> is called on the main thread. Do you have any suggestion to associate this
> nsInputStreamPump to a SchedulerEventTarget of the window?

This code is explicitly not associated with a window.  It should be in system group.
Flags: needinfo?(bkelly)
> :billm, is it reasonable that we failed to get the nsGlobalWindow from nsContentUtils::GetEventTargetByLoadInfo() in this
> frequency especially when loading from a new tab?

I don't know. It looks like the printing you added for GetEventTargetByLoadInfo doesn't check whether we're in the content process, so some of the logging is probably coming from chrome stuff that we don't care about.

Anyway, it seems like passing SystemGroup to NS_NewInputStreamPump in CompareCache::ManageValueResult would be a good start here. Once that's done, we can see if telemetry shows any improvement.
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #11)
> > :billm, is it reasonable that we failed to get the nsGlobalWindow from nsContentUtils::GetEventTargetByLoadInfo() in this
> > frequency especially when loading from a new tab?
> 
> I don't know. It looks like the printing you added for
> GetEventTargetByLoadInfo doesn't check whether we're in the content process,
> so some of the logging is probably coming from chrome stuff that we don't
> care about.
Oh, but the unlabeled used of nsBaseChannel::BeginPumpingData() accompanied with invalid eventTarget from nsContentUtils::GetEventTargetByLoadInfo() is relatively higher than the unlabeled NS_NewInputStreamPump() in CompareCache::ManageValueResult()
I wasn't able to reproduce the calls to BeginPumpingData. Are there particular pages that trigger it? I also wonder if it has to do with extensions you have installed. It's possible that content scripts could be triggering that code. Did you have any extensions installed when you tested?
(In reply to Bill McCloskey (:billm) from comment #13)
> I wasn't able to reproduce the calls to BeginPumpingData. Are there
> particular pages that trigger it? I also wonder if it has to do with
> extensions you have installed. It's possible that content scripts could be
> triggering that code. Did you have any extensions installed when you tested?

No, there is no extension installed.
Besides, I saw nsBaseChannel::BeginPumpingData() unlabeled in 2 scencarios:
1. Type any url and load manually in a new tab. (I've tried FB, GDoc, YouTube, NYTimes, Wiki and Yahoo.)
2. Right click a link -> Open in new Tab.

The unlabeled CompareCache::ManageValueResult() can be found when watching video in Youtube.

1. Load Youtube in 1st Tab:
AsyncRead from nsresult nsBaseChannel::BeginPumpingData(): 283 has no mLabeledMainThreadTarget for AsyncRead.
AsyncRead from nsresult nsBaseChannel::BeginPumpingData(): 283 has no mLabeledMainThreadTarget for AsyncRead.
AsyncRead from void mozilla::dom::workers::serviceWorkerScriptCache::(anonymous namespace)::CompareCache::ManageValueResult(JSContext *, JS::Handle<JS::Value>): 1152 has no mLabeledMainThreadTarget for AsyncRead.
AsyncRead from void mozilla::dom::workers::serviceWorkerScriptCache::(anonymous namespace)::CompareCache::ManageValueResult(JSContext *, JS::Handle<JS::Value>): 1152 has no mLabeledMainThreadTarget for AsyncRead.

2. Open a new Tab to load Wikipedia:
AsyncRead from nsresult nsBaseChannel::BeginPumpingData(): 283 has no mLabeledMainThreadTarget for AsyncRead.
AsyncRead from nsresult nsBaseChannel::BeginPumpingData(): 283 has no mLabeledMainThreadTarget for AsyncRead.
AsyncRead from nsresult nsBaseChannel::BeginPumpingData(): 283 has no mLabeledMainThreadTarget for AsyncRead.

3. Right click a video in YouTube and select Open in a new Tab:
AsyncRead from nsresult nsBaseChannel::BeginPumpingData(): 283 has no mLabeledMainThreadTarget for AsyncRead.
AsyncRead from nsresult nsBaseChannel::BeginPumpingData(): 283 has no mLabeledMainThreadTarget for AsyncRead.
AsyncRead from nsresult nsBaseChannel::BeginPumpingData(): 283 has no mLabeledMainThreadTarget for AsyncRead.
AsyncRead from void mozilla::dom::workers::serviceWorkerScriptCache::(anonymous namespace)::CompareCache::ManageValueResult(JSContext *, JS::Handle<JS::Value>): 1152 has no mLabeledMainThreadTarget for AsyncRead.
AsyncRead from void mozilla::dom::workers::serviceWorkerScriptCache::(anonymous namespace)::CompareCache::ManageValueResult(JSContext *, JS::Handle<JS::Value>): 1152 has no mLabeledMainThreadTarget for AsyncRead.
Posted patch Patch for Logging (obsolete) — Splinter Review
Print error in nsContentUtils::GetEventTargetByLoadInfo only if we are on the main thread of a content process.
BTW, my patch is based on the revision of https://hg.mozilla.org/mozilla-central/rev/9af23c413a1f
(In reply to Bevis Tseng [:bevis][:btseng] from comment #14)
> Besides, I saw nsBaseChannel::BeginPumpingData() unlabeled in 2 scencarios:
> 1. Type any url and load manually in a new tab. (I've tried FB, GDoc,
> YouTube, NYTimes, Wiki and Yahoo.)
> 2. Right click a link -> Open in new Tab.

I added a printf here:
http://searchfox.org/mozilla-central/rev/17ebac68112bd635b458e831670c1e506ebc17ad/netwerk/base/nsInputStreamPump.cpp#386
like you have in your patch. It fired twice on content process startup, when it loaded these two images:
http://searchfox.org/mozilla-central/rev/17ebac68112bd635b458e831670c1e506ebc17ad/layout/generic/nsImageFrame.cpp#2295-2296

I couldn't get it to fire anymore after that. Can you try setting your content process count to 1 and then see if you still see this a lot? It's possible you're seeing a lot of content processes starting, which hopefully is not very common in the real world.


> The unlabeled CompareCache::ManageValueResult() can be found when watching
> video in Youtube.

Yeah, I can reproduce that one. We should label fix it.
(In reply to Bill McCloskey (:billm) from comment #17)
> I added a printf here:
> http://searchfox.org/mozilla-central/rev/
> 17ebac68112bd635b458e831670c1e506ebc17ad/netwerk/base/nsInputStreamPump.
> cpp#386
> like you have in your patch. It fired twice on content process startup, when
> it loaded these two images:
> http://searchfox.org/mozilla-central/rev/
> 17ebac68112bd635b458e831670c1e506ebc17ad/layout/generic/nsImageFrame.
> cpp#2295-2296
> 
> I couldn't get it to fire anymore after that. Can you try setting your
> content process count to 1 and then see if you still see this a lot? It's
> possible you're seeing a lot of content processes starting, which hopefully
> is not very common in the real world.
Thanks for pointing out this. The unlabeled BeginPumpingData() is related to the number of content processes and these 2 images to be loaded!
> 
> > The unlabeled CompareCache::ManageValueResult() can be found when watching
> > video in Youtube.
> 
> Yeah, I can reproduce that one. We should label fix it.

Sure, bug 1377275 has been filed to address this.

I'll keep using this build in these days to see if there is any other unlabeled nsPipeInputStream::AsyncWait. :-)
Attachment #8882324 - Attachment is obsolete: true
There are several call sites of NS_NewInputStreamPump():
http://searchfox.org/mozilla-central/search?q=NS_NewInputStreamPump&case=false&regexp=false&path=
1. Unlabeled:
   - (anonymous namespace)::CacheScriptLoader::ResolvedCallback in dom/workers/ScriptLoader.cpp
   - nsJARChannel::OpenLocalFile(), nsJARChannel::OnDownloadComplete
   - nsZipWriter::BeginProcessingAddition()
   - nsZipWriter::BeginProcessingRemoval()
   - mozilla::net::ExtensionStreamGetter::OnStream() in netwerk/protocol/res/ExtensionProtocolHandler.cpp
   - nsWyciwygChannel::ReadFromCache()
2. Labeled:
   - serviceWorkerScriptCache::(anonymous namespace)::CompareCache::ManageValueResult() in bug 1377275.
   - FetchBodyConsumer<Derived>::BeginConsumeBodyMainThread() in bug 1363318.

I haven't hit the remained unlabeled ones so far.
Not sure in which scenario will these call sites be triggered.
Depends on: 1377275
Depends on: 1377484
I've seen 
"AsyncWait called from nsresult mozilla::dom::FileReader::DoAsyncWait(): 272 could be unlabeled"
several times in today's trial.

It's a little bit surprising me because the FileReader::mTarget for the ::AsyncWait() at[1] is supposed to be a SchedulerEventTarget retrieved at [2] when FileReader is created on the main thread.

I'll do more investigation to see if this is a false alarm.

[1] http://searchfox.org/mozilla-central/rev/dde5c480358718607cc40d752656c968a0e6eabd/dom/file/FileReader.cpp#274
[2] http://searchfox.org/mozilla-central/rev/dde5c480358718607cc40d752656c968a0e6eabd/dom/file/FileReader.cpp#120
(In reply to Bevis Tseng [:bevis][:btseng] from comment #19)
>    - mozilla::net::ExtensionStreamGetter::OnStream() in netwerk/protocol/res/ExtensionProtocolHandler.cpp
This has been addressed in bug 1377544.
>    - nsWyciwygChannel::ReadFromCache()
WyciwygChannel is run in parent process. So we can skip this.
Depends on: 1377544
(In reply to Bevis Tseng [:bevis][:btseng] from comment #19)
> There are several call sites of NS_NewInputStreamPump():
> http://searchfox.org/mozilla-central/
> search?q=NS_NewInputStreamPump&case=false&regexp=false&path=
> 1. Unlabeled:
>    - nsJARChannel::OpenLocalFile(), nsJARChannel::OnDownloadComplete
>    - nsZipWriter::BeginProcessingAddition()
>    - nsZipWriter::BeginProcessingRemoval()

Hi, :aklotz

We are trying to specify proper nsIEventTarget for the use of NS_NewInputStreamPump() on the main thread of content processes to get more benefit from Quantum-DOM scheduling. [1]
There are several unlabeled use cases in modules/libjar as mentioned above.

Can you help to check if this are the candidates that need to be labeled? [2]
(If yes, it would be great if someone familiar with modules/libjar could help us or give us suggestion to label it.)

Thanks!

[1] https://hacks.mozilla.org/2017/06/an-inside-look-at-quantum-dom-scheduling/
[2] https://wiki.mozilla.org/Quantum/DOM#Q3:_What_runnables_shall_be_labeled.3F
Flags: needinfo?(aklotz)
(In reply to Bevis Tseng [:bevis][:btseng] from comment #20)
> I've seen 
> "AsyncWait called from nsresult mozilla::dom::FileReader::DoAsyncWait(): 272
> could be unlabeled"
> several times in today's trial.
> 
> It's a little bit surprising me because the FileReader::mTarget for the
> ::AsyncWait() at[1] is supposed to be a SchedulerEventTarget retrieved at
> [2] when FileReader is created on the main thread.
> 
> I'll do more investigation to see if this is a false alarm.
> 
> [1]
> http://searchfox.org/mozilla-central/rev/
> dde5c480358718607cc40d752656c968a0e6eabd/dom/file/FileReader.cpp#274
> [2]
> http://searchfox.org/mozilla-central/rev/
> dde5c480358718607cc40d752656c968a0e6eabd/dom/file/FileReader.cpp#120

After further breaking down, I found that the unlabeled use case of the ::AsyncWait() in FileReader::DoAsyncWait() was triggered from backgroundPageThumbsContent.js instead of a web content(The web content one is fine after testing, btw):
http://searchfox.org/mozilla-central/rev/a3a739de04ee6134c11546568a33dbb6a6a29907/toolkit/components/thumbnails/content/backgroundPageThumbsContent.js#165,177

As explained in comment 20, the FileReader::mTarget is supposed to be a SchedulerEventTarget.
However, when creating FileReader from backgroundPageThumbsContent.js, because the pass-in nsIGlobalObject is not a nsGlobalWindow, it falls back to the one in DispatcherTrait::EventTargetFor() when calling nsIGlobalObject::EventTargetFor().

I haven't figured out the STR of this use case yet but it's the only unlabeled ::AsyncWait() call I have seen so far after bug 1377275 and bug 1377484 are fixed and the nsBaseChannel::BeginPumpingData() is clarified.

:billm, do you have any suggestion to label this use case?
Flags: needinfo?(wmccloskey)
(In reply to Bevis Tseng [:bevis][:btseng] from comment #22)
> Can you help to check if this are the candidates that need to be labeled? [2]
> (If yes, it would be great if someone familiar with modules/libjar could
> help us or give us suggestion to label it.)

Unfortunately the original authors of libjar have left Mozilla, so we have lost a lot of expertise. As a peer I do libjar reviews but I do not admit to being very knowledgeable about all aspects of libjar's code.

Sorry but I don't think that I know enough about the code in question to have anything to add. It might be a good idea to check with the Necko people to see if any of them have any advice.

(I am happy to review any libjar patches, however!)
Flags: needinfo?(aklotz)
(In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #24)
> (In reply to Bevis Tseng [:bevis][:btseng] from comment #22)
> > Can you help to check if this are the candidates that need to be labeled? [2]
> > (If yes, it would be great if someone familiar with modules/libjar could
> > help us or give us suggestion to label it.)
> 
> Unfortunately the original authors of libjar have left Mozilla, so we have
> lost a lot of expertise. As a peer I do libjar reviews but I do not admit to
> being very knowledgeable about all aspects of libjar's code.
> 
> Sorry but I don't think that I know enough about the code in question to
> have anything to add. It might be a good idea to check with the Necko people
> to see if any of them have any advice.
> 
> (I am happy to review any libjar patches, however!)

No problem! Thanks for your feedback! :)

BTW, after further study, I found that we don't have to care about the use of nsJarChannel in content process because it has been disabled after FF55:
https://developer.mozilla.org/en-US/docs/Mozilla/Security/Security_and_the_jar_protocol

I'll keep reviewing the use of nsZipWriter to see if labeling on it is necessary.
(In reply to Bevis Tseng [:bevis][:btseng] from comment #19)
> 1. Unlabeled:
>    - nsZipWriter::BeginProcessingAddition()
>    - nsZipWriter::BeginProcessingRemoval()
The use case of ZipWriter can only be found in the WebIDE of devtools which is only used for the App developmement in Firefox OS. Hence, we could ingore this one for now. :)
http://searchfox.org/mozilla-central/source/devtools/shared/apps/app-actor-front.js#82-93
(In reply to Bevis Tseng [:bevis][:btseng] from comment #23)
> I haven't figured out the STR of this use case yet but it's the only
> unlabeled ::AsyncWait() call I have seen so far after bug 1377275 and bug
> 1377484 are fixed and the nsBaseChannel::BeginPumpingData() is clarified.
> 
> :billm, do you have any suggestion to label this use case?

After further investigation, I found that the nsIGlboalObject is a TabChildGlobal when creating FileReader from backgroundPageThumbsContent.js.
Hence, we could override the methods defined in DispatcherTrait class using TabChildGlobal::mTabChild::TabGroup().
(Maybe we could move the declaration of ::TabGroup() from TabChild to TabChildBase for the access from TabChildGlobal.)
Depends on: 1378298
The problem has been addressed in bug 1378298. Cancel the NI flag.
Flags: needinfo?(wmccloskey)
The labeling of nsPipeInputStream looks fine in today's trial after bug 1378298 is fixed.
There is only one false alarm found in nsInputStreamPump::EnsureWaiting() because the nsInputStreamPump::mTargetThread could be retargeted by nsInputStreamPump::RetargetDeliveryTo() with nsStreamTransportService specified which is fine.

Hope the labeling of this runnable could be improved significantly in next telemetry result after bug 1378298 is landed.
Unfortunately, we didn't label too much use case of nsPipeInputStream::AsyncWait according to the telemetry analysis where(appBuildId=lambda x: x.startswith('20170709') or x.startswith('20170710').where(appUpdateChannel="nightly"):
1. (u'nsPipeInputStream::AsyncWait', 1486),
2. (u'nsPipeInputStream::AsyncWait(labeled)', 19),

This indicates that my daily trial didn't cover most use cases of nsPipeInputStream::AsyncWait in nightly build. :\

BTW, the unlabeled runnables higher than nsPipeInputStream::Async are:
 (u'PVsync::Msg_Notify', 7226),
 (u'IdleRunnable', 5457),
 (u'CollectorRunner', 5060),
 (u'PCompositorBridge::Msg_DidComposite', 4184)
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #1)
> Are you suggesting changing nsIAsyncInputStream API to support labeling? 
> I'm concerned with getting this to scale since we do lots of streams stuff
> all over, including using helpers like NS_AsyncCopy(), etc.  I guess I'd
> like to see the patch.

It seems unavoidable to have some API change to identify the users of nsPipeInputStream after several trials to identify the callers from local build.

However, instead of changing nsIAsyncInputStream API, I think we could at least name the caller of NS_NewPipe2() from native and nsPipeConstructor() from JS and append the caller's name at [1] to identify the callers from telemetry as what we have done to name the callers of "AbstractThead::Runner" and "ProxyReleaseEvent".

[1] http://searchfox.org/mozilla-central/rev/5dadcbe55b4ddd1e448c06c77390ff6483aa009b/xpcom/io/nsPipe3.cpp#1496
(In reply to Bevis Tseng [:bevis][:btseng] from comment #31)
> However, instead of changing nsIAsyncInputStream API, I think we could at
> least name the caller of NS_NewPipe2() from native and nsPipeConstructor()
> from JS and append the caller's name at [1] to identify the callers from
> telemetry as what we have done to name the callers of
> "AbstractThead::Runner" and "ProxyReleaseEvent".
> 
> [1]
> http://searchfox.org/mozilla-central/rev/
> 5dadcbe55b4ddd1e448c06c77390ff6483aa009b/xpcom/io/nsPipe3.cpp#1496

The only concern is that if the relationship between the caller of NS_NewPipe2 and the caller of nsIAsyncInputStream::Await is too weak, then we'll still have to do some API change on nsIAsyncInputStream eventually to identify the use of "nsPipeInputStream::AsyncWait".
So after looking at this stuff a bit more I have a new question.

AsyncWait() already supports passing an event target.  Is this bug just about making sure the right TabGroup event target is passed there instead of NS_GetCurrentThread()/NS_GetMainThread()?

And yea, I don't think you can fix this by changing the pipe constructor methods.  There can be any number of listeners attached to the same pipe and its really those call sites that determine how these runnables are created.
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #33)
> So after looking at this stuff a bit more I have a new question.
> 
> AsyncWait() already supports passing an event target.  Is this bug just
> about making sure the right TabGroup event target is passed there instead of
> NS_GetCurrentThread()/NS_GetMainThread()?
Correct, but it's time-consuming to label all the call sites of nsIAsyncInputStream::AsyncWait:
http://searchfox.org/mozilla-central/search?q=symbol:_ZN19nsIAsyncInputStream9AsyncWaitEP22nsIInputStreamCallbackjjP14nsIEventTarget%2C%23asyncWait&redirect=false
> And yea, I don't think you can fix this by changing the pipe constructor
> methods.  There can be any number of listeners attached to the same pipe and
> its really those call sites that determine how these runnables are created.
The runnable of "nsPipeInputStream::AsyncWait" is only created when calling nsIAsyncInputStream::AsyncWait on a nsPipeInputStream instance only if a valid nsIEventTarget is also specified:
http://searchfox.org/mozilla-central/source/xpcom/io/nsPipe3.cpp#1496
and the nsPipeInputStream is only created in nsPipe's ctor.
Changing the signature of nsIAsyncInputStream::AsyncWait with call site's name could help us to identify the users directly but the places to be changed are a lot in both native code and JS:
http://searchfox.org/mozilla-central/search?q=symbol:_ZN19nsIAsyncInputStream9AsyncWaitEP22nsIInputStreamCallbackjjP14nsIEventTarget%2C%23asyncWait&redirect=false
Similar issues if we change the signature of nsPipe creators:
http://searchfox.org/mozilla-central/search?q=NewPipe&case=false&regexp=false&path=
http://searchfox.org/mozilla-central/search?q=%40mozilla.org%2Fpipe%3B1&case=false&regexp=false&path=

If changing the API signaure from 
    void asyncWait(in nsIInputStreamCallback aCallback,
                   in unsigned long aFlags,
                   in unsigned long aRequestedCount,
                   in nsIEventTarget aEventTarget);
to
    void asyncWait(in ACString aCallSite,
                   in nsIInputStreamCallback aCallback,
                   in unsigned long aFlags,
                   in unsigned long aRequestedCount,
                   in nsIEventTarget aEventTarget);
is acceptable, then I'd like to add this change instead to identify the users from the telemetry easily.

If the |aCallSite| is annoying, we could remove it later once we identified all the users of "nsPipeInputStream::AsyncWait".

How do you think?
Flags: needinfo?(bkelly)
(In reply to Bevis Tseng [:bevis][:btseng] from comment #34)
> If changing the API signaure from 
>     void asyncWait(in nsIInputStreamCallback aCallback,
>                    in unsigned long aFlags,
>                    in unsigned long aRequestedCount,
>                    in nsIEventTarget aEventTarget);
> to
>     void asyncWait(in ACString aCallSite,
>                    in nsIInputStreamCallback aCallback,
>                    in unsigned long aFlags,
>                    in unsigned long aRequestedCount,
>                    in nsIEventTarget aEventTarget);
> is acceptable, then I'd like to add this change instead to identify the
> users from the telemetry easily.
> 
> If the |aCallSite| is annoying, we could remove it later once we identified
> all the users of "nsPipeInputStream::AsyncWait".
> 
> How do you think?

After further study, I think we could just make nsIInputStreamCallback inherit from nsINamed instead and have GetName method implemented in each nsIInputStreamCallback.
The only concern is that there is no syntax sugar to implement nsIInputStreamCallback as a JS function.
It shall be easier to identify the caller of nsPipeInputStream::AsyncWait() by querying the name of the nsIInputStreamCallback instance by overriding GetName() in nsInputStreamReadyEvent class without any API signature change.

TODO:
1. Finish the nsINamed implementation in all nsIInputStreamCallback instance.
2. Override GetName() in nsInputStreamReadyEvent to append the name of nsIInputStreamCallback.
3. Try to concatenate the nsIInputStreamCallback names because some AsyncInputStream wrappers override the nsIInputStreamCallback to itself in ::AsyncWait() implementation. This prevents us identifying the real caller.
4. Name the caller of nsInputStreamPump::AsyncRead with similar solution.
Attachment #8882328 - Attachment is obsolete: true
Attachment #8882435 - Attachment is obsolete: true
Flags: needinfo?(bkelly)
AIUI naming stuff is just one step towards the ultimate goal of getting the right event target.  Right?  Changing a lot of callsites to pass a name seams a similar amount of work to changing a lot of callsites to pass a different event target.

For example, all the js code doing some form of:

  Services.tm.currentThread

Don't all of these call sites have an nsIGlobalObject in place?  (I guess I assume they have a global since they are running js, but maybe there is some weird legacy mode...) Can we bulk convert them to calling nsIGlobalObject::EventTargetFor() in some way?

The c++ call sites I would change in this way:

Convert to SystemGroup::EventTargetFor():
* http://searchfox.org/mozilla-central/source/dom/file/ipc/IPCBlobInputStream.cpp#308
* http://searchfox.org/mozilla-central/source/dom/flyweb/HttpServer.cpp
* http://searchfox.org/mozilla-central/source/dom/presentation/PresentationTCPSessionTransport.cpp
* http://searchfox.org/mozilla-central/source/gfx/layers/LayerScope.cpp

Convert to nsIGlobalObject::EventTargetFor():
* http://searchfox.org/mozilla-central/source/dom/network/TCPSocket.cpp#204

These are already labeled on main thread:
* http://searchfox.org/mozilla-central/source/dom/file/FileReader.cpp#271

Not called on main thread:
* http://searchfox.org/mozilla-central/source/dom/indexedDB/ActorsChild.cpp#3526 (STS)
* http://searchfox.org/mozilla-central/source/dom/indexedDB/ActorsParent.cpp#29610 (STS)
* http://searchfox.org/mozilla-central/source/dom/workers/FileReaderSync.cpp#455 (STS)
* http://searchfox.org/mozilla-central/source/netwerk/protocol/http/nsHttpTransaction.cpp#764
* http://searchfox.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp

I skipped call sites that were just pass-through due to wrapping nsIAsyncInputStream or were passing nullptr for that target.

Give the c++ call sites I would guess the large volume of runnables is coming from one of the js call sites.  So getting a browser chrome exposed self.eventTargetFor(TaskCategory.other) would be really helpful there.
Flags: needinfo?(wmccloskey)
Hi Bevis. Today I started an official Nightly in a fresh profile with processCount=1, browsed around a bit, and then looked at about:telemetry. I saw over a thousand unlabeled nsPipeInputStream::AsyncWait calls. I wonder if local testing might find more stuff. One thing I wonder is whether things work differently when you do |make package|. Perhaps we do a lot of work reading from the OmniJAR using pipes?
Flags: needinfo?(wmccloskey)
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #38)
Thanks for breaking this down.
This analysis helps a lot if we have to label anyone of them.
To prevent any regression and longer debugging session after labeling these at once, I'd like to try to name them first and then label the necessary one according to the telemetry result.

The worse case is the JS one that need to be labeled which I don't have too much knowhow about exposing API from a internal nsIGlobalObject to the internal js.
(In reply to Bill McCloskey (:billm) from comment #39)
> Hi Bevis. Today I started an official Nightly in a fresh profile with
> processCount=1, browsed around a bit, and then looked at about:telemetry. I
> saw over a thousand unlabeled nsPipeInputStream::AsyncWait calls. I wonder
> if local testing might find more stuff. One thing I wonder is whether things
> work differently when you do |make package|. Perhaps we do a lot of work
> reading from the OmniJAR using pipes?

Wow! I was not aware of the difference of the official build and my local build.
I just checked the telemetry of my nightly and saw over 7k unlabeled nsPipeInputStream::AsyncWait!
I'll try |make package| later to see if I could reproduce this locally.
Thanks for these information!
Regarding the JS stuff, it's not as easy as labeling based on the global. All the code will be using a system global, which isn't much help. We don't know if the chrome code is going to touch content at some point.

Most of the JS stuff doesn't look that important to me except for the devtools stuff. I don't understand what that's for. It might be used a lot when devtools are open. I think we might be able to use the SystemGroup there. It probably makes sense to have something like Services.tm.systemGroupEventTarget (which would use TaskCategory::Other).
Make this change to allow the nsINamed-implemented nsIInputStreamCallback in part 2 and part 3 to be displayed in the telemetry.
Note: We reuse nsIRequest::GetName() is available.
Attachment #8885668 - Attachment is obsolete: true
Attachment #8886108 - Flags: review?(wmccloskey)
Comment on attachment 8886108 [details] [diff] [review]
(v1) Part 1: Append the CallSite name to nsInputStreamReadyEvent::GetName().

treeherder looks NG.
Attachment #8886108 - Flags: review?(wmccloskey)
Comment on attachment 8886109 [details] [diff] [review]
(v1) Part 2: Have nsINamed interface implemented by all C++ instances of nsIInputStreamCallback.

treeherder looks NG.
Attachment #8886109 - Flags: review?(wmccloskey)
Comment on attachment 8886110 [details] [diff] [review]
(v1) Part 3: Have nsINamed interface implemented by all JS instances of nsIInputStreamCallback.

treeherder looks NG.
Attachment #8886110 - Flags: review?(wmccloskey)
Hi Bill, I've tried to make the call sites of nsIAsyncInputStream::AsyncWait() visible from the telemetry with these changes.
Although there are still some bugs to be fixed, I'd like to have your earlier feedback about the patches noted below, thanks!

Part 1:
 - To append the call site name in nsInputStreamReadyEvent::GetName() by QI(nsINamed)
Part 4:
 - Because there are too many call sites of nsIputStreamPump::AsyncRead(), this patch to to append the additional call site in nsIputStreamPump::GetName()
Part 5:
 - Some call site of nsIputStreamPump::AsyncRead() use NS_NewStreamLoader to create a listener.
 - This patch is to allow the call site to call ::SetName of this created listener.
Part 7:
 - Some of the listeners override GetName() from nsIRequest, so I add nsIAlteredNamed idl interface to provide the class name properly.
Flags: needinfo?(wmccloskey)
I spent pretty much all day debugging the problem I described in comment 39. It turns out there is a really major bug in telemetry that I suspect will have a major influence on our telemetry data, including this bug. Let's wait to do anything here until bug 1380880 is fixed and we get new data.
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #58)
> I spent pretty much all day debugging the problem I described in comment 39.
> It turns out there is a really major bug in telemetry that I suspect will
> have a major influence on our telemetry data, including this bug. Let's wait
> to do anything here until bug 1380880 is fixed and we get new data.
OMG, great to know that!
The unlabeled count of this runnable is much lower down in the ranking now after bug 1380880 was fixed.
Backup the patches that is ready for the review here for revisiting later if needed.
Attachment #8886203 - Attachment is obsolete: true
Attachment #8886204 - Attachment is obsolete: true
Attachment #8886205 - Attachment is obsolete: true
Attachment #8886206 - Attachment is obsolete: true
Attachment #8886207 - Attachment is obsolete: true
Attachment #8886208 - Attachment is obsolete: true
Attachment #8886209 - Attachment is obsolete: true
Attachment #8886210 - Attachment is obsolete: true
Priority: -- → P3
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.