Open Bug 1618687 Opened 5 years ago Updated 6 months ago

[network-markers] Too much time attributed to "waiting for socket thread", need to gather more information about what happens before the start of a request

Categories

(Core :: Networking, defect, P3)

defect

Tracking

()

People

(Reporter: mstange, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

Profile: https://perfht.ml/3cdKr25

In this profile, the first three requests spend 20ms, 12ms, and 92ms in a state which the UI exposes as "Waiting for socket thread".
This label is probably misleading. Matt Woodrow and Randell Jesup have suggested the following pieces that we might be spending this waiting time in:

  • SSL setup time
  • waiting on a slot (6 simultaneous http1 connections/server)
  • waiting for the http2 channel

If that's what these requests are waiting for, then we should collect timing data for those things and put the data into the marker.

Ah yeah, this is something we already discussed with mayhemer. Definitely we need to make it better. At the very least change the label (that would be a patch to the UI part), but we also discussed about using another structure to gather more timings than the ones we already do.
Thanks for filing !

Julien, do you know who should own this bug? Thanks!

Flags: needinfo?(felash)
Priority: -- → P3
Whiteboard: [necko-triaged]

Hey Valentin,
Clearly we'll need to work cross-team to solve this.
I believe the profiler team will eventually work on this, but this isn't planned very soon. The priority also depends on how badly our users want this. I expect that we'll be able to look at network markers again some time this year, but rather H2.
Because it blocks our meta bug it's on our radar anyway. Would you prefer that we have it in the profiler component?

Flags: needinfo?(felash)

(In reply to Julien Wajsberg [:julienw] from comment #3)

Because it blocks our meta bug it's on our radar anyway. Would you prefer that we have it in the profiler component?

I think it's fine to stay in Networking if it's Necko/Necko-related work. I just wasn't aware of the plan for this - whether it was immediate work or something else.

See Also: → 1625006

hello !
I would like to work on this .
Valentin could you please elaborate what I need to do in this ?
Thanks :)

Flags: needinfo?(valentin.gosu)

(In reply to Mahak from comment #5)

Valentin could you please elaborate what I need to do in this ?

I think this task might be a little more complex. I'll let someone else take the lead on this.

Flags: needinfo?(valentin.gosu)
Depends on: 1652515
Depends on: 1652517

https://share.firefox.dev/3aWNkV0 (from bug 1684703) is another case where 260ms are attributed to "waiting for socket thread" but all threads are idle.
Edit: In that profile it was the extension suspend that caused the delay.

Blocks: 1690235
Blocks: 1684703
No longer blocks: 1690235
No longer blocks: 1684703
See Also: → 1777538

If I understand the code correctly, waiting for socket thread is reporting the time spent from 'aStart' on first network marker call to the first entry in the network timing structure, TimingStruct. This could be to domainLookupStart or requestStart, or possibly others. Depending on what timing is populated in the TimingStruct.

On the front end: the startTime to 'waiting for socket thread' mapping.

And an example profiler marker, profiler_add_network_marker() call in HttpChannelChild::AsyncOpenInternal.

We could increase the granularity by adding more steps to the network timing structure, but I don't think its intention is to capture internal timings (like time from asyncOpen in the child process to asyncOpen in the parent process).

But from my perspective, those would be the things I would want to see. i.e. time spent between processes or waiting on the event queue.
Also URL classification, which can slow down loads and doesn't appear on this list.

Maybe the addition of an internal timing structure parameter to the profiler_add_network_marker() function could address this?

As you've seen, the current support reuses the network timing structure which isn't ideal. The timing structure is used when outputting the "end marker" (there's also a "start marker" that's a bit weird compared to the other start/end markers we have).

The long-term plan is to support marker sets (bug 1661114), that is grouping several markers together using the same index. This would make it easier to add new phases, and each phase would have its marker. However this is quite some work.

The intermediate solution would be adding new markers for the missing information. This way this would already provide useful information to developers (even if not integrated with the current markers), but in the future when we'll implement the marker sets functionality, we would also easily add the index to them.

What do you think?

Julien, that sounds very reasonable.
With this approach, the work of adding the new measurements is purely marker instrumentation to the platform code, (i.e. no profiler front end edits), which will make it easier for me and other devs working on this.
As we find missing measurements in Bug 1777538, we will add them as profiler markers. Thank you!

Severity: normal → S3

The current strategy is to add more markers (which include the channel id), and then find those markers to segment the "Waiting for Socket Thread" timing.
This can be done without marker set support.

Flags: needinfo?(acreskey)

Adding a new marker for when asyncOpen is received in the parent in Bug 1835709

Flags: needinfo?(acreskey)
See Also: → 1835709

In bug 1877525 I'm adding channelId to key events on the socket thread: transaction arrival and dispatch.
With these in place, we may be able to break up "Waiting for socket thread"

See Also: → 1913417
Duplicate of this bug: 1913417

Andrew has added a marker and added channelId's to other markers. Two questions: first, what other things fall into "startTime" currently (delays due to the sharding limit, classification, etc) can we add channelId's for? (I.e. what else can happen to a request that falls into startTime?) Second, can we get the profiler UI to grab these other markers to break up startTime/Waiting for Socket Thread?

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(felash)
Flags: needinfo?(acreskey)

Second, can we get the profiler UI to grab these other markers to break up startTime/Waiting for Socket Thread?

We can look into it, I filed a bug in https://github.com/firefox-devtools/profiler/issues/5092.

One issue we will face, I think, is that the new markers are (I presume) in the main process only or even maybe in different threads, but our marker handling pipeline is very one-thread-oriented, with network markers currently duplicated in both the content process and the main process. We'll have to think quite a bit about this issue and how we want things to look. Would that be good enough to have this breakdown in the main process only?

Flags: needinfo?(felash)

(In reply to Julien Wajsberg [:julienw] from comment #18)

Second, can we get the profiler UI to grab these other markers to break up startTime/Waiting for Socket Thread?

We can look into it, I filed a bug in https://github.com/firefox-devtools/profiler/issues/5092.

One issue we will face, I think, is that the new markers are (I presume) in the main process only or even maybe in different threads, but our marker handling pipeline is very one-thread-oriented, with network markers currently duplicated in both the content process and the main process. We'll have to think quite a bit about this issue and how we want things to look. Would that be good enough to have this breakdown in the main process only?

Yes, the new markers are on the socket thread of parent process.
Having just the main process broken down this way would definitely be good enough as the first step as that's where we have a lot of activity.

Thanks for logging this on the profiler side. If there's anything else needed with the markers, don't hesitate to ask.

Flags: needinfo?(acreskey)

Would it make sense to move these markers to the parent process? This would be done by adding an option to the marker function such as in [1]. It generally makes sense if we'd like the marker to show up even though the specific thread isn't captured. (and for this bug this could make our life easier too :-) ).

[1] https://searchfox.org/mozilla-central/rev/490a1df802d8872f996f8ef4093d54e3c854c8f9/tools/profiler/core/platform.cpp#5377

(In reply to Julien Wajsberg [:julienw] from comment #20)

Would it make sense to move these markers to the parent process? This would be done by adding an option to the marker function such as in [1]. It generally makes sense if we'd like the marker to show up even though the specific thread isn't captured. (and for this bug this could make our life easier too :-) ).

[1] https://searchfox.org/mozilla-central/rev/490a1df802d8872f996f8ef4093d54e3c854c8f9/tools/profiler/core/platform.cpp#5377

Yes; I didn't realize we had this capability and this seems like a good use as profiling the socket thread explicitly is less common.
Patch incoming.

(In reply to Julien Wajsberg [:julienw] from comment #20)

Would it make sense to move these markers to the parent process? This would be done by adding an option to the marker function such as in [1]. It generally makes sense if we'd like the marker to show up even though the specific thread isn't captured. (and for this bug this could make our life easier too :-) ).

[1] https://searchfox.org/mozilla-central/rev/490a1df802d8872f996f8ef4093d54e3c854c8f9/tools/profiler/core/platform.cpp#5377

Bug 1915700 adds the MarkerThreadId::MainThread() marker option to the three markers that include an int property, channelId.

Depends on: 1915700

Tangentially related, bug 1915717, Expose http protocol version for a request in the profiler network tab

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #17)

Andrew has added a marker and added channelId's to other markers. Two questions: first, what other things fall into "startTime" currently (delays due to the sharding limit, classification, etc) can we add channelId's for? (I.e. what else can happen to a request that falls into startTime?) Second, can we get the profiler UI to grab these other markers to break up startTime/Waiting for Socket Thread?

I think there's a bunch of things that we do before we actually go to the socket thread - such as classification or waiting for a HTTPS record, CORS, etc. I don't think we mark when these are done (at least for some of them) but we probably should.

Flags: needinfo?(valentin.gosu)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #24)

I think there's a bunch of things that we do before we actually go to the socket thread - such as classification or waiting for a HTTPS record, CORS, etc. I don't think we mark when these are done (at least for some of them) but we probably should.

Agreed - additional markers around these steps could be very useful.

One I just logged -- Expose class of service for each request in the profiler network tab.

Here's another profile where I spend 2 seconds waiting for something and I don't know for what: https://share.firefox.dev/4dN4SBs

Blocks: 1907982
You need to log in before you can comment on or make changes to this bug.