Open Bug 1342632 Opened 7 years ago Updated 2 years ago

Speed up PHttpChannelConstructor and consider streaming HttpChannelOpenArgs

Categories

(Core :: Networking: HTTP, defect, P3)

Tracking

Performance Impact: low

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: perf, Whiteboard: [necko-backlog])

See this profile: <https://perfht.ml/2ljUbg8>

222ms spent reading synchronously from the IPC channel on the main thread for PHttpChannelConstructor.  This is the entire profile: <https://perfht.ml/2muOsnZ>  This is from a session restore with a lot of tabs, so there's a lot of HTTP traffic going on.

We should investigate whether everything we are sending is necessary, and see if we can somehow compress things.  Also we should consider splitting this up into several IPC messages in the channel open case.  HttpChannelOpenArgs <http://searchfox.org/mozilla-central/rev/c9ccb8ec1f5acda78e6344ffb87aa3a409031e3d/netwerk/ipc/NeckoChannelParams.ipdlh#144> is giant, and includes things such as request headers, so an abusive page that XHRs giant headers can jank the UI thread pretty badly.  :-(

We shouldn't be sending this as IPC message arguments.  We should probably send it over an IPC stream and read/write async to the FDs on both sides.
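For illustration, here is a minimal sketch of that shape in plain POSIX C++ (not the actual Necko/IPDL API; every name here is hypothetical): ship a file descriptor plus small fixed-size metadata inline, and drain the bulk payload asynchronously on the receiving side.

#include <fcntl.h>
#include <unistd.h>
#include <string>

struct OpenArgsHandle {
  int payloadFd;       // read end of a pipe, carried over IPC
  size_t payloadSize;  // only small, fixed-size metadata stays inline
};

// Sender side: write the large part (e.g. request headers) into a pipe and
// ship only the fd + size inline.  Real code would write from an I/O thread
// and handle partial writes; a pipe buffer is typically only 64 KiB.
OpenArgsHandle SendLargeArgs(const std::string& headers) {
  int fds[2];
  if (pipe(fds) != 0) {
    return OpenArgsHandle{-1, 0};
  }
  (void)write(fds[1], headers.data(), headers.size());
  close(fds[1]);
  return OpenArgsHandle{fds[0], headers.size()};
}

// Receiver side: make the fd non-blocking and drain it from the I/O event
// loop in chunks, so a huge payload never stalls the main thread inside
// message deserialization.
void BeginAsyncRead(OpenArgsHandle handle) {
  fcntl(handle.payloadFd, F_SETFL, O_NONBLOCK);
  // ... register payloadFd with the event loop and read incrementally ...
}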
Actually it seems like it's reading the IPC stream that is taking a long time?
I'm having trouble understanding this profile. It claims that all the time is spent reading InputStreamParamsWithFds. But that code doesn't do very much by itself. I could see us spending a lot of time on the InputStreamParams. But that function is really big, so it would be unexpected if it got inlined.

Is it possible you hit a page fault or something here? Did it happen at startup?
Switching to IPCStream can help with the upload body by splitting it into separate buffer calls.  (I thought that was done.) Also, switching to PBackground would get the recv call off the main thread.  I think that's in progress.
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #3)
> Switching to IPCStream can help with the upload body by splitting it into
> separate buffer calls.  (I thought that was done.)

The profile seems to suggest it's already using an IPCStream. I thought that was what bug 1277681 did.

> Also, switching to
> PBackground would get the recv call off the main thread.  I think that's in
> progress.

What recv call do you mean? It looks like this code is just doing serialization.
The constructor recv.  Comment 0 says the delay is in reading, so I assume we are talking about parent recv here.

If the necko actors move to PBackground then that won't happen on main thread any more.
(In reply to Bill McCloskey (:billm) from comment #2)
> Is it possible you hit a page fault or something here?

It's possible, but there is no way to be sure.

> Did it happen at startup?

It happened during session restore, but that was maybe tens of seconds after the browser had started up and loaded pages and whatnot, so not the first time the code was hit.
Jason, can you find someone from the team to help out? schien maybe?
Flags: needinfo?(jduell.mcbugs)
This may just be a dup of the PBackground bug 1015466, but if not, maybe there is a separate win.
Flags: needinfo?(schien)
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #5)
> The constructor recv.  Comment 0 says the delay is in reading, so I assume
> we are talking about parent recv here.

Oh, I see. I somehow thought you meant the recv() syscall.

> If the necko actors move to PBackground then that won't happen on main
> thread any more.

I've been told before that PBackground is supposed to be a low-latency thread. If stuff running on PBackground takes too long, my understanding is that WebRTC audio will jank or something. Randell has more details on this.

Granted, this seems like a very poor design. But apparently it's what we have right now.
(In reply to Bill McCloskey (:billm) from comment #9)
> > If the necko actors move to PBackground then that won't happen on main
> > thread any more.
> 
> I've been told before that PBackground is supposed to be a low-latency
> thread. If stuff running on PBackground takes too long, my understanding is
> that WebRTC audio will jank or something. Randell has more details on this.

Yes, you should try not to jank the PBackground thread if you can, but it's not as critical as janking the main thread.  And I think it's hard to say IPC serialization work is inappropriate for an IPC-specific thread.

But sure, speeding it up would be great too.  I was just trying to point out that moving off the main thread is a work-in-progress already.
It's probably a good idea to first land some telemetry to get a better sense of how much of a problem this is before writing code for it here.
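A rough sketch of such a probe (hedged: the actual probe was added in bug 1342635, and its exact wiring may differ; the function name here is made up).  It times a main-thread IPC write and accumulates the result into the IPC_WRITE_MAIN_THREAD_LATENCY_MS histogram that shows up in the data below.

#include "mozilla/Telemetry.h"
#include "mozilla/TimeStamp.h"
#include "nsThreadUtils.h"

void SendMessageWithLatencyProbe(/* message, channel, ... */) {
  const bool onMainThread = NS_IsMainThread();
  mozilla::TimeStamp start = mozilla::TimeStamp::Now();

  // ... serialize the args and write the message to the channel ...

  if (onMainThread) {
    mozilla::Telemetry::AccumulateTimeDelta(
        mozilla::Telemetry::IPC_WRITE_MAIN_THREAD_LATENCY_MS, start);
  }
}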
Depends on: 1342635
I agree that telemetry (or other data) would be useful here.  I'm skeptical that this is a significant perf issue in the common case.

For reference: we shove everything into a single IPDL message because back when I worked on distributed computing the focus was always on sending fewer, larger messages over lots of small ones (to avoid IPC overhead, process interrupts, etc).  But admittedly jank wasn't much of an issue when crunching large scientific datasets :)

If it turns out that we want to split up these messages, it would be easy enough to do--and actually make the code cleaner.  Essentially we'd just revert bug 558623 for starters.

It does sound like PBackground is likely to make this bug go away (at least on the main thread).  SC, is that right?
(In reply to Jason Duell [:jduell] (needinfo me) from comment #12)
> I agree that telemetry (or other data) would be useful here.  I'm skeptical
> that this is a significant perf issue in the common case.

Well, the common case is one thing, but we should also have decent performance characteristics in the worst case.  So for example we should probably profile what happens if content tries to send hundreds of megabytes of data in the headers.  If that creates arbitrarily large pauses in the UI, that's pretty bad.  :-)

> For reference: we shove everything into a single IPDL message because back
> when I worked on distributed computing the focus was always on sending
> fewer, larger messages over lots of small ones (to avoid IPC overhead,
> process interrupts, etc).  But admittedly jank wasn't much of an issue when
> crunching large scientific datasets :)
> 
> If it turns out that we want to split up these messages, it would be easy
> enough to do--and actually make the code cleaner.  Essentially we'd just
> revert bug 558623 for starters.

That's great to know.

> It does sound like PBackground is likely to make this bug go away (at least
> on the main thread).  SC, is that right?

Bug 1015466 seems to only be moving Necko transport to PBackground.  The channel protocol is still the same, as far as I can tell from the patches...
My patch in Bug 1015466 only moves OnStartRequest/OnDataAvailable/OnStopRequest/OnProgress/OnStatus event reporting to PBackground, so it's not going to help this bug.

In addition, since the performance profile indicated the bottleneck is solely in PNeckoParent::Read(InputStreamParamsWithFds), reverting bug 558623 is not likely to help this bug either.

IMO moving the entire PHttpChannel to PBackground will make this problem less observable to end users; however, we would still have a slow read on IPC messaging. Figuring out why deserializing InputStreamParamsWithFds is slow is the right way to fix this issue.
Flags: needinfo?(schien)
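If someone picks this up, a first step could be to bracket the deserialization itself with timestamps, so time genuinely spent in Read() can be told apart from page faults or an inlined callee that the profiler attributes to it.  A minimal, hypothetical helper (not actual Necko code):

#include "mozilla/TimeStamp.h"

template <typename ReadFn>
bool TimedRead(ReadFn&& aRead, mozilla::TimeDuration& aElapsed) {
  mozilla::TimeStamp start = mozilla::TimeStamp::Now();
  bool ok = aRead();  // e.g. a lambda wrapping the IPC::ReadParam call
  aElapsed = mozilla::TimeStamp::Now() - start;
  return ok;
}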
(In reply to Jason Duell [:jduell] (needinfo me) from comment #12)
> If it turns out that we want to split up these messages, it would be easy
> enough to do--and actually make the code cleaner.  Essentially we'd just
> revert bug 558623 for starters.

If we can split the message into a part that needs to be delivered to start the channel and other message(s) that don't (just doing some post-start tweaks on the channel), then let's do it.  But if we just split one essential message into many that will all block opening the channel on the parent process, then I'm not sure it's a good idea.  It may improve jankiness when processing on the main thread, but may have other performance implications by delaying the AsyncOpen call on the channel.  Delivering off the main thread is probably the priority here?
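To make that trade-off concrete, here is a hedged sketch of the split being discussed.  Every type and message name below is hypothetical, not the real PHttpChannel protocol.

#include <cstdint>
#include <map>
#include <string>

struct EssentialOpenArgs {  // the minimum the parent needs to call AsyncOpen
  std::string uri;
  std::string method;
  uint32_t loadFlags;
};

struct PostStartArgs {      // tweaks that can safely arrive after the open
  std::map<std::string, std::string> extraHeaders;
  int16_t priority;
};

class ChannelActor {
 public:
  void SendAsyncOpen(const EssentialOpenArgs&);    // gates the open
  void SendPostStartTweaks(const PostStartArgs&);  // must not gate it
};

// The open is gated only on the small essential message; if every split-out
// message still had to arrive before AsyncOpen, splitting would just add
// messages without opening the channel any earlier, which is the concern
// raised above.
void Open(ChannelActor& aActor, const EssentialOpenArgs& aEssential,
          const PostStartArgs& aExtras) {
  aActor.SendAsyncOpen(aEssential);
  aActor.SendPostStartTweaks(aExtras);
}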
Whiteboard: [necko-backlog]
Assignee: nobody → valentin.gosu
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(jduell.mcbugs)
Whiteboard: [necko-backlog] → [necko-backlog][qf:p1]
Initial telemetry data:

Start (ms)	End (ms)	Count (IPC_WRITE_MAIN_THREAD_LATENCY_MS)
0	1	0 (0%)
1	2	152.9k (68.24%)
2	3	27.73k (12.37%)
3	4	11.16k (4.98%)
4	6	9.64k (4.3%)
6	8	4.03k (1.8%)
8	11	3.15k (1.4%)
11	15	2.12k (0.95%)
15	21	9.14k (4.08%)
21	29	1.08k (0.48%)
29	40	985 (0.44%)
40	55	637 (0.28%)
55	75	466 (0.21%)
75	103	359 (0.16%)
103	141	263 (0.12%)
141	193	158 (0.07%)
193	265	111 (0.05%)
265	364	74 (0.03%)
364	500	24 (0.01%)
500	Infinity	37 (0.02%)

It looks like this message isn't all that bad, at least on the Nightly population: only about 6% of samples take 15 ms or more, and roughly 1% land in the 40 ms-and-up buckets.
> if we just split one essential message into many that will all block opening
> the channel on the parent process, then I'm not sure it's a good idea.

Yes, if we split the constructor back into separate constructor + asyncOpen IPDL messages, the channel doesn't open any earlier.  And since all the delay in the comment 0 trace is from one part of the serialization, we wouldn't reduce jank either.  So I don't think it's a super-promising path forward.

Given comment 16, and the fact that we have no steps to repro the long delay case, I'm wondering if we should INVALID or at least backlog this bug.  Doesn't seem to be a perf problem statistically.  What do you think Ehsan?
Flags: needinfo?(ehsan)
(Sorry for my long lag here.)

This is the latest data on Nightly:

Start (ms)	End (ms)	Count (IPC_WRITE_MAIN_THREAD_LATENCY_MS)
0	1	15 (0%)
1	2	11.07M (67.8%)
2	3	2.02M (12.36%)
3	4	829.16k (5.08%)
4	6	747.5k (4.58%)
6	8	316.4k (1.94%)
8	11	236.57k (1.45%)
11	15	149.93k (0.92%)
15	21	699.66k (4.29%)
21	29	70k (0.43%)
29	40	60.88k (0.37%)
40	55	42.29k (0.26%)
55	75	28.89k (0.18%)
75	103	21.23k (0.13%)
103	141	14.23k (0.09%)
141	193	8.79k (0.05%)
193	265	5.2k (0.03%)
265	364	3.12k (0.02%)
364	500	1.63k (0.01%)
500	Infinity	2.14k (0.01%)

I couldn't get data from older branches due to bug 1359156 unfortunately.

Sadly data from the Nightly channel is really inconclusive due to bug 1348591 comment 3 -- I can't predict how much of the time spent here is due to the sentinel checking.  But I agree that this is probably not a [qf:p1] issue.  I'd be happy to bump it down to P3.

That being said, this code is common enough that it seems worthwhile for someone to pick up a profiler and spend some time looking into it.  See for example my investigations in bug 1347376.  We have found and fixed numerous optimization opportunities in that bug, and this is an even more common code path.  Up to the Necko team how this should be prioritized, but I'd really appreciate it if someone can profile common Necko hot code like this a bit. :-)

Thanks!
Flags: needinfo?(ehsan)
Whiteboard: [necko-backlog][qf:p1] → [necko-backlog][qf:p3]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Keywords: perf

I don't think this bug is high priority any more, given how we serialize streams now.

Assignee: valentin.gosu → nobody
Severity: normal → S3
Flags: needinfo?(valentin.gosu)
Performance Impact: --- → P3
Whiteboard: [necko-backlog][qf:p3] → [necko-backlog]