Closed Bug 1612118 Opened 5 years ago Closed 4 years ago

Experiment with high priority network events

Categories

(Core :: Networking: HTTP, task, P2)

Tracking

RESOLVED WORKSFORME

People

(Reporter: acreskey, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file, 1 obsolete file)

Please provide a patch that puts select network events on the high-priority queue (for performance experiments).
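For context, the rough shape of such a change might look like the sketch below. This is a minimal illustration only, assuming Gecko's PrioritizableRunnable wrapper and nsIRunnablePriority::PRIORITY_MEDIUMHIGH from nsThreadUtils.h; the method and call site are placeholders, not the actual patch, and exact signatures may differ.

// Minimal sketch, not the actual patch: wrap a channel notification in a
// PrioritizableRunnable so the main-thread event queue services it at
// medium-high priority.  `self` and `NotifyStartRequest` are hypothetical
// stand-ins for the real necko call site.
nsCOMPtr<nsIRunnable> event = mozilla::NewRunnableMethod(
    "net::NotifyStartRequest", self, &ChannelEventHelper::NotifyStartRequest);

nsCOMPtr<nsIRunnable> prioritized = new mozilla::PrioritizableRunnable(
    event.forget(), nsIRunnablePriority::PRIORITY_MEDIUMHIGH);

NS_DispatchToMainThread(prioritized.forget());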

Flags: needinfo?(honzab.moz)

Related to bug 1608114, which we are figuring out with Valentin.

See Also: → 1608114
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Flags: needinfo?(honzab.moz)
Priority: -- → P1
Whiteboard: [necko-triaged]

I've done one larger test with this change:

Compare fenix_medium_high_network to fenix_local (baseline):
https://docs.google.com/spreadsheets/d/1p6EabhB3pjSwpK9wB_IxzPPkOQB4gsafUcaf4pDeA1U/edit#gid=80054860

I don't see any performance improvements in there, although I will continue to test this to verify.

In other tests of this change, I couldn't reproduce any improvements to Fenix visual metrics beyond our noise level:
See `fenix_medium_high` here:
https://docs.google.com/spreadsheets/d/1p6EabhB3pjSwpK9wB_IxzPPkOQB4gsafUcaf4pDeA1U/edit#gid=895975584&range=O14

and here:
https://docs.google.com/spreadsheets/d/133a2JzAw4nD6gWplm1X5m0ulbEX24ySgHPzh2mCEtc4/edit#gid=925311447&range=T17

Honza, I assume that this change modifies the priority of all stream listener events, like OnStartRequest()?

Just thinking out loud, are there any other events in the pathway from channel.asyncOpen2(...) whose priority could be elevated?

Flags: needinfo?(honzab.moz)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #5)

Honza, I assume that this change modifies the priority of all stream listener events, like OnStartRequest()?

Just thinking out loud, are there any other events in the pathway from channel.asyncOpen2(...) whose priority could be elevated?

As I don't have a recent BT-enabled build (the patch has bitrotted), I can only guess.

Matt wanted to use logging to catch all the places we go through where we jump between thread loops. Matt, do you see more options/places to skip? Do you need any help or guidance to figure out more of them?

Flags: needinfo?(honzab.moz) → needinfo?(matt.woodrow)

(In reply to Honza Bambas (:mayhemer) from comment #6)

Matt wanted to use logging to catch all the places we go through where we jump between thread loops. Matt, do you see more options/places to skip? Do you need any help or guidance to figure out more of them?

I haven't seen anything obvious that we can remove; from profiling, it appears that the most common wait is for IPC messages from the content process to the parent-process main thread (because the parent is doing something else).

I'm experimenting with a patch to allow speculativeConnect requests from the content process to bypass the parent main-thread (using PBackground), which seems to help.
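(For illustration only — the patch isn't attached here — the content-process side of that idea might look roughly like the following. SendSpeculativeConnect on the PBackground actor is a hypothetical message standing in for whatever the patch adds, not an existing API.)

// Hypothetical sketch: route speculative connects over PBackground so they
// are handled off the parent-process main thread.  SendSpeculativeConnect
// does not exist today; it stands in for the message such a patch would add.
using mozilla::ipc::BackgroundChild;
using mozilla::ipc::PBackgroundChild;

PBackgroundChild* bg = BackgroundChild::GetOrCreateForCurrentThread();
if (bg) {
  mozilla::Unused << bg->SendSpeculativeConnect(uri, principal,
                                                /* aAnonymous */ false);
}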

Flags: needinfo?(matt.woodrow)

(In reply to Matt Woodrow (:mattwoodrow) from comment #7)

(In reply to Honza Bambas (:mayhemer) from comment #6)

Matt wanted to use logging to catch all the places we go through where we jump between thread loops. Matt, do you see more options/places to skip? Do you need any help or guidance to figure out more of them?

I haven't seen anything obvious that we can remove; from profiling, it appears that the most common wait is for IPC messages from the content process to the parent-process main thread (because the parent is doing something else).

I'm experimenting with a patch to allow speculativeConnect requests from the content process to bypass the parent main-thread (using PBackground), which seems to help.

If that's so, then we may think about raising the priority of some IPC messages related to navigation. There is no way to change the priority of an existing message; we can only add a new *Priority message to the IPDL that does the same thing but is declared prio(high) (or something similar) and use it when we are urgent. This is sensitive: it could flip the order of event processing and break some assumptions. It's also hard to pick the right messages if we don't know which ones are delayed the most.
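(Sketching that idea only — the message name and the urgency check below are hypothetical, shown just to illustrate the duplicated prio(high) message approach.)

// Hypothetical sketch.  In the .ipdl file, a twin of an existing message
// could be declared with prio(high), e.g.:
//
//   prio(high) async OnStartRequestPriority(/* same arguments */);
//
// and the parent-side sender would pick the variant based on urgency:
if (isUrgentNavigation) {  // hypothetical condition
  mozilla::Unused << SendOnStartRequestPriority(args);
} else {
  mozilla::Unused << SendOnStartRequest(args);
}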

Matt, do you want to take over this bug?

Flags: needinfo?(matt.woodrow)

(In reply to Honza Bambas (:mayhemer) from comment #8)

If that's so, then we may think about raising the priority of some IPC messages related to navigation. There is no way to change the priority of an existing message; we can only add a new *Priority message to the IPDL that does the same thing but is declared prio(high) (or something similar) and use it when we are urgent. This is sensitive: it could flip the order of event processing and break some assumptions. It's also hard to pick the right messages if we don't know which ones are delayed the most.

Matt, do you want to take over this bug?

From what I saw in profiles, it looked like the wait was usually for a single in-progress task on the main thread (like a GC), so the IPDL priority won't help.

I'm sure there are cases where this isn't true, but initial testing of using prio(high) didn't seem to move any of the numbers.

I think we should wait to see if we can prove any benefits before progressing with this further.

Flags: needinfo?(matt.woodrow)

Lowering priority as there no longer seems to be any push on this, but keeping this one on me to further experiment.

Priority: P1 → P2
Assignee: honzab.moz → nobody
Status: ASSIGNED → NEW

We couldn't detect any improvements from this strategy (PRIORITY_MEDIUMHIGH for all streamListener notifications) in our pageload tests.

I've since written some tooling to measure channel completion time, from asyncOpen to onStopRequest.
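(The tooling itself isn't attached; as a rough illustration of the interval being measured, a wrapper listener along these lines could capture it. The names are placeholders, not the actual tool.)

// Illustrative sketch only, not the actual tooling: wrap the real
// nsIStreamListener and log the time from just before AsyncOpen (when this
// wrapper is constructed) until OnStopRequest fires.
class TimingListener final : public nsIStreamListener {
 public:
  NS_DECL_THREADSAFE_ISUPPORTS

  explicit TimingListener(nsIStreamListener* aInner)
      : mInner(aInner), mOpen(mozilla::TimeStamp::Now()) {}

  NS_IMETHOD OnStartRequest(nsIRequest* aRequest) override {
    return mInner->OnStartRequest(aRequest);
  }

  NS_IMETHOD OnDataAvailable(nsIRequest* aRequest, nsIInputStream* aStream,
                             uint64_t aOffset, uint32_t aCount) override {
    return mInner->OnDataAvailable(aRequest, aStream, aOffset, aCount);
  }

  NS_IMETHOD OnStopRequest(nsIRequest* aRequest, nsresult aStatus) override {
    double ms = (mozilla::TimeStamp::Now() - mOpen).ToMilliseconds();
    printf_stderr("channel completion: %.1f ms\n", ms);
    return mInner->OnStopRequest(aRequest, aStatus);
  }

 private:
  ~TimingListener() = default;
  nsCOMPtr<nsIStreamListener> mInner;
  const mozilla::TimeStamp mOpen;
};

NS_IMPL_ISUPPORTS(TimingListener, nsIStreamListener, nsIRequestObserver)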

I was hoping that finer-grained timings would show improvements for this change.

However, that was not the case.
The baseline and this patch ("test") show virtually identical results.

Measuring time from asyncOpen to onStopRequest on a series of live pageloads.
MotoG5
-------------------------------
          baseline         test
count  8803.000000  9762.000000
mean    223.960468   231.566585
std     526.486916   569.284700
min       5.000000     5.000000
10%      26.000000    29.100000
20%      48.000000    49.000000
40%      88.000000    84.000000
50%     110.000000   106.000000
60%     141.000000   134.000000
80%     267.000000   267.000000
90%     353.800000   381.000000
max    8343.000000   6416.000000
-------------------------------

Thank you for the analysis, Andrew. I guess that means we can close this bug.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
See Also: → 1693545
Attachment #9124112 - Attachment is obsolete: true