Closed Bug 1524609 Opened 5 years ago Closed 5 years ago

Investigate performance impact of tuning RCWN heuristics

Categories: Core :: Performance, enhancement, P1
Product: Core
Component: Performance
Type: enhancement
Priority: P1
Severity: normal
Status: RESOLVED INACTIVE
Tracking flags: firefox67 (tracking: ---, status: affected)
People: Reporter: acreskey, Assigned: acreskey
References: Blocks 1 open bug

Attachments (7 files):

g5_0MS.png 5 years ago Andrew Creskey [:acreskey] 87.50 KB, image/png
g5_50ms.png 5 years ago Andrew Creskey [:acreskey] 91.41 KB, image/png
g5_WIFI.png 5 years ago Andrew Creskey [:acreskey] 93.34 KB, image/png
Reference_laptop_0MS.png 5 years ago Andrew Creskey [:acreskey] 91.59 KB, image/png
Reference_laptop_50ms.png 5 years ago Andrew Creskey [:acreskey] 95.39 KB, image/png
Reference_laptop_WIFI.png 5 years ago Andrew Creskey [:acreskey] 100.80 KB, image/png
browsertime-tests.zip 5 years ago Andrew Creskey [:acreskey] 1.54 MB, application/zip

Andrew Creskey [:acreskey]

Assignee

Description

5 years ago

The Necko "Race Cache With Network" (RCWN) system decides whether to fetch a given resource from the disk cache or from the network. The decision is based on several heuristics (e.g. disk cache speed, resource size).
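To make the idea concrete, here is a toy model of such a decision. It is a hypothetical sketch, not Necko's implementation: the inputs (an "is the cache slow" flag, the cache queue length, the resource size), all thresholds, and the rule that large resources are not raced are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only (not Necko's real values).
MAX_QUEUE_BEFORE_RACE = 50
LARGE_RESOURCE_BYTES = 256 * 1024
BASE_RACE_DELAY_MS = 50


@dataclass
class RaceDecision:
    race: bool              # race a network request against the cache read?
    network_delay_ms: int   # head start given to the cache before the network request


def decide(cache_is_slow: bool, cache_queue_len: int, resource_bytes: int) -> RaceDecision:
    """Toy race-cache-with-network decision for one cached resource."""
    if resource_bytes >= LARGE_RESOURCE_BYTES:
        # Assumption: large entries are cheaper to read from disk than to re-download.
        return RaceDecision(race=False, network_delay_ms=0)
    if cache_is_slow or cache_queue_len > MAX_QUEUE_BEFORE_RACE:
        # The cache looks congested: send the network request immediately.
        return RaceDecision(race=True, network_delay_ms=0)
    # Otherwise race, but let the cache try first for a short while.
    return RaceDecision(race=True, network_delay_ms=BASE_RACE_DELAY_MS)
```

Tuning RCWN, as discussed in this bug, amounts to changing inputs and thresholds of this kind.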

A quick test of disabling the feature (network.http.rcwn.enabled=false) shows significant potential for performance improvements in multiple raptor tp6 page load tests.

e.g. the load event and hero element timings were 20%+ faster on some sites.
Some tests, notably instagram, appear to have regressed significantly.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&newProject=try&newRevision=225ed801cc271e0902260fb443f6dd75da173d15&framework=10&showOnlyComparable=1&selectedTimeRange=1209600

In addition, the Noise Metric (roughly the sum of the tests' standard deviations) dropped significantly,
e.g. down 59.39% on windows10-64.
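The metric as described here can be sketched as follows: sum the sample standard deviation of each test's replicates, then compare the totals between revisions. This follows the rough definition above; Perfherder's exact formula may differ, and the replicate values are made up.

```python
import statistics


def noise_metric(runs_by_test: dict[str, list[float]]) -> float:
    """Approximate noise metric: sum of each test's sample standard deviation."""
    return sum(statistics.stdev(values) for values in runs_by_test.values())


def percent_change(base: float, new: float) -> float:
    """Signed percentage change from base to new."""
    return (new - base) / base * 100.0


# Made-up load times (ms) for two tests on a base and a new revision.
base = {"amazon": [1000.0, 1100.0, 900.0], "facebook": [800.0, 840.0, 760.0]}
new = {"amazon": [1000.0, 1020.0, 980.0], "facebook": [800.0, 810.0, 790.0]}
print(round(percent_change(noise_metric(base), noise_metric(new)), 2))  # prints -78.57
```

Note that the means are identical in both revisions here; only the spread changed, which is exactly what this metric captures.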

Caveats:
The raptor tests are run in a lab, and the lab's low-latency network may skew the apparent value of RCWN.
In addition, the HTTP sessions are played back from mitmproxy recordings.

This bug should cover the work of investigating the performance impact of RCWN tuning under "real world" conditions.

Selena Deckelmann :selenamarie :selena

Comment 1

5 years ago

Hey Vicky, I'm thinking perhaps we should disable this while we're exploring tuning. What do you think? Is there any reason to leave it enabled given these results?

Flags: needinfo?(vchin)

Andrew Creskey [:acreskey]

Assignee

Comment 2

5 years ago

Hi Selena,

One scenario where I suspect that RCWN is helping us is on machines like the reference laptops where the slow platter hard disk is perpetually chugging away.
Network resources may very well be in the cache but it may also take a very long time to retrieve them.
I don't have actual data on this though.

Bas Schouten (:bas.schouten)

Comment 3

5 years ago

Disabling RCWN makes app-prod.js load several times faster on https://www.youtube.com/tv/ based on some quick measurements (40-100ms down to ~10ms). This doesn't necessarily make the site faster in this case, since there's other work to do while we're waiting for the network, but it is a bad sign. This is on a very fast machine where the network is actually very fast at fetching this file, so RCWN doesn't even seem to be helping in that case.

Andrew Creskey [:acreskey]

Assignee

Comment 4

5 years ago

Running a try patch on the reference hardware (-ux) that disables RCWN:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=136092e3f9435e91f4d0554d66d172cbb5f9b37a&framework=10&selectedTimeRange=604800

Not the best test, because this runs against recorded HTTP sessions instead of live sites.

Andrew Creskey [:acreskey]

Assignee

Comment 5

5 years ago

These tests on the reference hardware in the lab time out as often as they succeed.
See: https://treeherder.mozilla.org/#/jobs?repo=try&revision=136092e3f9435e91f4d0554d66d172cbb5f9b37a&selectedJob=229015294
However, overall they appear to improve the raptor-measured metrics.

When I start this investigation, the plan is to also collect data points from these sources:
- raptor suite against live sites (:rwood has a patch to enable this)
- WebPageTest with a script to enable the flag (live sites)
- browsertime against live sites

Vicky Chin [:vchin]

Comment 6

5 years ago

:selena I'd like to see the results of turning this off against live sites first as outlined in comment 5.

Flags: needinfo?(vchin)

Andrew Creskey [:acreskey]

Assignee

Comment 7

5 years ago

Now that raptor can run tp6 pageload tests against live sites (Bug 1531169), I've started running tests where we should expect to see the impact of RCWN tuning.

Andrew Creskey [:acreskey]

Assignee

Comment 8

5 years ago

So this is a comparison between raptor live sites (Base, left) and raptor live sites with RCWN disabled (New, right):

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a1ea76df63c20c933b2a5b8826363147cf647979&newProject=try&newRevision=558f44100b18bee85ec7547c504116531d904fb7&framework=10

And the same test using the reference laptop in the lab (-ux): raptor live sites (Base, left) and raptor live sites with RCWN disabled (New, right):
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3aa96f138ab4cd33efa8bbe9cbf55bae3f8871e7&newProject=try&newRevision=be4bbbb12e811c54aeb5947c5bb393425129e26c&framework=10

I don't think I have enough runs in yet, but the meaning of these results is not clear, especially compared to the results from the Description (recorded HTTP sessions).

I'll run a live site vs. live site comparison (no other changes) this weekend, when I can get lots of jobs done quickly, to better understand the baseline noise in live site comparisons.

Andrew Creskey [:acreskey]

Assignee

Comment 9

5 years ago

One reason I'm using the raptor tp6 page load tests is that they are actually page reload tests: the initial page load is discarded ("cold loads" are coming soon to raptor). So they really exercise the cached-resource code paths.

Andrew Creskey [:acreskey]

Assignee

Comment 10

5 years ago

I have some results from local runs of raptor-tp6-1 (amazon, facebook, google, youtube).
Each raptor run is 24 reloads of the page for each site. For the results below, I collected the results from 5 runs,
so 120 reloads of each site for each hardware config.

2017 Macbook Pro (great wifi), summary here:
https://docs.google.com/spreadsheets/d/1lYXjy0FiQJf-0qPXMGcDsBY9mwV-QOatblVqOm1w4ys/edit#gid=0&range=228:228

2017 Reference laptop (wired connection), summary here:
https://docs.google.com/spreadsheets/d/1lYXjy0FiQJf-0qPXMGcDsBY9mwV-QOatblVqOm1w4ys/edit#gid=1492341646&range=228:228

Although this data is noisy, it looks like RCWN reduces load time on the reference laptop by ~10% (a lot!) for facebook and amazon, with no effect on the other sites.

On the Macbook Pro there is some evidence that RCWN regresses load time on facebook and google (~10%), and maybe youtube.

Andrew Creskey [:acreskey]

Assignee

Comment 11

5 years ago

^^ Those runs are using raptor with live sites enabled.

Selena Deckelmann :selenamarie :selena

Comment 12

5 years ago

Interesting! So, the Macbook Pro has an SSD, correct? Do we have telemetry on loadtime that we can correlate with disk type?

Flags: needinfo?(acreskey)

Andrew Creskey [:acreskey]

Assignee

Comment 13

5 years ago

Yes, the Macbook Pro does have an SSD. Unfortunately we don't have telemetry on disk type yet (I just logged Bug 1533861 for it), but it looks like I can get the HDD model from the Telemetry Environment (Windows only), which could work.
I can also run local tests, e.g. using the external spinning-platter drive on my MacBook Pro.

From my understanding of the Race Cache With Network code, it makes its decisions based on the queue of items waiting to be retrieved from the disk cache and on whether cache retrieval is getting slow. So it should handle both SSDs and spinning disks.
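One way to picture "cache retrieval is getting slow" is to keep a moving average of recent cache read times and compare it with a threshold, alongside the length of the pending-read queue. The sketch below uses an exponentially weighted moving average (EWMA); the class name, smoothing factor, and thresholds are hypothetical, and this is not a description of how Necko actually measures cache speed.

```python
from typing import Optional


class SlowCacheDetector:
    """Hypothetical detector that flags the disk cache as slow.

    Combines an exponentially weighted moving average (EWMA) of recent read
    latencies with the number of reads still waiting in the queue.
    """

    def __init__(self, alpha: float = 0.2, slow_ms: float = 20.0, max_queue: int = 50):
        self.alpha = alpha          # weight of the newest latency sample
        self.slow_ms = slow_ms      # average read latency treated as "slow"
        self.max_queue = max_queue  # queue length treated as "slow"
        self.avg_ms: Optional[float] = None
        self.queue_len = 0          # pending cache reads (maintained by the caller)

    def record_read(self, latency_ms: float) -> None:
        """Fold one completed read's latency into the moving average."""
        if self.avg_ms is None:
            self.avg_ms = latency_ms
        else:
            self.avg_ms = self.alpha * latency_ms + (1 - self.alpha) * self.avg_ms

    def is_slow(self) -> bool:
        """Slow if the queue is long or reads have recently been slow on average."""
        if self.queue_len > self.max_queue:
            return True
        return self.avg_ms is not None and self.avg_ms > self.slow_ms
```

Because the average adapts to whatever the disk is doing, the same rule applies to an idle SSD and to a busy platter drive; only the observed latencies differ.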

Right now I'm trying to find a scenario where the feature provides a strong win or a strong loss so that I can compare profiles and get a better understanding of why.

Flags: needinfo?(acreskey)

Andrew Creskey [:acreskey]

Assignee

Comment 14

5 years ago

So at the moment I don't think I'll be able to use results from raptor live sites on try because of the noise.

I compared tp6-1 and tp6-2 results from separate pushes of the same revision and even with high repeated job counts I'm seeing results that differ significantly.
e.g.
Amazon load time improved by 32.94% over 20 runs on OSX with no code change.
Amazon load time regressed by 27.18% on linux PGO over 25 runs with no code change.

I was expecting that as the repeat job count increased, the results from each push would converge.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c0dfc4da1fca8b3f054b96ed77f693aba21d3e5e&newProject=try&newRevision=26626d4540eb1f2e6917bcead61995db5ad7eced&framework=10
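If both pushes sample the same distribution, the standard error of each mean shrinks roughly as 1/√n, so the gap between them should shrink as jobs are retriggered; a gap that stays large suggests something systematic differs between the pushes. A minimal way to check this is Welch's t statistic, sketched below. It is a standard test chosen here for illustration, not a claim about what Perfherder computes; as a rough rule of thumb, |t| well above 2 is hard to explain by noise alone.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for the difference in means of two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


def percent_diff(a: list[float], b: list[float]) -> float:
    """Change in mean from sample a to sample b, in percent."""
    return (statistics.mean(b) - statistics.mean(a)) / statistics.mean(a) * 100.0
```

Reporting the percentage alongside t makes it clear whether a large percentage change is also statistically distinguishable from run-to-run noise.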

Andrew Creskey [:acreskey]

Assignee

Comment 15

5 years ago

I've re-run a large series of raptor tp6 tests to get a better understanding of the impact of RCWN on our current test infrastructure:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&newProject=try&newRevision=81be09264a0f3ad0092a8b2c4a8ae0f22e38c043&framework=10&selectedTimeRange=1209600

I'm not seeing the drastic performance improvements that I saw in the Description, although I am still seeing the Noise Metric reduced significantly (most prominently on linux64, where it drops ~54%).

Andrew Creskey [:acreskey]

Assignee

Comment 16

5 years ago

Updates:

I attempted to use the Windows Environment telemetry probe system.hdd.profile.model to compare SSD vs. HDD. Unfortunately, as others who have tried this route found, the 8k+ distinct entries make this impossible without first classifying them.
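Classifying the entries could start with simple keyword matching on the model string. The keyword lists and example model strings below are hypothetical and far from complete; many real model strings carry no hint of the drive type at all, which is why an "unknown" bucket is needed and why this route is laborious.

```python
# Hypothetical keyword lists; real model strings would need much more curation.
SSD_HINTS = ("ssd", "nvme", "solid state")
HDD_HINTS = ("hdd", "rpm", "hard disk")


def classify_drive(model: str) -> str:
    """Bucket a drive model string as 'ssd', 'hdd', or 'unknown' by keyword."""
    name = model.lower()
    if any(hint in name for hint in SSD_HINTS):
        return "ssd"
    if any(hint in name for hint in HDD_HINTS):
        return "hdd"
    return "unknown"


def tally(models: list[str]) -> dict[str, int]:
    """Count how many model strings fall into each bucket."""
    counts = {"ssd": 0, "hdd": 0, "unknown": 0}
    for model in models:
        counts[classify_drive(model)] += 1
    return counts
```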

But while looking at the telemetry I did make this observation: the NetworkDelayedRace outcome is exceptionally rare (e.g. 0.25% of outcomes) if I read this correctly.
This is the scenario where the network wins even after it's been given a delayed start.

So I’ve tried experiments that exclude or minimize this code path (since it can still incur a cost on the parent process's main thread once the delayed nsHttpChannel is created).

1. Only RCWN if the cache is slow (otherwise just use cache)
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4a567ef71c971269481a6089b0d576252efe5a00&newProject=try&newRevision=b8f99d69619bfa841ad296a66afbbbf01af53936&framework=10
There may be some small gains here, in the 1-3% range on multiple sites.

2. Don't delay network requests when racing
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4a567ef71c971269481a6089b0d576252efe5a00&newProject=try&newRevision=daa7a113d2f309137cd240428c1664400191e1f2&framework=10
This appears to significantly regress multiple sites, e.g. 6.73% on amazon (osx) and 9% on microsoft (osx).

:valentin, :michal, do you think there is potential in approach 1? Perhaps if the definition of a slow cache was modified?

I did discover something interesting while stepping through the cache code though:
On Android, the HTTP memory cache size was fixed at 1MB about 10 years ago. This is probably far too small for modern Android devices. Logged for investigation in Bug 1536171.

Flags: needinfo?(valentin.gosu)

Flags: needinfo?(michal.novotny)

Michal Novotny [:michal]

Comment 17

5 years ago

(In reply to Andrew Creskey from comment #16)

> But while looking at the telemetry I did make this observation: the NetworkDelayedRace outcome is exceptionally rare (e.g. 0.25% of outcomes) if I read this correctly.
> This is the scenario where the network wins even after it's been given a delayed start.
>
> So I’ve tried experiments which exclude or minimize this code path (since it can still incur a cost to the parent process's main thread once the delayed nsHttpChannel is created)

What we should try to minimize is the scenario when the cache wins when the delayed network request was triggered. Unfortunately, the current probe doesn't provide this information. CacheDelayedRace is reported regardless of whether the network request was sent or not.
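To capture this distinction, each delayed race could record whether the network request had actually been sent by the time the cache won. The sketch below is hypothetical: the class, the `_NetworkSent` / `_NetworkNotSent` label suffixes, and the share calculation are illustrations, not the existing telemetry probe.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayedRaceResult:
    cache_won: bool
    network_request_sent: bool  # had the delay expired before the cache answered?

    def __post_init__(self) -> None:
        if not self.cache_won and not self.network_request_sent:
            raise ValueError("the network cannot win without a request being sent")


def label(result: DelayedRaceResult) -> str:
    """Hypothetical finer-grained labels; the suffixed names are not real probe labels."""
    if not result.cache_won:
        return "NetworkDelayedRace"
    if result.network_request_sent:
        # The case worth minimizing: a network request was made but its response unused.
        return "CacheDelayedRace_NetworkSent"
    return "CacheDelayedRace_NetworkNotSent"


def shares(results: list[DelayedRaceResult]) -> dict[str, float]:
    """Percentage of results falling under each label."""
    counts = Counter(label(r) for r in results)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()} if total else {}
```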

> 1. Only RCWN if the cache is slow (otherwise just use cache)
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4a567ef71c971269481a6089b0d576252efe5a00&newProject=try&newRevision=b8f99d69619bfa841ad296a66afbbbf01af53936&framework=10
> There may be some small gains here, in the 1-3% range on multiple sites.
>
> 2. Don't delay network requests when racing
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4a567ef71c971269481a6089b0d576252efe5a00&newProject=try&newRevision=daa7a113d2f309137cd240428c1664400191e1f2&framework=10
> This appears to significantly regress multiple sites, e.g. 6.73% on amazon (osx) and 9% on microsoft (osx).
>
> :valentin, :michal, do you think there is potential in approach 1? Perhaps if the definition of a slow cache was modified?

We definitely should not remove delayed racing because detecting slow cache is always tricky, so having some reasonable delay is good.
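A delayed race can be sketched with asyncio: start the cache read, wait up to a delay, and only then start the network request, taking whichever finishes first. This is a hypothetical model, not Necko code; `sleeper` and the timings stand in for real I/O, and cancelling the losing task approximates the cleanup a real implementation would need.

```python
import asyncio
from typing import Any, Callable, Coroutine

Op = Callable[[], Coroutine[Any, Any, Any]]


async def delayed_race(cache_read: Op, network_fetch: Op, delay_s: float) -> tuple[str, bool]:
    """Race a cache read against a network fetch started only after delay_s.

    Returns (winner, network_request_sent).
    """
    cache_task = asyncio.create_task(cache_read())
    # Give the cache a head start; if it answers within the delay, skip the network.
    done, _ = await asyncio.wait({cache_task}, timeout=delay_s)
    if cache_task in done:
        return "cache", False
    net_task = asyncio.create_task(network_fetch())
    done, pending = await asyncio.wait({cache_task, net_task}, return_when=asyncio.FIRST_COMPLETED)
    # Cancel the loser and wait for the cancellation to complete.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return ("cache" if cache_task in done else "network"), True


def sleeper(seconds: float) -> Op:
    """Hypothetical stand-in for an I/O operation that takes `seconds` to finish."""
    async def run() -> None:
        await asyncio.sleep(seconds)
    return run
```

The delay is the knob Michal refers to: with it set to zero, every cache hit also pays for a network request; with a reasonable delay, fast cache reads never touch the network while a stalled cache still gets rescued.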

Flags: needinfo?(michal.novotny)

Andrew Creskey [:acreskey]

Assignee

Comment 18

•

5 years ago

Thanks for the feedback Michal.

By the way, regarding physical drive types, I verified that raptor tests on these platforms run on SSDs: linux64, windows10-64, windows7-32

And these run on platter HDDs: osx-10-10, windows10-64-ux

Michal Novotny [:michal]

Updated

5 years ago

Blocks: 1425268

Valentin Gosu [:valentin] (he/him)

Updated

5 years ago

Flags: needinfo?(valentin.gosu)

Vicky Chin [:vchin]

Updated

5 years ago

Priority: -- → P1

Andrew Creskey [:acreskey]

Assignee

Comment 19

5 years ago

I haven't been able to spend a lot of time on this issue but I was able to collect results from a long-running live site test.
This was run on the Acer reference laptop using the Browsertime framework:

https://paste.rs/DDT

I found these datapoints to be interesting:
- Disabling RCWN led to a 12% regression in median load time on the buzzfeed site, although mean firstPaint and firstContentfulPaint improved significantly (~40%).
- Disabling RCWN led to 69% and 61% regressions in firstPaint and firstContentfulPaint, respectively, on the wired site.
- Disabling RCWN appears to improve most metrics on the washingtonpost site.

Andrew Creskey [:acreskey]

Assignee

Comment 20

5 years ago

As a datapoint, this is a raptor tp6 comparison of running with and without rcwn on android (Moto G5):
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2ff6be5092202f8b43b1757505a4c77a8c33ae15&newProject=try&newRevision=edbf678e015090d4f9778d8e7357a3324701f50c&framework=10

I would say that the results are within the error bars of the tests.

I plan to revisit this when I can fix the network latency in a test framework (likely through tsproxy or similar).

Andrew Creskey [:acreskey]

Assignee

Comment 21

5 years ago

I ran a large-scale pageload test over various fixed network conditions to find areas where RCWN was helping or hindering performance. With those cases in mind, the plan was to test variations on the RCWN tuning parameters.

Notes:
• These were run on the Moto G5 Android device with Geckoview_Example (05/24/2019) and on the 2017 Reference Laptop (Acer-Aspire-i3) with Firefox Nightly 69.0a.
• The web pages were recorded once and played back using Web Page Replay and Browsertime.
• I used tsproxy to simulate network conditions.
• I used the tsproxy presets 'NONE' (0ms) and 'WIFI', plus a custom setting of 50ms RTT. The presets are defined here: https://github.com/catapult-project/catapult/blob/484f9f764dc58973a0466e4bdf1bfd50c75165e2/telemetry/telemetry/page/traffic_setting.py#L39
• Load time was measured on "warm" page loads, i.e. the page was loaded once and then reloaded 25 times. Only the reload performance was captured (to ensure more resources were in the network cache). I do have data from "cold" loads if anyone is interested.
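The warm-load procedure above (one initial load, then repeated reloads, keeping only the reloads) amounts to discarding the first sample and summarizing the rest. A minimal sketch with made-up timings; the median and interquartile range match what a boxplot displays, but the exact summary behind the charts attached to this bug is an assumption.

```python
import statistics
from dataclasses import dataclass


@dataclass
class WarmSummary:
    n: int          # number of warm reloads kept
    median: float   # median warm load time (ms)
    iqr: float      # interquartile range (ms), a robust measure of spread


def summarize_warm_loads(load_times_ms: list[float]) -> WarmSummary:
    """Drop the first (cold) load and summarize the warm reloads."""
    if len(load_times_ms) < 3:
        raise ValueError("need one cold load plus at least two reloads")
    warm = load_times_ms[1:]
    q1, _, q3 = statistics.quantiles(warm, n=4)
    return WarmSummary(n=len(warm), median=statistics.median(warm), iqr=q3 - q1)


# Made-up example: a slow cold load followed by five reloads.
print(summarize_warm_loads([900.0, 100.0, 110.0, 120.0, 130.0, 140.0]))
```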

This first round of testing was simple: baseline (RCWN on) and RCWN off.
I'll attach the raw data and boxplots generated w/ R.

Andrew Creskey [:acreskey]

Assignee

Comment 22

5 years ago

Attached image g5_0MS.png

Moto G5 - 0ms rtt

Andrew Creskey [:acreskey]

Assignee

Comment 23

5 years ago

Attached image g5_50ms.png

Moto G5 - 50ms rtt

Andrew Creskey [:acreskey]

Assignee

Comment 24

5 years ago

Attached image g5_WIFI.png

Moto G5 - 'WIFI' setting (30 Mbps down and 2ms rtt)

Andrew Creskey [:acreskey]

Assignee

Comment 25

5 years ago

Attached image Reference_laptop_0MS.png

Reference laptop - 0ms rtt

Andrew Creskey [:acreskey]

Assignee

Comment 26

5 years ago

Attached image Reference_laptop_50ms.png

Reference laptop - 50ms rtt.

Andrew Creskey [:acreskey]

Assignee

Comment 27

5 years ago

Attached image Reference_laptop_WIFI.png

Reference laptop - 'WIFI' preset

Andrew Creskey [:acreskey]

Assignee

Comment 28

5 years ago

Attached file browsertime-tests.zip

Raw results from browsertime for the above charts.

Andrew Creskey [:acreskey]

Assignee

Comment 29

5 years ago

One thing that stands out to me:
On the reference laptop (slow platter drive) under low-latency conditions (0ms and 'WIFI' (2ms)), we can see that disabling RCWN massively degrades performance and increases variance on the wired site
(see Comment 25 and Comment 27).

Or put another way, RCWN is very helpful in improving performance and reducing variance on that site for the reference laptop.
But note that with an added 50ms of latency I'm not seeing the performance win for that site.
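A back-of-envelope model shows why added latency could erase the win: the network can only beat the cache when one round trip plus transfer time is shorter than the cache read. The 2ms RTT and 30 Mbps 'WIFI' values come from this bug; the 30 Mbps bandwidth for the 50ms case, the 40ms slow-disk read, and the 50 KB resource size are hypothetical, and the model ignores connection setup, TLS, slow start, and server time.

```python
def network_time_ms(rtt_ms: float, bandwidth_mbps: float, size_bytes: int) -> float:
    """Rough fetch time: one round trip plus transfer time (no setup, TLS, or server time)."""
    transfer_ms = size_bytes * 8 / (bandwidth_mbps * 1_000_000) * 1000
    return rtt_ms + transfer_ms


def network_beats_cache(cache_read_ms: float, rtt_ms: float,
                        bandwidth_mbps: float, size_bytes: int) -> bool:
    """True if the modelled network fetch finishes before the cache read."""
    return network_time_ms(rtt_ms, bandwidth_mbps, size_bytes) < cache_read_ms


# Hypothetical: a 50 KB resource and a 40 ms read from a busy platter drive.
SIZE = 50_000
SLOW_DISK_MS = 40.0
print(network_beats_cache(SLOW_DISK_MS, rtt_ms=2, bandwidth_mbps=30, size_bytes=SIZE))   # prints True ('WIFI')
print(network_beats_cache(SLOW_DISK_MS, rtt_ms=50, bandwidth_mbps=30, size_bytes=SIZE))  # prints False (50ms RTT)
```

Under these assumptions the 50ms RTT alone already exceeds the slow disk read, so racing cannot help however small the resource is.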

Unfortunately my android and laptop pagesets were a bit different so I don't have wired on Moto G5 to compare against.

But even so I don't see a clear path to tune RCWN based on these results.
They are in many ways like the perfherder changes that Michal and I have put up -- some small wins here and there and maybe some small losses here and there.
Perhaps my pageset doesn't capture enough sites like wired that are impacted by RCWN.
Perhaps my test methodology isn't ideal.

But I do believe that my initial hypothesis in the Description (that disabling RCWN leads to large performance gains) is wrong.
Note that I was comparing my revision against mozilla-central rather than a proper baseline parent revision. (My mistake as a newbie; mozilla-central changes significantly.)

Andrew Creskey [:acreskey]

Assignee

Comment 30

5 years ago

One more thought:
The "sterile" environment in which these tests are run (Windows with Windows Defender disabled, minimal set of processes running, 1 tab open, etc) is probably not ideal to surface cases where RCWN is helping.

A better scenario might be:
The user has 5 applications running, 10 tabs are open, OS is paging to disk, real-time virus checking is running, and Netflix is playing in a second window.
In this case it's easy to imagine RCWN being a big win even with higher latency network. But hard to test!

Randell Jesup [:jesup] (needinfo me)

Comment 31

5 years ago

> The "sterile" environment in which these tests are run (Windows with Windows Defender disabled, minimal set of processes running, 1 tab open, etc) is probably not ideal to surface cases where RCWN is helping.
>
> A better scenario might be:
> The user has 5 applications running, 10 tabs are open, OS is paging to disk, real-time virus checking is running, and Netflix is playing in a second window.
> In this case it's easy to imagine RCWN being a big win even with higher latency network. But hard to test!

Right. When we landed RCWN, we had some telemetry around it; I don't know what it showed. The way to test this would be to land some changes, run an A/B experiment in Nightly, and see how the telemetry differs. You can't compare on one site, but you can compare average load times, cache hit rates, etc.
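Comparing cache hit rates between the two branches of such an experiment could use a two-proportion z-test, sketched below. The test is a standard statistical choice assumed here, not Mozilla's experiment tooling, and the hit counts in the usage line are made up.

```python
import math


def two_proportion_z(hits_a: int, total_a: int, hits_b: int, total_b: int) -> float:
    """z statistic for the difference between two hit rates, using a pooled standard error."""
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / se


# Made-up counts: branch A hits the cache 50% of the time, branch B 55%.
print(round(two_proportion_z(500, 1000, 550, 1000), 2))  # prints 2.24
```

A |z| above about 1.96 corresponds to the usual 5% two-sided significance level; averaging over many users and sites this way avoids reading too much into any single page.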

Andrew Creskey [:acreskey]

Assignee

Comment 32

5 years ago

I was not able to find any performance improvements by tuning RCWN.

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → INACTIVE
