Open Bug 1425268 Opened 7 years ago Updated 1 year ago

Tune RCWN racing parameters (and make them pref-able)

Categories

(Core :: Networking: Cache, enhancement, P3)

People

(Reporter: jduell.mcbugs, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

On my OSX box I'm seeing us race more than we probably need to:

Total network request count: 5574
Cache won count: 938
Net won count: 13

That's racing almost 16% of the time, but only winning 1.3% of the time. We should probably back off on racing a bit in this case, at least.

We should make a bunch of things pref-able so we can tweak them and see where the sweet spot is (maybe using Test Pilot):

- What's considered "Slow" cache: There's a magic 3x that should be prefable
- The 3x average cache latency before we timeout and start network race
- Both of these use hard-coded 3x: make those prefs

Things that are already prefs that we could play with:

- currently at 500ms
- cacheMaxQueueLength: currently at 8 (or 2 for priority)
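As a rough illustration of what making the two hard-coded 3x factors pref-able could look like, here is a minimal Python sketch. It is not Firefox code: the class, field, and function names are hypothetical, and clamping the race delay between a minimum and maximum wait is an assumption about how the wait prefs interact with the multiplier.

```python
from dataclasses import dataclass


@dataclass
class RcwnParams:
    # Hypothetical tunables; the bug report says both factors are hard-coded to 3 today.
    slow_cache_multiplier: float
    race_delay_multiplier: float
    # Assumed bounds on how long to wait for the cache before racing the network.
    min_wait_before_racing_ms: float
    max_wait_before_racing_ms: float


def race_delay_ms(avg_cache_latency_ms: float, params: RcwnParams) -> float:
    """Wait this long for the cache before starting the network request."""
    delay = avg_cache_latency_ms * params.race_delay_multiplier
    # Assumption: the delay is clamped to [min_wait, max_wait].
    return max(params.min_wait_before_racing_ms,
               min(delay, params.max_wait_before_racing_ms))


def cache_is_slow(latency_ms: float, avg_cache_latency_ms: float,
                  params: RcwnParams) -> bool:
    """Treat the cache as slow when latency exceeds the multiplier times the average."""
    return latency_ms > avg_cache_latency_ms * params.slow_cache_multiplier


# Illustrative values only, not claimed to be Firefox defaults.
params = RcwnParams(slow_cache_multiplier=3.0, race_delay_multiplier=3.0,
                    min_wait_before_racing_ms=0.0, max_wait_before_racing_ms=500.0)
print(race_delay_ms(20.0, params))
print(cache_is_slow(100.0, 20.0, params))
```

With the multipliers exposed like this, an experiment could vary them independently of the wait bounds and watch how often the network request actually wins.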
Priority: -- → P2
Whiteboard: [necko-triaged]
Not sure if this bug is still valid right now. Michal, what do you think?
Flags: needinfo?(michal.novotny)
Assignee: nobody → michal.novotny
Depends on: 1524609
Flags: needinfo?(michal.novotny)
Depends on: 1537750

I ran tp6 talos with different RCWN preferences, and the results are:

  • small_resource_size_kb has no effect (tried 64k and 128k)
  • setting min/max_wait_before_racing_ms to very high values is equivalent to turning RCWN off
  • results for different min_wait_before_racing_ms values (tests other than those listed are not affected much):
    Test                                            OFF     400     200     100      50      30      15       8       4       3       2       1
    raptor-tp6-imdb-firefox pgo (windows10-64)   +15.95  +14.13  +14.79  +11.33  +10.06  +12.18  +12.96  +16.27  +11.04  +12.05  +11.01  +10.31
    raptor-tp6-microsoft-firefox pgo (linux64)   -10.45  -10.86  -11.62  -10.14  -11.36   -9.71   -9.71  -10.71   -8.76   -6.98   -5.46   -4.26
    raptor-tp6-wikipedia-firefox pgo (linux64)    -5.56   -4.54   -5.48   -5.88   -5.49   -4.25   -5.23   -4.53   -5.29   -5.02   -5.02   -7.59
    raptor-tp6-yahoo-news-firefox pgo (linux64)   -2.02   -1.76   -2.05   -2.14   -2.38   -1.45   -2.16   -2.42   -1.67   -1.99   -1.38   -2.18
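
To see at a glance which tests react to min_wait_before_racing_ms, the table can be loaded as data and each setting compared with the OFF column. This is a small Python sketch that only does arithmetic on the numbers listed above. The comment does not state the unit or sign convention of the values, so the sketch makes no claim about which direction is better, and the helper names are made up for illustration.

```python
# Column order matches the table header (min_wait_before_racing_ms values, plus OFF).
SETTINGS = ["OFF", "400", "200", "100", "50", "30", "15", "8", "4", "3", "2", "1"]

# Values copied from the table in this comment.
RESULTS = {
    "raptor-tp6-imdb-firefox pgo (windows10-64)":
        [15.95, 14.13, 14.79, 11.33, 10.06, 12.18, 12.96, 16.27, 11.04, 12.05, 11.01, 10.31],
    "raptor-tp6-microsoft-firefox pgo (linux64)":
        [-10.45, -10.86, -11.62, -10.14, -11.36, -9.71, -9.71, -10.71, -8.76, -6.98, -5.46, -4.26],
    "raptor-tp6-wikipedia-firefox pgo (linux64)":
        [-5.56, -4.54, -5.48, -5.88, -5.49, -4.25, -5.23, -4.53, -5.29, -5.02, -5.02, -7.59],
    "raptor-tp6-yahoo-news-firefox pgo (linux64)":
        [-2.02, -1.76, -2.05, -2.14, -2.38, -1.45, -2.16, -2.42, -1.67, -1.99, -1.38, -2.18],
}


def delta_vs_off(values):
    """Difference between each setting and the OFF column, keyed by setting."""
    off = values[0]
    return {s: round(v - off, 2) for s, v in zip(SETTINGS[1:], values[1:])}


def spread(values):
    """Range of the results across all settings, including OFF."""
    return round(max(values) - min(values), 2)


for test, values in RESULTS.items():
    print(test, "spread:", spread(values), "delta at 1 ms:", delta_vs_off(values)["1"])
```

The spread gives a quick sense of how sensitive each test is to the setting, independent of how the values are interpreted.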

It's worth noting that changing cache_queue_normal_threshold and cache_queue_priority_threshold cannot have any effect, because they affect CacheFileUtils::CachePerfStats::ENTRY_OPEN, which isn't used for slow cache detection.

To sum up, some tests run faster and some slower when RCWN parameters are changed. Talos tests have neither realistic network traffic (all data is available immediately) nor realistic storage (it isn't shared with other applications). To tune the parameters we need to run Shield studies while watching selected telemetry probes (HTTP_COMPLETE_LOAD, all RCWN probes, etc.), and we may also need to add some new probes.

Priority: P2 → P3

Unassigning Michal to move bugs back to triage pool.

Assignee: michal.novotny → nobody
Severity: normal → S3