Closed Bug 1655840 Opened 1 year ago Closed 1 year ago

Confidence/replayed/not-replayed metrics are not being calculated correctly

Categories

(Testing :: Raptor, task, P3)

task

Tracking

(firefox84 fixed)

RESOLVED FIXED
84 Branch

People

(Reporter: sparky, Assigned: kanishk509, Mentored)

References

Details

(Keywords: good-first-bug)

User Story

See here for information on how to contribute: https://wiki.mozilla.org/TestEngineering/Performance/NewContributors

Attachments

(1 file)

See this graph: https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&highlightedRevisions=5a874f1887ff&series=autoland,2396440,1,10&series=autoland,2396442,1,10&series=autoland,2396439,1,10&timerange=5184000&zoom=1591223605643,1591227721330,33.62381966393652,886.2515267675393

Note how both replayed and not-replayed decreased, which should not be possible: if replayed decreases, not-replayed should increase. This skews the confidence calculation and makes the metric very difficult to interpret.

We should modify how the confidence is calculated so it uses a total number of expected requests that is known beforehand. At the moment we assume that all requests will be seen by mitmproxy, so we calculate the total number of requests as replayed + not-replayed, which is a bad assumption.
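To illustrate the problem, here is a minimal sketch of the current style of calculation (the function name and numbers are illustrative, not taken from the mozproxy source). Because the denominator is inferred from whatever the proxy happened to observe, two runs with very different coverage can report identical confidence:

```python
# Hypothetical sketch of the current-style calculation; names and
# numbers are illustrative, not from the mozproxy source.
def confidence(replayed, not_replayed):
    # The "total" is inferred from what mitmproxy happened to see,
    # so it silently shifts from run to run.
    total = replayed + not_replayed
    return replayed / total if total else 0.0

run_a = confidence(replayed=90, not_replayed=10)  # total assumed to be 100
run_b = confidence(replayed=45, not_replayed=5)   # total assumed to be 50
# Both runs report 0.9 even though run B replayed only half as many
# requests, so the metric cannot flag the drop visible in the graph.
```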

See Also: → 1655841

We could think of these as hits (requested, and replayed from the recording), misses (requested, but not replayed from the recording), and spares (in the recording, but not requested). The confidence could then be the percentage of the total (hits + misses + spares) that are hits.
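A minimal sketch of that proposal (hypothetical names, not actual mozproxy code). The point is that the denominator is fixed by the recording itself, so it no longer drifts with what the proxy observes:

```python
# Hypothetical sketch of the proposed hits/misses/spares confidence.
def confidence_with_spares(hits, misses, spares):
    # hits + misses + spares is known from the recording itself,
    # so the denominator is stable across runs.
    total = hits + misses + spares
    return hits / total if total else 0.0

# e.g. 90 hits, 10 misses, 20 spare entries in the recording:
# confidence_with_spares(90, 10, 20) == 90 / 120 == 0.75
```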

I think that if we can get the number of spares, that would solve the issue and we wouldn't need to redo the confidence calculation. We could add a new metric for recordingProportionUsed (which would include the spares in the total) and then for confidence, we could keep the same calculation but rename it to replayConfidence or networkConfidence to make their purpose clearer.

It would be great if we could find a way to make sure devs know what these metrics are and how to interpret them as well.

Priority: P2 → P3
Mentor: gmierz2
User Story: (updated)
Keywords: good-first-bug

Hey Greg,
I would like to look into this bug. It would be great if you could point me to the relevant code and give me some context on the confidence calculations being discussed.
Cheers.

Flags: needinfo?(gmierz2)

Hi Kanishk!

You should start with getting your environment going first: https://wiki.mozilla.org/TestEngineering/Performance/NewContributors

After that you'll want to be looking in these files:

  1. Confidence: https://searchfox.org/mozilla-central/source/testing/mozbase/mozproxy/mozproxy/backends/mitm/mitm.py#372
  2. Where the metrics are gathered: https://searchfox.org/mozilla-central/source/testing/mozbase/mozproxy/mozproxy/backends/mitm/scripts/alternate-server-replay.py#241,252

You'll need to change the existing confidence metric to replayConfidence; its calculation will stay the same. Then, you'll also need to add a new recordingProportionUsed metric, calculated as (hits + misses) / total, where total is the maximum number of requests the recording can serve.

I think you can get that total from the size of self.flowmap, :bebe can you confirm?
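A rough sketch of the two metrics as described above (hypothetical names; the real counters live in alternate-server-replay.py). The caveat in the comment, if it holds, would explain proportions over 100%:

```python
# Hypothetical sketch; the names mirror the mozproxy counters but
# this is not the actual implementation.
def replay_metrics(replayed, not_replayed, flowmap_size):
    seen = replayed + not_replayed
    # Same formula as the existing confidence, just renamed.
    replay_confidence = replayed / seen if seen else 0.0
    # Caveat: "seen" counts every request, so repeated requests for the
    # same flow (or requests absent from the recording) can push this
    # above 1.0, i.e. a proportion over 100%.
    proportion_used = seen / flowmap_size if flowmap_size else 0.0
    return replay_confidence, proportion_used
```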

Flags: needinfo?(gmierz2) → needinfo?(fstrugariu)
Assignee: nobody → kanishk509
Status: NEW → ASSIGNED

I've submitted an initial patch, taking (hits + misses) as (self._replayed + self._not_replayed) and the total as len(self.flowmap). I've updated the metrics in mitm.py too.
Let me know what further changes are required.
Cheers.

Flags: needinfo?(gmierz2)

:sparky, that sounds right.
:Kanishk, thanks for the patch, I will also take a look.

Flags: needinfo?(fstrugariu)

:bebe, we're getting some odd numbers using the flowmap attribute. Do you have any suggestions for a better way of doing this?

See the amazon tests here and how the metric is >100%: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e8ab6e0986ae83ddaa94e9eed2503903a4a38358&selectedTaskRun=eWo5gunAR6uGnv0VOXp3eg.0

Flags: needinfo?(gmierz2) → needinfo?(fstrugariu)

Here is how I see it:
The proxy is used for all the retries in a test, meaning it's started at the beginning of the test and closed at the end.

If the metric we're looking for is the percentage of the available recording used during all the retries, we should calculate it like this:

Count every time we use a hash (a compiled package available in the recording) and generate the metric by dividing the number of used hashes by the total number of available hashes.

One issue I see here is that a hash is generated from a combination of request attributes (URL components, port, scheme, raw content, host, etc.),
so one or more replies can be bundled under a single hash (of which we use only the last added reply to respond). This means we ignore part of the recording from the start, but it has always been like this. So should we compare against the total number of requests available in the recorded file, or against the number of hashes generated?

Flags: needinfo?(fstrugariu)

Here is how I would do it:

diff --git a/testing/mozbase/mozproxy/mozproxy/backends/mitm/scripts/alternate-server-replay.py b/testing/mozbase/mozproxy/mozproxy/backends/mitm/scripts/alternate-server-replay.py
--- a/testing/mozbase/mozproxy/mozproxy/backends/mitm/scripts/alternate-server-replay.py
+++ b/testing/mozbase/mozproxy/mozproxy/backends/mitm/scripts/alternate-server-replay.py
 
@@ -123,9 +123,10 @@ class AlternateServerPlayback:
                 )
             elif i.response and self.mitm_version in ("4.0.2", "4.0.4", "5.1.1"):
                 # see: https://github.com/mitmproxy/mitmproxy/issues/3856
-                l = self.flowmap.setdefault(self._hash(i), [])
-                l.append(i)
-                self._max_requests += 1
+                l = self.flowmap.setdefault(self._hash(i),
+                                            {"items": [],
+                                             "reply_count": 0})
+                l["items"].append(i)
             else:
                 ctx.log.info(
                     "Recorded request %s has no response. Removing from recording list"
@@ -196,8 +197,9 @@ class AlternateServerPlayback:
             found.
         """
         hsh = self._hash(request)
-        if hsh in self.flowmap:
-            return self.flowmap[hsh][-1]
+        if hsh in self.flowmap.keys():
+            self.flowmap[hsh]["reply_count"] += 1
+            return self.flowmap[hsh]["items"][-1]
 
     def configure(self, updated):
         if not self.configured and ctx.options.server_replay_files:
@@ -208,8 +210,18 @@ class AlternateServerPlayback:
         if self._done or not ctx.options.upload_dir:
             return
 
+        recording_proportion_count = 0
+        for flow in self.flowmap:
+            if self.flowmap[flow]["reply_count"] > 0:
+                recording_proportion_count += 1
+
         replayConfidence = float(self._replayed) / (self._replayed + self._not_replayed)
-        recordingProportionUsed = float(self._replayed + self._not_replayed) / self._max_requests
+
+        if recording_proportion_count > 0:
+            recordingProportionUsed = recording_proportion_count / len(self.flowmap)
+        else:
+            recordingProportionUsed = 0
+
         stats = {"totals": dict(self.netlocs),
                  "calls": self.calls,
                  "replayed": self._replayed,
@@ -283,6 +295,7 @@ if hasattr(signal, 'SIGBREAK'):
     def _shutdown(sig, frame):
         ctx.master.shutdown()
 
+
     signal.signal(signal.SIGBREAK, _shutdown)
 
 addons = [playback]

:bebe, so basically we should calculate the proportion of hashes that were actually used vs. the total hashes we generated/stored?
One question (might be unrelated to the task at hand): why do we store a list of replies for a colliding hash when we always use only the most recent one?

Flags: needinfo?(fstrugariu)

Updated the patch based on :bebe 's suggestions.

(In reply to Kanishk from comment #11)

> :bebe, so basically we should calculate the proportion of hashes that were actually used vs. the total hashes we generated/stored?
> One question (might be unrelated to the task at hand): why do we store a list of replies for a colliding hash when we always use only the most recent one?

That code has supported multiple changes over time, depending on the proxy version and usage. But you have a great point there...

Could you file a bug to review that code and add improvements? Any other suggestions are really appreciated!

Flags: needinfo?(fstrugariu) → needinfo?(kanishk509)

Okay. My first time filing a bug.

Flags: needinfo?(kanishk509)
Attachment #9181489 - Attachment description: Bug 1655840 - New confidence metrics replayConfidence, recordingProportionUsed. r=sparky → Bug 1655840 - Add new mozproxy metrics replay-confidence, and recording-proportion-used. r=sparky
Pushed by gmierz2@outlook.com:
https://hg.mozilla.org/integration/autoland/rev/766437adb95f
Add new mozproxy metrics replay-confidence, and recording-proportion-used. r=sparky,perftest-reviewers
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch