Closed Bug 1815970 Opened 2 years ago Closed 2 years ago

Improve fuzz factor usage in the visualmetrics-portable.py script

Categories

(Testing :: Performance, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sparky, Unassigned)

References

Details

(Whiteboard: [fxp])

This bug is for changing how the visualmetrics-portable.py script works in browsertime: https://github.com/sitespeedio/browsertime/blob/main/browsertime/visualmetrics-portable.py

It involves replacing the fuzz-factor usage throughout the code, which currently relies on arbitrary hard-coded values. Fuzz factors are used to distinguish minor variations in pixel intensities from genuine changes in content. They are problematic because the amount of variation differs from one recording to another, so a single hard-coded value will produce good results for some tests but not for others. This can result in issues like bug 1809874, where a small shift in pixel variation produces large differences in the metrics. The differences get so large because the fuzz factor allows frames to be treated as sufficiently different from each other even when they are similar. A higher fuzz factor would likely resolve the issue in the aforementioned bug.

To resolve this issue, we should look into a segmentation-based technique for determining the similarity of frames that have minor differences. For instance, we could use a region-growing technique and then compare the sizes of the resulting regions. If fully-connected regions larger than some X% of the frame exist in one frame but not the other, we could be more confident that there was a genuine visual change between the frames. Alternatively, we could use this to enhance the current compare method rather than replace the fuzz factor entirely: it would let us look only at the inexact pixels, allowing a simpler segmentation technique such as thresholding.
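A minimal sketch of the region-size idea, purely for illustration (the function names, the intensity tolerance of 10, and the 1% region threshold are all assumptions of mine, not anything from the script):

```python
import numpy as np

def largest_diff_region(frame_a, frame_b, intensity_tol=10):
    """Size (in pixels) of the largest 4-connected region of pixels
    whose intensities differ by more than the tolerance."""
    diff = np.abs(frame_a.astype(int) - frame_b.astype(int)) > intensity_tol
    rows, cols = diff.shape
    seen = np.zeros_like(diff, dtype=bool)
    largest = 0

    for r in range(rows):
        for c in range(cols):
            if not diff[r, c] or seen[r, c]:
                continue
            # Flood-fill one connected region with an explicit stack
            # (a simple stand-in for region growing).
            stack, size = [(r, c)], 0
            seen[r, c] = True
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols \
                            and diff[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
            largest = max(largest, size)
    return largest

def frames_differ(frame_a, frame_b, intensity_tol=10, min_region_frac=0.01):
    """Hypothetical check: frames count as genuinely different only if
    the largest changed region covers > min_region_frac of the frame."""
    largest = largest_diff_region(frame_a, frame_b, intensity_tol)
    return largest / frame_a.size > min_region_frac
```

With this, a 20x20 block of changed pixels in a 100x100 frame counts as a real change, while a few scattered noisy pixels do not, regardless of how far each individual pixel moved.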

This is similar to what the Contentful Speed Index does, where we look at the edges (or what we call "content") of the frames. Segmentation would let us do something similar, but it would look at areas of content rather than the edges/boundaries of those areas.

See Also: → 1809874

Link to a relevant comment in the other bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1809874#c12

I managed to get the "good" results to agree more closely with the actual results in the "bad" run:

Good Before Change:
15:09:16.917 - Done calculating histograms
15:09:16.919 - 0ms - 0% Complete
15:09:16.920 - 120ms - 49% Complete
15:09:16.923 - 840ms - 48% Complete
15:09:16.925 - 1160ms - 49% Complete
15:09:16.927 - 1360ms - 54% Complete
15:09:16.928 - 1400ms - 54% Complete
15:09:16.929 - 1440ms - 94% Complete
15:09:16.931 - 1480ms - 99% Complete
15:09:16.932 - 1520ms - 99% Complete
15:09:16.934 - 1720ms - 99% Complete
15:09:16.935 - 2560ms - 99% Complete
15:09:16.936 - 2600ms - 100% Complete
{"FirstVisualChange": 120, "LastVisualChange": 2600, "SpeedIndex": 805, "VisualProgress": "0=0, 120=49, 840=48, 1160=49, 1360=54, 1400=54, 1440=94, 1480=99, 1520=99, 1720=99, 2560=99, 2600=100", "videoRecordingStart": 1000}

Good After Change:

15:06:24.463 - Done calculating histograms
15:06:24.465 - 0ms - 0% Complete
15:06:24.465 - 120ms - 13% Complete
15:06:24.466 - 840ms - 14% Complete
15:06:24.466 - 1160ms - 46% Complete
15:06:24.466 - 1360ms - 51% Complete
15:06:24.467 - 1400ms - 52% Complete
15:06:24.467 - 1440ms - 92% Complete
15:06:24.467 - 1480ms - 99% Complete
15:06:24.468 - 1520ms - 99% Complete
15:06:24.468 - 1720ms - 99% Complete
15:06:24.468 - 2560ms - 99% Complete
15:06:24.469 - 2600ms - 100% Complete
{"FirstVisualChange": 120, "LastVisualChange": 2600, "SpeedIndex": 1182, "VisualProgress": "0=0, 120=13, 840=14, 1160=46, 1360=51, 1400=52, 1440=92, 1480=99, 1520=99, 1720=99, 2560=99, 2600=100", "videoRecordingStart": 1000}

Bad After Change:

15:07:26.849 - Done calculating histograms
15:07:26.851 - 0ms - 0% Complete
15:07:26.851 - 200ms - 9% Complete
15:07:26.852 - 880ms - 10% Complete
15:07:26.852 - 1160ms - 44% Complete
15:07:26.852 - 1200ms - 48% Complete
15:07:26.853 - 1400ms - 49% Complete
15:07:26.853 - 1440ms - 91% Complete
15:07:26.854 - 1480ms - 98% Complete
15:07:26.854 - 1520ms - 99% Complete
15:07:26.854 - 1760ms - 99% Complete
15:07:26.855 - 2600ms - 99% Complete
15:07:26.855 - 2640ms - 100% Complete
{"FirstVisualChange": 200, "LastVisualChange": 2640, "SpeedIndex": 1233, "VisualProgress": "0=0, 200=9, 880=10, 1160=44, 1200=48, 1400=49, 1440=91, 1480=98, 1520=99, 1760=99, 2600=99, 2640=100", "videoRecordingStart": 920}

Bad Before Change:
15:10:28.148 - Done calculating histograms
15:10:28.151 - 0ms - 0% Complete
15:10:28.154 - 200ms - 15% Complete
15:10:28.157 - 880ms - 16% Complete
15:10:28.160 - 1160ms - 47% Complete
15:10:28.164 - 1200ms - 51% Complete
15:10:28.167 - 1400ms - 53% Complete
15:10:28.169 - 1440ms - 93% Complete
15:10:28.170 - 1480ms - 99% Complete
15:10:28.172 - 1520ms - 99% Complete
15:10:28.174 - 1760ms - 99% Complete
15:10:28.175 - 2600ms - 99% Complete
15:10:28.177 - 2640ms - 100% Complete
{"FirstVisualChange": 200, "LastVisualChange": 2640, "SpeedIndex": 1165, "VisualProgress": "0=0, 200=15, 880=16, 1160=47, 1200=51, 1400=53, 1440=93, 1480=99, 1520=99, 1760=99, 2600=99, 2640=100", "videoRecordingStart": 920}

The change I made to the visual-metrics script was in the calculate_frame_progress method. I think the issue we're hitting here is either the slop usage, which matches a range of values around each bucket, or an error in how the totals/matched values are calculated. In any case, we should consider changing to this simpler method:

import math  # needed for math.floor below


def calculate_frame_progress2(histogram, start, final):
    """Progress of this frame's histogram between the start and
    final frame histograms, per channel, with no slop/fuzz applied."""
    total = 0
    matched = 0
    channels = ["r", "g", "b"]

    for channel in channels:
        for pixel_ind in range(256):
            curr = histogram[channel][pixel_ind]
            init = start[channel][pixel_ind]
            target = final[channel][pixel_ind]

            # How far this bucket has moved from the start, capped at
            # how far it needs to move to reach the final frame.
            curr_diff = abs(curr - init)
            target_diff = abs(target - init)

            matched += min(curr_diff, target_diff)
            total += target_diff

    progress = (float(matched) / float(total)) if total else 1
    return math.floor(progress * 100)

I modeled this off the speedline version from here: https://github.com/paulirish/speedline/blob/master/core/lib/speed-index.js#L38-L67
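To sanity-check the simplified method, here is a self-contained version exercised with toy histograms (the make_hist helper and the pixel counts are mine, purely for illustration, not real frame data):

```python
import math

def calculate_frame_progress2(histogram, start, final):
    # Same simplified, slop-free method as above, reproduced here so
    # the example is self-contained.
    total = 0
    matched = 0
    for channel in ("r", "g", "b"):
        for pixel_ind in range(256):
            curr = histogram[channel][pixel_ind]
            init = start[channel][pixel_ind]
            target = final[channel][pixel_ind]
            matched += min(abs(curr - init), abs(target - init))
            total += abs(target - init)
    progress = (float(matched) / float(total)) if total else 1
    return math.floor(progress * 100)

def make_hist(counts):
    # Toy helper (not in the script): identical 256-bucket histogram
    # for each channel, filled from {bucket: pixel_count} entries.
    buckets = [0] * 256
    for ind, count in counts.items():
        buckets[ind] = count
    return {ch: list(buckets) for ch in ("r", "g", "b")}

start = make_hist({0: 1000})              # all pixels black
final = make_hist({255: 1000})            # all pixels white
halfway = make_hist({0: 500, 255: 500})   # half the pixels flipped

print(calculate_frame_progress2(start, start, final))    # 0
print(calculate_frame_progress2(halfway, start, final))  # 50
print(calculate_frame_progress2(final, start, final))    # 100
```

The start frame scores 0, the final frame 100, and a frame with half the pixels flipped scores 50, which is the monotonic behaviour we'd want from a progress metric.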

Thinking about this further, the reason we might be using this fuzz/slop here is to cope with compression artifacts. However, at this point in the code we've already matched the frames and determined that they are all different from each other within an allowable fuzz of 5/10/15%, depending on where the comparison was called. That difference is unlikely to have a large impact on the result because it manifests as a variation in the histogram, and we already allowed up to a 15% difference. What we're doing now is essentially taking 10 possibly disjoint pixel groups (since we're only looking at individual channels) and adding those differences together into the matched value.

To do this fuzzing properly, we would need to determine the full RGB value of the pixel we are currently looking at and match pixels whose RGB values are within ±15%. We can't do this with individual channels because we end up mixing colours that are vastly different from each other. For instance, consider a bright red, rgb(240, 0, 0), and a light grey, rgb(245, 245, 245): looking at the red channel alone, the grey falls within the red's slop range.
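A tiny illustration of that conflation (within_channel_slop is a made-up helper; the slop of 5 mirrors the per-bucket tolerance in the histogram code):

```python
def within_channel_slop(value_a, value_b, slop=5):
    # Per-channel matching, as the histogram comparison does it.
    return abs(value_a - value_b) <= slop

bright_red = (240, 0, 0)
light_grey = (245, 245, 245)

# Looking only at the red channel, the two colours appear to match...
red_only = within_channel_slop(bright_red[0], light_grey[0])  # True

# ...but comparing all three channels shows they are far apart.
per_channel = [abs(a - b) for a, b in zip(bright_red, light_grey)]
print(red_only, per_channel)  # True [5, 245, 245]
```

Any proper fuzz would have to treat the pixel as a single point in RGB space rather than as three independent channel histograms.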

I've made a PR to fix the progress calculation issue here: https://github.com/sitespeedio/browsertime/pull/1902

Duplicate of this bug: 1809874

The following field has been copied from a duplicate bug:

Field: Regressed by
Value: bug 1809188
Source: bug 1809874

For more information, please visit auto_nag documentation.

Keywords: regression
Regressed by: 1809188

The bug from the duplicate didn't cause this issue.

No longer regressed by: 1809188

This bug has been marked as a regression. Setting status flag for Nightly to affected.

Removed the regression keyword and affected settings, as this issue is not a regression.

(In reply to Greg Mierzwinski [:sparky] from comment #2)

I've made a PR to fix the progress calculation issue here: https://github.com/sitespeedio/browsertime/pull/1902

We're going to do this fix. I can't find a reason for us to stick to the existing method versus the new one.

Depends on: 1817204
Duplicate of this bug: 1823064
Flags: needinfo?(gmierz2)

Yes, thanks for the needinfo!

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(gmierz2)
Resolution: --- → FIXED
See Also: → 1856940