Closed Bug 1849415 Opened 1 year ago Closed 9 months ago

Intermittent FOGFixture.TestCppTimingDistWorks | Expected: (data.sum) > ((uint64_t)(15 * NANOS_IN_MILLIS) - EPSILON), actual: 10734000 vs 14800000 @ /builds/worker/checkouts/gecko/toolkit/components/glean/tests/gtest/TestFog.cpp:264

Categories

(Toolkit :: Telemetry, defect, P5)

defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure)

Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=426514312&repo=mozilla-esr115
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/cpYQ1Na5QgezNLhNTM47gQ/runs/0/artifacts/public/logs/live_backing.log


[task 2023-08-18T21:07:58.880Z] 21:07:58     INFO -  TEST-START | FOGFixture.TestCppTimingDistWorks
[task 2023-08-18T21:07:58.911Z] 21:07:58  WARNING -  TEST-UNEXPECTED-FAIL | FOGFixture.TestCppTimingDistWorks | Expected: (data.sum) > ((uint64_t)(15 * NANOS_IN_MILLIS) - EPSILON), actual: 10734000 vs 14800000 @ /builds/worker/checkouts/gecko/toolkit/components/glean/tests/gtest/TestFog.cpp:264
[task 2023-08-18T21:07:58.911Z] 21:07:58  WARNING -  TEST-UNEXPECTED-FAIL | FOGFixture.TestCppTimingDistWorks | test completed (time: 20ms)
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → INCOMPLETE

As you can see from the See Also list, timing tests are tough. The most complete exploration of this particular failing is in https://bugzilla.mozilla.org/show_bug.cgi?id=1813167#c1:


The 15 * NANOS_IN_MILLIS is the assumed, reasonable sum of one timing sample that slept for two 5ms intervals plus one timing sample that slept for one 5ms interval.

HOWEVER, things are rarely (see the bugs in the See Also list for just how rare) that reasonable. Those 5ms intervals are only "roughly" 5ms long, and if each of them is shaved juuuuust enough, we can find ourselves at a sum that is 15ms - (3 * SHAVE). This is why we put in EPSILON: we hope that EPSILON > (3 * SHAVE) and simultaneously keep it as small as possible.

Alas. An EPSILON of 0.04ms is apparently not always enough. We need one of at least 0.1867ms.

(( 1.3% seems like an awful lot for a timer to be off... but remember, this is three compounding errors, so it averages to 0.41%. Still not great, but. ))

(( Also, we straight up use an EPSILON of 2ms in the JS versions of this test (most of that's due to rounding, but not all of it) ))

A decent fix for this would be to up EPSILON to 200000 or (uint64_t)(0.2 * NANOS_IN_MILLIS)


If this comes back, we may need to re-examine whether we're correctly accounting for the errors in the three measurements or whether we're missing something.

Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 1 year ago11 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 11 months ago10 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 10 months ago9 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.