Closed Bug 1502378 Opened 2 years ago Closed 2 years ago

pingsender may be fooled into losing some main/shutdown pings on non-windows clients

Categories

(Toolkit :: Telemetry, defect, P1)

Points:
2

Tracking

()

RESOLVED FIXED
Tracking Status
firefox65 --- affected

People

(Reporter: chutten, Assigned: chutten)

References

(Blocks 1 open bug)

Details

Looking at the proportion of "main" pings with reason: "shutdown" sent from Firefox 55+, our Linux population shows a much noisier proportion[1] than the broader population[2], and one that is lower by about 10-15%.

We only looked into this when Jan-Erik noticed that the curl implementation of PingSender::Post seems to treat all response codes < 400 as success[3]. The Windows implementation takes the inverse approach and treats only a 200 code as success.

This matters because a success results in the deletion of the pending ping. If pingsender reports a success in a case where ingestion never received the ping, that ping is lost.

The Telemetry HTTP Edge Server spec will only hand out 200 for success and a code >= 400 on failure[4]... but that presumes pingsender was able to reach the edge server at all.

I don't know if it is possible to positively confirm with available data whether the curl implementation's broader definition of success is problematic. Thus, I propose we change the curl implementation to match the Windows implementation (only 200 is success) and then measure builds with and without the fix for changes in volume.
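
To make the difference between the two definitions of success concrete, here is a minimal sketch. It is not the pingsender code: the function names are hypothetical, and it only models the two policies as described above (curl: any code < 400; Windows: only 200). The response codes in the checks are illustrative examples, not codes observed in the data.

```cpp
// Hypothetical helpers for illustration only; these names do not
// appear in pingsender.

// Lenient policy, as described for the curl implementation:
// any HTTP response code below 400 counts as success.
constexpr bool IsSuccessLenient(long aResponseCode) {
  return aResponseCode < 400;
}

// Strict policy, as described for the Windows implementation:
// only a 200 response counts as success.
constexpr bool IsSuccessStrict(long aResponseCode) {
  return aResponseCode == 200;
}

// True when the lenient policy would delete the pending ping but the
// strict policy would keep it for a later retry.
constexpr bool WouldDeleteOnlyWhenLenient(long aResponseCode) {
  return IsSuccessLenient(aResponseCode) && !IsSuccessStrict(aResponseCode);
}

// Both policies agree on the codes the edge server spec hands out.
static_assert(!WouldDeleteOnlyWhenLenient(200), "200 is success under both");
static_assert(!WouldDeleteOnlyWhenLenient(400), "400 is failure under both");
static_assert(!WouldDeleteOnlyWhenLenient(500), "500 is failure under both");

// They disagree on anything else below 400, e.g. a redirect returned by
// something other than the edge server (illustrative example).
static_assert(WouldDeleteOnlyWhenLenient(302), "302 differs between policies");
static_assert(WouldDeleteOnlyWhenLenient(204), "204 differs between policies");
```

If the edge server is always reached, the two policies behave the same. They only diverge when something else answers with a code below 400 that is not 200, which is the case the proposed change is meant to guard against.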

[1]:
[2]:
[3]: https://searchfox.org/mozilla-central/rev/72b1e834f384a2ffec6eb4ce405fbd4b5e881109/toolkit/components/telemetry/pingsender/pingsender_unix_common.cpp#187
[4]: https://docs.telemetry.mozilla.org/concepts/pipeline/http_edge_spec.html#postput-response-codes
Assignee: nobody → chutten
Status: NEW → ASSIGNED
Points: --- → 2
Priority: -- → P1
See Also: → 1502382
Some quick analysis shows that the Linux population is missing subsessions at a lower rate than the broader population, which is inconsistent with the theory that the curl implementation is inherently more problematic. (It might still be more problematically written, but there may be aspects of the environment or population that lessen the effect.)

Improvements to how the pingsender implementation deals with the HTTP edge server can come via the broader effort in bug 1290256.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
See Also: → 1290256