Closed Bug 1614716 Opened 6 years ago Closed 3 years ago

Uptake reported age seems to be wrong in some situations

Categories

(Firefox :: Remote Settings Client, defect, P1)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: leplatrem, Unassigned)

References

Details

(Whiteboard: telescope poucave delivery-checks prod remotesettings-uptake-release/max-age)

Attachments

(1 file)

We report the age of the data obtained when changes are pulled from the server.

For scheduled or startup synchronizations, the reported age increases as time passes (until a new change is published). In the uptake telemetry, this shows up as the «sawtooth» shape of the graph (switchback?).

For broadcast synchronizations, however, the reported age should be roughly stable and should not follow the pattern described above.
Yet it does, which suggests that there may be an issue in the code that reports the age.
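A minimal sketch of the expected behavior, assuming that the reported age is the time elapsed between the publication of the latest change and the moment the client pulls the data. The polling interval, the publication times, and the 2-minute broadcast delay are made-up illustrative values, not figures from this bug.

```python
def reported_age(sync_time: float, publication_times: list[float]) -> float:
    """Age (in minutes) of the newest change already published at `sync_time`.

    Assumption: age = sync time - publication time of the latest change.
    Raises ValueError if nothing was published before `sync_time`.
    """
    latest = max(t for t in publication_times if t <= sync_time)
    return sync_time - latest


publications = [0.0, 60.0]  # hypothetical: two changes, one hour apart

# Scheduled syncs (hypothetical 10-minute interval): the age grows until the
# next publication, then drops back to zero, hence the sawtooth.
scheduled = [reported_age(t, publications) for t in range(0, 120, 10)]

# Broadcast syncs happen shortly after each publication (assumed 2 minutes),
# so the reported age should stay roughly constant.
broadcast = [reported_age(p + 2.0, publications) for p in publications]

print(scheduled)  # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
print(broadcast)  # [2.0, 2.0]
```

Under this model, a sawtooth in the broadcast series would mean that some "broadcast" reports are not actually triggered right after a publication.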

One possibility is that client reconnections are handled as "broadcast" synchronizations, which would skew the values reported for genuine realtime broadcasts.

Type: enhancement → defect
Priority: -- → P1

I spent some time today trying to figure out whether this could be due to client reconnections. I haven't run any actual tests in a live browser, but from looking at the code, it doesn't seem like reconnections should be a factor. The code is contained in https://dxr.mozilla.org/mozilla-central/source/dom/push/PushServiceWebSocket.jsm. We trigger reconnects at https://dxr.mozilla.org/mozilla-central/source/dom/push/PushServiceWebSocket.jsm#247-251. Reconnecting means tearing down the socket and "starting over" (https://dxr.mozilla.org/mozilla-central/source/dom/push/PushServiceWebSocket.jsm#346-350, https://dxr.mozilla.org/mozilla-central/source/dom/push/PushServiceWebSocket.jsm#183). We don't keep any previous state around when we handle handshake replies (https://dxr.mozilla.org/mozilla-central/source/dom/push/PushServiceWebSocket.jsm#612-617), so if reconnections are what we're getting, we should still be seeing "hello" as the context.
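The argument above can be sketched as a tiny state model. This is a hypothetical simplification written for illustration, not the actual PushServiceWebSocket.jsm logic; the class and method names are invented. It only encodes the two properties described in the comment: a reconnection starts over from scratch, and a handshake reply is handled without state from a previous socket, so it is always reported with the "hello" context.

```python
from dataclasses import dataclass, field


@dataclass
class PushConnection:
    """Hypothetical, simplified model of a push WebSocket client session."""

    reported_contexts: list[str] = field(default_factory=list)
    connected: bool = False

    def connect(self) -> None:
        self.connected = True
        # The handshake reply is handled statelessly: its context is always "hello".
        self.reported_contexts.append("hello")

    def on_broadcast(self) -> None:
        # Only a message received on an open socket is reported as "broadcast".
        if self.connected:
            self.reported_contexts.append("broadcast")

    def reconnect(self) -> None:
        # Tear down the socket and start over; the new handshake is again a "hello".
        self.connected = False
        self.connect()


conn = PushConnection()
conn.connect()
conn.on_broadcast()
conn.reconnect()
conn.reconnect()
conn.on_broadcast()
print(conn.reported_contexts)  # ['hello', 'broadcast', 'hello', 'hello', 'broadcast']
```

In this model, however many reconnections occur, none of them adds a "broadcast" entry, which is why reconnections alone would not explain the sawtooth in the broadcast values.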

Thanks for digging into this!

This is the query that I used to look at the data: https://sql.telemetry.mozilla.org/queries/67038/source#169792

Although there are still some spikes, it's true that once we get rid of noise (by ignoring periods where too few events were received), the sawtooth pattern is much less pronounced.

So it could well come from our Telemetry query rather than from the client code.

It can vary a lot by channel, which could mean it comes from Megaphone.

Whiteboard: poucave delivery-checks prod remotesettings-uptake-release/max-age
Whiteboard: poucave delivery-checks prod remotesettings-uptake-release/max-age → telescope poucave delivery-checks prod remotesettings-uptake-release/max-age

We improved the Telemetry queries to remove noise and to ignore periods of time when few clients were reporting values. This seems to have fixed the problem.

See https://github.com/mozilla-services/telescope/blob/ab679a6ca9e2d7addf6f9f8b173b21c5f896ec72/checks/remotesettings/uptake_max_age.py#L46-L48
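The noise-removal idea can be illustrated with a minimal sketch, assuming the telemetry is grouped into time periods, each with a count of reporting clients and an aggregated age value. The `Bucket` shape, the `min_reports` threshold, and the sample numbers are hypothetical; this is not the actual Telescope check linked above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bucket:
    """One time period of uptake telemetry (hypothetical shape)."""

    period: str
    reports: int  # number of clients that reported during this period
    age: float    # aggregated reported age for the period, in minutes


def filtered_max_age(buckets: list[Bucket], min_reports: int) -> Optional[float]:
    """Maximum age, ignoring periods where too few clients reported.

    Returns None when no period has enough reports to be trusted.
    """
    kept = [b.age for b in buckets if b.reports >= min_reports]
    return max(kept) if kept else None


# Made-up sample: the middle period has very few reports and an outlier age.
buckets = [
    Bucket("00:00", 5000, 12.0),
    Bucket("00:10", 40, 95.0),
    Bucket("00:20", 4800, 14.5),
]

print(filtered_max_age(buckets, min_reports=0))     # 95.0 (noisy period included)
print(filtered_max_age(buckets, min_reports=1000))  # 14.5 (noisy period ignored)
```

The design choice is to drop low-volume periods entirely rather than weight them, since a handful of clients can report arbitrary ages and dominate a maximum.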

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED