Investigation: clients with duplicated sequence numbers
Categories: Data Platform and Tools :: Glean: SDK, task, P3
Tracking: Not tracked
People: Reporter janerik, Unassigned
Details
In my research on the Glean event timestamps I noticed that some clients re-sent some of their pings.
This is visible as pings with the same `(client_id, document_id, seq)` tuple but with different `submission_timestamp` values.
Notably, those `submission_timestamp` values fall on different days (re-submissions on the same day should be caught by the `copy_dedupe` task).
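A minimal sketch of the detection rule above, assuming the pings are already loaded as rows with `client_id`, `document_id`, `seq` and `submission_timestamp` fields. The `Ping` class and `find_cross_day_resubmissions` are hypothetical names for illustration; the actual analysis was a query, not this code. A key seen on more than one calendar day counts as a re-submission, while same-day repeats are ignored because `copy_dedupe` is expected to remove them.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Ping:
    # Hypothetical row shape; field names mirror the columns mentioned above.
    client_id: str
    document_id: str
    seq: int
    submission_timestamp: datetime


def find_cross_day_resubmissions(pings):
    """Return the (client_id, document_id, seq) keys received on more than one day."""
    days_by_key = defaultdict(set)
    for p in pings:
        key = (p.client_id, p.document_id, p.seq)
        # Only the calendar day matters: same-day repeats are left to copy_dedupe.
        days_by_key[key].add(p.submission_timestamp.date())
    return {key for key, days in days_by_key.items() if len(days) > 1}


rows = [
    Ping("c1", "d1", 7, datetime(2023, 11, 13, 9, 0)),
    Ping("c1", "d1", 7, datetime(2023, 11, 14, 8, 30)),  # re-sent the next day
    Ping("c2", "d2", 3, datetime(2023, 11, 13, 10, 0)),
    Ping("c2", "d2", 3, datetime(2023, 11, 13, 10, 5)),  # same day: copy_dedupe's job
]
print(find_cross_day_resubmissions(rows))  # {('c1', 'd1', 7)}
```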
The share of affected clients is low for every app (under 1%).
Some of those clients re-submit a ping almost every day (e.g. 42 pings in a 60-day window, or 30 pings in the 30 days up to today).
Given the low number of affected clients this is not very concerning, though it can affect analyses that inspect individual clients' data.
For now this bug acts merely as documentation for future-us.
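The per-client numbers (e.g. 42 re-submissions in a 60-day window) and the overall share of affected clients can be computed from a list of detected re-submissions. This is a hypothetical sketch: the input shape (one `(client_id, day)` pair per re-submitted ping) and the function name are assumptions. The span in days also hints at how long a given client keeps re-submitting.

```python
from collections import defaultdict
from datetime import date, timedelta


def summarize_window(resubmissions, active_clients, window_end, window_days):
    """Summarize re-submitted pings per client inside a trailing window.

    resubmissions: iterable of (client_id, day) pairs, one per re-submitted ping.
    active_clients: set of all client_ids seen in the window.
    Returns (share of active clients affected, {client_id: (count, span_in_days)}).
    """
    window_start = window_end - timedelta(days=window_days - 1)
    days = defaultdict(list)
    for client_id, day in resubmissions:
        if window_start <= day <= window_end:
            days[client_id].append(day)
    # span_in_days: days between a client's first and last re-submission in the window
    per_client = {cid: (len(ds), (max(ds) - min(ds)).days) for cid, ds in days.items()}
    share = len(per_client) / len(active_clients) if active_clients else 0.0
    return share, per_client


# Made-up data: c1 re-submits every day for 30 days, c2 once, c3 outside the window.
end = date(2023, 11, 30)
events = [("c1", end - timedelta(days=i)) for i in range(30)]
events += [("c2", date(2023, 11, 20)), ("c3", date(2023, 9, 1))]
active = {f"c{i}" for i in range(1, 201)}  # 200 active clients
share, per_client = summarize_window(events, active, end, 30)
print(share)             # 0.01 (2 of 200 clients)
print(per_client["c1"])  # (30, 29)
```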
Comment 1 • Reporter • 1 year ago
I spoke a bit too soon.
For Fenix release, about 1.4% of all clients send us duplicated sequence numbers in any given week.
For Desktop release, a 1% sample of the data gives ~0.4% (querying all of release takes too long).
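A sketch of the week-by-week share of clients, with optional client-level sampling similar in spirit to the 1% Desktop sample. The CRC32-based `in_sample` helper is a hypothetical stand-in for whatever sampling column the real tables provide; hashing the `client_id` keeps a client either fully in or fully out of the sample, so its duplicates are never split across the sample boundary. Weeks starting on Monday and the `(client_id, day, has_dup)` input shape are assumptions.

```python
import zlib
from collections import defaultdict
from datetime import date, timedelta


def in_sample(client_id, percent):
    # Hypothetical client-level sampling: hash client_id into 100 buckets and keep
    # the lowest `percent` of them, so a client is always fully in or fully out.
    return zlib.crc32(client_id.encode("utf-8")) % 100 < percent


def week_start(day):
    # Monday of the week containing `day`.
    return day - timedelta(days=day.weekday())


def weekly_dup_share(rows, percent=100):
    """rows: iterable of (client_id, day, has_dup), one per client-day.

    Returns {week_start: share of sampled active clients with a dup that week}.
    """
    active = defaultdict(set)
    duped = defaultdict(set)
    for client_id, day, has_dup in rows:
        if not in_sample(client_id, percent):
            continue
        week = week_start(day)
        active[week].add(client_id)
        if has_dup:
            duped[week].add(client_id)
    return {week: len(duped[week]) / len(clients) for week, clients in active.items()}


rows = [
    ("a", date(2023, 11, 13), True),
    ("b", date(2023, 11, 14), False),
    ("c", date(2023, 11, 19), False),  # Sunday, still the week of 2023-11-13
    ("d", date(2023, 11, 20), True),
]
# Week of 2023-11-13 -> 1/3 (only "a" of a, b, c); week of 2023-11-20 -> 1.0
print(weekly_dup_share(rows))
```

Running it once per ping type (baseline, metrics, events) would give a per-ping breakdown.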
Comment 2 • Reporter • 1 year ago
More precisely, this is ping-specific.
On Fenix Nightly we currently see a sharp increase in WAU (measured on events and baseline pings), which skews these numbers:
Ping | Week of | Dup % |
---|---|---|
baseline | 2023-09-11 | 2.3% |
baseline | 2023-11-13 | 1.0% |
metrics | 2023-09-11 | 0.8% |
metrics | 2023-11-13 | 0.74% |
events | 2023-09-11 | 1.4% |
events | 2023-11-13 | 2.3% |
On Fenix release:
Ping | Week of | Dup % |
---|---|---|
baseline | 2023-09-11 | 2.26% |
baseline | 2023-11-13 | 2.27% |
metrics | 2023-09-11 | 0.89% |
metrics | 2023-11-13 | 0.95% |
events | 2023-09-11 | 1.44% |
events | 2023-11-13 | 1.46% |
Comment 3 • Reporter • 1 year ago
Firefox iOS release numbers are pretty stable, so I only report numbers from November:
Ping | Dup % |
---|---|
baseline | 11.6% |
metrics | 1.7% |
events | 10.6% |
Those baseline and events numbers are shockingly high.
Could that be because we upload in the background and iOS lets that finish, but doesn't give us enough time to clean out the files?
Can we reproduce this?
(Note: take those numbers as "unverified" until I get someone else to look at my queries!)
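The background-upload hypothesis can be made concrete with a toy model: if the upload request reaches the server but the app is suspended before the pending-ping file is deleted, the same ping (same `document_id` and `seq`) is sent again on the next launch, which may well be on a different day. This is a hypothetical Python model of that ordering only; it is not the Glean iOS uploader (which is written in Swift), and `Server` and `upload_pending` are made-up names.

```python
class Server:
    """Stand-in for the ingestion endpoint; records every document_id it receives."""

    def __init__(self):
        self.received = []

    def accept(self, document_id):
        self.received.append(document_id)


def upload_pending(pending, server, suspend_after=None):
    """Upload each pending ping, then delete its local file.

    `pending` is a set of document_ids standing in for the pending-pings directory.
    If `suspend_after` is set, the app is "suspended" right after that ping's
    request reaches the server and before its file is deleted (the hypothesis).
    """
    for document_id in sorted(pending):  # sorted() copies, so discard() below is safe
        server.accept(document_id)
        if document_id == suspend_after:
            return  # suspended: the delete below never runs
        pending.discard(document_id)


pending = {"doc-a", "doc-b"}
server = Server()
upload_pending(pending, server, suspend_after="doc-a")  # day 1: suspended mid-upload
upload_pending(pending, server)                         # day 2: next launch
print(server.received)  # ['doc-a', 'doc-a', 'doc-b']
```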
Comment 4 • Reporter • 1 year ago
I re-ran the numbers today. There's a slight downward trend this year (from 10% to 7-9%), but we don't know why, or whether that trend will continue.
I think this is worth some work:
- Validate the analysis and make sure what I'm looking at is correct. Is looking at it "by week" valid? Does that hide anything?
- When, and for how long, do these dupes happen? Within days? Consistently the same ping over days/weeks from a specific client?
- Come up with a hypothesis for why this happens
  - I phrased one above: could it be that we upload in the background and iOS lets that finish, but doesn't give us enough time to clean out the files?
  - Any way to reproduce that locally?
- How do we handle this?
  - Can we collect additional information about which pings we try to upload, and when? E.g. on each upload request, store a UUID + timestamp and send that along with everything else?
  - Can we use this stored information to avoid dupes client-side? Or do we need to apply something server-side to delete dupes within a certain window?
More questions than answers. Fixing this will require some Swift experience (that's where the uploader is implemented).
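For the server-side option, one possible shape is a dedupe keyed on `document_id` that drops copies arriving within a fixed window of the last kept copy. This is a sketch under stated assumptions: in-memory state, input sorted by `submission_timestamp`, and a window long enough to span days (unlike the same-day `copy_dedupe`). The function name is hypothetical, not an existing pipeline step.

```python
from datetime import datetime, timedelta


def dedupe_within_window(pings, window):
    """Keep the first copy of each document_id; drop copies that arrive less than
    `window` after the copy that was kept.

    pings: iterable of (document_id, submission_timestamp), sorted by timestamp.
    State is an unbounded in-memory dict here; a real job would expire old entries.
    """
    last_kept = {}
    kept = []
    for document_id, ts in pings:
        previous = last_kept.get(document_id)
        if previous is not None and ts - previous < window:
            continue  # duplicate inside the window: drop it
        # First copy, or a copy arriving after the window: keep it and restart the window.
        last_kept[document_id] = ts
        kept.append((document_id, ts))
    return kept


t0 = datetime(2023, 11, 13, 9, 0)
pings = [
    ("d1", t0),
    ("d1", t0 + timedelta(days=1)),   # re-sent next day: dropped
    ("d2", t0 + timedelta(days=2)),
    ("d1", t0 + timedelta(days=10)),  # outside a 7-day window: kept
]
for document_id, ts in dedupe_within_window(pings, timedelta(days=7)):
    print(document_id, ts.date())
```

The window length is the trade-off: a longer window catches clients that re-submit over weeks but needs more state, and any copy arriving after the window still gets through.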
Updated • Reporter • 1 year ago
Comment 5 • 1 year ago
Just adding this to the investigation: I looked at this another way, counting dupes by `client_id` + sequence number over the last 90 days. About 15% of clients sent us dupes, but these amount to less than 1% of all pings sent.
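The gap between 15% and <1% comes from the different denominators: one counts clients with at least one repeated sequence number, the other counts duplicate pings among all pings. A hypothetical sketch, assuming one `(client_id, seq)` pair per received ping and counting only the extra copies (beyond the first) as dupes; the data below is made up.

```python
from collections import Counter


def dup_rates(pings):
    """pings: iterable of (client_id, seq), one entry per received ping.

    Returns (client_share, ping_share):
      client_share: fraction of clients with at least one repeated seq
      ping_share:   fraction of all received pings that are extra copies
    """
    counts = Counter(pings)
    clients = {client_id for client_id, _ in counts}
    dup_clients = {client_id for (client_id, _), n in counts.items() if n > 1}
    total = sum(counts.values())
    extra_copies = sum(n - 1 for n in counts.values())
    client_share = len(dup_clients) / len(clients) if clients else 0.0
    ping_share = extra_copies / total if total else 0.0
    return client_share, ping_share


# Made-up data: 20 clients with 100 pings each; 3 of them re-send one ping once.
pings = [(f"c{c}", s) for c in range(20) for s in range(100)]
pings += [("c0", 5), ("c1", 42), ("c2", 99)]
client_share, ping_share = dup_rates(pings)
print(client_share)          # 0.15
print(round(ping_share, 4))  # 0.0015
```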
Updated • 11 months ago
Updated • Reporter • 10 months ago