Uptick of failures in Firefox 57: history fails with "date <some_date> cannot be a future date"

RESOLVED FIXED in Firefox 57

Status

()

defect
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: markh, Assigned: lina)

Tracking

({regression})

Trunk
Firefox 58
Points:
---
Bug Flags:
in-testsuite +
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox56 unaffected, firefox57+ fixed, firefox58 fixed)

Details

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
Around 35% of failures for this engine in 57 are failing with the above error. 57/58 has the lowest general success rate for this engine (99.6% vs the 99.9 we'd expect)

Bug 1377944 seems the likely candidate, but I haven't verified that it can be explained by the patch there.

I suspect that either "this" device has a clock that's behind, or the "other" device has a clock that's forward. It's likely to be easy to fix (ie, we'd just use now()), so we should ASAP and request uplift.

https://nbviewer.jupyter.org/gist/mhammond/649a6f6c5606cff0e0d9d8a4f0b9e9a1/Success-Failure-Error%20rates%20per%20engine%2C%20by%20version.ipynb
Comment hidden (mozreview-request)
(Reporter)

Comment 2

2 years ago
mozreview-review
Comment on attachment 8915350 [details]
Bug 1405566 - Clamp future and far past synced history visit dates.

https://reviewboard.mozilla.org/r/186542/#review191628

any chance of a test? ;)
Comment hidden (mozreview-request)
(Assignee)

Comment 4

2 years ago
[Tracking Requested - why for this release]: This is a visible regression in history sync, per Mark's numbers in comment 0. Users are very likely to hit this in the wild: visits synced from another device with a fast clock, even by a few seconds, will cause the entire history sync to fail.
Assignee: nobody → kit
Status: NEW → ASSIGNED
(Reporter)

Comment 5

2 years ago
cc Chris as he asked to be informed up 57 uplift requests.

For additional context on Kit's tracking request: The patch here is small and low risk. However, only 0.03% of syncs appear to be hitting this issue, so it's not widespread. But 0.03% more users having broken history sync in 57 is still bad.

To help release management make a decision here: using the metric of "should this be in a dot release if found at release time", I'd say "we'd probably ask for uplift if an existing dot release was being made, but wouldn't ask for one purely for this issue."
(Reporter)

Comment 6

2 years ago
mozreview-review
Comment on attachment 8915350 [details]
Bug 1405566 - Clamp future and far past synced history visit dates.

https://reviewboard.mozilla.org/r/186542/#review191660

::: services/sync/modules/engines/history.js:279
(Diff revision 2)
>                         visit.type + "; ignoring.");
>          continue;
>        }
>  
>        // Dates need to be integers.
> -      visit.date = Math.round(visit.date);
> +      let visitDateAsPRTime = Math.round(visit.date);

it seems this will not do the "right thing" in this case, but I believe it's actually impossible to do the right thing in this case :/ However, this will do the "closest possible thing to the right thing", so we have that going for us, which is nice...
Attachment #8915350 - Flags: review?(markh) → review+
Comment hidden (mozreview-request)
(Assignee)

Comment 8

2 years ago
mozreview-review-reply
Comment on attachment 8915350 [details]
Bug 1405566 - Clamp future and far past synced history visit dates.

https://reviewboard.mozilla.org/r/186542/#review191660

> it seems this will not do the "right thing" in this case, but I believe it's actually impossible to do the right thing in this case :/ However, this will do the "closest possible thing to the right thing", so we have that going for us, which is nice...

Following up on our IRC conversation, I think you're completely correct. The intent is to avoid inserting duplicate visits for the same date, so we should use the clamped date for lookup in `curVisits`, even though it's impossible or extremely unlikely to match. However, we should *also* add clamped visits to `curVisits`, because an incoming record with multiple out-of-bounds visit dates and the same type will clamp to the same date. I added that, and extended the test to cover this.
(Assignee)

Comment 9

2 years ago
mozreview-review
Comment on attachment 8915350 [details]
Bug 1405566 - Clamp future and far past synced history visit dates.

https://reviewboard.mozilla.org/r/186542/#review191694

::: services/sync/modules/engines/history.js:293
(Diff revision 3)
>          continue;
>        }
>  
> -      visit.date = PlacesUtils.toDate(visit.date);
> +      // Note the visit key, so that we don't add duplicate visits with
> +      // clamped timestamps.
> +      curVisits.push(visitKey);

`curVisits` would be more efficient as a set, but I'd like to keep this patch small, and maybe change this in a trivial follow-up.
(Assignee)

Comment 10

2 years ago
Comment on attachment 8915350 [details]
Bug 1405566 - Clamp future and far past synced history visit dates.

It also occurred to me that we should clamp dates older than `EARLIEST_BOOKMARK_TIMESTAMP` to `EARLIEST_BOOKMARK_TIMESTAMP`, not the current date. What I had before was not a clamp. :-) Mark, would you mind taking another look, please?
Attachment #8915350 - Flags: review+ → review?(markh)
(Reporter)

Comment 11

2 years ago
mozreview-review
Comment on attachment 8915350 [details]
Bug 1405566 - Clamp future and far past synced history visit dates.

https://reviewboard.mozilla.org/r/186542/#review191700

Great, thanks!
Attachment #8915350 - Flags: review?(markh) → review+

Comment 12

2 years ago
Pushed by kcambridge@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/85b73d4c4c9c
Clamp future and far past synced history visit dates. r=markh
https://hg.mozilla.org/mozilla-central/rev/85b73d4c4c9c
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 58
57+ based on Comment 4. This is the type of bug that could bite us post release.
Please request Beta approval on this when you get a chance.
Flags: needinfo?(kit)
(Assignee)

Comment 16

2 years ago
Comment on attachment 8915350 [details]
Bug 1405566 - Clamp future and far past synced history visit dates.

I hoped to have Sync ping data before requesting approval, but filing the request now so that we don't miss the train.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1377944.
[User impact if declined]: Users won't see their synced history from other devices if they're syncing history, and the clocks on their other devices are faster.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes. We'll keep an eye on the Sync telemetry ping data this week.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: The patch adds an extra function call to clamp history visit timestamps, and fixes how we detect duplicate visits. We're not sure it fixes the issue completely yet, but it's an improvement over an outright failure.
[String changes made/needed]: None.
Flags: needinfo?(kit)
Attachment #8915350 - Flags: approval-mozilla-beta?
Comment on attachment 8915350 [details]
Bug 1405566 - Clamp future and far past synced history visit dates.

Severe recent regression, beta57+
Attachment #8915350 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #16)
> [Is this code covered by automated tests?]: Yes.
> [Has the fix been verified in Nightly?]: Yes. We'll keep an eye on the Sync
> telemetry ping data this week.
> [Needs manual test from QE? If yes, steps to reproduce]: No.

Setting qe-verify- based on Kit's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
(Reporter)

Comment 20

2 years ago
Telemetry implies this patch didn't fix the issue - this error remains the top failure for 57 and 58
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Out of idle curiosity, I ran the following query in redash last night: https://sql.telemetry.mozilla.org/queries/48779/source.

This lists the history errors for buildsIds after this patch landed (in 58). It doesn't seem to have a single instance of the reported error, and so, well, I'm not convinced that it's actually not fixed (In fact, I'm fairly convinced of the opposite), I think we just need to (annoyingly) be a bit more careful about how we determine these things.
(Reporter)

Comment 22

2 years ago
Thanks Thom - I can also confirm with adhoc analysis that later build IDs don't report this error - I don't quite understand why it is still being reported in the larger analysis when most others go away quickly after the fix landing, but I agree this is actually fixed - thanks!
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.