Closed Bug 1206424 Opened 9 years ago Closed 8 years ago

Intermittent test_data.html | Test timed out

Categories

(Core :: DOM: Notifications, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 --- fixed
firefox49 --- fixed
firefox-esr45 --- unaffected

People

(Reporter: philor, Assigned: lina)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 2 obsolete files)

      No description provided.
Blocks: 1206969
Asking push mailing list for help on this, bug 1210307, and bug 1164432.  They all seem to have spiked at the same time over the weekend.
Seeing this pretty frequently on OSX e10s as well.
Fixed by bug 1244816.
Status: NEW → RESOLVED
Closed: 8 years ago
Depends on: 1244816
Resolution: --- → FIXED
Looks like this still failure is still hanging around with low frequency on Windows XP :(
No longer blocks: e10s-tests-osx
Status: RESOLVED → REOPENED
Flags: needinfo?(wchen)
Resolution: FIXED → ---
Kit?
Flags: needinfo?(kcambridge)
I think I see what's going on, at least for the pushes since error logging was enabled...

https://treeherder.mozilla.org/logviewer.html#?repo=fx-team&job_id=9064440#L2919-L2986
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=26773782#L2919-L2989
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=26943337#L3286-L3351

Looks like a `NaN` makes its way in when we recalculate the quota (maybe when we read `prefs.get("maxQuotaPerSubscription")`. I think we would hit https://dxr.mozilla.org/mozilla-central/rev/369a5ee3a2880a4a98df3a00bf3db8d8f36b181b/dom/push/PushService.jsm#824-829 if we got an invalid value from Places). That causes us to trash the message and eventually time out.
Assignee: nobody → kcambridge
Flags: needinfo?(wchen)
Flags: needinfo?(kcambridge)
Attached patch pushQuotaNumber.patch (obsolete) — Splinter Review
Attachment #8749481 - Flags: review?(wchen)
Comment on attachment 8749481 [details] [diff] [review]
pushQuotaNumber.patch

Review of attachment 8749481 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/push/PushRecord.jsm
@@ +32,5 @@
> +// Returns the maximum number of background messages that can be sent to a
> +// subscription. Subscriptions that exceed this quota are disabled until the
> +// user visits the page again.
> +function getMaxQuota() {
> +  return prefs.get("maxQuotaPerSubscription", 0, Ci.nsISupportsPRUint32);

Even simpler: we could make this a constant instead of a pref. WDYT?
Attached patch pushQuotaNumber.patch (obsolete) — Splinter Review
I decided to make it a constant, anyway. :-)
Attachment #8749481 - Attachment is obsolete: true
Attachment #8749481 - Flags: review?(wchen)
Attachment #8749482 - Flags: review?(wchen)
Comment on attachment 8749482 [details] [diff] [review]
pushQuotaNumber.patch

Review of attachment 8749482 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/push/PushRecord.jsm
@@ +87,2 @@
>        this.quota = Math.min(
>          Math.round(8 * Math.pow(daysElapsed, -0.8)),

Good catch on the NaN quota. I'm not so sure that it's due to getting a bad value from prefs since the service should be around until shutdown and this code shouldn't be running at shutdown. I think it's valuable to have the quota configurable from preferences so I'd like to keep it if it's not the source of the bug.

I suspect it's coming from Math.pow and a lastVisit value from the future based on the fact that Date.now and the time in the database get the time from the same source on Linux while Windows does it differently in each case [1][2] and this only reproduces on Windows.

[1] https://dxr.mozilla.org/mozilla-central/source/js/src/vm/Time.cpp#154
[2] https://dxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/md/windows/ntmisc.c#178

What do you think about just trying to fix the daysElapsed to always be non-negative first to see if it fixes the bug and continue investigation or remove the pref if that doesn't work?
Attachment #8749482 - Flags: review?(wchen)
Attached patch nanDays.patchSplinter Review
(In reply to William Chen [:wchen] from comment #66)
> What do you think about just trying to fix the daysElapsed to always be
> non-negative first to see if it fixes the bug and continue investigation or
> remove the pref if that doesn't work?

Good call! I kept the `setQuota` change, because I think it's a bit easier to follow when unrolled, but reverted the others. Also added a test.
Attachment #8749482 - Attachment is obsolete: true
Attachment #8750734 - Flags: review?(wchen)
Attachment #8750734 - Flags: review?(wchen) → review+
https://hg.mozilla.org/mozilla-central/rev/bccabd9ff624
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
We probably want to uplift this to Aurora/Beta as well when we're reasonably happy with it.
Flags: needinfo?(kcambridge)
Sure. I'll request uplift next week, assuming we don't see more oranges on m-c or the integration branches. Keeping ni? until then.
Looking strong :)
Blocks: 1274326
Comment on attachment 8750734 [details] [diff] [review]
nanDays.patch

This isn't quite a test-only change, so filling out the form.

Approval Request Comment
[Feature/regressing bug #]: This bug.
[User impact if declined]: Intermittent Push mochitest failures.
[Describe test coverage new/current, TreeHerder]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=79ff837704fe
[Risks and why]: Very low risk. No change to existing behavior, and we haven't seen this failure on m-c since the patch landed.
[String/UUID change made/needed]: None.
Flags: needinfo?(kcambridge)
Attachment #8750734 - Flags: approval-mozilla-beta?
Attachment #8750734 - Flags: approval-mozilla-aurora?
Comment on attachment 8750734 [details] [diff] [review]
nanDays.patch

Let's uplift this to Aurora48. For beta47, I would prefer to not uplift. We can let this one ride the trains from Aurora48 to Beta48. 

At this point, I would like to only accept uplifts for critical recent regressions, severe stability, security, perf, mlk issues only. This is to minimize code churn and to ensure we ship a high-quality Fx47 in 2 weeks.
Attachment #8750734 - Flags: approval-mozilla-beta?
Attachment #8750734 - Flags: approval-mozilla-beta-
Attachment #8750734 - Flags: approval-mozilla-aurora?
Attachment #8750734 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.