Intermittent services/common/tests/unit/test_blocklist_certificates.js | xpcshell return code: 0

RESOLVED FIXED

Status

Cloud Services
Firefox: Common
P3
normal
RESOLVED FIXED
11 months ago
9 months ago

People

(Reporter: Treeherder Bug Filer, Assigned: glasserc)

Tracking

({intermittent-failure})

unspecified
intermittent-failure
Points:
---

Firefox Tracking Flags

(firefox55 fixed)

Details

Attachments

(1 attachment)

(Reporter)

Description

11 months ago
treeherder
Filed by: philringnalda [at] gmail.com

https://treeherder.mozilla.org/logviewer.html#?job_id=66523325&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/ELOfdNEeRgOFudv4TGv_PQ/runs/0/artifacts/public/logs/live_backing.log

Updated

11 months ago
Priority: -- → P3

Comment 1

10 months ago
6 failures in 836 pushes (0.007 failures/push) were associated with this bug in the last 7 days.  
Repository breakdown:
* autoland: 3
* mozilla-aurora: 2
* mozilla-central: 1

Platform breakdown:
* android-4-3-armv7-api15: 6

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1329069&startday=2017-02-06&endday=2017-02-12&tree=all
Component: Sync → Firefox: Common
Product: Firefox → Cloud Services
And this is why you don't want to just let "xpcshell return code: 0" intermittent bugs sit. This bug is about an "Unexpected exception Error: Request timeout. at resource://services-common/kinto-http-client.js:1866" failure, which produces no other usable failure message, but now we've left autoland broken for 12.5 hours while starring a permaorange caused by the patch for bug 1224528 as though it were this intermittent.

Comment 3

10 months ago
17 failures in 833 pushes (0.02 failures/push) were associated with this bug in the last 7 days.  
Repository breakdown:
* autoland: 14
* mozilla-inbound: 3

Platform breakdown:
* android-4-3-armv7-api15: 12
* android-4-2-x86: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1329069&startday=2017-02-13&endday=2017-02-19&tree=all
(Assignee)

Comment 4

10 months ago
The "Request timeout." comes from the kinto-http.js library, which enforces a timeout on all requests (defaulting to 5 seconds). This is probably too short for real users on slow connections, but long enough that it *usually* works in tests. However, I believe bug 1335519 and bug 1333677 are manifestations of the same underlying timeout being exceeded on the occasional try build. Guided by https://developer.mozilla.org/en-US/docs/Mozilla/QA/Avoiding_intermittent_oranges#Using_magical_timeouts_to_cause_delays, I've opened https://github.com/Kinto/kinto-http.js/issues/160 to remove the underlying timeout from the kinto-http.js library, or at least stop defaulting to 5 seconds.
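
For readers unfamiliar with this failure mode, here is a minimal sketch of how a client-side request deadline of this kind typically works. It is not the kinto-http.js source: `withTimeout` is a hypothetical helper, and the only detail taken from this bug is the "Request timeout." message. It assumes an environment that provides `setTimeout` and `Promise.prototype.finally`.

```javascript
// Hypothetical helper, NOT the actual kinto-http.js implementation.
// Races a request promise against a timer; a falsy timeout disables the deadline.
function withTimeout(requestPromise, timeoutMs) {
  if (!timeoutMs) {
    // No client-side deadline: the caller just waits for the request itself.
    return requestPromise;
  }
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("Request timeout.")), timeoutMs);
  });
  // Whichever promise settles first wins; clear the timer either way.
  return Promise.race([requestPromise, deadline]).finally(() => clearTimeout(timer));
}
```

With a fixed 5 second deadline, a request that would have succeeded after 6 seconds on a slow test machine is reported as a failure, which matches the intermittent pattern described here. Passing a falsy timeout in this sketch corresponds to the "remove the underlying timeout" option.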
Comment hidden (mozreview-request)
(Assignee)

Comment 6

10 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=abf171ce998e
Comment on attachment 8843091 [details]
Bug 1329069: Upgrade kinto-http-client.js to v4.0.0,

https://reviewboard.mozilla.org/r/116840/#review118524
Attachment #8843091 - Flags: review?(MattN+bmo) → review+
(Assignee)

Updated

9 months ago
Keywords: checkin-needed
(Assignee)

Comment 8

9 months ago
I was waiting to ensure that this was OK based on my understanding of the necko code in https://groups.google.com/forum/#!searchin/mozilla.dev.tech.network/timeout%7Csort:date/mozilla.dev.tech.network/RJao1Fs9ZNU/Od8hzAYQAgAJ. I just spoke with mcmanus and it seems like this change is OK and won't leave lingering sockets anywhere or anything like that.

Comment 9

9 months ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/6302bbf6cc67
Upgrade kinto-http-client.js to v4.0.0, r=MattN
Keywords: checkin-needed

Comment 10

9 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6302bbf6cc67
Status: NEW → RESOLVED
Last Resolved: 9 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
(Assignee)

Updated

9 months ago
Duplicate of this bug: 1333677
(Assignee)

Updated

9 months ago
Duplicate of this bug: 1335519
Assignee: nobody → eglassercamp
These failures affect 53/54 as well, but I'm guessing this isn't a great candidate for backport either. Any suggestions for what to do on the branches? :)
Flags: needinfo?(eglassercamp)
(Assignee)

Comment 14

9 months ago
I guess going from 2.7.0 to 4.0.0 doesn't seem conducive to stability :) Then again, looking at it further, the two major version bumps were because of: removing this default timeout, and removing a polyfill for isomorphic-fetch, which I think we don't use here anyhow. So it might be safe to just uplift.

The other thing we could do is extend the timeout from 5 seconds to 300 seconds (which is the natural timeout of an xpcshell test anyhow) in every place where a KintoClient is created. This might be a good change to make regardless, to improve the reliability of blocklist updates and things like that, but it seems like overkill just to stop intermittent failures in 53/54. My understanding based on the OrangeFactor robot posts was that this tends to happen less than once in a normal week. Is that correct?

I could probably whip up a patch against 53/54 if you want, although I don't really know the process.
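
To make the second option concrete, here is a rough sketch of what raising the timeout at each construction site could look like. It assumes the client constructor accepts a `timeout` option in milliseconds. `FakeKintoClient`, `withLongTimeout`, the example URL, and the bucket name are all hypothetical stand-ins so the snippet runs on its own; they are not taken from the tree.

```javascript
// 300 seconds, the xpcshell test timeout mentioned above.
const XPCSHELL_TEST_TIMEOUT_MS = 300 * 1000;

// Hypothetical helper: supply a long timeout unless the call site sets its own.
function withLongTimeout(options = {}) {
  return Object.assign({ timeout: XPCSHELL_TEST_TIMEOUT_MS }, options);
}

// Stand-in for the real client (assumption: a `timeout` option in ms, default 5000).
class FakeKintoClient {
  constructor(remote, options = {}) {
    this.remote = remote;
    this.timeout = options.timeout !== undefined ? options.timeout : 5000;
  }
}

// Each call site would wrap its existing options instead of passing them directly.
const client = new FakeKintoClient(
  "https://example.invalid/v1",
  withLongTimeout({ bucket: "blocklists" })
);
```

Putting the default in one helper would keep a branch-only patch small, since each call site changes by a single wrapper call.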
Flags: needinfo?(eglassercamp)