1261494 - Intermittent testUnifiedTelemetryClientId | Client id file does not exist yet - or Retrieving clientId from JS has timed out -

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Reporter

Description

•

9 years ago

https://treeherder.mozilla.org/logviewer.html#?job_id=24993673&repo=mozilla-inbound https://treeherder.mozilla.org/logviewer.html#?job_id=24991642&repo=mozilla-inbound

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Reporter

Comment 1

•

9 years ago

https://treeherder.mozilla.org/logviewer.html#?job_id=8345475&repo=fx-team

Summary: Intermittent testUnifiedTelemetryClientId | Client id file does not exist yet - → Intermittent testUnifiedTelemetryClientId | Client id file does not exist yet - or Retrieving clientId from JS has timed out -

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 2

•

9 years ago

Shucks. I had difficulty with "Retrieving clientId" when I created the test (I increased the timeout from 2000ms to 20000ms) – maybe I should increase it more? Emulators are rough! :( Without looking at the code, I don't know what could be causing Client id file does not exist yet because, afaik, we remove the client ID file before each test, but it's possible this file is just left over. Then again, maybe this file is now being created by the startup code in the application (e.g. bug 1255657 moved this earlier in the startup process), before the test can run.

Depends on: 1249491

Comment hidden (Intermittent Failures Robot)

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

Updated

•

9 years ago

Blocks: core-ping

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

9 years ago

Assignee: nobody → michael.l.comella

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 13

•

9 years ago

Like intermittent bug 1268998 comment 6, I wonder if this could also be related to running the uploader in automation – soft dep.

Depends on: 1270191

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 14

•

9 years ago

re comment 13, backed up by the logs: 16:54:34 INFO - 05-25 16:51:17.342 D/GeckoTelemetryUploadSer( 2860): Service started 16:54:34 INFO - 05-25 16:51:18.220 I/Robocop ( 2860): {"message":"","time":1464220278202,"source":"robocop","status":"PASS","test":"testUnifiedTelemetryClientId","thread":null,"subtest":"The toolbar is not in the editing state","action":"test_status","pid":null} ... 16:54:34 INFO - 05-25 16:51:24.731 D/GeckoTelemetryUploadSer( 2860): Telemetry upload success! 16:54:34 INFO - 05-25 16:51:24.771 D/GeckoTelemetryUploadSer( 2860): Service finished: upload and prune attempts completed 16:54:34 INFO - 05-25 16:51:25.701 I/Robocop ( 2860): {"message":"","time":1464220285681,"source":"robocop","status":"PASS","test":"testUnifiedTelemetryClientId","thread":null,"subtest":"Waiting for UrlEditText to be input method target.","action":"test_status","pid":null}

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 16

•

8 years ago

I expect this to go away with the landing of bug 1270191 – please reopen if not so. The failure rate is low enough that I don't think we need to uplift to aurora.

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → DUPLICATE

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 17

•

8 years ago

Still occurring: https://treeherder.mozilla.org/logviewer.html#?job_id=29344044&repo=mozilla-inbound

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 18

•

8 years ago

The logs indicate the telemetry upload service is now disabled in automation, however, I realized that I don't know the Gecko side of things. Georg, do you know if Gecko telemetry could be creating the client ID file during tests in automation? If you don't know, is there any logging output associated with it I could search the logs (e.g. [1]) for? [1]: https://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/738bd6b78f096f1d78ff07cd482fc21d586d13a7870696eba59eef240ff3f89e8644ec461c7835a7d2896d9082b2a0aecbea64703d36ca25ebe1701b3075505f

Flags: needinfo?(gfritzsche)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 19

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c5ed93eccba8

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 20

•

8 years ago

Attached file Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing. — Details

Review commit: https://reviewboard.mozilla.org/r/57700/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/57700/

Attachment #8759864 - Flags: review?(gkruglov)

Attachment #8759865 - Flags: review?(gkruglov)

Attachment #8759866 - Flags: review?(gkruglov)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 21

•

8 years ago

Attached file Bug 1261494 - Add assertions & logging to resetTest for sanity checking. (obsolete) — Details

Review commit: https://reviewboard.mozilla.org/r/57702/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/57702/

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 22

•

8 years ago

Attached file Bug 1261494 - Add logging before each test to help debug intermittent. (obsolete) — Details

Review commit: https://reviewboard.mozilla.org/r/57704/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/57704/

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

8 years ago

Keywords: leave-open

Comment hidden (Intermittent Failures Robot)

Georg Fritzsche [:gfritzsche]

Comment 24

•

8 years ago

(In reply to Michael Comella (:mcomella) from comment #18) > The logs indicate the telemetry upload service is now disabled in > automation, however, I realized that I don't know the Gecko side of things. > > Georg, do you know if Gecko telemetry could be creating the client ID file > during tests in automation? Yes, if it doesn't exist yet we would create the client id on the first request: https://dxr.mozilla.org/mozilla-central/rev/3e8ee3599a67edd971770af4982ad4b0fe77f073/toolkit/modules/ClientID.jsm#128

Flags: needinfo?(gfritzsche)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 25

•

8 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #24) > > Georg, do you know if Gecko telemetry could be creating the client ID file > > during tests in automation? > > Yes, if it doesn't exist yet we would create the client id on the first > request: > https://dxr.mozilla.org/mozilla-central/rev/ > 3e8ee3599a67edd971770af4982ad4b0fe77f073/toolkit/modules/ClientID.jsm#128 Since we're testing this functionality directly, is it possible to disable the other background processes during the test that might attempt to create a client ID? Looking at my code, since I don't know the calling code, my (likely poor) solution would be to: 1) Disable ClientID.getClientID 2) Add a second method with a boolean override (e.g. ClientID.getClientID(underTesting)) which we can call from testUnifiedTelemetryClientId [1] [1]: https://dxr.mozilla.org/mozilla-central/rev/3e8ee3599a67edd971770af4982ad4b0fe77f073/mobile/android/tests/browser/robocop/testUnifiedTelemetryClientId.js#18

Flags: needinfo?(gfritzsche)

:Grisha Kruglov

Comment 26

•

8 years ago

Comment on attachment 8759864 [details] Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing. https://reviewboard.mozilla.org/r/57700/#review54796

Attachment #8759864 - Flags: review?(gkruglov) → review+

:Grisha Kruglov

Comment 27

•

8 years ago

Comment on attachment 8759865 [details] Bug 1261494 - Add assertions & logging to resetTest for sanity checking. https://reviewboard.mozilla.org/r/57702/#review54862 ::: mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testUnifiedTelemetryClientId.java:57 (Diff revision 1) > + > if (resetJSCache) { > resetJSCache(); > } > - getClientIdFile().delete(); > - getFHRClientIdFile().delete(); > + for (final File file : filesToDeleteOnReset) { > + file.delete(); Might also be useful to check return value of the `delete` call, but perhaps it doesn't matter since you're checking `exists` value right after.

Attachment #8759865 - Flags: review?(gkruglov) → review+

:Grisha Kruglov

Comment 28

•

8 years ago

Comment on attachment 8759866 [details] Bug 1261494 - Add logging before each test to help debug intermittent. https://reviewboard.mozilla.org/r/57704/#review54864

Attachment #8759866 - Flags: review?(gkruglov) → review+

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 29

•

8 years ago

Georg suggested trying to prevent Gecko from creating new client ID files through method ordering: Gecko caches the client ID so if we preload the client ID into the cache, it won't try to create a new file. As such, we can keep the cache full until we decide it's time for Gecko to read/create a new client ID file. There is still an opportunity for Gecko to attempt to access the client ID file before we can call this method, however, it greatly reduces the it.

Flags: needinfo?(gfritzsche)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 30

•

8 years ago

I'm going to land the patches I wrote for this bug (so far) because they add helpful logging but I may end up undoing some of them as I implement comment 29.

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 31

•

8 years ago

Comment on attachment 8759864 [details] Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/57700/diff/1-2/

Attachment #8759864 - Attachment description: Bug 1261494 - Remove outdated comment regarding disabling upload service. → Bug 1261494 - Block Gecko from screwing with the client ID file by resetting the cache at the last minute.

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

8 years ago

Attachment #8759865 - Attachment is obsolete: true

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

8 years ago

Attachment #8759866 - Attachment is obsolete: true

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 32

•

8 years ago

https://hg.mozilla.org/integration/fx-team/rev/576a3ad4b084e3772951cab80695519fbd92f8c5 Bug 1261494 - Remove outdated comment regarding disabling upload service. r=grisha https://hg.mozilla.org/integration/fx-team/rev/67b5ee17fc49e542c717c6b0c7710bd7307a7829 Bug 1261494 - Add assertions & logging to resetTest for sanity checking. r=grisha https://hg.mozilla.org/integration/fx-team/rev/389014745e195369fb0930d0edc542a137649e24 Bug 1261494 - Add logging before each test to help debug intermittent. r=grisha

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 33

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=707f6bf7c6c8

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

8 years ago

Attachment #8759864 - Flags: review+ → review?(gkruglov)

Carsten Book [:Tomcat]

Comment 34

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/576a3ad4b084 https://hg.mozilla.org/mozilla-central/rev/67b5ee17fc49 https://hg.mozilla.org/mozilla-central/rev/389014745e19

:Grisha Kruglov

Comment 35

•

8 years ago

Comment on attachment 8759864 [details] Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing. https://reviewboard.mozilla.org/r/57700/#review55802 That resetJSCache method is interesting... LGTM, hacky stuff though.

Attachment #8759864 - Flags: review?(gkruglov) → review+

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 36

•

8 years ago

https://hg.mozilla.org/integration/fx-team/rev/370eaabb204751e618b049d7cbc0cc8298d7b2c9 Bug 1261494 - Block Gecko from screwing with the client ID file by resetting the cache at the last minute. r=grisha

Carsten Book [:Tomcat]

Comment 37

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/370eaabb2047

Comment hidden (Intermittent Failures Robot)

Geoff Brown [:gbrown]

Comment 45

•

8 years ago

:mcomella -- This is still a very frequent failure. Can it be skipped?

Flags: needinfo?(michael.l.comella)

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 48

•

8 years ago

(In reply to Geoff Brown [:gbrown] from comment #45) > :mcomella -- This is still a very frequent failure. Can it be skipped? No – we use it to ensure we can correctly retrieve the client ID, Java <-> JS, which we can't do with a unit test. I'll prioritize fixing this. Also: (In reply to Georg Fritzsche [:gfritzsche] from bug 1261596 comment #9) > Michael, can you tell whats going on here? > Could this be fallout from bug 1233986?

Flags: needinfo?(michael.l.comella)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 49

•

8 years ago

It looks like the increase in failures started around 6/26 & 6/29 (comment 39 & comment 40). The patch in this bug landed on 6/12 (comment 37) so it's unlikely to have negatively affected the failure rate [1]. The patch in bug 1233986 landed several times but was backed out, finally landing on 6/28 (bug 1233986 comment 54), which is the middle of the range I saw above for the increase in failures. As the most likely cause, I'll take a look at the code and see what could have affected this test. [1]: Tangentially, the patch in this bug appears to have reduced the failure rate (prior to the 6/26-6/29 increase) but not have eliminated it entirely, like I expected it to.

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 57

•

8 years ago

Looking at the errors, it looks like we timeout retrieving the client ID the first time [1]. My biggest lead is: 1) Gecko retrieves the client ID and registers its `.then(...)` callbacks 2) We attempt to retrieve the client ID by registering our `.then(...)` callbacks 3) The client ID is retrieved & the promises resolve. Since promises resolve in the order they were registered, Gecko attempts upload 4) Gecko's upload timeout (90s [2]) is longer than our "retrieve client ID from JS" timeout (20s) so we fail. bug 1233986 moved Gecko telemetry upload to occur earlier during startup so its more likely to run into this Georg, do we attempt to upload during automation? Specifically the Android rocobop tests. I'll drop a try run with an increased Java timeout (120s?) to see if this appears to fix the issue. [1]: https://dxr.mozilla.org/mozilla-central/rev/88bebcaca249aeaca9197382e89d35b02be8292e/mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/testUnifiedTelemetryClientId.java#77 [2]: https://dxr.mozilla.org/mozilla-central/rev/88bebcaca249aeaca9197382e89d35b02be8292e/toolkit/components/telemetry/TelemetrySend.jsm#68

Flags: needinfo?(gfritzsche)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 58

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1b5c9f02d79b

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 59

•

8 years ago

Georg, also would the JS _log.trace statements appear in the test output? I can't seem to find any – I'm specifically looking for those TelemetrySend calls. If not, is there a way to enable them or add my own log statements?

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 61

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3fc4a3de8be

Georg Fritzsche [:gfritzsche]

Comment 62

•

8 years ago

(In reply to Michael Comella (:mcomella) from comment #59) > Georg, also would the JS _log.trace statements appear in the test output? I > can't seem to find any – I'm specifically looking for those TelemetrySend > calls. > > If not, is there a way to enable them or add my own log statements? I don't know where the testing prefs are set for those Fennec tests, but you enable this by setting e.g. "toolkit.telemetry.log.level" to "Trace".

Flags: needinfo?(gfritzsche)

Georg Fritzsche [:gfritzsche]

Comment 63

•

8 years ago

(In reply to Michael Comella (:mcomella) from comment #57) > Looking at the errors, it looks like we timeout retrieving the client ID the > first time [1]. > > My biggest lead is: > 1) Gecko retrieves the client ID and registers its `.then(...)` callbacks > 2) We attempt to retrieve the client ID by registering our `.then(...)` > callbacks > 3) The client ID is retrieved & the promises resolve. Since promises > resolve in the order they were registered, Gecko attempts upload > 4) Gecko's upload timeout (90s [2]) is longer than our "retrieve client ID > from JS" timeout (20s) so we fail. > > bug 1233986 moved Gecko telemetry upload to occur earlier during startup so > its more likely to run into this It should just have changed to submitting "too early" pings with the client-id instead of without, although it could have changed the timing a little bit. > Georg, do we attempt to upload during automation? Specifically the Android > rocobop tests. It is possible, although we shouldn't. We tried to set the "toolkit.telemetry.server" pref to a dummy address for the tests we found out about: https://dxr.mozilla.org/mozilla-central/search?q=toolkit.telemetry.server&redirect=false

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 65

•

8 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #62) > (In reply to Michael Comella (:mcomella) from comment #59) > > Georg, also would the JS _log.trace statements appear in the test output? I > > can't seem to find any – I'm specifically looking for those TelemetrySend > > calls. > > > > If not, is there a way to enable them or add my own log statements? > > I don't know where the testing prefs are set for those Fennec tests, but you > enable this by setting e.g. "toolkit.telemetry.log.level" to "Trace". Geoff, do you know how I might enable this about:config setting during a try run?

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Comment 66

•

8 years ago

Try adding something here: http://hg.mozilla.org/mozilla-central/annotate/tip/testing/mochitest/runrobocop.py#l228

Flags: needinfo?(gbrown)

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 68

•

8 years ago

(In reply to Michael Comella (:mcomella) from comment #61) > https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3fc4a3de8be fyi, this push was invalidated because of bug 1286942. Waiting on bug 1286942 to be fixed in order to push to try and fix this.

Depends on: 1286942

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 69

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7246befb62f

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 70

•

8 years ago

My push from comment 69 made no functional changes and failed this test 7/7 times (+1 infra issue, +1 still running). My push from comment 58 added a timeout longer than the telemetry upload time and passed this test 13/13 times (+1 infra issue). I have a high suspicion this is related to attempting to upload during the test. I will try to add some logging (via comment 66) to confirm this suspicion.

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 71

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e0ce448e5a5e

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 73

•

8 years ago

(In reply to Michael Comella (:mcomella) from comment #70) > My push from comment 58 added a timeout longer than the telemetry upload > time and passed this test 13/13 times (+1 infra issue). In the worst case (or if someone wants this fixed right now), we can increase the timeout so this no longer fails in automation and test fixes via try. I'll have more time to dig in tomorrow.

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 75

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=479e1a07dec9

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 76

•

8 years ago

Looking at the logcat from the first failure from the push from comment 71 [1], submitPing does not run until after the test has already timed-out and failed. However, I notice that Telemetry's deferred init runs in between when we initially request the clientID from JS and when we timeout. The getClientID method seems to return correctly here because the log statements indicate the methods before & after it run. Perhaps the telemetry init is taking too long and, given JS' single-threaded nature, is blocking us from getting the client ID. My push in comment 75 increases the delayed init timeout so hopefully Java gets control of the JS thread sooner. [1]: https://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/870f1bd220acef368e6be70fb47a75b83beb9cbfb60399e33d601e5536ca1fa0f5a6b5a880b2cd10bcca49a5b95b92dbdee371952844ca7efa05cf4c2aa05854

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 77

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0fc578deaf9

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 81

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=92c05436c1c3

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 82

•

8 years ago

My last push (comment 81) added log statements [1] to say how long it takes to retrieve the client ID from JS. In both runs, it takes 30 seconds to retrieve it initially and between 0 & 1 seconds to retrieve it each subsequent time. Georg, do you know of any timeouts that could cause the client ID to take 30 seconds to retrieve? e.g. upload timeout, Gecko load timeout, etc. If not, I'm just going to increase the timeout to 35 seconds because context-switching between my context graph work and fennec work is really expensive. [1]: Look for the logtag "lol"

Flags: needinfo?(gfritzsche)

Comment hidden (Intermittent Failures Robot)

Georg Fritzsche [:gfritzsche]

Comment 84

•

8 years ago

This sounds like we actually hit the Telemetry initialization delay in the test. We delay it to 60 seconds after "profile-after-change", maybe the other 30sec already passed waiting on other operations before you timed it? There are two ways around this: * directly initializing Telemetry to trigger "testing" behavior: https://dxr.mozilla.org/mozilla-central/rev/462dc6b44adb4573e8ce6b0dd688c206ebd516f7/toolkit/components/telemetry/tests/unit/test_TelemetryController.js#109 * setting the pref "toolkit.telemetry.initDelay" to a reasonable value: http://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/preferences.html#testing

Flags: needinfo?(gfritzsche)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 85

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dea6fee5beb6

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 86

•

8 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #84) > This sounds like we actually hit the Telemetry initialization delay in the > test. This appears to be the issue – my latest push reduced the telemetry initialization delay to one second and it took 0 seconds to retrieve the client ID the first time (as opposed to the duration comment 82). I wonder if that's why this test has always had intermittent issues. To second guess our assumptions, why does retrieving a client ID, which I'd expect to have no (or few) dependencies, require a general "telemetry init" to occur? Ignoring that for now, the quick solutions would be: (In reply to Georg Fritzsche [:gfritzsche] from comment #84) > * setting the pref "toolkit.telemetry.initDelay" to a reasonable value: 1) I would prefer to do this – telemetry will automatically be ready whenever anyone wants to test it & it's simpler than sending a message to JS. However, could this affect startup time for autophone tests? Will it noticeably affect the runtime of our robocop test suites? And using: > * directly initializing Telemetry to trigger "testing" behavior: 2) We could do this for each test that needs to test telemetry but it seems unreasonable to force people to remember to initialize telemetry for each test. 3) We can make this call during BaseTest or UITest initialization, which would prevent it from affecting autophone & friends (from #1), but could still affect the overall robocop runtime duration. Geoff, do you have an opinion?

Flags: needinfo?(gbrown)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 87

•

8 years ago

(In reply to Michael Comella (:mcomella) from comment #86) > To second guess our assumptions, why does retrieving a client ID, which I'd > expect to have no (or few) dependencies, require a general "telemetry init" > to occur? Then again, while I feel this could be worthwhile to do, we'd also run into this issue again if anyone else tries to write a integration test for telemetry so we might as well fix the problem generically for all telemetry code.

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 88

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2017843315fe

Geoff Brown [:gbrown]

Comment 89

•

8 years ago

(In reply to Michael Comella (:mcomella) from comment #86) > Geoff, do you have an opinion? I trust your judgement. Personally, I would go with option #1, setting the pref for all robocop tests, as you did in your try push. There is very little use of robocop in autophone, and currently there are no autophone performance tests that use robocop. I'd check on the robocop test suite run time, but unless the difference is significant, I think it will be fine.

Flags: needinfo?(gbrown)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 90

•

8 years ago

Comment on attachment 8759864 [details] Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/57700/diff/2-3/

Attachment #8759864 - Attachment description: Bug 1261494 - Block Gecko from screwing with the client ID file by resetting the cache at the last minute. → Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing.

Attachment #8759864 - Flags: review+ → review?(gbrown)

Geoff Brown [:gbrown]

Comment 91

•

8 years ago

Comment on attachment 8759864 [details] Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing. https://reviewboard.mozilla.org/r/57700/#review64670

Attachment #8759864 - Flags: review?(gbrown) → review+

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 92

•

8 years ago

https://hg.mozilla.org/integration/fx-team/rev/7c1ddc8223812aed002c47b4a169d7550a7c9c99 Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing. r=gbrown

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 93

•

8 years ago

Comment on attachment 8759864 [details] Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing. Approval Request Comment [Feature/regressing bug #]: initially implemented w/ small amount of intermittents, other bugs made it worse [User impact if declined]: Users are unaffected but we will see a lot of intermittents [Describe test coverage new/current, TreeHerder]: Tested locally. [Risks and why]: Extremely low. We change an about:config preference before running robocop tests. In the most likely worst case, the additional processing time could slow down the running of our robocop test suites. However, in my testing, I did not see any noticeable impact. [String/UUID change made/needed]: None

Attachment #8759864 - Flags: approval-mozilla-aurora?

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 94

•

8 years ago

Correction: (In reply to Michael Comella (:mcomella) from comment #93) > [Describe test coverage new/current, TreeHerder]: Tested locally. And on try (comment 88).

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 95

•

8 years ago

Failures on beta & aurora seem uncommon so we don't need to land this change there, but it could be nice. Since merge is soon, I did not flag for uplift to 48 (even though we don't change any shipping code).

Comment hidden (Intermittent Failures Robot)

Georg Fritzsche [:gfritzsche]

Comment 97

•

8 years ago

(In reply to Michael Comella (:mcomella) from comment #86) > (In reply to Georg Fritzsche [:gfritzsche] from comment #84) > > This sounds like we actually hit the Telemetry initialization delay in the > > test. > > This appears to be the issue – my latest push reduced the telemetry > initialization delay to one second and it took 0 seconds to retrieve the > client ID the first time (as opposed to the duration comment 82). I wonder > if that's why this test has always had intermittent issues. > > To second guess our assumptions, why does retrieving a client ID, which I'd > expect to have no (or few) dependencies, require a general "telemetry init" > to occur? We just trigger the client id as part of the general telemetry init. We don't want to hit the disk early to not impact startup performance. > Ignoring that for now, the quick solutions would be: > > (In reply to Georg Fritzsche [:gfritzsche] from comment #84) > > * setting the pref "toolkit.telemetry.initDelay" to a reasonable value: > > 1) I would prefer to do this – telemetry will automatically be ready > whenever anyone wants to test it & it's simpler than sending a message to > JS. However, could this affect startup time for autophone tests? Will it > noticeably affect the runtime of our robocop test suites? If this startup performance hit is fine, that sounds like the easiest option. If startup perf is a concern here, maybe you can set prefs per-test instead before starting Gecko?

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

8 years ago

Keywords: leave-open

Comment hidden (Intermittent Failures Robot)

Carsten Book [:Tomcat]

Comment 99

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7c1ddc822381

Status: REOPENED → RESOLVED

Closed: 8 years ago → 8 years ago

status-firefox50: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 50

Sylvestre Ledru [:Sylvestre]

Updated

•

8 years ago

status-firefox48: affected → wontfix

status-firefox49: --- → affected

Sylvestre Ledru [:Sylvestre]

Comment 100

•

8 years ago

Comment on attachment 8759864 [details] Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing. Fix an intermittent, taking it.

Attachment #8759864 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Comment 101

•

8 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #97) > > To second guess our assumptions, why does retrieving a client ID, which I'd > > expect to have no (or few) dependencies, require a general "telemetry init" > > to occur? > > We just trigger the client id as part of the general telemetry init. > We don't want to hit the disk early to not impact startup performance. To be explicit, the issue I'm having is that when I try to access the client ID without changing the telemetry init delay, the ID takes 30 seconds the first time to access [1]. When I change the telemetry init delay to 1 second, it takes 1 second to access the client ID the first time. The issue presumably occurs because telemetry is not initialized yet. To me, this is unexpected because retrieving the client ID doesn't appear to have any dependencies on waiting for telemetry to be initialized first. If I had to guess, I'd think this is an issue in the way we spawn and handle the async tasks or promises to init telemetry & access the client ID. For example, an init telemetry task initially tries to access the client ID, e.g.: ClientID.getClientID().then(function (clientID) { // Do some stuff }); Since we also supply a method to `Promise.then` and these functions run sequentially, `// Do some stuff` has to finish running before my code, which accesses the client ID in the Java integration test, is allowed to run. While I saw no issues with the Promise code specifically that would cause this to be an issue, I don't know how the async task code, yields, etc. would affect the Promise resolution. For example, does the async task code register Promise handlers when the async tasks are run, or when they're read by the interpreter? If it's the latter, the handler added by telemetry init could be registered prior to the Java test handler, blocking it from running until the telemetry init handler is run, which is not until telemetry init occurs. In another possible case, how does promise resolution work with yield (e.g. how we frequently access the client ID in the telemetry code [2])? Does the code after the `yield` statement all have to run before the next promise handler is called? If so, in my provided sample, we'd have to wait for the upload attempt to finish before we can return (and thus run the Java test handler). [1]: Subsequent accesses take about 1 second. [2]: https://dxr.mozilla.org/mozilla-central/rev/ddeb0295df692695b36295177d6790e5393e1f9a/toolkit/components/telemetry/TelemetryController.jsm#463

Ryan VanderMeulen [:RyanVM]

Comment 102

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/42c3abe40b3b

status-firefox49: affected → fixed

Comment hidden (Intermittent Failures Robot)

Ryan VanderMeulen [:RyanVM]

Comment 104

•

8 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/f0c525d1274c

Comment hidden (Intermittent Failures Robot)

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Assignee

Updated

•

8 years ago

Updated

•

4 years ago

Product: Firefox for Android → Firefox for Android Graveyard

Bug 1261494 - Reduce telemetry init delay to 1 second for integration testing. 8 years ago Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla] 58 bytes, text/x-review-board-request	gbrown : review+ Sylvestre : approval-mozilla-aurora+	Details
Bug 1261494 - Add assertions & logging to resetTest for sanity checking. 8 years ago Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla] 58 bytes, text/x-review-board-request	Grisha : review+	Details
Bug 1261494 - Add logging before each test to help debug intermittent. 8 years ago Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla] 58 bytes, text/x-review-board-request	Grisha : review+	Details