Closed Bug 1244295 Opened 10 years ago Closed 10 years ago

client ID is null for new Nightly installs, causing initial telemetry pings to fail

Categories

(Firefox for Android Graveyard :: General, defect)

All
Android
defect
Not set
normal

Tracking

(firefox46 fixed, firefox47 fixed, fennec46+)

RESOLVED FIXED
Firefox 47
Tracking Status
firefox46 --- fixed
firefox47 --- fixed
fennec 46+ ---

People

(Reporter: mcomella, Assigned: mcomella)

References

Details

Attachments

(5 files)

When checking client ID in about:telemetry, it's initially null. It appears to be generated at some later point. Going to sync sign-in (Settings -> Sign in) appears to generate the ID. This does not affect local developer builds. Some thoughts from Finkle via irc: 13:52 <@mfinkle> mcomella: TelemetryController.jsm uses ClientID.jsm 13:53 <@mfinkle> http://mxr.mozilla.org/mozilla-central/source/toolkit/modules/ClientID.jsm 13:53 <@mfinkle> mcomella: http://mxr.mozilla.org/mozilla-central/source/toolkit/modules/ClientID.jsm#112 13:53 <@mfinkle> reads the clientId from <profile>/healthreport/state.json 13:56 <@mfinkle> mcomella: sorry, <profile>/datareporting/state.json is the main path 13:56 <@mfinkle> the healthreport path is a fallback
GeckoProfile.java should probably just ensure the clientId (state.json) is created.
+1. Allowing Java to read and write state.json is why we need thorough tests on both 'sides' that guarantee the format.
Since we can't wait on Gecko, we can't use ClientID.jsm and we'll have to copy it's logic. _doLoadClientID in ClientID.jsm looks like the method we want to copy: https://mxr.mozilla.org/mozilla-central/source/toolkit/modules/ClientID.jsm?rev=235ea6e28138#94 I'm not sure how to ensure the logic in these two files remains identical in case of future updates. Also, I think there are some issues as far as locking the file system is concerned to ensure we both don't try to read/write at the same time and put the filesystem & the client ID in memory into an inconsistent state.
(In reply to Michael Comella (:mcomella) from comment #3) > I'm not sure how to ensure the logic in these two files remains identical in > case of future updates. 1. It's generating a UUID. How much could it change? 2. Add comments in both places. 3. Add tests on both sides that run both code paths and will fail if the format changes. > Also, I think there are some issues as far as > locking the file system is concerned to ensure we both don't try to > read/write at the same time and put the filesystem & the client ID in memory > into an inconsistent state. 1. The odds of this happening are very slim. 2. Any discrepancy will be resolved after a restart. 3. Never generate-then-store. generate-write-then-read. 4. Optionally, check to see if Gecko is running. If it is, ask Gecko for the ID.
It'd be great to land this with the initial core ping (45 – bug 1205835) but this could get messy and we might want to prioritize bug 1243585 so I'm not married to the idea.
tracking-fennec: --- → 45+
This is not strictly necessary for bug 1205835.
No longer blocks: 1205835
Depends on: 1205835
Assignee: nobody → michael.l.comella
Georg, I notice ClientID.jsm validates the client ID [1] before putting it into the cache – how important is the validation? I was considering leaving it out of my implementation to reduce the code complexity. [1]: https://mxr.mozilla.org/mozilla-central/source/toolkit/modules/ClientID.jsm#206
Flags: needinfo?(gfritzsche)
Additionally, we'll try to migrate the client ID from FHR if it doesn't already exist. One discrepancy is that ClientID.jsm validates the client IDs with a regex. It seemed unnecessary so I left it out. Review commit: https://reviewboard.mozilla.org/r/34875/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/34875/
Attachment #8719086 - Flags: review?(mark.finkle)
I tested locally and this appears to work. Unfortunately, I can't test the FHR migration locally. As such, still TODO: tests!
Nick, I'm using FakeProfileTestCase in order to write my tests – these tests don't yet run in automation, right?
Flags: needinfo?(nalexander)
(In reply to Michael Comella (:mcomella) from comment #10) > Nick, I'm using FakeProfileTestCase in order to write my tests – these tests > don't yet run in automation, right? Looks like this is unused entirely: https://dxr.mozilla.org/mozilla-central/search?q=FakeProfileTestCase&case=true. I don't know where this originates. In general, no -- nothing under background/junit3 runs in automation. The march continues, however: Bug 1064004.
Flags: needinfo?(nalexander)
Attachment #8719086 - Flags: review?(mark.finkle) → review+
Comment on attachment 8719086 [details] MozReview Request: Bug 1244295 - Create client ID if it doesn't already exist in GeckoProfile. r=mfinkle https://reviewboard.mozilla.org/r/34875/#review31501 ::: mobile/android/base/java/org/mozilla/gecko/GeckoProfile.java:765 (Diff revision 1) > + public boolean mkParentDirs(final String filename) { How about: ensureParentDir ?
(In reply to Michael Comella (:mcomella) from comment #7) > Georg, I notice ClientID.jsm validates the client ID [1] before putting it > into the cache – how important is the validation? I was considering leaving > it out of my implementation to reduce the code complexity. We have seen malformed client ids coming in from the wild. There could be a variety of issues that cause this (JS engine bugs, file corruption, ...) and as we expect to look up individual clients by UUID this breaks things. The client id is really central to much of analysis, so lets validate it.
Flags: needinfo?(gfritzsche)
(In reply to Nick Alexander :nalexander from comment #11) > Looks like this is unused entirely: > https://dxr.mozilla.org/mozilla-central/ > search?q=FakeProfileTestCase&case=true. I don't know where this originates. I remember it being used for some Services work – perhaps it became unused when FHR was removed? > In general, no -- nothing under background/junit3 runs in automation. The > march continues, however: Bug 1064004. Is there a better place to run these tests? For the things I'm looking to test, I think I could actually use an off-device unit test, caveat that it relies on GeckoProfile which relies on some Android stuff (though perhaps not during construction – does it make sense to run unit tests in that case?).
Flags: needinfo?(nalexander)
https://reviewboard.mozilla.org/r/34875/#review31501 > How about: ensureParentDir ? I was copying the framework method, `File.mkdirs`, but since I changed the return value from that method, I might as well make it more readable while I'm at it.
(In reply to Michael Comella (:mcomella) from comment #15) > (In reply to Nick Alexander :nalexander from comment #11) > > Looks like this is unused entirely: > > https://dxr.mozilla.org/mozilla-central/ > > search?q=FakeProfileTestCase&case=true. I don't know where this originates. > > I remember it being used for some Services work – perhaps it became unused > when FHR was removed? Nuke it to the ground! > > In general, no -- nothing under background/junit3 runs in automation. The > > march continues, however: Bug 1064004. > > Is there a better place to run these tests? For the things I'm looking to > test, I think I could actually use an off-device unit test, caveat that it > relies on GeckoProfile which relies on some Android stuff (though perhaps > not during construction – does it make sense to run unit tests in that > case?). If you /can/ write junit4/off-device tests, I would: they're *so much easier* to work with. However, I don't really know how much Android stuff you can test -- I'm still learning about Robolectric myself. (For example, I recently hit "NoClassDefFound" errors using mobile/android/thirdparty code. This gets better, I think, after we find a way to address Bug 1233882.) You can often use Mockito to transparently mock the bits requiring "real Android" or "real Fennec", allowing you to test the JSON handling, storage, and network bits transparently. I've been basing my tests off sebastian's downloadable content work, which sets a great model here. You could do the same, and ask him or me questions?
Flags: needinfo?(nalexander)
Comment on attachment 8719086 [details] MozReview Request: Bug 1244295 - Create client ID if it doesn't already exist in GeckoProfile. r=mfinkle Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34875/diff/1-2/
Additionally, we log some of the Exceptions thrown while retrieving the client ID to make it clearer what is happening. The underlying GeckoProfile methods ensure the profile path is not printed so we don't have to worry about leaking that. Review commit: https://reviewboard.mozilla.org/r/35347/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/35347/
Attachment #8720532 - Flags: review?(mark.finkle)
Added testGetDir to sanity check how the profile is set up for the test and left it in as a bonus. Additionally, changed access levels on the ensureParentDirs method because it only needed to be `protected` for testing. Review commit: https://reviewboard.mozilla.org/r/35349/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/35349/
Attachment #8720533 - Flags: review?(mark.finkle)
Comment on attachment 8720533 [details] MozReview Request: Bug 1244295 - Add junit4 tests for GeckoProfile.getClient & friends. r=mfinkle Review request updated; see interdiff: https://reviewboard.mozilla.org/r/35349/diff/1-2/
Comment on attachment 8720541 [details] MozReview Request: Bug 1244295 - Add regression warning to ClientID.jsm getClientID method. r=gfritzsche Review request updated; see interdiff: https://reviewboard.mozilla.org/r/35351/diff/1-2/
Attachment #8720541 - Attachment description: MozReview Request: Bug 1244295 - Add regression warning to ClientID.jsm getClientID method. r=gfritzche → MozReview Request: Bug 1244295 - Add regression warning to ClientID.jsm getClientID method. r=gfritzsche
Attachment #8720541 - Flags: review?(gfritzsche)
This isn't making 45.
tracking-fennec: 45+ → 46+
Comment on attachment 8720533 [details] MozReview Request: Bug 1244295 - Add junit4 tests for GeckoProfile.getClient & friends. r=mfinkle https://reviewboard.mozilla.org/r/35349/#review32003 I added some small comments. Nice tests! I learned something! ::: mobile/android/tests/background/junit4/src/org/mozilla/gecko/TestGeckoProfile.java:32 (Diff revision 2) > + public static final String PROFILE_NAME = "profileName"; nit: private? ::: mobile/android/tests/background/junit4/src/org/mozilla/gecko/TestGeckoProfile.java:37 (Diff revision 2) > + public TemporaryFolder dirContainingProfile = new TemporaryFolder(); nit: newline after, just to separate the annotated block. I did not know about @Rule -- nifty! ::: mobile/android/tests/background/junit4/src/org/mozilla/gecko/TestGeckoProfile.java:43 (Diff revision 2) > + final Context context = RuntimeEnvironment.application; I have some changes to the Robolectric application to do with folding base into app that might break you. We'll be able to find a way forward, I'm confident. ::: mobile/android/tests/background/junit4/src/org/mozilla/gecko/TestGeckoProfile.java:55 (Diff revision 2) > + public void testGetClientId() throws Exception { Prefer to split this up into multiple tests if you can. The less state, the better. ::: mobile/android/tests/background/junit4/src/org/mozilla/gecko/TestGeckoProfile.java:215 (Diff revision 2) > + nit: kill newline.
Attachment #8720533 - Flags: review+
Comment on attachment 8720541 [details] MozReview Request: Bug 1244295 - Add regression warning to ClientID.jsm getClientID method. r=gfritzsche https://reviewboard.mozilla.org/r/35351/#review32093 Are there tests (now or coming up) that cover: * scenario 1 (creating a client id): * Fennec starts on fresh profile * Java code creates datareporting/state.json with a known client id * Gecko starts * Gecko generates and sends saved-session * We assert that the client id in the ping is the same * scenario 2 (picking up existing client ids): * Let the profile already contain {datareporting,healthreport}/state.json in the current format * The Java code picks up the client id * Trigger & receive a core ping, assert the client id That doesn't cover everything, but should get make the dependencies relatively robust already.
Attachment #8720541 - Flags: review?(gfritzsche) → review+
Comment on attachment 8720532 [details] MozReview Request: Bug 1244295 - Validate client IDs before sending them in a Telemtry report. r=mfinkle https://reviewboard.mozilla.org/r/35347/#review32105
Attachment #8720532 - Flags: review?(mark.finkle) → review+
Comment on attachment 8720533 [details] MozReview Request: Bug 1244295 - Add junit4 tests for GeckoProfile.getClient & friends. r=mfinkle https://reviewboard.mozilla.org/r/35349/#review32107 LGTM
Attachment #8720533 - Flags: review?(mark.finkle) → review+
https://reviewboard.mozilla.org/r/35349/#review32003 > nit: private? We spoke on IRC and nalexander said I could go either way. It looks like the JUnit wiki does list members as private [1] so I'll do it too. [1]: https://github.com/junit-team/junit/wiki/Test-fixtures > I have some changes to the Robolectric application to do with folding base into app that might break you. We'll be able to find a way forward, I'm confident. After I reimported, everything seems to be working correctly. > Prefer to split this up into multiple tests if you can. The less state, the better. Oh yes, this is much clearer – thanks for the feedback. Additionally, this made me split out `writeJSONObjectToFile` and friends to a new class: `helpers.FileUtil`.
Comment on attachment 8719086 [details] MozReview Request: Bug 1244295 - Create client ID if it doesn't already exist in GeckoProfile. r=mfinkle Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34875/diff/2-3/
Comment on attachment 8720532 [details] MozReview Request: Bug 1244295 - Validate client IDs before sending them in a Telemtry report. r=mfinkle Review request updated; see interdiff: https://reviewboard.mozilla.org/r/35347/diff/1-2/
Comment on attachment 8720533 [details] MozReview Request: Bug 1244295 - Add junit4 tests for GeckoProfile.getClient & friends. r=mfinkle Review request updated; see interdiff: https://reviewboard.mozilla.org/r/35349/diff/2-3/
Comment on attachment 8720541 [details] MozReview Request: Bug 1244295 - Add regression warning to ClientID.jsm getClientID method. r=gfritzsche Review request updated; see interdiff: https://reviewboard.mozilla.org/r/35351/diff/2-3/
(In reply to Georg Fritzsche [:gfritzsche] from comment #26) > Are there tests (now or coming up) that cover: > > * scenario 1 (creating a client id): > * Fennec starts on fresh profile > * Java code creates datareporting/state.json with a known client id > * Gecko starts > * Gecko generates and sends saved-session > * We assert that the client id in the ping is the same Not at the moment – there are no tests that interact with Java & Gecko. > * scenario 2 (picking up existing client ids): > * Let the profile already contain > {datareporting,healthreport}/state.json in the current format > * The Java code picks up the client id > * Trigger & receive a core ping, assert the client id Yes, however I manually create the client ID files in datareporting & healthreport in Java, rather than creating them via Gecko. Additionally, I'm unit testing the code that gets the profile ID (GeckoProfile) rather than the code that constructs the telemetry core ping (TelemetryPingGenerator) so I don't actually verify the client ID from GeckoProfile is successfully put in a ping. I have two more bugs open for testing: * bug 1249156 – tests for ClientID.jsm's getClientID method (I was thinking unit tests) * bug 1241681 - generic tests for telemetry code I'll add these as a TODO in bug 1241681 as integration tests.
(In reply to Michael Comella (:mcomella) from comment #35) > I'll add these as a TODO in bug 1241681 as integration tests. Nevermind, I filed bug 1249491. bug 1241681 should be better sorted for upload testing, I think.
https://hg.mozilla.org/integration/fx-team/rev/57a14ecc71e8728766853091e12b462f250c5c49 Bug 1244295 - Create client ID if it doesn't already exist in GeckoProfile. r=mfinkle https://hg.mozilla.org/integration/fx-team/rev/a805bd0f0a35c97a2f80491672981531ba39a85c Bug 1244295 - Validate client IDs before sending them in a Telemtry report. r=mfinkle https://hg.mozilla.org/integration/fx-team/rev/414d0894875cd433ee42d3ffd5bedbfb4dd6bc4d Bug 1244295 - Add junit4 tests for GeckoProfile.getClient & friends. r=mfinkle https://hg.mozilla.org/integration/fx-team/rev/5058ad56af493c51b0135ac2d7b34df0b829b76d Bug 1244295 - Add regression warning to ClientID.jsm getClientID method. r=gfritzsche https://hg.mozilla.org/integration/fx-team/rev/b995a277b126f554e86b555408cc636d0f92dade Bug 1244295 - Add getClientId test for when client ID file already exists. r=me
While writing the uplift request, I realized I never tested the state where there is an existing Gecko client ID (except in my unit test where I created the file on disk myself, so it's possible I screwed it up). Before uplifting, it'd be good to do one (or preferably both) of: 1) verify all of the Nightly users didn't just get new client IDs 2) Write the integration test Georg described in comment 26 (bug 1249491)
Flags: needinfo?(michael.l.comella)
(like verification request in bug 1244861 comment 24) Alessio, an analysis request! I landed code here to create the client ID file from Java if it doesn't exist, rather than waiting for Gecko to create the file. However, it's (ever so slightly) possible that my code doesn't work as expected and overwrites the existing file with a new client ID. Can you verify that client IDs we were seeing before this patch landed are still being seen after this patch landed? The code is still only in nightly.
Flags: needinfo?(michael.l.comella) → needinfo?(alessio.placitelli)
(move my NI to bug 1249491, clear NI because it's too late to uplift)
Actually, we can uplift to 46. NI to uplift.
Flags: needinfo?(michael.l.comella)
(In reply to Michael Comella (:mcomella) from comment #40) > (like verification request in bug 1244861 comment 24) Alessio, an analysis > request! I landed code here to create the client ID file from Java if it > doesn't exist, rather than waiting for Gecko to create the file. However, > it's (ever so slightly) possible that my code doesn't work as expected and > overwrites the existing file with a new client ID. Can you verify that > client IDs we were seeing before this patch landed are still being seen > after this patch landed? The code is still only in nightly. Hey :-D I'm not sure this could be answered reliably server-side. There could be other reasons for old client ids not appearing in the new submissions (e.g. clients stopping telemetry submission, profile resets, etc.). I think that's better tested client side :(
Flags: needinfo?(alessio.placitelli)
I think verifying this server-side would mean looking at retention (d1/d7/...) and see whether that regressed. I don't think that is stable enough on Nightly, although we could look at the beta/release retention numbers/graphs for regressions once we have them. I agree that manual QA testing client-side seems to be best here for now.
In addition to the java-side unit tests I've already written, I will look into bug 1249491.
Comment on attachment 8719086 [details] MozReview Request: Bug 1244295 - Create client ID if it doesn't already exist in GeckoProfile. r=mfinkle This uplift request affects all patches landed in this bug. Approval Request Comment [Feature/regressing bug #]: Likely fix for bug 1257589; feature originally implemented in bug 1205835 [User impact if declined]: We're currently not receiving the expected data from core telemetry pings and thus can't rely on the data. Assuming this is the proper fix, will we be able to use the core ping to important information about our users (see https://gecko.readthedocs.org/en/latest/toolkit/components/telemetry/telemetry/core-ping.html for data collected). [Describe test coverage new/current, TreeHerder]: In Aurora/Nightly for over a month. Unit test for Java method landed with patch. Java <-> JS integration tests are written and passing in bug 1249491 and half-reviewed. [Risks and why]: The biggest concern is that this could cause a race between Java and Gecko to write a client ID to the telemetry file (bug 1255657 to fix this issue). However, the probability is incredibly low and fixed on restart (e.g. I made the decision to ship with this possibility for simplicity purposes). See https://bugzilla.mozilla.org/show_bug.cgi?id=1249491#c8 for related discussion. If a simultaneous race occurs, the users' client ID can be changed (in which case, they appear to be a new user on both our dashboards) or the file invalidated (in which case, we should create a new valid file and the user gets a new ID, again appearing as a new user). As for other risks, we manipulate JSON objects and write them to disk, both of which are overly verbose and tricky in Java. That being said, between the tests in this bug and the tests in bug 1249491, the path is well tested and I'm confident they are correct. [String/UUID change made/needed]: None
Flags: needinfo?(michael.l.comella)
Attachment #8719086 - Flags: approval-mozilla-beta?
Nick, it'd be cool if we could build this as a release build and verify it fixes the issue – do you know what the build flags are for that?
Flags: needinfo?(nalexander)
Verifying this sounds like a good idea. I am a bit nervous about uplifting this for beta 8 (going to build next Monday). Can we live with it riding with aurora? If we land this in beta 8 and it causes issues, will you be around next week to help with a fix or backout?
Flags: needinfo?(michael.l.comella)
rnewman do you have an opinion here? I notice you are involved over in bug 1255657, so maybe you can help evaluate this.
Flags: needinfo?(rnewman)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #49) > rnewman do you have an opinion here? I notice you are involved over in bug > 1255657, so maybe you can help evaluate this. This is a blocker for making our core telemetry useful, so assuming risk doesn't spread outside of the component, I'd throw this as far up the train as we can manage. Give me ten minutes to review the landed code and I'll come back with a more precise assessment of risk/reward.
The changes here look safe; the only risk I see is that some filesystem oddity causes us to error out attempting to get the client ID during ping submission at launch, but that's already safely handled inside the telemetry uploader. In short, I don't think there's any chance of this making things worse than they are. Re Bug 1255657: the issue that bug will address is about data consistency and the value of analytics, and it should only be a rare edge case. We don't need to wait for that in order to reduce risk in the client. If we can verify this fix before uplift, then great; regardless, I think this is OK to take for B8.
Flags: needinfo?(rnewman)
I agree with Richard and I will be available next week to handle any possible fallout.
Flags: needinfo?(michael.l.comella)
(In reply to Michael Comella (:mcomella) from comment #47) > Nick, it'd be cool if we could build this as a release build and verify it > fixes the issue – do you know what the build flags are for that? You can configure the build to be like this pretty easily. You more or less use https://dxr.mozilla.org/mozilla-central/source/mobile/android/config/mozconfigs/android-api-15/release in your mozconfig. You can't, of course, sign the APK in the same way. I'm not really sure if there are other differences; RyanVM runs these "beta-on-try" builds that must tick off these boxes.
Flags: needinfo?(nalexander)
Ryan, assuming this gets uplifted, when you run your "beta-on-try" build for beta 8, can you ping me? I'd like to download it and make sure the uploader works as expected. (I think it's more effective for me to use this than to lose development time clobbering for a native build locally)
Flags: needinfo?(ryanvm)
I'm confused what you're asking for here. You want to try this patch out on a simulated Beta build now before it gets uplifted? If so, here's a link to a Try push from today off Aurora tip. https://treeherder.mozilla.org/#/jobs?repo=try&revision=b08a81f65b90 If you're looking for something else, let me know.
Flags: needinfo?(ryanvm)
Comment on attachment 8719086 [details] MozReview Request: Bug 1244295 - Create client ID if it doesn't already exist in GeckoProfile. r=mfinkle Fix for undercounting users on android, please uplift to beta. This should make it into beta 8 build today.
Attachment #8719086 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Ryan VanderMeulen [:RyanVM] from comment #55) > I'm confused what you're asking for here. You want to try this patch out on > a simulated Beta build now before it gets uplifted? Can I get a simulated release with this patch (e.g. on beta after the uplift, if easier)?
Flags: needinfo?(ryanvm)
Flags: needinfo?(ryanvm)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #59) > You can just download the tinderbox build, no? i.e. > http://archive.mozilla.org/pub/mobile/tinderbox-builds/mozilla-beta-android- > api-15/1459796726/ But this is configured as a beta build, right? The configs on the release builds are different (e.g. telemetry becomes opt-in) which is why we missed this bug the first time around. Is it possible/do you make builds using the release config?
Flags: needinfo?(ryanvm)
Ah yes, Telemetry. That keys off the release channel, so a push to Try off mozilla-beta with |--enable-update-channel=release| set in common.override should be all that's necessary to get a simulated build. And you could also set |ac_add_options --with-branding=mobile/android/branding/official| if you wanted to be really fancy about it. But no, I typically don't run Beta-as-Release Try simulations since they're intended to be as close to identical as humanly possible as-is.
Flags: needinfo?(ryanvm)
Kevin, would it be possible for you to build & verify the release build (as in comment 61) to test that this fix has the desired effect before merge? In the current system, you should see "Unable to get client ID to generate core ping: returning" in the logcat output each time Fennec is opened (either warm or cold start) – the verification would be to just ensure this text doesn't appear! You should be able to grep for "GeckoTelemetryUpload" in the logs to see the output.
Flags: needinfo?(kbrosnan)
Already in release, don't need verification.
Flags: needinfo?(kbrosnan)
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: