Closed
Bug 1370111
Opened 7 years ago
Closed 7 years ago
Crash in java.lang.IllegalStateException: Telemetry missing ''started'' timestamp at org.mozilla.gecko.sync.telemetry.TelemetryCollector.build(TelemetryCollector.java)
Categories
(Firefox for Android Graveyard :: Android Sync, defect, P1)
Tracking
(firefox54 unaffected, firefox55blocking fixed)
RESOLVED
FIXED
Firefox 55
Tracking | Status | |
---|---|---|
firefox54 | --- | unaffected |
firefox55 | blocking | fixed |
People
(Reporter: marcia, Assigned: Grisha)
References
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(2 files, 1 obsolete file)
This bug was filed from the Socorro interface and is report bp-2ecd301c-edaa-47b8-8287-7e4d70170604. ============================================================= New crash which showed up in Build ID 20170604100226: http://bit.ly/2rqn6F8. So far 20 crashes/30 installs Part of the signature in the second line mentions Aurora: data@app@org.mozilla.fennec_aurora-2@base.apk@classes.dex@0xec1fc5
Comment 1•7 years ago
|
||
This regression has been introduced by patch [1] to fix bug 1308337. :Grisha, could you investigate ? [1] https://hg.mozilla.org/mozilla-central/rev/02d589c299a4
Blocks: 1308337
Flags: needinfo?(gkruglov)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → gkruglov
Status: NEW → ASSIGNED
Component: General → Android Sync
Flags: needinfo?(gkruglov)
Product: Firefox for Android → Android Background Services
Assignee | ||
Updated•7 years ago
|
Priority: -- → P1
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 8•7 years ago
|
||
mozreview-review |
Comment on attachment 8874636 [details] Bug 1370111 - Post: More explicit global error handling for Sync Telemetry https://reviewboard.mozilla.org/r/145974/#review149888 ::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/telemetry/TelemetryCollector.java:52 (Diff revision 2) > - @VisibleForTesting protected ExtendedJSONObject error; > - private String hashedUID; > - private String hashedDeviceID; > + // It's possible that these fields are read/written from different threads. > + // Volatile is used to ensure memory visibility. > + @VisibleForTesting protected volatile ExtendedJSONObject error; > + private volatile String hashedUID; > + private volatile String hashedDeviceID; > private final ArrayList<Bundle> devices = new ArrayList<>(); I wonder if visibility of contents of `devices` might be an issue, and if I should use a `ConcurrentLinkedQueue` or something similar. There's no concurrent access, but I _think_ it will be read (`build()`) in a thread that's different from the one in which it's populated.
Comment 9•7 years ago
|
||
mozreview-review |
Comment on attachment 8874635 [details] Bug 1370111 - Set 'started' telemetry timestamp before attempting to talk to the Token Server https://reviewboard.mozilla.org/r/145972/#review149930 This looks sensible. You won't see the same problem with "finished" because it's set only once, immediately before sending the local broadcast. Correct?
Attachment #8874635 -
Flags: review?(nalexander) → review+
Comment 10•7 years ago
|
||
mozreview-review |
Comment on attachment 8874636 [details] Bug 1370111 - Post: More explicit global error handling for Sync Telemetry https://reviewboard.mozilla.org/r/145974/#review149932 ::: mobile/android/services/src/main/java/org/mozilla/gecko/fxa/sync/FxAccountSyncAdapter.java:177 (Diff revision 2) > @Override > public void handleError(GlobalSession globalSession, Exception ex, String reason) { > super.handleError(globalSession, ex, reason); > - this.telemetryCollector.setError(TelemetryCollector.KEY_ERROR_INTERNAL, reason); > + // If an error hasn't been set downstream, record what we know at this point. > + if (!telemetryCollector.hasError()) { > + telemetryCollector.setError(TelemetryCollector.KEY_ERROR_INTERNAL, reason, ex); You could also make `setError` do this, looking ahead to a future where you keep the _first_ error rather than the _last_ error.
Attachment #8874636 -
Flags: review?(nalexander) → review+
Comment 11•7 years ago
|
||
mozreview-review |
Comment on attachment 8874637 [details] Bug 1370111 - Post: Use a helper function for grabbing monotonic time https://reviewboard.mozilla.org/r/145976/#review149934 1) It doesn't make sense to upload `SystemClock.elapsedRealtime()`, ever. You don't do that right now, but your naming doesn't guard against it. Make it `.start()` and `.finish()`, push the clock management into the `TelemetryCollector`, and save the confusion later, when folks use millis since the epoch. 2) Let's not use `now()` -- rnewman and I definitely expect `now()` to be `System.currentTimeMillis()`, not `SystemClock.elapsedRealtime()`.
Attachment #8874637 -
Flags: review?(nalexander) → review-
Comment 12•7 years ago
|
||
This crash is ranked #2 in FA topcrashers and it's a startup crash.
Keywords: topcrash
Comment 14•7 years ago
|
||
grisha: push the first two parts when they're green; we can talk about the post eventually.
Comment 15•7 years ago
|
||
Since this is a high volume top crash and startup crash, I'm marking this bug as a blocker for 55 nightly to move to beta.
status-firefox54:
--- → unaffected
tracking-firefox55:
--- → blocking
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8874637 -
Attachment is obsolete: true
Assignee | ||
Comment 18•7 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #15) > Since this is a high volume top crash and startup crash, I'm marking this > bug as a blocker for 55 nightly to move to beta. Also relevant are Bug 1370221, Bug 1370656, and Bug 1370668, which all have patches ready to land (waiting for the tree to open).
Comment 19•7 years ago
|
||
Pushed by gkruglov@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/84818c89ef9c Set 'started' telemetry timestamp before attempting to talk to the Token Server r=nalexander https://hg.mozilla.org/integration/autoland/rev/12af2c27c2b2 Post: More explicit global error handling for Sync Telemetry r=nalexander
Comment 20•7 years ago
|
||
OK, I figure those patches should land today sometime when the sheriffs get everything squared away with m-c. Thanks.
Comment 21•7 years ago
|
||
This crash seems to occur during a background sync when the browser is NOT active. What is really odd is that it seems to be telemetry related, yet it occurs if telemetry is disabled.
Comment 22•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/84818c89ef9c https://hg.mozilla.org/mozilla-central/rev/12af2c27c2b2
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Comment 23•7 years ago
|
||
(In reply to Bill Gianopoulos [:WG9s] from comment #21) > This crash seems to occur during a background sync when the browser is NOT > active. What is really odd is that it seems to be telemetry related, yet it > occurs if telemetry is disabled. Oh, dear -- I didn't check during review that this code is observing the Telemetry preferences. grisha, did we just miss this?
Flags: needinfo?(gkruglov)
Comment 24•7 years ago
|
||
(In reply to Nick Alexander :nalexander from comment #23) > (In reply to Bill Gianopoulos [:WG9s] from comment #21) > > This crash seems to occur during a background sync when the browser is NOT > > active. What is really odd is that it seems to be telemetry related, yet it > > occurs if telemetry is disabled. > > Oh, dear -- I didn't check during review that this code is observing the > Telemetry preferences. grisha, did we just miss this? I wonder if it really only fails if telemetry is disabled and that is why there is no a Telemetry started timestamp.
Comment 25•7 years ago
|
||
(In reply to Bill Gianopoulos [:WG9s] from comment #24) > (In reply to Nick Alexander :nalexander from comment #23) > > (In reply to Bill Gianopoulos [:WG9s] from comment #21) > > > This crash seems to occur during a background sync when the browser is NOT > > > active. What is really odd is that it seems to be telemetry related, yet it > > > occurs if telemetry is disabled. > > > > Oh, dear -- I didn't check during review that this code is observing the > > Telemetry preferences. grisha, did we just miss this? > I wonder if it really only fails if telemetry is disabled and that is why > there is no a Telemetry started timestamp. Reading a bit more, I think it's already covered. What's happening is that Sync is always collecting data and shuttling it to the Telemetry uploader (in the same Android process). The Telemetry uploader observes the Telemetry preferences, so I think we're OK -- but I should have verified this and understood the implications, including bugs like this. What's happening is that bugs in the "collecting data" and "shuttling" phases still impact the system, even when the Telemetry preference is off. This isn't ideal, but it's not a bad way to architect the system, so I don't think it needs to change -- we just need to iron out the bugs. In this case, this bug is in the "shuttling" phase, so it bites regardless of the Telemetry preferences.
Comment 26•7 years ago
|
||
Well there are 2 reasons you might want to disable telemetry data. One is the privacy thing and the other is a performance thing. What is the point of wasting resources collecting a mess of data unnecessarily if nothing is go9ing to be done with it because telemetry reporting is disabled?
Comment 27•7 years ago
|
||
(In reply to Bill Gianopoulos [:WG9s] from comment #26) > Well there are 2 reasons you might want to disable telemetry data. One is > the privacy thing and the other is a performance thing. What is the point > of wasting resources collecting a mess of data unnecessarily if nothing is > go9ing to be done with it because telemetry reporting is disabled? Trade-offs: complexity around managing the pref from multiple places, etc. I agree, not ideal -- but not a show-stopper either. We could file follow-up.
Comment 28•7 years ago
|
||
(In reply to Nick Alexander :nalexander from comment #27) > (In reply to Bill Gianopoulos [:WG9s] from comment #26) > > Well there are 2 reasons you might want to disable telemetry data. One is > > the privacy thing and the other is a performance thing. What is the point > > of wasting resources collecting a mess of data unnecessarily if nothing is > > go9ing to be done with it because telemetry reporting is disabled? > > Trade-offs: complexity around managing the pref from multiple places, etc. > I agree, not ideal -- but not a show-stopper either. We could file > follow-upro Yes the whole reason i turned this off was performance because I had a browser with 6 tabs open and had 3 processes using most of the cpu seemingly dealing with telemetry data. I turned this off to try to help performance and it seems it makes no difference because it lamely collects all the data and then does nothing with it in the telemetry disabled case.
Comment 29•7 years ago
|
||
(In reply to Bill Gianopoulos [:WG9s] from comment #28) > (In reply to Nick Alexander :nalexander from comment #27) > > (In reply to Bill Gianopoulos [:WG9s] from comment #26) > > > Well there are 2 reasons you might want to disable telemetry data. One is > > > the privacy thing and the other is a performance thing. What is the point > > > of wasting resources collecting a mess of data unnecessarily if nothing is > > > go9ing to be done with it because telemetry reporting is disabled? > > > > Trade-offs: complexity around managing the pref from multiple places, etc. > > I agree, not ideal -- but not a show-stopper either. We could file > > follow-upro > Yes the whole reason i turned this off was performance because I had a > browser with 6 tabs open and had 3 processes using most of the cpu seemingly > dealing with telemetry data. I turned this off to try to help performance > and it seems it makes no difference because it lamely collects all the data > and then does nothing with it in the telemetry disabled case. I think what you are seeing is a function of Sync starting, Sync crashing entire process, Sync re-starting, Sync crashing entire process, etc -- lots of stuff is happening with the re-start that will peg a CPU as Android frantically tries to get a successful sync to happen. Please, file the follow-up ticket -- I'm trying to agree with you while also explaining why we took the initial action we took!
Assignee | ||
Comment 30•7 years ago
|
||
Correctly functioning sync telemetry should have a negligible performance impact. However, what is somewhat concerning to me is that we are storing telemetry data regardless of the user preference. If I'm reading the code correctly, this also means that whenever user opts-in to telemetry, we will upload telemetry that was collected while user was opted out. This isn't specific to sync ping - I think our core ping "suffers" from the same issue.
Flags: needinfo?(gkruglov)
Comment 31•7 years ago
|
||
(In reply to :Grisha Kruglov from comment #30) > Correctly functioning sync telemetry should have a negligible performance > impact. > > However, what is somewhat concerning to me is that we are storing telemetry > data regardless of the user preference. If I'm reading the code correctly, > this also means that whenever user opts-in to telemetry, we will upload > telemetry that was collected while user was opted out. This isn't specific > to sync ping - I think our core ping "suffers" from the same issue. Yes lots of security privacy performance issues here with the current implementation. Obviously needs to be in a new bug, however. This one is done!
Assignee | ||
Comment 32•7 years ago
|
||
(In reply to Bill Gianopoulos [:WG9s] from comment #31) > Yes lots of security privacy performance issues here with the current > implementation. Obviously needs to be in a new bug, however. This one is > done! See Bug 1371400.
Updated•7 years ago
|
Product: Android Background Services → Firefox for Android
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•