Closed
Bug 1452745
Opened 7 years ago
Closed 7 years ago
Crash pings frequently seem to not have the displayVersion set correctly
Categories
(Toolkit :: Crash Reporting, defect)
Toolkit
Crash Reporting
Tracking
()
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox61 | --- | fixed |
People
(Reporter: wlach, Assigned: wlach)
References
Details
Attachments
(2 files)
:sylvestre noticed a few days ago that the mac beta main crash rate looked rather low:
https://data-missioncontrol.dev.mozaws.net/#/beta/mac
i.e. over the entire 60 beta cycle so far, we have only recorded 263 crashes
Looking more closely on stmo, it appears that the majority of the recorded crashes on Mac had a display version of "60.0" and *no* usage hours:
https://sql.telemetry.mozilla.org/queries/52441/source
Initially I had thought that we were not setting the displayVersion field correctly, based on this code:
https://searchfox.org/mozilla-central/rev/7ccb618f45a1398e31a086a009f87c8fd3a790b6/toolkit/crashreporter/client/ping.cpp#210
https://searchfox.org/mozilla-central/rev/7ccb618f45a1398e31a086a009f87c8fd3a790b6/toolkit/crashreporter/client/ping.cpp#323
However, on further investigation it appears as if most crash pings *do* have the field as expected:
https://gist.github.com/wlach/b26b59bd71ece458eeaf886354548863
... so there's evidently also some kind of problem in telemetry-streaming. I will file a bug for that, but meanwhile, :chutten, do you have any idea why we might not be setting the displayVersion field in crash pings correctly so much of the time? I'm super confused.
Flags: needinfo?(chutten)
Comment 1•7 years ago
|
||
In https://gist.github.com/wlach/b26b59bd71ece458eeaf886354548863 you're not limiting to main crashes. So it could be that the pings echo exactly what you're seeing in error_aggregates.
displayVersion comes from application.version which comes from the Version key in the .extra file.
Near as I can tell, that's all in cross-platform code[1] which shouldn't have any mac-specific behaviour.
Originally I wondered if I could blame this on release candidates, where the display version becomes that of a release version... but I think it's too early for that (we're still at 60.0 beta 10)
Now I wonder if we could blame it on pings not being built by pingsender. Pings built and sent by CrashManager have their application section built by TelemetryController which gets its displayVersion from MOZ_APP_VERSION_DISPLAY instead of from AppData (I can't tell if they have to be the same or not).
We could check this by splitting your gist's analysis based on whether or not there was an "meta/X-PingSender-Version" field in the ping or not. If all the '60.0's came from not-pingsender-sent pings, then we may have a lead.
[1]: https://searchfox.org/mozilla-central/rev/7ccb618f45a1398e31a086a009f87c8fd3a790b6/toolkit/xre/nsAppRunner.cpp#3465
Flags: needinfo?(chutten)
Assignee | ||
Comment 2•7 years ago
|
||
(In reply to Chris H-C :chutten from comment #1)
> In https://gist.github.com/wlach/b26b59bd71ece458eeaf886354548863 you're not
> limiting to main crashes. So it could be that the pings echo exactly what
> you're seeing in error_aggregates.
Ah good catch.
> We could check this by splitting your gist's analysis based on whether or
> not there was an "meta/X-PingSender-Version" field in the ping or not. If
> all the '60.0's came from not-pingsender-sent pings, then we may have a lead.
>
> [1]:
> https://searchfox.org/mozilla-central/rev/
> 7ccb618f45a1398e31a086a009f87c8fd3a790b6/toolkit/xre/nsAppRunner.cpp#3465
It looks like it's the opposite of what you thought: pingsender pings *never* have the correct display version (which is actually what I'd expect given the source code I saw in the first comment):
https://gist.github.com/wlach/b26b59bd71ece458eeaf886354548863
Interestingly on Linux it looks like we have a large quantity of non-pingsender crash pings with the beta-less version (60.0) but I'm willing to chalk that up to that being a non-official version (note how few pingsender pings there are on that platform).
Can you think of a good person to assign this to?
Flags: needinfo?(chutten)
Assignee | ||
Comment 4•7 years ago
|
||
This does not appear to be a problem on nightly because the version is specified as the same as the display version:
https://sql.telemetry.mozilla.org/queries/52459/source
Assignee | ||
Comment 5•7 years ago
|
||
(In reply to William Lachance (:wlach) (use needinfo!) from comment #2)
> Can you think of a good person to assign this to?
On reflection this seems simultaneously more urgent and less hard to do than I thought, so I think I'm going to take a stab at fixing this myself.
Assignee: nobody → wlachance
Flags: needinfo?(chutten)
Comment 6•7 years ago
|
||
Gabriele Svelto's a good resource for pingsender stuff.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 8•7 years ago
|
||
(In reply to William Lachance (:wlach) (use needinfo!) from comment #7)
> Created attachment 8966707 [details]
> Bug 1452745 - Annotate crash pings with actual display and platform versions
>
> Before we were falling back to using the raw version, which isn't
> correct on at least beta (i.e. we would get `60.0` instead of `60.0b1`).
>
> Review commit: https://reviewboard.mozilla.org/r/235414/diff/#index_header
> See other reviews: https://reviewboard.mozilla.org/r/235414/
So I'm pretty confident this works, but am uncertain how to test this (it compiles fine). :gsvelto, do you have any ideas? (I've already flagged you for review)
Comment 9•7 years ago
|
||
(In reply to William Lachance (:wlach) (use needinfo!) from comment #8)
> So I'm pretty confident this works, but am uncertain how to test this (it
> compiles fine). :gsvelto, do you have any ideas? (I've already flagged you
> for review)
You can test this manually with the following procedure:
- Get the dummy telemetry server from https://github.com/mozilla/gzipServer
- Launch it on your machine (note, it requires Python 2.7 IIRC):
python gzipServer.py --port 8080
- Launch your nightly build and set the following three preferences, you'll have to create the last one:
toolkit.telemetry.enabled true
toolkit.telemetry.server "http://localhost:8080"
toolkit.telemetry.send.overrideOfficialCheck true
- Close your nightly build and launch it again, if you're using mach you'll need to enable the crash reporter explicitly:
./mach run --enable-crash-reporter
- Find the PID of the main process and kill it with:
kill -ABRT <PID of the main process>
- This will crash Firefox and launch the crashreporter client which in turn will assemble the ping and send it. The gzipServer.py script should save it to disk so you can inspect it.
Note that it is technically possible to write a unit test covering this using the functionality we recently implemented in bug 379290. However we haven't written any yet and it might be tricky to get everything right.
Comment 10•7 years ago
|
||
I just realized I made the same mistake when implementing bug 1407557, I'll file a follow up for fixing that too.
https://searchfox.org/mozilla-central/rev/b55e1a1cbcaee34878e133fbac20c4c2af6e11b5/mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryCrashPingBuilder.java#190
Assignee | ||
Comment 11•7 years ago
|
||
Just finished testing with a custom displayVersion. It works! Thanks for the instructions.
(In reply to Gabriele Svelto [:gsvelto] from comment #9)
> (In reply to William Lachance (:wlach) (use needinfo!) from comment #8)
> You can test this manually with the following procedure:
>
> - Get the dummy telemetry server from https://github.com/mozilla/gzipServer
> - Launch it on your machine (note, it requires Python 2.7 IIRC):
>
> python gzipServer.py --port 8080
>
> - Launch your nightly build and set the following three preferences, you'll
> have to create the last one:
>
> toolkit.telemetry.enabled true
This part appears to be out of date. Checking with :chutten, you now need to add `MOZ_TELEMETRY_REPORTING=1` to your .mozconfig.
Comment 12•7 years ago
|
||
(In reply to William Lachance (:wlach) (use needinfo!) from comment #11)
> Just finished testing with a custom displayVersion. It works! Thanks for the
> instructions.
YW
> > - Launch your nightly build and set the following three preferences, you'll
> > have to create the last one:
> >
> > toolkit.telemetry.enabled true
>
> This part appears to be out of date. Checking with :chutten, you now need to
> add `MOZ_TELEMETRY_REPORTING=1` to your .mozconfig.
Thanks! I hadn't done it in a while and I didn't notice that the preference is locked now. Thanks for the heads up.
Comment 13•7 years ago
|
||
mozreview-review |
Comment on attachment 8966707 [details]
Bug 1452745 - Annotate crash pings with actual display and platform versions
https://reviewboard.mozilla.org/r/235414/#review241260
Looking good!
Attachment #8966707 -
Flags: review?(gsvelto) → review+
Comment 14•7 years ago
|
||
I filed bug 1453238 for the Fennec fix.
Comment 15•7 years ago
|
||
Pushed by wlachance@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/706454cc63c4
Annotate crash pings with actual display and platform versions r=gsvelto
Comment 16•7 years ago
|
||
mozreview-review |
Comment on attachment 8966707 [details]
Bug 1452745 - Annotate crash pings with actual display and platform versions
https://reviewboard.mozilla.org/r/235414/#review241390
::: toolkit/components/telemetry/TelemetryEnvironment.jsm:1347
(Diff revision 1)
> applicationName: Services.appinfo.name || null,
> architecture: Services.sysinfo.get("arch"),
> buildId: Services.appinfo.appBuildID || null,
> version: Services.appinfo.version || null,
> vendor: Services.appinfo.vendor || null,
> + displayVersion: AppConstants.MOZ_APP_VERSION_DISPLAY || null,
Adding fields to the Environment is subject to Data Collection Review.
Assignee | ||
Comment 17•7 years ago
|
||
Data review request for chutten to approve the new field in the telemetry environment.
Attachment #8966988 -
Flags: review?(chutten)
Comment 18•7 years ago
|
||
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes. The Telemetry Environment documentation will be updated to include this field.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes, standard Telemetry checkbox.
If the request is for permanent data collection, is there someone who will monitor the data over time?
Yes and no. As part of the environment it will be under some decent levels of scrutiny by many, but there is no facility (or need) for monitoring it. It is a duplication of the field of the same name in the application section of the common ping format.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1.
Is the data collection request for default-on or default-off?
Default on.
Does the instrumentation include the addition of any new identifiers?
No. It is a duplicate field.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does there need to be a check-in in the future to determine whether to renew the data?
No. Though I have filed bug 1453344 to hopefully remove the duplicate field in future.
---
result: datareview+
Updated•7 years ago
|
Attachment #8966988 -
Flags: review?(chutten) → review+
Comment 19•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Comment 20•7 years ago
|
||
Heya! Sorry, I just stumbled upon this :) William, any chance you could also file a PR against https://github.com/mozilla/probe-dictionary/blob/master/environment.json to add the |displayVersion| field there as well?
Flags: needinfo?(wlachance)
Assignee | ||
Comment 21•7 years ago
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #20)
> Heya! Sorry, I just stumbled upon this :) William, any chance you could also
> file a PR against
> https://github.com/mozilla/probe-dictionary/blob/master/environment.json to
> add the |displayVersion| field there as well?
I can do this, but I don't think I'll have time before I return from PTO in a week. Leaving NI open to keep this on my radar.
Does this need to be uplifted to Beta60? This is so we can use MC dashboard beta crash data reliably.
Flags: needinfo?(jcristau)
Assignee | ||
Comment 23•7 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #22)
> Does this need to be uplifted to Beta60? This is so we can use MC dashboard
> beta crash data reliably.
I thought of this but didn't think it was worth the churn at this point. On the one hand, the patch is simple so it's low risk. On the other, since most of the beta cycle has gone by without this information, so in practice we can't really rely on 60's overall beta numbers as any kind of baseline within mission control even if we do uplift the patch now. I think having info on the end of the beta cycle might have *some* value, but it would be extremely limited.
If you want to uplift you can, but my vote would be to just let this ride the trains.
Assignee | ||
Comment 24•7 years ago
|
||
(In reply to William Lachance (:wlach) (use needinfo!) from comment #21)
> (In reply to Alessio Placitelli [:Dexter] from comment #20)
> > Heya! Sorry, I just stumbled upon this :) William, any chance you could also
> > file a PR against
> > https://github.com/mozilla/probe-dictionary/blob/master/environment.json to
> > add the |displayVersion| field there as well?
>
> I can do this, but I don't think I'll have time before I return from PTO in
> a week. Leaving NI open to keep this on my radar.
Done now, over to you Alessio.
https://github.com/mozilla/probe-dictionary/pull/40
Flags: needinfo?(wlachance)
Comment 25•7 years ago
|
||
(In reply to William Lachance (:wlach) (use needinfo!) from comment #24)
> (In reply to William Lachance (:wlach) (use needinfo!) from comment #21)
> > (In reply to Alessio Placitelli [:Dexter] from comment #20)
> > > Heya! Sorry, I just stumbled upon this :) William, any chance you could also
> > > file a PR against
> > > https://github.com/mozilla/probe-dictionary/blob/master/environment.json to
> > > add the |displayVersion| field there as well?
> >
> > I can do this, but I don't think I'll have time before I return from PTO in
> > a week. Leaving NI open to keep this on my radar.
>
> Done now, over to you Alessio.
>
> https://github.com/mozilla/probe-dictionary/pull/40
Thanks!
Comment 26•7 years ago
|
||
comment 23 makes sense to me, looking forward to seeing this in beta61!
Flags: needinfo?(jcristau)
You need to log in
before you can comment on or make changes to this bug.
Description
•