Closed Bug 1407557 Opened 7 years ago Closed 7 years ago

Add crash ping support to Fennec

Categories

(Firefox for Android Graveyard :: Metrics, enhancement)

enhancement
Not set
normal

Tracking

(firefox60 fixed)

RESOLVED FIXED
Firefox 60
Tracking Status
firefox60 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

Details

Attachments

(1 file, 3 obsolete files)

Fennec currently doesn't send crash pings. Apart from the crash reporter activity Fennec does not handle crashes in any other way. To implement this we should add a new type of ping to Fennec's specific telemetry machinery and write it out directly from the crash reporter activity if possible.
Hi Garbriele May I know why crash pings are not sent? How do I investigate this if I want to help?
Flags: needinfo?(gsvelto)
Hi Nevin, crash pings in Firefox are sent via the CrashManager [1] or via the crashreporter client [2]. Both are disabled in Fennec since it uses its own crashreporter client implementation which is an Android activity [3]. Fennec also has its own telemetry implementation [4] which makes things more complicated. So what needs to be done here is that we need to add a new ping type for Fennec as described here [4] and then integrate it in the crashreporter activity so that whenever we hit a crash we also assemble the ping and queue it for sending. BTW I intended to work on this bug this week, but if you're more knowledgeable of Fennec than I am - which I'm sure you are - feel free to pick it up. [1] http://searchfox.org/mozilla-central/rev/1285ae3e810e2dbf2652c49ee539e3199fcfe820/toolkit/components/crashes/CrashManager.jsm#663 [2] http://searchfox.org/mozilla-central/rev/1285ae3e810e2dbf2652c49ee539e3199fcfe820/toolkit/crashreporter/client/crashreporter.cpp#738 [3] http://searchfox.org/mozilla-central/rev/1285ae3e810e2dbf2652c49ee539e3199fcfe820/mobile/android/base/java/org/mozilla/gecko/CrashReporter.java [4] http://searchfox.org/mozilla-central/rev/1285ae3e810e2dbf2652c49ee539e3199fcfe820/mobile/android/base/java/org/mozilla/gecko/telemetry/TelemetryDispatcher.java#23
Flags: needinfo?(gsvelto)
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Attached patch [PATCH] WIP (obsolete) — Splinter Review
This contains the basic machinery required for sending a crash ping from Fennec's crash reporter activity. There's still quite a few things missing though: - the crash ping is essentially empty, it needs to be populated by parsing the .extra file like we do in the desktop's crashreporter client - I have yet to check if the URL the ping is sent to is correct. Most likely not since it's generated using the same format for Fennec core pings and I doubt that's correct (even though the ping type is correctly set to 'crash')
Attached patch [PATCH v2] WIP (obsolete) — Splinter Review
The crash ping is now submitted to the correct URL, it still needs to be fully populated though.
Attachment #8941103 - Attachment is obsolete: true
Attached patch [PATCH v3] WIP (obsolete) — Splinter Review
The ping is now fully populated, save for the telemetry client ID. Fennec crash dumps do not seem to contain the TelemetryClientId annotation; I need to investigate why and eventually find out how to retrieve it.
Attachment #8941369 - Attachment is obsolete: true
Attachment #8941963 - Attachment is obsolete: true
Attachment #8942388 - Flags: review?(michael.l.comella) → review?(cnevinchen)
Nevin, do you have sufficient telemetry expertise to review this? If not, feel free to send it back to me. FYI: I'm only on Firefox for Android for critical Activity Stream issues and this falls outside that scope.
Thanks for redirecting the review Michael, I wasn't unsure who to ask but since we had talked about this I thought you were a good fit. Just so that I know who to refer to in the future, who's in charge of Fennec these days?
Flags: needinfo?(michael.l.comella)
I'll have to redirect this to Nevin for more details. To my knowledge: - Nevin - JanH - (platform GeckoView team led by Snorp - not directly working on Fennec) - Critical Activity Stream issues and triage: me - Build improvements: nalexander
Flags: needinfo?(michael.l.comella) → needinfo?(cnevinchen)
Comment on attachment 8942388 [details] Bug 1407557 - Add crash pings to Fennec; https://reviewboard.mozilla.org/r/212668/#review219926 I can only comment on coding style. For deeper understanding about Crash Report, I think Michael is more knowledgeable. I can't commit the schedule of the review for now. Sorry ::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryCrashPingBuilder.java:164 (Diff revision 1) > + USER32_BEFORE_BLOCKLIST_ANNOTATION, > + VERSION_ANNOTATION, > + }; > + > + public TelemetryCrashPingBuilder(String crashId, String clientId, HashMap<String, String> annotations) { > + super(4); nit: use a static field maybe better than a magic number :) maybe: private final staitc int PING_VERSION = 4; ::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryCrashPingBuilder.java:231 (Diff revision 1) > + node.put(CRASH_DATE_ATTR, CurrentDate(ISO8601_DATE)); > + node.put(CRASH_TIME_ATTR, CurrentDate(ISO8601_DATE_HOURS)); > + node.put(HAS_CRASH_ENVIRONMENT_ATTR, true); > + node.put(CRASH_ID_ATTR, crashId); > + node.put(MINIDUMP_SHA256_HASH_ATTR, annotations.get(MINIDUMP_SHA256HASH_ANNOTATION)); > + node.put(PROCESS_TYPE_ATTR, "main"); // TODO: Is this correct on Fennec? I don't know who can answer this question. ::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryCrashPingBuilder.java:273 (Diff revision 1) > + * @param annotations A map holding the crash annotations > + * @param architecture The platform architecture > + * @param xpcomAbi The XPCOM ABI version > + * @returns A JSON object representing the ping's application node > + */ > + private static ExtendedJSONObject CreateApplicationNode(HashMap<String, String> annotations, nit: createApplicationNode(... ::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryPingBuilder.java:92 (Diff revision 1) > docType + '/' + > appName + '/' + > appVersion + '/' + > appUpdateChannel + '/' + > - appBuildId; > + appBuildId + > + (version == 4 ? "?v=4" : ""); nit: same as above.
Comment on attachment 8942388 [details] Bug 1407557 - Add crash pings to Fennec; https://reviewboard.mozilla.org/r/212668/#review219926 > I don't know who can answer this question. As far as I know Fennec is still single-process so all crashes should come from the main process. I wonder if we should treat uncaught Java exceptions differently. We can fix it later anyway, compared to not having crash pings at all this is no big deal.
After talking on IRC I'm switching the review to :jchen. I'll upload a patch with nits addressed shortly.
Flags: needinfo?(cnevinchen)
Comment on attachment 8942388 [details] Bug 1407557 - Add crash pings to Fennec; https://reviewboard.mozilla.org/r/212668/#review220596 Looks good! r- just to make sure comments are addressed. ::: mobile/android/base/java/org/mozilla/gecko/CrashReporter.java:162 (Diff revision 2) > > + try { > + // Find the profile name and path. Since we don't have any other way of getting it within > + // this context we extract it from the crash dump path. > + final File profileDir = passedMinidumpFile.getParentFile().getParentFile(); > + final String profileName = getProfileName(profileDir); I don't think you really need the actual profile name here (and some profiles may not have names). You should be able to just use `GeckoProfile.CUSTOM_PROFILE`, which is defined as an empty string. ::: mobile/android/base/java/org/mozilla/gecko/CrashReporter.java:167 (Diff revision 2) > + final String profileName = getProfileName(profileDir); > + > + if (profileName != null) { > + // Extract the crash dump ID and telemetry client ID, we need profile access for the latter. > + final String passedMinidumpName = passedMinidumpFile.getName(); > + final String crashId = passedMinidumpName.substring(0, passedMinidumpName.length() - 4); A comment would be nice (i.e. why subtract 4?) ::: mobile/android/base/java/org/mozilla/gecko/CrashReporter.java:168 (Diff revision 2) > + > + if (profileName != null) { > + // Extract the crash dump ID and telemetry client ID, we need profile access for the latter. > + final String passedMinidumpName = passedMinidumpFile.getName(); > + final String crashId = passedMinidumpName.substring(0, passedMinidumpName.length() - 4); > + final GeckoProfile profile = GeckoProfile.get(this, profileName, profileDir.getPath()); `GeckoProfile.get(this, profileName, profileDir)` should work. ::: mobile/android/base/java/org/mozilla/gecko/CrashReporter.java:175 (Diff revision 2) > + > + // Assemble and send the crash ping > + final TelemetryCrashPingBuilder pingBuilder = > + new TelemetryCrashPingBuilder(crashId, clientId, mExtrasStringMap); > + final TelemetryDispatcher dispatcher = new TelemetryDispatcher(profileDir.getPath(), profileName); > + dispatcher.queuePingForUpload(this, pingBuilder); Do we make sure the pings are uploaded before exiting the crash reporter? Or maybe we could queue the ping and upload later? ::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryCrashPingBuilder.java:29 (Diff revision 2) > + * > + * See https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/data/crash-ping.html > + * for details on the crash ping. > + */ > +public class TelemetryCrashPingBuilder extends TelemetryPingBuilder { > + private static final String LOGTAG = StringUtils.safeSubstring(TelemetryCrashPingBuilder.class.getSimpleName(), 0, 24); `LOGTAG = "GeckoTelemetryCrashPingBuilder"` ::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryCrashPingBuilder.java:31 (Diff revision 2) > + * for details on the crash ping. > + */ > +public class TelemetryCrashPingBuilder extends TelemetryPingBuilder { > + private static final String LOGTAG = StringUtils.safeSubstring(TelemetryCrashPingBuilder.class.getSimpleName(), 0, 24); > + > + private static final String APPLICATION_ATTR = "application"; I would just inline these string constants instead of using `static final` fields. The reason is Android has some limitations around the number of fields in an application, so I think it's better to play safe. ::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryCrashPingBuilder.java:200 (Diff revision 2) > + node.put(CRASH_DATE_ATTR, currentDate(ISO8601_DATE)); > + node.put(CRASH_TIME_ATTR, currentDate(ISO8601_DATE_HOURS)); > + node.put(HAS_CRASH_ENVIRONMENT_ATTR, true); > + node.put(CRASH_ID_ATTR, crashId); > + node.put(MINIDUMP_SHA256_HASH_ATTR, annotations.get(MINIDUMP_SHA256HASH_ANNOTATION)); > + node.put(PROCESS_TYPE_ATTR, "main"); // TODO: Is this correct on Fennec? Fennec only has one process so "main" sounds right.
Attachment #8942388 - Flags: review?(nchen) → review-
Comment on attachment 8942388 [details] Bug 1407557 - Add crash pings to Fennec; https://reviewboard.mozilla.org/r/212668/#review220968 ::: mobile/android/base/java/org/mozilla/gecko/CrashReporter.java:175 (Diff revision 2) > + > + // Assemble and send the crash ping > + final TelemetryCrashPingBuilder pingBuilder = > + new TelemetryCrashPingBuilder(crashId, clientId, mExtrasStringMap); > + final TelemetryDispatcher dispatcher = new TelemetryDispatcher(profileDir.getPath(), profileName); > + dispatcher.queuePingForUpload(this, pingBuilder); AFAIK queuePingForUpload() will hand over the ping to the telemetry activity which in the current implementation tries to send the ping immediately (though the crash reporter doesn't wait for that to happen). If the ping cannot be uploaded it is queued and sent when the activity has a chance (for example when the phone reconnects to the network). Ideally we want the ping to be sent as soon as possible to detect spikes in crash rates quickly. ::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryCrashPingBuilder.java:31 (Diff revision 2) > + * for details on the crash ping. > + */ > +public class TelemetryCrashPingBuilder extends TelemetryPingBuilder { > + private static final String LOGTAG = StringUtils.safeSubstring(TelemetryCrashPingBuilder.class.getSimpleName(), 0, 24); > + > + private static final String APPLICATION_ATTR = "application"; I just tried to match the style used in the core ping builder: https://searchfox.org/mozilla-central/rev/e3cba77cee3ff1be38313abe9c804d13c51bd95b/mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryCorePingBuilder.java#54 I'm glad to use inline string constants instead. Those are not subject to change anyway so there's not much value in making dedicated constants for them. I did it only for consistency with existing code.
Comment on attachment 8942388 [details] Bug 1407557 - Add crash pings to Fennec; https://reviewboard.mozilla.org/r/212668/#review220596 > I don't think you really need the actual profile name here (and some profiles may not have names). You should be able to just use `GeckoProfile.CUSTOM_PROFILE`, which is defined as an empty string. I tried but unfortunately it doesn't work. The telemetry upload logic cannot find the appropriate folder when using GeckoProfile.CUSTOM_PROFILE.
I believe I've addressed all the review comments. Additionally I fixed the extraction of the stack traces annotations. I hadn't noticed it was wrong but since I've got the patch for bug 1307153 almost ready I could test it.
Attachment #8942388 - Flags: review?(nchen) → review+
Thanks for the quick review!
Mark, before I land this I wanted to make sure that it won't cause troubles with the telemetry back-end. :dexter recommended asking you about it but feel free to redirect the NI if you're the wrong person. The patch here will enable Fennec to send crash pings: the ping format is completely identical to the one used on desktop Firefox, the only difference will be in the contents (for example the application name, the architecture type, etc...). Here's an example: http://jsoneditoronline.org/?id=e0f89f5c20230407753931152a5d290e I'll hold off landing this until I'm sure it won't cause problems with the data pipeline.
Flags: needinfo?(mreid)
This should be fine. We validate incoming crash pings against a JSON Schema[1], so as long as these pings are identical to the desktop ones, these should be validated without issue. It would be good to monitor the ingestion success at [2] as this rolls out to make sure there is no increase in failures. [1] https://github.com/mozilla-services/mozilla-pipeline-schemas/blob/master/schemas/telemetry/crash/crash.4.schema.json [2] https://pipeline-cep.prod.mozaws.net/dashboard_output/graphs/analysis.moz_telemetry_doctype_monitor_crash.ingestion_error.html
Flags: needinfo?(mreid)
Great, thanks, ready for landing then.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 60
Blocks: 1307153
Can someone confirm that we are now getting crash pings for Fennec nightly and that everything is OK (or point me to the data)? Thanks.
Flags: needinfo?(mreid)
We are indeed getting crash pings for Fx 60. Looks like they started arriving on Jan 27th. Here's a basic query that shows the counts by day: https://sql.telemetry.mozilla.org/queries/51232#137407
Flags: needinfo?(mreid)
Blocks: 1453238
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: