Closed Bug 745980 Opened 12 years ago Closed 12 years ago

integrate breakpad and crash reporter into desktop webapp runtime

Categories

(Firefox Graveyard :: Webapp Runtime, enhancement, P1)

enhancement

Tracking

(blocking-kilimanjaro:+)

VERIFIED FIXED
Firefox 16
blocking-kilimanjaro +

People

(Reporter: jsmith, Assigned: myk)

References

Details

(Whiteboard: [blocking-webrtdesktop1+], [qa!])

Attachments

(2 files, 2 obsolete files)

Requirement: Initiate a crash when the webRT has started.

Case scene:

1. Install http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/myk@mozilla.com-1bbb79853e99/try-macosx64/ on Mac OS X 10.5
2. Install an application on this build
3. Launch that same application on Max OS X 10.5 this build
4. Crash occurs

Expected:

Breakpad should appear, same as Firefox when it crashes.

Actual:

Breakpad does not appear. You can get the report by clicking report.

Additional Notes:

This is a really big concern, cause dogfooding users will be unable to report crashes on the WebRT.
Severity: critical → enhancement
Component: Web Apps → Desktop Runtime
Product: Firefox → Web Apps
QA Contact: webapps → desktop-runtime
Summary: Breakpad is not integrated into chromeless - Cannot capture crashes integrated into Socorro → integrate breakpad and crash reporter into desktop webapp runtime
In disagreement that this should be called an enhancement for a future release - We won't be able to get accurately get crashes that occur with the desktop runtime effectively, which will be really problematic for doing effective crash tracking for the webRT.
Crashkill team - Could you provide insight here? Will this block a merge to Aurora?
My opinion - I don't think we should merge to Aurora without proper crash reporting integration. Why? Cause we won't be able to accurately capture crash information to know why the webRT crashed, understand statistics on what are top crashers, etc.
(In reply to Jason Smith from comment #1)
> In disagreement that this should be called an enhancement for a future
> release - We won't be able to get accurately get crashes that occur with the
> desktop runtime effectively, which will be really problematic for doing
> effective crash tracking for the webRT.

It sounds like you're suggesting a priority rather than a severity.  The Severity field is a bit wonky, and perhaps "enhancement" shouldn't be in it, but this isn't a bug, as we didn't intend to integrate breakpad and crash reporter in the initial implementation.
(In reply to Myk Melez [:myk] [@mykmelez] from comment #4)
> It sounds like you're suggesting a priority rather than a severity.  The
> Severity field is a bit wonky, and perhaps "enhancement" shouldn't be in it,
> but this isn't a bug, as we didn't intend to integrate breakpad and crash
> reporter in the initial implementation.

True, although this may be a special case, as it may be a feature needed for our first release of the WebRT that we did not initially think of (a must have). For now, I'll avoid the semantic details and get to the real issue - I think we need this implemented for a first release of the WebRT. Otherwise, it's going to pain for all of us (e.g. developers - lack of information to fix a crash, qa - lack of crash reports to capture critical quality issues, security - crashes specific to security vulnerabilities will be harder to find), especially with non-deterministic crashes.
Blocks: 746022
ted, any tips on adding Breakpad to desktop WebRT? Bug 725408 has the patch for reusing Firefox as the runtime.
(In reply to Jason Smith from comment #2)
> Crashkill team - Could you provide insight here? Will this block a merge to
> Aurora?

I'm not sure if it would actually block, but I think it's strongly recommended to have this done. I'm still not sure what the actual layout of the WebRT(s) on our different platforms is. If on desktop this uses the Firefox executable (or even process), then it might be pretty simple, if it's an own executable, it might need more more, but I think getting Breakpad to work is probably relatively easy. You could try to load the testcase in bug 323394 in the WebRT and see if a Crash Reporter comes up. ;-)
Skimming that patch is tough since it's pretty big, but it looks like you're creating your own application.ini file? If so, you should include a [Crash Reporter] section like we do for other apps:
http://mxr.mozilla.org/mozilla-central/source/build/application.ini.in#73

(That just gets propogated into the nsXREAppData struct, so you could manually fiddle it there if you had to as well.)
Blocks: 737571
blocking-kilimanjaro: --- → ?
Whiteboard: [marketplace-beta-]
blocking-kilimanjaro: ? → +
Blocks: 753801
Priority: -- → P1
Whiteboard: [marketplace-beta-] → [marketplace-beta-] [blocking-webrtdesktop1+]
Target Milestone: --- → M1
No longer blocks: 737571
Whiteboard: [marketplace-beta-] [blocking-webrtdesktop1+] → [blocking-webrtdesktop1+]
Component: Desktop Runtime → Webapp Runtime
Product: Web Apps → Firefox
Target Milestone: M1 → Firefox 15
I tried simply adding a [Crash Reporter] section to the runtime's INI file (commenting out #if MOZILLA_OFFICIAL so it would be enabled on my test build).

After launching an app, a Crash Reports directory was added to its datadir, and an InstallTime[timestamp] file was created in it.

Then I pointed the app at that crashing testcase and restarted it, whereupon the runtime hung and eventually crashed.  The Crash Reporter dialog didn't appear at that point, but some additional files appeared in the Crash Reports directory: LastCrash, submit.log, and a pending directory with a .dmp file and a .extra file in it.

So some Crash Reporting functionality is enabled, but it isn't all working.

kairo, ted: any further hints on what needs doing to get the client working?
Okay, sounds like you're most of the way there.

I have a possible theory:
Somehow it's not finding the crashreporter app. When we init crash reporting from XRE_Main:
http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsAppRunner.cpp#2941

we pass down the XREDirectory from the nsXREAppData, which is where we expect to find the crashreporter binary:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#637

If that's not set properly, we might be failing to find the crashreporter and thus not able to do anything after the crash.
FWIW, I wouldn't put this section in your INI file, I would just copy the Firefox value before running XRE_main. That way if we ever need to change the Firefox value we don't need to worry about old values left over in webrt INI files.

Did the submit.log say anything interesting?
Btw Myk - Using Andy's hack to see about:crashes, I get the following message on about:crashes (before any patch is applied):

Submitted Crash Reports

This application has not been configured to display crash reports. The preference breakpad.reportURL must be set.
(In reply to Jason Smith [:jsmith] from comment #12)
> This application has not been configured to display crash reports. The
> preference breakpad.reportURL must be set.

This is set for Firefox here: http://mxr.mozilla.org/comm-central/source/mozilla/browser/app/profile/firefox.js#864

Sounds like it's just a pref for the viewing side though, and you said that not even the crash reporter app did come up, so that missing pref for viewing stuff is probably not relevant to that. Probably relevant to bug 753801 though.
(In reply to Ted Mielczarek [:ted] from comment #10)
> I have a possible theory:
> Somehow it's not finding the crashreporter app. When we init crash reporting
> from XRE_Main:
> http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsAppRunner.
> cpp#2941
> 
> we pass down the XREDirectory from the nsXREAppData, which is where we
> expect to find the crashreporter binary:
> http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/
> nsExceptionHandler.cpp#637
> 
> If that's not set properly, we might be failing to find the crashreporter
> and thus not able to do anything after the crash.

I stepped through the code that sets up crash reporting, but I didn't see anything wrong.  In particular, XREDirectory was correct.

And then, silly me, I realized that I do get the Crash Reporter dialog if I press Ignore in the assertion dialog.  I was pressing Abort instead.

The Details... button doesn't work in the dialog, and the first time I tried to submit a report the dialog claimed that it was unable to submit it, although submit.log, which had been empty, read:

[05/24/12 15:00:13] Crash report submission failed: The operation completed successfully.

But the second time I tried to submit a report, it got submitted successfully:

[05/24/12 16:54:15] Crash report submitted successfully

And the file in the submitted/ subdir says it has crash ID bp-98a96d90-0654-4e9e-8c34-f26832120524.


(In reply to Benjamin Smedberg  [:bsmedberg] from comment #11)
> FWIW, I wouldn't put this section in your INI file, I would just copy the
> Firefox value before running XRE_main. That way if we ever need to change
> the Firefox value we don't need to worry about old values left over in webrt
> INI files.

Hmm, it's not clear what you mean by "webrt INI files", but I'm putting the section into the runtime's INI file, which lives in the Firefox bindir and gets updated whenever Firefox does, rather than the individual INI files for each webapp.

And the [Crash Reporter]ServerURL field's value has an 'id' parameter that presumably needs to be set to the runtime's app ID rather than Firefox's.  So it seems both simpler and more robust to put this section into the runtime's INI file than to copy and munge Firefox's value for that field.
(In reply to Myk Melez [:myk] [@mykmelez] from comment #14)
> And the file in the submitted/ subdir says it has crash ID
> bp-98a96d90-0654-4e9e-8c34-f26832120524.

Now that's something! Looking at that report (linked here in Bugzilla), I see that something 's strange there, where does that messed up product name "www.giscloud.com;https;-1" come from?
Other than that, I guess that's a self-built build so that's why it has the "default" channel and no symbols (therefore a pretty useless stack and signature), which is OK in this case. Should be better in an official build.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #15)
> (In reply to Myk Melez [:myk] [@mykmelez] from comment #14)
> > And the file in the submitted/ subdir says it has crash ID
> > bp-98a96d90-0654-4e9e-8c34-f26832120524.
> 
> Now that's something! Looking at that report (linked here in Bugzilla), I
> see that something 's strange there, where does that messed up product name
> "www.giscloud.com;https;-1" come from?

That's the origin directory of the application - GIS Cloud.
Blocks: k9o-webrt
(In reply to Jason Smith [:jsmith] from comment #16)
> That's the origin directory of the application - GIS Cloud.

Hmm, we need all crash reports to have the same product name there for WebRT to be able to process them in Socorro. That origin surely makes sense in a different field of the report, though.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #15)
> Looking at that report (linked here in Bugzilla), I
> see that something 's strange there, where does that messed up product name
> "www.giscloud.com;https;-1" come from?

The fix for bug 747409 <http://hg.mozilla.org/mozilla-central/rev/4c1aa39b85a9> sets nsXREAppData::name to the webapp origin (f.e. "www.giscloud.com;https;-1") on Windows so the code that looks for an existing process when you start up a xulapp treats each webapp as a distinct product.


(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #17)
> Hmm, we need all crash reports to have the same product name there for WebRT
> to be able to process them in Socorro.

nsAppRunner includes the xulapp's "product ID" in the crash report <http://hg.mozilla.org/mozilla-central/file/79262a88881d/toolkit/xre/nsAppRunner.cpp#l2958>, and that value is the same for all webapps (webapprt@mozilla.org).  Could Socorro use that value instead of "product name" to process these reports?

(The Crash Reporter dialog references the name of the app, presumably getting it from the same place as Socorro, so I think we'll want to modify the fix for bug 747409 to make nsXREAppData::name be the name of the app instead of the origin.  But that still won't address the Socorro issue.)

This is my previous patch with a bit of additional commentary and factoring out the runtime ID, which is now inserted into the runtime's INI file twice.

I haven't been able to test this on Mac, since I couldn't get the testcase in bug 323394 to crash the process there, but I would be happy to do so if I could figure out how.

I still don't know why the Details button doesn't work.
Assignee: nobody → myk
Attachment #626261 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #627844 - Flags: review?(benjamin)
(In reply to Myk Melez [:myk] [@mykmelez] from comment #18)
> (The Crash Reporter dialog references the name of the app, presumably
> getting it from the same place as Socorro, so I think we'll want to modify
> the fix for bug 747409 to make nsXREAppData::name be the name of the app
> instead of the origin.  But that still won't address the Socorro issue.)

Could that cause a problem if I say, launch two apps with the same name? A vague memory of mine I think indicates that we were using the origin to uniquely identify the app since the one app per origin specification guarantees that we can safely uniquely identify an app by its origin. 

Note - There may be a better way to do this. Or maybe there is a way to have an "app name" vs. "app identifier," given that identifiers must be unique?

> I haven't been able to test this on Mac, since I couldn't get the testcase
> in bug 323394 to crash the process there, but I would be happy to do so if I
> could figure out how.

Marcia or Kairo - Could you identify a reproducible crash that exists on OS X at a URL for Myk to test the patch with?
(In reply to Myk Melez [:myk] [@mykmelez] from comment #18)
> The fix for bug 747409
> <http://hg.mozilla.org/mozilla-central/rev/4c1aa39b85a9> sets
> nsXREAppData::name to the webapp origin (f.e. "www.giscloud.com;https;-1")
> on Windows so the code that looks for an existing process when you start up
> a xulapp treats each webapp as a distinct product.

Hmm, I understand. And that's the only way to achieve that?

> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #17)
> > Hmm, we need all crash reports to have the same product name there for WebRT
> > to be able to process them in Socorro.
> 
> nsAppRunner includes the xulapp's "product ID" in the crash report
> <http://hg.mozilla.org/mozilla-central/file/79262a88881d/toolkit/xre/
> nsAppRunner.cpp#l2958>, and that value is the same for all webapps
> (webapprt@mozilla.org).  Could Socorro use that value instead of "product
> name" to process these reports?

We could apply some rewrite in Socorro processing, but that probably takes a while to do and roll out. Also, it very likely will lose the origin info as it will just overwrite the product name with some value we find working well. What product name does non-Windows use (as you said this is Windows-only)?
It probably will make sense to send the origin or actually app/page URL in the URL field of the crash report. Your report didn't include any (we usually send the active tab of the tabbrowser element, but I think that's somewhere in app-specific code anyhow) so there's room for improvement there anyhow.

> (The Crash Reporter dialog references the name of the app, presumably
> getting it from the same place as Socorro, so I think we'll want to modify
> the fix for bug 747409 to make nsXREAppData::name be the name of the app
> instead of the origin.  But that still won't address the Socorro issue.)

I'm pretty sure that the Crash Reporter dialog gets it from the process that called it directly, or from application.ini, and not from the crash report data, which it sends to Socorro.

> I still don't know why the Details button doesn't work.

You get very little data displayed there usually (as we don't have the stack in a readable form on the client, we only walk and symbolize it on the server), but you should get at least *something*, yes.

(In reply to Jason Smith [:jsmith] from comment #19)
> Marcia or Kairo - Could you identify a reproducible crash that exists on OS
> X at a URL for Myk to test the patch with?

Attachment 208471 [details] from bug 323394 has proven to me to be a reliable crasher. If you can call that in a webapp, you should be able to trigger "nicely".
Comment on attachment 627844 [details] [diff] [review]
patch v1: enables crash reporting

I really don't think that defining WEBAPP_RUNTIME_ID is necessary, but I guess it's ok. KaiRo, are you going to file the followup bugs to expose this data in socorro usefully? In particular I assume that by default these reports should be aggregated with Firefox reports and not be presented as a separate product.
Attachment #627844 - Flags: review?(benjamin) → review+
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #21)
> In particular I assume that by default these
> reports should be aggregated with Firefox reports and not be presented as a
> separate product.

Oh? I would have thought this would be a separate product in Socorro. In any case, a number of crash field rewrites and some other special-casing might be in order if we want to present those as "Firefox" crashes.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #20)
> (In reply to Myk Melez [:myk] [@mykmelez] from comment #18)
> > The fix for bug 747409
> > <http://hg.mozilla.org/mozilla-central/rev/4c1aa39b85a9> sets
> > nsXREAppData::name to the webapp origin (f.e. "www.giscloud.com;https;-1")
> > on Windows so the code that looks for an existing process when you start up
> > a xulapp treats each webapp as a distinct product.
> 
> Hmm, I understand. And that's the only way to achieve that?

nsNativeAppSupportWin.cpp uses the name of the app to uniquely identify the MessageWindow <http://hg.mozilla.org/mozilla-central/file/e8a025a7101b/toolkit/xre/nsNativeAppSupportWin.cpp#l475>, which various code (the installer, the message loop, and something that checks for an existing process on startup) then takes advantage of <http://mxr.mozilla.org/mozilla-central/search?string=messagewindow>.

We could make all that code use the ID of the app instead, but that feels like a much bigger and scarier change.

Alternately, we could find some way to modify the "existing process on startup" check so it checks for an existing process in a different way for webapps, since that check was the proximate reason we changed the product name for webapps.

But I don't know enough about the code to know what that would be nor how risky it is.  And that might regress other bugs we incidentally fixed when we changed the product name for webapps.

So it seems safer for multiple reasons to give each webapp its own product name, although I understand that this causes problems on the Socorro side of things. :-/


> > (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #17)
> > > Hmm, we need all crash reports to have the same product name there for WebRT
> > > to be able to process them in Socorro.
> > 
> > nsAppRunner includes the xulapp's "product ID" in the crash report
> > <http://hg.mozilla.org/mozilla-central/file/79262a88881d/toolkit/xre/
> > nsAppRunner.cpp#l2958>, and that value is the same for all webapps
> > (webapprt@mozilla.org).  Could Socorro use that value instead of "product
> > name" to process these reports?
> 
> We could apply some rewrite in Socorro processing, but that probably takes a
> while to do and roll out.

We're landing this in nightly with the aim of shipping it in Firefox 15, which merges to Aurora next Monday, so we do have some time before we ship, although obviously the sooner we stand up crash reports the better; preferably during the next six week Aurora cycle.


> What product name does non-Windows use (as you said this is
> Windows-only)?

Yes, it's Windows-only.  On Mac, the product name is Webapp Runtime <http://mxr.mozilla.org/mozilla-central/source/webapprt/application.ini.in#12>.


> It probably will make sense to send the origin or actually app/page URL in
> the URL field of the crash report. Your report didn't include any (we
> usually send the active tab of the tabbrowser element, but I think that's
> somewhere in app-specific code anyhow) so there's room for improvement there
> anyhow.

Indeed!  I have filed a followup bug 759502 on that.


> > (The Crash Reporter dialog references the name of the app, presumably
> > getting it from the same place as Socorro, so I think we'll want to modify
> > the fix for bug 747409 to make nsXREAppData::name be the name of the app
> > instead of the origin.  But that still won't address the Socorro issue.)
> 
> I'm pretty sure that the Crash Reporter dialog gets it from the process that
> called it directly, or from application.ini, and not from the crash report
> data, which it sends to Socorro.

I suspect it gets it from nsXREAppData::name.  In any case, I filed followup bug 759505 to make it display the name of the app.


> > I still don't know why the Details button doesn't work.
> 
> You get very little data displayed there usually (as we don't have the stack
> in a readable form on the client, we only walk and symbolize it on the
> server), but you should get at least *something*, yes.

In my case pressing it has no effect.  I don't see any error messages in the console, though.  Perhaps this is a product of my build not being "official"?


> Attachment 208471 [details] from bug 323394 has proven to me to be a
> reliable crasher. If you can call that in a webapp, you should be able to
> trigger "nicely".

That one crashes on me on Windows after a couple minutes, but I can't get it to crash on Mac, where it simply hangs instead.


This patch just undoes the refactoring of the ID into WEBAPP_RUNTIME_ID, which I agree with bsmedberg is overkill.  Carrying forward review, and this is the version I'll check in to get basic crash reporting up and running, although we can still iterate on the exact values we're sending to the server in followup bugs as needed.
Attachment #627844 - Attachment is obsolete: true
Attachment #628108 - Flags: review+
Comment on attachment 628108 [details] [diff] [review]
patch v2: undoes refactoring of webapp runtime ID

https://hg.mozilla.org/integration/mozilla-inbound/rev/9b8a4c4b8c2c
Attachment #628108 - Flags: checkin+
Whiteboard: [blocking-webrtdesktop1+] → [blocking-webrtdesktop1+], [qa+]
(In reply to Myk Melez [:myk] [@mykmelez] from comment #23)
> Yes, it's Windows-only.  On Mac, the product name is Webapp Runtime
> <http://mxr.mozilla.org/mozilla-central/source/webapprt/application.ini.
> in#12>.

OK, I guess that's what Linux will send as well, then (i.e. that's what we send everywhere except on Windows), so it's what we should use on the Socorro side. I'll file a bug for rewriting the Windows reports to that on the Socorro side. That is, if you agree with me on comment #22 - if you think we should go with what bsmedberg assumed and count those as plain "Firefox", the rewrite there needs to be different.

> Indeed!  I have filed a followup bug 759502 on that.

> I suspect it gets it from nsXREAppData::name.  In any case, I filed followup
> bug 759505 to make it display the name of the app.

Thanks, those should both be helpful.

> In my case pressing it has no effect.  I don't see any error messages in the
> console, though.  Perhaps this is a product of my build not being "official"?

Hmm, not sure. Let's make sure we try once we have official builds with crash reporting support.

> That one crashes on me on Windows after a couple minutes, but I can't get it
> to crash on Mac, where it simply hangs instead.

Hrm. I found that one with a search for "crash" and "tescase" keywords in Bugzilla, you could try if you find a better one but we are usually pretty good at fixing those with testcases, esp. when they're content HTML.
Blocks: 759723
Blocks: 759724
https://hg.mozilla.org/mozilla-central/rev/9b8a4c4b8c2c
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [blocking-webrtdesktop1+], [qa+] → [blocking-webrtdesktop1+], [qa+:jsmith]
In catching bug 761498 on win 7 64-bit, I noticed that breakpad did not appear 3 times on an inbound build when I crashed evernote web. Reopening - this isn't fixed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [blocking-webrtdesktop1+], [qa+:jsmith] → [blocking-webrtdesktop1+]
Inbound build date: 6/4/2012
Breakpad appears in desktop firefox for the same crash, but not in the web runtime.
Target Milestone: Firefox 15 → Firefox 16
The obvious fix: webapprt/application.ini.in conditionally enables the crash reporter based on whether or not MOZILLA_OFFICIAL is defined, so we have to define MOZILLA_OFFICIAL in webapprt/Makefile.in in order for it to be available to webapprt/application.ini.in.
Attachment #632969 - Flags: review?(felipc)
Attachment #632969 - Flags: review?(felipc) → review+
Whiteboard: [blocking-webrtdesktop1+] → [blocking-webrtdesktop1+], [qa+]
https://hg.mozilla.org/mozilla-central/rev/a75072d8b414
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Sweet. I got breakpad to come up on windows. Will test on Mac and Linux to confirm.
(In reply to Jason Smith [:jsmith] from comment #32)
> Sweet. I got breakpad to come up on windows. Will test on Mac and Linux to
> confirm.

Sample report generated - https://crash-stats.mozilla.com/report/index/bp-543e0469-1555-4fae-bfb3-7dad12120615
(In reply to Jason Smith [:jsmith] from comment #33)
> (In reply to Jason Smith [:jsmith] from comment #32)
> > Sweet. I got breakpad to come up on windows. Will test on Mac and Linux to
> > confirm.
> 
> Sample report generated -
> https://crash-stats.mozilla.com/report/index/bp-543e0469-1555-4fae-bfb3-
> 7dad12120615

Can't tell if Mac works or not, as the test case given does not reliably crash on OS X. The crash did occur had a corrupt dump, which seems to be not right, but I need a reproducible test case here for an OS X crash (and linux as well).
Keywords: testcase-wanted
Finally found one and verified that the crash reporter comes up and sends a report on Mac.
Keywords: testcase-wanted
Verified on Windows, Mac, and Linux.
Status: RESOLVED → VERIFIED
Whiteboard: [blocking-webrtdesktop1+], [qa+] → [blocking-webrtdesktop1+], [qa!]
Flags: in-moztrap?(jsmith)
Myk - Would it be worth nominating this bug for Aurora? If so, can you nominate this?
QA Contact: desktop-runtime → jsmith
Flags: in-moztrap?(jsmith)
Product: Firefox → Firefox Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: