Mozilla Crash Reporter never succeeds in sending crash reports

RESOLVED FIXED in Firefox 40

Status

()

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: mconley, Assigned: ted)

Tracking

(Blocks 1 bug, {regression})

Trunk
mozilla41
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox40+ fixed, firefox41+ verified)

Details

Attachments

(3 attachments, 1 obsolete attachment)

I've been noticing this the last few days. I would hit a hang, and then nsTerminator would come along to kill things, and then the Mozilla Crash Reporter dialog would come up.

And like a good Nightly user, I would ask to submit the crash report while restarting Firefox.

The problem is that the dialog shows the spinner, saying "Submitting your report", and never goes away. So I have to suspect (and looking at ~/Library/Application Support/Firefox/Crash Reports/pending/ supports this) that the crash report is not being successfully sent, which isn't a great result.

Eventually I just kill the dialog.

If I go to about:crashes, I can see the pending crash report, and clicking it causes it to upload without any issues.
The crash reporter uses system networking libraries for submitting. If you load "https://crash-reports.mozilla.com/submit" in Safari, does it load properly?

If you've got a local build handy, try running it with MOZ_CRASHREPORTER=1 in the environment, then kill -ABRT the process to trigger a crash, and see if you can reproduce it. If so, maybe try attaching a debugger to the crash reporter application before clicking "restart and submit", and see where it's hanging?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #1)
> The crash reporter uses system networking libraries for submitting.

That seems rather risky to rely entirely on. Ideally, on each application startup an async task should be kicked off to check for failed (user approved) crash report submissions and send them itself. Broken submissions shouldn't be ignored. :/
Flags: needinfo?(mconley)
Gijs said he saw this and I just did too. Were there recent breakpad changes? If not, perhaps it's a server-side issue?

https://crash-reports.mozilla.com/submit loads fine for me in Safari and redirects to https://crash-stats.mozilla.com/home/products/Firefox
Ooh, I think I may have found the problem in Console:

crashreporter[15026]: *** setObjectForKey: object cannot be nil (key: TelemetryEnvironment)
[Tracking Requested - why for this release]: Breaks crash report upload

I would guess this was caused by bug 1121013, specifically https://hg.mozilla.org/mozilla-central/rev/5add1e8ab1e2. If this is the case then it affects Fx40.
Blocks: 1121013
Flags: needinfo?(mconley)
Keywords: regression
(In reply to Matthew N. [:MattN] from comment #5)
> If this is the case then it affects Fx40.

Well, I'm on Aurora 40 and having to manually submit reports, so hopefully it's the same issue.

A couple side points:

1) Tests to catch failures in the future would be nice. :|

2) It really would be nice to have the crash reporter dialog only ask for permission to submit on the first crash, then deal with comments and per-report permission to send URL via an in-content UI loaded on next start after a crash. This would be a lot cleaner and would make the effective crash turn-around time much lower. Doing everything via the crash reporter as-is the case now would then only be a last resort if everything else fails.
Flags: needinfo?(benjamin)
This sounds more like bug 1140132.
(In reply to Dave Garrett from comment #2)
> That seems rather risky to rely entirely on. Ideally, on each application
> startup an async task should be kicked off to check for failed (user
> approved) crash report submissions and send them itself. Broken submissions
> shouldn't be ignored. :/

As designed we figured it was less risky than relying on our networking stack which might be unusable in the case of a startup crash or other catastrophic event. I agree that "automatically try to resubmit crashes inside the browser that failed to submit from the crashreporter app" would be a good thing to implement.

(In reply to Dave Garrett from comment #6)
> 1) Tests to catch failures in the future would be nice. :|

Agreed. Unfortunately it's hard to write tests to test the native crash reporter dialogs. There's an ancient bug (bug 379290) on adding an automatic submission mode. If we fixed that up we could write some tests.
 
> 2) It really would be nice to have the crash reporter dialog only ask for
> permission to submit on the first crash, then deal with comments and
> per-report permission to send URL via an in-content UI loaded on next start
> after a crash. This would be a lot cleaner and would make the effective
> crash turn-around time much lower. Doing everything via the crash reporter
> as-is the case now would then only be a last resort if everything else fails.

This is bug 832967.
(In reply to Georg Fritzsche [:gfritzsche] from comment #7)
> This sounds more like bug 1140132.

Yeah, that annotation comes from here:
https://hg.mozilla.org/mozilla-central/annotate/b9424d63fe35/toolkit/components/telemetry/TelemetryStartup.js#l41

which was added in that bug. I can't reproduce this locally. Crashing with "crash me now" and submitting the reports on my OS X 10.10.3 system works fine.

My best guess, based on the log in comment 4 and some code inspection, is that the JSON we're sticking in that TelemetryEnvironment annotation has invalid UTF-8 in some cases and the crashreporter is choking on it.

The Mac crashreporter creates an NSString from each std::string that it parsed out of the .extra file:
https://hg.mozilla.org/mozilla-central/annotate/b9424d63fe35/toolkit/crashreporter/client/crashreporter_osx.mm#l571

using [NSString stringWithUTF8String:]:
https://hg.mozilla.org/mozilla-central/annotate/b9424d63fe35/toolkit/crashreporter/client/crashreporter_osx.mm#l43

The docs don't say so explicitly, but some googling indicates that stringWithUTF8String can return `nil` in cases where it fails to parse the data.

We should:
1) Fix the client to drop annotations where stringWithUTF8String fails, it's better than losing the whole report.
2) Figure out what's in the annotation that it's choking on and fix that.

Can someone who has reproduced this attach (or email me) an .extra file from ~/Library/Application Support/Firefox/Crash Reports/pending from when you reproduced the problem? If you didn't go into about:crashes and submit it the files should still be under pending. I'd like to see what the TelemetryEnvironment line.
Actually for #2, I think the problem is that annotateCrashReport takes ACString as key/data:
https://dxr.mozilla.org/mozilla-central/source/xpcom/system/nsICrashReporter.idl#64

If we pass a non-ASCII string we probably just get garbage bytes. It's entirely possible for a JSON string to have non-ASCII bytes.
Tracking enabled for 40 & 41. From what I understand above, it sounds like providing reliable feedback would be handy from a ux pov, and the technical perspective you guys seem to understand.
Anthony I wonder if you can help out testing this with Ted if Andrei can't do it? Looks like we need someone with XP service pack 2.
Flags: needinfo?(anthony.s.hughes)
Flags: needinfo?(andrei.vaida)
To be clear, I'm able to reproduce this on OS X 10.8.5 - and I think the error that MattN pointed out in comment 4 is also OS X specific.

So any machine running OS X is probably fine.
Assignee: nobody → ted
Posted file MozReview Request: bz://1166759/ted (obsolete) —
/r/9197 - bug 1166759 - force annotateCrashReport arguments to be UTF-8. r?bsmedberg
/r/9199 - bug 1166759 - don't put nil values in POST data in Mac crash reporter. r?bsmedberg

Pull down these commits:

hg pull -r 1781a0cd8c70d5f244030d24336b6144b25f69ab https://reviewboard-hg.mozilla.org/gecko/
Attachment #8608871 - Flags: review?(benjamin)
https://reviewboard.mozilla.org/r/9195/#review7851

::: xpcom/system/nsICrashReporter.idl:17
(Diff revision 1)
>  [scriptable, uuid(ee431adc-21dd-42d9-968b-e19b4c62960a)]

Drive-by - you'll need to update the uuid.
I did both fixes that I mentioned in comment 9/comment 10:
* Change nsICrashReporter.annotateCrashReport to take AUTF8String parameters instead of ACString, so that if JavaScript callers pass in non-ASCII we get actual UTF-8 instead of garbage.
* Change the Mac crash reporter to drop any annotations it gets that aren't utf-8, so that we at least get a crash report.
mconley sent me one of his .extra files, and my theory seems sound:
```
>>> import codecs
>>> f = codecs.open('/home/luser/downloads/0E0A3910-6562-4260-841E-F56733B4F388.extra', 'r', 'utf_8')
>>> data = f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/codecs.py", line 668, in read
    return self.reader.read(size)
  File "/usr/lib/python2.7/codecs.py", line 474, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xae in position 7407: invalid start byte
```

The byte in question seems to be part of the Adobe Acrobat plugin description:
00001cd0  22 3a 22 41 64 6f 62 65  ae 20 41 63 72 6f 62 61  |":"Adobe. Acroba|

This would be why I couldn't reproduce, I don't have that plugin installed.
There's Unicode in my `.extra` files as well, specifically an `®` on my graphics drivers description. (Aurora 40 on Linux)
(In reply to Liz Henry (:lizzard) from comment #12)
> Anthony I wonder if you can help out testing this with Ted if Andrei can't
> do it? Looks like we need someone with XP service pack 2.

It's not clear to me what you need testing for in this bug.
Flags: needinfo?(anthony.s.hughes)
I think I have some STR that QA could try on any Apple hardware:

STR:

1) Make sure the Adobe Acrobat plugin is installed
2) Make sure that the crash reporter is enabled (Preferences > Advanced > Data Choices, "Enable Crash Reporter")
3) Crash the browser somehow. https://addons.mozilla.org/en-US/firefox/addon/nightly-tester-tools/?src=ss, I believe, allows you to crash the main process reliably.
3) Once the browser has been crashed, and the crash dialog comes up, attempt to submit the crash report.

ER:

After a few seconds, the crash reporter dialog should go away, having submitted the report successfully.

AR:

The crash report dialog never goes away - we just see it saying "Submitting your report", and showing a spinner forever. It'll stick around until you close the dialog manually.
Just an addendum to the previous comment, you can install "Crash me now" from http://people.mozilla.com/~tmielczarek/crashme/ and use Tools->Crash me now, or you can find the PID of the Firefox process and run `kill -ABRT <pid>` from a shell.
Blocks: 1140132
No longer blocks: 1121013
https://reviewboard.mozilla.org/r/9199/#review7915

r=me with mconley's UUID thing addressed and the logging if it's possible.

::: toolkit/crashreporter/client/crashreporter_osx.mm:573
(Diff revision 1)
> +    if (key && value) {

It seems like we want a LogMessage here, unless that's not possible for some reason.
Flags: needinfo?(benjamin)
Attachment #8608871 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/1c42ef354732
https://hg.mozilla.org/mozilla-central/rev/6fe131387fb2
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
(In reply to Liz Henry (:lizzard) from comment #12)
> Anthony I wonder if you can help out testing this with Ted if Andrei can't
> do it? Looks like we need someone with XP service pack 2.

We'll have a look at this. I'm assigning Catalin to try and verify it, based on Comment 20 (thanks Mike!).
Flags: needinfo?(andrei.vaida) → qe-verify+
QA Contact: catalin.varga
We should uplift this to Aurora once the fix is verified on Nightly.
Verified as fixed using:

FF 41
Build Id: 20150525030205
OS: Mac Os X 10.9.5
Flags: qe-verify+
Comment on attachment 8608871 [details]
MozReview Request: bz://1166759/ted

Approval Request Comment
[Feature/regressing bug #]: Regression from bug 1140132
[User impact if declined]: Some users' crash reports will fail to send on OS X
[Describe test coverage new/current, TreeHerder]: An xpcshell test case was aadded to cover part of this scenario.
[Risks and why]: Very low risk, the meat of the fix is simply changing an XPIDL type to force XPConnect to do UTF-8 translation instead of chopping off UTF-16 bytes.
[String/UUID change made/needed]: None
Attachment #8608871 - Flags: approval-mozilla-aurora?
Attachment #8608871 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I just tested latest Aurora and this appears fixed for my case as well. (comment 18)
Blocks: 1169230
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> (In reply to Dave Garrett from comment #6)
> > 1) Tests to catch failures in the future would be nice. :|
> 
> Agreed. Unfortunately it's hard to write tests to test the native crash
> reporter dialogs. There's an ancient bug (bug 379290) on adding an automatic
> submission mode. If we fixed that up we could write some tests.

I filed bug 1169230 on writing tests.
Attachment #8608871 - Attachment is obsolete: true
Attachment #8620338 - Flags: review+
Attachment #8620339 - Flags: review+
You need to log in before you can comment on or make changes to this bug.