Open Bug 981641 Opened 10 years ago Updated 2 years ago

Sometimes Breakpad produces invalid minidump files for crashes, which also results in missing the .extra file

Categories

(Toolkit :: Crash Reporting, defect)

defect

People

(Reporter: whimboo, Unassigned)

Details

As seen while I was working on the investigation for bug 980938, the crash reporter does not always write an .extra file to disk. In some cases only the minidump file exists. In those cases Firefox is not able to send the report to Socorro.
Henrik said he saw this while reproducing bug 980938.
So it's not the Crash Reporter that creates that file but Firefox itself. Just to add: this happens when we run our tests with Mozmill with the crash reporter disabled via the environment variable. mozcrash only backs up the .dmp file but doesn't find the .extra file.
Summary: Crash reporter does not always write .extra file → Firefox does not always write .extra file
Summary: Firefox does not always write .extra file → Firefox does not always write an .extra file when it crashes
Bug 980938 is a plugin crash. Does this bug affect Firefox crashes or plugin crashes or both?

When you say "crash reporter being disabled via the environment variable" you just mean launching the crash reporter app, not turning off the crash reporting system entirely?
Flags: needinfo?(hskupin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> Bug 980938 is a plugin crash. Does this bug affect Firefox crashes or plugin
> crashes or both?

In this specific case I have only tested this plugin crash. The crash handling feature in Mozmill is fairly new (released at the end of last week), so I cannot give a full answer yet.

> When you say "crash reporter being disabled via the environment variable"
> you just mean launching the crash reporter app, not turning off the crash
> reporting system entirely?

We are setting the environment variable MOZ_CRASHREPORTER_NO_REPORT so that the crash reporter does not come up and break our follow-up tests. That's all, yes. See the code in Mozmill:

https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L40
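For illustration, a harness along the lines of Mozmill might set the variable like this. This is a minimal sketch, not the actual Mozmill code linked above: the value "1", the helper names build_test_env and launch_firefox, and the chosen command-line flags are assumptions.

```python
import os
import subprocess


def build_test_env(base_env=None):
    """Return a copy of the environment with the crash reporter UI suppressed.

    MOZ_CRASHREPORTER_NO_REPORT is the variable named in this bug; setting it
    to "1" is an assumption (the harness only relies on it being set).
    """
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_CRASHREPORTER_NO_REPORT"] = "1"
    return env


def launch_firefox(binary, profile_dir):
    # Hypothetical launcher: binary path and profile directory come from the caller.
    return subprocess.Popen(
        [binary, "-no-remote", "-profile", profile_dir],
        env=build_test_env(),
    )
```

Note that crash reporting itself stays enabled in this setup; only the dialog is suppressed, which is why mozcrash still finds minidumps in the profile afterwards.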
Flags: needinfo?(hskupin)
http://mxr.mozilla.org/mozilla-central/source/dom/plugins/test/mochitest/test_crash_notify_no_report.xul#100 I believe that the current behavior is intentional, although I'm not sure what exact mechanism causes it.

The _NO_REPORT envvar changes the return value of the ShouldReport function in nsExceptionHandler.cpp and I think that propagates to cause us not to recognize or write an .extra file for plugin/content process crashes.
It's not related to this environment variable, given that in most cases we do get the .extra file. The failure only happens intermittently, and for the same kind of crash: the plugin crash during shutdown.
As we saw today, we can also end up with empty .extra files.
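A quick way to spot both symptoms (a .dmp with no .extra, or an empty .extra) in a profile's minidumps folder is sketched below. It assumes the .extra file shares the .dmp file's crash ID as its base name, as in the mozcrash log lines below; audit_minidumps is a hypothetical helper, not part of mozcrash.

```python
from pathlib import Path


def audit_minidumps(minidump_dir):
    """Classify each .dmp in minidump_dir by the state of its .extra sibling.

    Returns a dict mapping the crash ID (file stem) to one of:
      "ok"            - a non-empty .extra file exists
      "missing_extra" - no .extra file next to the .dmp
      "empty_extra"   - the .extra file exists but has zero bytes
    """
    results = {}
    for dump in sorted(Path(minidump_dir).glob("*.dmp")):
        extra = dump.with_suffix(".extra")
        if not extra.exists():
            results[dump.stem] = "missing_extra"
        elif extra.stat().st_size == 0:
            results[dump.stem] = "empty_extra"
        else:
            results[dump.stem] = "ok"
    return results
```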
Summary: Firefox does not always write an .extra file when it crashes → Firefox does not always create or write content into an .extra file when it crashes
Today we had a strange Flash crash (bug 995182) where the following is visible:

05:24:47 Timeout: bridge.execFunction("ce12c2f0-c174-11e3-953d-005056bb55a0", bridge.registry["{7660d60a-4b79-42aa-a492-80134a2799bf}"]["runTestFile"], ["c:\\jenkins\\workspace\\mozilla-aurora_remote\\data\\mozmill-tests\\firefox\\tests\\remote\\testSecurity\\testDVCertificate.js", null])
05:24:47
05:24:47 PROCESS-CRASH | c:\jenkins\workspace\mozilla-aurora_remote\data\mozmill-tests\firefox\tests\remote\testSecurity\testDVCertificate.js | application crashed [Unknown top frame]
05:24:47 Crash dump filename: c:\jenkins\workspace\mozilla-aurora_remote\data\profile\minidumps\771e8fbb-779c-4d9c-b706-40efd247c585.dmp
05:24:47 No symbols path given, can't process dump.
05:24:47 MINIDUMP_STACKWALK not set, can't process dump.
05:24:47 Traceback (most recent call last):
05:24:47   File "c:\jenkins\workspace\mozilla-aurora_remote\mozmill-env-windows\python\lib\site-packages\mozrunner\base.py", line 176, in check_for_crashes
05:24:47     quiet=quiet)
05:24:47   File "c:\jenkins\workspace\mozilla-aurora_remote\mozmill-env-windows\python\lib\site-packages\mozcrash\mozcrash.py", line 146, in check_for_crashes
05:24:47     shutil.move(d, dump_save_path)
05:24:47   File "c:\jenkins\workspace\mozilla-aurora_remote\mozmill-env-windows\python\Lib\shutil.py", line 301, in move
05:24:47     copy2(src, real_dst)
05:24:47   File "c:\jenkins\workspace\mozilla-aurora_remote\mozmill-env-windows\python\Lib\shutil.py", line 130, in copy2
05:24:47     copyfile(src, dst)
05:24:47   File "c:\jenkins\workspace\mozilla-aurora_remote\mozmill-env-windows\python\Lib\shutil.py", line 82, in copyfile
05:24:47     with open(src, 'rb') as fsrc:
05:24:47 IOError: [Errno 13] Permission denied: 'c:\\jenkins\\workspace\\mozilla-aurora_remote\\data\\profile\\minidumps\\771e8fbb-779c-4d9c-b706-40efd247c585.dmp'
05:24:48 Timeout: bridge.set("f29624a1-c174-11e3-8458-005056bb55a0", Components.utils.import("resource://mozmill/modules/frame.js"))
05:24:48
05:25:19
05:25:19 ###!!! [Parent][MessageChannel::Call] Error: Channel timeout: cannot send/recv
05:25:19
05:25:19
05:25:19 ###!!! [Parent][MessageChannel::Call] Error: Channel timeout: cannot send/recv
05:25:19
05:25:19
05:25:19 ###!!! [Parent][MessageChannel::Call] Error: Channel timeout: cannot send/recv
05:25:19
05:25:19 TEST-PASS | testSecurity\testDVCertificate.js | testLarryBlue
05:25:19 TEST-END | testSecurity\testDVCertificate.js | finished in 92844ms
05:26:07 PROCESS-CRASH | c:\jenkins\workspace\mozilla-aurora_remote\data\mozmill-tests\firefox\tests\remote\testSecurity\testDVCertificate.js | application crashed [Unknown top frame]
05:26:07 Crash dump filename: c:\jenkins\workspace\mozilla-aurora_remote\data\profile\minidumps\771e8fbb-779c-4d9c-b706-40efd247c585.dmp
05:26:07 No symbols path given, can't process dump.
05:26:07 MINIDUMP_STACKWALK not set, can't process dump.
05:26:07 mozcrash INFO | Saved minidump as C:\Users\mozauto\AppData\Roaming\Mozilla\Firefox\Crash Reports\pending\771e8fbb-779c-4d9c-b706-40efd247c585.dmp
05:26:07 mozcrash INFO | Saved app info as C:\Users\mozauto\AppData\Roaming\Mozilla\Firefox\Crash Reports\pending\771e8fbb-779c-4d9c-b706-40efd247c585.extra

I'm not sure how many crashes these are (two, or only one?), but for 771e8fbb-779c-4d9c-b706-40efd247c585, which appeared multiple times, the .extra file was not present at the time the crash happened. Only the second time we called mozcrash did it detect the .extra file for the same crash ID. That happened exactly 1 min 20 s later (05:24:47 vs. 05:26:07). How can this be?

Interestingly, this crash doesn't appear in about:crashes on that box, nor in the submitted reports folder on disk. Something must have wiped them out. :S
For plugin/content crashes, the crash dump and the .extra file are written at different times.
When is the .extra file written at the latest? I assume before Firefox shuts down, so it should always exist when the process exits. That's when we try to access it.
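Since the dump and the .extra file can be written at different times, a harness could wait for the .extra file before collecting the dump, and retry the move while the dump is still locked (the Errno 13 "Permission denied" in the traceback above). This is a minimal Python 3 sketch; wait_for_extra, move_with_retry and the timeout values are hypothetical, and mozcrash does not necessarily work this way. Under the Python 2 environment from the log, the same failure surfaces as IOError with errno 13 instead of PermissionError.

```python
import shutil
import time
from pathlib import Path


def wait_for_extra(dump_path, timeout=120.0, poll=1.0):
    """Poll until the .extra file next to dump_path exists and is non-empty.

    Returns the .extra Path, or None if it did not appear within `timeout` seconds.
    """
    extra = Path(dump_path).with_suffix(".extra")
    deadline = time.monotonic() + timeout
    while True:
        if extra.exists() and extra.stat().st_size > 0:
            return extra
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def move_with_retry(src, dst, attempts=5, delay=2.0):
    """Move src to dst, retrying on PermissionError.

    Assumption: the error means another process (e.g. the crashing one) still
    holds the file open, so waiting a bit may let the move succeed.
    """
    for attempt in range(attempts):
        try:
            return shutil.move(str(src), str(dst))
        except PermissionError:
            if attempt == attempts - 1:
                raise  # give up and surface the original error
            time.sleep(delay)
```

Waiting only narrows the race; if the .extra file never shows up, the dump should still be saved and flagged rather than silently dropped.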
Over on bug 1024141 we have seen that in some cases when we miss the .extra file, the minidump is broken. I can't say for sure if that is true for all the cases here.

Andreea, given that this is related to the Flash crasher you were investigating, I would like to ask you to take a machine and install the release version of Flash. Then run the test case for the Flash crash and check whether the minidump files are broken when no .extra file is present. If that is always the case, we might have found the reason for the issue on that bug.
Flags: needinfo?(andreea.matei)
CC'ing Robert so that he is aware of this problem: we do not always send crash reports for plugin crashes.
The case of bug 1024141 is a Flash process crashing (not Firefox or even plugin-container). Flash crash reporting is best-effort and isn't as high a priority as a missing Firefox crash, especially in a case like that one, where the minidump itself isn't written fully.
I tested this on a staging Windows 7 machine. In 60 runs I got 2 .dmp files without the .extra file; when I opened them in WinDbg, they were corrupted/broken.

I'll leave it running to get some more cases and be sure it's always like this.
Flags: needinfo?(andreea.matei)
I ran the test case about 150 more times and got just one more case without the .extra file, where the .dmp was also broken.
Thanks Andreea. So it indeed looks like a missing .extra file goes together with a broken minidump file: when the minidump is broken, the .extra file is not written. So far we have only seen this for Flash plugin crashes, so let's update the summary accordingly. CC'ing Jerome to let him know about it.
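To triage such dumps without opening each one in WinDbg, a cheap structural check of the minidump header can flag empty or truncated files. This sketch assumes the documented MINIDUMP_HEADER layout (the "MDMP" signature, version 0xA793 in the low word, stream count and stream directory offset); minidump_looks_valid is a hypothetical helper, and a dump that passes may still be unusable by the stackwalker.

```python
import struct
from pathlib import Path

MINIDUMP_SIGNATURE = b"MDMP"              # 0x504d444d stored little-endian
MINIDUMP_VERSION = 0xA793                 # low 16 bits of the Version field
HEADER = struct.Struct("<4sIIIIIQ")       # MINIDUMP_HEADER, 32 bytes
DIRECTORY_ENTRY = struct.Struct("<III")   # StreamType, DataSize, Rva


def minidump_looks_valid(path):
    """Return False for empty, truncated or non-minidump files.

    Checks the header signature and version, then verifies that the stream
    directory and every stream it lists lie inside the file. Stream contents
    themselves are not validated.
    """
    data = Path(path).read_bytes()
    if len(data) < HEADER.size:
        return False  # empty or shorter than a header
    sig, version, n_streams, dir_rva, _checksum, _stamp, _flags = HEADER.unpack_from(data, 0)
    if sig != MINIDUMP_SIGNATURE or (version & 0xFFFF) != MINIDUMP_VERSION:
        return False
    if n_streams == 0 or dir_rva + n_streams * DIRECTORY_ENTRY.size > len(data):
        return False  # stream directory missing or cut off
    for i in range(n_streams):
        _type, size, rva = DIRECTORY_ENTRY.unpack_from(data, dir_rva + i * DIRECTORY_ENTRY.size)
        if rva + size > len(data):
            return False  # a stream points past the end of the file
    return True
```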
Summary: Firefox does not always create or write content into an .extra file when it crashes → Sometimes Firefox produces invalid minidump files for crashes in the Flash plugin, which also results in missing the .extra file
Summary: Sometimes Firefox produces invalid minidump files for crashes in the Flash plugin, which also results in missing the .extra file → Sometimes Breakpad produces invalid minidump files for crashes in the Flash plugin, which also results in missing the .extra file
You really don't need to bother Jeromie about our crash-reporting system. There's nothing he can do to fix it.
OS: Windows 7 → All
Hardware: x86_64 → All
Summary: Sometimes Breakpad produces invalid minidump files for crashes in the Flash plugin, which also results in missing the .extra file → Sometimes Breakpad produces invalid minidump files for crashes, which also results in missing the .extra file
Version: 30 Branch → unspecified
Blocks: 1439522

Hi :gsvelto, could someone look into this issue? It has increased in frequency and caused 27 failures in the last week in bug 1439522.

Flags: needinfo?(gsvelto)

I had a look at bug 1439522 but it's not the same issue. In bug 1439522 the minidump is not being generated by Gecko; it's being generated by the test harness itself, which calls Windows-specific functions to generate the minidump. See this line. Specifically, this is happening in the mozcrash.py script; the relevant code is here.

BTW this particular issue - the .extra file not being written sometimes by Gecko - should have been fixed recently for the vast majority of the cases.

Flags: needinfo?(gsvelto)

Ah thanks :gsvelto! I'll see if someone who works on mozcrash can help with it.

No longer blocks: 1439522

Here is some additional information I just discovered. Maybe we should update the bug's summary?

(In reply to Gabriele Svelto [:gsvelto] from bug 1439522 comment #49)

The stackwalker couldn't analyze the minidump because it's empty. That's interesting because when this happens we have often attributed it to an issue in Gecko, but this is a minidump that was generated by mozcrash. mozcrash just calls the appropriate Windows API, which means that the failure to write the minidump didn't happen in our code... which is encouraging. Either way, yeah, I think we can close this bug.

Severity: normal → S3