Sometimes Breakpad produces invalid minidump files for crashes, which also results in missing the .extra file
Categories: Toolkit :: Crash Reporting, defect
People: Reporter: whimboo; Unassigned
As seen while I was working on the investigation for bug 980938, the crash reporter does not always write an .extra file to disk. In some cases only the minidump file exists. In those cases Firefox is not able to send the report to Socorro.
Comment 1 • 10 years ago
Henrik said he saw this while reproducing bug 980938.
Comment 2 (Reporter) • 10 years ago
So it's not the crash reporter which is creating that file but Firefox itself. Just to add: this happens when we run our tests with Mozmill and the crash reporter disabled via the environment variable. mozcrash only backs up the .dmp file but doesn't find the .extra file.
Updated (Reporter) • 10 years ago
Comment 3 • 10 years ago
Bug 980938 is a plugin crash. Does this bug affect Firefox crashes, plugin crashes, or both? When you say "crash reporter being disabled via the environment variable", do you just mean not launching the crash reporter app, rather than turning off the crash reporting system entirely?
Comment 4 (Reporter) • 10 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #3)

> Bug 980938 is a plugin crash. Does this bug affect Firefox crashes or plugin
> crashes or both?

In this specific case I have only tested this plugin crash. The crash handling feature in Mozmill is fairly new (released at the end of last week), so I cannot give a full answer yet.

> When you say "crash reporter being disabled via the environment variable"
> you just mean launching the crash reporter app, not turning off the crash
> reporting system entirely?

We are setting the environment variable MOZ_CRASHREPORTER_NO_REPORT so that the crash reporter does not come up and break our follow-up tests. That's all, yes. See the code in Mozmill: https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L40
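The environment-variable setup described above can be sketched roughly like this (a minimal sketch; the helper names and the launcher are hypothetical, not Mozmill's actual code):

```python
import os
import subprocess

def crashreporter_env(base_env=None):
    """Return an environment with the crash reporter client suppressed.

    MOZ_CRASHREPORTER_NO_REPORT keeps the crash reporter UI from coming
    up after a crash so it cannot block follow-up tests; minidumps are
    still written to the profile's minidumps folder.
    """
    env = dict(base_env if base_env is not None else os.environ)
    env["MOZ_CRASHREPORTER_NO_REPORT"] = "1"
    return env

def launch_firefox(binary_path):
    # Hypothetical launcher: start Firefox with the adjusted environment.
    return subprocess.Popen([binary_path], env=crashreporter_env())
```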
Comment 5 • 10 years ago
http://mxr.mozilla.org/mozilla-central/source/dom/plugins/test/mochitest/test_crash_notify_no_report.xul#100

I believe that the current behavior is intentional, although I'm not sure what the exact mechanism is that causes this. The _NO_REPORT envvar changes the return value of the ShouldReport function in nsExceptionHandler.cpp, and I think that propagates to cause us not to recognize or write an .extra file for plugin/content process crashes.
Comment 6 (Reporter) • 10 years ago
It's not related to this env variable, given that in most cases we do have the .extra file. It only happens intermittently that we fail here. It's also the same kind of crash each time, the plugin crash during shutdown.
Comment 8 (Reporter) • 10 years ago
As we have seen today, we also end up with empty .extra files.
Comment 9 (Reporter) • 10 years ago
Today we had a strange Flash crash (bug 995182) where the following is visible:

05:24:47 Timeout: bridge.execFunction("ce12c2f0-c174-11e3-953d-005056bb55a0", bridge.registry["{7660d60a-4b79-42aa-a492-80134a2799bf}"]["runTestFile"], ["c:\\jenkins\\workspace\\mozilla-aurora_remote\\data\\mozmill-tests\\firefox\\tests\\remote\\testSecurity\\testDVCertificate.js", null])
05:24:47 PROCESS-CRASH | c:\jenkins\workspace\mozilla-aurora_remote\data\mozmill-tests\firefox\tests\remote\testSecurity\testDVCertificate.js | application crashed [Unknown top frame]
05:24:47 Crash dump filename: c:\jenkins\workspace\mozilla-aurora_remote\data\profile\minidumps\771e8fbb-779c-4d9c-b706-40efd247c585.dmp
05:24:47 No symbols path given, can't process dump.
05:24:47 MINIDUMP_STACKWALK not set, can't process dump.
05:24:47 Traceback (most recent call last):
05:24:47   File "c:\jenkins\workspace\mozilla-aurora_remote\mozmill-env-windows\python\lib\site-packages\mozrunner\base.py", line 176, in check_for_crashes
05:24:47     quiet=quiet)
05:24:47   File "c:\jenkins\workspace\mozilla-aurora_remote\mozmill-env-windows\python\lib\site-packages\mozcrash\mozcrash.py", line 146, in check_for_crashes
05:24:47     shutil.move(d, dump_save_path)
05:24:47   File "c:\jenkins\workspace\mozilla-aurora_remote\mozmill-env-windows\python\Lib\shutil.py", line 301, in move
05:24:47     copy2(src, real_dst)
05:24:47   File "c:\jenkins\workspace\mozilla-aurora_remote\mozmill-env-windows\python\Lib\shutil.py", line 130, in copy2
05:24:47     copyfile(src, dst)
05:24:47   File "c:\jenkins\workspace\mozilla-aurora_remote\mozmill-env-windows\python\Lib\shutil.py", line 82, in copyfile
05:24:47     with open(src, 'rb') as fsrc:
05:24:47 IOError: [Errno 13] Permission denied: 'c:\\jenkins\\workspace\\mozilla-aurora_remote\\data\\profile\\minidumps\\771e8fbb-779c-4d9c-b706-40efd247c585.dmp'
05:24:48 Timeout: bridge.set("f29624a1-c174-11e3-8458-005056bb55a0", Components.utils.import("resource://mozmill/modules/frame.js"))
05:25:19 ###!!! [Parent][MessageChannel::Call] Error: Channel timeout: cannot send/recv
05:25:19 ###!!! [Parent][MessageChannel::Call] Error: Channel timeout: cannot send/recv
05:25:19 ###!!! [Parent][MessageChannel::Call] Error: Channel timeout: cannot send/recv
05:25:19 TEST-PASS | testSecurity\testDVCertificate.js | testLarryBlue
05:25:19 TEST-END | testSecurity\testDVCertificate.js | finished in 92844ms
05:26:07 PROCESS-CRASH | c:\jenkins\workspace\mozilla-aurora_remote\data\mozmill-tests\firefox\tests\remote\testSecurity\testDVCertificate.js | application crashed [Unknown top frame]
05:26:07 Crash dump filename: c:\jenkins\workspace\mozilla-aurora_remote\data\profile\minidumps\771e8fbb-779c-4d9c-b706-40efd247c585.dmp
05:26:07 No symbols path given, can't process dump.
05:26:07 MINIDUMP_STACKWALK not set, can't process dump.
05:26:07 mozcrash INFO | Saved minidump as C:\Users\mozauto\AppData\Roaming\Mozilla\Firefox\Crash Reports\pending\771e8fbb-779c-4d9c-b706-40efd247c585.dmp
05:26:07 mozcrash INFO | Saved app info as C:\Users\mozauto\AppData\Roaming\Mozilla\Firefox\Crash Reports\pending\771e8fbb-779c-4d9c-b706-40efd247c585.extra

I'm not sure how many crashes those are (two or only one?), but for 771e8fbb-779c-4d9c-b706-40efd247c585, which appeared multiple times, the .extra file was not present at the time the crash happened. Only the second time we called into mozcrash did it detect the .extra file for the same crash id. That happened exactly 1:20 min later. How can this be? Interestingly this crash doesn't appear in about:crashes on that box. It's also not in the sent reports folder on disk. Something must have wiped them out. :S
Comment 10 • 10 years ago
For plugin/content crashes, writing the crash dump and the .extra file do happen at different times.
Comment 11 (Reporter) • 10 years ago
When is the .extra file written at the latest? I assume before a shutdown of Firefox, so it should always exist when the process exits. That's when we try to access it.
Comment 12 (Reporter) • 10 years ago
Over on bug 1024141 we have seen that in some cases when we miss the .extra file, the minidump is broken. I can't say for sure if that is true for all the cases here. Andreea, given that this is related to the Flash crasher you were investigating, I would like to ask you to take a machine and install the release version of Flash. Then run the test case for the Flash crash and check whether the minidump files are broken whenever no .extra file is present. If that is always the case, we might have found the reason for the issue on that bug.
Comment 13 (Reporter) • 10 years ago
CC'ing Robert so that he is aware of this problem that we do not always send crash reports for plugin crashes.
Comment 14 • 10 years ago
The case of bug 1024141 is the Flash process crashing (not Firefox or even plugin-container). Flash crash reporting is best-effort and isn't as high a priority as a missing Firefox crash, especially in that case, where the minidump itself isn't fully written.
Comment 15 • 10 years ago
Tested this on a staging Windows 7 machine: in 60 runs I got 2 .dmp files without the .extra file, which I then opened with WinDbg, and they were corrupted/broken. I'll leave it running to get some more cases and make sure it's always like this.
Comment 16 • 10 years ago
Ran the testcase about 150 more times and got just one more case without the .extra file, where the .dmp was also broken.
Comment 17 (Reporter) • 10 years ago
Thanks Andreea. So it indeed looks like when we have a missing .extra file, the minidump file is broken. As a result the .extra file is not written. So far we have only seen it for Flash plugin crashes, so let's update the summary accordingly. CC'ing Jerome to let him know about it.
Updated (Reporter) • 10 years ago
Comment 18 • 10 years ago
You really don't need to bother Jeromie about our crash-reporting system. There's nothing he can do to fix it.
Updated (Reporter) • 4 years ago
Comment 19 • 4 years ago
Hi :gsvelto, could someone look into this issue? It has increased in frequency and caused 27 failures in the last week in bug 1439522.
Comment 20 • 4 years ago
I had a look at bug 1439522 but it's not the same issue. In bug 1439522 the minidump is not being generated by Gecko, it's being generated by the test harness itself by calling Windows-specific functions to generate the minidump. See this line. Specifically this is happening in the mozcrash.py script, the relevant code is here.
BTW, this particular issue (the .extra file not being written sometimes by Gecko) should have been fixed recently for the vast majority of cases.
Comment 21 • 4 years ago
Ah thanks :gsvelto! I'll see if someone who works on mozcrash can help with it.
Comment 22 (Reporter) • 4 years ago
Here is some additional information, as just discovered. Maybe we should update the bug's summary?

(In reply to Gabriele Svelto [:gsvelto] from bug 1439522 comment #49)
> The stackwalker couldn't analyze the minidump because it's empty. That's
> interesting because when this happens we often attributed it to an issue in
> Gecko, but this is a minidump that was generated by mozcrash. mozcrash just
> calls the appropriate Windows API, which means that the failure to write the
> minidump didn't happen in our code... which is encouraging. Either way,
> yeah, I think we can close this bug.
Updated • 2 years ago