Note: There are a few cases of duplicates in user autocompletion which are being worked on.

Deploy new minidump_stackwalk to solve stackwalking issues with MSVC2010

VERIFIED FIXED in 2.4.4

Status

Socorro
Backend
VERIFIED FIXED
6 years ago
6 years ago

People

(Reporter: ted, Assigned: rhelmer)

Tracking

Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

6 years ago
Since we switched to VC++ 2010, we've been seeing some bad stacks in certain cases. It looks like the new compiler is producing stack unwind information that Breakpad doesn't know how to handle. There are two specific things that are causing problems:
1) A new operator in the unwind expressions: @. This appears to be an alignment operator.
2) A new built-in value: .raSearch. We haven't narrowed down exactly what the difference is between this and .raSearchStart.

I have a WIP Breakpad patch that adds support for both of these, but I'm not sure if it's correct, and if it's going to fix the problem.
(Reporter)

Comment 1

6 years ago
Created attachment 596661 [details] [diff] [review]
Add support for @ operator and .raSearch in Breakpad x86 stackwalker

Very much a WIP. I don't know if the @ implementation is correct, and I'm pretty sure the value of .raSearch isn't perfect.
(Reporter)

Comment 2

6 years ago
I put my patch for the @ operator up for review upstream:
http://breakpad.appspot.com/349002/

bsmedberg has a patch to make .raSearch handling work which he's also going to upstream.
Assignee: nobody → benjamin
The .raSearch fix is up at http://breakpad.appspot.com/349003/

I need the @ patch to land before I can upload a useful patch for the remaining issue, since it touches/refactors the same code blocks.
Created attachment 597449 [details] [diff] [review]
Postfix tokenizer fix, on top of the "@" change

For archiving.
(Reporter)

Comment 5

6 years ago
The @ operator landed upstream:
http://code.google.com/p/google-breakpad/source/detail?r=923
(Reporter)

Updated

6 years ago
Attachment #596661 - Attachment is obsolete: true
http://code.google.com/p/google-breakpad/source/detail?r=926
http://code.google.com/p/google-breakpad/source/detail?r=927

This bug is now ready to morph into a deployment bug. We need minidump_stackwalk recompiled against breakpad r927 and deployed to staging.
Assignee: benjamin → server-ops
Component: General → Server Operations: Web Operations
Product: Socorro → mozilla.org
QA Contact: general → cshields
Version: unspecified → other
Summary: Stackwalking issues with MSVC2010 → Deploy new minidump_stackwalk to solve stackwalking issues with MSVC2010
(Assignee)

Comment 7

6 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #6)
> http://code.google.com/p/google-breakpad/source/detail?r=926
> http://code.google.com/p/google-breakpad/source/detail?r=927
> 
> This bug is now ready to morph into a deployment bug. We need
> minidump_stackwalk recompiled against breakpad r927 and deployed to staging.

We now have a jenkins job for breakpad:
https://ci.mozilla.org/job/breakpad/52/

The Socorro job automatically pulls the last successful build (which is that one):
https://github.com/mozilla/socorro/blob/1c80b6cb7afcc0072d7b6a7010398d2cc0fcf906/scripts/build.sh#L12
https://ci.mozilla.org/job/breakpad/lastSuccessfulBuild/?

So I think we don't need to do anything special here, this should automatically go out on the next Socorro release (and should be on the dev server crash-stats-dev.allizom.org already)
rhelmer is going to help shepherd this onto stage so that it can go out on Wednesday with the scheduled release.
Assignee: server-ops → rhelmer
Created attachment 601042 [details]
Dump for testing on stage

I'm going to attach a .dump and .extra file to test this with.
Created attachment 601043 [details]
.extra file for testing on stage
(Assignee)

Updated

6 years ago
Attachment #601043 - Attachment mime type: application/xml → application/octet-stream
(Assignee)

Updated

6 years ago
Component: Server Operations: Web Operations → Backend
Product: mozilla.org → Socorro
QA Contact: cshields → backend
Target Milestone: --- → 2.4.4
(Assignee)

Comment 11

6 years ago
Created attachment 601403 [details]
json file for testing on stage

This is equivalent to the .extra file, but formatted as JSON so we can use the submitter tool.
(Assignee)

Comment 12

6 years ago
I inserted crashes but can't complete testing quite yet, we're doing a DB import on dev and staging hbase seems to be down (bug 730986).
Status: NEW → ASSIGNED
Depends on: 730986
(Assignee)

Comment 13

6 years ago
OK here is an example:
https://crash-stats-dev.allizom.org/report/index/e80d6c06-0bdb-4072-a71a-baf5f2120229

Can someone help to make sure this looks ok? I'd like to also make sure "normal" jobs work before marking verified.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Assignee)

Comment 14

6 years ago
KaiRo says comment 13 looks ok, and seems to do no harm with other crashes:
https://crash-stats-dev.allizom.org/report/index/ad99fc9c-c0ab-4454-8df4-5b37d2120229
Status: RESOLVED → VERIFIED

Comment 15

6 years ago
Unfortunately, it looks like we also still have MSVC2010-generated builds coming in with bad signatures/stacks:

https://crash-stats.mozilla.com/report/list?signature=hang%20%7C%20WaitForSingleObjectEx%20%7C%20google_breakpad%3A%3AExceptionHandler%3A%3AInitialize%28std%3A%3Abasic_string%3Cwchar_t%2C%20std%3A%3Achar_traits%3Cwchar_t%3E%2C%20std%3A%3Aallocator%3Cwchar_t%3E%20%3E%20const%26%2C%20bool%20%28%2A%29%28void%2A%2C%20_EXCEPTION_POINTERS%2A%2C%20MDRawAssertionInfo%2A%29%2C%20bool%20%28%2A%29%28wchar_t%20const%2A%2C%20wc...

e.g. bp-12b703af-625d-4f51-9d48-d9e242120305 and bp-3f51f95c-ce33-4301-b96a-066b02120305
Are you sure the symbols were present at processing time? Without actually having the processing log, this looks like the symbols weren't fully uploaded/synced yet.
Oh, ignore me, the google_breakpad symbols are in xul.dll also. I don't know.
(Reporter)

Comment 18

6 years ago
Kairo: those two crashes you linked are both 64-bit (note the amd64). We don't currently have useful unwind info in Win64 symbols. That's bug 548035.
You need to log in before you can comment on or make changes to this bug.