Replace the Breakpad-based minidump_stackwalk toolchain binary with rust-minidump's one
Categories
(Toolkit :: Crash Reporting, task)
Tracking
()
Tracking | Status | |
---|---|---|
firefox96 | --- | fixed |
People
(Reporter: gsvelto, Assigned: Gankra)
References
(Blocks 2 open bugs)
Details
Attachments
(3 files, 1 obsolete file)
As per title, it's a matter of replacing the toolchain task that builds it and adjusting mozcrash.py to pass the --human
parameter so that we get human-readable output instead of JSON.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 1•3 years ago
|
||
Some notes:
- need to pass --human parameter
- output must conform to this regex (it does)
- must properly append --symbols-url to the url arg (the minidump_stackwalk binary used here implicitly accepts them in the position of symbols_path args, which I do not believe we do, and I do not want to support)
- Note that although the symbols_path argument to mozcrash.py does take other URLs, those are pre-processed away by _get_symbols, so we only need to handle this one case.
- These scripts expect a binary named
minidump_stackwalk
, which rust-minidump also produces, but I have the crate published on crates.io asminidump-stackwalk
, so that's whatcargo install
will name it. May need to do some renaming depending on how we build it.
Comment 2•3 years ago
|
||
I'm working on fixing bug #1594515 which touches minidump_stackwalk
and mozcrash. If you want me to add some future-proofing to ease some pain, let me know.
Assignee | ||
Comment 3•3 years ago
|
||
Oh that's interesting context. I'm confused by the discussion of a required "temp" directory argument since I don't see that in mozcrash.py. (offer appreciated, don't see anything that needs future-proofing though)
Assignee | ||
Comment 4•3 years ago
|
||
This is the first step in replacing a huge pile of our breakpad-based infra
with our new implementation (rust-minidump). This stackwalker is only used
for reporting crashes in local builds and CI, so it's a good first deploy.
Although most of the work on rust-minidump has been focused on the JSON output,
this uses the --human output, because it's primarily intended for humans to
directly read. There is however some minor parsing done on this format. This
is not strictly supported by --human (it has no schema) but it's not something
we plan to break. (This parsing is pre-existing, just recording the facts.)
The new build configs/scripts are hybridized from fix-stacks and dump_syms,
as this basically is a hybrid of the two. In particular it needs the openssl
vendoring tricks that dump_syms uses, but is a target binary that prefers
win32 over win64 (like fix-stacks).
Technically a regression but probably just culling legacy cruft at this point:
this patchset removes support for building a local copy of minidump-stackwalk
from source. You must now download a copy built on task-cluster using mozboot.
mozboot already did this, which is why this feature appears to be legacy cruft
-- there was little reason to build a local copy.
However rust-minidump's minidump-stackwalk has a far better portability story,
so you can build+install your own local copy by just running:
cargo install minidump-stackwalk
Assignee | ||
Comment 5•3 years ago
|
||
This version of minidump-stackwalk is now replaced with rust-minidump's
minidump-stackwalk, which we build from a FETCH. Not touching the other
stuff in this directory because I have no idea what it is.
Depends on D131315
Assignee | ||
Comment 6•3 years ago
|
||
Assignee | ||
Comment 7•3 years ago
|
||
that build seems to be caching the minidump-stackwalk builds, so here's the last try build that touched them: https://treeherder.mozilla.org/jobs?repo=try&revision=2cbb95ed2252de120f685b4829af6244cf422b2d
Assignee | ||
Comment 8•3 years ago
|
||
you can see it working in production in this even-earlier-try build's logs: https://treeherder.mozilla.org/logviewer?job_id=358264754&repo=try&lineNumber=21679-21703
Assignee | ||
Comment 9•3 years ago
|
||
Depends on D131316
Updated•3 years ago
|
Assignee | ||
Comment 10•3 years ago
|
||
Squashed the 64-bit stuff into the first commit after address review and doing some refactoring. I had broken breakpad-injector which is now a bit of an unrelated oddball.
New try push: https://treeherder.mozilla.org/jobs?repo=try&revision=86f8286d0da5a3cf2add65120ff6868990431389
Assignee | ||
Comment 11•3 years ago
|
||
Notice to Sheriffs and Tooling Folks
I am now pushing this change to try
's crash handler (and the one used for local builds). Ideally it won't change behaviour (meaningfully). So if it is behaving radically, different, please let me know! (Even if the change is positive, that's great to know!)
Specifically, this completely rewrites the tool that is used to produce a backtrace from a minidump when a test unexpectedly crashes (invoking mozcrash.py). This tool is used to generate this kind of line in the try frontend:
PROCESS-CRASH | Last test finished | application crashed [@ static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool)]
And this information in the logs:
INFO - mozcrash Copy/paste: Z:/task_163718277621850/fetches\minidump_stackwalk\minidump_stackwalk.exe --human C:\Users\task_163718277621850\AppData\Local\Temp\tmpra9trz2u.mozrunner\minidumps\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp Z:\task_163718277621850\build\symbols --symbols-url=https://symbols.mozilla.org/
INFO - mozcrash Saved minidump as Z:\task_163718277621850\build\blobber_upload_dir\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp
INFO - PROCESS-CRASH | Last test finished | application crashed [@ static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool)]
INFO - Crash dump filename: C:\Users\task_163718277621850\AppData\Local\Temp\tmpra9trz2u.mozrunner\minidumps\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp
INFO - Operating system: Windows NT
INFO - 10.0.19041
INFO - CPU: amd64
INFO - family 6 model 85 stepping 7
INFO - 8 CPUs
INFO -
INFO - Crash reason: EXCEPTION_BREAKPOINT
INFO - Crash address: 0x7ff8ae0df019
INFO - Process uptime: 2 seconds
INFO -
INFO - Thread 3 TaskController #2 (crashed)
INFO - 0 xul.dll!static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool) [SurfaceCache.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 1831 + 0x40]
INFO - rax = 0x00007ff8b541d9f9 rdx = 0x0000000000000000
INFO - rcx = 0x00007ff8e277c978 rbx = 0x0000000000000001
INFO - rsi = 0x00000041a977f350 rdi = 0x00000177b0f6f740
INFO - rbp = 0x00000177aa172130 rsp = 0x00000041a977f2b0
INFO - r8 = 0x00000041a977f820 r9 = 0x00007ff8ea530000
INFO - r10 = 0x00007ff8ea582651 r11 = 0x00000041a977ec70
INFO - r12 = 0x00007ff8e26e9630 r13 = 0x00000177b3a15120
INFO - r14 = 0x00000177b0f970d8 r15 = 0x00000177b28a2a00
INFO - rip = 0x00007ff8ae0df019
INFO - Found by: given as instruction pointer in context
INFO - 1 xul.dll!mozilla::image::DecodedSurfaceProvider::FinishDecoding() [DecodedSurfaceProvider.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 200 + 0x37]
INFO - rbx = 0x0000000000000001 rbp = 0x00000177aa172130
INFO - rsp = 0x00000041a977f310 r12 = 0x00007ff8e26e9630
INFO - r13 = 0x00000177b3a15120 r14 = 0x00000177b0f970d8
INFO - r15 = 0x00000177b28a2a00 rip = 0x00007ff8ae0b0dbf
INFO - Found by: call frame info
INFO - 2 xul.dll!mozilla::image::DecodedSurfaceProvider::Run() [DecodedSurfaceProvider.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 129 + 0x7]
INFO - rbx = 0x0000000000000001 rbp = 0x00000177aa172130
INFO - rsp = 0x00000041a977f390 r12 = 0x00007ff8e26e9630
INFO - r13 = 0x00000177b3a15120 r14 = 0x00000177b0f970d8
INFO - r15 = 0x00000177b28a2a00 rip = 0x00007ff8ae0b0929
INFO - Found by: call frame info
...
Only the new implementation will include the --human
flag in the Copy/paste: Z:/task_163718277621850/fetches\minidump_stackwalk\minidump_stackwalk.exe
line, if you aren't sure which implementation is being used. However I believe this line is omitted if fix-stacks.py is involved. In that case, you can use the fact that the backtraces will end with "unimplemented streams", a feature unique to the new tool:
INFO - Unimplemented streams encountered:
INFO - Stream 0x00000000 UnusedStream (Official) @ 0x00000000
INFO - Stream 0x00000015 SystemMemoryInfoStream (Official) @ 0x00002d98
INFO - Stream 0x00000016 ProcessVmCountersStream (Official) @ 0x00002f84
Nitty Gritty Details
I am specifically replacing the implementation of minidump_stackwalk used by mozcrash.py with our new rust-based implementation.
This implementation has been running on the staging servers for crash-stats since friday, and is slated to become the new crash-stats backend once we've done enough testing to be confident in that deployment. Deploying it to try
is another step in building our confidence in the new implementation.
On paper, this should be a significant upgrade to try's minidump-stackwalk, in that the one that is currently being used is a weird unmaintained fork of what's used on crash-stats. The new one should be faster and more reliable, and will be easy to update by just changing the commit in its toolchain fetch:
rust-minidump:
description: rust-minidump source code (for minidump-stackwalk)
fetch:
type: git
repo: https://github.com/luser/rust-minidump/
revision: 0c90e02544797317503d1c4cff8daab0cabdea86
However this will introduce some changes to how minidump-stackwalk is built:
- minidump-stackwalk no longer builds from checked-in source, so it will only need to be built if the toolchain fetch is updated (less builds and churn, yay!)
- this introduces a currently orphaned win64-minidump-stackwalk build, for future use in solving Bug 1410840. That can be removed if having an orphan tool sets off some annoying warnings/errors for the sheriffs.
- there was technically some wiring in
mach
to support building minidump-stackwalk locally. this no longer works, as there is no local source to use. mozboot was already downloadingminidump_stackwalk
for you, so it's unlikely this will affect anyone's workflow. If for whatever reason you want to build your own copy of minidump-stackwalk, you cancargo install minidump-stackwalk
, or checkout the rust-minidump tree and build it withcargo build --release
(that's all our build servers do!) (NB the binary is officiallyminidump-stackwalk
but we rename it tominidump_stackwalk
in CI to avoid needless churn)
Crash Analysis Power User Features
The new minidump-stackwalk should build and run locally on all mainstream platforms without any issue (I develop and test it in Windows Powershell!). I have done my best to write user documentation and the process for analyzing a firefox minidump is streamlined:
> minidump-stackwalk --symbols-url=https://symbols.mozilla.org/ /path/to/minidump.dmp
Because it is designed to be the backend for crash-stats (aka socorro), it can do ~all the analysis you expect there (more details are surfaced in the default JSON output than the human one). To help me test this, I have also created a tool for downloading minidump/annotations from crash-stats and compairing local minidump-stackwalk output to the values on crash-stats: socc-pair. While the comparison machinery is probably not necessarily useful to you, as a side-effect of its purpose it automates fetching all the details of a crash and running local analysis, and saves the results to files you can search through for details:
Output Files:
* Minidump: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.dmp
* Socorro Processed Crash: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.json
* Raw JSON: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.raw.json
* Local minidump-stackwalk Output: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.local.json
* Local minidump-stackwalk Logs: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.log.txt
NOTE: these files can contain protected user information. Although they are written to your system's default "temp", I recommend deleting the temp
socc-pair
directory regularly to ensure compliance with our protected data policies.
Notably this includes all of the logging rust-minidump did (in the .log.txt
), including tracing for the backtracer's analysis, which can help you debug strange backtraces:
[TRACE] unwind: starting stack unwind
[TRACE] unwind: unwinding NtGetContextThread
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying STACK CFI exprs
[TRACE] unwind: .cfa: $rsp 8 + .ra: .cfa 8 - ^
[TRACE] unwind: .cfa: $rsp 8 +
[TRACE] unwind: STACK CFI parse successful
[TRACE] unwind: STACK CFI seems reasonable, evaluating
[TRACE] unwind: successfully evaluated .cfa (frame address)
[TRACE] unwind: successfully evaluated .ra (return address)
[TRACE] unwind: cfi evaluation was successful -- caller_ip: 0x000000ec00000000, caller_sp: 0x000000ec7fbfd790
[TRACE] unwind: cfi result seems valid
[TRACE] unwind: unwinding 1013612281855
[TRACE] unwind: trying cfi
[TRACE] unwind: trying frame pointer
[TRACE] unwind: trying scan
[TRACE] unwind: scan seems valid -- caller_ip: 0x7ffd172c2a24, caller_sp: 0xec7fbfd7f8
[TRACE] unwind: unwinding <unknown in ntdll.dll>
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying frame pointer
[TRACE] unwind: trying scan
[TRACE] unwind: scan seems valid -- caller_ip: 0x7ffd162b7034, caller_sp: 0xec7fbfd828
[TRACE] unwind: unwinding BaseThreadInitThunk
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying STACK CFI exprs
[TRACE] unwind: .cfa: $rsp 8 + .ra: .cfa 8 - ^
[TRACE] unwind: .cfa: $rsp 48 +
[TRACE] unwind: STACK CFI parse successful
[TRACE] unwind: STACK CFI seems reasonable, evaluating
[TRACE] unwind: successfully evaluated .cfa (frame address)
[TRACE] unwind: successfully evaluated .ra (return address)
[TRACE] unwind: cfi evaluation was successful -- caller_ip: 0x0000000000000000, caller_sp: 0x000000ec7fbfd858
[TRACE] unwind: cfi result seems valid
[TRACE] unwind: instruction pointer was nullish, assuming unwind complete
[TRACE] unwind: finished stack unwind
If you are using minidump-stackwalk directly, you can get these same logs with --verbose=trace
. --output-file=x/y/z
and --log-file=x/y/z
arguments are included to make it easier to pipe these streams to files.
Comment 12•3 years ago
|
||
Pushed by abeingessner@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b0107c4d5274 Use rust-minidump's minidump-stackwalk for mozcrash.py r=gsvelto,KrisWright https://hg.mozilla.org/integration/autoland/rev/7b95b224608b remove tools/crashreporter/minidump_stackwalk r=gsvelto
Comment 13•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b0107c4d5274
https://hg.mozilla.org/mozilla-central/rev/7b95b224608b
Comment 14•2 years ago
|
||
Pushed by geoff@darktrojan.net: https://hg.mozilla.org/comm-central/rev/fc125e910ec9 Port bug 1741205 - Add missing fetch job linux64-rust-windows-1.56. rs=bustage-fix
Reporter | ||
Comment 15•2 years ago
|
||
Please send the info in comment 11 to the dev-platform and stability mailing lists for extra visibility.
Assignee | ||
Comment 16•2 years ago
|
||
done for dev-platform, stability seems to not like me (someone with perms, feel free to forward)
Assignee | ||
Comment 17•2 years ago
|
||
I forgot that mozboot only pulls in updates when you run ./mach bootstrap
,
so some people got the new mozcrash.py locally without actually having the
new rust-minidump-based version. So now we first run the stackwalk binary
with -V to check what version it is.
The rest of the details can be found in the comments I added.
Also updates rust-minidump to 0.9.6 get some CLI parsing fixes
and better --help documentation (socorro staging is already updated to
this version).
Comment 18•2 years ago
|
||
Pushed by abeingessner@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/550be0d3b535 add more robust version checking to mozcrash.py. r=KrisWright
Comment 19•2 years ago
|
||
bugherder |
Description
•