Crash in [@ mozilla::FramePointerStackWalk]
Categories
(Core :: mozglue, defect)
Tracking
()
People
(Reporter: cpeterson, Assigned: glandium)
References
(Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
These crashes might be related to PHC being enabled in macOS Nightlies in mid-September (bug 1576515). Only Beta and Nightly builds are affected. We have about 150 crash reports from macOS, but only 3 from Windows 7 and 1 from Linux.
Crash report: https://crash-stats.mozilla.org/report/index/36ff00d6-f49c-4776-ba6d-5fe630211124
Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Top 3 frames of crashing thread:
0 libmozglue.dylib mozilla::FramePointerStackWalk mozglue/misc/StackWalk.cpp:926
1 libmozglue.dylib replace_free memory/replace/phc/PHC.cpp:1328
2 libmozglue.dylib replace_free memory/replace/phc/PHC.cpp:1328
Updated•4 years ago
Comment hidden (Intermittent Failures Robot)
Comment 3•3 years ago (Assignee)
There are possibly up to 3 bugs hidden in here:
- A bug in the crash report processing, which isn't able to get a proper stack. Whether the stack is entirely garbage or not remains to be determined. If the stack is garbage, there might be something else going on that trashes the stack.
- A bug in the frame pointer stack walker, which shouldn't be crashing.
- A double-free that PHC may catch later, but we don't know. Because we get a stack trace in both the valid and invalid case, we can't tell whether we are in the invalid case.
In any case, this only happens on builds with PHC enabled (beta/nightly). This might also happen with the profiler enabled on release, but judging from a random sample of crash reports, it doesn't look like it (they all go through free).
Note that the automation crash in comment 1 is interesting: it suggests the first two bugs above are independent, because the minidump stackwalk succeeded while the frame pointer stack walker crashed. On closer inspection, that crash is actually somewhat different: the frame pointer stack walker simply can't handle the situation because the trace contains a function from a system library that doesn't have frame pointers (which is the case on amd64 mac). Obviously, the in-process stack walker should not crash in that case. Arm64 mac, where most of the crashes this bug was filed for occur, does have frame pointers for system libraries, so this would be yet another different issue.
Looking at a random arm64 mac one, something is busted in the stack. The first frame pointer seems to be pointing to 0x10 before what might be the next frame. This doesn't seem too good, but it's also hard to tell more from a casual look.
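To make the "shouldn't be crashing" point concrete, here is a minimal sketch of a frame pointer walk that refuses to follow a suspicious link instead of dereferencing it. This is not the mozglue implementation and not the patch for this bug; `WalkFramePointersChecked` and its parameters are hypothetical names, and it assumes the usual layout where each frame record is a {saved frame pointer, return address} pair and the stack grows down. The checks only prevent reads outside the stack: a corrupted pointer that still lands inside the stack would produce garbage frames rather than a crash.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (an assumption for illustration, not the real
// mozilla::FramePointerStackWalk). Collects return addresses by following
// the saved-frame-pointer chain, stopping at the first link that looks wrong.
std::vector<void*> WalkFramePointersChecked(void** aFp, void* aStackEnd,
                                            size_t aMaxFrames) {
  std::vector<void*> pcs;
  const uintptr_t end = reinterpret_cast<uintptr_t>(aStackEnd);
  uintptr_t fp = reinterpret_cast<uintptr_t>(aFp);

  while (fp && pcs.size() < aMaxFrames) {
    // A frame record must be pointer-aligned.
    if (fp % sizeof(void*) != 0) break;
    // Both words of the record must lie below the end of the stack.
    if (fp >= end || end - fp < 2 * sizeof(void*)) break;

    void** record = reinterpret_cast<void**>(fp);
    const uintptr_t next = reinterpret_cast<uintptr_t>(record[0]);
    void* pc = record[1];
    if (!pc) break;
    pcs.push_back(pc);

    // The stack grows down, so the caller's record must sit at a strictly
    // higher address. This also rules out cycles.
    if (next <= fp) break;
    fp = next;
  }
  return pcs;
}
```

The starting frame pointer is assumed to be the caller's own valid frame; every later one is checked to be above the previous record and below `aStackEnd` before it is read.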
Comment 4•3 years ago (Assignee)
Updated•3 years ago
Comment 5•3 years ago (Assignee)
For the record:
> A bug in the crash report processing, which isn't able to get a proper stack. Whether the stack is entirely garbage or not remains to be determined. If the stack is garbage, there might be something else going on that trashes the stack.
The frame pointers in the stack /are/ corrupted to some extent and can't be relied upon, for whatever reason. There doesn't seem to be a particular bug in the crash report processing. There might be a bug whereby frame pointers are being trashed by some code, but it's entirely possible that normal execution would recover from that. I guess we'll see after the patch lands.
> A bug in the frame pointer stack walker, which shouldn't be crashing.
This is what the patch addresses.
> A double-free that PHC may catch later, but we don't know. Because we get a stack trace in both the valid and invalid case, we can't tell whether we are in the invalid case.
Once the patch lands, we should get PHC reports if those end up being double-frees.
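As a toy model of why a crash in the stack walker hides the double-free question: if the free path captures a stack before (or regardless of) deciding whether the free is valid, then a crash during capture happens before any double-free report can be produced. The sketch below is an assumption for illustration only; `TrackedSlot`, `TrackedFree` and the ordering of the two steps are hypothetical and are not taken from PHC.cpp.

```cpp
#include <functional>

enum class FreeResult { Ok, DoubleFree };

// Hypothetical stand-in for a tracked allocation slot.
struct TrackedSlot {
  bool freed = false;
};

// Hypothetical free path: the stack is captured on every call, valid or not,
// and only afterwards is the slot state checked. If aCaptureStack crashed,
// the DoubleFree result below would never be reached.
FreeResult TrackedFree(TrackedSlot& aSlot,
                       const std::function<void()>& aCaptureStack) {
  aCaptureStack();
  if (aSlot.freed) {
    return FreeResult::DoubleFree;
  }
  aSlot.freed = true;
  return FreeResult::Ok;
}
```

With a capture step that cannot crash, the second free of the same slot gets as far as the state check and can be reported.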
Updated•3 years ago (Assignee)
Comment 7•3 years ago
Backed out for an xpcshell failure on test_dmd.js.
Backout link: https://hg.mozilla.org/integration/autoland/rev/a6cb8b0bfb1dfc2f4f01be4517979411b65f5386
Log link: https://treeherder.mozilla.org/logviewer?job_id=375468321&repo=autoland&lineNumber=3855
There was also a failure on netwerk/test/unit/test_http3_early_hint_listener.js.
Updated•3 years ago (Assignee)
Comment 9•3 years ago (bugherder)
Updated•3 years ago