Closed Bug 1742864 Opened 4 years ago Closed 3 years ago

Crash in [@ mozilla::FramePointerStackWalk]

Categories

(Core :: mozglue, defect)

All
macOS
defect

Tracking

()

RESOLVED FIXED
101 Branch
Tracking Status
firefox-esr91 --- disabled
firefox94 --- wontfix
firefox95 --- wontfix
firefox96 --- disabled
firefox99 --- disabled
firefox100 --- disabled
firefox101 --- fixed

People

(Reporter: cpeterson, Assigned: glandium)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

These crashes might be related to PHC being enabled in macOS Nightlies in mid-September (bug 1576515). Only Beta and Nightly builds are affected. We have about 150 crash reports from macOS, but only 3 from Windows 7 and 1 from Linux.

Crash report: https://crash-stats.mozilla.org/report/index/36ff00d6-f49c-4776-ba6d-5fe630211124

Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS

Top 3 frames of crashing thread:

0 libmozglue.dylib mozilla::FramePointerStackWalk mozglue/misc/StackWalk.cpp:926
1 libmozglue.dylib replace_free memory/replace/phc/PHC.cpp:1328
2 libmozglue.dylib replace_free memory/replace/phc/PHC.cpp:1328
Has Regression Range: --- → yes

Mike, should we consider this as S2?

Flags: needinfo?(mh+mozilla)

There are possibly up to 3 bugs hidden in here:

  • A bug in the crash report processing, which isn't able to get a proper stack. Whether the stack is entirely garbage or not remains to be determined. If the stack is garbage, there might be something else going on that trashes the stack.
  • A bug in the frame pointer stack walker, which shouldn't be crashing.
  • A double-free that PHC may catch later, but we don't know. Because we get a stack trace in both the valid and invalid case, we don't know if we are in the invalid case.

In any case, this only happens on builds with PHC enabled (beta/nightly). It might also happen on release with the profiler enabled, but that doesn't appear to be the case, judging from a random sample of crash reports (they all go through free).
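To make the third point concrete, here is a minimal sketch of why a crash during stack capture hides whether a free was valid. It assumes the stack is captured before the validity check, which the comment above only implies. Every name in it (`FreeTracker`, `NoteAlloc`, `NoteFree`, `FreeResult`) is hypothetical and is not taken from PHC.cpp.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical illustration only; none of these names come from PHC.cpp.
enum class FreeResult { Ok, DoubleFree, UnknownPointer };

class FreeTracker {
 public:
  // Record that aPtr is now a live allocation.
  void NoteAlloc(const void* aPtr) { mSlots[aPtr] = SlotState::InUse; }

  FreeResult NoteFree(const void* aPtr) {
    // Stand-in for capturing the free stack. ASSUMPTION: it runs before the
    // slot state is examined, so a crash here would hide the verdict below.
    ++mStacksCaptured;

    auto it = mSlots.find(aPtr);
    if (it == mSlots.end()) return FreeResult::UnknownPointer;
    if (it->second == SlotState::Freed) return FreeResult::DoubleFree;
    it->second = SlotState::Freed;
    return FreeResult::Ok;
  }

  // Number of times the (simulated) stack capture ran.
  std::size_t StacksCaptured() const { return mStacksCaptured; }

 private:
  enum class SlotState { InUse, Freed };
  std::unordered_map<const void*, SlotState> mSlots;
  std::size_t mStacksCaptured = 0;
};
```

In this sketch the capture counter advances for both the valid free and the double-free, so a stack walker that crashes at that step gives no indication of which case was being handled.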

Note that the automation crash in comment 1 is interesting: it suggests that the first two bugs above are independent, because the minidump stackwalk succeeded while the frame pointer stack walker crashed. Looking into that one, it's actually somewhat different: the frame pointer stack walker simply can't handle the situation because the trace contains a function from a system library that doesn't have frame pointers (which is the case on amd64 mac). Obviously, the in-process stack walker should not crash in that case. Arm64 mac, where most of the crashes this bug was filed for occur, does have frame pointers for system libraries, so this would be yet another, different issue.

Looking at a random arm64 mac one, something is busted in the stack. The first frame pointer seems to be pointing to 0x10 before what might be the next frame. This doesn't seem too good, but it's also hard to tell more from a casual look.
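For illustration, a frame pointer walker that refuses to dereference implausible frame pointers could look roughly like the sketch below. This is not the code in StackWalk.cpp. `IsPlausibleFrame` and `WalkFramePointers` are hypothetical names, the two-word frame record layout (saved frame pointer, then return address) is an assumption, and the caller is assumed to know the bounds of the stack being walked.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the code from StackWalk.cpp.
// ASSUMPTION: a frame record is two words:
//   [0] = caller's frame pointer, [1] = return address.
static bool IsPlausibleFrame(uintptr_t aFp, uintptr_t aLow, uintptr_t aHigh) {
  if (aFp % sizeof(uintptr_t) != 0) return false;        // misaligned
  if (aFp < aLow || aFp >= aHigh) return false;          // outside the stack
  if (aHigh - aFp < 2 * sizeof(uintptr_t)) return false; // record would not fit
  return true;
}

// Walks the frame chain starting at aFp within [aLow, aHigh) and returns the
// return addresses found. The input frame pointer is checked like any other.
std::vector<uintptr_t> WalkFramePointers(uintptr_t aFp, uintptr_t aLow,
                                         uintptr_t aHigh,
                                         std::size_t aMaxFrames) {
  std::vector<uintptr_t> pcs;
  while (pcs.size() < aMaxFrames && IsPlausibleFrame(aFp, aLow, aHigh)) {
    const uintptr_t* record = reinterpret_cast<const uintptr_t*>(aFp);
    uintptr_t next = record[0];
    pcs.push_back(record[1]);
    // The stack grows down, so the caller's frame must live at a higher
    // address; anything else is treated as the end of the chain.
    if (next <= aFp) break;
    aFp = next;
  }
  return pcs;
}
```

With checks like these, a corrupted or missing frame pointer ends the walk early with a truncated trace instead of faulting on a bad read.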

Flags: needinfo?(mh+mozilla)
Assignee: nobody → mh+mozilla
Status: NEW → ASSIGNED

For the record:

> A bug in the crash report processing, which isn't able to get a proper stack. Whether the stack is entirely garbage or not remains to be determined. If the stack is garbage, there might be something else going on that trashes the stack.

The frame pointers in the stack /are/ corrupted to some extent and can't be relied upon, for whatever reason. There doesn't seem to be a particular bug in the crash report processing. There might be a bug whereby frame pointers are being trashed by some code, but it's entirely possible normal execution would recover from that. I guess we'll see after the patch lands.

> A bug in the frame pointer stack walker, which shouldn't be crashing.

This is what the patch addresses.

> A double-free that PHC may catch later, but we don't know. Because we get a stack trace in both the valid and invalid case, we don't know if we are in the invalid case.

Once the patch lands, we should get PHC reports if those end up being double-frees.

Component: Memory Allocator → mozglue
Hardware: x86_64 → All
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/192cc007fef6
Sanitize the input frame pointer given to DoFramePointerStackWalk. r=gerald

Backed out for xpcshell failure on test_dmd.js

Backout link: https://hg.mozilla.org/integration/autoland/rev/a6cb8b0bfb1dfc2f4f01be4517979411b65f5386
Log link: https://treeherder.mozilla.org/logviewer?job_id=375468321&repo=autoland&lineNumber=3855

There was also a failure on netwerk/test/unit/test_http3_early_hint_listener.js

Flags: needinfo?(mh+mozilla)
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/52d2bf9892fc
Sanitize the input frame pointer given to DoFramePointerStackWalk. r=gerald
Flags: needinfo?(mh+mozilla)
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 101 Branch