Open Bug 1295918 Opened 8 years ago Updated 2 years ago

Include JS stacks in crash report

Tracking

()

Status:

NEW

People

(Reporter: kanru, Unassigned)

References

Details

Attachments

(1 file, 1 obsolete file)

WIP: Add nsStrictSystemPrincipal that only subsumes the nsSystemPrincipal 8 years ago Kan-Ru Chen [:kanru] (UTC+9) 12.90 KB, patch	bzbarsky : feedback-	Details \| Diff \| Splinter Review
Old WIP patch just for reference 8 years ago Kan-Ru Chen [:kanru] (UTC+9) 9.35 KB, patch		Details \| Diff \| Splinter Review

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Description

•

8 years ago

For crashes created in a controlled way like MOZ_CRASH, MOZ_ASSERT, NS_RUNTIMEABORT it might be worth to include the current JS stacks of the crashing thread to aid debugging. Full JS stacks might include content provided scripts which I think are PII so we should only show the stacks from chrome by default. It looks xpc_PrintJSStack allocates memory so we can't use it in OOM situation. Otherwise I assume it's quite safe to use it from MOZ_CRASH as long as we can avoid recursive MOZ_CRASH in xpc_PrintJSStack. bz, what do you think?

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Comment 2

•

8 years ago

bz, does the usage of xpc_PrintJSStack in comment 0 sounds reasonable to you?

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 3

•

8 years ago

Yes, that seems reasonable. We should consider renaming that function to make it clear that it can be used on non-main threads too.

Flags: needinfo?(bzbarsky)

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Comment 4

•

8 years ago

Attached patch WIP: Add nsStrictSystemPrincipal that only subsumes the nsSystemPrincipal (obsolete) — Details — Splinter Review

Hi bz, I'd like to know what do you think about creating a new principal type that only subsumes the system principal before I continue. For privacy reason we don't want to include the js stacks from content so I need to restrict the stacks reported. We can use a principal to filter the stacks but I can't use the system principal because it subsumes all other principal.

Attachment #8789746 - Flags: feedback?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 5

•

8 years ago

Comment on attachment 8789746 [details] [diff] [review] WIP: Add nsStrictSystemPrincipal that only subsumes the nsSystemPrincipal This totally breaks the conceptual model of principal subsumption, which is a transitive relationship (that is, principals form a transitively complete DAG under the "subsumes" relation). We could filter by principal equality instead of subsumes. But that's still a privacy leak because it shows what extensions are installed, no?

Attachment #8789746 - Flags: feedback?(bzbarsky) → feedback-

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Comment 6

•

8 years ago

(In reply to Boris Zbarsky [:bz] (busy, pun intended) from comment #5) > Comment on attachment 8789746 [details] [diff] [review] > WIP: Add nsStrictSystemPrincipal that only subsumes the nsSystemPrincipal > > This totally breaks the conceptual model of principal subsumption, which is > a transitive relationship (that is, principals form a transitively complete > DAG under the "subsumes" relation). Totally understand. That was my concern too. > We could filter by principal equality instead of subsumes. But that's still > a privacy leak because it shows what extensions are installed, no? Installed extensions are already part of the public visible crash report but only contains the extension id and version. With the stacks we might leak the signature and maybe some code. I think if we want to filter that we have to filter by their compartment?

Boris Zbarsky [:bzbarsky]

Comment 7

•

8 years ago

Yeah. I think it might be fine to just filter by principal equality on system principal.

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Comment 8

•

8 years ago

(In reply to Boris Zbarsky [:bz] (busy, pun intended) from comment #7) > Yeah. I think it might be fine to just filter by principal equality on > system principal. Done. I created a new FrameIter variant that filters by principal equality. I just realized that we want MOZ_CRASH to be 100% safe in release build so I think we can't malloc nor examine the js heap. The js stack annotation will only be useful when added manually in code or with MOZ_RELEASE_ASSERT.

(not currently active) Ted Mielczarek

Comment 9

•

8 years ago

We do use MOZ_CRASH from the OOM crash situation, so yeah, it wouldn't be safe to do it there. It should be safe to do from NS_RUNTIMEABORT (although I don't know what guarantees we have claimed about that). It would be wonderful if we could figure out a memory-safe way to gather this info so we could always include it in crash reports, but that's kind of a hard problem.

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Comment 10

•

8 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #9) > We do use MOZ_CRASH from the OOM crash situation, so yeah, it wouldn't be > safe to do it there. It should be safe to do from NS_RUNTIMEABORT (although > I don't know what guarantees we have claimed about that). It would be > wonderful if we could figure out a memory-safe way to gather this info so we > could always include it in crash reports, but that's kind of a hard problem. Can we do that in the breakpad client process? If I understand correctly breakpad will clone/create a new process and use ptrace or ReadProcessMemory to generate minidump. If we can run code in the client process, presumably it's safe to inspect the memory.

Flags: needinfo?(ted)

(not currently active) Ted Mielczarek

Comment 11

•

8 years ago

No, unfortunately. Breakpad does call clone(), but it still shares state with the crashed process, so if there's heap corruption or if we're out of memory things can still fail badly. We could do this for content processes, since we write their minidumps from the chrome process, but we'd have to be able to do it by reading process memory with ptrace/etc, not by calling functions from the target process. If we wrote something like that it would probably wind up being safe to use in the crashed process anyway. I've stumbled across related work lately. Uber released a sampling profiler for Python that uses ptrace to read the Python stack out of a Python process: http://eng.uber.com/pyflame/ and Julia Evans gave a keynote at RustConf about ruby-stacktrace, which does the same thing for Ruby processes (written in Rust, too): https://github.com/jvns/ruby-stacktrace/

Flags: needinfo?(ted)

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Comment 12

•

8 years ago

I asked Sean and Steve today about how to reliably find the JS frames from another process. The simple answer is there no existing way to do that but there are codes and information we can use to build that tool. Sean, can you give us some detail about the best way to approach this or which people I should needinfo instead? The current requirements are: 1. When content processes crashes we want to know the JS frames Useful for investigate front-end code issues and jit crashes 2. We want to read the memory and find the info from the chrome process 3. So we need a way to find the entry point address from another process and generate the full JS stacks from there The expected results are: 1. The script filenames 2. The function names 3. The line numbers 4. The local variables if possible

Flags: needinfo?(sstangl)

(not currently active) Ted Mielczarek

Comment 13

•

8 years ago

So I was chatting with h4writer on IRC today about JS crash reporting things, and he noted that we have a GDB JS unwinder in the tree these days: https://dxr.mozilla.org/mozilla-central/source/js/src/gdb/mozilla/unwind.py I think we could take the strategy implemented there and make it work from inside Gecko on a crashed content process. I don't know if Breakpad has a good hook for this currently, but if you get a proof-of-concept working we can figure out some sensible way to get it in there.

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Comment 14

•

8 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #13) > So I was chatting with h4writer on IRC today about JS crash reporting > things, and he noted that we have a GDB JS unwinder in the tree these days: > https://dxr.mozilla.org/mozilla-central/source/js/src/gdb/mozilla/unwind.py > > I think we could take the strategy implemented there and make it work from > inside Gecko on a crashed content process. I don't know if Breakpad has a > good hook for this currently, but if you get a proof-of-concept working we > can figure out some sensible way to get it in there. Nice! This looks super useful. I'll see if I can make something work.

Flags: needinfo?(sstangl)

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Comment 15

•

8 years ago

Attached patch Old WIP patch just for reference — Details — Splinter Review

Attachment #8789746 - Attachment is obsolete: true

(not currently active) Ted Mielczarek

Comment 16

•

8 years ago

Also: I've got a patch almost ready to go for bug 1336548 which will have us using a local fork of the Breakpad client code, so we'll be able to make whatever changes we need to it at that point. That should simplify some things where getting the API working between Gecko and Breakpad has been difficult.

Cameron McCormack (:heycam)

Comment 17

•

7 years ago

Looking at a crash report today I was just thinking how useful it would be to see the JS stack. (Also, is there a particular reason to limit it to controlled crashes and not just any crash?)

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Comment 18

•

7 years ago

(In reply to Cameron McCormack (:heycam) from comment #17) > Looking at a crash report today I was just thinking how useful it would be > to see the JS stack. (Also, is there a particular reason to limit it to > controlled crashes and not just any crash?) To summarize the current status: 1. I thought in controlled crash we might know it's safe to walk the JS stack. But we realized it's not always the case. We will want to use a external binary to do that. 2. The JS stack might include url and other PII data so we may want to restrict it to include chrome only url. But with a external JS stack walker this might be hard to do. So the best path forward will be to implement a external JS stack walker and only those has permission to see minidumps can see the JS stacks until we find a way to hide the content url.

Kan-Ru Chen [:kanru] (UTC+9)

Reporter

Updated

•

7 years ago

Summary: Include JS stacks in controlled crash report → Include JS stacks in crash report

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

You need to log in before you can comment on or make changes to this bug.