Open Bug 1295918 Opened 8 years ago Updated 2 years ago

Include JS stacks in crash report

Categories

(Toolkit :: Crash Reporting, defect)

defect

Tracking

()

People

(Reporter: kanru, Unassigned)

References

Details

Attachments

(1 file, 1 obsolete file)

For crashes created in a controlled way like MOZ_CRASH, MOZ_ASSERT, NS_RUNTIMEABORT it might be worth to include the current JS stacks of the crashing thread to aid debugging.

Full JS stacks might include content provided scripts which I think are PII so we should only show the stacks from chrome by default.

It looks xpc_PrintJSStack allocates memory so we can't use it in OOM situation. Otherwise I assume it's quite safe to use it from MOZ_CRASH as long as we can avoid recursive MOZ_CRASH in xpc_PrintJSStack.

bz, what do you think?
bz, does the usage of xpc_PrintJSStack in comment 0 sounds reasonable to you?
Flags: needinfo?(bzbarsky)
Yes, that seems reasonable.  We should consider renaming that function to make it clear that it can be used on non-main threads too.
Flags: needinfo?(bzbarsky)
Hi bz, I'd like to know what do you think about creating a new principal type that only subsumes the system principal before I continue.

For privacy reason we don't want to include the js stacks from content so I need to restrict the stacks reported. We can use a principal to filter the stacks but I can't use the system principal because it subsumes all other principal.
Attachment #8789746 - Flags: feedback?(bzbarsky)
Comment on attachment 8789746 [details] [diff] [review]
WIP: Add nsStrictSystemPrincipal that only subsumes the nsSystemPrincipal

This totally breaks the conceptual model of principal subsumption, which is a transitive relationship (that is, principals form a transitively complete DAG under the "subsumes" relation).

We could filter by principal equality instead of subsumes.  But that's still a privacy leak because it shows what extensions are installed, no?
Attachment #8789746 - Flags: feedback?(bzbarsky) → feedback-
(In reply to Boris Zbarsky [:bz] (busy, pun intended) from comment #5)
> Comment on attachment 8789746 [details] [diff] [review]
> WIP: Add nsStrictSystemPrincipal that only subsumes the nsSystemPrincipal
> 
> This totally breaks the conceptual model of principal subsumption, which is
> a transitive relationship (that is, principals form a transitively complete
> DAG under the "subsumes" relation).

Totally understand. That was my concern too.
 
> We could filter by principal equality instead of subsumes.  But that's still
> a privacy leak because it shows what extensions are installed, no?

Installed extensions are already part of the public visible crash report but only contains the extension id and version. With the stacks we might leak the signature and maybe some code. I think if we want to filter that we have to filter by their compartment?
Yeah.  I think it might be fine to just filter by principal equality on system principal.
(In reply to Boris Zbarsky [:bz] (busy, pun intended) from comment #7)
> Yeah.  I think it might be fine to just filter by principal equality on
> system principal.

Done. I created a new FrameIter variant that filters by principal equality.

I just realized that we want MOZ_CRASH to be 100% safe in release build so I think we can't malloc nor examine the js heap. The js stack annotation will only be useful when added manually in code or with MOZ_RELEASE_ASSERT.
We do use MOZ_CRASH from the OOM crash situation, so yeah, it wouldn't be safe to do it there. It should be safe to do from NS_RUNTIMEABORT (although I don't know what guarantees we have claimed about that). It would be wonderful if we could figure out a memory-safe way to gather this info so we could always include it in crash reports, but that's kind of a hard problem.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #9)
> We do use MOZ_CRASH from the OOM crash situation, so yeah, it wouldn't be
> safe to do it there. It should be safe to do from NS_RUNTIMEABORT (although
> I don't know what guarantees we have claimed about that). It would be
> wonderful if we could figure out a memory-safe way to gather this info so we
> could always include it in crash reports, but that's kind of a hard problem.

Can we do that in the breakpad client process? If I understand correctly breakpad will clone/create a new process and use ptrace or ReadProcessMemory to generate minidump. If we can run code in the client process, presumably it's safe to inspect the memory.
Flags: needinfo?(ted)
No, unfortunately. Breakpad does call clone(), but it still shares state with the crashed process, so if there's heap corruption or if we're out of memory things can still fail badly. We could do this for content processes, since we write their minidumps from the chrome process, but we'd have to be able to do it by reading process memory with ptrace/etc, not by calling functions from the target process. If we wrote something like that it would probably wind up being safe to use in the crashed process anyway.

I've stumbled across related work lately. Uber released a sampling profiler for Python that uses ptrace to read the Python stack out of a Python process:
http://eng.uber.com/pyflame/

and Julia Evans gave a keynote at RustConf about ruby-stacktrace, which does the same thing for Ruby processes (written in Rust, too):
https://github.com/jvns/ruby-stacktrace/
Flags: needinfo?(ted)
I asked Sean and Steve today about how to reliably find the JS frames from another process. The simple answer is there no existing way to do that but there are codes and information we can use to build that tool. Sean, can you give us some detail about the best way to approach this or which people I should needinfo instead?

The current requirements are:

 1. When content processes crashes we want to know the JS frames
    Useful for investigate front-end code issues and jit crashes
 2. We want to read the memory and find the info from the chrome process
 3. So we need a way to find the entry point address from another process and generate the full JS stacks from there

The expected results are:

 1. The script filenames
 2. The function names
 3. The line numbers
 4. The local variables if possible
Flags: needinfo?(sstangl)
So I was chatting with h4writer on IRC today about JS crash reporting things, and he noted that we have a GDB JS unwinder in the tree these days:
https://dxr.mozilla.org/mozilla-central/source/js/src/gdb/mozilla/unwind.py

I think we could take the strategy implemented there and make it work from inside Gecko on a crashed content process. I don't know if Breakpad has a good hook for this currently, but if you get a proof-of-concept working we can figure out some sensible way to get it in there.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #13)
> So I was chatting with h4writer on IRC today about JS crash reporting
> things, and he noted that we have a GDB JS unwinder in the tree these days:
> https://dxr.mozilla.org/mozilla-central/source/js/src/gdb/mozilla/unwind.py
> 
> I think we could take the strategy implemented there and make it work from
> inside Gecko on a crashed content process. I don't know if Breakpad has a
> good hook for this currently, but if you get a proof-of-concept working we
> can figure out some sensible way to get it in there.

Nice! This looks super useful. I'll see if I can make something work.
Flags: needinfo?(sstangl)
Attachment #8789746 - Attachment is obsolete: true
Also: I've got a patch almost ready to go for bug 1336548 which will have us using a local fork of the Breakpad client code, so we'll be able to make whatever changes we need to it at that point. That should simplify some things where getting the API working between Gecko and Breakpad has been difficult.
Looking at a crash report today I was just thinking how useful it would be to see the JS stack.  (Also, is there a particular reason to limit it to controlled crashes and not just any crash?)
(In reply to Cameron McCormack (:heycam) from comment #17)
> Looking at a crash report today I was just thinking how useful it would be
> to see the JS stack.  (Also, is there a particular reason to limit it to
> controlled crashes and not just any crash?)

To summarize the current status: 

 1. I thought in controlled crash we might know it's safe to walk the JS stack. But we realized it's not always the case. We will want to use a external binary to do that.

 2. The JS stack might include url and other PII data so we may want to restrict it to include chrome only url. But with a external JS stack walker this might be hard to do.

So the best path forward will be to implement a external JS stack walker and only those has permission to see minidumps can see the JS stacks until we find a way to hide the content url.
Summary: Include JS stacks in controlled crash report → Include JS stacks in crash report
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: