Dead CPOW errors should have actual stack / file / line information

RESOLVED FIXED in Firefox 51

Status

()

P1
normal
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: Gijs, Assigned: wcpan)

Tracking

Trunk
mozilla51
Points:
---

Firefox Tracking Flags

(e10s+, firefox47 affected, firefox51 fixed)

Details

(Whiteboard: e10st?)

Attachments

(3 attachments)

(Reporter)

Description

3 years ago
Cf. bug 1014185:

73 INFO TEST-UNEXPECTED-FAIL | browser/components/customizableui/test/browser_876944_customize_mode_create_destroy.js | Uncaught exception - at :0 - Error: operation not possible on dead CPOW 

It doesn't look like that failure is deterministic, and it happened in various other tests with that patchset (but said patchset got backed out). This is annoying for developers who now have approximately 0 idea as to what caused this to happen and where it happened or how to prevent it / what they need to fix.
tracking-e10s: --- → ?
(Reporter)

Comment 1

3 years ago
Mike, who knows the IPC machinery involved here? Looking at https://dxr.mozilla.org/mozilla-central/source/js/ipc/JavaScriptShared.cpp#498 it's not obvious to me (given I'm not familiar with JS) why that wouldn't get a "normal" stack trace when it's reporting an exception...
Flags: needinfo?(mconley)

Comment 2

3 years ago
Naveed, do you have somebody who could help with this?
Component: IPC → DOM: Content Processes
Flags: needinfo?(nihsanullah)
(In reply to :Gijs Kruitbosch from comment #1)
> Mike, who knows the IPC machinery involved here? Looking at
> https://dxr.mozilla.org/mozilla-central/source/js/ipc/JavaScriptShared.
> cpp#498 it's not obvious to me (given I'm not familiar with JS) why that
> wouldn't get a "normal" stack trace when it's reporting an exception...

The low-level CPOW machinery definitely falls into billm's wheelhouse, though I believe mrbkap knows this stuff too. Myself, not so much.
Flags: needinfo?(mconley)
Whiteboard: dom-triaged btpp-triage-followup-2016-02-26
Yeah, not a great error message. The error is being thrown in the child process and then passed up to the parent. The stack we use is the one from the exception in the child, and there's nothing on the stack in the child.

I guess we could try to recognize exception objects in the child and replace their stacks in the parent. It would be a decent amount of work though. Usually when I see these errors I just rewrite the test to avoid CPOWs.
Flags: needinfo?(nihsanullah)
(In reply to Bill McCloskey (:billm) from comment #4)
> Usually when I see these errors I just rewrite the test to avoid CPOWs.

That's exactly what the error message should help you with. There weren't any CPOWs in the code and tests I touched. The tests that started failing didn't directly have any either, but a helper function in head.js did. I figured this out now, but it was far from convenient and could easily have taken me longer.
Comment hidden (spam)
blocking-b2g: 2.2? → ---
tracking-b2g: backlog → ---
(Reporter)

Comment 7

3 years ago
(In reply to Bill McCloskey (:billm) from comment #4)
> I guess we could try to recognize exception objects in the child and replace
> their stacks in the parent. It would be a decent amount of work though.

Naively, I'm confused why this would be complex - clearly it's already handing off the errors in some way, so we must be catching them somewhere in the CPOW machinery - AIUI reporting them in the child process with Cu.reportError or its C++ equivalent (not sure what this part of the shims is written in) would do the trick, no?

Where does the code that hands off these errors live?
Flags: needinfo?(wmccloskey)
(Reporter)

Comment 8

3 years ago
(In reply to Dão Gottwald [:dao] from comment #5)
> (In reply to Bill McCloskey (:billm) from comment #4)
> > Usually when I see these errors I just rewrite the test to avoid CPOWs.
> 
> That's exactly what the error message should help you with. There weren't
> any CPOWs in the code and tests I touched. The tests that started failing
> didn't directly have any either, but a helper function in head.js did. I
> figured this out now, but it was far from convenient and could easily have
> taken me longer.

FWIW, I also expect that this stuff is sufficiently opaque/strange that it's going to bite add-on authors as well, who won't be as knowledgeable about our codebase as you or Dão, making the lack of useful stack info worse.
(In reply to :Gijs Kruitbosch from comment #7)
> (In reply to Bill McCloskey (:billm) from comment #4)
> > I guess we could try to recognize exception objects in the child and replace
> > their stacks in the parent. It would be a decent amount of work though.
> 
> Naively, I'm confused why this would be complex - clearly it's already
> handing off the errors in some way, so we must be catching them somewhere in
> the CPOW machinery - AIUI reporting them in the child process with
> Cu.reportError or its C++ equivalent (not sure what this part of the shims
> is written in) would do the trick, no?
> 
> Where does the code that hands off these errors live?

These errors are processed just like normal objects since in JS you can throw anything. 
https://dxr.mozilla.org/mozilla-central/source/js/ipc/WrapperOwner.cpp#1044
The idea is that the object being thrown could be some complex thing and not just a string or Error.

I guess we could recognize that the object is an Error in the child similar to what we do for StopIteration. Then we could send up the message alone. In the parent we would call JS_ReportError or something. I think that could work, although it might not cover everything.

I could work on this but I don't think I'll be able to get to it for a little while. I can mentor someone if they want to pick it up first.
Flags: needinfo?(wmccloskey)

Updated

3 years ago
Blocks: 984139
tracking-e10s: ? → +
Priority: -- → P1
> I could work on this but I don't think I'll be able to get to it for a
> little while. I can mentor someone if they want to pick it up first.

Any suggestions for victims?
Flags: needinfo?(wmccloskey)
Whiteboard: dom-triaged btpp-triage-followup-2016-02-26 → btpp-triage-followup-2016-03-04
Jim, can you find someone to take this?
Flags: needinfo?(wmccloskey) → needinfo?(jmathies)
Whiteboard: btpp-triage-followup-2016-03-04 → btpp-triage-followup-2016-03-11

Comment 12

3 years ago
This is currently prioritized as P1 post initial rollout so it'll find an owner soonish.
Flags: needinfo?(jmathies)

Updated

3 years ago
No longer blocks: 984139
(In reply to Andrew Overholt [:overholt] from comment #11)
> Jim, can you find someone to take this?

I can take a look.
Assignee: nobody → gkrizsanits
Hey, stack and console message... that's not an easy topic.

I think we end up here:
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsJSEnvironment.cpp#500
JavaScriptShared::findObjectByI
JS_ReportError
js::ReportErrorVA
ReportError
js::CallErrorReporter
xpc::SystemErrorReporter

And in SystemErrorReporter, for some reason, we are either not able to retrieve a stack, or, we are not able to get the global object, or the global object isn't a Window.
Because of the console cache, for now, we only pass stacks for console message related to documents, so that we can purge the console cache once a document is collected. We would have to ensure doing the same cleanup for all others kind of global, which sounds complex.

Anyway, if the issue is that the global ends up not being a window (i.e. the exception comes from something else than a chrome/content window script), this is bug 1237904. You will see interesting conversation about this in bug 1208641.

Note that the stack support has been introduced in bug 814497.
Not having a window would mean not having the whole stack, but the top frame info (file, line, column) should be available in all cases, right?
Hum I focused my code comprehension on stack object. top frame info are very different and most likely set earlier, or not set at all.
I would say it isn't set whereas it could be, here, in js::ReportErrorVA:
  https://dxr.mozilla.org/mozilla-central/source/js/src/jscntxt.cpp#416

Just guessing. I don't have a gdb up and running with this exception.
I would say it is missing from the JSErrorReport.

Updated

2 years ago
Assignee: gkrizsanits → nobody
Whiteboard: btpp-triage-followup-2016-03-11 → e10st?
I think I can take a look.

:billm, I'm wondering how do we write a testcase to trigger dead CPOW?

Thanks.
Flags: needinfo?(wmccloskey)
I think I can whip up a testcase. One sec.
Flags: needinfo?(wmccloskey)
Created attachment 8772155 [details] [diff] [review]
Dead CPOW testcase

MozReview-Commit-ID: IrSChyl6HOx
Assignee: nobody → mconley
Status: NEW → ASSIGNED
Silly bzexport - didn't mean to assign to myself.

So ting, if you run that test, it'll include my dead CPOW test. You should be able to modify the list of tests in cpows_child.js to just run my test (though I think you have to run error_reporting_test as well or the parent will complain before the test exits).

I hope that helps!
Assignee: mconley → nobody
Status: ASSIGNED → NEW
This helps a lot, thanks!
Assignee: nobody → wpan
Comment on attachment 8774694 [details]
Bug 1249698 - Use parent stack instead of vanished child stack.

Accidently submitted this WIP, I'm sorry.

I think I can rewrite this, so that it will works like StopIteration.

All I need to do is to create a similar prototype and functions, right?
Attachment #8774694 - Flags: review?(wmccloskey)
Comment on attachment 8774694 [details]
Bug 1249698 - Use parent stack instead of vanished child stack.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/67148/diff/1-2/
Attachment #8774694 - Flags: review?(wmccloskey)
Somehow only ClassName does not return a ReturnStatus, should we add it to the message?
Flags: needinfo?(wmccloskey)
Attachment #8774694 - Flags: review?(wmccloskey) → review+
Attachment #8774693 - Flags: review?(wmccloskey) → review+
Keywords: checkin-needed

Comment 30

2 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/893aa91455de
Dead CPOW testcase. r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/090157427509
Use parent stack instead of vanished child stack. r=billm
Keywords: checkin-needed
(In reply to Wei-Cheng Pan [:wcpan] [:wcp] [:legnaleurc] from comment #27)
> Somehow only ClassName does not return a ReturnStatus, should we add it to
> the message?

JSAPI doesn't support returning errors from this method, so it wouldn't help.
Flags: needinfo?(wmccloskey)

Comment 32

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/893aa91455de
https://hg.mozilla.org/mozilla-central/rev/090157427509
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox51: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in before you can comment on or make changes to this bug.