Open Bug 1591161 Opened 5 years ago Updated 2 years ago

add_task() loses original stack trace because it doesn't re-throw the server-side error

Categories

(Remote Protocol :: Agent, defect, P3)

defect

Tracking

(Not tracked)

People

(Reporter: whimboo, Unassigned)

References

Details

With the following line for browser chrome tests, we loose the original stack of the raised error:

https://searchfox.org/mozilla-central/rev/2a355e56c490768be676689de18629485c9d699b/remote/test/browser/head.js#58

An example why this is annoying can be seen here:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=272733514&repo=autoland&lineNumber=1534-1542

I would suggest that we just log an error with the , and rethrow the original error.

Andreas or Alex, do you remember why this was done that way? I'm not feeling this is a better display given that it looses important data.

Flags: needinfo?(poirot.alex)
Flags: needinfo?(ato)
Blocks: 1590933

In the example you refer to we see an XPCOM exception. These don’t
have stacktraces so I’m not sure there’s any information loss?

I think probably we need to make RemoteAgentError.fromJSON handle
XPCOM exceptions better, because currently it only seems to cater
for JS errors.

The implementation of RemoteAgentError.fromJSON is imperfect
because CDP does not return structured error types. If we see data
loss with untangling a stringified JS error, however, we should see
that as a sign to rewrite it using structured data fields.

Additionally, there are two things that are useful to find out in head.js:58:

  1. The stack trace and line in the caller code (the bc test) that,
    after a command request, caused a server-side error to occur.

  2. The stack trace from the server, containing information about
    where in the server stack the fault occurred.

RemoteAgentError.fromJSON is an attempt at marrying these two
concerns so we don’t lose one or the other. Before you dismiss it
as useless please consider how it preserves the cause of an error:
https://searchfox.org/mozilla-central/source/remote/Error.jsm#29

Flags: needinfo?(ato)

I ran into this too with Page.captureScreenshot erroring just now:

[task 2019-10-28T19:05:59.707Z] 19:05:59     INFO - Console message: [JavaScript Error: "(null)" {file: "chrome://remote/content/Error.jsm" line: 25}]
[task 2019-10-28T19:05:59.708Z] 19:05:59     INFO - RemoteAgentError@chrome://remote/content/Error.jsm:25:5
[task 2019-10-28T19:05:59.709Z] 19:05:59     INFO - fromJSON@chrome://remote/content/Error.jsm:72:17
[task 2019-10-28T19:05:59.710Z] 19:05:59     INFO - this.add_task/<@chrome://mochitests/content/browser/remote/test/browser/head.js:59:32
[task 2019-10-28T19:05:59.711Z] 19:05:59     INFO - Async*Tester_execTest/<@chrome://mochikit/content/browser-test.js:1067:34
[task 2019-10-28T19:05:59.711Z] 19:05:59     INFO - Tester_execTest@chrome://mochikit/content/browser-test.js:1102:11
[task 2019-10-28T19:05:59.712Z] 19:05:59     INFO - nextTest/<@chrome://mochikit/content/browser-test.js:930:14
[task 2019-10-28T19:05:59.713Z] 19:05:59     INFO - SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:805:67

Something odd is going on here judging by the formatting of the first line.

Summary: add_task() looses original stack trace because it doesn't re-throw the server-side error → add_task() loses original stack trace because it doesn't re-throw the server-side error
Assignee: nobody → ato
Status: NEW → ASSIGNED
Priority: -- → P1
Whiteboard: [puppeteer-alpha]

Easy to reproduce:

diff --git a/remote/domains/content/Page.jsm b/remote/domains/content/Page.jsm
index ccdd691b2ecf..68fafba92c94 100644
--- a/remote/domains/content/Page.jsm
+++ b/remote/domains/content/Page.jsm
@@ -68,6 +68,8 @@ class Page extends ContentProcessDomain {
   }
 
   async navigate({ url, referrer, transitionType, frameId } = {}) {
+    throw new TypeError("hei på deg!");
+
     if (frameId && frameId != this.content.windowUtils.outerWindowID) {
       throw new UnsupportedError("frameId not supported");
     }

With MOZ_HEADLESS=1 ./mach test remote/test/browser/browser_cdp.js:

 0:05.84 GECKO(31997) 1572638761626	RemoteAgent	WARN	{"name":"TypeError","message":"hei på deg!","stack":"navigate@chrome://remote/content/domains/content/Page.jsm:71:11\nexecute@chrome://remote/content/domains/Domains.jsm:99:25\nreceiveMessage@chrome://remote/content/sessions/ContentProcessSession.jsm:69:45\nMessageListener.receiveMessage*ContentProcessSession@chrome://remote/content/sessions/ContentProcessSession.jsm:27:25\n@chrome://remote/content/sessions/frame-script.js:12:1\n"}
 0:05.84 GECKO(31997) 1572638761627	RemoteAgent	ERROR	RemoteAgentError:
 0:05.84 GECKO(31997) 	RemoteAgentError@chrome://remote/content/Error.jsm:25:5
 0:05.84 GECKO(31997) 	fromJSON@chrome://remote/content/Error.jsm:72:17
 0:05.85 GECKO(31997) 	this.add_task/<@chrome://mochitests/content/browser/remote/test/browser/head.js:59:32
 0:05.85 GECKO(31997) 	Async*Tester_execTest/<@chrome://mochikit/content/browser-test.js:1069:34
 0:05.85 GECKO(31997) 	Tester_execTest@chrome://mochikit/content/browser-test.js:1104:11
 0:05.85 GECKO(31997) 	nextTest/<@chrome://mochikit/content/browser-test.js:932:14
 0:05.85 GECKO(31997) 	SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:805:67
 0:05.84 INFO Console message: [JavaScript Error: "(null)" {file: "chrome://remote/content/Error.jsm" line: 25}]
RemoteAgentError@chrome://remote/content/Error.jsm:25:5
fromJSON@chrome://remote/content/Error.jsm:72:17
this.add_task/<@chrome://mochitests/content/browser/remote/test/browser/head.js:59:32
Async*Tester_execTest/<@chrome://mochikit/content/browser-test.js:1069:34
Tester_execTest@chrome://mochikit/content/browser-test.js:1104:11
nextTest/<@chrome://mochikit/content/browser-test.js:932:14
SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:805:67

I also discovered that JS from from the child process is lossy
because we only preserve the data mandated by CDP.

Flags: needinfo?(poirot.alex)

I wrote a couple of patches yesterday to provide Error prototype
serialisation across IPC boundaries. It needs a bit of extra work
to handle XPCOM exceptions, but works by extending the CDP error
type with the following new fields:

  • name: error type name (or Error.prototype.constructor.name)
  • stack: array of stacktrace frames
  • cause: recursive optional reference to underlying error

The benefit of splitting the stacktrace into an array of frames is
the major thing we need to resolve this particular problem, because
it turns out string parsing is not always reliable (which I kind
of said when I first implemented it).

I decided to further improve the error type by passing along all
the error information we have in child processes by introducing the
cause field. We currently only return the top-most error.

Whiteboard: [puppeteer-alpha] → [puppeteer-alpha-reserve]
Whiteboard: [puppeteer-alpha-reserve] → [puppeteer-beta-mvp]
Depends on: 1606612
Depends on: 1606613
Whiteboard: [puppeteer-beta-mvp] → [puppeteer-beta-reserve]

There is no need that this should block beta.

Assignee: ato → nobody
Status: ASSIGNED → NEW
Priority: P1 → P3
Whiteboard: [puppeteer-beta-reserve]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.