Closed Bug 1175116 Opened 7 years ago Closed 6 years ago

Stacktrace missing from test report because "Crash detected but error running stackwalk" although no crash occurred

Categories

(Testing Graveyard :: JSMarionette, defect)
Not set
normal

Tracking

(feature-b2g:2.6+)

RESOLVED FIXED
People

(Reporter: ato, Assigned: aus)

Details

(Whiteboard: [MJS] [CI])

Attachments

(2 files)

When running the Gaia integration tests, stacktraces are not included in the test report that mocha produces, as shown by this excerpt from the attached log:

  3 passing (55s)
  5 failing

  1) Input with Keyboard APP Type abc:
     
  Crash detected but error running stackwalk
  
  

  2) Input with Keyboard APP Type Abc in textarea:
     
  Crash detected but error running stackwalk
  
  

  3) Input with Keyboard APP Type number and then alphabet:
     
  Crash detected but error running stackwalk
  
  

  4) Input with Keyboard APP Type alphabet and then number:
     
  Crash detected but error running stackwalk
  
  

  5) Input with Keyboard APP tap space bar and then wait for a while before tapping again:
     
  Crash detected but error running stackwalk

This string originates from tests/jsmarionette/runner/marionette-js-runner/host/session.js:35 which calls parseCrashInfoWithStackwalkError with the `info` variable that is `undefined`.
Attached file missing_stacktrace
Duplicate of this bug: 1190872
Duplicate of this bug: 1195230
Duplicate of this bug: 1184815
From what I saw, I consistently got this with test errors like tapping something not visible. As Johan said in 1195230, we can catch the error using a try/catch block with a console.log in the catch block.

Hey Gareth, is this useful information for you? This is my number one frustration when working with JSMarionette.
Flags: needinfo?(gaye)
I got the issue in bug 1205907 with the error 'NoSuchElement: Unable to locate element: #email_1'.
(In reply to Julien Wajsberg [:julienw] from comment #5)
> From what I saw, I consistently got this with test errors like tapping
> something not visible. As Johan said in 1195230, we can catch the error
> using a try/catch block with a console.log in the catch block.

I use assert.ok(false, error.stack) in the catch(). As a workaround, of course.
Gareth, actually, the full e.message is:

NoSuchElement: Unable to locate element: #email_1
Remote Stack:
<none>


Maybe the empty "Remote Stack" is what produces the issue ?
Maybe Aus can help here as well.
Flags: needinfo?(aus)
(In reply to Julien Wajsberg [:julienw] from comment #9)
> Maybe Aus can help here as well.

Hi Julien, the problem here is that we're not using mozcrash inside of mozrunner yet. This should, in general, be a real Mulet or b2g runtime crash. 

Here's where we're tracking work on porting over to a new service which will support this properly: https://bugzilla.mozilla.org/show_bug.cgi?id=1214285

We're hoping that we can migrate to this soon enough that it's not worth fixing the python side since we're ultimately deprecating its use.

Let me know if that's not the case, we can probably work out how to at least get the crash report from the tmp profile that's used. I was helping out :mhenretty earlier with this. You can ask him how my advice worked out. :)
Flags: needinfo?(gaye)
Flags: needinfo?(aus)
Aus, how do you explain that try/catching the code actually works if this is a real crash ? My wild guess here is that it's not a real crash but for some reason the code that runs Mulet thinks it is.
Flags: needinfo?(aus)
(In reply to Julien Wajsberg [:julienw] from comment #11)
> Aus, how do you explain that try/catching the code actually works if this is
> a real crash ? My wild guess here is that it's not a real crash but for some
> reason the code that runs Mulet thinks it is.

I think I conflated two different problems. I haven't seen this particular issue myself but I've certainly heard of it. I think it's causing some of the intermittents we're chasing down.

It does seem like this isn't a real crash, but we think it is because there's a lack of remote javascript call stack. It's not an unreasonable assumption but it would seem it's wrong.
Flags: needinfo?(aus)
No longer depends on: 1231779
(In reply to Aus Lacroix [:aus] from comment #12)
> It does seem like this isn't a real crash, but we think it is because
> there's a lack of remote javascript call stack.

I found a consistent way to reproduce this, similar to what happened with the accessibility issue spotted a week ago:
In one of the dialer tests, add:
> this.client.executeScript(function() {
>   window.wrappedJSObject.CallLogDBManager.add(nonExistingVar);
> });

If you get a logcat, you'll find this error message:
> 12-15 15:51:44.259  7603  7603 I Gecko   : 1450191104270	Marionette	DEBUG	conn2 <- Response {id: 51, error: {"error":"javascript
> error","message":"ReferenceError: nonExistingVar is not defined","stacktrace":"\ninline javascript, line 65\nsrc: \"         
> window.wrappedJSObject.CallLogDBManager.add(nonExistingVar);\"\nStack:\n__marionetteFunc/<@dummy file:65:11\n__marionetteFunc@dummy file:64:1\n@dummy
> file:66:35\n"}, body: null}
(In reply to Johan Lorenzo [:jlorenzo] (QA) from comment #13)
> If you get a logcat, you'll find this error message:
> > 12-15 15:51:44.259  7603  7603 I Gecko   : 1450191104270	Marionette	DEBUG	conn2 <- Response {id: 51, error: {"error":"javascript
> > error","message":"ReferenceError: nonExistingVar is not defined","stacktrace":"\ninline javascript, line 65\nsrc: \"         
> > window.wrappedJSObject.CallLogDBManager.add(nonExistingVar);\"\nStack:\n__marionetteFunc/<@dummy file:65:11\n__marionetteFunc@dummy file:64:1\n@dummy
> > file:66:35\n"}, body: null}

For clarity, what is returned to the client in this case is the following packet:

[1, 51, {"error":"javascript error","message":"ReferenceError: nonExistingVar is not defined","stacktrace":"\ninline javascript, line 65\nsrc: \"window.wrappedJSObject.CallLogDBManager.add(nonExistingVar);\"\nStack:\n__marionetteFunc/<@dummy file:65:11\n__marionetteFunc@dummy file:64:1\n@dummy file:66:35\n"}, null]

That is, a Response object (see message.js) containing an error property and no body property.  The client should pick up that the third array element (error) is populated and unmarshal the correct error type for that JSON Object.
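
To illustrate the unmarshalling step, here is a rough sketch that assumes the packet layout shown above: [type, id, error, body]. `MarionetteError` and `unmarshalResponse` are hypothetical names for this example; the real client's error types and function names may differ.

```javascript
// Hypothetical error class for illustration only.
class MarionetteError extends Error {
  constructor(type, message, remoteStack) {
    super(message);
    this.name = type;               // e.g. "javascript error"
    this.remoteStack = remoteStack; // stack reported by the server, may be empty
  }
}

// Assumed packet layout, taken from the example above: [type, id, error, body].
function unmarshalResponse(packet) {
  if (!Array.isArray(packet) || packet.length !== 4) {
    throw new TypeError('Expected a 4-element response array');
  }
  const [type, id, error, body] = packet;
  if (error !== null && error !== undefined) {
    // The third element is populated: surface it as an error, not a crash.
    throw new MarionetteError(error.error, error.message, error.stacktrace || '');
  }
  return { type, id, body };
}
```

With the packet from this comment, the sketch throws an error whose message is "ReferenceError: nonExistingVar is not defined" and whose `remoteStack` holds the server-side stacktrace string.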
feature-b2g: --- → 2.6+
Whiteboard: [MJS] [CI]
See Also: → 1233565
Will try and provide a hack for mozrunner so that it doesn't consume the crash reports and leaves them for us to gather. This is the only reason why this is failing right now and it's now blocking us on fixing intermittents.
Assignee: nobody → aus
Status: NEW → ASSIGNED
Comment on attachment 8707134 [details] [review]
[gaia] nullaus:bug1175116 > mozilla-b2g:master

Turns out our failure handling was improperly processing framework errors. We assumed that if running the test failed and no handler was already in place, the process had crashed, which is false: it can be a Marionette JS execution error. We now check whether the crash info returned is empty (which means no crash!) and do the right thing.
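
The check described in this comment could look roughly like the sketch below. This is not the code from the patch: `isEmptyCrashInfo`, `describeFailure`, and the returned field names are hypothetical, and treating undefined, null, or an object with no keys as "empty" is an assumption made for the example.

```javascript
// Hypothetical: crash info counts as empty when it is undefined, null,
// or an object with no own enumerable keys.
function isEmptyCrashInfo(info) {
  return info === undefined || info === null ||
    (typeof info === 'object' && Object.keys(info).length === 0);
}

// Hypothetical: decide how a failed test should be reported.
function describeFailure(testError, crashInfo) {
  if (isEmptyCrashInfo(crashInfo)) {
    // No crash data means no crash: report the framework error and its stack.
    return {
      kind: 'framework-error',
      message: testError.message,
      stack: testError.stack
    };
  }
  // Crash data is present: report it as a real runtime crash.
  return { kind: 'crash', message: 'Crash detected', crashInfo: crashInfo };
}
```

The point of the sketch is the ordering: the emptiness check runs before any crash handling, so a missing `info` value no longer falls through to the generic stackwalk message.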
Attachment #8707134 - Flags: review?(gaye)
You can look at the patch in action displaying the correct error stacks for marionette js framework errors -- https://treeherder.mozilla.org/#/jobs?repo=gaia&revision=55f7904eb48f12bc2edca5cff0bd329ac660c3b0
Comment on attachment 8707134 [details] [review]
[gaia] nullaus:bug1175116 > mozilla-b2g:master

LGTM!
Attachment #8707134 - Flags: review?(gaye) → review+
Commit (master): https://github.com/mozilla-b2g/gaia/commit/ac6083c43e35eae958698e999630fa392f93cc63

Fixed. Everyone should now be seeing framework errors when they occur rather than the generic 'Crash detected but error running stackwalk'.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Testing → Testing Graveyard