Closed Bug 1068466 Opened 5 years ago Closed 5 years ago

TBPL not reporting Robocop crashes

Categories

(Firefox for Android :: Testing, defect)

All
Android
defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 35

People

(Reporter: nalexander, Assigned: chmanchester)

Details

Attachments

(1 file, 1 obsolete file)

While working with testRestrictedProfiles, I discovered an automatic hard crash due to misused JNI.  (That is Bug 1067453, which inexplicably has not yet landed.)

However, this shows that the Robocop harness is not correctly seeing (some?) crashes.  See https://tbpl.mozilla.org/php/getParsedLog.php?id=48218595&tree=Fx-Team&full=1 for an example, in particular:

12:04:39     INFO -  09-16 12:04:34.023 I/Robocop ( 4435): {"action":"log","message":"testRestrictedProfiles.js | Starting test_getUserRestrictions","time":1410869074034,"pid":null,"level":"info","source":"robocop","thread":null}
12:04:39     INFO -  09-16 12:04:34.031 W/dalvikvm( 4435): JNI WARNING: JNI method called with exception pending
12:04:39     INFO -  09-16 12:04:34.031 W/dalvikvm( 4435):              in Lorg/mozilla/gecko/mozglue/GeckoLoader;.nativeRun:(Ljava/lang/String;)V (NewGlobalRef)
12:04:39     INFO -  09-16 12:04:34.031 W/dalvikvm( 4435): Pending exception is:
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): java.lang.NoClassDefFoundError: org/mozilla/gecko/RestrictedProfile
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun(Native Method)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun(Native Method)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at org.mozilla.gecko.GeckoAppShell.runGecko(GeckoAppShell.java:361)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at org.mozilla.gecko.GeckoThread.run(GeckoThread.java:186)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): Caused by:
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): java.lang.ClassNotFoundException: org.mozilla.gecko.RestrictedProfile
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:61)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at java.lang.ClassLoader.loadClass(ClassLoader.java:501)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at java.lang.ClassLoader.loadClass(ClassLoader.java:461)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun(Native Method)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun(Native Method)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at org.mozilla.gecko.GeckoAppShell.runGecko(GeckoAppShell.java:361)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): 	at org.mozilla.gecko.GeckoThread.run(GeckoThread.java:186)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435): "Gecko" prio=5 tid=16 NATIVE
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435):   | group="main" sCount=0 dsCount=0 obj=0x41351670 self=0x1a593b0
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435):   | sysTid=4456 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=27554304
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435):   | schedstat=( 0 0 0 ) utm=377 stm=63 core=1
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435):   at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun(Native Method)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435):   at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun(Native Method)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435):   at org.mozilla.gecko.GeckoAppShell.runGecko(GeckoAppShell.java:361)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435):   at org.mozilla.gecko.GeckoThread.run(GeckoThread.java:186)
12:04:39     INFO -  09-16 12:04:34.031 I/dalvikvm( 4435):
12:04:39     INFO -  09-16 12:04:34.031 E/dalvikvm( 4435): VM aborting
12:04:39     INFO -  09-16 12:04:34.062 E/LoadFaviconTask( 4435): Error reading favicon
12:04:39     INFO -  09-16 12:04:34.070 E/LoadFaviconTask( 4435): Error reading favicon
12:04:39     INFO -  09-16 12:04:34.867 D/GeckoThumbnailHelper( 4435): Using new thumbnail size: 271860 (width 345)
12:04:39     INFO -  09-16 12:04:34.867 D/GeckoThumbnailHelper( 4435): Sending thumbnail event: 345, 197
12:04:39     INFO -  09-16 12:04:34.875 D/GeckoToolbar( 4435): onTabChanged: FAVICON
12:04:39     INFO -  09-16 12:04:34.875 D/GeckoToolbarDisplayLayout( 4435): updateFavicon(android.graphics.Bitmap@411ef170)
12:04:39     INFO -  09-16 12:04:34.875 D/GeckoBrowserApp( 4435): BrowserApp.onTabChanged: 0: FAVICON
12:04:39     INFO -  09-16 12:04:34.906 I/ActivityManager( 1401): Process org.mozilla.fennec (pid 4435) has died.

I do see an error in the log:

12:04:39     INFO -  0 ERROR Automation Error: Missing end of test marker (process crashed?)

But the test run is marked green.  Flagging gbrown and jmaher 'cuz they're my goto Robocop people; and chmanchester 'cuz this looks like a logging failure.
To clarify, is the only thing amiss that this didn't turn the job red/orange, or would you expected other output associated with this kind of crash?
(In reply to Chris Manchester [:chmanchester] from comment #1)
> To clarify, is the only thing amiss that this didn't turn the job
> red/orange, or would you expected other output associated with this kind of
> crash?

The thing that flags me is this isn't red/orange.  Others who know more may expect more output.
Some discussion in bug 951181 and associated bugs suggests to me there have always been some cases we hit the "Missing end of test marker" message without turning the tree red/orange, but others could confirm. Either way this looks like something that should turn the tree red.
This should turn the job orange whenever we don't get a test_end message (so usually when the process crashed).

Try run: https://tbpl.mozilla.org/?tree=Try&rev=c29d591064ad
Attachment #8491014 - Flags: review?(cmanchester)
Somehow robocop didn't get run on that try push. Would be good to see this turn the tree orange, but we'll have to coordinate fixing the orange it uncovers.

I think we agree this should turn the tree orange, but I'd like to get this signed off on by someone who might know why it doesn't.
I think the bustage this would have uncovered had a fix landed, but let's see:

https://tbpl.mozilla.org/?tree=Try&rev=2c25325b6c5a
Comment on attachment 8491014 [details] [diff] [review]
0001-Bug-1068466-fail-the-test-job-on-a-robocop-process-c.patch

Review of attachment 8491014 [details] [diff] [review]:
-----------------------------------------------------------------

:jmaher, this is a simple change to make more failure scenarios visible in tbpl, what's your take here?
Attachment #8491014 - Flags: review?(cmanchester) → review?(jmaher)
Comment on attachment 8491014 [details] [diff] [review]
0001-Bug-1068466-fail-the-test-job-on-a-robocop-process-c.patch

Review of attachment 8491014 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see anything wrong with this.  But I do have a concern:
* with structured logging, why do we write cryptic lines still (i.e. process-crash | ...) ?
Attachment #8491014 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #8)
> Comment on attachment 8491014 [details] [diff] [review]
> 0001-Bug-1068466-fail-the-test-job-on-a-robocop-process-c.patch
> 
> Review of attachment 8491014 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't see anything wrong with this.  But I do have a concern:
> * with structured logging, why do we write cryptic lines still (i.e.
> process-crash | ...) ?

As discussed, this all goes away with the forthcoming mozharness refactor.
great, it is documented now- lets move forward and not look back!
Come to think of it this needs to be logged at info for mozharness to pick it up. Truly unfortunate.

Try run just in case:
https://tbpl.mozilla.org/?tree=Try&rev=4a114ccaeb40
Attachment #8491014 - Attachment is obsolete: true
Assignee: nobody → cmanchester
Status: NEW → ASSIGNED
Comment on attachment 8491779 [details] [diff] [review]
fail the test job on a robocop process crash

Review of attachment 8491779 [details] [diff] [review]:
-----------------------------------------------------------------

r=jmaher
Attachment #8491779 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/cb1f250550ff
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → Firefox 35
You need to log in before you can comment on or make changes to this bug.