Closed Bug 910320 Opened 11 years ago Closed 11 years ago

Check for java exceptions on Talos runs too

Categories

(Testing :: Talos, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 4 obsolete files)

Bug 823452 added support for checking for java exceptions in the logcat (and if found, means we don't bother looking for the minidump). ie:
http://hg.mozilla.org/mozilla-central/annotate/44e615a66f3f/build/mobile/remoteautomation.py#l92
{
89     def checkForJavaException(self, logcat):
90         found_exception = False
91         for i, line in enumerate(logcat):
92             if "REPORTING UNCAUGHT EXCEPTION" in line or "FATAL EXCEPTION" in line:
93                 # Strip away the date, time, logcat tag and pid from the next two lines and
94                 # concatenate the remainder to form a concise summary of the exception. 
95                 #
96                 # For example:
97                 #
98                 # 01-30 20:15:41.937 E/GeckoAppShell( 1703): >>> REPORTING UNCAUGHT EXCEPTION FROM THREAD 9 ("GeckoBackgroundThread")
99                 # 01-30 20:15:41.937 E/GeckoAppShell( 1703): java.lang.NullPointerException
100                 # 01-30 20:15:41.937 E/GeckoAppShell( 1703): 	at org.mozilla.gecko.GeckoApp$21.run(GeckoApp.java:1833)
101                 # 01-30 20:15:41.937 E/GeckoAppShell( 1703): 	at android.os.Handler.handleCallback(Handler.java:587)
102                 # 01-30 20:15:41.937 E/GeckoAppShell( 1703): 	at android.os.Handler.dispatchMessage(Handler.java:92)
103                 # 01-30 20:15:41.937 E/GeckoAppShell( 1703): 	at android.os.Looper.loop(Looper.java:123)
104                 # 01-30 20:15:41.937 E/GeckoAppShell( 1703): 	at org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:31)
105                 #
106                 #   -> java.lang.NullPointerException at org.mozilla.gecko.GeckoApp$21.run(GeckoApp.java:1833)
107                 found_exception = True
108                 logre = re.compile(r".*\): \t?(.*)")
109                 m = logre.search(logcat[i+1])
110                 if m and m.group(1):
111                     top_frame = m.group(1)
112                 m = logre.search(logcat[i+2])
113                 if m and m.group(1):
114                     top_frame = top_frame + m.group(1)
115                 print "PROCESS-CRASH | java-exception | %s" % top_frame
116                 break
117         return found_exception
}

However, this isn't implemented for Talos, meaning that we get things like:
https://tbpl.mozilla.org/php/getParsedLog.php?id=26815583&full=1&branch=fig
{
PROCESS-CRASH | tcheck2 | application crashed [@ mozalloc_abort(char const*)]
Crash dump filename: /tmp/tmpPtM48s/469f2341-9b77-5a0d-31e4eb41-270f459d.dmp
Operating system: Android
                  0.0.0 Linux 2.6.32.9-00002-gd8084dc-dirty #1 SMP PREEMPT Wed Feb 2 11:32:06 PST 2011 armv7l nvidia/harmony/harmony/harmony:2.2/FRF91/20110202.102810:eng/test-keys
CPU: arm
     2 CPUs

Crash reason:  SIGSEGV
Crash address: 0x0

Thread 0 (crashed)
 0  libmozalloc.so!mozalloc_abort(char const*) [mozalloc_abort.cpp:7e83ba2dfb51 : 30 + 0x4]
     r4 = 0x0000aa50    r5 = 0x00000000    r6 = 0x483f35b0    r7 = 0x430cbf30
     r8 = 0xbe9697b0    r9 = 0x430cbf28   r10 = 0x430cbf14    fp = 0x00000000
     sp = 0xbe969728    lr = 0x4498a0bb    pc = 0x4498a0be
    Found by: given as instruction pointer in context
 1  libxul.so!Java_org_mozilla_gecko_GeckoAppShell_reportJavaCrash [AndroidJNI.cpp:7e83ba2dfb51 : 102 + 0x3]
     r4 = 0x0000aa50    r5 = 0x00000000    r6 = 0x483f35b0    r7 = 0x430cbf30
     r8 = 0xbe9697b0    r9 = 0x430cbf28   r10 = 0x430cbf14    fp = 0x00000000
     sp = 0xbe969730    pc = 0x55700ef7
    Found by: call frame info
 2  libmozglue.so!Java_org_mozilla_gecko_GeckoAppShell_reportJavaCrash [jni-stubs.inc:7e83ba2dfb51 : 108 + 0x1]
     r4 = 0x0000aa50    r5 = 0x00000000    r6 = 0x483f35b0    r7 = 0x430cbf30
     r8 = 0xbe9697b0    r9 = 0x430cbf28   r10 = 0x430cbf14    fp = 0x00000000
     sp = 0xbe9697a8    pc = 0x80c1b4a3
    Found by: call frame info
}

However in the logcat, as expected:
{
08-21 05:25:10.594 E/GeckoAppShell( 2110): >>> REPORTING UNCAUGHT EXCEPTION FROM THREAD 1 ("main")
08-21 05:25:10.594 E/GeckoAppShell( 2110): java.lang.IllegalStateException: Fragment already added: PinBookmarkDialog{48425208 #3 pin_bookmark}
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at android.support.v4.app.FragmentManagerImpl.addFragment(FragmentManager.java:1159)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at android.support.v4.app.BackStackRecord.run(BackStackRecord.java:616)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at android.support.v4.app.FragmentManagerImpl.execPendingActions(FragmentManager.java:1444)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at android.support.v4.app.FragmentManagerImpl$1.run(FragmentManager.java:429)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at android.os.Handler.handleCallback(Handler.java:587)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at android.os.Handler.dispatchMessage(Handler.java:92)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at android.os.Looper.loop(Looper.java:123)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at android.app.ActivityThread.main(ActivityThread.java:4627)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at java.lang.reflect.Method.invokeNative(Native Method)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at java.lang.reflect.Method.invoke(Method.java:521)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626)
08-21 05:25:10.594 E/GeckoAppShell( 2110): 	at dalvik.system.NativeStart.main(Native Method)
}
Attached patch WIP (obsolete) — Splinter Review
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Comment on attachment 796782 [details] [diff] [review]
WIP

Joel, this is untested yet (because talos and so I can't face doing so today), but do you at least think this is the approach that you'd prefer?
Attachment #796782 - Flags: feedback?(jmaher)
Blocks: 907720
Don't we have this data in the .extra file next to the .dmp? Couldn't we handle this in mozcrash instead of duplicating this logic between automation.py and Talos? :-(
Comment on attachment 796782 [details] [diff] [review]
WIP

Review of attachment 796782 [details] [diff] [review]:
-----------------------------------------------------------------

this approach is fine.  It would be great if mozcrash had a mozcrash.checkForJavaCrash() method :)
ted, this data is found in logcat information, not in .dmp or .extra files.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #3)
> Don't we have this data in the .extra file next to the .dmp? Couldn't we
> handle this in mozcrash instead of duplicating this logic between
> automation.py and Talos? :-(

That's the approach I trialled in bug 809065, but the logcat method of bug  	823452 got implemented first, and at the time we had no reason to favour one method over the other (though now that seems less of the case).
(In reply to Joel Maher (:jmaher) from comment #5)
> ted, this data is found in logcat information, not in .dmp or .extra files.

We put the Java stack in the .extra file so Socorro can process it, so I assume it's available there in automation as well.
there are many cases where we have java only crashes which are at the system level and have nothing to do with our program.  If it is a java crash in our program we get .dmp/.extra files, if this is a libc crash or libdvm crash at the system level, we only see it in logcat.
Okay, I don't know about all of those. Ed's comment 0 clearly shows that we have a .dmp (since we got a stack from it), so we'd presumably have the Java stack in the matching .extra.

Maybe we could simply move this method into mozbase somewhere instead of duplicating it?
Attachment #796782 - Flags: feedback?(jmaher)
Blocks: 920490
Depends on: 923500
Let's just pop the common functionality in mozcrash and use that both from remoteautomation.py and talos.

Have broken out to bug 923500, since we'll need to do the "land in mozbase, deploy elsewhere" dance, before we can proceed here.
Comment on attachment 796782 [details] [diff] [review]
WIP

Have broken out the "clean up remoteautomation.py" part to bug 923513, due to this bug requiring talos push to prod etc.
Attachment #796782 - Attachment is obsolete: true
Use the new mozcrash.check_for_java_exception() added by bug 910320 (this will obviously have to land after the updated mozcrash is rolled out).
Attachment #813655 - Flags: review?(jmaher)
ttest.py:15: 'glob' imported but unused
ttest.py:19: 're' imported but unused
ttest.py:21: 'shutil' imported but unused
ttest.py:129: local variable 'cleanup_result' is assigned to but never used
ttest.py:199: local variable 'e' is assigned to but never used
Attachment #813656 - Flags: review?(jmaher)
Comment on attachment 813655 [details] [diff] [review]
Part 1: Use mozcrash.check_for_java_exception()

Review of attachment 813655 [details] [diff] [review]:
-----------------------------------------------------------------

close, but this should be rearranged a bit.

::: talos/ttest.py
@@ +136,5 @@
> +            with open('logcat.log') as f:
> +                logcat = f.read()
> +            found = mozcrash.check_for_java_exception(logcat)
> +
> +        if not found:

why not use an else here?

@@ +168,5 @@
>  
> +            found = mozcrash.check_for_crashes(minidumpdir,
> +                                               browser_config['symbols_path'],
> +                                               stackwalk_binary=stackwalkbin,
> +                                               test_name=test_name)

I would like to put all the remote checking in the same block.  So, check for remote first (logcat/java, then minidump), then windows, osx, linux.
Attachment #813655 - Flags: review?(jmaher) → review-
Comment on attachment 813656 [details] [diff] [review]
Part 2: Fix some existing Pyflakes warnings

Review of attachment 813656 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!
Attachment #813656 - Flags: review?(jmaher) → review+
(In reply to Joel Maher (:jmaher) from comment #14)
> Comment on attachment 813655 [details] [diff] [review]
> Part 1: Use mozcrash.check_for_java_exception()
> 
> ::: talos/ttest.py
> @@ +136,5 @@
> > +            with open('logcat.log') as f:
> > +                logcat = f.read()
> > +            found = mozcrash.check_for_java_exception(logcat)
> > +
> > +        if not found:
> 
> why not use an else here?

Because that breaks the "logcat exists but there wasn't a java exception" case. (ie: if the logcat exists, we'd give up after finding no java exception, rather than continuing to look for the minidump as usual).

> @@ +168,5 @@
> >  
> > +            found = mozcrash.check_for_crashes(minidumpdir,
> > +                                               browser_config['symbols_path'],
> > +                                               stackwalk_binary=stackwalkbin,
> > +                                               test_name=test_name)
> 
> I would like to put all the remote checking in the same block.  So, check
> for remote first (logcat/java, then minidump), then windows, osx, linux.

Agreed, have rearranged and will re-upload.
Attachment #814389 - Flags: review?(jmaher)
Attachment #813655 - Attachment is obsolete: true
Just noticed the logcat was being passed as a string, rather than the required list of lines. 
Added a .split('\r') to match:
https://github.com/mozilla/mozbase/blob/master/mozdevice/mozdevice/devicemanager.py#L137
...which is what is used by automation.py.in prior to calling self.checkForJavaException(logcat):
http://hg.mozilla.org/mozilla-central/file/51b36c5fd45f/build/mobile/remoteautomation.py#l146
Attachment #814392 - Flags: review?(jmaher)
Attachment #814389 - Attachment is obsolete: true
Attachment #814389 - Flags: review?(jmaher)
Comment on attachment 814392 [details] [diff] [review]
Part 1: Use mozcrash.check_for_java_exception() v3

Review of attachment 814392 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!
Attachment #814392 - Flags: review?(jmaher) → review+
+ bump the version of mozcrash required to the newly released 0.9, so check_for_java_exception() is available.
Attachment #814392 - Attachment is obsolete: true
Update create_talos_zip.py to use the new mozcrash version too.
Comment on attachment 817038 [details] [diff] [review]
Part 3: create_talos_zip.py followup

Review of attachment 817038 [details] [diff] [review]:
-----------------------------------------------------------------

this looks good
Panda talos runs will need bug 927055 too
Depends on: 927055
Depends on: 928370
will this be closed when talos is deployed?
(In reply to Joel Maher (:jmaher) from comment #26)
> will this be closed when talos is deployed?

Yup :-)
Depends on: 931032
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: