Closed Bug 701002 Opened 13 years ago Closed 12 years ago

Put Java stacks into a separate field (not AppNotes)

Categories

(Toolkit :: Crash Reporting, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla12
Tracking Status
firefox11 --- fixed
firefox-esr10 12+ fixed

People

(Reporter: jdm, Assigned: cpeterson)

References

Details

(Whiteboard: [qa-])

Attachments

(3 files)

Specifically, it looks like there are entries in the Fennec 10.0a1 topcrash list that are meant to be java stacks, but actually contain other AppNotes data instead:

samsung SAMSUNG-SGH-I777 samsung/SGH-I777/SGH-I777:2.3.4/GINGERBREAD/UCKH7:user/release-key
samsung GT-I9000 samsung/GT-I9000/GT-I9000:2.3.3/GINGERBREAD/XWJVI:user/release-key

and so on. The links lead to empty lists.
can you please post some specific crash links to aid the investigation?
It looks like we've go three issues here, well maybe two and a half...

1) The broken link is likely a dupe of Bug 701211 - a Socorro issue only coincidentally associated with this problem.

2) The Java traceback information is missing from the AppNotes field.  This is a Breakpad problem.

2.5) Socorro has no facility to detect if the info in the AppNotes field is a valid stack trace. 

I suggest that Breakpad should not just spew data into the catchall AppNotes field without identifying it.  I suggest that we either move AppNotes entries into their own dedicated fields, or we encode identity info in AppNotes with something like json:

{ 
    "JavaTraceback": "...",
    "HardwareInfo": "samsung SAMSUNG-SGH...."
}

With this sort of organization, I can easily make Socorro pull just the appropriate information rather than ending up making a signature from inappropriate data.
Component: Socorro → Breakpad Integration
Product: Webtools → Toolkit
QA Contact: socorro → breakpad.integration
(In reply to K Lars Lohn [:lars] [:klohn] from comment #3)
> 2) The Java traceback information is missing from the AppNotes field.  This
> is a Breakpad problem.

bug 692185 covers this, I think.

> 2.5) Socorro has no facility to detect if the info in the AppNotes field is
> a valid stack trace. 
> 
> I suggest that Breakpad should not just spew data into the catchall AppNotes
> field without identifying it.  I suggest that we either move AppNotes
> entries into their own dedicated fields, or we encode identity info in
> AppNotes with something like json:

I totally said that in bug 686973 comment 1. :-/

Let's morph this bug to cover that last point. It should be as simple as using AnnotateCrashReport("JavaTraceback", ...) instead of AppendAppNotesToCrashReport(...). Obviously we'll have to make a corresponding Socorro change as well, so we should probably open a bug on that as well.
Summary: Java stack detector causes invalid crash report links to appear → Put Java stacks into a separate field (not AppNotes)
what is the latest status on this?
Ted, who is or can be working on this? We don't get useful signatures for a lot of native UI issues without this and its dependency.
Getting a mobile developer to make this change would be a lot more productive. I'd guess nobody is currently working on it, since it's unassigned.
I am confused. Is the purpose of this bug merely to create a way to extract the Java stack as a blob, without any other app note data?

If so, we can just add an annotation:

CrashReporter::AnnotateCrashReport(NS_LITERAL_CSTRING("JavaStack"), stackAsString);

OR, do you want each part of the stack separately addressable?
right now, the ability to create a Java signature is very simple minded.  If a certain symbol appears in the C/C++ stack frames of the crashing thread, Socorro takes a quote from the AppNotes field up to the first '{'.  If there is no '{', Socorro gives up and says something like "EMPTY: unexpected format".  

If we want to create more useful Java signatures, we need to give Socorro data to work with and an algorithm to use.  We've agreed to stop putting the data in AppNotes in favor of a dedicated (and not yet named) field in the crash json.  

these questions need to be answered:

1) Is the aforementioned json file the metadata json?
To recap and get this initiative moving, here's what I want know and want to know:  

right now, the ability to create a Java signature is very simple minded.  If a certain symbol appears in the C/C++ stack frames of the crashing thread, Socorro blindly takes a quote from the AppNotes field up to the first '{' as the Java signature.  If there is no '{', Socorro gives up and says something like "EMPTY: unexpected format".  

If we want to create more useful Java signatures, we need to give Socorro data to work with and an algorithm to use.  We've agreed to stop putting the data in AppNotes in favor of a dedicated (and not yet named) field in the crash json.  

these questions need to be answered:

1) Is the aforementioned json file the metadata json?  Or is it the json from the theoretical future json spewing minidump-stackwalk?

2) What information should be put in the json field and what is its source?

3) What should a Java signature look like?  Is it an aggregation of function signatures from the traceback of a thread?  Which thread?  Or is it something entirely different than how C/C++ signatures are generated?

4) Will the algorithm for creating a Java signature need filtering and aggregation facilities (skip lists) similar to those used C/C++ signature generation?
Assignee: nobody → cpeterson
> 3) What should a Java signature look like?  Is it an aggregation of function signatures 
> from the traceback of a thread?  Which thread?  Or is it something entirely different 
> than how C/C++ signatures are generated?

Here is an example of a typical Java stack trace (as reported by Java's exception APIs). If I move the Java stack traces from the free-form AppNotes to a new field (e.g. "JavaStackTrace"), do you need me to convert this stack data to some structured format? Or is a plain string (in the above format and in its own field) adequate?


java.lang.RuntimeException: THIS IS A USER-FRIENDLY DESCRIPTION OF THIS EXCEPTION
     at org.mozilla.gecko.GeckoAppShell.geckoLoaded(GeckoAppShell.java:498)
     at org.mozilla.gecko.GeckoAppShell.access$100(GeckoAppShell.java:85)
     at org.mozilla.gecko.GeckoAppShell$1.run(GeckoAppShell.java:476)
     at android.os.Handler.handleCallback(Handler.java:605)
     at android.os.Handler.dispatchMessage(Handler.java:92)
     at android.os.Looper.loop(Looper.java:137)
     at org.mozilla.gecko.GeckoApp$30.run(GeckoApp.java:1531)
     at android.os.Handler.handleCallback(Handler.java:605)
     at android.os.Handler.dispatchMessage(Handler.java:92)
     at android.os.Looper.loop(Looper.java:137)
     at android.app.ActivityThread.main(ActivityThread.java:4340)
     at java.lang.reflect.Method.invokeNative(Native Method)
     at java.lang.reflect.Method.invoke(Method.java:511)
     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:784)
     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:551)
     at dalvik.system.NativeStart.main(Native Method)


For now, we will only report a stack trace for one Java thread. The crash report WILL include a C++ stack trace, BUT if we have a "JavaStackTrace" field, then the C++ stack trace is not useful. It would likely be a C++ stack trace pointing to our crash reporter functions, such as "mozalloc_abort | __swrite | dexDataMapAlloc".
cpeterson, which part of this stack trace is the part we should pick up as the "signature" of the crash there (in C/C++ we are using the name of the function in which the offending call that caused the crash happened, I guess we want the same here) - do we have a good algorithm to assemble that from any "JavaStackTrace" field (lars will need that in bug 701390)?
> which part of this stack trace is the part we should pick up as the "signature" of the crash?

kairo,

1. I think the most useful "signature" of these Java stack traces would be to collect the first TWO lines from the stack trace fields. This would capture the Java exception's name+description and the Java function that threw the exception.

Here is a good example from a real bug:

    java.lang.IllegalArgumentException: View not attached to window manager
        at android.view.WindowManagerImpl.findViewLocked(WindowManagerImpl.java:355)

2. A potential problem with this signature scheme is that the exception's description will include numbers that may vary on different devices or content. For example:

    java.lang.IndexOutOfBoundsException: getChars (0 ... 17594) ends beyond length 7035
        at android.text.SpannableStringBuilder.checkRange(SpannableStringBuilder.java:967)

Fortunately, I do not think this will be a big problem. Most exception descriptions do not include numbers.

If we wanted to get clever, we could ignore numbers in the exception description string (but NOT the line numbers in the function names) when COMPARING signatures. But developers would still want to see the original exception descriptions with actual numbers when reviewing crash report details.

3. Collecting accurate Java crash reports is a high priority. If Lars is busy dealing with fallout from Oregon's storms, I can write some example signature code that a Socorro engineer could use plug into Socorro. When programming language(s) does the Socorro backend use for collecting and comparing crash signatures?
(In reply to Chris Peterson (:cpeterson) from comment #13)
> If we wanted to get clever, we could ignore numbers in the exception
> description string (but NOT the line numbers in the function names) when
> COMPARING signatures.

Actually, I'm also concerned about the line numbers. We intentionally don't have line numbers in the *signature* for C++ as well because we want the same crash on different releases to show up under one signature, and line numbers tend to change with all kinds of unrelated changes in the same file.

> But developers would still want to see the original
> exception descriptions with actual numbers when reviewing crash report
> details.

Of course, we want the full stack to be visible in the crash report details.

> 3. Collecting accurate Java crash reports is a high priority. If Lars is
> busy dealing with fallout from Oregon's storms, I can write some example
> signature code that a Socorro engineer could use plug into Socorro. When
> programming language(s) does the Socorro backend use for collecting and
> comparing crash signatures?

The first step we need is a patch to send this as a separate field from the client side. Lars is dealing with the flooding there but still online somehow. Once we know what field name this comes in from the client, I think the Socorro team can work on getting everything up (not sure about the time scale on that, I guess coordination with the #breakpad channel would be good).
Would be good to have some example report getting to their stage server once they have something, but I think first they need what exact field name this comes in and what the algorithm should be to generate a signature. (Also, I still fear that signatures will get long, we currently have a hard boundary of 255 characters max and shorter signatures are always better for referring to them in talking about issues as well - as long as one signature translates to one issue in most cases)
I can take the implementation.  It's just a waiting game here now and I could use the distraction. 

signatures have a maximum limit of 255 characters.  Your examples above are around 150.  What character(s) would you like me to replace the CR/LF with?

What's the name of the field in the json file?  

the java signature should supersede the regular signature whenever the field in the json file is non-empty?

what happens with the old java signature sentinel rule?  Do we drop it?
> What character(s) would you like me to replace the CR/LF with?

I think we are substituting \t with space elsewhere in the .csv in places like the comment field.  lets do the same thing.
> Actually, I'm also concerned about the line numbers. We intentionally don't have line numbers in the *signature* for C++ as well because we want the same crash on different releases to show up under one signature, and line numbers tend to change with all kinds of unrelated changes in the same file.

Sounds good. We can ignore the Java line numbers.


> signatures have a maximum limit of 255 characters.  Your examples above are around 150. 

A signature (composed of the first two lines of the Java stack trace string) might exceed 255 characters if it had an excessively long exception description string. (For example, the description might include a URL.) In those cases, you could just ignore line #1 (exception description) and capture just line #2 (function name).


>> What character(s) would you like me to replace the CR/LF with?
> I think we are substituting \t with space elsewhere in the .csv in places like the 
> comment field.  lets do the same thing.

Sounds good.


> What's the name of the field in the json file?  

I will call the field "JavaStackTrace".


> what happens with the old java signature sentinel rule?  Do we drop it?

What is the old java signature sentinel rule? Is that the code that scans the App Notes field for text that resembles a Java stack trace? We would no longer need that.
the old rule is: a Java signature is produced if and only if 'Java_org_mozilla_gecko_GeckoAppShell_reportJavaCrash' appears somewhere in the crashing thread stack frames from the minidump_stackwalk output.
I don't think we will need the 'Java_org_mozilla_gecko_GeckoAppShell_reportJavaCrash' sentinel check. The existence of a "JavaStackTrace" field should be adequate.
I assume there is a GUI component to this initiative.  While the signature is a distillation of the JavaStackTrace, you're likely going to want to be able to see the entire JavaStackTrace in the reports for individual bugs, ja?
> I assume there is a GUI component to this initiative.  While the signature is a 
> distillation of the JavaStackTrace, you're likely going to want to be able to see the 
> entire JavaStackTrace in the reports for individual bugs, ja?

Yes, good point!


> > signatures have a maximum limit of 255 characters.  Your examples above are around 150. 
>
> A signature (composed of the first two lines of the Java stack trace string) might exceed 
> 255 characters if it had an excessively long exception description string. (For example, 
> the description might include a URL.) In those cases, you could just ignore line #1 
> (exception description) and capture just line #2 (function name).

btw, after more thought, I think a better way to trim signatures that exceed 255 characters would be to omit the exception's *description* string (after the colon) but retaining the exception's *class name*. Dropping all of line #1 and keeping just line #2 would not be very helpful (since we are ignoring line numbers).

For example, this long signature:

    java.lang.RuntimeException: PRETEND THIS EXCEPTION DESCRIPTION STRING IS WAY TOO LOOOOOOONG
        at org.mozilla.gecko.GeckoAppShell.geckoLoaded(GeckoAppShell.java:498)

Could be trimmed to:

    java.lang.RuntimeException: at org.mozilla.gecko.GeckoAppShell.geckoLoaded(GeckoAppShell.java)
Status: NEW → ASSIGNED
How soon do crash reports appear in Socorro? To test my new "JavaStackTrace" field, I injected some Java exceptions into a test build. I sent about 4-5 crash reports around 2012-01-22 20:05:00 PST.

These test reports should have the "JavaStackTrace" field that should look something like this:

java.lang.RuntimeException: CPETERSON BUG701002 TEST
	at org.mozilla.gecko.GeckoApp.doReload(GeckoApp.java:2152)
	at org.mozilla.gecko.GeckoApp.onOptionsItemSelected(GeckoApp.java:503)
	at android.app.Activity.onMenuItemSelected(Activity.java:2502)
	at com.android.internal.policy.impl.PhoneWindow.onMenuItemSelected(PhoneWindow.java:950)
	at com.android.internal.view.menu.MenuBuilder.dispatchMenuItemSelected(MenuBuilder.java:735)
	at com.android.internal.view.menu.MenuItemImpl.invoke(MenuItemImpl.java:149)
	at com.android.internal.view.menu.MenuBuilder.performItemAction(MenuBuilder.java:874)
	at com.android.internal.view.menu.MenuPopupHelper.onItemClick(MenuPopupHelper.java:156)
	at android.widget.AdapterView.performItemClick(AdapterView.java:292)
	at android.widget.AbsListView.performItemClick(AbsListView.java:1058)
	at android.widget.AbsListView$PerformClick.run(AbsListView.java:2514)
	at android.widget.AbsListView$1.run(AbsListView.java:3168)
	at android.os.Handler.handleCallback(Handler.java:605)
	at android.os.Handler.dispatchMessage(Handler.java:92)
	at android.os.Looper.loop(Looper.java:137)
	at org.mozilla.gecko.GeckoApp$30.run(GeckoApp.java:1544)
	at android.os.Handler.handleCallback(Handler.java:605)
	at android.os.Handler.dispatchMessage(Handler.java:92)
	at android.os.Looper.loop(Looper.java:137)
	at android.app.ActivityThread.main(ActivityThread.java:4340)
	at java.lang.reflect.Method.invokeNative(Native Method)
	at java.lang.reflect.Method.invoke(Method.java:511)
	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:784)
	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:551)
	at dalvik.system.NativeStart.main(Native Method)
Go to about:crashes and see if the crash reports exist. Get a link to the crash report on crash-stats and post the link here.
once you've found your crashes in about:crashes, please post some of the crash uuid numbers so that we can grab them out of production for use as sample crashes in testing the Socorro side of this effort.
Blocks: 719943
consolidate reportJavaCrash() stack logging.
Attachment #590818 - Flags: review?(doug.turner)
send "JavaStackTrace" field to Socorro.
Attachment #590819 - Flags: review?(doug.turner)
remove unused showErrorDialog() method.
Attachment #590820 - Flags: review?(doug.turner)
Hooray! My confirmed that the "JavaStackTrace" json field was received by the Socorro server. Here is a copy of the json rawdump for for crash id aff55cea-9bf6-47c6-8d8e-3c4592120123:

https://crash-stats.mozilla.com/rawdumps/aff55cea-9bf6-47c6-8d8e-3c4592120123.json

{"JavaStackTrace": "java.lang.RuntimeException: CPETERSON BUG701002 TEST\n\tat org.mozilla.gecko.GeckoApp.doReload(GeckoApp.java:2152)\n\tat org.mozilla.gecko.GeckoApp.onOptionsItemSelected(GeckoApp.java:503)\n\tat android.app.Activity.onMenuItemSelected(Activity.java:2502)\n\tat com.android.internal.policy.impl.PhoneWindow.onMenuItemSelected(PhoneWindow.java:950)\n\tat com.android.internal.view.menu.MenuBuilder.dispatchMenuItemSelected(MenuBuilder.java:735)\n\tat com.android.internal.view.menu.MenuItemImpl.invoke(MenuItemImpl.java:149)\n\tat com.android.internal.view.menu.MenuBuilder.performItemAction(MenuBuilder.java:874)\n\tat com.android.internal.view.menu.MenuPopupHelper.onItemClick(MenuPopupHelper.java:156)\n\tat android.widget.AdapterView.performItemClick(AdapterView.java:292)\n\tat android.widget.AbsListView.performItemClick(AbsListView.java:1058)\n\tat android.widget.AbsListView$PerformClick.run(AbsListView.java:2514)\n\tat android.widget.AbsListView$1.run(AbsListView.java:3168)\n\tat android.os.Handler.handleCallback(Handler.java:605)\n\tat android.os.Handler.dispatchMessage(Handler.java:92)\n\tat android.os.Looper.loop(Looper.java:137)\n\tat org.mozilla.gecko.GeckoApp$30.run(GeckoApp.java:1544)\n\tat android.os.Handler.handleCallback(Handler.java:605)\n\tat android.os.Handler.dispatchMessage(Handler.java:92)\n\tat android.os.Looper.loop(Looper.java:137)\n\tat android.app.ActivityThread.main(ActivityThread.java:4340)\n\tat java.lang.reflect.Method.invokeNative(Native Method)\n\tat java.lang.reflect.Method.invoke(Method.java:511)\n\tat com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:784)\n\tat com.android.internal.os.ZygoteInit.main(ZygoteInit.java:551)\n\tat dalvik.system.NativeStart.main(Native Method)\n", "Android_Fingerprint": "google/yakju/maguro:4.0.2/ICL53F/235179:user/release-keys", "FramePoisonSize": "4096", "Theme": "classic/1.0", "Version": "12.0a1", "id": "{aa3c5121-dab2-40e2-81ca-7ea25febc110}", "InstallTime": "1327342341", "Android_Hardware": "tuna", "Vendor": "Mozilla", "EMCheckCompatibility": "true", "buildid": "20120122182012", "version": "12.0a1", "AdapterDeviceID": "Galaxy Nexus", "Android_Device": "maguro", "Android_CPU_ABI2": "armeabi", "ReleaseChannel": "default", "submitted_timestamp": "2012-01-23T18:12:59.756426+00:00", "Android_CPU_ABI": "armeabi-v7a", "URL": "about:home", "timestamp": 1327342379.7565839, "Notes": "EGL? EGL+\nAdapterVendorID: tuna, AdapterDeviceID: Galaxy Nexus.\nAdapterDescription: 'Android, Model: 'Galaxy Nexus', Product: 'yakju', Manufacturer: 'samsung', Hardware: 'tuna''.\n\nsamsung Galaxy Nexus\ngoogle/yakju/maguro:4.0.2/ICL53F/235179:user/release-keys", "CrashTime": "1327342372", "Android_Display": "ICL53F", "Android_Manufacturer": "samsung", "StartupTime": "1327342341", "AdapterVendorID": "tuna", "Android_Brand": "google", "FramePoisonBase": "00000000f0dea000", "Min_ARM_Version": "7", "Add-ons": "", "BuildID": "20120122182012", "Android_Model": "Galaxy Nexus", "ProductName": "Fennec", "legacy_processing": 0, "Android_Version": "14 (REL)", "Android_Board": "tuna", "ProductID": "{aa3c5121-dab2-40e2-81ca-7ea25febc110}"}
Comment on attachment 590820 [details] [diff] [review]
bug-701002-part-3-remove-unused-showErrorDialog.patch

Review of attachment 590820 [details] [diff] [review]:
-----------------------------------------------------------------

stuff like this should have been factored into its own bug and anyone could have rubber stamped it.
Attachment #590820 - Flags: review?(doug.turner) → review+
Attachment #590819 - Flags: review?(doug.turner) → review+
Attachment #590818 - Flags: review?(doug.turner) → review+
Keywords: checkin-needed
nice work cpeterson.
I hope we can uplift this at least to 11 (the train on 10 is gone, I guess).
Comment on attachment 590818 [details] [diff] [review]
bug-701002-part-1-consolidate-reportJavaCrash-logging.patch

[Approval Request Comment]
Regression caused by (bug #): N/A
User impact if declined: Socorro crash reports for Fennec will continue being misreported.
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): Fennec-only change. Only changes crash-reporting code path (so Fennec is already dying at that point).
Attachment #590818 - Flags: approval-mozilla-aurora?
Comment on attachment 590819 [details] [diff] [review]
bug-701002-part-2-send-JavaStackTrace-field.patch

[Approval Request Comment]
Regression caused by (bug #): N/A
User impact if declined: Socorro crash reports for Fennec will continue being misreported.
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): Fennec-only change. Only changes crash-reporting code path (so Fennec is already dying at that point).
Attachment #590819 - Flags: approval-mozilla-aurora?
Comment on attachment 590818 [details] [diff] [review]
bug-701002-part-1-consolidate-reportJavaCrash-logging.patch

[Triage Comment]
Very important for our Fennec Native effort. Approved for Aurora.
Attachment #590818 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #590819 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 723499
No longer depends on: 723499
[Triage Comment]
Naoki - is there a reason you're nominating this for esr10 landing?  It's not a security fix and it seems to be a Fennec Native issue, so unless you have a specific use case to plead here I'll be denying these requests.
lsblakk yes, there is a specific reason why I am noming this for esr 10.

The top crash for XUL is not actionable for devs without this separation :
https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2012-03-26&signature=Java_org_mozilla_gecko_GeckoAppShell_reportJavaCrash&version=Fennec%3A10.0.3esr
(In reply to Naoki Hirata :nhirata from comment #40)
> The top crash for XUL is not actionable for devs without this separation :
About 20% of all crashes in XUL Fennec (potentially Native Fennec crashes affecting an audience larger than 800 ADU in Aurora and 500 ADU in Beta) are indeed unknown.
Comment on attachment 590818 [details] [diff] [review]
bug-701002-part-1-consolidate-reportJavaCrash-logging.patch

[Triage Comment]
Thanks for the clarification, Naoki. Please go ahead and land these to ESR as per https://wiki.mozilla.org/Release_Management/ESR_Landing_Process
Attachment #590818 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
Attachment #590819 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
Attachment #590820 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
If we are going to land this patch to improve Java crash reporting on ESR10, then I recommend also landing bug 739418 to make Java crash reports easier to collate in Socorro.
Blocks: 739418
I only needed to land bug-701002-part-2-send-JavaStackTrace-field.patch:

https://hg.mozilla.org/releases/mozilla-esr10/rev/d9b4240fa887
To land bug 739418, I had to land (approval-mozilla-esr10+) prerequisite patch bug-701002-part-1-consolidate-reportJavaCrash-logging.patch:

https://hg.mozilla.org/releases/mozilla-esr10/rev/be2d8a309cc4
Does this have or need a test case in-testsuite?
Flags: in-testsuite?
Whiteboard: [qa-]
tracking-fennec: ? → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: