Closed Bug 726385 Opened 8 years ago Closed 7 years ago

replace 8 digit hex addresses with a placeholder within Java signatures

Categories

(Socorro :: Backend, task)

ARM
Android
task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nhirata, Assigned: lars)

References

Details

(Whiteboard: [qa+])

One of the things I am seeing is that there might be different address location for the same crash.  Example Bug 725167.  It seems like we need a skiplist implementation for the Java field as well.

[@ java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@4054b560 at android.view.ViewGroup.updateViewLayout(ViewGroup.java)]
[@ java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@40542c60 at android.view.ViewGroup.updateViewLayout(ViewGroup.java)]
[@ java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@4053d980 at android.view.ViewGroup.updateViewLayout(ViewGroup.java)]
[@ java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@4055db60 at android.view.ViewGroup.updateViewLayout(ViewGroup.java) ]
Blocks: 725167
Given that we are creating the Java signatures very differently from C/C++ signatures, I'm very unsure if a skiplist approach would be helpful at all. We surely need a Java specialist to propose how we could improve those signatures.
I also think that what you request here isn't even anything that our current skiplists would even be able to capture.
I understand that the current implementation for the skiplist does not capture the information.  That's why I created this bug.

I believe that we need a mechanism to combine certain Java crash signatures that are the same and also differentiate ones that have different crash stacks.  I already ran across these situations while looking through the current Java crash stacks.  That's what the skiplist has done.  If you have a better alternative proposal, I would like to hear it.  It doesn't have to be exactly like the skiplist; it just needs to be able to separate different java stacks and combine similar ones.
I believe we'd only be able to find a mechanism that would be quite complicate to implement (skiplists are pretty simple) and take away a lot of time from development of other needed Socorro features. Unfortunately those Java stacks are far more complicated than the C/C++ ones we're used to.
I am not sure if I understand your concern.  If a regex skiplist is simple to implement, why couldn't it be applied here?  Considering that the Java crashes being as fixed as much as possible is one of the newer exit criterias for the Mobile team, I would say that this is a high priority.
The problem is that a "Java stack trace" is actually an error identifier, an error message, and only then the actual stack trace. We create the "Java signature" of all of those, with special code to not grow over 255 characters, as signatures can't be longer than that in Socorro at this time.
The simple approach of the C/C++ skiplists doesn't work here, because for one thing, we probably run into the 255 character limit, and for the other, the default signature isn't just a stack frame but way more (error message, etc.). What we may need is some filtering of error messages and making them appear more generic in the signature, which is way more complex.
I'd like Lars to weigh in on how complicated this might be - I don't see that it would be that much harder than for C++, but perhaps he will differ.
Assignee: nobody → lars
Thanks, Laura.  

I think a substitution of various errors would end up reducing the cost of the characters for the title, as long as we can come up with some sort of good mapping that dev can agree on.  

It's error code + stack.  It's a bit lengthy because the java oop naming scheme.  Maybe there is a better way to handle it that Lars might be able to come up with.
C signatures are nothing but normalized C frame signatures concatenated together with a delimiter.  The regular expression rules collectively called "skip lists" control what C frame signatures are eligible for inclusion in a C signature.

The Java signatures consist of up to three things: 
  1) an Exception 
  2) an exception description (sometimes suppressed due to line length)
  3) a function call signature with line number suppressed

What would a Java version of the skip lists do?  My guess is that Java skip lists would affect only section 3.  Would it be a list of concatenated and delimited function call signatures?  Or would it just be a single function signature with the preceding ones having been suppressed?

In order for me to determine the complexity of the task, I need at least examples of what the desired output would be from a given input.  With examples, I can propose an algorithm and give an estimate as to the feasibility and timing of the enhancement.
Fair point.  cc'ing cpeterson in on the discussion.

most of what I see is that there are some addresses : 40542c60 (such as in the comment 0) that need to be combined as they are being separated out.

I think it might be easier to see the examples better when the java field is visible to everyone.
> most of what I see is that there are some addresses : 40542c60 (such as in the comment 0) 
> that need to be combined as they are being separated out.

I recommend that any numbers in the Java exception's description string should be ignored when comparing Java signatures.

These examples:

    java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@40567478

    java.lang.SecurityException: caller uid 10372 lacks any of android.permission.GET_ACCOUNTS

Could be treated as:

    java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@

    java.lang.SecurityException: caller uid  lacks any of android.permission.GET_ACCOUNTS
Blocks: 726685
Blocks: 727512
Signatures containing org.mozilla.fennec_aurora and org.mozilla.fennec are the same.
bug 725167 shows as an example of why we need a mechanism to combine the addresses.
OS: All → Android
Hardware: All → ARM
nhirata, do we really need "skip list" rules to skip functions in the Java stack traces? I think my suggestion in comment 10 (ignore number in Java exceptions' description strings) would solve most of our problems.

If Java signatures removed numbers, your examples in comment 1 would produce the identical Java signatures:

[@ java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@ at android.view.ViewGroup.updateViewLayout(ViewGroup.java)]
[@ java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@ at android.view.ViewGroup.updateViewLayout(ViewGroup.java)]
[@ java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@ at android.view.ViewGroup.updateViewLayout(ViewGroup.java)]
[@ java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@ at android.view.ViewGroup.updateViewLayout(ViewGroup.java) ]
cpeterson, you make a valid point.  lars, laura, kairo, what do you think?
I'm with cpeterson on this and like his suggestion in comment #10 - though I'd do some bikeshedding and think we should not cut out the addresses completely but replace with with something that tells that there was a number, e.g. <addr> or so.
The change suggested in comment #13 would not be difficult.  If it makes sense to you to omit the numbers to consolidate signatures, then we should implement it.  However, only you can be the judge of what makes sense.

please give me precise instructions on whether you want just omission or replacement with a placeholder...
lars, I think we should add number placeholders, but ONLY within Java exceptions' description strings (but not within Java or C stack traces). 

For the number placeholder, I don't think we should use "<addr>" because some of the numbers are not addresses. I like "<#>" with one '#' character for replaced digit, but I don't have strong feelings about this. So these Java exception description strings:

  java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@40567478
  java.lang.SecurityException: caller uid 10372 lacks any of android.permission.GET_ACCOUNTS

Would become:

  java.lang.IllegalArgumentException: Given view not a child of android.widget.AbsoluteLayout@<########>
  java.lang.SecurityException: caller uid <#####> lacks any of android.permission.GET_ACCOUNTS

But we should NOT alter crash addresses in native code like these:

  libdvm.so@0x50a4e
  libdvm.so@0x50806

  libflashplayer.so@0x54d00c
  libflashplayer.so@0x53ad8e
With the digit stuff, you will have one signature for bp-747742a0-27cb-449f-8884-fd1142120507 and another one for bp-6762c672-7559-47ed-a698-addab2120505 although it's the same crash.
(In reply to Chris Peterson (:cpeterson) from comment #17)
> lars, I think we should add number placeholders, but ONLY within Java
> exceptions' description strings (but not within Java or C stack traces). 

For one thing, it's clear anyhow that this only affects signatures (not stacks themselves) and only ones derived from JavaStackTrace, not ones derived from C stack traces.

For the other, I don't buy into "all numbers" right now, mostly because stuff after the @ like in comment #0  (or in e.g. "android.view.WindowManager$BadTokenException: Unable to add window -- token android.os.BinderProxy@4051cea8 is not valid; is your activity running? at android.view.ViewRoot.setView(ViewRoot.java)") is actually a hex code, not a decimal-only number, and also because I can easily imagine Java exception strings containing numbers we want to replace, e.g. "android.database.sqlite.SQLiteDatabaseLockedException: error code 5: database is locked: at android.database.sqlite.SQLiteStatement.native_1x1_string(Native Method)" where I think both the 5 in the error code and the 1 in native_1x1_string are something we don't want replaced.

Having stuff like those mentioned in comment #18 complicates things but I'm not sure if we can/should handle those right now.

> For the number placeholder, I don't think we should use "<addr>" because
> some of the numbers are not addresses. I like "<#>" with one '#' character
> for replaced digit, but I don't have strong feelings about this.

The other problem with that is that this would make the signature strings longer and that might push us over the 255 char limit we have for signatures right now. I prefer a solution that doesn't make signatures longer in this replacement process.


Right now, I'd still prefer us to implement a s/@[0-9a-f]{8}/@<addr>/ rule for JavaStrackTrace-derived signatures and only then go the next step of thinking about how we could improve on other things like "Invalid index 2, size is 2" etc.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #19)
> Right now, I'd still prefer us to implement a s/@[0-9a-f]{8}/@<addr>/ rule
> for JavaStrackTrace-derived signatures and only then go the next step of
> thinking about how we could improve on other things like "Invalid index 2,
> size is 2" etc.

kairo, you bring up a great point that the hex addresses and other numbers. I think we should use your s/@[0-9a-f]{8}/@<addr>/ rule for the Java exception signatures.
this is what I've done:

After the creation of a Java signature, I locate and replace any 8 digit hex address that begins with '@' with a placeholder '@<addr>'.  Please acknowledge that this is the correct interpretation of your needs.

If you'd care to review the actual change, see the github pull request:  https://github.com/mozilla/socorro/pull/585
Assignee: lars → nobody
Component: Webapp → Backend
QA Contact: webapp → backend
Summary: Please create skiplist implementation for the Java field → replace 8 digit hex addresses with a placeholder within Java signatures
Target Milestone: --- → 10
Assignee: nobody → lars
lars, looks good, but I agree with peterbe's github comment about tightening up the regular expression to something like r'@[0-9a-f]{8}\s'.
(In reply to Chris Peterson (:cpeterson) from comment #22)
> lars, looks good, but I agree with peterbe's github comment about tightening
> up the regular expression to something like r'@[0-9a-f]{8}\s'.

In that case it needs to be something like (\s|:) at the end of it, as we frequently have a colon after the address.
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/bc6328369ec4eca24869d527a8a40805fd54e544
fixes bug 726385 - hex addresses to placeholder

https://github.com/mozilla/socorro/commit/b29773eee79f8b89ed9c39b5597d3856356e983c
Merge pull request #585 from twobraids/bug726385-java-addr

fixes bug 726385 - hex addresses to placeholder
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Blocks: 744850, 729251
No longer blocks: 727512
Blocks: 756737
No longer blocks: 744850
Depends on: 744850
Blocks: 744850
No longer depends on: 744850
Whiteboard: [qa-]
Marking [qa-] - the unittests provide great coverage.
Flags: in-testsuite-
Flags: in-moztrap-
Looking at the list, I created bug 758038.
It seems like there's only one signature listed that I spotted that is suppose to be combined but is not?
Whiteboard: [qa-] → [qa?]
The change from this bug is only visible on newly processed bugs, i.e. ones that get submitted from now on that this has landed. If you want it to apply on older ones, you need to submit a request for reprocessing.
There's a colon after @<addr> while it shouldn't have one.
(In reply to Scoobidiver from comment #30)
> There's a colon after @<addr> while it shouldn't have one.

As Lars explained above, it now should always have one, it was a bug before that it didn't always have one.
Whiteboard: [qa?] → [qa+]
Depends on: 865142
You need to log in before you can comment on or make changes to this bug.