Closed Bug 832091 Opened 11 years ago Closed 11 years ago

crash in xpc::XPCWrappedNativeXrayTraits::resolveDOMCollectionProperty mainly with Java

Categories

(Core :: XPConnect, defect)

18 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla23
Tracking Status
firefox18 --- wontfix
firefox19 + wontfix
firefox20 + wontfix
firefox21 + fixed
firefox22 --- fixed
firefox23 --- verified

People

(Reporter: mccr8, Assigned: gkrizsanits)

Details

(4 keywords)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-eafb6eb3-1b98-4a5d-a660-bfb272130112 .
============================================================= 

I noticed this when looking at crash stats.  It is around #40 in 19.  They are all null derefs.  There are a couple of comments that talk about Pogo and Java. It looks like the wrapper is null. I don't know if this is easy to fix or not.
It's #28 top browser crasher in 18.0, #43 in 19.0b1, #53 in 20.0a2, and #91 in 21.0a1.

More reports at:
https://crash-stats.mozilla.com/report/list?signature=xpc%3A%3AXPCWrappedNativeXrayTraits%3A%3AresolveDOMCollectionProperty%28JSContext*%2C+JSObject*%2C+JSObject*%2C+int%2C+bool%2C+JSPropertyDescriptor*%29
https://crash-stats.mozilla.com/report/list?signature=xpc%3A%3AXPCWrappedNativeXrayTraits%3A%3AresolveDOMCollectionProperty%28JSContext*%2C+JSObject*%2C+JSObject*%2C+int%2C+JSPropertyDescriptor*%2C+unsigned+int%29
Crash Signature: [@ xpc::XPCWrappedNativeXrayTraits::resolveDOMCollectionProperty(JSContext*, JSObject*, JSObject*, int, bool, JSPropertyDescriptor*)] → [@ xpc::XPCWrappedNativeXrayTraits::resolveDOMCollectionProperty(JSContext*, JSObject*, JSObject*, int, bool, JSPropertyDescriptor*)] [@ xpc::XPCWrappedNativeXrayTraits::resolveDOMCollectionProperty(JSContext*, JSObject*, JSObject*, int JSPropertyDescrip…
Keywords: regression
OS: Windows NT → Windows 7
Version: unspecified → 18 Branch
It's now #29 in 18.0.1, #16 in 19.0b4, and #48 in 20.0a2 so that qualifies it for the topcrash keyword according to https://wiki.mozilla.org/CrashKill/Topcrash

It's correlated to Java:
  xpc::XPCWrappedNativeXrayTraits::resolveDOMCollectionProperty(JSContext*, JSObject*, JSObject*, int, bool, JSPropertyDescriptor*)|EXCEPTION_ACCESS_VIOLATION_READ (273 crashes)
     69% (188/273) vs.   1% (1092/171370) java.dll
          1% (4/273) vs.   0% (25/171370) 6.0.310.5
          0% (1/273) vs.   0% (11/171370) 6.0.350.10
          8% (21/273) vs.   0% (89/171370) 6.0.370.6
          6% (16/273) vs.   0% (29/171370) 6.0.380.5
          0% (1/273) vs.   0% (5/171370) 7.0.40.22
          1% (2/273) vs.   0% (10/171370) 7.0.50.6
          8% (21/273) vs.   0% (162/171370) 7.0.90.5
          2% (6/273) vs.   0% (57/171370) 7.0.100.18
         38% (105/273) vs.   0% (513/171370) 7.0.110.21
          4% (11/273) vs.   0% (43/171370) 7.0.110.22

The first frames of the stack trace are:
Frame 	Module 	Signature 	Source
0 	xul.dll 	xpc::XPCWrappedNativeXrayTraits::resolveDOMCollectionProperty 	js/xpconnect/wrappers/XrayWrapper.cpp:631
1 	xul.dll 	xpc::XPCWrappedNativeXrayTraits::resolveNativeProperty 	js/xpconnect/wrappers/XrayWrapper.cpp:774
2 	xul.dll 	xpc::XrayWrapper<js::CrossCompartmentWrapper,xpc::XPCWrappedNativeXrayTraits>::g 	js/xpconnect/wrappers/XrayWrapper.cpp:1432
3 	mozjs.dll 	js::BaseProxyHandler::get 	js/src/jsproxy.cpp:93
4 	xul.dll 	xpc::XrayWrapper<js::CrossCompartmentWrapper,xpc::XPCWrappedNativeXrayTraits>::g 	js/xpconnect/wrappers/XrayWrapper.cpp:1653
5 	mozjs.dll 	proxy_GetGeneric 	js/src/jsproxy.cpp:2634
6 	mozjs.dll 	JSObject::getGeneric 	js/src/jsobjinlines.h:171
7 	mozjs.dll 	js::GetPropertyOperation 	js/src/jsinterpinlines.h:307
8 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:2228
Keywords: topcrash
Summary: crash in xpc::XPCWrappedNativeXrayTraits::resolveDOMCollectionProperty → crash in xpc::XPCWrappedNativeXrayTraits::resolveDOMCollectionProperty mainly with Java
I could just guard against the case that getWN(wrapper) returns null, but since that is an assumption it is used all over the place that it does not, I should do it in the whole file. Because I'm afraid even without my patch that introduced the line that is now the point where it crash, it would crash the same way a bit later...

Bobby, should we just guard against this case, or is this something that should never happen and should be tracked what is going on here?

Andrew: do you know any way to reproduce this? Since my other concern is how can we write a test for it...
Sorry, I just saw this on crash stats.
Keywords: needURLs
Yeah, we should really have a WN here. If it's gone that could mean that the WN was transplanted to another compartment, but then we shouldn't be accessing it over Xray. Please dig in if you have time.
It's #57 top browser crasher in 18.0.2, #55 in 19.0b5, #114 in 20.0a2, and #90 in 21.0a1.
Keywords: topcrash
Top URLs:

26 	about:blank
25 	https://verkkopankki.danskebank.fi/html/index.html?site=SBNBFI&secsystem=E2
16 	https://www.facebook.com/
12 	about:newtab
12 	about:home
10 	http://www.pogo.com/games/garden#game
10 	https://www.santandernet.com.br/default.asp
10 	http://www.pogo.com/?pageSection=cp_header_home
(there more with less than 10 hits, a lot of those are pogo.com as well)
Keywords: needURLs
I've tried various pogo games and most of the urls but could not reproduce it yet...
Since I cannot reproduce this bug I cannot do much about it at this point...
Assignee: gkrizsanits → nobody
It's #22 top browser crasher in 19.0, #19 in 20.0b1, and #20 in 21.0a2 with an upward tendency since February 26. See https://crash-analysis.mozilla.com/rkaiser/2013-02-27/2013-02-27.firefox.19.explosiveness.html

Almost every comments talk about Pogo.

There are no available correlations because of bug 836671 in case the spike might be related to a particular Java version or the latest MS update.
Andrew - given the rise in explosiveness and the fairly clear link to Pogo, can you investigate here?  Adding QA wanted too so we can try and get STR.
Assignee: nobody → continuation
I don't know anything about this code.  Maybe Gabor could look at this?  Though it sounds like without a test case he won't be able to do anything.
Yeah... I really have no idea what is going on here, don't really know anything about how we deal with java. I would love to investigate it but after spending many hours to try to reproduce it and failed with it, I have no idea what I could do.
I have done several investigations to reproduce this crash but I was not able to do that. I'd tryed different games from pogo, which asked to run java but after enableing Java they worked as expected. 
I'd also try different steps mentioned in comments via Soccoro but with no results.
(In reply to Scoobidiver from comment #2)
> It's correlated to Java:
>   xpc::XPCWrappedNativeXrayTraits::resolveDOMCollectionProperty(JSContext*,
> JSObject*, JSObject*, int, bool,
> JSPropertyDescriptor*)|EXCEPTION_ACCESS_VIOLATION_READ (273 crashes)
>      69% (188/273) vs.   1% (1092/171370) java.dll
>           1% (4/273) vs.   0% (25/171370) 6.0.310.5
>           0% (1/273) vs.   0% (11/171370) 6.0.350.10
>           8% (21/273) vs.   0% (89/171370) 6.0.370.6
>           6% (16/273) vs.   0% (29/171370) 6.0.380.5
>           0% (1/273) vs.   0% (5/171370) 7.0.40.22
>           1% (2/273) vs.   0% (10/171370) 7.0.50.6
>           8% (21/273) vs.   0% (162/171370) 7.0.90.5
>           2% (6/273) vs.   0% (57/171370) 7.0.100.18
>          38% (105/273) vs.   0% (513/171370) 7.0.110.21
>           4% (11/273) vs.   0% (43/171370) 7.0.110.22

If I'm reading this correctly, this is most highly correlated to Java 7u11. Am I correct?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #16)
> If I'm reading this correctly, this is most highly correlated to Java 7u11.
> Am I correct?
Yes. It was the latest version on February 3rd. Now it's likely correlated to Java SE7 U17.
(In reply to Scoobidiver from comment #17)
> (In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #16)
> > If I'm reading this correctly, this is most highly correlated to Java 7u11.
> > Am I correct?
> Yes. It was the latest version on February 3rd. Now it's likely correlated
> to Java SE7 U17.

Can I please get some more current Java version correlations? Should I just be testing with the latest? This will allow me to more effectively target my testing.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #18)
> Can I please get some more current Java version correlations?
As long as bug 836671 is not fixed, you won't.
(In reply to Scoobidiver from comment #19)
> (In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #18)
> > Can I please get some more current Java version correlations?
> As long as bug 836671 is not fixed, you won't.

Okay, we'll have to make do.

Ioana, can you please have your team make an attempt at reproducing this. The latest data we have indicates this was mostly correlated to Java 7u11 or later and Pogo.com games. Can you please do some testing with the latest Beta builds to see if you can shake out some steps to reproduce and possibly a regression range?
QA Contact: ioana.budnar
QA Contact: ioana.budnar → virgil.dicu
I'd tryed to reproduce using Java 7u11, 7u13 and 7u15 but didn't find any crash with any of the versions. Information from Soccoro are not enought for me to reproduce this issue.
(In reply to MarioMi (:MarioMi) from comment #21)
> I'd tryed to reproduce using Java 7u11, 7u13 and 7u15 but didn't find any
> crash with any of the versions. Information from Soccoro are not enought for
> me to reproduce this issue.

Using Java SE Developers Kit 7 u17 I can see a couple of hangs while loading and playing, but without (Not Responding): http://www.pogo.com/games/scrabble?guest_country=US#game

Even so no crash appeared.
Encountered hangs while playing http://www.pogo.com/games/poppit#game running other games, URLs from comment 7 or Java applets in other tabs, but couldn't reproduce the crash.

Verified on Firefox 18.0.2 (20130201065344), Firefox 19.0 (20130215130331) and Firefox 20.0 beta 4 (20130307075451) while using Java SE7 U11 and then Java SE7 U17.
Correlations are now available. It seems Java SE7 U17 is less affected than previous Java SE7 versions. In addition, the latest version is not CTP-blocklisted:
     84% (251/298) vs.   1% (1144/139851) java.dll
          1% (2/298) vs.   0% (6/139851) 6.0.310.5
          2% (5/298) vs.   0% (7/139851) 6.0.330.3
          3% (8/298) vs.   0% (16/139851) 6.0.370.6
          2% (5/298) vs.   0% (7/139851) 6.0.380.5
          5% (15/298) vs.   0% (18/139851) 6.0.390.4
          3% (9/298) vs.   0% (24/139851) 7.0.100.18
          6% (17/298) vs.   0% (42/139851) 7.0.110.21
         15% (46/298) vs.   0% (94/139851) 7.0.130.20
         38% (113/298) vs.   0% (241/139851) 7.0.150.3
          1% (2/298) vs.   0% (540/139851) 7.0.170.2
          1% (2/298) vs.   0% (4/139851) 7.0.50.6
          1% (3/298) vs.   0% (4/139851) 7.0.70.10
          1% (2/298) vs.   0% (16/139851) 7.0.70.11
          7% (22/298) vs.   0% (74/139851) 7.0.90.5
The highest correlation appears to be with Java 7.0.150.3 which appears to come from one of the JDKs. Virgil and Mario, can either of you try to reproduce with this version?
http://en.softonic.com/s/java-development-kit-7.0.150.3
I tried to find a crash with Java SE7 U15 on Firefox 18.0.2 (20130201065344), Firefox 19.0 (20130215130331) and Firefox 20.0 beta 4 (20130307075451) but still with no luck.

Loaded multiple tabs with games from www.pogo.com, other URL from comment 7 and different Java applets.
bholley: we might want to reconsider some kind of guarding against this case all over the file... I think this is a bug that comes back again and again, and we don't know what is going on. So instead of just crashing it would be great to fail a bit more gracefully. Or at very least we could collect some info before crashing. What do you think?
Flags: needinfo?(bobbyholley+bmo)
It seems from this callstack that there should be no reason to call ResolveDOMCollection here, right? We might as well reorder things so that we only call that function if we're in a valid state, which might have a positive effect on the crash volume?
Flags: needinfo?(bobbyholley+bmo)
I'm going to assign this to Gabor, who appears to actually be looking into this.  Thanks!
Assignee: continuation → gkrizsanits
Yeah, I'll try what Bobby suggests, it should be an improvement anyway, with some luck it might solve this bug entirely.
QA has made multiple unsuccessful attempts at reproducing this crash. Given this is now assigned to a developer I'm dropping qawanted. Please add it back if there is something specific we can do to assist, Gabor.
Keywords: qawanted
I'm not sure about this patch, please take a look at it. I'm a bit over cautious, tried to guard against all the cases any of us thought for a second that might be the culprit of this crash. It's not easy to try and blind fix something. Should not be a risky patch if I haven't made any trivial error, but tryserver should catch that I guess.
Attachment #732728 - Flags: review?(bobbyholley+bmo)
Comment on attachment 732728 [details] [diff] [review]
guarding against possible null pointers and what not

Review of attachment 732728 [details] [diff] [review]:
-----------------------------------------------------------------

This is pretty gross, and this kind of scary voodoo tends to stay in the tree much longer than it should. But given that resolveDOMCollectionProperty is going away with the new DOM bindings, I'm ok with polluting it a bit in the interim.

::: js/xpconnect/wrappers/XrayWrapper.cpp
@@ +778,5 @@
>  
>      XPCNativeInterface *iface;
>      XPCNativeMember *member;
>      XPCWrappedNative *wn = getWN(wrapper);
> +    

Whitespace error here.
Attachment #732728 - Flags: review?(bobbyholley+bmo) → review+
https://hg.mozilla.org/mozilla-central/rev/d4ce33010716
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #34)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/d4ce33010716

We should already uplift this on aurora(Fx-22) and check if the top-crash is resolved, in preparation for a Beta uplift on Monday based on the risk here.
I thought for uplifting we need a test at least, which in this case we don't have (since no one was able to reproduce the crash). This patch is a blind fixing attempt, that comes with minimal risk, but I have no idea what is the strategy of uplifting something like this. Anyone could help me out here?
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #37)
> I thought for uplifting we need a test at least, which in this case we don't
> have (since no one was able to reproduce the crash). This patch is a blind
> fixing attempt, that comes with minimal risk, but I have no idea what is the
> strategy of uplifting something like this. Anyone could help me out here?

I would encourage you to uplift, if we well understand what we are doing and considering the patch is low risk keeping in mind the reward that it will help fix a top-crasher.Considering we take this in Beta 3, we will still have an opportunity to back out the patch in the next two weeks if it causes regression or does not fix the crash or the crash transforms to a new signature.

For perspective on risk analysis on uplift here adding needsinfo in :bholley as he reviewed the patch to call out anything :)
Flags: needinfo?(bobbyholley+bmo)
This roughly sums up the patch: http://bit.ly/10YVD7l

So yeah, we can take it on beta.
Flags: needinfo?(bobbyholley+bmo)
Comment on attachment 732728 [details] [diff] [review]
guarding against possible null pointers and what not

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 738244 (?)
User impact if declined: might be a topcrash if Java does something weird again
Testing completed (on m-c, etc.): cannot be reproduced but the patch is green on m-c
Risk to taking this patch (and alternatives if risky): should be a very safe patch
String or IDL/UUID changes made by this patch: none
Attachment #732728 - Flags: approval-mozilla-beta?
Attachment #732728 - Flags: approval-mozilla-aurora?
Approving on aurora/beta considering the patch is safe. Since we have not been able to reproduce the issue, crash-stats is the only fallback to help determine the issue is resolved.
Let's keep a close eye on crash-stats to ensure volume goes down.
Attachment #732728 - Flags: approval-mozilla-beta?
Attachment #732728 - Flags: approval-mozilla-beta+
Attachment #732728 - Flags: approval-mozilla-aurora?
Attachment #732728 - Flags: approval-mozilla-aurora+
(In reply to Scoobidiver from comment #1)
> It's #28 top browser crasher in 18.0, #43 in 19.0b1, #53 in 20.0a2, and #91
> in 21.0a1.
> 
> More reports at:
> https://crash-stats.mozilla.com/report/
> list?signature=xpc%3A%3AXPCWrappedNativeXrayTraits%3A%3AresolveDOMCollectionP
> roperty%28JSContext*%2C+JSObject*%2C+JSObject*%2C+int%2C+bool%2C+JSPropertyDe
> scriptor*%29
> https://crash-stats.mozilla.com/report/
> list?signature=xpc%3A%3AXPCWrappedNativeXrayTraits%3A%3AresolveDOMCollectionP
> roperty%28JSContext*%2C+JSObject*%2C+JSObject*%2C+int%2C+JSPropertyDescriptor
> *%2C+unsigned+int%29

This looks truly fixed without any crashes only in FF 23.
QA Contact: virgil.dicu
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: