Bugzilla

Brian Carpenter [:geeknik]

Comment 3

•

15 years ago

looks like the trunk volume regression range is right around oct 13 or 14
date tl crashes at, count build, count build, ...
js::DefaultValue.JSContext...JSObject...JSType..js::Value..
20101010 8 6 4.0b7pre2010100604, 1 4.0b62010091408, 1 4.0b42010081813,
20101011 2 4.0b62010091408 2 ,
20101012 6 4 4.0b62010091408, 1 4.0b8pre2010101104, 1 4.0b42010081813,
20101013 4 4.0b32010080519 4 ,
20101014 12 8 4.0b8pre2010101404, 4 4.0b8pre2010101322,
20101015 21 10 4.0b8pre2010101404, 7 4.0b8pre2010101504, 3 4.0b62010091408, 1 4.0b7pre2010100204,
20101016 9 5 4.0b8pre2010101604, 3 4.0b8pre2010101504, 1 4.0b62010091408,
20101017 16 8 4.0b8pre2010101604, 3 4.0b8pre2010101704, 1 4.0b8pre2010101605, 1 4.0b8pre2010101504, 1 4.0b8pre2010101404, 1 4.0b62010091408, 1 4.0b32010080519,
20101018 23 9 4.0b8pre2010101804, 7 4.0b8pre2010101704, 4 4.0b62010091408, 1 4.0b8pre2010101604, 1 4.0b8pre2010101404, 1 4.0b7pre2010100204,
20101019 23 9 4.0b8pre2010101904, 7 4.0b8pre2010101804, 2 4.0b8pre2010101604, 2 4.0b8pre2010101504, 1 4.0b8pre2010101805, 1 4.0b8pre2010101404, 1 4.0b8pre2010101322,

Robert Sayre

Updated

•

15 years ago

blocking2.0: ? → betaN+

Comment 4

•

15 years ago

Just got this crash with the latest Minefield nightly (Built from http://hg.mozilla.org/mozilla-central/rev/8cbe83542596) while visiting www.appbrain.com. http://crash-stats.mozilla.com/report/index/21e05006-0f91-417c-99dc-165142101107

Assignee

Updated

•

15 years ago

Assignee: general → dmandelin

Assignee

Comment 5

•

15 years ago

I have some info:
1. In most of these, we crash with an address of 0x4 trying to read the clasp of an object, which means the JSObject* is NULL. In the minidump I read, the reason is that we call ValueToNumberSlow on a jsval that is an array hole. Holes should not appear as "normal" values, so that is a bug.
2. This crash was happening earlier and then stopped. The first one in the "new" version came in an Oct 7 build. They were low-volume at first, but spiked when compartments landed--I don't know why.
3. Bug 601733, which relates to JSOP_GETELEM, landed just before the Oct 7 build. Interestingly, that bug asserts that a value is not a hole at a place that used to check, inside ArgGetter. So, that could be the bug. Assuming holes do reach that point, it's not clear yet whether asserting non-hole is a bug, or if some caller is using ArgGetter in a buggy way.
4. If the case in ArgGetter is not the "cause" of this, then most likely we are reading holes out of an array somewhere else.
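A minimal C++17 sketch of the failure mode in point 1, under a simplified layout. FakeObject, FakeValue, claspFaultAddress and toNumberSlow are hypothetical stand-ins, not SpiderMonkey's real types or functions, and the field order is an assumption. The idea is that a hole carries no object payload, so code that falls through to "must be an object" reads the class pointer through a null JSObject* and faults at the offset of that field.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical stand-ins, not SpiderMonkey's real layout.
struct FakeClass { const char* name; };

struct FakeObject {
    void* map;          // assumed first pointer-sized field
    FakeClass* clasp;   // assumed second field, at offset sizeof(void*)
};

enum class Tag { Number, Object, Hole };

struct FakeValue {
    Tag tag;
    double num;
    FakeObject* obj;    // only meaningful when tag == Tag::Object

    static FakeValue number(double d) { return {Tag::Number, d, nullptr}; }
    static FakeValue hole() { return {Tag::Hole, 0.0, nullptr}; }
};

// Address a read of obj->clasp would fault at when obj is null.
// With 4-byte pointers this is 4, which matches a 0x4 crash address.
constexpr std::size_t claspFaultAddress() {
    return offsetof(FakeObject, clasp);
}

// Guarded slow path: rejects holes instead of treating them as objects.
std::optional<double> toNumberSlow(const FakeValue& v) {
    switch (v.tag) {
    case Tag::Number:
        return v.num;
    case Tag::Hole:
        return std::nullopt;            // a hole leaked into a "normal" value
    case Tag::Object:
        if (!v.obj) return std::nullopt;
        return 0.0;                     // placeholder for the object conversion
    }
    return std::nullopt;
}
```

A diagnostic in this spirit would abort at the Hole case, so the crash report points at the place where the hole was first observed rather than at the later null dereference.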

Assignee

Comment 6

•

15 years ago

Attached patch Diagnostic patch — Details — Splinter Review

This patch primarily checks for holes in ArgGetter per the last comment. In case that's not it, the patch also checks for holes read out of arrays by the interpreter and mjit stub call. Of course, they could be read out elsewhere, but I'm starting in the easy places.

Attachment #490729 - Flags: review?(lw)

Updated

•

15 years ago

Attachment #490729 - Flags: review?(lw) → review+

Assignee

Comment 7

•

15 years ago

Diagnostic landed: http://hg.mozilla.org/mozilla-central/rev/1b815a3b4250

Status: NEW → ASSIGNED

Assignee

Comment 8

•

15 years ago

OK, we've had a big dropoff in the number of these crashes since the diagnostic landed, so either (a) it got fixed by someone else, or (b) the diagnostic is making it crash at a different place, which I'm not finding in Socorro. (I did search for the sigs of both the functions where I added crashes.) So, I think the next step is to back out the diagnostic by parts. If one of the backouts causes the crash to come back, then that's the one that was being hit. If neither does, then it got fixed by someone else.

Assignee

Comment 9

•

15 years ago

(In reply to comment #8)
Argh, in comment 8 when I said there was a dropoff I was looking at data for a different crash. It is pretty interesting that that crash seems to have dropped off with the landing of this diagnostic, so the comments in comment 8 do kind of apply to bug 608118.

Assignee

Comment 10

•

15 years ago

Diagnostic backed out: http://hg.mozilla.org/mozilla-central/rev/693505bdb668 The result was negative. I think I'm going to back up a bit on this. The previous test was based on the guess that we were getting a hole from getelem, but neither of those two was ever directly confirmed. So I'm going to first confirm that we are crashing on holes, and continue from there.

Assignee

Comment 11

•

15 years ago

Attached patch Diagnostic patch 2: verify that we are crashing on holes — Details — Splinter Review

Attachment #491632 - Flags: review?

Updated

•

15 years ago

Attachment #491632 - Flags: review? → review+

Assignee

Comment 12

•

15 years ago

Diagnostic 2 landed: http://hg.mozilla.org/mozilla-central/rev/c29356cef6d4

Assignee

Comment 13

•

15 years ago

Diagnostic 2 backed out: http://hg.mozilla.org/mozilla-central/rev/09e869b8ff89 The crash in diagnostic 2 got hit one time. That's way fewer than the original topcrash was. That crash seems to have last occurred in the Nov 17 build. (Note that the diagnostic was first active only in the Nov 18 build.) So the original crash might be gone. I'll keep watching.

Assignee

Comment 14

•

15 years ago

It's still there, but low frequency. Deprioritizing.

blocking2.0: betaN+ → -

status2.0: --- → wanted

Comment 15

•

15 years ago

looks like it's still inside the top 20 at #17 in beta7 and running at 130-190 crashes per day. seems like we should still be trying to get any crash inside the top 30 that is a regression from firefox 3.6.x

Assignee

Comment 16

•

15 years ago

That's reasonable. It looks good on nightlies, but let's check it again once beta8 goes out and make it block again if it's still popular there.

Comment 17

•

15 years ago

bug 619064 just found a reproducible way to crash in js::DefaultValue, except that it involves obj->getClass()->convert == NULL which isn't the same symptom as comment 5 reports.

Comment 18

•

15 years ago

n/m -- it's using the shell-only (and I'm pretty sure unsafe) parent() function.

Arie Paap [:wildmyron]

Comment 19

•

14 years ago

Mozilla/5.0 (Windows NT 5.1; rv:6.0a1) Gecko/20110501 Firefox/6.0a1
Build ID: 20110501030600
Built from http://hg.mozilla.org/mozilla-central/rev/068d876996c6
I reliably (100%) get a crash with the same signature on http://geo.maunsell.com/smcpublic/index.phtml using a clean profile when I move the mouse over the map, either after page load finished or during page load (once map images are displayed).
Sample crash reports:
bp-ea030708-e93c-4847-8b7a-f8b6a2110503
bp-7e16d21d-48a8-4201-8697-c443d2110503
The stack is different though... A quick scan shows some js::DefaultValue(...) crashes in nightly 6.0a1 with similar stack as mine and others with the same two or three frames at the top of stack as in comment 0.
I don't relish the thought of making a testcase from this URL, but would it be useful?

Reporter

Comment 20

•

14 years ago

(In reply to comment #19)
> I reliably (100%) get a crash with the same signature on
> http://geo.maunsell.com/smcpublic/index.phtml using a clean profile when I move
> the mouse over the map, either after page load finished or during page load
> (once map images are displayed).
I cannot reproduce in 4.0.1 with these STR, so it is a new bug. I renamed the summary to reflect the stack traces in comment 0.

Summary: crash [@ js::DefaultValue(JSContext*, JSObject*, JSType, js::Value*) ] → Crash in js::ValueToNumberSlow [@ js::DefaultValue(JSContext*, JSObject*, JSType, js::Value*) ]

Reporter

Comment 21

•

14 years ago

It is #4 top crasher in 5.0b2. The major part of stack traces are similar to the ones in comment 0:
0 mozjs.dll js::DefaultValue js/src/jsobj.cpp:5923
1 mozjs.dll js::ValueToNumberSlow js/src/jsnum.cpp:1274
2 mozjs.dll js::ValueToECMAInt32Slow js/src/jsnum.cpp:1292
3 mozjs.dll js::mjit::stubs::BitXor js/src/methodjit/StubCalls.cpp:617
4 @0x765f1df
5 mozjs.dll CheckStackAndEnterMethodJIT js/src/methodjit/MethodJIT.cpp:712
6 mozjs.dll js::Interpret js/src/jsinterp.cpp:3542
7 mozjs.dll js::Invoke js/src/jsinterp.cpp:716
8 mozjs.dll JS_CallFunctionValue js/src/jsapi.cpp:5153
9 xul.dll nsJSContext::CallEventHandler dom/base/nsJSEnvironment.cpp:1903
10 xul.dll nsJSEventListener::HandleEvent dom/src/events/nsJSEventListener.cpp:224
11 xul.dll nsEventListenerManager::HandleEventSubType content/events/src/nsEventListenerManager.cpp:1136
12 xul.dll nsEventListenerManager::HandleEventInternal content/events/src/nsEventListenerManager.cpp:1233
13 xul.dll nsEventDispatcher::Dispatch content/events/src/nsEventDispatcher.cpp:646
Maybe related to bug 595351.

tracking-firefox5: --- → ?

Sheila Mooney

Comment 22

•

14 years ago

Luke, you had some ideas about how we could approach a fix in comments 17 and 18. Any updates on that? If we look at the top crash list, it's kind of dominated by JS-related crashes. I think we need to get some eyes on this. It was not in the top 300 for 4 and 4.0.1 and seems to be in the top 10 for 5. Probably something regressed this.

tracking-firefox5: ? → +

Marcia Knous [:marcia]

Updated

•

14 years ago

Blocks: 659785

Comment 23

•

14 years ago

In comment 17 I thought I had found a connection to another bug but that other bug was using a shell-only function that isn't meant to be safe (hence the bug was resolved invalid).

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 24

•

14 years ago

(In reply to comment #19)
> Mozilla/5.0 (Windows NT 5.1; rv:6.0a1) Gecko/20110501 Firefox/6.0a1
> Build ID: 20110501030600
> Built from http://hg.mozilla.org/mozilla-central/rev/068d876996c6
>
> I reliably (100%) get a crash with the same signature on
> http://geo.maunsell.com/smcpublic/index.phtml using a clean profile when I
> move the mouse over the map, either after page load finished or during page
> load (once map images are displayed).
>
> Sample crash reports:
> bp-ea030708-e93c-4847-8b7a-f8b6a2110503
> bp-7e16d21d-48a8-4201-8697-c443d2110503
>
> The stack is different though... A quick scan shows some
> js::DefaultValue(...) crashes in nightly 6.0a1 with similar stack as mine
> and others with the same two or three frames at the top of stack as in
> comment 0.
>
> I don't relish the thought of making a testcase from this URL, but would it
> be useful?
I bisected on nightlies and found this bug to be first present in the 2011-03-19 mozilla-central build. Given that range, the regressing changeset is almost certainly:
changeset: 63445:23455773db73
user: Kyle Huey <khuey@kylehuey.com>
date: Fri Mar 18 17:37:46 2011 -0400
summary: Bug 641325: Turn PGO back on for JS. rs=ted a=sayrer
Maybe the first thing to try is to turn PGO back off for JS and see what happens in nightly builds? We're currently averaging 1.8/day in nightlies, so if this is a major cause, we should be able to see some effect fairly quickly.
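Bisecting on nightlies amounts to a binary search for the first build that reproduces the crash. A minimal C++17 sketch, assuming the builds are sorted by date and that every build from the first bad one onward also crashes; firstBadBuild and the crashes callback are hypothetical names, not part of any Mozilla tooling.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Returns the index of the first build for which crashes(build) is true,
// assuming all earlier builds are good and all later builds are bad.
std::optional<std::size_t> firstBadBuild(
    const std::vector<std::string>& builds,
    const std::function<bool(const std::string&)>& crashes) {
    std::size_t lo = 0, hi = builds.size();   // search window is [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (crashes(builds[mid]))
            hi = mid;        // first bad build is at mid or earlier
        else
            lo = mid + 1;    // first bad build is after mid
    }
    if (lo == builds.size()) return std::nullopt;  // nothing reproduced
    return lo;
}
```

In practice the callback is a manual step (install the nightly, run the steps from comment 19), so the point of the search is to keep the number of builds tested logarithmic in the size of the range.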

Comment 25

•

14 years ago

Sure, you can turn off PGO in nightlies if you want to watch the crash numbers trend, but I don't think that turning off PGO is acceptable to get this to RESOLVED FIXED.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 26

•

14 years ago

Without knowing whether PGO is actually at fault: Is there a really compelling perf win to make it worth wasting a bunch of time tracking down why the compiler is breaking? Keep in mind that the JS win from PGO is only going to decrease over time as we focus on (1) extending JIT coverage to all areas of the VM and (2) improving the VM itself.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 27

•

14 years ago

Turn it off and see! (When we turned it back on in March it was a roughly 10% win on Sunspider, etc).

Comment 28

•

14 years ago

(In reply to comment #26)
> Without knowing whether PGO is actually at fault: Is there a really
> compelling perf win to make it worth wasting a bunch of time tracking down
> why the compiler is breaking?
>
> Keep in mind that the JS win from PGO is only going to decrease over time as
> we focus on (1) extending JIT coverage to all areas of the VM and (2)
> improving the VM itself.
this sounds reasonable given that time is short to get 5.0 stable. it looks like PGO could also be behind the new instability in this bug (#4 topcrash on 5.0beta) and bug 645775 (#5 topcrash), and maybe others in the top crash list that we haven't had a chance to investigate yet. let's get it turned off for the next round of beta builds, check crash stats, and make the call on whether PGO is ready for this area, or whether the other things dvander outlined will be in place soon enough. we really can't afford to take any chance of added instability for perf wins, based on the crash rates we saw on 4.0.x and this early 5.0 crash data.

(not currently active) Ted Mielczarek

Reporter

Updated

•

14 years ago

Version: Trunk → 2.0 Branch

Comment 29

•

14 years ago

The simple way to turn it off is just to add:
NO_PROFILE_GUIDED_OPTIMIZE = 1
to js/src/Makefile.in. You might want to make that ifeq (WINNT,$(OS_ARCH)) so you don't disable PGO on Linux builds as well. This will probably be a large perf hit, but that's not a regression from FF4.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 30

•

14 years ago

Attached patch Patch to disable PGO for for JS — Details — Splinter Review

Patch to disable PGO per Ted's instructions. This is in order to test how much of this crash (and also bug 645775) is from PGO on the nightly builds. If/when we get data on that, we can better decide what to do for the release.

Attachment #535438 - Flags: review?(ted.mielczarek)

Comment 31

•

14 years ago

Comment on attachment 535438
Patch to disable PGO for for JS

Attachment #535438 - Flags: review?(ted.mielczarek) → review+

Robert Sayre

Comment 32

•

14 years ago

OK, all reviewed. dmandelin, can you check in?

Assignee

Comment 33

•

14 years ago

(In reply to comment #32)
> OK, all reviewed.
>
> dmandelin, can you check in?
http://hg.mozilla.org/mozilla-central/rev/7d2a3d61a377

Robert Sayre

Comment 34

•

14 years ago

Sheila, we want to get this on the beta branch as well, right?

Comment 35

•

14 years ago

yes, and in time for the next round of builds.

Assignee

Updated

•

14 years ago

Attachment #535438 - Flags: approval-mozilla-beta+

http://hg.mozilla.org/mozilla-beta/rev/6e18a4b3a5e1

Assignee

Comment 36

•

14 years ago

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 37

•

14 years ago

Please post to .tree-management about the expected perf regression here.

Asa Dotzler [:asa]

Updated

•

14 years ago

status-firefox5: --- → fixed

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 38

•

14 years ago

Why are we marking this fixed for 5? Do we have crashstats numbers showing the decline in crashes?

(not currently active) Ted Mielczarek

Comment 39

•

14 years ago

(In reply to comment #38)
> Why are we marking this fixed for 5?
I'm guessing asa marked it that way since the patch to back out PGO is held here and marked landed from here. yeah, that makes it a bit messy. maybe we should have spun off another bug for that patch and made this bug and the other sharply rising js crash bugs in 5.0 dependent on that. I posted that list to tree-management.
https://bugzilla.mozilla.org/show_bug.cgi?id=659785 #1 5.0 topcrash
https://bugzilla.mozilla.org/show_bug.cgi?id=637267 #4 5.0 topcrash
https://bugzilla.mozilla.org/show_bug.cgi?id=595635 #6 5.0 topcrash
https://bugzilla.mozilla.org/show_bug.cgi?id=602803 #8 5.0 topcrash
and maybe others further down that we haven't looked at yet.
> Do we have crashstats numbers showing
> the decline in crashes?
nope, not yet. we need a new beta build and to get 250k+ users on it to do the right kind of analysis. if there is a way to expedite this it would really help.

Jim Mathies [:jimm]

Comment 40

•

14 years ago

This may have regressed Ts on windows - http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7a4900cfb25a&tochange=7d2a3d61a377 http://tinyurl.com/42nhg8g The regression was from 370 -> 380. There was another regression to 395 shortly after which has since been backed out.

Comment 41

•

14 years ago

Perf regressions are a known consequence of this patch. They're not regressions compared to Firefox 4, just compared to trunk.

Nobody; OK to take it and work on it

Comment 42

•

14 years ago

it's looking like this landed on mozilla-central just *after* the cut over of 6.0 on to the aurora channel, making 6.0a2 crashier. Can someone recheck to see if the patch needs to land there?

tracking-firefox6: --- → ?

Asa Dotzler [:asa]

Comment 43

•

14 years ago

we'd like this for aurora (sorry if we are asking for something different now).

tracking-firefox6: ? → +

Asa Dotzler [:asa]

Updated

•

14 years ago

Attachment #535438 - Flags: approval-mozilla-aurora+

Robert Kaiser

Comment 44

•

14 years ago

(In reply to comment #42)
> its looking like this landed on mozilla central just *after* the cut over of
> 6.0 on to the aurora channel, making 6.0a2 crashier.
We don't have any data that confirms it actually made it crashier. The data suggests that crashiness remained the same and that existing crashes just shifted to different signatures due to different inlining.

Updated

•

14 years ago

Crash Signature: [@ js::DefaultValue(JSContext*, JSObject*, JSType, js::Value*) ]

Robert Kaiser

Comment 45

•

14 years ago

Ted asked me to give a bit of a clearer statement here on my assessment of the situation of crashes with/without JS PGO.
Most importantly: After turning off PGO on beta, we could not see a measurable improvement of overall crash rates (main comparison here is 5.0b2 vs. 5.0b3). Some signatures that looked to be new (e.g. js::Interpret, js::DefaultValue, js::Invoke, even js::gc::MarkChildren) disappeared or decreased significantly, but ones we've known well as high-rate crashers (js::mjit::EnterMethodJIT, to some extent js::gc::MarkObject) came back to the top levels. This is easy to explain by the different inlining decisions the compiler makes as part of PGO.
My conclusion from looking through the numbers over the last weeks is that JS PGO does not regress crash rates, it just shifts crashes to be reported with different signatures, which is what we saw in 5.0b2 mainly.
Followup thoughts/comments:
1) Right now, PGO is ON on aurora, while it's off on central, beta and release.
2) Aurora currently reports with higher than usual crashes/user, but it looks like that was a different issue, fixed late last week. If crash rates go down to a similar level as all other channels within a few days to a week from now, this could be one more data point showing that PGO does not regress overall crash rate.
3) I'd propose to turn PGO ON on central again as well, and leave it on Aurora. Due to the 5.0b2 experience, we now know what signatures the crashes move to, which makes analysis easier. 5 should of course stay as it is, with well-known signatures, we can let "moar speed" slip to 6.
4) If possible, looking into improving signature generation for those crashes would be helpful, so we still see that e.g. we're actually crashing in MethodJIT (#0 frames are always only an address in that case).
5) We should try to investigate the actual causes of those GC and MethodJIT crashes.
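To illustrate why one underlying crash can be filed under different signatures, here is a minimal C++ sketch under a deliberately simplified rule: the signature is the first frame that has a symbol. Frame and signatureFor are hypothetical names, the rule is an assumption rather than the actual crash-stats algorithm, and the frame lists in the usage note are illustrative rather than taken from real reports.

```cpp
#include <string>
#include <vector>

// Hypothetical model: a frame either has a symbol name or, for JIT code,
// is known only by a raw address (modeled here as an empty symbol).
struct Frame {
    std::string symbol;
};

// Assumed rule: the signature is the first frame that has a symbol.
std::string signatureFor(const std::vector<Frame>& stack) {
    for (const Frame& f : stack)
        if (!f.symbol.empty()) return f.symbol;
    return "(no symbolized frame)";  // placeholder when nothing is symbolized
}
```

Under this rule, a stack that still contains a js::DefaultValue frame is reported as js::DefaultValue, while a stack for the same fault in which the compiler inlined the top frames away, leaving an address-only frame followed by js::mjit::EnterMethodJIT, is reported as js::mjit::EnterMethodJIT. The totals stay the same and only the buckets change.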

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 46

•

14 years ago

I was unable to reproduce the crash using the steps in comment 19 using a PGO build built with MSVC 2010. I'll catch up with dmandelin in the office tomorrow to make sure that I'm doing it right. Given that we're going to move to MSVC 2010 Real Soon Now (TM), I would recommend disabling PGO for the js engine on Aurora and waiting until we switch compilers (probably for Gecko 8) to turn PGO back on.

LegNeato

Comment 47

•

14 years ago

Landed on mozilla-beta for Firefox 6: http://hg.mozilla.org/releases/mozilla-beta/rev/3aac9d8c3452 Apparently I missed this during Aurora as I thought I already transplanted it. Thanks khuey for calling attention to it.

status-firefox6: --- → fixed

Reporter

Comment 48

•

14 years ago

It is #4 top crasher in 6.0b1/20110705195857.

Comment 49

•

14 years ago

IIRC, DefaultValue is not present in FF7 top-crashes. Perhaps this is just a different inlining decision, but I don't see any callers of DefaultValue to take its place. Assuming the issue has been fixed, one candidate is http://hg.mozilla.org/releases/mozilla-aurora/rev/d2250fc608cc which first appeared on FF7. Can you think of anything crashy that you fixed in this patch Waldo?

Jeff Walden [:Waldo]

Comment 50

•

14 years ago

Nope, can't. (I skimmed the topcrashers a bit further down to see if the crash maybe bifurcated a bit, but none of the signatures look like anything this change would have spawned.) No idea what happened here. Serendipity! :-)

Comment 51

•

14 years ago

Robert Kaiser

Comment 52

•

14 years ago

The js::DefaultValue instances we are seeing on the first 6.0 beta (which inadvertently has JS PGO turned on; comment #47 landed _after_ that build, so only the second beta will have it off) all seem to have their stack running through some js::mjit::* frame, so I suspect that those crashes get converted by different inlining decisions to report as js::mjit::EnterMethodJIT on non-JS-PGO builds, like a number of other signatures we've seen. Given that, I'm inclined to say we should mark this bug FIXED due to the patch turning JS PGO off that has landed from here, potentially changing the summary to state "turn off JS PGO", and use a new bug for issues like what comment #51 is pointing out. This bug is getting confusing.

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

14 years ago

Crash Signature: [@ js::DefaultValue(JSContext*, JSObject*, JSType, js::Value*) ] → [@ js::DefaultValue(JSContext*, JSObject*, JSType, js::Value*) ] [@ js::DefaultValue(JSContext*, JSObject*, JSType, JS::Value*)]

xunxun

Comment 53

•

13 years ago

Can the crash also be reproduced in the current versions (11/12/13)? If not, we may be able to enable JS PGO. Mozilla has already moved to VC2010, so this bug may have disappeared...

Comment 54

•

13 years ago

JS PGO is currently broken on trunk. See bug 721284.