Closed Bug 1114079 Opened 9 years ago Closed 9 years ago

Multiple crash signatures in JS error reporting (js::ReportOverRecursed, js::ErrorToException)

Categories

(Core :: JavaScript Engine, defect)

39 Branch
x86_64
Windows
defect
Not set
blocker

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox38 - wontfix
firefox38.0.5 - wontfix
firefox39 + fixed
firefox40 --- fixed
firefox41 --- fixed

People

(Reporter: johnp, Assigned: jandem)

References

Details

(Keywords: crash, reproducible)

Crash Data

Attachments

(2 files, 1 obsolete file)

Experienced this startup crash with todays nightly:

bp-c0b8f51b-aeb9-4aef-a755-753222141220

I'm really not sure where to place this, as there's kind of "a bit of everything" in the stacktrace. Also there are a lot of different stack traces going back some time, most ending up with NewStringDeflated<1> or NewStringDeflated<0>.
This seems to reproduce on every startup with my profile, except when restoring my session from about:sessionrestore:

bp-704b3cd0-238d-4455-b899-66c022141221
bp-5e01dc1e-32b6-4ae6-b264-e14712141221

Moving to "JavaScript Engine" as the top frames look like some JS error handling code and crashing there really doesn't look nice.
Crash Signature: [@ NewStringDeflated<int> ]
Component: General → JavaScript Engine
Keywords: crash
Blocking bug 1107777 as the top third frame shows code added there right before this crashes started to occur.
Blocks: 1107777
(In reply to Johannes Pfrang [:johnp] from comment #1)
> This seems to reproduce on every startup with my profile, except when
> restoring my session from about:sessionrestore:
> 
> bp-704b3cd0-238d-4455-b899-66c022141221
> bp-5e01dc1e-32b6-4ae6-b264-e14712141221
> 
> Moving to "JavaScript Engine" as the top frames look like some JS error
> handling code and crashing there really doesn't look nice.

So if you startup after a crash, and see the about:sessionrestore page, and click "Restore Session", does it crash?
Could you try isolating if this happens due to a particular website or something in the browser by trying in a blank session and individually closing tabs?

Thanks!
Bug 1053999, which added the recursion check that seems to trigger this, mentions ask.com urls being a possible culprit (see comment 29 of that bug). Do you have the ask toolbar or any tabs on ask.com?
Sorry, I didn't see the extensions list in the crash stats before. Could you try with all extensions disabled? It would be great if you could find a reliable STR.
This is most definitely an OOM, and we were not hitting it previously because the JS_IsRunning() failure would prevent js_ErrorToException() from being called. bholley, I think this is related to the change in stack limits from Bug 1053999, the error string "too much recursion" probably isn't fitting in the small amount of memory we keep around in the conservative checker after all the copying that happens in the error handler bits. But I'm just speculating.
Flags: needinfo?(bobbyholley)
I'm not sure if I can still reproduce this. Fwiw I also had one crash from restoring from about:sessionrestore and I don't use ask.com or the ask toolbar. If I encounter it again I'll definitely try to get some STR.
So you're suggesting that js_ExpandErrorArguments is OOMing when allocating the string, but not returning false?

http://hg.mozilla.org/mozilla-central/annotate/490f124d7dea/js/src/jscntxt.cpp#l819

That seems reasonable, but offhand I don't see where that would be happening.
Flags: needinfo?(bobbyholley)
One month has gone by, so I'll resolve this WORKSFORME, as long as no one has objections.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Wow... four bug 1125734 crashes today and just like normal checking the crash-reports after each crash... thought this would be just another one of these, but then I saw this bugs signature in the report. Somehow this now hit me on my desktop computer and not on my laptop like all the other crashes last month.

bp-04a75a3d-38ec-4cda-8f31-7323b2150126
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
:nsm, just trying a shot in the dark, but this has somehow been starting to crash with the same nightly as bug 1125734 and both are practically indistinguishable for me until checking the about:crashes because both happen nearly every time I'm restoring my session... could bug 1125734 probably trigger this bug (as this is something about JS Errors/Exceptions and the other one a JS crash)? Unfortunately I don't have access to that bug so I just have to guess and wait for it to be fixed if I can still reproduce it then...
Flags: needinfo?(nsm.nikhil)
Johannes, if you can reproduce this in a debugger it would be good to know the value of message in https://dxr.mozilla.org/mozilla-central/source/js/src/jsexn.cpp#574 where this is a call to JS_NewStringCopyZ(cx, message). I'm unable to reproduce this.

Bill, the stack length check was changed to conservative in Bug 1053999. Do you have any ideas?
Flags: needinfo?(nsm.nikhil) → needinfo?(wmccloskey)
The crashes are always on Windows https://crash-stats.mozilla.com/report/list?product=Firefox&signature=NewStringDeflated%3Cint%3E#tab-reports and the trace does not always come from SetNewWindow(). This could be related to 1125734, but I don't know enough about the JS engine to be convinced of anything.
Not sure what's happening here. It looks like there's plenty of virtual and physical memory available, so I'd be surprised if we're OOMing

I would suggest waiting until bug 1125734 is fixed before digging into this one too much. It's possible they have the same root cause. There's work happening in bug 1125734, so hopefully we shouldn't have to wait too long.
Flags: needinfo?(wmccloskey)
Well, I reproduced a crash (not sure which one) with Visual Studio Debugger attached configured with the Mozilla symbol and source servers. Unfortunately the symbols for xul.dll won't load. I'm on the latest x64 nightly, is there anything wrong with symbol generation for 64-bit xul.dll?
Ok, manually downloaded the xul.pdb, now it's resolving fine but unfortunately it's bug 1125734. I'll keep the debugger attached for my next couple browsing sessions.
Bug 1125734 got fixed. Could you try now? Thanks a lot for all your effort!
I'm not sure that fix has hit nightly yet. Maybe better to wait until tomorrow.
Yeah, I just crashed with that signature again... No crash today with NewStringDeflated<int>, but I'll keep the debugger attached for the next couple days at least and hope I hit it again.
Ahem.... I just crashes with NewStringDeflated<int> and VS debugger attached, but instead of VS the crash-reporter showed up and my debugger disattached :(

bp-ef696ab8-f2a8-42f8-9847-436502150128

Any idea why that should happen?
So, I set a tracepoint on the message variable and got a 27k line log, I'll probably have to clean up a bit or mail to one of you, because that's from loading my full browser session... sneak peak:

js_ErrorToException message: 0x000000c74076ce40 "too much recursion"

also, prior to this happening my firefox and visual studio both hanged while the log filled itself with

js_ErrorToException message: 0x000000c6e9ed4310 "py6üÆ"

Just so many messages the debugger wasn't fast enough it seems.
More of that:

js_ErrorToException message: 0x000000c6e9ea1950 "@\tŸ\vÇ"
js_ErrorToException message: 0x000000c6e9ea1950 "ð\nŸ\vÇ"
js_ErrorToException message: 0x000000c6e9ea1930 "Ð\x13Æ\vÇ"
js_ErrorToException message: 0x000000c6e9ea0cf0 "PÛÆ\vÇ"
js_ErrorToException message: 0x000000c6e9ea2a40 "pÜÆ\vÇ"

this doesn't look like valid messages...
Attached file Cleaned last ~7k loglines (obsolete) —
Here are the last ~7k log lines starting from right before the dubious "py6üÆ" messages started with most non-file:// URLs removed. If you need the complete log or if I should try to set a tracepoint elsewhere just ask ;)

And just fyi, the debugger still detached and the crash-reporter showed up.
Are all these traces after 1125734? If you are using Nightly that patch isn't in there yet.
No, they're this bug. On occurrences of bug 1125734 my debugger worked and I didn't get the crash-reporter, whereas on the NewStringDeflated<int> crashes my debugger deattached and the crash-reporter showed. The first time I got a NewStringDeflated<int> today this hit me unexpected, but for the second time I added a trace-point to dump the content of the message variable, so I'd have results even when the debugger should detach again. The attachment is from that debugging session. I made sure it was this crash indeed by submitting the crash-report and checking the stack trace:

bp-b74213a4-b7fd-4060-90d9-36c102150128

Fwiw I still keep the debugger attached and once got such a dubious message spam again (2k+ times), but it didn't crash this time:

js_ErrorToException message: 0x000000b1fcd95a50 "à\x15²±"

Do these traces look like bug 1125734? As I said I don't have access to that so I was just guessing here they would be related to this one.
(In reply to Johannes Pfrang [:johnp] from comment #25)
> No, they're this bug. On occurrences of bug 1125734 my debugger worked and I
> didn't get the crash-reporter, whereas on the NewStringDeflated<int> crashes
> my debugger deattached and the crash-reporter showed. The first time I got a
> NewStringDeflated<int> today this hit me unexpected, but for the second time
> I added a trace-point to dump the content of the message variable, so I'd
> have results even when the debugger should detach again. The attachment is
> from that debugging session. I made sure it was this crash indeed by
> submitting the crash-report and checking the stack trace:
> 
> bp-b74213a4-b7fd-4060-90d9-36c102150128
> 
> Fwiw I still keep the debugger attached and once got such a dubious message
> spam again (2k+ times), but it didn't crash this time:
> 
> js_ErrorToException message: 0x000000b1fcd95a50 "à\x15²±"
> 
> Do these traces look like bug 1125734? As I said I don't have access to that
> so I was just guessing here they would be related to this one.

The expectation is that Bug 1125734 is also causing these crashes, so please wait until Bug 1125734 hits nightly (so tomorrow as Bill said above), and try with tomorrow's build. That may resolve this. I agree that the strings are odd.
Still crashing with today's nightly. Unfortunately there was an error when trying to send the crash-report and my tracepoint didn't work because the xul.pdb didn't load the first time. Now the tracepoint should work again and I'll try to get it crash again. Fwiw I got a log showing the same ~200 lines of 

[JavaScript Error: "too much recursion" {file: "resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/system/events.js" line: 68}]

with the last 8 lines only showing

[JavaScript Error: "too much recursion"]

and then 

[JavaScript Warning: "ReferenceError: reference to undefined property params.charset" {file: "chrome://browser/content/tabbrowser.xml" line: 1366}]
First-chance exception at 0x00007FFD205BD28C (xul.dll) in firefox.exe: 0xC0000005: Access violation reading location 0x0000000000000128.
Just as an update: I had it crash once again with yesterdays nightly, but unfortunately the debugger wasn't attached, as it didn't crash for two days before that, and before that the crash often happened multiple times with the same session, so I figured I'd wait for the first and then catch the second in the debugger. Unfortunately that didn't happen, but instead I crashed with bug 1128527.
Attached file Complete crash log
So, I finally got a log of this crash again. Attached is the complete 10MB log with URLs removed using "sed -r 's/http(s)?:\/\/[^ ^"]+/(URL)/g'".
So, it could be that the signature changed somehow... Here's a new crash with an additional stack frame:

bp-4c1c0fd4-c65d-4072-b893-5cbaf2150217

The new frame being js::gc::CheckAllocatorState<1> in http://hg.mozilla.org/mozilla-central/annotate/09f4968d5f42/js/src/jsgcinlines.h#l459.

I'm adding the crash signature here for the moment. If this new signature is unrelated I'd open a separate bug.
Crash Signature: [@ NewStringDeflated<int> ] → [@ NewStringDeflated<int> ] [@ js::gc::CheckAllocatorState<int> ]
I've added a comment to the bug that introduced the new code we're crashing in now: Bug 1130475
See Also: → 1137001
Bug 1137001 has a simple testcase and I can reliably repro this in opt and debug builds.

We're crashing under nsGlobalWindow::SetNewDocument -> js::ReportOverRecursed (the same recursion check as the one mentioned in comment 4). It's crashing while allocating the "too much recursion" string, in GCIfNeeded:

   435 	        if (rt->gc.isIncrementalGCInProgress() &&
-> 436 	            cx->zone()->usage.gcBytes() > cx->zone()->threshold.gcTriggerBytes())

because cx->zone() is nullptr... (cx->compartment() is also nullptr)

NI billm because of bug 1053999.
Flags: needinfo?(wmccloskey)
Status: UNCONFIRMED → NEW
Ever confirmed: true
We should probably also make js::ReportOverRecursed, js::ErrorToException and/or GCIfNeeded assert we entered a compartment, to make it easier to find these bugs.
Bobby, could you take a look at this? I don't understand how this error reporting stuff works. Something changed in bug 1107777 so that now we allocate an exception object when reporting this error. However, we don't have a compartment to allocate it in. What do we want to do here? Should we try to figure out the right compartment to use, or should we avoid allocating the exception at all?
Flags: needinfo?(wmccloskey) → needinfo?(bobbyholley)
Sorry for the delay on this. SetNewDocument should really be doing the recursion check before the AutoJSAPI is even instantiated, and just returning NS_ERROR_FAILURE in that case (with an NS_WARNING). We should probably abstract out the guts of JS_CHECK_RECURSION_LIMIT into MFBT or something, though we'd still have the annoying complication that the native stack limits are stored on the JSRuntime. All solvable problems.

Long story short - no reason to call js::ReportOverRecursed on the freshly-created AutoJSAPI.
Flags: needinfo?(bobbyholley)
Crash Signature: [@ NewStringDeflated<int> ] [@ js::gc::CheckAllocatorState<int> ] → [@ NewStringDeflated<int> ] [@ js::gc::CheckAllocatorState<int> ] [@ GCIfNeeded ]
The primary signature of this bug is spiking in Firefox 37.0b7, up to the #4 or even #3 position with around 2% of all crashes.
Also, those are the top URLs for this signature:
168 	http://www.dinakaran.com/
161 	https://www.facebook.com/
147 	about:newtab
138 	about:blank
129 	http://www.amarujala.com/

All others are way lower, at 35 hits or less.
This does not always look like the other signatures (all coming from js_ReportOverRecursed() ), but I found at least 1 crash that also has that function in it's stack:

bp-241a3b5f-e7ca-4194-a50f-1d21e2150326

Also, I just crashed with yet another signature:

bp-da5ed725-44c9-40d3-98cf-7e9dc2150327
Crash Signature: [@ NewStringDeflated<int> ] [@ js::gc::CheckAllocatorState<int> ] [@ GCIfNeeded ] → [@ NewStringDeflated<int> ] [@ js::gc::CheckAllocatorState<int> ] [@ GCIfNeeded ] [@ js::NewStringCopyN<int, wchar_t>(js::ExclusiveContext*, wchar_t const*, unsigned int) ] [@ js::gc::GCRuntime::tryNewTenuredThing<JSFatInlineString, int>(js::Exclusive…
Changing summary and setting reproducible keyword per bug 1137001.
Keywords: reproducible
Summary: Crash @ NewStringDeflated<int> → Multiple crash signatures in JS error reporting (js::ReportOverRecursed, js::ErrorToException)
OS: Windows 8.1 → Windows
Version: Trunk → 39 Branch
Crashed two times FF while trying to restore a session with the page "Well, this is embarrassing."
-> No SessionRestore possible ATM !!!

[@ js::gc::GCRuntime::tryNewTenuredThing<JSFatInlineString, int>(js::ExclusiveContext*, js::gc::AllocKind, unsigned __int64) ]
https://crash-stats.mozilla.com/report/index/adadb666-69b7-4941-934f-9928d2150423
https://crash-stats.mozilla.com/report/index/643575fd-3ee7-4dff-ad03-2315e2150423
Severity: critical → blocker
FYI: SessionRestore worked with 32bit.

[@ js::gc::GCRuntime::checkAllocatorState<int>(JSContext*, js::gc::AllocKind) ]
https://crash-stats.mozilla.com/report/index/4523ed18-12d6-47e7-a5e9-1fac42150424
Other signature - same problem.
Crash Signature: , int>(js::ExclusiveContext*, js::gc::AllocKind, unsigned __int64) ] → , int>(js::ExclusiveContext*, js::gc::AllocKind, unsigned __int64) ] [@ js::gc::GCRuntime::checkAllocatorState<int>(JSContext*, js::gc::AllocKind) ]
[Tracking Requested - why for this release]:
The NewStringDeflated<int> signature has high volume in 38b - currently at #17
in the "Top Crashers for Firefox 38.0b" list.
Too late for 38.* but tracking for 39.
The NewStringDeflated<int> signature, which is pretty high in Firefox 38 (and 38.0.5) data, has a huge amount of hits for http://www.amarujala.com/ - is that helpful to find out what's going on here?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #46)
> has a huge amount of hits for http://www.amarujala.com/ -
> is that helpful to find out what's going on here?

We already know what's going on here, see comments 32-36 (assuming it's the same issue).

Bobby, do you have some spare cycles to take a stab at this? Else we can find somebody else maybe.
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (:bholley) from comment #36)
> SetNewDocument should really be doing the
> recursion check before the AutoJSAPI is even instantiated, and just
> returning NS_ERROR_FAILURE in that case (with an NS_WARNING). We should
> probably abstract out the guts of JS_CHECK_RECURSION_LIMIT into MFBT or
> something, though we'd still have the annoying complication that the native
> stack limits are stored on the JSRuntime. All solvable problems.

I think a simple fix is to add JS_CHECK_RECURSION_CONSERVATIVE_DONT_REPORT, a combination of JS_CHECK_RECURSION_CONSERVATIVE and JS_CHECK_RECURSION_DONT_REPORT.

I can give this a try tomorrow.
Flags: needinfo?(bobbyholley) → needinfo?(jdemooij)
It's getting late in release cycle for Beta 39, we're about to go into Beta 4. Can still take a fix this week if it is verified first on nightly.

Do we know if 40 or 41 are affected?
Flags: qe-verify?
Yes; I found some on 41.
Crash Signature: , int>(js::ExclusiveContext*, js::gc::AllocKind, unsigned __int64) ] [@ js::gc::GCRuntime::checkAllocatorState<int>(JSContext*, js::gc::AllocKind) ] → , int>(js::ExclusiveContext*, js::gc::AllocKind, unsigned __int64) ] [@ js::gc::GCRuntime::checkAllocatorState<int>(JSContext*, js::gc::AllocKind) ] [@ js::gc::GCRuntime::tryNewTenuredThing<JSFatInlineString, int>(js::ExclusiveContext*, js::gc::AllocKin…
Crash Signature: , js::gc::AllocKind, unsigned int) ] → , js::gc::AllocKind, unsigned int) ] [@ js::Allocate<JSFatInlineString, int>(js::ExclusiveContext*) ]
Attached patch PatchSplinter Review
This is basically what I described in comment 48: a new macro to check for the overrecursion that doesn't report the error. Then we can return NS_ERROR_FAILURE when that happens.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8620953 - Flags: review?(bzbarsky)
Attachment #8556006 - Attachment is obsolete: true
Comment on attachment 8620953 [details] [diff] [review]
Patch

r=me
Attachment #8620953 - Flags: review?(bzbarsky) → review+
Crash Signature: [@ NewStringDeflated<int> ] [@ js::gc::CheckAllocatorState<int> ] [@ GCIfNeeded ] [@ js::NewStringCopyN<int, wchar_t>(js::ExclusiveContext*, wchar_t const*, unsigned int) ] [@ js::gc::GCRuntime::tryNewTenuredThing<JSFatInlineString, int>(js::Exclusive… → [@ NewStringDeflated<int> ] [@ js::gc::CheckAllocatorState<int> ] [@ GCIfNeeded ] [@ js::NewStringCopyN<int, wchar_t>(js::ExclusiveContext*, wchar_t const*, unsigned int) ] [@ js::NewStringCopyN<T>(js::ExclusiveContext*, wchar_t const*, unsigned int) …
Comment on attachment 8620953 [details] [diff] [review]
Patch

Approval Request Comment
[Feature/regressing bug #]: Bug 1107777 or bug 1053999.
[User impact if declined]: Topcrash.
[Describe test coverage new/current, TreeHerder]: On m-i.
[Risks and why]: Low risk.
[String/UUID change made/needed]: None.
Attachment #8620953 - Flags: approval-mozilla-beta?
Attachment #8620953 - Flags: approval-mozilla-aurora?
Waiting for this to be green on m-c before uplift to 39 and 40.
https://hg.mozilla.org/mozilla-central/rev/4b42c4d4e4a7
Status: ASSIGNED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Comment on attachment 8620953 [details] [diff] [review]
Patch

Approved for uplift to aurora and beta. May help fix a startup crash.
Attachment #8620953 - Flags: approval-mozilla-beta?
Attachment #8620953 - Flags: approval-mozilla-beta+
Attachment #8620953 - Flags: approval-mozilla-aurora?
Attachment #8620953 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.