Closed Bug 868369 Opened 11 years ago Closed 11 years ago

crash in MarkInternal

Categories

(Core :: JavaScript Engine, defect)

21 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla23
Tracking Status
firefox20 --- unaffected
firefox21 + fixed
firefox22 --- fixed
firefox23 --- fixed

People

(Reporter: scoobidiver, Assigned: till, NeedInfo)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [qa-])

Crash Data

Attachments

(1 file)

It started spiking in 21.0b6 (not correlated to AMD Radeon GPUs :-)) and is currently #6 top browser crasher. The regression range is:
http://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=dc33b3fb2fb0&tochange=c1453860aef9
It might be caused by bug 851788 (specific backout of bug 784294 for Beta).
Crashes in this bug are worse than those fixed by that bug.

The PushMarkStack signature I associated to bug 719114 might be also related to this bug.

Stack traces are various:
Frame 	Module 	Signature 	Source
0 	mozjs.dll 	MarkInternal<js::PropertyName> 	js/src/gc/Marking.cpp:170
1 	mozjs.dll 	JSFunction::trace 	js/src/jsfun.cpp:524
2 	mozjs.dll 	fun_trace 	js/src/jsfun.cpp:537
3 	mozjs.dll 	js::GCMarker::processMarkStackTop 	js/src/gc/Marking.cpp:1373
4 	mozjs.dll 	js::GCMarker::drainMarkStack 	js/src/gc/Marking.cpp:1426
...

Frame 	Module 	Signature 	Source
0 	mozjs.dll 	MarkInternal<js::PropertyName> 	js/src/gc/Marking.cpp:170
1 	mozjs.dll 	JSScript::markChildren 	js/src/jsscript.cpp:2742
2 	mozjs.dll 	PushMarkStack 	js/src/gc/Marking.cpp:759
3 	mozjs.dll 	MarkInternal<JSScript> 	js/src/gc/Marking.cpp:171
4 	mozjs.dll 	JSFunction::trace 	js/src/jsfun.cpp:528
5 	mozjs.dll 	fun_trace 	js/src/jsfun.cpp:537
6 	mozjs.dll 	js::GCMarker::processMarkStackTop 	js/src/gc/Marking.cpp:1373
7 	mozjs.dll 	js::GCMarker::drainMarkStack 	js/src/gc/Marking.cpp:1426
...

Frame 	Module 	Signature 	Source
0 	mozjs.dll 	MarkInternal<js::PropertyName> 	js/src/gc/Marking.cpp:170
1 	mozjs.dll 	js::Bindings::trace 	js/src/jsscript.cpp:264
2 	mozjs.dll 	JSScript::markChildren 	js/src/jsscript.cpp:2773
3 	mozjs.dll 	PushMarkStack 	js/src/gc/Marking.cpp:759
4 	mozjs.dll 	MarkInternal<JSScript> 	js/src/gc/Marking.cpp:171
5 	mozjs.dll 	JSFunction::trace 	js/src/jsfun.cpp:528
6 	mozjs.dll 	fun_trace 	js/src/jsfun.cpp:537
7 	mozjs.dll 	js::GCMarker::processMarkStackTop 	js/src/gc/Marking.cpp:1373
8 	mozjs.dll 	js::GCMarker::drainMarkStack 	js/src/gc/Marking.cpp:1426
...

Frame 	Module 	Signature 	Source
0 	mozjs.dll 	MarkInternal<js::PropertyName> 	js/src/gc/Marking.cpp:170
1 	mozjs.dll 	JSScript::markChildren 	js/src/jsscript.cpp:2742
2 	mozjs.dll 	PushArenaTyped<JSScript> 	js/src/gc/Marking.cpp:1099
3 	mozjs.dll 	js::gc::PushArena 	js/src/gc/Marking.cpp:1115
4 	mozjs.dll 	js::GCMarker::markDelayedChildren 	js/src/jsgc.cpp:1785
5 	mozjs.dll 	js::GCMarker::markDelayedChildren 	js/src/jsgc.cpp:1816
6 	mozjs.dll 	js::GCMarker::drainMarkStack 	js/src/gc/Marking.cpp:1441
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=MarkInternal%3Cjs%3A%3APropertyName%3E
More reports also at:
https://crash-stats.mozilla.com/report/list?signature=PushMarkStack
https://crash-stats.mozilla.com/report/list?signature=MarkInternal%3Cjs%3A%3AArgumentsObject%3E
Crash Signature: [@ MarkInternal<js::PropertyName>] → [@ MarkInternal<js::PropertyName>] [@ MarkInternal<js::ArgumentsObject>] [@ PushMarkStack]
Blocks: 851788
Passing this onto Till & Ccing :Waldo to get the investigation started as he helped with the backout.Hey guys, can you please help understand if the backout could be related here based on the crash stack in the description and what the next best steps are ?
Assignee: general → tschneidereit
I'm skeptical that this is a real problem. In FF23, processMarkStack went down a lot as a topcrash, while MarkInternal went up. This seems like it's probably just a change in inlining. I'll try to ping someone today about the progress of bug 803209, since that would answer these sorts of questions.
(In reply to Bill McCloskey (:billm) from comment #3)
> I'm skeptical that this is a real problem.

Are you suggesting that this is a signature swap in FF21? Or that this will be resolved in FF23 and we should deprioritize it because of that? Just want to understand what you mean by this not being a real problem.
Oh, sorry, I misread the initial comment and didn't realize it was a problem in 21. processMarkStackTop is still very high on 21, so my explanation doesn't make sense there. Please disregard comment 3.
I looked into this issue quite a bit and, unfortunately, can't see any reasons why the backout in bug 851788 should cause crashes of this (or any other) type. 

Additionally, nothing's really changed in the other builtin Array methods since the last backout of the self-hosted Array extras. Specifically, I'd expect at least one of array_map, array_sort and array_filter to cause similar issues to those that the backout seems to have caused.

Quite honestly, I don't have a good idea for how to continue here. Maybe fixing the issues with JSD is the less dangerous path at this point? I'll work on debugging this and the JSD issues over the weekend. Hopefully, one of those works out.
I don't have any better ideas, sadly.  :-(  That makes as much sense as any other tack here, I guess.
Till,Jeff : Thanks for the analysis here. 

Once we have some more data we should compare the crash-volume for this bug vs Bug 851788 and try to take the less crashier one for our final beta if not resolved.
That may involve relanding self-hosted code or taking the best path fwd based on Till's findings over the weekend..

I am not seeing anything else in the changeset that could have regressed this in beta 6 , please do have a look http://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=dc33b3fb2fb0&tochange=c1453860aef9 and see if there is anything else that may be striking here & I may have missed.

If you have any tips/testcase around reproducing this which could act as pointers to QA based on the crash stack we can get QA help here.
In order to know whether it's still bug 719114 or a regression, I compared one day of stats:
                                             21.0b5                 21.0b6
(Empty                                         27%                   22.7%)
js::GCMarker::processMarkStackTop             6.6%
PushMarkStack                                                         3.2%
js::GCMarker::processMarkStackTop                                     2.5%
MarkInternal<js::PropertyName>                                        1.6%
MarkInternal<js::ArgumentsObject>                                     0.9%
OVERALL                                       6.6%                    8.2%

PushMarkStack belongs likely to bug 719114 but I think MarkInternal signatures are a regression. Nevertheless, it's still too soon to conclude as empty signatures don't match.
(In reply to Till Schneidereit [:till] from comment #8)
> I looked into this issue quite a bit and, unfortunately, can't see any
> reasons why the backout in bug 851788 should cause crashes of this (or any
> other) type. 
> 
> Additionally, nothing's really changed in the other builtin Array methods
> since the last backout of the self-hosted Array extras. Specifically, I'd
> expect at least one of array_map, array_sort and array_filter to cause
> similar issues to those that the backout seems to have caused.
> 
> Quite honestly, I don't have a good idea for how to continue here. Maybe
> fixing the issues with JSD is the less dangerous path at this point? I'll
> work on debugging this and the JSD issues over the weekend. Hopefully, one
> of those works out.

Hey Till, was there any fwd fix options here for these specific signatures which rose in  beta 6? Else do you think re-landing of self hosted array code  may have any impact here & recommend taking that path forward? Given this is our final beta we really want to have the lowest risk/safe option here .My understanding is we will still have 851788 but looks like the crash volume for this spike is significantly higher than that,hence the alternative approach.
I compared the volume of bug 719114, bug 761081 and this bug that all contain js::GCMarker::processMarkStackTop in the stack trace between Beta 5 and 6:

                                                         21.0b5         21.0b6
(Empty                                                   17.46%         15.96%)
js::GCMarker::processMarkStackTop(js::SliceBudget&)       4.22%          1.59%
PushMarkStack                                                            1.99%
JSScript::markChildren(JSTracer*)                         1.02%          0.14%
MarkInternal<js::PropertyName>                                           0.95%
ScanRope                                                  0.71%          0.69%
ScanShape                                                                0.49%
MarkInternal<js::ArgumentsObject>                                        0.49%
ScanTypeObject                                                           0.44%
ScanLinearString                                                         0.20%
js::gc::Mark<JSAtom>                                      0.19%
MarkRange<JSObject>                                                      0.14%
ScanBaseShape                                                            0.08%
MarkInternal<js::Shape>                                                  0.07%
MarkUnbarriered<JSScript>                                 0.05%
OVERALL                                                   6.19%          7.27%

It's one point higher in Beta 6 but empty crashes are 1.5 point lower so the regression is not obvious. In addition, when Beta reaches 1.5M ADU, Beta 5 has a crash rate of 1.97% and Beta 6, 1.86%.
See also bug 851934 comment 50 for the different breakdown of JS signatures.
No longer blocks: 851788
(In reply to Scoobidiver from comment #13)
> I compared the volume of bug 719114, bug 761081 and this bug that all
> contain js::GCMarker::processMarkStackTop in the stack trace between Beta 5
> and 6:

Thanks for this analysis. One theory for a causal relationship might be that the actual regression lies somewhere else, but that an increase in crashes is caused by backing out the self-hosted code nevertheless. This might be because, in addition to other code, the native implementations of the Array extras trigger the crash, whereas the self-hosted versions did not.

This would, again, speak for re-landing the self-hosted Array extras, ideally with a fix for bug 851788. I do have a potential fix for that which I'm currently testing and will put up for review and, ideally, quick landing in a few hours.

(In reply to bhavana bajaj [:bajaj] from comment #12)
> Hey Till, was there any fwd fix options here for these specific signatures
> which rose in  beta 6? Else do you think re-landing of self hosted array
> code  may have any impact here & recommend taking that path forward? Given
> this is our final beta we really want to have the lowest risk/safe option
> here .My understanding is we will still have 851788 but looks like the crash
> volume for this spike is significantly higher than that,hence the
> alternative approach.

See the above analysis: I agree with the assessment that we'd be better of with the crashes in bug 851788 than the spike here, and have a theory that might explain how it is caused.
Pretty straight-forward and I should've added this ages ago.
Attachment #745885 - Flags: review?(jimb)
Comment on attachment 745885 [details] [diff] [review]
don't ever create JSDScripts for self-hosted scripts.

Review of attachment 745885 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, this totally seems like a good idea. _newJSDScript callers accept failure, so this should be fine.
Attachment #745885 - Flags: review?(jimb) → review+
Comment on attachment 745885 [details] [diff] [review]
don't ever create JSDScripts for self-hosted scripts.

https://hg.mozilla.org/integration/mozilla-inbound/rev/9418ada85c60

[Approval Request Comment]
Bug caused by (feature/regressing bug #): self-hosted JS
User impact if declined: increased crash-rate during debugging with Firebug (note that the fix involves re-landing the self-hosted Array extras, too)
Testing completed (on m-c, etc.): no, sadly
Risk to taking this patch (and alternatives if risky): extremely low. I checked all code paths and verified that no assumptions are broken by the early return this patch causes.
String or IDL/UUID changes made by this patch: none
Attachment #745885 - Flags: approval-mozilla-beta?
Attachment #745885 - Flags: approval-mozilla-aurora?
(In reply to Till Schneidereit [:till] from comment #17)
> Comment on attachment 745885 [details] [diff] [review]
> don't ever create JSDScripts for self-hosted scripts.
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/9418ada85c60
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): self-hosted JS
> User impact if declined: increased crash-rate during debugging with Firebug
> (note that the fix involves re-landing the self-hosted Array extras, too)
> Testing completed (on m-c, etc.): no, sadly
> Risk to taking this patch (and alternatives if risky): extremely low. I
> checked all code paths and verified that no assumptions are broken by the
> early return this patch causes.
> String or IDL/UUID changes made by this patch: none

This looks good to land on mozilla-aurora/beta.The patch here deals with Bug 851788 and obviates the need for the attachment in that bug.Please make sure to do the the needed backout of the patch " prevent jsd_SetExecutionHook from operating on self-hosted functions " in bug 851788  which is currently on aurora(FX22)/trunk(Fx23) to make sure we have similar code on all branches.

We should also note that the self hosted code will be re-landing on mozilla-beta along with the attachment here.

We should keep a close-eye on these JS signatures or anything else that may have come-up for our final beta build.
Comment on attachment 745885 [details] [diff] [review]
don't ever create JSDScripts for self-hosted scripts.

Till has done basic browser testing along with debugging with Firefug before posting the patch here.
It is understood that QA does not have reliable STR here to test this bug & :Till will be helping out with as much testing as possible in the next few days.

If there is any way QA can help here and have any pointers here please let us know, we are happy to request help here.
Attachment #745885 - Flags: approval-mozilla-beta?
Attachment #745885 - Flags: approval-mozilla-beta+
Attachment #745885 - Flags: approval-mozilla-aurora?
Attachment #745885 - Flags: approval-mozilla-aurora+
And a follow-up uplifting a utility function needed to make this actually compile ...
.. and finally, let's also fully align inbound with the changes on Aurora and Beta: https://hg.mozilla.org/integration/mozilla-inbound/rev/c99512985ee0
OS: Windows 7 → All
Hardware: x86 → All
Depends on: 784294
Target Milestone: --- → mozilla23
FWIW, the more I look at crash data for 21.0b6, it looks to me like those changes with that backout actually only did spread crashes that previously all came in with a js::GCMarker::processMarkStackTop(js::SliceBudget&) signature to a whole range of different signatures (as processMarkStackTop is significantly down in stats). That would also explain why overall crash rates remained good.
Thanks for that info. I guess that that kinda makes sense in that the allocation patterns change with self-hosting, while the total amount of allocated data should roughly stay the same. Even more hints at this being an issue somewhere else.
Blocks: 698296
Maybe I'm reading the data wrong so can someone please confirm that this is fixed?

> https://crash-stats.mozilla.com/report/list?signature=MarkInternal%3Cjs%3A%3AArgumentsObject%3E
Shows 555 crashes with Firefox 23.0b3
> https://crash-stats.mozilla.com/report/list?signature=PushMarkStack
Shows 2529 crashes with Firefox 23.0b3
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #28)
> Maybe I'm reading the data wrong so can someone please confirm that this is
> fixed?
GC crash signatures swap regularly. This bug tracked a suspicious swap (potential higher volume) in 21.0b6. It was fixed in 21.0b7 and you can't verify it.
Thanks Scoobidiver.
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: