Closed Bug 1547561 Opened 6 years ago Closed 2 years ago

Crash in [@ js::frontend::RewritingParseNodeVisitor<T>::visit]

Categories

(Core :: JavaScript Engine, defect, P1)

Unspecified
All
defect

Tracking

RESOLVED FIXED
Tracking Status
firefox67 - wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix

People

(Reporter: jseward, Assigned: jorendorff)

References

(Blocks 1 open bug)

Details

(Keywords: crash, csectype-wildptr, sec-high)

Crash Data

Attachments

(3 files)

This bug is for crash report bp-2eb348c9-c737-4cd1-ae99-94cfc0190426.

This appears at least 3 times in at least 2 different installations
in the Windows nightly 20190425094625.

Top 10 frames of crashing thread:

0 xul.dll js::frontend::RewritingParseNodeVisitor<FoldVisitor>::visit js/src/frontend/ParseNodeVisitor.h:108
1 xul.dll js::frontend::RewritingParseNodeVisitor<FoldVisitor>::visit js/src/frontend/ParseNodeVisitor.h:112
2 xul.dll js::frontend::RewritingParseNodeVisitor<FoldVisitor>::visit js/src/frontend/ParseNodeVisitor.h
3 xul.dll js::frontend::RewritingParseNodeVisitor<FoldVisitor>::visit js/src/frontend/ParseNodeVisitor.h:112
4 xul.dll js::frontend::RewritingParseNodeVisitor<FoldVisitor>::visit js/src/frontend/ParseNodeVisitor.h
5 xul.dll js::frontend::RewritingParseNodeVisitor<FoldVisitor>::visit js/src/frontend/ParseNodeVisitor.h:112
6 xul.dll js::frontend::RewritingParseNodeVisitor<FoldVisitor>::visit js/src/frontend/ParseNodeVisitor.h
7 xul.dll js::frontend::RewritingParseNodeVisitor<FoldVisitor>::visit js/src/frontend/ParseNodeVisitor.h:112
8 xul.dll js::frontend::RewritingParseNodeVisitor<FoldVisitor>::visit js/src/frontend/ParseNodeVisitor.h
9 xul.dll js::frontend::RewritingParseNodeVisitor<FoldVisitor>::visit js/src/frontend/ParseNodeVisitor.h:112

Type: task → defect
Flags: needinfo?(khyperia)

[Tracking Requested - why for this release]: this is a startup crash in beta 67.

OS: Windows 10 → All
Priority: -- → P1

Steven, this is a startup crash affecting 67; can it be investigated rapidly, please? Thanks

Flags: needinfo?(sdetar)

Sorry for the lack of response. I am investigating this; I just forgot to respond here.

Jason, is this something you could look at?

Flags: needinfo?(sdetar) → needinfo?(jorendorff)

Since Ashley is working on this, I am removing the needinfo for Jason.

Flags: needinfo?(jorendorff)

Ashley and I looked at these reports.

Ashley noticed that the reports with Reason=EXCEPTION_BREAKPOINT have "MOZ_CRASH(invalid node kind)".

This is a similar case to the GC crashes we see pretty regularly: RewritingParseNodeVisitor<FoldVisitor> is the first thing that walks the whole AST after parsing. Therefore, if the user has some bad RAM, or if something in another thread scribbles on some memory and happens to hit the AST, this is the code that will fail, every time.

Two possible actions.

  1. Change these assertions to MOZ_RELEASE_ASSERT.

    This should not affect performance appreciably, because the assertion should be optimized away in cases where that constructor is sufficiently inlined (maybe all call sites?).

    This asserts that the node kind is valid. Note that the pn_type field is const. If we make this change, but still get crashes with "MOZ_CRASH(invalid node kind)", then it's not a JS engine bug.

  2. Change the definition of ParseNodeKind so that 0 is an invalid value. If crashes increase in frequency, or we get more "MOZ_CRASH(invalid node kind)" hits and fewer other wildptr crashes, then it's not a JS engine bug. (Both options are sketched below.)

(Also possible: some scheme where we isolate the frontend to its own process, separate from the rest of ... yeah.)
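
A rough, self-contained sketch of how the two options could look together (hypothetical, simplified stand-in types; the real definitions live in js/src/frontend/ParseNode.h, and MOZ_RELEASE_ASSERT is the real macro from mozilla/Assertions.h):

    // Sketch only: hypothetical, simplified versions of ParseNodeKind and
    // ParseNode, illustrating the two diagnostic options above.
    #include <cstdint>
    #include "mozilla/Assertions.h"  // MOZ_RELEASE_ASSERT

    // Option 2: make 0 (and other small values) invalid by starting the enum
    // at a nonzero base, so zeroed or stomped memory rarely looks like a
    // valid kind. (1000 here matches the "start (1000)" mentioned in later
    // comments; any nonzero base would do.)
    enum class ParseNodeKind : uint16_t {
      Start = 1000,
      Function = Start,
      Statement,
      // ...
      Limit
    };

    class ParseNode {
      const ParseNodeKind pn_type;  // const: cannot legally change after construction

     public:
      explicit ParseNode(ParseNodeKind kind) : pn_type(kind) {
        // Option 1: assert validity even in release builds. If this never
        // fires, any later out-of-range kind means the memory was modified
        // after construction.
        MOZ_RELEASE_ASSERT(ParseNodeKind::Start <= kind && kind < ParseNodeKind::Limit,
                           "invalid node kind");
      }

      ParseNodeKind getKind() const {
        MOZ_RELEASE_ASSERT(ParseNodeKind::Start <= pn_type && pn_type < ParseNodeKind::Limit,
                           "invalid node kind");
        return pn_type;
      }
    };

With a nonzero base, a zeroed or stomped kind field falls outside [Start, Limit) and trips the release assert, turning a silent wild-pointer crash into an identifiable MOZ_CRASH message.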

Flags: needinfo?(khyperia)

I don't think this is a regression. This is the new signature for bug 1274887. These crashes used to happen in Fold; we rewrote that code to use a Visitor class, and now the Visitor class is blamed. Note the crash volume went down there when it started here.

The fact that we did a major refactor and still have crashes is consistent with these crashes not being a JS engine bug.

This is only barely actionable, and the volume of crashes looks low. pascalc, is this really a top priority for us?

Flags: needinfo?(pascalc)

About 40-50% of the crashes are wildptrs; every one I looked at is on line 108 (pn->getKind()), which implies that pn is corrupt in a bunch of cases.

Group: core-security

Volume went down; we are not going to fix it in 67, and per comment #7 this is a signature change for a crash that we have had for years. Wontfix 67.

Flags: needinfo?(pascalc)
Group: core-security → javascript-core-security

This is an experiment. We'll see if the fuzzers find anything. If nothing else,
it's good to have the invariants of the ParseNode data structure documented in
code, as assertions, separate from the parser and emitter code.

We can add more checks to this as we go. It may not be fast enough to leave on
in release builds, even in Nightly, and I don't want to push my luck. So
DEBUG-only for now.

Depends on D32244
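
A minimal sketch of what such a DEBUG-only invariant pass could look like (hypothetical node type and accessors, not the actual patch; the real checker operates on the js/src/frontend ParseNode classes):

    #ifdef DEBUG
    #include <cstdint>
    #include <vector>
    #include "mozilla/Assertions.h"

    namespace sketch {

    enum class ParseNodeKind : uint16_t { Start = 1000, Limit = 1500 };

    struct ParseNode {
      ParseNodeKind kind;
      std::vector<ParseNode*> kids;  // hypothetical uniform child list
    };

    // Walk the finished AST and assert its structural invariants, kept
    // separate from the parser and bytecode emitter. More per-kind checks
    // can be added over time.
    void CheckParseTree(const ParseNode* pn) {
      if (!pn) {
        return;
      }
      MOZ_ASSERT(ParseNodeKind::Start <= pn->kind && pn->kind < ParseNodeKind::Limit,
                 "invalid node kind");
      for (const ParseNode* kid : pn->kids) {
        CheckParseTree(kid);
      }
    }

    }  // namespace sketch
    #endif  // DEBUG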

Landing the diagnostic patches (see comment 6) but these are not expected to fix the crashes.

Keywords: leave-open

Comment on attachment 9066901 [details]
Bug 1547561 - Part 1: Change integer values used by ParseNodeKind, for extra assertiness. r?khyperia

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Extraordinarily hard. We don't know what the bug is. These are diagnostic patches.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: no backports
  • How likely is this patch to cause regressions; how much testing does it need?: no backports
Attachment #9066901 - Flags: sec-approval?
Attachment #9066902 - Flags: sec-approval?
Attachment #9066903 - Flags: sec-approval?
Assignee: nobody → jorendorff

Comment on attachment 9066901 [details]
Bug 1547561 - Part 1: Change integer values used by ParseNodeKind, for extra assertiness. r?khyperia

sec-approval+ for trunk.

Attachment #9066901 - Flags: sec-approval? → sec-approval+
Attachment #9066902 - Flags: sec-approval? → sec-approval+
Attachment #9066903 - Flags: sec-approval? → sec-approval+

The situation is: I implemented both options 1 and 2 from comment 6. Now we have to wait for Firefox 69 to go to beta and see what the crash report data says. This isn't really actionable until then.

FF69 has now hit beta and we are getting crash data.

In [1], I summarize all the JS_PARSE_NODE_ASSERT crashes on beta. The only thing that is crashing is the ParseNode::getKind() accessor. The crashes are split roughly in half between kind values above the limit and kind values below the start (1000).

[1] https://crash-stats.mozilla.com/search/?moz_crash_reason=~parse%20node&moz_crash_reason=~ParseNodeKind&release_channel=beta&product=Firefox&date=%3E%3D2019-07-11T17%3A22%3A00.000Z&date=%3C2019-07-25T17%3A22%3A00.000Z&_facets=signature&_facets=moz_crash_reason&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-moz_crash_reason

Can you take another look now that there is more data? Anything useful in there?
I'm going through some untouched-for-a-while sec-high issues to make sure we've investigated all we can before marking them stalled.

Flags: needinfo?(jorendorff)

This looks very likely to be a memory bug/stomping issue.

As Ted said, it looks like the new crashes are happening here: https://searchfox.org/mozilla-central/rev/e5327b05c822cdac24e233afa37d72c0552dbbaf/js/src/frontend/ParseNode.h#655-656

Plainly visible, however, are the asserts in the constructor of ParseNode, just above that line in the source. Those identical asserts are not firing, and the field is const, meaning the value cannot legally change between the constructor and the check in getKind. So, because we are crashing in getKind, the const value must somehow have changed, i.e. some memory stomping or other form of memory corruption, not a logic bug.

I believe this is what Jason thought was pretty likely to happen with these diagnostics; he said earlier in the thread that this would imply it is not a JS engine bug, but rather some other thread scribbling on our data, or some other crazy thing happening. I'll leave it to someone else to conclude that strongly, however.

OK. The wild write happens either (a) in the parser thread itself; or (b) in another thread.

If it's (a), then I would expect the bug to happen during some particular phase of parsing, and the crash to happen during the next phase of parsing. But we have crash reports from several phases.

If it's (b), then why does the wild write happen to land in the ParseNode arena? If it's pure coincidence, these crashes would be vanishingly rare on 64-bit, which they aren't. So what is it? Possibly a use-after-free, since LifoAlloc slabs are allocated using jemalloc.

So as an experiment, we could try changing LifoAlloc to allocate slabs using something like js::gc::MapAlignedPagesRandom. This would make (b) incredibly unlikely on 64-bit platforms, and seems like a prudent move anyway (for the same reasons as other users of that function).
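
To illustrate the idea, a conceptual sketch using plain POSIX mmap (not the actual js::gc::MapAlignedPagesRandom implementation; the helper name and address-space assumptions here are made up):

    // Sketch only: pick a random, page-aligned hint in a large 64-bit address
    // range and ask the kernel to map the chunk there. The kernel treats the
    // address as a hint and may place the mapping elsewhere, but on 64-bit the
    // result is spread across a huge range either way, so accidental collisions
    // with wild writes from other code become extremely unlikely.
    #include <sys/mman.h>
    #include <cstddef>
    #include <cstdint>
    #include <random>

    void* MapChunkAtRandomishAddress(size_t length) {
      static std::mt19937_64 rng{std::random_device{}()};
      // Assumption: a 47-bit user-space layout, as on typical x86-64 systems.
      uintptr_t hint = rng() & ((uintptr_t(1) << 46) - 1);
      hint &= ~uintptr_t(0xfff);  // page-align the hint
      void* p = mmap(reinterpret_cast<void*>(hint), length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      return p == MAP_FAILED ? nullptr : p;
    }

The real function presumably also handles larger alignments and retries, but this shows why randomizing chunk placement makes scenario (b) statistically implausible on 64-bit.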

I might suggest that if we try this, we limit it to JSContext::tempLifoAlloc to begin with, since it is already long-lived. That said, our typical LifoAlloc chunks are multiple 4kB pages, so any jemalloc benefits might not apply.

Assignee: jorendorff → nobody
Flags: needinfo?(jorendorff)

Is this a candidate for being marked stalled?

Flags: needinfo?(sdetar)

Nicolas, do you think this is something you could take a quick look at, to help us identify whether this is still actionable or stalled?

Flags: needinfo?(sdetar) → needinfo?(nicolas.b.pierron)

What is described in comment 6 suggests these crashes are similar to Bug 1263794, i.e. crashing while reading data allocated within LifoAlloc chunks. However, one of the symptoms of Bug 1263794 is that we see plenty of jemalloc poison values in the addresses, which is not the case here.

The crash addresses report -1 (~11%) and 0 (~6%) as the most frequent values. These might be a hint of logical issues.

The crash addresses also show pointer reads where the pointer is a null value with a single bit set. This is a huge hint toward memory corruption, and as such is likely to be outside the JS engine, as concluded in Bug 1263794 and in comment 22.

Using the GC allocation functions as suggested in comment 23, or the ExecutableMemoryAllocator pre-reserved space, would help figure out the corruption case, but would not change anything in the case of bad memory. It would not help much here, as only 2 crashes over the past 6 months look like jemalloc poison values.

However, many attempts to investigate LifoAlloc corruptions have failed so far (see Bug 1263794), strongly hinting towards bad memory.

I think it is ok to consider this bug stalled, but we should still try to change the backend memory allocator of LifoAlloc to investigate the evolution of the Bug 1263794 crashes.

Flags: needinfo?(nicolas.b.pierron)
Keywords: stalled

The leave-open keyword is there and there has been no activity for 6 months.
:sdetar, maybe it's time to close this bug?

Flags: needinfo?(sdetar)

(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #28)

The leave-open keyword is there and there has been no activity for 6 months.
:sdetar, maybe it's time to close this bug?

leave-open is present because patches were landed to investigate the issue without fixing it.
stalled is present to indicate that no activity is expected unless new information is added to this bug.

I think we have the right set of keywords, and I do not think this stalled bug can be closed.

Flags: needinfo?(sdetar) → needinfo?(cdenizet)

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on release (startup)
  • Top 10 content process crashes on release

For more information, please visit auto_nag documentation.

Severity: critical → S2

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
Resolution: INCOMPLETE → FIXED
Assignee: nobody → jorendorff

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Group: javascript-core-security → core-security-release
Group: core-security-release
See Also: → 1928445