Closed Bug 1658719 Opened 4 years ago Closed 2 years ago

Crash in [@ nsTArray_Impl<T>::AppendElementsInternal<T> | nsHtml5TreeBuilder::startTag]

Categories

(Core :: DOM: HTML Parser, defect, P2)

Unspecified
All
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: sg, Unassigned)

References

Details

(Keywords: crash, csectype-uaf, sec-high)

Crash Data

This bug is for crash report bp-f39e68b3-6465-406a-9257-3be770200812.

Top 10 frames of crashing thread:

0 XUL nsHtml5TreeOperation* nsTArray_Impl<nsHtml5TreeOperation, nsTArrayInfallibleAllocator>::AppendElementsInternal<nsTArrayFallibleAllocator> xpcom/ds/nsTArray.h:1727
1 XUL nsHtml5TreeBuilder::startTag parser/html/nsHtml5TreeBuilder.cpp:1235
2 XUL nsHtml5Portability::newStringFromBuffer parser/html/nsHtml5Portability.cpp:38
3 XUL nsHtml5Portability::newLocalNameFromBuffer parser/html/nsHtml5Portability.cpp:15
4 XUL int nsHtml5Tokenizer::stateLoop<nsHtml5SilentPolicy> parser/html/nsHtml5Tokenizer.cpp
5 XUL nsHtml5Tokenizer::tokenizeBuffer parser/html/nsHtml5Tokenizer.cpp:443
6 XUL XUL@0xf569ef 
7 XUL nsHtml5StreamParser::ParseAvailableData parser/html/nsHtml5StreamParser.cpp:1730
8 XUL XUL@0xf569ef 
9 XUL nsHtml5StreamParser::DoDataAvailable parser/html/nsHtml5StreamParser.cpp:1429

Looks like a UAF on the nsHtml5TreeBuilder instance to me. The actual call to AppendElement probablye comes from https://searchfox.org/mozilla-central/rev/8df04ff073d0fa70f692935c5dcddc0e23cb0117/parser/html/nsHtml5TreeBuilderCppSupplement.h#612

On platforms other than Mac OS X, there are different crashes in nsHtml5TreeBuilder::startTag, not 100% sure if that's the same issue, but likely. With VS, I see two inline frames at the top of the stack:

[Inline Frame] xul.dll!autoJArray<nsHtml5StackNode *,int>::operator[](const int index) Line 102
	at /builds/worker/checkouts/gecko/parser/html/jArray.h(102)
[Inline Frame] xul.dll!nsHtml5TreeBuilder::isInForeign() Line 4426
	at /builds/worker/checkouts/gecko/parser/html/nsHtml5TreeBuilder.cpp(4426)
xul.dll!nsHtml5TreeBuilder::startTag(nsHtml5ElementName * elementName, nsHtml5HtmlAttributes * attributes, bool selfClosing) Line 694
	at /builds/worker/checkouts/gecko/parser/html/nsHtml5TreeBuilder.cpp(694)
Group: core-security → dom-core-security

There is at least one intermittent failure that looks similar, bug 1635758. It hasn't happened in a while, though.

For the half dozen crashes I looked at, they were all happening off the main thread which seems sketchy. It does look like we have a separate HTML parser thread, but it doesn't look like the crashes are happening on that thread. For instance, in comment 0, the crash is happening on thread 26. The threads aren't labelled with their name (which seems bad, I'll look at some other crashes and file an issue on it) but assuming the thread mapping is accurate, if you look at the metadata tab, it looks like the HTML parser thread is thread 35. Thread 26 is... ProfilerChild? The 2 other crashes I checked for this were the same... the crash was on the ProfileChild thread instead of the HTML parser thread. That seems bad. But maybe there's just something weird going on with thread names and that's a red herring.

I looked on crash stats for other signatures containing nsTArray_Impl<T>::AppendElementsInternal<T> and I found a few others that looked related to the HTML parser. I looked at a few of these, and they look similar, they are also on OSX:
[@ nsTArray_Impl<T>::AppendElementsInternal<T> | nsHtml5TreeBuilder::createElement]
[@ nsTArray_Impl<T>::AppendElementsInternal<T> | nsHtml5TreeBuilder::endTag ]
[@ nsTArray_Impl<T>::AppendElementsInternal<T> | nsHtml5TreeBuilder::appendElement ]

Similarly, search for nsHtml5TreeBuilder turns up a number of signatures that look like they might be the non-OSX versions of these crashes.
[@ nsHtml5TreeBuilder::startTag ], [@ nsHtml5TreeBuilder::createElement], etc. This startTag crash is from 68 and on Android, but it is definitely on a poison value: bp-9f39dc85-f296-4b71-8fd7-dea9a0200805 For that crash, the thread is labelled as the HTML parser thread.

Henri, could you take a look? Thanks.

Flags: needinfo?(hsivonen)

(In reply to Andrew McCreight [:mccr8] from comment #1)

For the half dozen crashes I looked at, they were all happening off the main thread which seems sketchy. It does look like we have a separate HTML parser thread, but it doesn't look like the crashes are happening on that thread.

We've seen another bug where the thread attribution makes no sense. (IIRC, the other one supposedly had the HTML parser running on a Stylo worker thread.)

For instance, in comment 0, the crash is happening on thread 26. The threads aren't labelled with their name (which seems bad, I'll look at some other crashes and file an issue on it) but assuming the thread mapping is accurate, if you look at the metadata tab, it looks like the HTML parser thread is thread 35. Thread 26 is... ProfilerChild? The 2 other crashes I checked for this were the same... the crash was on the ProfileChild thread instead of the HTML parser thread. That seems bad. But maybe there's just something weird going on with thread names and that's a red herring.

Additionally, all the macOS stacks that I looked at are bogus. They show a credible theme of Necko calling into the HTML parser and then stuff happening in the HTML parser, but the stack inside the HTML parser is impossible:

How reliably is the stack scan? With that kind of bogus results, how much can we trust the rest of the stack?

This startTag crash is from 68 and on Android, but it is definitely on a poison value: bp-9f39dc85-f296-4b71-8fd7-dea9a0200805 For that crash, the thread is labelled as the HTML parser thread.

This one looks real.

A substantial proportion of the macOS crashes has 0xffffffff as the high half of the crash address. Surely that's not OK. Is it a known symptom of something?

These stacks in themselves don't appear to point to where the bug is. If these stacks are even remotely correct, a UAF of the tokenizer and the tree builder seems like a good guess. That doesn't explain what frees them too early, though.

Redirecting needinfo for the earlier question of what to make of the stack scan showing impossible stacks. Benign artifact of inlining or the stack being completely bogus?

Flags: needinfo?(hsivonen) → needinfo?(continuation)

Another function for which there are several signatures which look related is nsHtml5TreeBuilder::appendToCurrentNodeAndPushElementMayFoster.

(In reply to Henri Sivonen (:hsivonen) from comment #2)

How reliably is the stack scan? With that kind of bogus results, how much can we trust the rest of the stack?

I don't know. I think Gabriele is the expert on that, though it looks like he's on PTO right now.

Flags: needinfo?(continuation) → needinfo?(gsvelto)

Stack scanning is not very reliable; if you find bogus frames it usually means it went off on a tangent following pointers that "appear" to be stack frames but aren't. That being said when stack scanning is used the important question is usually why the stack walker had to use it instead of CFI. Very often the answer is that the stack has been smashed somehow and the previous frame is corrupted; so it tries using CFI, doesn't find anything that looks like a valid frame and falls back to stack scanning. It could also be that the CFI information is wrong, but this isn't very common.

Flags: needinfo?(gsvelto)

The severity field is not set for this bug.
:jstutte, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jstutte)
Severity: -- → S2
Flags: needinfo?(jstutte)
Priority: -- → P2

This crash is almost entirely Mac-specific, and we moved users of old macOS versions to ESR. We process all ESR crash reports but only a sample of regular release channel crash reports. Chances are that the increase in crash volume isn't about the crash rate actually increasing but about the crash report processing proportion increasing.

Marking stalled due to the lack of actionable clues.

Keywords: stalled

Indeed, the table at the top shows most of the crashes on ESR on Mac.

Most of the Windows crashes attributed to this bug look very different from the Mac crash. The Windows crashes seem to be mainly about the innerHTML setter crashing.

There are some Windows stacks that look alike to the general Mac situation but with a real-looking stack:
https://crash-stats.mozilla.org/report/index/bfbc514a-20b0-4dec-9872-461880200828
https://crash-stats.mozilla.org/report/index/8343af97-6d63-4bee-9758-ac6ae0200824

These show crash address as 0xffffffff. What does that signify?

Here's a document.write one:
https://crash-stats.mozilla.org/report/index/968fab48-c649-4958-b907-cca300200827

(This one reports 0xffffff8b as the crash address.)

Flags: needinfo?(gsvelto)
See Also: → 1665411

(In reply to Henri Sivonen (:hsivonen) (away from Bugzilla until 2020-09-26) from comment #10)

These show crash address as 0xffffffff. What does that signify?

No special meaning, it's just that those are 32-bit builds and 0xffffffff is an address that gets reported if we hit it (it's usually because we did a NULL pointer access minus one).

Here's a document.write one:
https://crash-stats.mozilla.org/report/index/968fab48-c649-4958-b907-cca300200827

(This one reports 0xffffff8b as the crash address.)

I don't see anything special there but there's some evidence of stack corruption too.

Flags: needinfo?(gsvelto)
Depends on: 1676343

The [@ nsTArray_Impl<T>::AppendElementsInternal<T> | nsHtml5TreeBuilder::startTag] crashes from the last 3 months are all on ESR78. There are crashes from more recent versions for [@ nsHtml5TreeBuilder::startTag ], but they don't look too specifically scary.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Group: dom-core-security
You need to log in before you can comment on or make changes to this bug.