Closed Bug 1693137 Opened 3 years ago Closed 3 years ago

Crash in [@ XDRParserAtomTable<T>]

Categories

(Core :: JavaScript Engine, defect, P1)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
88 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox85 --- unaffected
firefox86 + wontfix
firefox87 + disabled
firefox88 --- fixed

People

(Reporter: aryx, Assigned: arai)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Tooru, is this a signature change?

Crash report: https://crash-stats.mozilla.org/report/index/0574eb13-cc60-4f7a-a907-25af60210215

MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(idx < storage_.size())

Top 10 frames of crashing thread:

0 xul.dll XDRParserAtomTable<js::XDR_DECODE> js/src/vm/Xdr.cpp:346
1 xul.dll js::XDRStencilDecoder::codeStencils js/src/vm/Xdr.cpp:513
2 xul.dll js::ScriptDecodeTask::parse js/src/vm/HelperThreads.cpp:811
3 xul.dll js::ParseTask::runHelperThreadTask js/src/vm/HelperThreads.cpp:617
4 xul.dll static js::HelperThread::ThreadMain js/src/vm/HelperThreads.cpp:2342
5 xul.dll static js::detail::ThreadTrampoline<void  js/src/threading/Thread.h:205
6 ucrtbase.dll thread_start<unsigned int , 1> 
7 kernel32.dll BaseThreadInitThunk 
8 ntdll.dll RtlUserThreadStart 
9 kernelbase.dll TerminateProcessOnMemoryExhaustion 
Flags: needinfo?(arai.unmht)

I'm not sure if this is signature change.

The crash is in the following entries_[index] access.

https://searchfox.org/mozilla-central/rev/5120ec68572d946bd15101cf2ee2eaf4a210724f/js/src/vm/Xdr.cpp#338-341

template <XDRMode mode>
static XDRResult XDRParserAtomTable(XDRState<mode>* xdr,
                                    frontend::BaseCompilationStencil& stencil) {
...
  for (uint32_t i = 0; i < atomCount; i++) {
    frontend::ParserAtom* entry = nullptr;
    uint32_t index;
    MOZ_TRY(xdr->codeUint32(&index));
    MOZ_TRY(XDRParserAtom(xdr, &entry));
    xdr->frontendAtoms().set(frontend::ParserAtomIndex(index), entry);
  }

https://searchfox.org/mozilla-central/rev/5120ec68572d946bd15101cf2ee2eaf4a210724f/js/src/frontend/ParserAtom.h#667-669

class ParserAtomSpanBuilder {
...
  void set(ParserAtomIndex index, const ParserAtom* atom) {
    entries_[index] = const_cast<ParserAtom*>(atom);
  }

in this report https://crash-stats.mozilla.org/report/index/0574eb13-cc60-4f7a-a907-25af60210215
the index is 0x3bb7 (rbx), that's in possible range but somewhat large.

in this report https://crash-stats.mozilla.org/report/index/4c1c6474-059d-4185-ba68-710f10210216
the index is 0x351b5 (edi), that's not-reasonable large value.

Anyway, this is caused by a corrupted XDR data, that can be one of:

  • a bug in encoder that generates invalid data
  • format mismatch between encoder/decoder
  • memory corruption before writing to the file
  • file corruption
  • memory corruption after reading from the file

then, given the similar crash happens in several installation,
I'll check the encoder/decoder.

Flags: needinfo?(arai.unmht)

after bug 1687095, the XDR data structure becomes simple, and usually small for content JS.
we could add some kind of extra check (hash, checksum, magic number), in between each fields,
so that we can verify that which part of the data is corrupted

Depends on: 1693184

Note that the Necko/JSBC files do have file hashes (256kB chunks) and we are only use Stencil-XDR for JSBC right now. Memory corruption before / after encoding or possible but we should be suspicious about these crashes.

Arai, comment 2 sounds like the right way to proceed.

For other XDR decoding errors we return BadDecode; maybe we should be doing that here instead of relying on a release-assert to save us from a huge security bug.

On the other hand, maybe the way we handle XDR errors is masking a serious underlying problem.

Severity: -- → S2
Flags: needinfo?(arai.unmht)
Priority: -- → P1

I'll address this in bug 1693184 as a first step.

Then, there's a question about how much/deeply we care/validate about the integrity of the XDR buffer.

Currently, there's almost no check, except:

  • the release-assert in Span, that catches "greater than expected" index, or "smaller than expected" length
  • buffer-size check inside fundamental XDRState methods (read/peek/align/borrow), that catches "shorter than expected" buffer

Then, things not covered are:

  • the body of pointer-free structure (ScriptStencil, ParserAtom, etc)
    • it's just borrowed from the XDR buffer, and isn't validated in any way, and not accessed until instantiation
  • "smaller than expected" index
    • that will result in reading from/writing into wrong data, but not caught until using it in instantiation/execution
  • "greater than expected" length
    • if the decode loops over the length, it will result in buffer overrun and will be caught in some case, but if the length is used just for allocation, it won't be caught

Adding marker between fields will catch the case that,

  • when the length of Span is wrong
  • when the decode loop is buggy about the number of items to read

Both of them will result in consuming wrong size of data in the XDR buffer.
If the consumed buffer isn't immediately used (this happens for borrowed data), the marker will be checked before decoding the next field,
so, such case results in soft-error.
But this doesn't solve anything about the error in pointer-free structure itself.

Adding hash/checksum for each field will catch any error after encoding, including

  • the length of span
  • wrong data in stencil structure

but it's somewhat costly, both for encoding/decoding, and also the checksum itself overlaps with JSBC's one.

Assignee: nobody → arai.unmht
Status: NEW → ASSIGNED

If we assume that XDR buffer can contain bogus data, we'll need to either:

  • add a validation step inside/after decode, for each stencil field
  • validate stencil fields during instantiation, when using each stencil field
Flags: needinfo?(arai.unmht)
Flags: needinfo?(arai.unmht)
Depends on: 1693625
Flags: needinfo?(arai.unmht)

Could we consider backing out the regressing bug in the short term or is that too messy?

Flags: needinfo?(arai.unmht)

for 88, the fix is coming (bug 1693625).
for 86/87, we can backport the patch with minor change.

unfortunately reverting the bug's patch isn't easy (it's mid of large change), and I think the underlying issue isn't caused by single bug.
will try fixing bug 1693625 shortly.

Flags: needinfo?(arai.unmht)

This signature seems to just be the new version of Bug 1423616, Bug 1407651 which generally were related to disk and memory corruption. I'll take another look at if there is anything new we can do, but I'm not sure this is much of a "regression" compared to the old code.

Attached a patch for mozilla-beta, in bug 1693625.

No crashes so far in 87.0b8.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.