Closed Bug 1405571 Opened 2 years ago Closed 2 years ago

Crash in encoding_rs::Decoder::max_utf16_buffer_length

Categories

(Core :: DOM: Core & HTML, defect, critical)

58 Branch
Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 1070763
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- unaffected
firefox57 --- unaffected
firefox58 --- fixed

People

(Reporter: calixte, Assigned: wisniewskit)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [clouseau])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-80a424bb-4dfe-49b0-9b2b-8224f0171004.
=============================================================

There are 68 crashes in nightly 58 with buildid 20171003220138. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1070763.

[1] https://hg.mozilla.org/mozilla-central/rev?node=4bac27faa19b0ff556aff9a466053239c1c05c8d
Flags: needinfo?(wisniewskit)
Crash Signature: [@ encoding_rs::Decoder::max_utf16_buffer_length] → [@ encoding_rs::Decoder::max_utf16_buffer_length] [@ mozalloc_abort | abort | encoding_rs::{{impl}}::max_utf16_buffer_length]
This is a panic whose message gives the reason:
DecoderLifeCycle::Finished => panic!("Must not use a decoder that has finished."),

I.e. XHR calls into mDecoder after mDecoder has returned kInputEmpty from a call to Decode() where the last argument was true.
I can reproduce the crash with url https://dev.to/ben and in scrolling when the page is loading.
Crash Signature: [@ encoding_rs::Decoder::max_utf16_buffer_length] [@ mozalloc_abort | abort | encoding_rs::{{impl}}::max_utf16_buffer_length] → [@ encoding_rs::Decoder::max_utf16_buffer_length] [@ mozalloc_abort | abort | encoding_rs::{{impl}}::max_utf16_buffer_length] [@ mozalloc_abort | abort | encoding_rs::Decoder::max_utf16_buffer_length]
Duplicate of this bug: 1405574
Duplicate of this bug: 1405646
Henri, this is indeed because of my XHR BOM patch. It happens because MaxUTF16BufferLength crashes intentionally if the stream is finished, and there is no way for me to know if it is finished or not at the time I call it.

I'll either have to not call it at all (and just make a speculative buffer as I was doing before you suggested MaxUTF16BufferLength, since Decode* doesn't crash if the stream is finished), or change it to not crash, or add a "finished" method to the Decoder class and use that before trying to call MaxUTF16BufferLength, or we'll have to think of something else.

Thoughts?
Flags: needinfo?(wisniewskit) → needinfo?(hsivonen)
If fixing this takes time, could we please backout bug 1070763. We should keep Nightly rather stable.
Yes, I've just requested a backout in the original bug.
(In reply to Thomas Wisniewski from comment #5)
> It happens because
> MaxUTF16BufferLength crashes intentionally if the stream is finished, and
> there is no way for me to know if it is finished or not at the time I call
> it.

Why not? Doesn't OnStopRequest fire only once? The caller not having clarity on the stream having finished seems like the root problem.

> I'll either have to not call it at all (and just make a speculative buffer
> as I was doing before you suggested MaxUTF16BufferLength, since Decode*
> doesn't crash if the stream is finished),

Decode() should panic if called after the stream has finished. If it doesn't that's a bug in itself.

> or change it to not crash, or add
> a "finished" method to the Decoder class

I'd rather not add such a method, because asking the decoder if the EOF has been seen seems like hiding a problem in the calling code: That the calling code isn't clear on whether it has seen the EOF.

At the simplest, the caller could set mDecoder to nullptr after calling Decode() with the last argument set to true.
Flags: needinfo?(hsivonen)
Ah, you're indeed right: OnStopRequest is getting called twice (the second time from nsCORSListenerProxy::OnStopRequest). Your suggested fix seems to be sufficient. I'm doing a quick try-run just in case: https://treeherder.mozilla.org/#/jobs?repo=try&revision=732bd5b3d554b2a43fcc8eb2c1db449fc34fb135
The try-run seems fine, so I've requested review for the patch.
(In reply to Thomas Wisniewski from comment #9)
> Ah, you're indeed right: OnStopRequest is getting called twice (the second
> time from nsCORSListenerProxy::OnStopRequest).

On surface, that seems like the root cause bug. Why doesn't the CORS proxy ensure that it's called only once?

The patch doesn't add a null check anywhere. With the patch, what happens on the second OnStopRequest call? Does a new useless decoder get allocated so that it processes a zero-length stream and, therefore, produces no additional output?
Flags: needinfo?(wisniewskit)
The signature 'encoding_rs::Decoder::max_utf16_buffer_length' is ranked #6 in nightly top-crashers for content process with 1157 crashes.
Keywords: topcrash
Assignee: nobody → wisniewskit
>Why doesn't the CORS proxy ensure that it's called only once?

That I don't know, but I would rather make this code more robust one way or the other. But I'm certainly game to do a follow-up on why the CORS code ends up calling OnStopRequest a second time, as I've wanted to figure that code out as well.

>The patch doesn't add a null check anywhere. With the patch, what happens on the second OnStopRequest call?

The same function I added to in this patch (AppendToResponseText) already checks "NS_ENSURE_STATE(mDecoder)" when it is first called.

In addition, the only place OnStopRequest calls AppendToResponseText is in the code I added in my BOM patch, which also does a null check:

>  if (mDecoder && !mFlagParseBody) {
>    AppendToResponseText(nullptr, 0, true);
>  }

I also don't see a reason for mDecoder to become non-null again until the XHR has a new stream to decode. (With this patch, the only way I see that happening after the first OnStopRequest would be if OnDataAvailable is called after OnStopRequest. Is that ever supposed to happen?)
Flags: needinfo?(wisniewskit)
Flags: needinfo?(hsivonen)
Comment on attachment 8915330 [details]
Bug 1405571 - Only do a final flush on the XHR text decoder once;

https://reviewboard.mozilla.org/r/186522/#review191928
Attachment #8915330 - Flags: review?(hsivonen) → review+
Flags: needinfo?(hsivonen)
Alright, I'm setting checkin-needed, and will file a follow-up bug ASAP. Thanks, Henri!
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/224d8c57a469
Only do a final flush on the XHR text decoder once; r=hsivonen
Keywords: checkin-needed
Duplicate of this bug: 1406324
https://hg.mozilla.org/mozilla-central/rev/224d8c57a469
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
This is the #2 Windows topcrash in Nightly 20171006220306. It is gone in Nightly 20171007100142. Excellent!
Status: RESOLVED → VERIFIED
Depends on: 1407332
Is this back?  A single crash, but the build id is 10/10 - https://crash-stats.mozilla.com/report/index/5003719e-21a1-4b2d-b82b-124b00171011
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
There is no url with the report but it is from release channel nightly-oak which appears to be https://hg.mozilla.org/projects/oak/. I think this is just a stale project repo.
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Depends on: 1408495
Depends on: 1408697
Backout by archaeopteryx@coole-files.de:
https://hg.mozilla.org/mozilla-central/rev/32fdad6bf6f0
Backed out changeset 224d8c57a469 for crashes and broken websites. r=backout a=backout
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla58 → ---
Depends on: 1407102
Depends on: 1406785
There haven't been any more crashes with these signatures since the backout, at least on builds newer than 20171015110433.

I think we should dupe this against Bug 1070763 and it should address the issues raised in this bug when it gets re-landed. I'll leave a comment over there as well.
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1070763
As the root cause of this, bug 1070763, has been backed out, changing the status of Firefox 58 correspondingly.
Flags: needinfo?(wisniewskit)
You need to log in before you can comment on or make changes to this bug.