Crash in [@static union core::option::Option<T> encoding_rs::Decoder::max_utf16_buffer_length] with moz crash reason "Must not use a decoder that has finished."

RESOLVED FIXED in Firefox 63

Status

()

defect
--
critical
RESOLVED FIXED
11 months ago
11 months ago

People

(Reporter: philipp, Assigned: hsivonen)

Tracking

({crash, regression})

63 Branch
mozilla63
Unspecified
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox61 unaffected, firefox62 unaffected, firefox63 fixed)

Details

(crash signature)

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is
report bp-55d515c7-c65e-49bc-bd21-c19ef0180811.
=============================================================

starting in yesterday's 63.0a1 nightly build 20180810220115 a number of content crash reports with an unsymbolized signature/stack and with Moz crash reason "Must not use a decoder that has finished." are showing up:
https://crash-stats.mozilla.com/search/?moz_crash_reason=~Must%20not%20use%20a%20decoder%20that%20has%20finished.&version=63.0a1&date=%3E%3D2018-07-01#facet-signature
here's the pushlog of patches going into that nightly build:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d999fb&tochange=9a6728
hi, might this be related to the work in bug 1070763?
Flags: needinfo?(hsivonen)
Crash Signature: xul.dll@0x330eb05 | xul.dll@0x9a0579 | xul.dll@0x330ef8b | xul.dll@0x37cb8b ...] → xul.dll@0x330eb05 | xul.dll@0x9a0579 | xul.dll@0x330ef8b | xul.dll@0x37cb8b ...] [@ xul.dll@0x103062a | xul.dll@0x4450dff | xul.dll@0x4452b0e | xul.dll@0xb1f20b | xul.dll@0xb0fa27 | mozglue.dll@0xd166] [@ libmozglue.dylib@0xf2c5] [@ xul.dll@0x1023cba …
(In reply to [:philipp] from comment #2)
> hi, might this be related to the work in bug 1070763?

Most likely, yes.

twisniewski, based on the crash, it looks like we're calling into the decoder after calling it with last=true such that the last=true call has returned kInputEmpty.
Blocks: 1070763
Flags: needinfo?(hsivonen) → needinfo?(wisniewskit)
Crash Signature: mozglue.dll@0xd146] → mozglue.dll@0xd146] [@ static union core::option::Option<T> encoding_rs::Decoder::max_utf16_buffer_length]
Try run to check that the fix to paper over multiple OnStopRequests doesn't cause other badness:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=49ffe91db32f2f85f4da211dc92e23fb726ffadc
Crash Signature: [@ mozalloc_abort | abort | XUL@0x466ee68 | XUL@0x466ee58 | XUL@0x466c358 | XUL@0x40f60b7 | XUL@0x410584a | XUL@0x40f5bc8 | XUL@0x23905a6 | XUL@0x2390929 | XUL@0x23a1242 | XUL@0x2274893 | XUL@0x239bfb0 | XUL@0x1bc2f41 | XUL@0x1bc3627 | XUL@0x1bbc886 | XUL… → [@ static union core::option::Option<T> encoding_rs::Decoder::max_utf16_buffer_length] [@ encoding_rs::Decoder::max_utf16_buffer_length] [@ mozalloc_abort | abort | XUL@0x466ee68 | XUL@0x466ee58 | XUL@0x466c358 | XUL@0x40f60b7 | XUL@0x410584a | XUL@0x40…
Summary: Unsymbolized XUL crashes with Moz crash reason "Must not use a decoder that has finished." → Crash in [@static union core::option::Option<T> encoding_rs::Decoder::max_utf16_buffer_length] with moz crash reason "Must not use a decoder that has finished."
It would probably be cleaner to track down the cause of multiple OnStopRequests and fix that, but let's do the easy fix to stop backing out the XHR BOM patch (which affects our wpt.fyi score on encoding tests).
Assignee: nobody → hsivonen
I agree with Henri, and think his patch will be a good stop-gap. But I may have found the actual root cause.

These are the only symbolified stack traces we have so far:
- https://crash-stats.mozilla.com/report/index/cc4bcecd-2e46-4da1-9d30-fcd340180813
- https://crash-stats.mozilla.com/report/index/f829339a-d667-4c08-ba9e-c3c6e0180813

For the second one, it's unclear what's happening, as all we know is that they are trying to read the response text in a setTimeout callback.

However, the first gives us useful clues. We hit the ChangeStateToDone here, in OnStopRequest, just after we've done the aLast=true Append:

>  if (!mResponseXML) {
>    mFlagParseBody = false;
>    if (!waitingForBlobCreation) {
>      ChangeStateToDone();
>    }

(https://hg.mozilla.org/mozilla-central/annotate/83d0673bbca49e9713f5326708cf9908f2d71764/dom/xhr/XMLHttpRequestMainThread.cpp#l2230)

The script calls GetResponseText while handling the events, which does another Append at the end of this code-fragment:

>  if ((!mResponseXML && !mErrorParsingXML) ||
>      mResponseBodyDecodedPos == mResponseBody.Length()) {
>    mResponseText.CreateSnapshot(aSnapshot);
>    return;
>  }
>  MatchCharsetAndDecoderToResponseDocument();
>  aRv = AppendToResponseText(mResponseBody.get() + mResponseBodyDecodedPos,
>                             mResponseBody.Length() - mResponseBodyDecodedPos);

(https://hg.mozilla.org/mozilla-central/annotate/83d0673bbca49e9713f5326708cf9908f2d71764/dom/xhr/XMLHttpRequestMainThread.cpp#l592)

So logically, since mResponseXML=false, that must mean mErrorParsingXML=true for the first clause to fail. And the only place mErrorParsingXML is set to true is at the end of OnStopRequest, after this code would run. The only way I can see this happening is if the user is re-using the same XHR object, which fails parsing once (empty/bad XML file), then doesn't fail, and hits this case. And indeed, mErrorParsingXML is never reset to false.

I'm trying to engineer a crashtest for this now.
Flags: needinfo?(wisniewskit)
Comment on attachment 8999640 [details]
Bug 1482640 - Null out mDecoder in OnStopRequest to avoid asserting when there is more than one OnStopRequest.

Olli Pettay [:smaug] has approved the revision.
Attachment #8999640 - Flags: review+
So in order for the if-statement to fail, this clause must also be false:

>mResponseBodyDecodedPos == mResponseBody.Length()

But those variables are reset to 0 on each send() in ResetResponse(), and I see no way for them to get out of sync unless the response is XML (which begs the question of how we even got past the !mResponseXML to get to this code in the first place).

That is to say that I'm completely mystified how the stack trace gets here. But given that the other stack trace we have involves timeouts, I think it's safe to say that it's happening in the wild regardless. Henri, I think we should also make sure to clear mErrorParsingXML in at least the ResetResponse method (if not everywhere where we set mResponseXML=nullptr).

I'll continue trying to figure out a crashtest for now, as it would be great to have test coverage for this.
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/02b3167c1a8e
Null out mDecoder in OnStopRequest to avoid asserting when there is more than one OnStopRequest. r=smaug
I'm afraid I can't reproduce a crashtest here, so I think we'd need a URL with a live test-case to try to reproduce it. I still feel we should clear out the mErrorParsingXML variable as I mentioned in comment 8, however. Henri, what do you think?
Flags: needinfo?(hsivonen)
https://hg.mozilla.org/mozilla-central/rev/02b3167c1a8e
Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Depends on: 1483764
(In reply to Thomas Wisniewski [:twisniewski] (PTO Aug 15- Sep 7) from comment #12)
> I'm afraid I can't reproduce a crashtest here, so I think we'd need a URL
> with a live test-case to try to reproduce it. I still feel we should clear
> out the mErrorParsingXML variable as I mentioned in comment 8, however.
> Henri, what do you think?

Looks like doing something in that area is needed. Let's do that in bug 1483764.
Flags: needinfo?(hsivonen)
You need to log in before you can comment on or make changes to this bug.