Closed
Bug 1070763
Opened 10 years ago
Closed 6 years ago
XHR does not sniff the BOM before decoding responseText
Categories
(Core :: DOM: Core & HTML, defect, P3)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
mozilla63
People
(Reporter: slama-h, Assigned: wisniewskit, Mentored)
References
(Blocks 1 open bug, )
Details
(Whiteboard: [lang=c++][wptsync upstream])
Attachments
(4 files, 1 obsolete file)
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0
Build ID: 20140917194002
Steps to reproduce:
Go to URL http://downloadcenter.samsung.com/content/MC/201303/20130327211500937/EN/Eng/start_here.html (User Manual for tablet Samsung Galaxy Note 8.0). User agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0.
Actual results:
User Manual's table of contents (panel on left-hand side) displays OK but content area (large panel on right-hand side) shows raw HTML preceded by 2 question marks (0x3f). See attachment Galaxy_Manual_Firefox.jpg
Expected results:
Content area should look as rendered by Google Chrome - see attachment Galaxy_Manual_Chrome.jpg
Reporter | ||
Comment 1•10 years ago
|
||
Reporter | ||
Comment 2•10 years ago
|
||
Note 1: Not sure about Version: 9 Branch and other keywords; I'm not a developer.
Note 2: The error also occurs with Firefox for Android (32.0.1) on tablet Samsung Galaxy Note 8.0 (GT-N5110) with Android 4.4.2. Chrome for Android renders properly.
Severity: normal → minor
Comment 3•10 years ago
|
||
I can confirm this with Firefox32
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 4•10 years ago
|
||
This is not an HTML parser bug.
The site uses jQuery's "ajax" function to retrieve the HTML as XHR's responseText. The HTML is little-endian UTF-16 with BOM.
For some reason that I haven't figured out at this time, XHR decodes the response as a rough ASCII superset instead of sniffing the BOM and decoding as UTF-16.
Component: HTML: Parser → DOM: Core & HTML
OS: Windows 7 → All
Hardware: x86_64 → All
Version: 9 Branch → unspecified
Updated•10 years ago
|
Summary: Raw HTML displayed in main content DIV instead of being rendered → XHR decodes BOMful UTF-16 responseText as non-UTF-16
Updated•10 years ago
|
Mentor: bzbarsky
Whiteboard: [lang=c++]
Comment 5•10 years ago
|
||
Hallvors, could you check if there's wpt tests for this?
Flags: needinfo?(hsteen)
Comment 6•10 years ago
|
||
I reported this - thanks: https://github.com/w3c/web-platform-tests/issues/1586
Flags: needinfo?(hsteen)
Comment 7•9 years ago
|
||
Also a problem with UTF-8: see bug 1193947.
Summary: XHR decodes BOMful UTF-16 responseText as non-UTF-16 → XHR does not sniff the BOM before decoding responseText
Updated•9 years ago
|
Depends on: encoding_rs
Assignee | ||
Comment 10•7 years ago
|
||
I have a simple patch in the works which has XHRs sniff for a BOM and change decoders accordingly, before they actually process the response.
However, it's still not letting Firefox pass one of the tests in the WPT responsetext-decoding.htm, the one named: XMLHttpRequest: responseText decoding (text/plain %C2)
I'm honestly not sure how to fix that, nor if it's even related to this specific bug. Thoughts, Anne?
Comment 11•7 years ago
|
||
(In reply to Thomas Wisniewski from comment #10)
> I have a simple patch in the works which has XHRs sniff for a BOM and change
> decoders accordingly, before they actually process the response.
Does XHR expose detail that makes the built-in morphing of mozilla::Decoder according to BOM not suffice?
Comment 12•7 years ago
|
||
Thomas, I think that might be due to XMLHttpRequest not properly signaling EOF. I thought Henri fixed that, but maybe not for all callers?
Comment 13•7 years ago
|
||
(In reply to Anne (:annevk) from comment #12)
> Thomas, I think that might be due to XMLHttpRequest not properly signaling
> EOF. I thought Henri fixed that, but maybe not for all callers?
I've made both the EOF and the BOM bugs in XHR fix*able*, but I haven't fixed the XHR callers.
Comment 14•7 years ago
|
||
(In reply to Thomas Wisniewski from comment #10)
> I'm honestly not sure how to fix that, nor if it's even related to this
> specific bug. Thoughts, Anne?
Anne is right that that's about EOF. XHR needs to signal the EOF to the decoder, and the decoder will do the right thing.
Assignee | ||
Comment 15•7 years ago
|
||
I just tried adding this to the XHR OnStopRequest handler, but no dice (the same three failures still happen on the web platform test):
> // Signal EOF to mDecoder
> if (mDecoder) {
> Span<uint8_t> src; // empty span
> nsString dst;
> dst.SetLength(1024);
> uint32_t result;
> size_t read;
> size_t written;
> bool hadErrors;
> Tie(result, read, written, hadErrors) = mDecoder->DecodeToUTF16(src, dst, true);
> Unused << hadErrors;
> MOZ_ASSERT(read == 0, "How come an empty span was read form?");
> }
Assuming that's sufficient to send the EOF signal, then something else must be tripping up the decoder. Thoughts?
Flags: needinfo?(hsivonen)
Assignee | ||
Comment 16•7 years ago
|
||
I figured I might as well post my patch here just in case.
Comment 17•7 years ago
|
||
(In reply to Thomas Wisniewski from comment #15)
> I just tried adding this to the XHR OnStopRequest handler, but no dice (the
> same three failures still happen on the web platform test):
>
> > // Signal EOF to mDecoder
> > if (mDecoder) {
> > Span<uint8_t> src; // empty span
> > nsString dst;
> > dst.SetLength(1024);
> > uint32_t result;
> > size_t read;
> > size_t written;
> > bool hadErrors;
> > Tie(result, read, written, hadErrors) = mDecoder->DecodeToUTF16(src, dst, true);
> > Unused << hadErrors;
> > MOZ_ASSERT(read == 0, "How come an empty span was read form?");
> > }
>
> Assuming that's sufficient to send the EOF signal, then something else must
> be tripping up the decoder. Thoughts?
That should make the decoder process the eof, but your code quote doesn't show anything done with dst afterwards. Are you saying dst doesn't end up with a REPLACEMENT CHARACTER in it on the test case that has a single non-ASCII byte labeled as UTF-8?
(In reply to Thomas Wisniewski from comment #16)
> Created attachment 8912078 [details] [diff] [review]
> process_bom_in_xhrs.diff
>
> I figured I might as well post my patch here just in case.
The patch seems to manage BOM sniffing manually unnecessarily. That is, I don't see the manual sniffing interacting with the XML declaration or anything like that.
If you do mDecoder = encoding->NewDecoder();, mDecoder will handle sniffing for you. After mDecoder has seen 3 or more bytes, mDecoder->Encoding() is guaranteed to return the post-sniffing encoding if you need to expose it.
Flags: needinfo?(hsivonen)
Assignee | ||
Comment 18•7 years ago
|
||
Ah, thanks, I was clearly too tired last night to really think my code through. The WPT sending just a %C2 character does indeed pass when I add this code-block to OnStopRequest instead:
> if (mDecoder) {
> Span<uint8_t> src; // empty span
> XMLHttpRequestStringWriterHelper helper(mResponseText);
> for (;;) {
> if (!helper.AddCapacity(1024)) {
> return NS_ERROR_OUT_OF_MEMORY;
> }
> auto dst = MakeSpan(helper.EndOfExistingData(), 1024);
> uint32_t result;
> size_t read;
> size_t written;
> bool hadErrors;
> Tie(result, read, written, hadErrors) =
> mDecoder->DecodeToUTF16(src, dst, true);
> Unused << hadErrors;
> MOZ_ASSERT(read == 0, "How come an empty span was read from?");
> helper.AddLength(written);
> if (result != kOutputFull) {
> break;
> }
> }
> }
Unfortunately, it still doesn't pass the case when a UTF-16BE/LE BOM is sent with no other characters (or is sent twice/doubled, like %FE%FF%FE%FF). The stream ends up getting %FD%FD instead of the BOM being stripped (or twice that if the BOM was doubled). That is, it's picking UTF-8, even if I start it off with the correct encoding.
But my patch passes these cases, so I'm not sure what's going on here. Any hints?
Flags: needinfo?(hsivonen)
Comment 19•7 years ago
|
||
(In reply to Thomas Wisniewski from comment #18)
> > auto dst = MakeSpan(helper.EndOfExistingData(), 1024);
1024 seems excessive. mDecoder->MaxUTF16BufferLength(src.Length()) should be much less than 1024. (The argument of MaxUTF16BufferLength() is misleadingly named in Encoding.h; I should fix that.)
> Unfortunately, it still doesn't pass the case when a UTF-16BE/LE BOM is sent
> with no other characters (or is sent twice/doubled, like %FE%FF%FE%FF). The
> stream ends up getting %FD%FD instead of the BOM being stripped (or twice
> that if the BOM was doubled). That is, it's picking UTF-8, even if I start
> it off with the correct encoding.
How do you instantiate mDecoder? (Should be encoding->NewDecoder().)
Flags: needinfo?(hsivonen)
Assignee | ||
Comment 20•7 years ago
|
||
>Should be encoding->NewDecoder()
Ah, that was it, thanks! (I did try that, but it turns out there were two places where we instantiate a decoder in the XHR code, and I had only changed one of them for my testing).
I'll get a patch ready ASAP.
Assignee | ||
Comment 21•7 years ago
|
||
Actually, I decided to skip writing all of that code in favor of just calling the existing XHR function AppendToResponseText which reads from the decoder (but changing that function to send DecodeToUTF16 aLast=true when given an empty source buffer as input).
That does the trick, and makes us start passing a good number of web platform tests. Curiously, two tests in encoding/unsupported-encodings.html still don't pass, and I'm not sure why:
UTF-32be with BOM should decode as windows-1252
utf-32be with BOM should decode as windows-1252
Meanwhile, the analogous tests with no BOM or with UTF-32le *do* pass:
UTF-32be with no BOM should decode as windows-1252
utf-32be with no BOM should decode as windows-1252
UTF-32LE with BOM should decode as UTF-16LE
utf-32le with BOM should decode as UTF-16LE
I'm guessing we can figure those cases out in a follow-up bug, however.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 24•7 years ago
|
||
Alright, sent my patch up for review; it seems to pass try just fine: https://treeherder.mozilla.org/#/jobs?repo=try&revision=89c3acb2372a4e6229e7e78622aa973f68d284b0
Comment 25•7 years ago
|
||
mozreview-review |
Comment on attachment 8914573 [details]
Bug 1070763 - Ensure that XHRs sniff the BOM for non-JSON responseTypes, and flush the decoder upon end-of-stream;
https://reviewboard.mozilla.org/r/185910/#review190926
r+ provided that the EOF signaling is made explicit.
::: dom/xhr/XMLHttpRequestMainThread.cpp:569
(Diff revision 1)
> size_t written;
> bool hadErrors;
> Tie(result, read, written, hadErrors) = mDecoder->DecodeToUTF16(
> AsBytes(MakeSpan(aSrcBuffer, aSrcBufferLen)),
> MakeSpan(helper.EndOfExistingData(), destBufferLen.value()),
> - false);
> + aSrcBufferLen == 0);
I don't see enough assurances that Necko won't report a zero-length segment in the middle of the stream. Please add `bool aLast = false` as an argument of `AppendToResponseText` and pass `true` explicitly on EOF instead of inferring it from the input length.
Attachment #8914573 -
Flags: review?(hsivonen) → review+
Comment 26•7 years ago
|
||
(In reply to Thomas Wisniewski from comment #21)
> Curiously, two tests in encoding/unsupported-encodings.html
> still don't pass, and I'm not sure why:
>
> UTF-32be with BOM should decode as windows-1252
> utf-32be with BOM should decode as windows-1252
What do those decode as?
> I'm guessing we can figure those cases out in a follow-up bug, however.
OK.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 29•7 years ago
|
||
Ah, good idea to just pass an explicit parameter. Done.
Requesting checkin.
Assignee: nobody → wisniewskit
Keywords: checkin-needed
Comment 30•7 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/4bac27faa19b
Ensure that XHRs sniff the BOM for non-JSON responseTypes, and flush the decoder upon end-of-stream; r=hsivonen
Keywords: checkin-needed
Comment 31•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox58:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Assignee | ||
Comment 32•7 years ago
|
||
KWierso, I'd like to request backing this patch out, as it is causing unexpected crashes (see bug 1405571) and I can't be sure whether the fix will be trivial.
Flags: needinfo?(wkocher)
Assignee | ||
Comment 33•7 years ago
|
||
Actually, I think we did find the fix, so let's not back it out after all.
Flags: needinfo?(wkocher)
Comment 34•7 years ago
|
||
The fix seems to have broken the internet :(
Assignee | ||
Comment 35•7 years ago
|
||
Yeah, I can only apologize for the bad timing (I didn't expect this much fallout to happen while I was on PTO).
The good news is that it seems like :baku may have already found the fix to unbreak things in bug 1407573.
Comment 36•7 years ago
|
||
Backout by archaeopteryx@coole-files.de:
https://hg.mozilla.org/mozilla-central/rev/f3da88fdee4a
Backed out changeset 4bac27faa19b for crashes and broken websites. r=backout a=backout
Comment 37•7 years ago
|
||
Backed out bug 1070763 and bug 1405571 for crashes and broken websites:
bug 1070763
https://hg.mozilla.org/mozilla-central/rev/f3da88fdee4ab79a2212f8d2d1efd37fb5009b58
bug 1405571
https://hg.mozilla.org/mozilla-central/rev/32fdad6bf6f0417d9b094a13e0670ddb76a085b8
See bug 1405571 and the dependencies (broken websites).
Status: RESOLVED → REOPENED
Flags: needinfo?(wisniewskit)
Resolution: FIXED → ---
Target Milestone: mozilla58 → ---
Updated•7 years ago
|
status-firefox58:
fixed → ---
Comment 38•7 years ago
|
||
It seems to me that a relanding should include a robust fix for the root cause of the CORS proxy causing double calls to OnStopRequest.
Comment 41•7 years ago
|
||
I just duped Bug 1405571 (crash fallout from this patch landing) against this bug. Please ensure the issues raised there are addressed in the re-landing, thanks.
Comment 42•7 years ago
|
||
Marking this affected for 58 based on bug 1405571.
status-firefox58:
--- → affected
Comment 43•7 years ago
|
||
Tracking this for 58 since it seems complex and I am coming across duplicates.
tracking-firefox58:
--- → +
Assignee | ||
Comment 44•7 years ago
|
||
I don't think it's necessary to track this one, since it's an old bug that hasn't been causing any major issues that I'm aware of (the real issues began when I tried to fix this, causing bug 1405571, which has been resolved for 57).
Flags: needinfo?(wisniewskit) → needinfo?(lhenry)
Comment 46•7 years ago
|
||
status-firefox59:
--- → ?
Updated•6 years ago
|
Priority: -- → P3
Comment 47•6 years ago
|
||
So I've deduced that the failure which caused this to be backed out in bug 1405571 is *not* caused by OnStopRequest being called multiple times, but rather because of these early returns in OnStartRequest:
This can happen when any of the following early-returns happen in OnStartRequest, because mDecoder is only cleared after
>NS_IMETHODIMP
>XMLHttpRequestMainThread::OnStartRequest(nsIRequest *request, nsISupports *ctxt)
>{
> // snip for brevity
>
> if (request != mChannel) {
> // Can this still happen?
> return NS_OK;
> }
>
> // Don't do anything if we have been aborted
> if (mState == XMLHttpRequest_Binding::UNSENT) {
> return NS_OK;
> }
>
> /* Apparently, Abort() should set UNSENT. See bug 361773.
> XHR2 spec says this is correct. */
> if (mFlagAborted) {
> NS_ERROR("Ugh, still getting data on an aborted XMLHttpRequest!");
>
> return NS_ERROR_UNEXPECTED;
> }
>
> // Don't do anything if we have timed out.
> if (mFlagTimedOut) {
> return NS_OK;
> }
When you realize that mDecoder is only set later in this function (during the call to DetectCharset), the reason for the crash becomes obvious: if an XHR object is re-used after hitting one of these early returns, it will have a stale mDecoder from its previous request. And of course, re-using that decoder rightly triggers the crash.
The fix for this is trivial; just nullify mDecoder during each call to open().
Unfortunately, engineering a test for the first early-return above is beyond me, but I managed to come up with a tests for the other three (I wanted to use crashtests instead of mochitests, but have no idea how to allow a crashtest to hit the network, which is necessary to trigger these cases as data, blob and file URLs won't ever hit these cases). Alternatively, I could move them into a new web platform test instead, but I'm not sure if that's the right thing to do for crashtests.
I'm doing a try-run with my rebased patch here to see what else I may have missed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0f0df0841d7bcf8cf5e8a17c89b8ddfea2939033
Comment 48•6 years ago
|
||
Using wpt is fine.
Comment 49•6 years ago
|
||
This bug makes us look bad on http://w3c-test.org/encoding/unsupported-encodings.any.html , because the test harness uses XHR for testing BOM handling.
Comment 50•6 years ago
|
||
Henri, I could use a little help here. I'm unable to grasp why my revised patch is failing the final test in the (recently-added) WPT: http://w3c-test.org/xhr/overridemimetype-edge-cases.window.html
The failure message is:
>assert_equals: expected "\ufffd" but got "\ufffd\ufffd"
It's doing an XHR to:
>const testURL = "resources/status.py?type=" + encodeURIComponent("text/plain;charset=windows-1252") + "&content=%C2%F0";
The UTF-8 decoder is being selected as expected, and indeed only 2 chars are being sent to it in AppendToResponseText, 0xC2 and 0xF0.
But the call to UTF8Decoder->DecodeToUTF16 is converting that input into two replacement characters instead of the one the test expects. (For reference, the call is also returning hadError=true, and written=1).
Flags: needinfo?(hsivonen)
Comment 51•6 years ago
|
||
Why does the patch make XHR do its own BOM sniffing instead of letting mozilla::Decoder do it?
> The UTF-8 decoder is being selected as expected, and indeed only 2 chars are being sent to it in AppendToResponseText, 0xC2 and 0xF0.
The test looks wrong. 0xC2 and 0xF0 are both UTF-8 lead bytes. There are fewer REPLACEMENT CHARACTERs than bogus bytes if there are some UTF-8 trail bytes following a lead byte such that the bogus sequence forms a prefix of what would be a valid sequence. In this case 0xC2, 0xF0 is not a prefix of a valid sequence. 0xC2 is a prefix and 0xF0 is another prefix, so there are two REPLACEMENT CHARACTERs.
Flags: needinfo?(hsivonen) → needinfo?(twisniewski)
Comment 52•6 years ago
|
||
>Why does the patch make XHR do its own BOM sniffing instead of letting mozilla::Decoder do it?
It does... I guess you mean my commit message is too vague? (All the patch really does is change to using a non-BOM-stripping Decoder, call its DecodeToUTF16 one last time when the stream is over with aLast=true, and add a line to reset the decoder on open()).
>The test looks wrong.
Ah, ok, that makes sense to me. Anne, I'll change the test as part of this patch so it expects the correct number of REPLACEMENT CHARACTERS, unless you feel that's not the way to go?
Flags: needinfo?(twisniewski) → needinfo?(annevk)
Updated•6 years ago
|
Flags: needinfo?(hsivonen)
Comment 53•6 years ago
|
||
Sounds good to me, thanks for cleaning this up! (I suspect the test ended up being wrong here due to EOF decoding handling only being recently fixed in Firefox and not yet in other implementations.)
Flags: needinfo?(annevk)
Comment 54•6 years ago
|
||
(In reply to twisniewski from comment #52)
> >Why does the patch make XHR do its own BOM sniffing instead of letting mozilla::Decoder do it?
>
> It does... I guess you mean my commit message is too vague?
I was looking at the wrong attachment. I was looking at attachment 8912078 [details] [diff] [review].
Flags: needinfo?(hsivonen)
Updated•6 years ago
|
Attachment #8912078 -
Attachment is obsolete: true
Comment 55•6 years ago
|
||
Comment 56•6 years ago
|
||
Alright, I've revised the WPT in this version of the patch. A try run seems fine: https://treeherder.mozilla.org/#/jobs?repo=try&revision=63a0269304f14a831e785d8cbaecd5841dac4e2a
That said, I could not push a fresh review to MozReview (no permission to re-open the review there), so I decided to try this fancy new Phabricator thing we have going on now.
Comment 57•6 years ago
|
||
Comment on attachment 8999002 [details]
Bug 1070763 - Ensure that XHRs sniff the BOM for non-JSON responseTypes, and flush the decoder upon end-of-stream; r?hsivonen
Henri Sivonen (:hsivonen) has approved the revision.
Attachment #8999002 -
Flags: review+
Comment 58•6 years ago
|
||
Thanks Henri! Requesting check-in. Sheriffs, please note that the Phabricator patch is the one intended to land; there was an earlier MozReview one which is now stale, and had been backed out.
Keywords: checkin-needed
Comment 59•6 years ago
|
||
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8b97d3b93472
Ensure that XHRs sniff the BOM for non-JSON responseTypes, and flush the decoder upon end-of-stream; r=hsivonen
Keywords: checkin-needed
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/12411 for changes under testing/web-platform/tests
Whiteboard: [lang=c++] → [lang=c++][wptsync upstream]
Can't merge web-platform-tests PR due to failing upstream checks:
Github PR https://github.com/web-platform-tests/wpt/pull/12411
* continuous-integration/travis-ci/pr (https://travis-ci.org/web-platform-tests/wpt/builds/414556015?utm_source=github_status&utm_medium=notification)
Whiteboard: [lang=c++][wptsync upstream] → [lang=c++][wptsync upstream error]
Whiteboard: [lang=c++][wptsync upstream error] → [lang=c++][wptsync upstream]
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Comment 63•6 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
status-firefox63:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Updated•6 years ago
|
Upstream PR merged
You need to log in
before you can comment on or make changes to this bug.
Description
•