Closed Bug 918742 Opened 11 years ago Closed 9 years ago

[XHR2] Implementation doesn't fix author-supplied charset correctly and adds charset parameter when author doesn't set it

Categories

(Core :: DOM: Core & HTML, defect, P3)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: hsteen, Assigned: nika, Mentored)

References

(Blocks 1 open bug, )

Details

(Keywords: dev-doc-needed)

Attachments

(1 file, 2 obsolete files)

To clarify: this is about generating the right content-type header when POSTing content with XHR, and overriding the author-supplied header correctly if it's wrong. For example, if author set 

xhr.setRequestHeader('Content-type', 'text/html;charset=shift-jis')

and we'll send UTF-8 data, we need to remove the given charset parameter and not just append ";charset=UTF-8".
Per bug 416178 this is a web compatibility problem. If an author sets a MIME type without a charset param, we should not add one and doing so breaks real sites and products out there.
Priority: P5 → P3
Summary: [XHR2] Implementation doesn't fix author-supplied charset correctly (corner cases) → [XHR2] Implementation doesn't fix author-supplied charset correctly and adds charset parameter when author doesn't set it
IIRC, bz has played with XHR+charset
Mentor: bzbarsky
This patch causes all except for 1 of the tests in the wpt linked in comment 0 to pass. The one test which doesn't pass fails to pass due to intentional non-conformance on our part (namely, we don't reword `;charset=utf-8` to `;charset=UTF-8` because it breaks things, according to the comment which implements this).

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb5c484da30d
Attachment #8654348 - Flags: review?(bzbarsky)
Assignee: nobody → michael
Oops - missed a bunch of tests.

new try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=98ee32257d27
Attachment #8654348 - Attachment is obsolete: true
Attachment #8654348 - Flags: review?(bzbarsky)
Attachment #8654865 - Flags: review?(bzbarsky)
I'm really sorry this is taking so long.  I need to make sure to read through the spec so I can check whether we actually match it...

I should be able to get to this on Tuesday.
Comment on attachment 8654865 [details] [diff] [review]
Correct handling of author-supplied charsets in XMLHttpRequest

>+            // Find the start of the actual charset declaration, skipping the

How about fixing NS_ExtractCharsetFromContentType to just return the indices you care about?  This callsite is the only consumer of those indices, as far as I can tell (and yes, I checked the addons mxr).  Followup bug is probably better for this.

We should document NS_ExtractCharsetFromContentType or better yet its IDL version to say that it finds the _last_ charset parameter, since you're depending on that here, right?  Also, add some tests for this multiple-charset-param behavior, if the wpt tests don't test it already.

Why are there remaining failures in the web platform test manifests that you changed?  I'm particularly looking at the "charset given but wrong, fix it (known MIME, bogus charset)" failure...

r=me modulo that last question
Attachment #8654865 - Flags: review?(bzbarsky) → review+
And my apologies for taking so long here...
Oh, I see.  That failure is the one you were talking about in comment 5.

> namely, we don't reword `;charset=utf-8` to `;charset=UTF-8`

So, this is an interesting case.  The spec says:

  Otherwise, if the header named `Content-Type` in author request headers has a value that
  is a valid MIME type, which has a `charset` parameter whose value is not a
  case-insensitive match for encoding, and encoding is not null, set all the `charset`
  parameters of that `Content-Type` header's value to encoding. 

So per spec and in our impl "foo/bar;charset=utf-8" is left as-is.  And per spec and in our impl "foo/bar;charset=haha" is changed to "foo/bar;charset=UTF-8".  But the interesting case is "foo/bar;charset=utf-8;charset=haha".  The spec language above finds the "charset=haha" and changes the "utf-8" to "UTF-8".  That seems odd to me; what we do (changing just the "haha" to "UTF-8") makes more sense.

Anne, is that spec language aiming for the effect it has (and if so why?), or is that just an accident of the way things got worded?
Flags: needinfo?(annevk)
I think I just went for simplicity. Happy to add another conditional: https://github.com/whatwg/xhr/commit/a1f8e140fef9e3ee8255ef58a2c71ff9d75933d2
Flags: needinfo?(annevk)
Reviewed the XMLHttpRequest/send-content-type-charset.htm test, will push the small change required to adjust to this.
Anne, thanks.

Michael, I'm still interested in whether we understand what those remaining failures are about.
Sorry for the delay, I didn't have time to work on friday, and have been out of town all weekend!

(In reply to Boris Zbarsky [:bz] from comment #8)
> Comment on attachment 8654865 [details] [diff] [review]
> Correct handling of author-supplied charsets in XMLHttpRequest
> 
> >+            // Find the start of the actual charset declaration, skipping the
> 
> How about fixing NS_ExtractCharsetFromContentType to just return the indices
> you care about?  This callsite is the only consumer of those indices, as far
> as I can tell (and yes, I checked the addons mxr).  Followup bug is probably
> better for this.
> 
> We should document NS_ExtractCharsetFromContentType or better yet its IDL
> version to say that it finds the _last_ charset parameter, since you're
> depending on that here, right?  Also, add some tests for this
> multiple-charset-param behavior, if the wpt tests don't test it already.

I'll file a follow-up bug for that. I'm pretty sure that this test checks the multiple parameter behavior: https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/XMLHttpRequest/send-content-type-charset.htm?offset=300#71-75

> Why are there remaining failures in the web platform test manifests that you
> changed?  I'm particularly looking at the "charset given but wrong, fix it
> (known MIME, bogus charset)" failure...

This is the failure you're asking about - https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/XMLHttpRequest/send-content-type-charset.htm?offset=300#61-65
It is expecting 
        "text/plain;charset=utf-8;charset=waddup" => "text/plain;charset=UTF-8;charset=UTF-8"
but we give it
        "text/plain;charset=utf-8;charset=waddup" => "text/plain;charset=utf-8;charset=UTF-8"
because we don't change case of charsets if they would be valid as lowercase.

With regard to the failures like [XMLHttpRequest: send() - charset parameter of Content-Type 7], I have no idea where they are coming from, I can't find them in the source :-/
> > I'm particularly looking at the "charset given but wrong, fix it
> > (known MIME, bogus charset)" failure...

> It is expecting 
>         "text/plain;charset=utf-8;charset=waddup" =>
> "text/plain;charset=UTF-8;charset=UTF-8"
> but we give it
>         "text/plain;charset=utf-8;charset=waddup" =>
> "text/plain;charset=utf-8;charset=UTF-8"

And that test will be fixed per Anne's spec change to expect what our implementation does.

> With regard to the failures like [XMLHttpRequest: send() - charset parameter
> of Content-Type 7], I have no idea where they are coming from, I can't find
> them in the source :-/

I recently rewrote that test case to make sure all the sub-tests had better descriptions. Is this failure possibly recorded for an earlier version of the test case where descriptions were different?
(In reply to Hallvord R. M. Steen [:hallvors] from comment #15)
> > > I'm particularly looking at the "charset given but wrong, fix it
> > > (known MIME, bogus charset)" failure...
> 
> > It is expecting 
> >         "text/plain;charset=utf-8;charset=waddup" =>
> > "text/plain;charset=UTF-8;charset=UTF-8"
> > but we give it
> >         "text/plain;charset=utf-8;charset=waddup" =>
> > "text/plain;charset=utf-8;charset=UTF-8"
> 
> And that test will be fixed per Anne's spec change to expect what our
> implementation does.
> 
> > With regard to the failures like [XMLHttpRequest: send() - charset parameter
> > of Content-Type 7], I have no idea where they are coming from, I can't find
> > them in the source :-/
> 
> I recently rewrote that test case to make sure all the sub-tests had better
> descriptions. Is this failure possibly recorded for an earlier version of
> the test case where descriptions were different?

That might be the case. I definitely don't see those failures when running the test directly. I'll try to remove them and run tests again, and we'll see if it works.
> and have been out of town all weekend!

Just so we're clear, I am expecting you to NOT work on weekends.  No need to be sorry about not working on your days off!
(It's not trivial to fix bug 918739 while you're at it, I presume?)
The attached patch fixes that bug, as far as I can tell.
Blocks: 918739
(In reply to Hallvord R. M. Steen [:hallvors] from comment #18)
> (It's not trivial to fix bug 918739 while you're at it, I presume?)

I actually think I fixed that in trying to fix this, didn't I? I removed a send-content-type-string failure.
Michael, I owe you a coffee or something :)

/me is always happy when Firefox's pass rate on XHR tests goes up, no matter how trivial the bug is :p
(In reply to Hallvord R. M. Steen [:hallvors] from comment #21)
> Michael, I owe you a coffee or something :)
> 
> /me is always happy when Firefox's pass rate on XHR tests goes up, no matter
> how trivial the bug is :p

:D
I removed the unnecessary failures - and it seems like the WPTs are still passing :)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=814879d598ab
Attachment #8654865 - Attachment is obsolete: true
Excellent.  If you get a chance, what's up with the remaining "send() - Content-Type 1" and "send() - Content-Type 2" failures?  Also the remaining "setRequestHeader() - Content-Type header ()" failure?
(In reply to Boris Zbarsky [:bz] from comment #24)
> Excellent.  If you get a chance, what's up with the remaining "send() -
> Content-Type 1" and "send() - Content-Type 2" failures?  Also the remaining
> "setRequestHeader() - Content-Type header ()" failure?

setRequestHeader() - Content-Type header () => If you call setRequestHeader("Content-Type", "") with the empty string as the header, then our implementation will incorrectly provide a default header. We should probably open a bug to fix that one, if we don't have one already.

send() - Content-Type 1/2 => We load a data: URI with a specified charset into an iframe, and then send it over XHR - we apparently ignore the data: URI's specified content type and charset, and use application/xml and UTF-8 unconditionally (as far as I can tell).

I feel like those failures are unrelated to this bug, so I plan to just land this, and then maybe deal with those later.
That sounds like a great plan.  I just wanted to make sure we understood the remaining failures.

That said, the "send() - Content-Type 1/2" case has a test bug; per spec we should in fact always be sending "UTF-8" as the encoding.  So the only issue is that for an HTML document we still serialize as application/xml, not text/html.  Definitely followup bug fodder.
https://hg.mozilla.org/mozilla-central/rev/d08142841263
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Ugh, I had an addon that was (erroneously, presumably) depending on the automatic charset addition, and there was no heads-up of this change.
Keywords: dev-doc-needed
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: