Closed
Bug 663057
Opened 14 years ago
Closed 13 years ago
support RFC2231/5987 encoding for title parameter in HTTP link header fields
Categories
(Core :: DOM: Core & HTML, defect)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
mozilla15
People
(Reporter: julian.reschke, Assigned: julian.reschke)
References
(Depends on 1 open bug, )
Details
(Keywords: addon-compat, dev-doc-needed)
Attachments
(2 files, 11 obsolete files)
18.59 KB,
patch
|
hsivonen
:
review+
|
Details | Diff | Splinter Review |
5.99 KB,
patch
|
jdm
:
review-
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Build Identifier: (mozilla-central nightly builds)
RFC 5988 states in <http://greenbytes.de/tech/webdav/rfc5988.html#rfc.section.5.4.p.5>:
"The "title*" parameter can be used to encode this label in a different character set, and/or contain language information as per [RFC5987]...."
A test case is available at <http://greenbytes.de/tech/tc/httplink/#simplecsstitle5987>
Reproducible: Always
Steps to Reproduce:
1. Run test case at <http://greenbytes.de/tech/tc/httplink/#simplecsstitle5987>
Actual Results:
no title information for the stylesheet available
Expected Results:
should extract title from title* parameter
To do this properly, we probably should refactor the RCF2231/5987 decoding in nsMIMEHeaderParamImpl so that it can be used from outside, such as the Link header field handling in nsContentSink.
Assignee | ||
Updated•14 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee | ||
Comment 1•14 years ago
|
||
(In reply to comment #0)
> To do this properly, we probably should refactor the RCF2231/5987 decoding
> in nsMIMEHeaderParamImpl so that it can be used from outside, such as the
> Link header field handling in nsContentSink.
Not trivial, as nsMIMEHeaderParamImpl operates on octets, while nsContentSink has UCS-2 characters.
Assignee | ||
Comment 2•14 years ago
|
||
Rough patch; help on making the decoder less ugly/more robust appreciated.
Assignee | ||
Comment 3•13 years ago
|
||
Attachment #551023 -
Flags: review?(bzbarsky)
Comment 4•13 years ago
|
||
I'm not likely to get to this review until after my vacation, so figure at least two weeks. May be worth asking someone else.
Assignee | ||
Comment 5•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #4)
> I'm not likely to get to this review until after my vacation, so figure at
> least two weeks. May be worth asking someone else.
No problem. This can wait, and you already have all the link-header field context present :-). This can wait a few weeks.
Assignee | ||
Comment 6•13 years ago
|
||
BTW: test cases over here...:
http://greenbytes.de/tech/tc/httplink/#simplecsstitle5987
(and subsequent tests)
Assignee | ||
Updated•13 years ago
|
OS: Windows 7 → All
Hardware: x86 → All
Comment 7•13 years ago
|
||
Julian, is there a reason to not just check for non-ascii, and if not found convert to ASCII and then use the existing code?
Or alternately, adding API for this on nsIMIMEHeaderParam and sharing the code there (e.g. via a template on char type)?
I'm really not very happy with the code duplication here.
Assignee | ||
Comment 8•13 years ago
|
||
The code in nsIMIMEHeaderParam would need a big refactoring, and options to turn off all the workarounds it currently does. I absolutely would not want to leak all the quirks into another header field.
I would propose to go there later on once we have made progress on cleaning up that code, but not right now...
(We could also ask why we have separate header field parsers for Link and Content-Disposition)
Assignee | ||
Comment 9•13 years ago
|
||
...we could also just move the function into nsIMIMEHeaderParam, so it can be re-used from other places more easily.
Assignee | ||
Comment 10•13 years ago
|
||
Comment on attachment 551023 [details] [diff] [review]
proposed patch
Refactoring the patch in order to move most of the functionality into nsMIMEHeaderParamImpl.cpp
Attachment #551023 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 11•13 years ago
|
||
Patch, work-in-progress. Moves most of the code into nsMIMEHeaderParamImpl so that it can be re-used elsewhere. C-D does *not* use it for now, because it's intentionally strict.
Attachment #550820 -
Attachment is obsolete: true
Attachment #551023 -
Attachment is obsolete: true
Assignee | ||
Comment 12•13 years ago
|
||
Implements 5987 decoding for title* in Link HTTP header fields; delegates most of the decoding to new code in nsMIMEHeaderParamImpl. The latter doesn't use it itself for now because it is designed to be strict, and thus will reject instances that the old code would accept. Also adds JS test cases.
Attachment #561279 -
Attachment is obsolete: true
Attachment #561472 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 13•13 years ago
|
||
Attachment #561472 -
Attachment is obsolete: true
Attachment #561472 -
Flags: review?(bzbarsky)
Attachment #564831 -
Flags: review?(bzbarsky)
Comment 14•13 years ago
|
||
OK. So... I'm still not that happy with the code duplication. If we want the strict behavior, then why is not not ok for existing consumers? If it's not ok for consumers, why do we want it?
I'm also not sure I understand why this is restricting the encoding to utf-8 and iso-8859-1. Does the RFC say anything about that?
Assignee | ||
Comment 15•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #14)
> OK. So... I'm still not that happy with the code duplication. If we want
> the strict behavior, then why is not not ok for existing consumers? If it's
> not ok for consumers, why do we want it?
Let's categorize strictness here:
1) Parsing the field value
2) Decoding the percent-escapes
3) Mapping the octets to characters (with respect to broken octet sequences)
4) Supported character encodings
I think existing uses of C-D should be ok with 1 through 3; not sure about 4) (see below)
> I'm also not sure I understand why this is restricting the encoding to utf-8
> and iso-8859-1. Does the RFC say anything about that?
It's because there's no point to leave this open-ended. That was actually feedback from those browser vendors who didn't support the encoding 12 months ago.
Senders previously did not know what they can rely on so we had to require certain encodings. UTF-8 was obvious, ISO-8859-1 was kept because all UAs need to support it anyway. The latter is under discussion for RFC 5987bis.
For a new use of the encoding, I wanted to avoid another pointless source of interop problems, thus did not want to support more than the required encodings for Link.
How about parametrizing this, so C-D can start using the same code? (The public interface would still only support the required encodings for RFC 5987, but the C-D code would get a separate entry point).
Assignee | ||
Comment 16•13 years ago
|
||
updated patch brought up to date with fix for bug 693806
Attachment #564831 -
Attachment is obsolete: true
Attachment #564831 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 17•13 years ago
|
||
Attachment #568756 -
Attachment is obsolete: true
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → julian.reschke
Assignee | ||
Comment 18•13 years ago
|
||
Attachment #572401 -
Attachment is obsolete: true
Attachment #572607 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 19•13 years ago
|
||
Updated to apply cleanly on top of latest patch for bug 693806
Attachment #572607 -
Attachment is obsolete: true
Attachment #572607 -
Flags: review?(bzbarsky)
Attachment #575748 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 20•13 years ago
|
||
Comment on attachment 575748 [details] [diff] [review]
Proposed patch, incl test case
(patch needs to be updated)
Attachment #575748 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 21•13 years ago
|
||
Attachment #575748 -
Attachment is obsolete: true
Attachment #608982 -
Flags: review?(bzbarsky)
Comment 22•13 years ago
|
||
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806
I don't know when I'll be able to get to this. Please find another reviewer? Maybe one of the necko folks and/or sicking?
Updated•13 years ago
|
Attachment #608982 -
Flags: review?(bzbarsky) → review?(jonas)
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806
I don't know nearly enough about encodings to review this. Maybe Henri?
Attachment #608982 -
Flags: review?(jonas) → review?(hsivonen)
Updated•13 years ago
|
Attachment #608982 -
Attachment is patch: true
Assignee | ||
Updated•13 years ago
|
Attachment #608982 -
Flags: review?(hsivonen) → review?(jduell.mcbugs)
Assignee | ||
Comment 24•13 years ago
|
||
Jason has looked at my recent change to this code, so assigning to him for now (Jason: take your time: one change per release cycle is sufficient :-).
Comment 25•13 years ago
|
||
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806
Review of attachment 608982 [details] [diff] [review]:
-----------------------------------------------------------------
Henri, do you know encodings well enough to review this? I'd have to devote some time getting up to speed on this stuff to review it.
Attachment #608982 -
Flags: review?(jduell.mcbugs) → review?(hsivonen)
Comment 26•13 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #25)
> Henri, do you know encodings well enough to review this?
I'll take a look.
Target Milestone: --- → mozilla14
Version: unspecified → Trunk
Comment 27•13 years ago
|
||
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806
In general, I think it's not at all clear that it's worthwhile to support the Link header at all. It seems to lead to having to maintain a bunch of code without a clear benefit beyond making a few pages made by Hixie and Anne work.
>+bool
>+nsContentSink::Decode5987Format(PRUnichar *encoded) {
It seems bad to use PRUnichar* as an in-out parameter. It seem the caller ends up assigning into an nsAutoString anyway and this method does a string copy before returning with success. Please use nsAString& as the argument type instead of PRUnichar*.
>+ nsresult rv;
>+ nsCOMPtr<nsIMIMEHeaderParam> mimehdrpar =
>+ do_GetService(NS_MIMEHEADERPARAM_CONTRACTID, &rv);
>+ if (NS_FAILED(rv))
>+ return false;
Checking rv here is probably useless. If someone manages to override the service with something that doesn't instantiate, it's probably not worthwhile to try to keep the app running but behaving incorrectly.
(I expect the rest of this method to change substantially once the argument is nsAString&.)
>+ } else if (attr.LowerCaseEqualsLiteral("title*")) {
>+ if (titleStar.IsEmpty() && !needsUnescape) {
Could you, please, add a comment about the semantics of needsUnescape and mention that it's different kind of unescaping than what's about to happen here.
>+ * This function is purposefully picky; it will abort for all (most?)
>+ * invalid inputs. This is by design. In particular, it does not support
>+ * any character encodings other than UTF-8 and ISO-8859-1, in order
>+ * not to promote non-interoperable usage.
Please make it clear whether ISO-8859-1 means Windows-1252 per general Web convention or real ISO-8859-1 per some exception to the rule.
+1 for not supporting random encodings. (But does even ISO-8851-1 need to be supported?)
>+ AString decodeRFC5987Param(in ACString aParamVal,
>+ [optional] out ACString aLang);
Does it make sense to make string outparams optional? Won't the caller need to pass a string that could be written to anyway?
>+// percent-decode a value
>+// returns false on failure
>+bool PercentDecode(nsACString& aValue)
>+{
>+ char *c = (char *) nsMemory::Alloc(aValue.Length() + 1);
You could use moz_xmalloc and moz_free directly to avoid all the indirection.
>+ if (!c) {
>+ return false;
>+ }
No need to check for null after infallible malloc.
>+ [ // ISO-8859-1
>+ "ISO-8859-1''%A3%20rates", "\u00a3 rates", "", Cr.NS_OK ],
Maybe test some C1 range bytes.
Marking r- mainly because of the argument type of nsContentSink::Decode5987Format.
Attachment #608982 -
Flags: review?(hsivonen) → review-
Assignee | ||
Comment 28•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #27)
> ...
> >+bool
> >+nsContentSink::Decode5987Format(PRUnichar *encoded) {
>
> It seems bad to use PRUnichar* as an in-out parameter. It seem the caller
> ends up assigning into an nsAutoString anyway and this method does a string
> copy before returning with success. Please use nsAString& as the argument
> type instead of PRUnichar*.
Will do.
> >+ nsresult rv;
> >+ nsCOMPtr<nsIMIMEHeaderParam> mimehdrpar =
> >+ do_GetService(NS_MIMEHEADERPARAM_CONTRACTID, &rv);
> >+ if (NS_FAILED(rv))
> >+ return false;
>
> Checking rv here is probably useless. If someone manages to override the
> service with something that doesn't instantiate, it's probably not
> worthwhile to try to keep the app running but behaving incorrectly.
Copied and pasted from somewhere else. I honestly don't know what the right approach is. If it's not needed, we should clean up more code that uses this pattern, though.
> (I expect the rest of this method to change substantially once the argument
> is nsAString&.)
>
> >+ } else if (attr.LowerCaseEqualsLiteral("title*")) {
> >+ if (titleStar.IsEmpty() && !needsUnescape) {
>
> Could you, please, add a comment about the semantics of needsUnescape and
> mention that it's different kind of unescaping than what's about to happen
> here.
The comment is actually on the next line, but I'm also renaming "needsUnescape" to "wasQuotedString" in order to avoid confusion.
> >+ * This function is purposefully picky; it will abort for all (most?)
> >+ * invalid inputs. This is by design. In particular, it does not support
> >+ * any character encodings other than UTF-8 and ISO-8859-1, in order
> >+ * not to promote non-interoperable usage.
>
> Please make it clear whether ISO-8859-1 means Windows-1252 per general Web
> convention or real ISO-8859-1 per some exception to the rule.
It's real ISO-8859-1; will clarify.
> +1 for not supporting random encodings. (But does even ISO-8851-1 need to be
> supported?)
According to RFC 5987 yes. According to a future revision, probably not (see <http://greenbytes.de/tech/webdav/draft-reschke-rfc5987bis-issues.html#issue.iso-8859-1>). It appears it's best to simplify things ASAP and to only support UTF-8 here.
> >+ AString decodeRFC5987Param(in ACString aParamVal,
> >+ [optional] out ACString aLang);
>
> Does it make sense to make string outparams optional? Won't the caller need
> to pass a string that could be written to anyway?
I don't know. Will remove the flag.
> >+// percent-decode a value
> >+// returns false on failure
> >+bool PercentDecode(nsACString& aValue)
> >+{
> >+ char *c = (char *) nsMemory::Alloc(aValue.Length() + 1);
>
> You could use moz_xmalloc and moz_free directly to avoid all the indirection.
I wasn't aware of that. For now I'd prefer to stick with the coding convention used in this file. Maybe we should do a cleanup patch for all of it later on.
> >+ if (!c) {
> >+ return false;
> >+ }
>
> No need to check for null after infallible malloc.
The one currently used is fallible, right?
> >+ [ // ISO-8859-1
> >+ "ISO-8859-1''%A3%20rates", "\u00a3 rates", "", Cr.NS_OK ],
>
> Maybe test some C1 range bytes.
This would go if we remove ISO-8859-1 support.
> Marking r- mainly because of the argument type of
> nsContentSink::Decode5987Format.
Thanks for the feedback so far; will try to submit a revised patch soonish.
Assignee | ||
Comment 29•13 years ago
|
||
Changes the method signature, clarifies the unescaping flag, removes support for ISO-8859-1
Attachment #608982 -
Attachment is obsolete: true
Attachment #621396 -
Flags: review?(hsivonen)
Assignee | ||
Comment 30•13 years ago
|
||
Try results: https://tbpl.mozilla.org/?tree=Try&rev=a6b8ba23e36c
Comment 31•13 years ago
|
||
Comment on attachment 621396 [details] [diff] [review]
Proposed patch, incl test cases
Looks good. Sorry about the delay.
One nit though:
> if (! IsHexDigit(c[0]) || ! IsHexDigit(c[1])) {
(and other instances)
I'm not sure for networking code, but in DOM at least we tend not to put a space after the exclamation point in cases like this.
Attachment #621396 -
Flags: review?(hsivonen) → review+
Assignee | ||
Comment 32•13 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #31)
> Comment on attachment 621396 [details] [diff] [review]
> Proposed patch, incl test cases
>
> Looks good. Sorry about the delay.
>
> One nit though:
>
> > if (! IsHexDigit(c[0]) || ! IsHexDigit(c[1])) {
> (and other instances)
>
> I'm not sure for networking code, but in DOM at least we tend not to put a
> space after the exclamation point in cases like this.
Ack; I'll change that in my patch (there are other places in the code using that pattern as well, also written by myself, but I'll not change those right now).
Assignee | ||
Comment 33•13 years ago
|
||
Patch with "! " nit fixed.
Attachment #621396 -
Attachment is obsolete: true
Attachment #625626 -
Flags: review?(hsivonen)
Comment 34•13 years ago
|
||
Comment on attachment 625626 [details] [diff] [review]
Proposed patch, incl test case
Thanks.
Attachment #625626 -
Flags: review?(hsivonen) → review+
Assignee | ||
Updated•13 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 35•13 years ago
|
||
try results: https://tbpl.mozilla.org/?tree=Try&rev=9939d414f8d8
Comment 36•13 years ago
|
||
Try looks good, so landed it.
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e58342de10e
Comment 37•13 years ago
|
||
Please clear the checkin-needed flag when landing on inbound.
Comment 38•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 39•13 years ago
|
||
Adding addon-compat and dev-doc-needed keywords because of the interface changes.
Keywords: addon-compat,
dev-doc-needed
Comment 40•13 years ago
|
||
Is "aAllowSubstitution" optional? If not, can it be, for the sake of backward add-on compatibility?
Assignee | ||
Comment 41•13 years ago
|
||
(In reply to Jorge Villalobos [:jorgev] from comment #40)
> Is "aAllowSubstitution" optional? If not, can it be, for the sake of
> backward add-on compatibility?
It probably could be made optional; if you can point me to an example where something like this is done I can give it a try.
Comment 42•13 years ago
|
||
If it's made optional, then for backwards compatibility it would have to default to true, or the sense would have to switch (aForbidSubstitution)
This is probably as close as it gets to a similar situation:
http://mxr.mozilla.org/mozilla-central/source/storage/public/mozIStorageConnection.idl#90
XMLHttpRequest.open takes an optional third boolean parameter that defaults to true, but it's really intended for JavaScript consumers.
Comment 43•13 years ago
|
||
Yeah, the simplest thing to do is to make it optional, and default to true.
That means putting "[optional]" before the argument and "[optional_argc]" before the whole method in the IDL. Then your signature will include a count of how many optional arguments (in your case, 0 or 1) were passed and if it's 0 you can use true. If it's 1, you use the passed-in value.
Assignee | ||
Comment 44•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #43)
> Yeah, the simplest thing to do is to make it optional, and default to true.
Should we open a separate bug to track this?
Assignee | ||
Comment 45•13 years ago
|
||
For some reason none of my unit tests is affected by not having the UTF8 converter change. So for now the simplest approach might be to back it out (from aurora, on which the patch applies, and trunk), and then let me worry later on about restoring the functionality I was trying to achieve.
(Please let me know if I should open a separate ticket for this change)
Attachment #640760 -
Flags: review?(bzbarsky)
Comment 46•13 years ago
|
||
Yes, please open a separate ticket. It'll need separate tracking for landing on branches and such...
(And for what it's worth, review requests on closed bugs don't show up in my normal search for review requests, so I only saw this today when I got the "this has been waiting for a long time" nagmail.)
All that said, it's too late for that patch. This code has gone to beta as-is. You can't revert the change now. You _do_ need to write a patch to make it optional, however. In a separate bug. And that bug needs to have tracking flags set for 15 and 16...
Comment 47•13 years ago
|
||
Comment on attachment 640760 [details] [diff] [review]
This change backs out the changes from the UTF8 converter.
r-: binary-incompatible change on beta. Ship has sailed.
(Oh, and this binary-incompatible change initially happened without changing the iid, which was wrong.... This is why sr is required for interface changes.)
Attachment #640760 -
Flags: review?(bzbarsky) → review+
Comment 48•13 years ago
|
||
Comment on attachment 640760 [details] [diff] [review]
This change backs out the changes from the UTF8 converter.
bz's finger slipped.
Attachment #640760 -
Flags: review+ → review-
Comment 49•13 years ago
|
||
Er, yes. ;)
Assignee | ||
Comment 50•13 years ago
|
||
OK, so I:
- open a separate ticket about the incompatible change,
- and we'll need to apply it to beta/aurora/central
Right?
Comment 51•13 years ago
|
||
Yes, ideally.
Assignee | ||
Comment 52•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #51)
> Yes, ideally.
-> Bug 776399
(and sorry for the chaos I caused)
Comment 53•13 years ago
|
||
It's not a big deal. Let's just fix it. ;)
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•