Closed Bug 663057 Opened 13 years ago Closed 12 years ago

support RFC2231/5987 encoding for title parameter in HTTP link header fields

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla15

People

(Reporter: julian.reschke, Assigned: julian.reschke)

References

(Depends on 1 open bug, )

Details

(Keywords: addon-compat, dev-doc-needed)

Attachments

(2 files, 11 obsolete files)

18.59 KB, patch
hsivonen
: review+
Details | Diff | Splinter Review
5.99 KB, patch
jdm
: review-
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Build Identifier: (mozilla-central nightly builds)

RFC 5988 states in <http://greenbytes.de/tech/webdav/rfc5988.html#rfc.section.5.4.p.5>:

"The "title*" parameter can be used to encode this label in a different character set, and/or contain language information as per [RFC5987]...."

A test case is available at <http://greenbytes.de/tech/tc/httplink/#simplecsstitle5987>

Reproducible: Always

Steps to Reproduce:
1. Run test case at <http://greenbytes.de/tech/tc/httplink/#simplecsstitle5987>


Actual Results:  
no title information for the stylesheet available

Expected Results:  
should extract title from title* parameter

To do this properly, we probably should refactor the RCF2231/5987 decoding in nsMIMEHeaderParamImpl so that it can be used from outside, such as the Link header field handling in nsContentSink.
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #0)
> To do this properly, we probably should refactor the RCF2231/5987 decoding
> in nsMIMEHeaderParamImpl so that it can be used from outside, such as the
> Link header field handling in nsContentSink.

Not trivial, as nsMIMEHeaderParamImpl operates on octets, while nsContentSink has UCS-2 characters.
Attached patch rough patch (obsolete) — Splinter Review
Rough patch; help on making the decoder less ugly/more robust appreciated.
Attached patch proposed patch (obsolete) — Splinter Review
Attachment #551023 - Flags: review?(bzbarsky)
I'm not likely to get to this review until after my vacation, so figure at least two weeks.  May be worth asking someone else.
(In reply to Boris Zbarsky (:bz) from comment #4)
> I'm not likely to get to this review until after my vacation, so figure at
> least two weeks.  May be worth asking someone else.

No problem. This can wait, and you already have all the link-header field context present :-). This can wait a few weeks.
BTW: test cases over here...:

  http://greenbytes.de/tech/tc/httplink/#simplecsstitle5987

(and subsequent tests)
OS: Windows 7 → All
Hardware: x86 → All
Julian, is there a reason to not just check for non-ascii, and if not found convert to ASCII and then use the existing code?

Or alternately, adding API for this on nsIMIMEHeaderParam and sharing the code there (e.g. via a template on char type)?

I'm really not very happy with the code duplication here.
The code in nsIMIMEHeaderParam would need a big refactoring, and options to turn off all the workarounds it currently does. I absolutely would not want to leak all the quirks into another header field.

I would propose to go there later on once we have made progress on cleaning up that code, but not right now...

(We could also ask why we have separate header field parsers for Link and Content-Disposition)
...we could also just move the function into nsIMIMEHeaderParam, so it can be re-used from other places more easily.
Comment on attachment 551023 [details] [diff] [review]
proposed patch

Refactoring the patch in order to move most of the functionality into nsMIMEHeaderParamImpl.cpp
Attachment #551023 - Flags: review?(bzbarsky)
Attached patch proposed patch, work-in-progress (obsolete) — Splinter Review
Patch, work-in-progress. Moves most of the code into nsMIMEHeaderParamImpl so that it can be re-used elsewhere. C-D does *not* use it for now, because it's intentionally strict.
Attachment #550820 - Attachment is obsolete: true
Attachment #551023 - Attachment is obsolete: true
Attached patch proposed patch, incl test case (obsolete) — Splinter Review
Implements 5987 decoding for title* in Link HTTP header fields; delegates most of the decoding to new code in nsMIMEHeaderParamImpl. The latter doesn't use it itself for now because it is designed to be strict, and thus will reject instances that the old code would accept. Also adds JS test cases.
Attachment #561279 - Attachment is obsolete: true
Attachment #561472 - Flags: review?(bzbarsky)
Attachment #561472 - Attachment is obsolete: true
Attachment #561472 - Flags: review?(bzbarsky)
Attachment #564831 - Flags: review?(bzbarsky)
OK.  So... I'm still not that happy with the code duplication.  If we want the strict behavior, then why is not not ok for existing consumers?  If it's not ok for consumers, why do we want it?

I'm also not sure I understand why this is restricting the encoding to utf-8 and iso-8859-1.  Does the RFC say anything about that?
(In reply to Boris Zbarsky (:bz) from comment #14)
> OK.  So... I'm still not that happy with the code duplication.  If we want
> the strict behavior, then why is not not ok for existing consumers?  If it's
> not ok for consumers, why do we want it?

Let's categorize strictness here:

1) Parsing the field value
2) Decoding the percent-escapes
3) Mapping the octets to characters (with respect to broken octet sequences)
4) Supported character encodings

I think existing uses of C-D should be ok with 1 through 3; not sure about 4) (see below)

> I'm also not sure I understand why this is restricting the encoding to utf-8
> and iso-8859-1.  Does the RFC say anything about that?

It's because there's no point to leave this open-ended. That was actually feedback from those browser vendors who didn't support the encoding 12 months ago.

Senders previously did not know what they can rely on so we had to require certain encodings. UTF-8 was obvious, ISO-8859-1 was kept because all UAs need to support it anyway. The latter is under discussion for RFC 5987bis.

For a new use of the encoding, I wanted to avoid another pointless source of interop problems, thus did not want to support more than the required encodings for Link.

How about parametrizing this, so C-D can start using the same code? (The public interface would still only support the required encodings for RFC 5987, but the C-D code would get a separate entry point).
Depends on: 692414
Attached patch Proposed patch, incl test case (obsolete) — Splinter Review
updated patch brought up to date with fix for bug 693806
Attachment #564831 - Attachment is obsolete: true
Attachment #564831 - Flags: review?(bzbarsky)
Attachment #568756 - Attachment is obsolete: true
Assignee: nobody → julian.reschke
Attached patch Proposed patch, incl test case (obsolete) — Splinter Review
Attachment #572401 - Attachment is obsolete: true
Attachment #572607 - Flags: review?(bzbarsky)
Attached patch Proposed patch, incl test case (obsolete) — Splinter Review
Updated to apply cleanly on top of latest patch for bug 693806
Attachment #572607 - Attachment is obsolete: true
Attachment #572607 - Flags: review?(bzbarsky)
Attachment #575748 - Flags: review?(bzbarsky)
Comment on attachment 575748 [details] [diff] [review]
Proposed patch, incl test case

(patch needs to be updated)
Attachment #575748 - Flags: review?(bzbarsky)
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806

I don't know when I'll be able to get to this.  Please find another reviewer?  Maybe one of the necko folks and/or sicking?
Attachment #608982 - Flags: review?(bzbarsky) → review?(jonas)
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806

I don't know nearly enough about encodings to review this. Maybe Henri?
Attachment #608982 - Flags: review?(jonas) → review?(hsivonen)
Attachment #608982 - Attachment is patch: true
Attachment #608982 - Flags: review?(hsivonen) → review?(jduell.mcbugs)
Jason has looked at my recent change to this code, so assigning to him for now (Jason: take your time: one change per release cycle is sufficient :-).
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806

Review of attachment 608982 [details] [diff] [review]:
-----------------------------------------------------------------

Henri, do you know encodings well enough to review this?  I'd have to devote some time getting up to speed on this stuff to review it.
Attachment #608982 - Flags: review?(jduell.mcbugs) → review?(hsivonen)
(In reply to Jason Duell (:jduell) from comment #25)
> Henri, do you know encodings well enough to review this?

I'll take a look.
Target Milestone: --- → mozilla14
Version: unspecified → Trunk
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806

In general, I think it's not at all clear that it's worthwhile to support the Link header at all. It seems to lead to having to maintain a bunch of code without a clear benefit beyond making a few pages made by Hixie and Anne work.

>+bool
>+nsContentSink::Decode5987Format(PRUnichar *encoded) {

It seems bad to use PRUnichar* as an in-out parameter. It seem the caller ends up assigning into an nsAutoString anyway and this method does a string copy before returning with success. Please use nsAString& as the argument type instead of PRUnichar*.

>+  nsresult rv;
>+  nsCOMPtr<nsIMIMEHeaderParam> mimehdrpar =
>+  do_GetService(NS_MIMEHEADERPARAM_CONTRACTID, &rv);
>+  if (NS_FAILED(rv))
>+    return false;

Checking rv here is probably useless. If someone manages to override the service with something that doesn't instantiate, it's probably not worthwhile to try to keep the app running but behaving incorrectly.

(I expect the rest of this method to change substantially once the argument is nsAString&.)

>+          } else if (attr.LowerCaseEqualsLiteral("title*")) {
>+            if (titleStar.IsEmpty() && !needsUnescape) {

Could you, please, add a comment about the semantics of needsUnescape and mention that it's different kind of unescaping than what's about to happen here.

>+   * This function is purposefully picky; it will abort for all (most?)
>+   * invalid inputs. This is by design. In particular, it does not support
>+   * any character encodings other than UTF-8 and ISO-8859-1, in order
>+   * not to promote non-interoperable usage.

Please make it clear whether ISO-8859-1 means Windows-1252 per general Web convention or real ISO-8859-1 per some exception to the rule.

+1 for not supporting random encodings. (But does even ISO-8851-1 need to be supported?)

>+  AString decodeRFC5987Param(in ACString aParamVal, 
>+                             [optional] out ACString aLang);

Does it make sense to make string outparams optional? Won't the caller need to pass a string that could be written to anyway?

>+// percent-decode a value
>+// returns false on failure
>+bool PercentDecode(nsACString& aValue)
>+{
>+  char *c = (char *) nsMemory::Alloc(aValue.Length() + 1);

You could use moz_xmalloc and moz_free directly to avoid all the indirection.

>+  if (!c) {
>+    return false;
>+  }

No need to check for null after infallible malloc.

>+  [ // ISO-8859-1
>+    "ISO-8859-1''%A3%20rates", "\u00a3 rates", "", Cr.NS_OK ],

Maybe test some C1 range bytes.

Marking r- mainly because of the argument type of nsContentSink::Decode5987Format.
Attachment #608982 - Flags: review?(hsivonen) → review-
(In reply to Henri Sivonen (:hsivonen) from comment #27)
> ...
> >+bool
> >+nsContentSink::Decode5987Format(PRUnichar *encoded) {
> 
> It seems bad to use PRUnichar* as an in-out parameter. It seem the caller
> ends up assigning into an nsAutoString anyway and this method does a string
> copy before returning with success. Please use nsAString& as the argument
> type instead of PRUnichar*.

Will do.
 
> >+  nsresult rv;
> >+  nsCOMPtr<nsIMIMEHeaderParam> mimehdrpar =
> >+  do_GetService(NS_MIMEHEADERPARAM_CONTRACTID, &rv);
> >+  if (NS_FAILED(rv))
> >+    return false;
> 
> Checking rv here is probably useless. If someone manages to override the
> service with something that doesn't instantiate, it's probably not
> worthwhile to try to keep the app running but behaving incorrectly.

Copied and pasted from somewhere else. I honestly don't know what the right approach is. If it's not needed, we should clean up more code that uses this pattern, though.

> (I expect the rest of this method to change substantially once the argument
> is nsAString&.)
> 
> >+          } else if (attr.LowerCaseEqualsLiteral("title*")) {
> >+            if (titleStar.IsEmpty() && !needsUnescape) {
> 
> Could you, please, add a comment about the semantics of needsUnescape and
> mention that it's different kind of unescaping than what's about to happen
> here.

The comment is actually on the next line, but I'm also renaming "needsUnescape" to "wasQuotedString" in order to avoid confusion.

> >+   * This function is purposefully picky; it will abort for all (most?)
> >+   * invalid inputs. This is by design. In particular, it does not support
> >+   * any character encodings other than UTF-8 and ISO-8859-1, in order
> >+   * not to promote non-interoperable usage.
> 
> Please make it clear whether ISO-8859-1 means Windows-1252 per general Web
> convention or real ISO-8859-1 per some exception to the rule.

It's real ISO-8859-1; will clarify.

> +1 for not supporting random encodings. (But does even ISO-8851-1 need to be
> supported?)

According to RFC 5987 yes. According to a future revision, probably not (see <http://greenbytes.de/tech/webdav/draft-reschke-rfc5987bis-issues.html#issue.iso-8859-1>). It appears it's best to simplify things ASAP and to only support UTF-8 here.

> >+  AString decodeRFC5987Param(in ACString aParamVal, 
> >+                             [optional] out ACString aLang);
> 
> Does it make sense to make string outparams optional? Won't the caller need
> to pass a string that could be written to anyway?

I don't know. Will remove the flag.

> >+// percent-decode a value
> >+// returns false on failure
> >+bool PercentDecode(nsACString& aValue)
> >+{
> >+  char *c = (char *) nsMemory::Alloc(aValue.Length() + 1);
> 
> You could use moz_xmalloc and moz_free directly to avoid all the indirection.

I wasn't aware of that. For now I'd prefer to stick with the coding convention used in this file. Maybe we should do a cleanup patch for all of it later on.

> >+  if (!c) {
> >+    return false;
> >+  }
> 
> No need to check for null after infallible malloc.

The one currently used is fallible, right?

> >+  [ // ISO-8859-1
> >+    "ISO-8859-1''%A3%20rates", "\u00a3 rates", "", Cr.NS_OK ],
> 
> Maybe test some C1 range bytes.

This would go if we remove ISO-8859-1 support.

> Marking r- mainly because of the argument type of
> nsContentSink::Decode5987Format.

Thanks for the feedback so far; will try to submit a revised patch soonish.
Attached patch Proposed patch, incl test cases (obsolete) — Splinter Review
Changes the method signature, clarifies the unescaping flag, removes support for ISO-8859-1
Attachment #608982 - Attachment is obsolete: true
Attachment #621396 - Flags: review?(hsivonen)
Comment on attachment 621396 [details] [diff] [review]
Proposed patch, incl test cases

Looks good. Sorry about the delay.

One nit though:

> if (! IsHexDigit(c[0]) || ! IsHexDigit(c[1])) {
(and other instances)

I'm not sure for networking code, but in DOM at least we tend not to put a space after the exclamation point in cases like this.
Attachment #621396 - Flags: review?(hsivonen) → review+
(In reply to Henri Sivonen (:hsivonen) from comment #31)
> Comment on attachment 621396 [details] [diff] [review]
> Proposed patch, incl test cases
> 
> Looks good. Sorry about the delay.
> 
> One nit though:
> 
> > if (! IsHexDigit(c[0]) || ! IsHexDigit(c[1])) {
> (and other instances)
> 
> I'm not sure for networking code, but in DOM at least we tend not to put a
> space after the exclamation point in cases like this.

Ack; I'll change that in my patch (there are other places in the code using that pattern as well, also written by myself, but I'll not change those right now).
Patch with "! " nit fixed.
Attachment #621396 - Attachment is obsolete: true
Attachment #625626 - Flags: review?(hsivonen)
Comment on attachment 625626 [details] [diff] [review]
Proposed patch, incl test case

Thanks.
Attachment #625626 - Flags: review?(hsivonen) → review+
Keywords: checkin-needed
Please clear the checkin-needed flag when landing on inbound.
Flags: in-testsuite+
Keywords: checkin-needed
Target Milestone: mozilla14 → mozilla15
https://hg.mozilla.org/mozilla-central/rev/3e58342de10e
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Adding addon-compat and dev-doc-needed keywords because of the interface changes.
Is "aAllowSubstitution" optional? If not, can it be, for the sake of backward add-on compatibility?
(In reply to Jorge Villalobos [:jorgev] from comment #40)
> Is "aAllowSubstitution" optional? If not, can it be, for the sake of
> backward add-on compatibility?

It probably could be made optional; if you can point me to an example where something like this is done I can give it a try.
If it's made optional, then for backwards compatibility it would have to default to true, or the sense would have to switch (aForbidSubstitution)

This is probably as close as it gets to a similar situation:

http://mxr.mozilla.org/mozilla-central/source/storage/public/mozIStorageConnection.idl#90

XMLHttpRequest.open takes an optional third boolean parameter that defaults to true, but it's really intended for JavaScript consumers.
Yeah, the simplest thing to do is to make it optional, and default to true.

That means putting "[optional]" before the argument and "[optional_argc]" before the whole method in the IDL.  Then your signature will include a count of how many optional arguments (in your case, 0 or 1) were passed and if it's 0 you can use true.  If it's 1, you use the passed-in value.
(In reply to Boris Zbarsky (:bz) from comment #43)
> Yeah, the simplest thing to do is to make it optional, and default to true.

Should we open a separate bug to track this?
For some reason none of my unit tests is affected by not having the UTF8 converter change. So for now the simplest approach might be to back it out (from aurora, on which the patch applies, and trunk), and then let me worry later on about restoring the functionality I was trying to achieve.

(Please let me know if I should open a separate ticket for this change)
Attachment #640760 - Flags: review?(bzbarsky)
Yes, please open a separate ticket.  It'll need separate tracking for landing on branches and such...

(And for what it's worth, review requests on closed bugs don't show up in my normal search for review requests, so I only saw this today when I got the "this has been waiting for a long time" nagmail.)

All that said, it's too late for that patch.  This code has gone to beta as-is.  You can't revert the change now.  You _do_ need to write a patch to make it optional, however.  In a separate bug.  And that bug needs to have tracking flags set for 15 and 16...
Comment on attachment 640760 [details] [diff] [review]
This change backs out the changes from the UTF8 converter.

r-: binary-incompatible change on beta.  Ship has sailed.

(Oh, and this binary-incompatible change initially happened without changing the iid, which was wrong.... This is why sr is required for interface changes.)
Attachment #640760 - Flags: review?(bzbarsky) → review+
Comment on attachment 640760 [details] [diff] [review]
This change backs out the changes from the UTF8 converter.

bz's finger slipped.
Attachment #640760 - Flags: review+ → review-
Er, yes.  ;)
OK, so I:

- open a separate ticket about the incompatible change, 

- and we'll need to apply it to beta/aurora/central

Right?
Yes, ideally.
(In reply to Boris Zbarsky (:bz) from comment #51)
> Yes, ideally.

-> Bug 776399 

(and sorry for the chaos I caused)
It's not a big deal.  Let's just fix it.  ;)
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: