support RFC2231/5987 encoding for title parameter in HTTP link header fields

RESOLVED FIXED in mozilla15

Status

()

Core
DOM
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: Julian Reschke, Assigned: Julian Reschke)

Tracking

(Depends on: 1 bug, {addon-compat, dev-doc-needed})

Trunk
mozilla15
addon-compat, dev-doc-needed
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(2 attachments, 11 obsolete attachments)

18.59 KB, patch
hsivonen
: review+
Details | Diff | Splinter Review
5.99 KB, patch
jdm
: review-
Details | Diff | Splinter Review
(Assignee)

Description

6 years ago
User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Build Identifier: (mozilla-central nightly builds)

RFC 5988 states in <http://greenbytes.de/tech/webdav/rfc5988.html#rfc.section.5.4.p.5>:

"The "title*" parameter can be used to encode this label in a different character set, and/or contain language information as per [RFC5987]...."

A test case is available at <http://greenbytes.de/tech/tc/httplink/#simplecsstitle5987>

Reproducible: Always

Steps to Reproduce:
1. Run test case at <http://greenbytes.de/tech/tc/httplink/#simplecsstitle5987>


Actual Results:  
no title information for the stylesheet available

Expected Results:  
should extract title from title* parameter

To do this properly, we probably should refactor the RCF2231/5987 decoding in nsMIMEHeaderParamImpl so that it can be used from outside, such as the Link header field handling in nsContentSink.
(Assignee)

Updated

6 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Assignee)

Comment 1

6 years ago
(In reply to comment #0)
> To do this properly, we probably should refactor the RCF2231/5987 decoding
> in nsMIMEHeaderParamImpl so that it can be used from outside, such as the
> Link header field handling in nsContentSink.

Not trivial, as nsMIMEHeaderParamImpl operates on octets, while nsContentSink has UCS-2 characters.
(Assignee)

Comment 2

6 years ago
Created attachment 550820 [details] [diff] [review]
rough patch

Rough patch; help on making the decoder less ugly/more robust appreciated.
(Assignee)

Comment 3

6 years ago
Created attachment 551023 [details] [diff] [review]
proposed patch
Attachment #551023 - Flags: review?(bzbarsky)
I'm not likely to get to this review until after my vacation, so figure at least two weeks.  May be worth asking someone else.
(Assignee)

Comment 5

6 years ago
(In reply to Boris Zbarsky (:bz) from comment #4)
> I'm not likely to get to this review until after my vacation, so figure at
> least two weeks.  May be worth asking someone else.

No problem. This can wait, and you already have all the link-header field context present :-). This can wait a few weeks.
(Assignee)

Comment 6

6 years ago
BTW: test cases over here...:

  http://greenbytes.de/tech/tc/httplink/#simplecsstitle5987

(and subsequent tests)
(Assignee)

Updated

6 years ago
OS: Windows 7 → All
Hardware: x86 → All
Julian, is there a reason to not just check for non-ascii, and if not found convert to ASCII and then use the existing code?

Or alternately, adding API for this on nsIMIMEHeaderParam and sharing the code there (e.g. via a template on char type)?

I'm really not very happy with the code duplication here.
(Assignee)

Comment 8

6 years ago
The code in nsIMIMEHeaderParam would need a big refactoring, and options to turn off all the workarounds it currently does. I absolutely would not want to leak all the quirks into another header field.

I would propose to go there later on once we have made progress on cleaning up that code, but not right now...

(We could also ask why we have separate header field parsers for Link and Content-Disposition)
(Assignee)

Comment 9

6 years ago
...we could also just move the function into nsIMIMEHeaderParam, so it can be re-used from other places more easily.
(Assignee)

Comment 10

6 years ago
Comment on attachment 551023 [details] [diff] [review]
proposed patch

Refactoring the patch in order to move most of the functionality into nsMIMEHeaderParamImpl.cpp
Attachment #551023 - Flags: review?(bzbarsky)
(Assignee)

Comment 11

6 years ago
Created attachment 561279 [details] [diff] [review]
proposed patch, work-in-progress

Patch, work-in-progress. Moves most of the code into nsMIMEHeaderParamImpl so that it can be re-used elsewhere. C-D does *not* use it for now, because it's intentionally strict.
Attachment #550820 - Attachment is obsolete: true
Attachment #551023 - Attachment is obsolete: true
(Assignee)

Comment 12

6 years ago
Created attachment 561472 [details] [diff] [review]
proposed patch, incl test case

Implements 5987 decoding for title* in Link HTTP header fields; delegates most of the decoding to new code in nsMIMEHeaderParamImpl. The latter doesn't use it itself for now because it is designed to be strict, and thus will reject instances that the old code would accept. Also adds JS test cases.
Attachment #561279 - Attachment is obsolete: true
Attachment #561472 - Flags: review?(bzbarsky)
(Assignee)

Comment 13

6 years ago
Created attachment 564831 [details] [diff] [review]
proposed patch brought up-to-date with trunk, incl test cases
Attachment #561472 - Attachment is obsolete: true
Attachment #561472 - Flags: review?(bzbarsky)
Attachment #564831 - Flags: review?(bzbarsky)
OK.  So... I'm still not that happy with the code duplication.  If we want the strict behavior, then why is not not ok for existing consumers?  If it's not ok for consumers, why do we want it?

I'm also not sure I understand why this is restricting the encoding to utf-8 and iso-8859-1.  Does the RFC say anything about that?
(Assignee)

Comment 15

6 years ago
(In reply to Boris Zbarsky (:bz) from comment #14)
> OK.  So... I'm still not that happy with the code duplication.  If we want
> the strict behavior, then why is not not ok for existing consumers?  If it's
> not ok for consumers, why do we want it?

Let's categorize strictness here:

1) Parsing the field value
2) Decoding the percent-escapes
3) Mapping the octets to characters (with respect to broken octet sequences)
4) Supported character encodings

I think existing uses of C-D should be ok with 1 through 3; not sure about 4) (see below)

> I'm also not sure I understand why this is restricting the encoding to utf-8
> and iso-8859-1.  Does the RFC say anything about that?

It's because there's no point to leave this open-ended. That was actually feedback from those browser vendors who didn't support the encoding 12 months ago.

Senders previously did not know what they can rely on so we had to require certain encodings. UTF-8 was obvious, ISO-8859-1 was kept because all UAs need to support it anyway. The latter is under discussion for RFC 5987bis.

For a new use of the encoding, I wanted to avoid another pointless source of interop problems, thus did not want to support more than the required encodings for Link.

How about parametrizing this, so C-D can start using the same code? (The public interface would still only support the required encodings for RFC 5987, but the C-D code would get a separate entry point).
(Assignee)

Updated

6 years ago
Depends on: 692414
(Assignee)

Comment 16

6 years ago
Created attachment 568756 [details] [diff] [review]
Proposed patch, incl test case

updated patch brought up to date with fix for bug 693806
Attachment #564831 - Attachment is obsolete: true
Attachment #564831 - Flags: review?(bzbarsky)
(Assignee)

Comment 17

6 years ago
Created attachment 572401 [details] [diff] [review]
Proposed patch, incl test case (work-in-progress)
Attachment #568756 - Attachment is obsolete: true
(Assignee)

Updated

6 years ago
Assignee: nobody → julian.reschke
(Assignee)

Comment 18

6 years ago
Created attachment 572607 [details] [diff] [review]
Proposed patch, incl test case
Attachment #572401 - Attachment is obsolete: true
Attachment #572607 - Flags: review?(bzbarsky)
(Assignee)

Comment 19

6 years ago
Created attachment 575748 [details] [diff] [review]
Proposed patch, incl test case

Updated to apply cleanly on top of latest patch for bug 693806
Attachment #572607 - Attachment is obsolete: true
Attachment #572607 - Flags: review?(bzbarsky)
Attachment #575748 - Flags: review?(bzbarsky)
(Assignee)

Comment 20

6 years ago
Comment on attachment 575748 [details] [diff] [review]
Proposed patch, incl test case

(patch needs to be updated)
Attachment #575748 - Flags: review?(bzbarsky)
(Assignee)

Comment 21

6 years ago
Created attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806

Try build: https://tbpl.mozilla.org/?tree=Try&rev=6a3900f16bcb
Attachment #575748 - Attachment is obsolete: true
Attachment #608982 - Flags: review?(bzbarsky)
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806

I don't know when I'll be able to get to this.  Please find another reviewer?  Maybe one of the necko folks and/or sicking?
Attachment #608982 - Flags: review?(bzbarsky) → review?(jonas)
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806

I don't know nearly enough about encodings to review this. Maybe Henri?
Attachment #608982 - Flags: review?(jonas) → review?(hsivonen)
Attachment #608982 - Attachment is patch: true
(Assignee)

Updated

5 years ago
Attachment #608982 - Flags: review?(hsivonen) → review?(jduell.mcbugs)
(Assignee)

Comment 24

5 years ago
Jason has looked at my recent change to this code, so assigning to him for now (Jason: take your time: one change per release cycle is sufficient :-).
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806

Review of attachment 608982 [details] [diff] [review]:
-----------------------------------------------------------------

Henri, do you know encodings well enough to review this?  I'd have to devote some time getting up to speed on this stuff to review it.
Attachment #608982 - Flags: review?(jduell.mcbugs) → review?(hsivonen)
(In reply to Jason Duell (:jduell) from comment #25)
> Henri, do you know encodings well enough to review this?

I'll take a look.
Target Milestone: --- → mozilla14
Version: unspecified → Trunk
Comment on attachment 608982 [details] [diff] [review]
Proposed patch, incl test case, applies on top of patch for bug 693806

In general, I think it's not at all clear that it's worthwhile to support the Link header at all. It seems to lead to having to maintain a bunch of code without a clear benefit beyond making a few pages made by Hixie and Anne work.

>+bool
>+nsContentSink::Decode5987Format(PRUnichar *encoded) {

It seems bad to use PRUnichar* as an in-out parameter. It seem the caller ends up assigning into an nsAutoString anyway and this method does a string copy before returning with success. Please use nsAString& as the argument type instead of PRUnichar*.

>+  nsresult rv;
>+  nsCOMPtr<nsIMIMEHeaderParam> mimehdrpar =
>+  do_GetService(NS_MIMEHEADERPARAM_CONTRACTID, &rv);
>+  if (NS_FAILED(rv))
>+    return false;

Checking rv here is probably useless. If someone manages to override the service with something that doesn't instantiate, it's probably not worthwhile to try to keep the app running but behaving incorrectly.

(I expect the rest of this method to change substantially once the argument is nsAString&.)

>+          } else if (attr.LowerCaseEqualsLiteral("title*")) {
>+            if (titleStar.IsEmpty() && !needsUnescape) {

Could you, please, add a comment about the semantics of needsUnescape and mention that it's different kind of unescaping than what's about to happen here.

>+   * This function is purposefully picky; it will abort for all (most?)
>+   * invalid inputs. This is by design. In particular, it does not support
>+   * any character encodings other than UTF-8 and ISO-8859-1, in order
>+   * not to promote non-interoperable usage.

Please make it clear whether ISO-8859-1 means Windows-1252 per general Web convention or real ISO-8859-1 per some exception to the rule.

+1 for not supporting random encodings. (But does even ISO-8851-1 need to be supported?)

>+  AString decodeRFC5987Param(in ACString aParamVal, 
>+                             [optional] out ACString aLang);

Does it make sense to make string outparams optional? Won't the caller need to pass a string that could be written to anyway?

>+// percent-decode a value
>+// returns false on failure
>+bool PercentDecode(nsACString& aValue)
>+{
>+  char *c = (char *) nsMemory::Alloc(aValue.Length() + 1);

You could use moz_xmalloc and moz_free directly to avoid all the indirection.

>+  if (!c) {
>+    return false;
>+  }

No need to check for null after infallible malloc.

>+  [ // ISO-8859-1
>+    "ISO-8859-1''%A3%20rates", "\u00a3 rates", "", Cr.NS_OK ],

Maybe test some C1 range bytes.

Marking r- mainly because of the argument type of nsContentSink::Decode5987Format.
Attachment #608982 - Flags: review?(hsivonen) → review-
(Assignee)

Comment 28

5 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #27)
> ...
> >+bool
> >+nsContentSink::Decode5987Format(PRUnichar *encoded) {
> 
> It seems bad to use PRUnichar* as an in-out parameter. It seem the caller
> ends up assigning into an nsAutoString anyway and this method does a string
> copy before returning with success. Please use nsAString& as the argument
> type instead of PRUnichar*.

Will do.
 
> >+  nsresult rv;
> >+  nsCOMPtr<nsIMIMEHeaderParam> mimehdrpar =
> >+  do_GetService(NS_MIMEHEADERPARAM_CONTRACTID, &rv);
> >+  if (NS_FAILED(rv))
> >+    return false;
> 
> Checking rv here is probably useless. If someone manages to override the
> service with something that doesn't instantiate, it's probably not
> worthwhile to try to keep the app running but behaving incorrectly.

Copied and pasted from somewhere else. I honestly don't know what the right approach is. If it's not needed, we should clean up more code that uses this pattern, though.

> (I expect the rest of this method to change substantially once the argument
> is nsAString&.)
> 
> >+          } else if (attr.LowerCaseEqualsLiteral("title*")) {
> >+            if (titleStar.IsEmpty() && !needsUnescape) {
> 
> Could you, please, add a comment about the semantics of needsUnescape and
> mention that it's different kind of unescaping than what's about to happen
> here.

The comment is actually on the next line, but I'm also renaming "needsUnescape" to "wasQuotedString" in order to avoid confusion.

> >+   * This function is purposefully picky; it will abort for all (most?)
> >+   * invalid inputs. This is by design. In particular, it does not support
> >+   * any character encodings other than UTF-8 and ISO-8859-1, in order
> >+   * not to promote non-interoperable usage.
> 
> Please make it clear whether ISO-8859-1 means Windows-1252 per general Web
> convention or real ISO-8859-1 per some exception to the rule.

It's real ISO-8859-1; will clarify.

> +1 for not supporting random encodings. (But does even ISO-8851-1 need to be
> supported?)

According to RFC 5987 yes. According to a future revision, probably not (see <http://greenbytes.de/tech/webdav/draft-reschke-rfc5987bis-issues.html#issue.iso-8859-1>). It appears it's best to simplify things ASAP and to only support UTF-8 here.

> >+  AString decodeRFC5987Param(in ACString aParamVal, 
> >+                             [optional] out ACString aLang);
> 
> Does it make sense to make string outparams optional? Won't the caller need
> to pass a string that could be written to anyway?

I don't know. Will remove the flag.

> >+// percent-decode a value
> >+// returns false on failure
> >+bool PercentDecode(nsACString& aValue)
> >+{
> >+  char *c = (char *) nsMemory::Alloc(aValue.Length() + 1);
> 
> You could use moz_xmalloc and moz_free directly to avoid all the indirection.

I wasn't aware of that. For now I'd prefer to stick with the coding convention used in this file. Maybe we should do a cleanup patch for all of it later on.

> >+  if (!c) {
> >+    return false;
> >+  }
> 
> No need to check for null after infallible malloc.

The one currently used is fallible, right?

> >+  [ // ISO-8859-1
> >+    "ISO-8859-1''%A3%20rates", "\u00a3 rates", "", Cr.NS_OK ],
> 
> Maybe test some C1 range bytes.

This would go if we remove ISO-8859-1 support.

> Marking r- mainly because of the argument type of
> nsContentSink::Decode5987Format.

Thanks for the feedback so far; will try to submit a revised patch soonish.
(Assignee)

Comment 29

5 years ago
Created attachment 621396 [details] [diff] [review]
Proposed patch, incl test cases

Changes the method signature, clarifies the unescaping flag, removes support for ISO-8859-1
Attachment #608982 - Attachment is obsolete: true
Attachment #621396 - Flags: review?(hsivonen)
(Assignee)

Comment 30

5 years ago
Try results: https://tbpl.mozilla.org/?tree=Try&rev=a6b8ba23e36c
Comment on attachment 621396 [details] [diff] [review]
Proposed patch, incl test cases

Looks good. Sorry about the delay.

One nit though:

> if (! IsHexDigit(c[0]) || ! IsHexDigit(c[1])) {
(and other instances)

I'm not sure for networking code, but in DOM at least we tend not to put a space after the exclamation point in cases like this.
Attachment #621396 - Flags: review?(hsivonen) → review+
(Assignee)

Comment 32

5 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #31)
> Comment on attachment 621396 [details] [diff] [review]
> Proposed patch, incl test cases
> 
> Looks good. Sorry about the delay.
> 
> One nit though:
> 
> > if (! IsHexDigit(c[0]) || ! IsHexDigit(c[1])) {
> (and other instances)
> 
> I'm not sure for networking code, but in DOM at least we tend not to put a
> space after the exclamation point in cases like this.

Ack; I'll change that in my patch (there are other places in the code using that pattern as well, also written by myself, but I'll not change those right now).
(Assignee)

Comment 33

5 years ago
Created attachment 625626 [details] [diff] [review]
Proposed patch, incl test case

Patch with "! " nit fixed.
Attachment #621396 - Attachment is obsolete: true
Attachment #625626 - Flags: review?(hsivonen)
Comment on attachment 625626 [details] [diff] [review]
Proposed patch, incl test case

Thanks.
Attachment #625626 - Flags: review?(hsivonen) → review+
(Assignee)

Updated

5 years ago
Keywords: checkin-needed
(Assignee)

Comment 35

5 years ago
try results: https://tbpl.mozilla.org/?tree=Try&rev=9939d414f8d8
Try looks good, so landed it.

  https://hg.mozilla.org/integration/mozilla-inbound/rev/3e58342de10e
Please clear the checkin-needed flag when landing on inbound.
Flags: in-testsuite+
Keywords: checkin-needed
Target Milestone: mozilla14 → mozilla15
https://hg.mozilla.org/mozilla-central/rev/3e58342de10e
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Adding addon-compat and dev-doc-needed keywords because of the interface changes.
Keywords: addon-compat, dev-doc-needed
Is "aAllowSubstitution" optional? If not, can it be, for the sake of backward add-on compatibility?
(Assignee)

Comment 41

5 years ago
(In reply to Jorge Villalobos [:jorgev] from comment #40)
> Is "aAllowSubstitution" optional? If not, can it be, for the sake of
> backward add-on compatibility?

It probably could be made optional; if you can point me to an example where something like this is done I can give it a try.
If it's made optional, then for backwards compatibility it would have to default to true, or the sense would have to switch (aForbidSubstitution)

This is probably as close as it gets to a similar situation:

http://mxr.mozilla.org/mozilla-central/source/storage/public/mozIStorageConnection.idl#90

XMLHttpRequest.open takes an optional third boolean parameter that defaults to true, but it's really intended for JavaScript consumers.
Yeah, the simplest thing to do is to make it optional, and default to true.

That means putting "[optional]" before the argument and "[optional_argc]" before the whole method in the IDL.  Then your signature will include a count of how many optional arguments (in your case, 0 or 1) were passed and if it's 0 you can use true.  If it's 1, you use the passed-in value.
(Assignee)

Comment 44

5 years ago
(In reply to Boris Zbarsky (:bz) from comment #43)
> Yeah, the simplest thing to do is to make it optional, and default to true.

Should we open a separate bug to track this?
(Assignee)

Comment 45

5 years ago
Created attachment 640760 [details] [diff] [review]
This change backs out the changes from the UTF8 converter.

For some reason none of my unit tests is affected by not having the UTF8 converter change. So for now the simplest approach might be to back it out (from aurora, on which the patch applies, and trunk), and then let me worry later on about restoring the functionality I was trying to achieve.

(Please let me know if I should open a separate ticket for this change)
Attachment #640760 - Flags: review?(bzbarsky)
Yes, please open a separate ticket.  It'll need separate tracking for landing on branches and such...

(And for what it's worth, review requests on closed bugs don't show up in my normal search for review requests, so I only saw this today when I got the "this has been waiting for a long time" nagmail.)

All that said, it's too late for that patch.  This code has gone to beta as-is.  You can't revert the change now.  You _do_ need to write a patch to make it optional, however.  In a separate bug.  And that bug needs to have tracking flags set for 15 and 16...
Comment on attachment 640760 [details] [diff] [review]
This change backs out the changes from the UTF8 converter.

r-: binary-incompatible change on beta.  Ship has sailed.

(Oh, and this binary-incompatible change initially happened without changing the iid, which was wrong.... This is why sr is required for interface changes.)
Attachment #640760 - Flags: review?(bzbarsky) → review+
Comment on attachment 640760 [details] [diff] [review]
This change backs out the changes from the UTF8 converter.

bz's finger slipped.
Attachment #640760 - Flags: review+ → review-
Er, yes.  ;)
(Assignee)

Comment 50

5 years ago
OK, so I:

- open a separate ticket about the incompatible change, 

- and we'll need to apply it to beta/aurora/central

Right?
Yes, ideally.
(Assignee)

Comment 52

5 years ago
(In reply to Boris Zbarsky (:bz) from comment #51)
> Yes, ideally.

-> Bug 776399 

(and sorry for the chaos I caused)
It's not a big deal.  Let's just fix it.  ;)
You need to log in before you can comment on or make changes to this bug.