Closed Bug 666562 Opened 9 years ago Closed 9 years ago

HTTP link header field title param: quoted-string not processed correctly


(Core :: DOM: Core & HTML, defect)






(Reporter: julian.reschke, Assigned: julian.reschke)




(Whiteboard: [inbound])


(1 file)

User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Build Identifier: 

When parsing the title param in an HTTP link header field, the quoted-string form isn't processed properly with respect to unescaping.

Reproducible: Always

Steps to Reproduce:
1. Visit <> for overview.
2. Run linked test at <>
3. The CSS should be loaded with the properly unescaped title observable from the loaded page.

Actual Results:
The title observed from the loaded page is not properly unescaped.
This code should generally be using the mime header param class, no?
Ever confirmed: true
(In reply to comment #1)
> This code should generally be using the mime header param class, no?

In a perfect world: yes.

Problem 1: the MIME header param class carries lots of historical baggage, bugs, and workarounds. We shouldn't extend those to new header fields that don't come with the same history.

Problem 2: the formats aren't as uniform as they should be. For instance, RFC 5988 allows quoted media types without making them use the quoted-string syntax. So parsers need special rules per *parameter* (I personally consider this a bug in RFC 5988, but before reporting errata over there I'd like to explore how close we can come with a sane parsing algorithm).

That being said, controlled code reuse would be good, both for decoding quoted-string (just refactored in the MIME header parser a few weeks ago) and for decoding RFC 2231/5987 (which we'll need for bug 663057).
I think it would be useful to have a generic tokenizer that can process header field values as used in Content-Type, Content-Disposition, Link and several other header fields.

It would take a set of header field values (several for those header fields using the "," notation), and return a set of lists containing each token, where each token would come with:

- token name
- token value (which could be null if there was no "=")
- flags (was-quoted-string etc)

Parsers for individual header fields could then process the output of the generic tokenizer, such as applying 2231/5987 unescaping, 2047 handling (sigh), enforcing constraints like "first token must be disposition type", and so on.
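To make the proposal concrete, here is a minimal sketch of such a tokenizer in Python. It is an illustration under assumptions, not the design that was actually implemented: the names `Token`, `split_outside_quotes` and `tokenize` are hypothetical, quoted-string escapes are resolved inside the tokenizer, and the angle-bracketed URI part of a Link header is not handled (a "," or ";" inside `<...>` would be split incorrectly).

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    name: str
    value: Optional[str]  # None when there was no "="
    was_quoted: bool      # True when the value was a quoted-string


def split_outside_quotes(text: str, sep: str) -> List[str]:
    """Split text on sep, ignoring separators inside a quoted-string."""
    parts, buf, in_quotes, i = [], [], False, 0
    while i < len(text):
        ch = text[i]
        if in_quotes and ch == "\\" and i + 1 < len(text):
            buf.append(text[i:i + 2])  # keep the escape pair intact for now
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return parts


def tokenize(field_values: List[str]) -> List[List[Token]]:
    """Return one token list per comma-separated element of each field value."""
    result = []
    for field_value in field_values:
        for element in split_outside_quotes(field_value, ","):
            tokens = []
            for piece in split_outside_quotes(element, ";"):
                piece = piece.strip()
                if not piece:
                    continue
                name, eq, raw = piece.partition("=")
                name, raw = name.strip(), raw.strip()
                if not eq:
                    tokens.append(Token(name, None, False))
                elif len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
                    # Strip the quotes and drop the backslash of each escape pair.
                    body = re.sub(r"\\(.)", r"\1", raw[1:-1], flags=re.DOTALL)
                    tokens.append(Token(name, body, True))
                else:
                    tokens.append(Token(name, raw, False))
            if tokens:
                result.append(tokens)
    return result
```

A parser for an individual header field would then walk the returned lists and apply its own rules (RFC 2231/5987 decoding, "first token must be disposition type", and so on) on top of this output.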

I could give this a try in plain C, or do a proof of concept in Java or JavaScript. Would that be useful?
This patch implements quoted-string unescaping, fixing the test failures in <> and <>.

(It also removes redundant constants for PRUnichars.)
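For readers unfamiliar with the rule, the following Python sketch shows what quoted-string unescaping means for a value such as the title param. It is not the code from the attached patch (which is C++ in Gecko); the function name `unescape_quoted_string` is hypothetical, and the handling of malformed input is an assumption made for this example.

```python
def unescape_quoted_string(raw: str) -> str:
    """Decode an HTTP quoted-string: strip the DQUOTEs, resolve quoted-pairs."""
    if len(raw) < 2 or raw[0] != '"' or raw[-1] != '"':
        raise ValueError("not a quoted-string: %r" % raw)
    body = raw[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # quoted-pair: the backslash is dropped, the next character is kept
            i += 1
            ch = body[i]
        # A lone backslash at the very end is kept as-is (assumption of this sketch).
        out.append(ch)
        i += 1
    return "".join(out)


# The wire form  title="a \"quoted\" title"  should yield  a "quoted" title
print(unescape_quoted_string('"a \\"quoted\\" title"'))
```

Before the fix, the backslashes in such a title were left in place instead of being removed.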
Attachment #546328 - Flags: review?(bzbarsky)
Comment on attachment 546328 [details] [diff] [review]
implement quoted-string unescaping

This looks fine, but are we sure we don't want the unescaping for single-quoted strings?
Attachment #546328 - Flags: review?(bzbarsky) → review+
Assignee: nobody → julian.reschke
My plan is to actually remove the support for single-quoting, see bug 672079.
Keywords: checkin-needed
Whiteboard: [inbound]
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
OS: Windows 7 → All
Hardware: x86 → All
Component: DOM → DOM: Core & HTML