Closed Bug 666562 Opened 14 years ago Closed 14 years ago

HTTP link header field title param: quoted-string not processed correctly

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: julian.reschke, Assigned: julian.reschke)

References

()

Details

(Whiteboard: [inbound])

Attachments

(1 file)

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0 Build Identifier: When parsing the title param in an HTTP link header field, the quoted-string form isn't processed properly with respect to unescaping. Reproducible: Always Steps to Reproduce: 1. Visit <http://greenbytes.de/tech/tc/httplink/#simplecsstitleq> for overview. 2. Run linked test at <http://greenbytes.de/tech/tc/httplink/simplecsstitleq.asis> 3. The CSS should be loaded with the properly unescaped title observable from the loaded page. Actual Results: It is not.
This code should generally be using the mime header param class, no?
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #1) > This code should generally be using the mime header param class, no? In a perfect world: yes. Problem 1: the MIME header param class carries lots of historical baggage, bugs, and workarounds. We shouldn't extend those to new header fields that don't come with the same history. Problem 2: the formats aren't as uniform as they should be. For instance, RFC 5988 allows quoted media types without making them use the quoted-string syntax. So parsers need special rules per *parameter* (I personally consider this a bug in RFC 5988, but before reporting errata over there I'd like to explore how close we can come with a sane parsing algorithm). That being said: controlled code-reuse would be good; both for decoding quoted-string (just refactored in mime header parser a few weeks ago), and also in decoding 2231/5987 (which we'll need for bug 663057)
I think it would be useful to have a generic tokenizer that can process header field values as used in Content-Type, Content-Disposition, Link and several other header fields. It would take a set of header field values (several for those header fields using the "," notation), and return a set of lists containing each token, where each token would come with: - token name - token value (which could be null it there was no "=") - flags (was-quoted-string etc) Parsers for individual header fields could then process the output of the generic tokenizer, such as applying 2231/5987 unescaping, 2047 handling (sigh), enforcing constraints like "first token must be disposition type", and so on. I could give this a try in plain C, or do a p-o-c in Java or javascript. Would that be useful?
this patch implements quoted-string unescaping, fixing the test failures in <http://greenbytes.de/tech/tc/httplink/#simplecsstitleq> and <http://greenbytes.de/tech/tc/httplink/#simplecsstitleq2>. (it also removes redundant constants for PRUnichars)
Attachment #546328 - Flags: review?(bzbarsky)
Comment on attachment 546328 [details] [diff] [review] implement quoted-string unescaping This looks fine, but are we sure we don't want the unescaping for single-quoted strings?
Attachment #546328 - Flags: review?(bzbarsky) → review+
Assignee: nobody → julian.reschke
My plan is to actually remove the support for single-quoting, see bug 672079.
Keywords: checkin-needed
Whiteboard: [inbound]
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
OS: Windows 7 → All
Hardware: x86 → All
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: