Last Comment Bug 666562 - HTTP link header field title param: quoted-string not processed correctly
: HTTP link header field title param: quoted-string not processed correctly
Product: Core
Classification: Components
Component: DOM (show other bugs)
: unspecified
: All All
-- normal (vote)
: mozilla8
Assigned To: Julian Reschke
: Andrew Overholt [:overholt]
Depends on:
  Show dependency treegraph
Reported: 2011-06-23 05:55 PDT by Julian Reschke
Modified: 2011-08-09 01:12 PDT (History)
2 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

implement quoted-string unescaping (3.99 KB, patch)
2011-07-16 08:31 PDT, Julian Reschke
bzbarsky: review+
Details | Diff | Splinter Review

Description User image Julian Reschke 2011-06-23 05:55:25 PDT
User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Build Identifier: 

When parsing the title param in an HTTP link header field, the quoted-string form isn't processed properly with respect to unescaping.

Reproducible: Always

Steps to Reproduce:
1. Visit <> for overview.
2. Run linked test at <>
3. The CSS should be loaded with the properly unescaped title observable from the loaded page.

Actual Results:  
It is not.
Comment 1 User image Boris Zbarsky [:bz] (still a bit busy) 2011-06-23 08:09:20 PDT
This code should generally be using the mime header param class, no?
Comment 2 User image Julian Reschke 2011-06-23 08:18:28 PDT
(In reply to comment #1)
> This code should generally be using the mime header param class, no?

In a perfect world: yes.

Problem 1: the MIME header param class carries lots of historical baggage, bugs, and workarounds. We shouldn't extend those to new header fields that don't come with the same history.

Problem 2: the formats aren't as uniform as they should be. For instance, RFC 5988 allows quoted media types without making them use the quoted-string syntax. So parsers need special rules per *parameter* (I personally consider this a bug in RFC 5988, but before reporting errata over there I'd like to explore how close we can come with a sane parsing algorithm).

That being said: controlled code-reuse would be good; both for decoding quoted-string (just refactored in mime header parser a few weeks ago), and also in decoding 2231/5987 (which we'll need for bug 663057)
Comment 3 User image Julian Reschke 2011-07-08 05:32:36 PDT
I think it would be useful to have a generic tokenizer that can process header field values as used in Content-Type, Content-Disposition, Link and several other header fields.

It would take a set of header field values (several for those header fields using the "," notation), and return a set of lists containing each token, where each token would come with:

- token name
- token value (which could be null it there was no "=")
- flags (was-quoted-string etc)

Parsers for individual header fields could then process the output of the generic tokenizer, such as applying 2231/5987 unescaping, 2047 handling (sigh), enforcing constraints like "first token must be disposition type", and so on.

I could give this a try in plain C, or do a p-o-c in Java or javascript. Would that be useful?
Comment 4 User image Julian Reschke 2011-07-16 08:31:56 PDT
Created attachment 546328 [details] [diff] [review]
implement quoted-string unescaping

this patch implements quoted-string unescaping, fixing the test failures in <> and <>.

(it also removes redundant constants for PRUnichars)
Comment 5 User image Boris Zbarsky [:bz] (still a bit busy) 2011-07-22 08:36:05 PDT
Comment on attachment 546328 [details] [diff] [review]
implement quoted-string unescaping

This looks fine, but are we sure we don't want the unescaping for single-quoted strings?
Comment 6 User image Julian Reschke 2011-07-22 16:39:21 PDT
My plan is to actually remove the support for single-quoting, see bug 672079.
Comment 7 User image Marco Bonardo [::mak] 2011-07-25 06:13:54 PDT

Note You need to log in before you can comment on or make changes to this bug.