Closed
Bug 136538
Opened 22 years ago
Closed 22 years ago
data: protocol doesn't unescape its uri string
Categories
(Core :: Networking, defect, P1)
Core
Networking
Tracking
()
VERIFIED
FIXED
mozilla1.1alpha
People
(Reporter: rbs, Assigned: Biesinger)
References
()
Details
(Keywords: testcase)
Attachments
(1 file)
|
1.24 KB,
patch
|
bbaetz
:
review+
darin.moz
:
superreview+
asa
:
approval+
|
Details | Diff | Splinter Review |
According to RFC 2397, the data: protocol should also work with escaped strings.
I am trying to use JavaScript to do: open("data:text/html," + escape(myString)),
but I am no getting the desired result because the data: protocol isn't
unescape'ing its uri string.
<begin quote from RFC 2397>
3. Syntax
dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
mediatype := [ type "/" subtype ] *( ";" parameter )
data := *urlchar
parameter := attribute "=" value
where "urlchar" is imported from [RFC2396], and "type", "subtype",
"attribute" and "value" are the corresponding tokens from [RFC2045],
represented using URL escaped encoding of [RFC2396] as necessary.
</end quote>
The following examples are given in the prose of RFC 2397 for illustration:
data:,A%20brief%20note <!-- renders as "A brief note" -->
data:text/plain;charset=iso-8859-7,%be%fg%be <!-- renders as greek letters -->
In mozilla, the hex sequences are not converted back to their numeric values.| Assignee | ||
Comment 1•22 years ago
|
||
Taking bug Note that data:,A%20brief%20note doesn't work at all right now because of the missing mimetype
Assignee: new-network-bugs → cbiesinger
| Assignee | ||
Comment 2•22 years ago
|
||
also makes data:,foo urls work again
| Assignee | ||
Comment 3•22 years ago
|
||
bbaetz, could you review this patch?
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.1alpha
I applied the patch. But the other case still doesn't seem to render properly: data:text/plain;charset=iso-8859-7,%be%fg%be
| Assignee | ||
Comment 5•22 years ago
|
||
rbs: yes... that's a somewhat strange case... I've observed that as well. However, imho the rendering is correct: %be%fg%be %be displays the character that's at 0xbe in the font, and 0xfg is an invalid code, so it isn't decoded which is somewhat logical too. As I don't know how this is supposed to look, I can't say if this renders properly :)
> %be displays the character that's at 0xbe in the font
Nope. It is meant to display the character 0xbe in the specified charset.| Assignee | ||
Comment 7•22 years ago
|
||
>> %be displays the character that's at 0xbe in the font
>Nope. It is meant to display the character 0xbe in the specified charset.
Hm, true. However, my statement is correct - I didn't mean to say what should
get displayed, but what is displayed.
Generally, I suspect that mapping charset position == font position.
Which character would you expect in this case if not 0xbe?
Comment 8•22 years ago
|
||
rbs: are you missing teh apparopriate fonts, maybe? biesi: Should you be using nsITextToSubURI to do teh unescaping, instead?
http://czyborra.com/charsets/iso8859.html#ISO-8859-7 > Generally, I suspect that mapping charset position == font position. FYI these are not necessarily the same (e.g., mathematical fonts, see http://www.mozilla.org/projects/mathml/fonts/encoding/. In general, the match is just a coincidence arising from de facto practices). My JavaScript "data:" was escaping some other weird math symbols, and apparently the default encoding of the data coming from the "data:" protocol case isn't utf-8 as per http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1 When I specified charset=utf-8 explicitly, all went fine. So your patch therefore has fixed the original issue, and defaulting to utf-8 for the data: url is another matter.
| Assignee | ||
Comment 10•22 years ago
|
||
bbaetz: >biesi: Should you be using nsITextToSubURI to do teh unescaping, instead? I don't think so. I just unescape the characters. The html parser or something like that turns the result into UTF8 or UCS2 or whatever. rbs: >So your patch >therefore has fixed the original issue, and defaulting to utf-8 for the data: >url is another matter. Well, this rises the question which charset we should default to. Currently, we use US-ASCII, maybe we should use ISO-8859-1 - the data: RFC doesn't specify a default cahrset.
Comment 11•22 years ago
|
||
Comment on attachment 78561 [details] [diff] [review] patch rbs: %fg is not a valid hex encoding, thus it isn't a valid char/glyph/etc. r=bbaetz - default charset is a separate issue.
Attachment #78561 -
Flags: review+
| Reporter | ||
Comment 12•22 years ago
|
||
> Well, this rises the question which charset we should default to. I don't what the necko guys think -- although the spec is indicating utf-8: // http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1 if (charset) mContentCharset = charset + sizeof("charset=") - 1; + else + mContentCharset = UTF-8 In the case of only "data:,...", RFC 2397 mentions US-ASCII: "If <mediatype> is omitted, it defaults to text/plain;charset=US-ASCII. As a shorthand, "text/plain" can be omitted but the charset parameter supplied."
| Reporter | ||
Comment 13•22 years ago
|
||
s/I don't what/I don't know what/
| Assignee | ||
Comment 14•22 years ago
|
||
I fail to see why the HTML spec is relevant, especially as that section talks about <a href="">. A HTTP spec would be more appropriate. Thanks for quoting the RFC, I somehow didn't see that part of it. I wonder what we use if a content type, but no charset is specified... looks like ISO-8859-1, which is OK imho and anyway, that doesn't belong in this bug!
| Reporter | ||
Comment 15•22 years ago
|
||
I don't know what HTTP says here. Usually in the absence of a spec, other specs are referred to, and actually I have seen that the default encoding that the HTML spec says about the URI is used internally in some places of the code base. But let's see what the necko guys say. Anyway although it would be instructive to know the definitive answer, that isn't really the point of this bug. Did you already ask for sr= for your patch? The bug is presently blocking the other bug 49721 - "View Partial Source capability".
Blocks: 49721
| Assignee | ||
Comment 16•22 years ago
|
||
yes, I already mailed darin for sr. if you want the default charset to be changed, please file a new bug.
Comment 17•22 years ago
|
||
Comment on attachment 78561 [details] [diff] [review] patch sr=darin
Attachment #78561 -
Flags: superreview+
| Reporter | ||
Comment 18•22 years ago
|
||
cbiesinger, since this is a little gem worth having, you might do well to ask for approval for the branch too if drivers like the patch.
| Assignee | ||
Comment 19•22 years ago
|
||
rbs, sure, I just mailed drivers.
| Assignee | ||
Updated•22 years ago
|
Keywords: mozilla1.0,
patch
Comment 20•22 years ago
|
||
Comment on attachment 78561 [details] [diff] [review] patch a=asa (on behalf of drivers) for checkin to the 1.0 branch
Attachment #78561 -
Flags: approval+
| Assignee | ||
Comment 21•22 years ago
|
||
checked in on branch leaving open for trunk checkin
Keywords: fixed1.0.0
| Assignee | ||
Comment 22•22 years ago
|
||
Checking in nsDataChannel.cpp; /cvsroot/mozilla/netwerk/protocol/data/src/nsDataChannel.cpp,v <-- nsDataChannel.cpp new revision: 1.56; previous revision: 1.55 done
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 23•22 years ago
|
||
VERIFIED trunk+branch: Please check the URL I provided as a valid testcase. If not, please put a new testcase in and I will test again. mozilla 1.1b, commercial 2002-08-13-1.0 (Win98).
Status: RESOLVED → VERIFIED
Whiteboard: checkmac, checklinux
Comment 24•22 years ago
|
||
VERIFIED: Mozilla 1.3a, Mac OS X.
Whiteboard: checkmac, checklinux → checklinux
You need to log in
before you can comment on or make changes to this bug.
Description
•