Closed Bug 136538 Opened 22 years ago Closed 22 years ago

data: protocol doesn't unescape its uri string

Categories

(Core :: Networking, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.1alpha

People

(Reporter: rbs, Assigned: Biesinger)

References

()

Details

(Keywords: testcase)

Attachments

(1 file)

According to RFC 2397, the data: protocol should also work with escaped strings.
I am trying to use JavaScript to do: open("data:text/html," + escape(myString)),
but I am no getting the desired result because the data: protocol isn't 
unescape'ing its uri string.

<begin quote from RFC 2397>
3. Syntax

       dataurl    := "data:" [ mediatype ] [ ";base64" ] "," data
       mediatype  := [ type "/" subtype ] *( ";" parameter )
       data       := *urlchar
       parameter  := attribute "=" value

   where "urlchar" is imported from [RFC2396], and "type", "subtype",
   "attribute" and "value" are the corresponding tokens from [RFC2045],
   represented using URL escaped encoding of [RFC2396] as necessary.
</end quote>

The following examples are given in the prose of RFC 2397 for illustration:

data:,A%20brief%20note                       <!-- renders as "A brief note" -->
data:text/plain;charset=iso-8859-7,%be%fg%be <!-- renders as greek letters  -->

In mozilla, the hex sequences are not converted back to their numeric values.
Taking bug

Note that data:,A%20brief%20note doesn't work at all right now because of the
missing mimetype
Assignee: new-network-bugs → cbiesinger
Attached patch patchSplinter Review
also makes data:,foo urls work again
bbaetz, could you review this patch?
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.1alpha
I applied the patch. But the other case still doesn't seem to render properly:
data:text/plain;charset=iso-8859-7,%be%fg%be
rbs: yes... that's a somewhat strange case... I've observed that as well.

However, imho the rendering is correct:
%be%fg%be
%be displays the character that's at 0xbe in the font, and 0xfg is an invalid
code, so it isn't decoded which is somewhat logical too. As I don't know how
this is supposed to look, I can't say if this renders properly :)
> %be displays the character that's at 0xbe in the font
Nope. It is meant to display the character 0xbe in the specified charset.
>> %be displays the character that's at 0xbe in the font
>Nope. It is meant to display the character 0xbe in the specified charset.

Hm, true. However, my statement is correct - I didn't mean to say what should
get displayed, but what is displayed.

Generally, I suspect that mapping charset position == font position.

Which character would you expect in this case if not 0xbe?
rbs: are you missing teh apparopriate fonts, maybe?

biesi: Should you be using nsITextToSubURI to do teh unescaping, instead?
http://czyborra.com/charsets/iso8859.html#ISO-8859-7

> Generally, I suspect that mapping charset position == font position.

FYI these are not necessarily the same (e.g., mathematical fonts, see
http://www.mozilla.org/projects/mathml/fonts/encoding/. In general, the match is
just a coincidence arising from de facto practices).

My JavaScript "data:" was escaping some other weird math symbols, and apparently
the default encoding of the data coming from the "data:" protocol case isn't
utf-8 as per http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1

When I specified charset=utf-8 explicitly, all went fine. So your patch
therefore has fixed the original issue, and defaulting to utf-8 for the data:
url is another matter.
bbaetz:
>biesi: Should you be using nsITextToSubURI to do teh unescaping, instead?

I don't think so. I just unescape the characters. The html parser or something
like that turns the result into UTF8 or UCS2 or whatever.

rbs:
>So your patch
>therefore has fixed the original issue, and defaulting to utf-8 for the data:
>url is another matter.

Well, this rises the question which charset we should default to.
Currently, we use US-ASCII, maybe we should use ISO-8859-1 - the data: RFC
doesn't specify a default cahrset.
Comment on attachment 78561 [details] [diff] [review]
patch

rbs: %fg is not a valid hex encoding, thus it isn't a valid char/glyph/etc.

r=bbaetz - default charset is a separate issue.
Attachment #78561 - Flags: review+
> Well, this rises the question which charset we should default to.

I don't what the necko guys think -- although the spec is indicating utf-8:
// http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1

            if (charset)
                mContentCharset = charset + sizeof("charset=") - 1;
+           else
+               mContentCharset = UTF-8

In the case of only "data:,...", RFC 2397 mentions US-ASCII:
  "If <mediatype> is omitted, it
   defaults to text/plain;charset=US-ASCII.  As a shorthand,
   "text/plain" can be omitted but the charset parameter supplied."
s/I don't what/I don't know what/
I fail to see why the HTML spec is relevant, especially as that section talks
about <a href="">. A HTTP spec would be more appropriate.

Thanks for quoting the RFC, I somehow didn't see that part of it.

I wonder what we use if a content type, but no charset is specified... looks
like ISO-8859-1, which is OK imho and anyway, that doesn't belong in this bug!
I don't know what HTTP says here. Usually in the absence of a spec, other specs
are referred to, and actually I have seen that the default encoding that the
HTML spec says about the URI is used internally in some places of the code base.
But let's see what the necko guys say. Anyway although it would be instructive
to know the definitive answer, that isn't really the point of this bug.

Did you already ask for sr= for your patch? The bug is presently blocking the
other bug 49721 - "View Partial Source capability".
Blocks: 49721
yes, I already mailed darin for sr.

if you want the default charset to be changed, please file a new bug.
Comment on attachment 78561 [details] [diff] [review]
patch

sr=darin
Attachment #78561 - Flags: superreview+
cbiesinger, since this is a little gem worth having, you might do well to ask
for approval for the branch too if drivers like the patch.
rbs, sure, I just mailed drivers.
Comment on attachment 78561 [details] [diff] [review]
patch

a=asa (on behalf of drivers) for checkin to the 1.0 branch
Attachment #78561 - Flags: approval+
checked in on branch

leaving open for trunk checkin
Keywords: fixed1.0.0
Checking in nsDataChannel.cpp;
/cvsroot/mozilla/netwerk/protocol/data/src/nsDataChannel.cpp,v  <-- 
nsDataChannel.cpp
new revision: 1.56; previous revision: 1.55
done
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
VERIFIED trunk+branch:

Please check the URL I provided as a valid testcase. If not, please put a new
testcase in and I will test again.

mozilla 1.1b, commercial 2002-08-13-1.0 (Win98).
Status: RESOLVED → VERIFIED
Whiteboard: checkmac, checklinux
VERIFIED: Mozilla 1.3a, Mac OS X.
Whiteboard: checkmac, checklinux → checklinux
VERIFIED: Mozilla 1.3a, Linux
Whiteboard: checklinux
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: