Closed Bug 567660 Opened 14 years ago Closed 14 years ago

cfx xpi can't produce install.rdf with non-ASCII characters

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: asqueella, Assigned: avarma)

References

Details

Attachments

(3 files)

patch 14 years ago Atul Varma [:atul] 997 bytes, patch	warner : review+	Details \| Diff \| Splinter Review
patch_with_unit_test 14 years ago Atul Varma [:atul] 1.63 KB, patch	warner : review+	Details \| Diff \| Splinter Review
patch w/ bwarner's comment 14 years ago Atul Varma [:atul] 2.47 KB, patch		Details \| Diff \| Splinter Review

Nickolay_Ponomarev

Reporter

Description

•

14 years ago

If you have non-ASCII data to be written to install.rdf, "cfx xpi" fails with the following trace:

Exporting extension to test.xpi.
Traceback (most recent call last):
  File "/Users/nickolay/dev/jetpack-sdk/bin/cfx", line 6, in <module>
    cuddlefish.run()
  File "/Users/nickolay/dev/jetpack-sdk/python-lib/cuddlefish/__init__.py", line 541, in run
    xpts=xpts)
  File "/Users/nickolay/dev/jetpack-sdk/python-lib/cuddlefish/xpi.py", line 10, in build_xpi
    open('.install.rdf', 'w').write(str(manifest))
  File "/Users/nickolay/dev/jetpack-sdk/python-lib/cuddlefish/rdf.py", line 11, in __str__
    self.dom.writexml(buf, encoding="utf-8")
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 1749, in writexml
    node.writexml(writer, indent, addindent, newl)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 817, in writexml
    node.writexml(writer,indent+addindent,addindent,newl)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 817, in writexml
    node.writexml(writer,indent+addindent,addindent,newl)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 817, in writexml
    node.writexml(writer,indent+addindent,addindent,newl)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 1036, in writexml
    _write_data(writer, "%s%s%s"%(indent, self.data, newl))
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 303, in _write_data
    writer.write(data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017b' in position 0: ordinal not in range(128)

I originally got this when I had:
    <em:translator>&#x017B;</em:translator>
in the template install.rdf, but you can get the same effect by using \u escapes in package.json, e.g.:
{
    "author": "\u017b",
    ...
}

Atul Varma [:atul]

Assignee

Comment 2

•

14 years ago

Attached patch patch — Details — Splinter Review

Atul Varma [:atul]

Assignee

Comment 3

•

14 years ago

Oops, my bad. This patch fixes the bug... we also use StringIO instead of cStringIO because the latter doesn't support unicode string buffers.

Atul Varma [:atul]

Assignee

Updated

•

14 years ago

Attachment #452881 - Flags: review?(warner-bugzilla)

Brian Warner [:warner :bwarner]

Comment 4

•

14 years ago

Comment on attachment 452881 [details] [diff] [review]
patch

I'm not too familiar with xml.dom.minidom . Does the encoding= argument to writexml() merely set an attribute, and not actually affect the way writexml() encodes its output? Equivalently, does writexml() write unicode into the buf object?

If so, this patch looks good. If not, I'd be worried about double-encoding.

Attachment #452881 - Flags: review?(warner-bugzilla) → review+

Brian Warner [:warner :bwarner]

Comment 5

•

14 years ago

oh, of course, it'd be nice to have a unit test for this.

Myk Melez [:myk] [@mykmelez]

Comment 6

•

14 years ago

Atul: can you land this?

Assignee: nobody → avarma

Keywords: checkin-needed

Atul Varma [:atul]

Assignee

Comment 7

•

14 years ago

Attached patch patch_with_unit_test — Details — Splinter Review

Atul Varma [:atul]

Assignee

Comment 8

•

14 years ago

Comment on attachment 470087 [details] [diff] [review]
patch_with_unit_test

Here's a version of the patch w/ a unit test. Indeed, the 'encoding' kwarg to dom.writexml() just sets the value of the eponymous attribute on the xml header, and the actual unicode object returned isn't encoded to that encoding, so we have to do that ourselves.

Attachment #470087 - Flags: review?(warner-bugzilla)

Atul Varma [:atul]

Assignee

Comment 9

•

14 years ago

Removed 'checkin-needed' since I've busted a patch w/ unit test...

Status: NEW → ASSIGNED

Keywords: checkin-needed

Brian Warner [:warner :bwarner]

Comment 10

•

14 years ago

Comment on attachment 470087 [details] [diff] [review]
patch_with_unit_test

I like it. The one suggestion I'd make is to add a comment into RDF.__str__
to explain the apparent double-encoding there, something like:

 # real files have an .encoding attribute and use it when you write() unicode
 # into them: they read()/write() unicode and put encoded bytes in the
 # backend file. StringIO objects read()/write() unicode and put unicode in
 # the backing store, so we must encode the output of getvalue() to get a
 # bytestring. (cStringIO objects are weirder: they effectively have
 # .encoding hardwired to "ascii" and put only bytes in the backing store, so
 # we can't use them here).
 #
 # The encoding= argument to dom.writexml() merely sets the XML header's
 # encoding= attribute. It still writes unencoded unicode to the output file,
 # so we have to encode it for real afterwards.

Attachment #470087 - Flags: review?(warner-bugzilla) → review+

Atul Varma [:atul]

Assignee

Comment 11

•

14 years ago

Attached patch patch w/ bwarner's comment — Details — Splinter Review

Atul Varma [:atul]

Assignee

Comment 12

•

14 years ago

Good idea, I've added the comment and pushed:

Bug 567660 - cfx xpi can't produce install.rdf with non-ASCII characters
Atul Varma [:atul]
http://hg.mozilla.org/labs/jetpack-sdk/rev/479b348157dc

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Mathias Panzenböck

Comment 14

•

14 years ago

I was wondering why you use StringIO at all and not just dom.toxml(encoding="utf-8")? And is the differing behaviour between StringIO and cStringIO really intended and can be expected to stay that way?

Myk Melez [:myk] [@mykmelez]

Comment 15

•

14 years ago

The Add-on SDK is no longer a Mozilla Labs experiment and has become a big enough project to warrant its own Bugzilla product, so the "Add-on SDK" product has been created for it, and I am moving its bugs to that product.

To filter bugmail related to this change, filter on the word "looptid".

Component: Jetpack SDK → General

Product: Mozilla Labs → Add-on SDK

QA Contact: jetpack-sdk → general

Version: Trunk → unspecified

Brian Warner [:warner :bwarner]

Comment 16

•

14 years ago

> I was wondering why you use StringIO at all and not just
> dom.toxml(encoding="utf-8")? And is the differing behaviour between
> StringIO and cStringIO really intended and can be expected to stay that
> way?

panzi: good points. From what I figured out, dom.toxml(encoding=) doesn't
actually affect the encoding at all: it merely sets an attribute in the XML
header, so we need some mechanism to get from our unicode data to the UTF-8
bytes in the output.

The behavioral differences between StringIO and cStringIO are surprising, and
probably a result of the optimized cStringIO version not being as
general-purpose as the original. In most cases, I'd expect the pure-Python
original (StringIO) to be canonical, for for alternative implementations to
try and converge on its behavior. So for our purposes, since StringIO's
performance is plenty, I think we're best off sticking to StringIO.

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

cfx xpi can't produce install.rdf with non-ASCII characters

Categories

(Add-on SDK Graveyard :: General, defect)

Tracking

(Not tracked)

People

(Reporter: asqueella, Assigned: avarma)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(3 files)

Description

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 14

Comment 15

Comment 16

Attachment

General

Description

File Name

Content Type