Closed
Bug 567660
Opened 14 years ago
Closed 14 years ago
cfx xpi can't produce install.rdf with non-ASCII characters
Categories
(Add-on SDK Graveyard :: General, defect)
Add-on SDK Graveyard
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: asqueella, Assigned: avarma)
References
Details
Attachments
(3 files)
997 bytes,
patch
|
warner
:
review+
|
Details | Diff | Splinter Review |
1.63 KB,
patch
|
warner
:
review+
|
Details | Diff | Splinter Review |
2.47 KB,
patch
|
Details | Diff | Splinter Review |
If you have non-ASCII data to be written to install.rdf, "cfx xpi" fails with the following trace: Exporting extension to test.xpi. Traceback (most recent call last): File "/Users/nickolay/dev/jetpack-sdk/bin/cfx", line 6, in <module> cuddlefish.run() File "/Users/nickolay/dev/jetpack-sdk/python-lib/cuddlefish/__init__.py", line 541, in run xpts=xpts) File "/Users/nickolay/dev/jetpack-sdk/python-lib/cuddlefish/xpi.py", line 10, in build_xpi open('.install.rdf', 'w').write(str(manifest)) File "/Users/nickolay/dev/jetpack-sdk/python-lib/cuddlefish/rdf.py", line 11, in __str__ self.dom.writexml(buf, encoding="utf-8") File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 1749, in writexml node.writexml(writer, indent, addindent, newl) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 817, in writexml node.writexml(writer,indent+addindent,addindent,newl) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 817, in writexml node.writexml(writer,indent+addindent,addindent,newl) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 817, in writexml node.writexml(writer,indent+addindent,addindent,newl) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 1036, in writexml _write_data(writer, "%s%s%s"%(indent, self.data, newl)) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 303, in _write_data writer.write(data) UnicodeEncodeError: 'ascii' codec can't encode character u'\u017b' in position 0: ordinal not in range(128) I originally got this when I had: <em:translator>Ż</em:translator> in the template install.rdf, but you can get the same effect by using \u escapes in package.json, e.g.: { "author": "\u017b", ... }
Assignee | ||
Comment 2•14 years ago
|
||
Assignee | ||
Comment 3•14 years ago
|
||
Oops, my bad. This patch fixes the bug... we also use StringIO instead of cStringIO because the latter doesn't support unicode string buffers.
Assignee | ||
Updated•14 years ago
|
Attachment #452881 -
Flags: review?(warner-bugzilla)
Comment 4•14 years ago
|
||
Comment on attachment 452881 [details] [diff] [review] patch I'm not too familiar with xml.dom.minidom . Does the encoding= argument to writexml() merely set an attribute, and not actually affect the way writexml() encodes its output? Equivalently, does writexml() write unicode into the buf object? If so, this patch looks good. If not, I'd be worried about double-encoding.
Attachment #452881 -
Flags: review?(warner-bugzilla) → review+
Comment 5•14 years ago
|
||
oh, of course, it'd be nice to have a unit test for this.
Assignee | ||
Comment 7•14 years ago
|
||
Assignee | ||
Comment 8•14 years ago
|
||
Comment on attachment 470087 [details] [diff] [review] patch_with_unit_test Here's a version of the patch w/ a unit test. Indeed, the 'encoding' kwarg to dom.writexml() just sets the value of the eponymous attribute on the xml header, and the actual unicode object returned isn't encoded to that encoding, so we have to do that ourselves.
Attachment #470087 -
Flags: review?(warner-bugzilla)
Assignee | ||
Comment 9•14 years ago
|
||
Removed 'checkin-needed' since I've busted a patch w/ unit test...
Status: NEW → ASSIGNED
Keywords: checkin-needed
Comment 10•14 years ago
|
||
Comment on attachment 470087 [details] [diff] [review] patch_with_unit_test I like it. The one suggestion I'd make is to add a comment into RDF.__str__ to explain the apparent double-encoding there, something like: # real files have an .encoding attribute and use it when you write() unicode # into them: they read()/write() unicode and put encoded bytes in the # backend file. StringIO objects read()/write() unicode and put unicode in # the backing store, so we must encode the output of getvalue() to get a # bytestring. (cStringIO objects are weirder: they effectively have # .encoding hardwired to "ascii" and put only bytes in the backing store, so # we can't use them here). # # The encoding= argument to dom.writexml() merely sets the XML header's # encoding= attribute. It still writes unencoded unicode to the output file, # so we have to encode it for real afterwards.
Attachment #470087 -
Flags: review?(warner-bugzilla) → review+
Assignee | ||
Comment 11•14 years ago
|
||
Assignee | ||
Comment 12•14 years ago
|
||
Good idea, I've added the comment and pushed: Bug 567660 - cfx xpi can't produce install.rdf with non-ASCII characters Atul Varma [:atul] http://hg.mozilla.org/labs/jetpack-sdk/rev/479b348157dc
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 14•14 years ago
|
||
I was wondering why you use StringIO at all and not just dom.toxml(encoding="utf-8")? And is the differing behaviour between StringIO and cStringIO really intended and can be expected to stay that way?
Comment 15•14 years ago
|
||
The Add-on SDK is no longer a Mozilla Labs experiment and has become a big enough project to warrant its own Bugzilla product, so the "Add-on SDK" product has been created for it, and I am moving its bugs to that product. To filter bugmail related to this change, filter on the word "looptid".
Component: Jetpack SDK → General
Product: Mozilla Labs → Add-on SDK
QA Contact: jetpack-sdk → general
Version: Trunk → unspecified
Comment 16•14 years ago
|
||
> I was wondering why you use StringIO at all and not just
> dom.toxml(encoding="utf-8")? And is the differing behaviour between
> StringIO and cStringIO really intended and can be expected to stay that
> way?
panzi: good points. From what I figured out, dom.toxml(encoding=) doesn't
actually affect the encoding at all: it merely sets an attribute in the XML
header, so we need some mechanism to get from our unicode data to the UTF-8
bytes in the output.
The behavioral differences between StringIO and cStringIO are surprising, and
probably a result of the optimized cStringIO version not being as
general-purpose as the original. In most cases, I'd expect the pure-Python
original (StringIO) to be canonical, for for alternative implementations to
try and converge on its behavior. So for our purposes, since StringIO's
performance is plenty, I think we're best off sticking to StringIO.
You need to log in
before you can comment on or make changes to this bug.
Description
•