Closed
Bug 465572
Opened 17 years ago
Closed 17 years ago
[silme] Improve BOM handling in io/file.py
Categories
(Mozilla Localizations :: Infrastructure, defect)
Mozilla Localizations
Infrastructure
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: adriank, Assigned: adriank)
References
Details
Attachments
(2 files, 5 obsolete files)
2.40 KB,
patch
|
zbraniecki
:
review-
|
Details | Diff | Splinter Review |
6.73 KB,
patch
|
Details | Diff | Splinter Review |
As discussed with Axel, we need to handle BOM even if the file would be properly loaded as 'utf_8'.
This patch also adds removing of other BOMs which were previously left untouched.
Assignee | ||
Updated•17 years ago
|
Attachment #348802 -
Flags: review?(gandalf)
Assignee | ||
Updated•17 years ago
|
Component: is / Icelandic → Infrastructure
QA Contact: icelandic.is → infrastructure
Comment 2•17 years ago
|
||
IMHO we really shouldn't removing any content of source files and fail silently with BOMs - this is bad idea, example: bom sesitive file format/app/buildsystem.
BOMs should be handled on a per app/file format level.
Assignee | ||
Comment 3•17 years ago
|
||
Stef: we are only removing BOMs if they exist and the BOMs will be recreated when the object will be written to file (see the "writeToFile()" method).
So looking at the encoding parameter you always see if the object had a BOM, and because of that no BOM gets lost (if in a 'utf_8' text a BOM will be detected, the encoding parameter will be changed to 'utf_8_sig').
Comment 4•17 years ago
|
||
In some situations, BOM in UTF-8 encoded file will be a problem and should raise error (or add to logger) with optional skip bomsign rule (to load the rest of a file) - this is the silent aspect I'm writing about, not the strips when (re)writing file.
This is also wrong place, especially to hardcode, such default format independent rules.
Assignee | ||
Comment 5•17 years ago
|
||
changes compared with v1:
in getSourceWithEncoding():
-delete a BOM only if it really exists
-if the encoding was 'utf_8_sig', but no BOM was found, change the encoding to 'utf_8'
in getSource():
-if after finding a BOM reading fails, raise an UnicodeError
Regarding Stef's comment:
I'm not sure what's the best solution for that problem... One option would be to add a new attribute "strict" to parsers and have something like that:
(...)
def getSource(cls, path, encoding=None, fallback=None, strict=False):
if encoding is not None:
output = cls.getSourceWithEncoding(path, encoding)
if strict and encoding != output[1]:
raise Exception('The output is not encoded in ' + encoding + \
' like expected, but in ' + output[1])
return output
(...)
Attachment #348802 -
Attachment is obsolete: true
Attachment #348802 -
Flags: review?(gandalf)
Comment 6•17 years ago
|
||
don't we already cover the strict issue with encoding=''?
If encoding is different than the given one we throw an Exception. If someone wants to TRY encoding but not fail if it's not the one he should use fallback.
Am I missing sonething?
Assignee | ||
Comment 7•17 years ago
|
||
patch changed as discussed on IRC with Gandalf.
The problem with the existing implementation is, that even if you do 'encoding='utf_8', reading will NOT fail if the encoding is in fact 'utf_8_sig'.
This patch tries to solve that problem.
Attachment #348915 -
Attachment is obsolete: true
Assignee | ||
Comment 8•17 years ago
|
||
Assignee | ||
Comment 9•17 years ago
|
||
parallel working on multiple repos is hard...
Here the patch with a few additional crash fixes.
Attachment #349265 -
Attachment is obsolete: true
Assignee | ||
Comment 10•17 years ago
|
||
changed default fallback value from "None" to "[]", because fallback has to be a list
Attachment #349266 -
Attachment is obsolete: true
Assignee | ||
Updated•17 years ago
|
Attachment #349385 -
Flags: review?(gandalf)
Assignee | ||
Updated•17 years ago
|
Attachment #349386 -
Flags: review?(gandalf)
Updated•17 years ago
|
Attachment #349386 -
Flags: review?(gandalf) → review-
Comment 11•17 years ago
|
||
Comment on attachment 349386 [details] [diff] [review]
v2 patch adding 'fallback' attribute to parsers
you should not put param=[] like this if it's not static because it will be shared between all instances.
You need __init__ for this.
Assignee | ||
Comment 12•17 years ago
|
||
fixed a few short problems
Attachment #349385 -
Attachment is obsolete: true
Attachment #349385 -
Flags: review?(gandalf)
Comment 13•17 years ago
|
||
fixed with todays checkin http://hg.mozilla.org/users/zbraniecki_mozilla.com/silme/rev/50b2918343a7
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•17 years ago
|
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•