Closed Bug 465572 Opened 16 years ago Closed 16 years ago

[silme] Improve BOM handling in io/file.py

Tracking

(Not tracked)

Status:

VERIFIED FIXED

People

(Reporter: adriank, Assigned: adriank)

References

Details

Attachments

(2 files, 5 obsolete files)

patch v1 16 years ago Adrian Kalla [:adriank] 3.85 KB, patch		Details \| Diff \| Splinter Review
(main) patch v2 16 years ago Adrian Kalla [:adriank] 4.07 KB, patch		Details \| Diff \| Splinter Review
(main) patch v3 16 years ago Adrian Kalla [:adriank] 5.75 KB, patch		Details \| Diff \| Splinter Review
patch adding 'fallback' attribute to parsers 16 years ago Adrian Kalla [:adriank] 2.40 KB, patch		Details \| Diff \| Splinter Review
(main) patch v4 16 years ago Adrian Kalla [:adriank] 6.70 KB, patch		Details \| Diff \| Splinter Review
v2 patch adding 'fallback' attribute to parsers 16 years ago Adrian Kalla [:adriank] 2.40 KB, patch	zbraniecki : review-	Details \| Diff \| Splinter Review
(main) patch v5 16 years ago Adrian Kalla [:adriank] 6.73 KB, patch		Details \| Diff \| Splinter Review

Adrian Kalla [:adriank]

Assignee

Description

•

16 years ago

Attached patch patch v1 (obsolete) — Details — Splinter Review

As discussed with Axel, we need to handle BOM even if the file would be properly loaded as 'utf_8'. This patch also adds removing of other BOMs which were previously left untouched.

Adrian Kalla [:adriank]

Assignee

Updated

•

16 years ago

Attachment #348802 - Flags: review?(gandalf)

Adrian Kalla [:adriank]

Assignee

Updated

•

16 years ago

Component: is / Icelandic → Infrastructure

QA Contact: icelandic.is → infrastructure

Adrian Kalla [:adriank]

Assignee

Comment 1

•

16 years ago

I'm sorry for the bugspam...

Assignee: nobody → akalla

Stefan Plewako [:stef]

Comment 2

•

16 years ago

IMHO we really shouldn't removing any content of source files and fail silently with BOMs - this is bad idea, example: bom sesitive file format/app/buildsystem. BOMs should be handled on a per app/file format level.

Adrian Kalla [:adriank]

Assignee

Comment 3

•

16 years ago

Stef: we are only removing BOMs if they exist and the BOMs will be recreated when the object will be written to file (see the "writeToFile()" method). So looking at the encoding parameter you always see if the object had a BOM, and because of that no BOM gets lost (if in a 'utf_8' text a BOM will be detected, the encoding parameter will be changed to 'utf_8_sig').

Stefan Plewako [:stef]

Comment 4

•

16 years ago

In some situations, BOM in UTF-8 encoded file will be a problem and should raise error (or add to logger) with optional skip bomsign rule (to load the rest of a file) - this is the silent aspect I'm writing about, not the strips when (re)writing file. This is also wrong place, especially to hardcode, such default format independent rules.

Adrian Kalla [:adriank]

Assignee

Comment 5

•

16 years ago

Attached patch (main) patch v2 (obsolete) — Details — Splinter Review

changes compared with v1: in getSourceWithEncoding(): -delete a BOM only if it really exists -if the encoding was 'utf_8_sig', but no BOM was found, change the encoding to 'utf_8' in getSource(): -if after finding a BOM reading fails, raise an UnicodeError Regarding Stef's comment: I'm not sure what's the best solution for that problem... One option would be to add a new attribute "strict" to parsers and have something like that: (...) def getSource(cls, path, encoding=None, fallback=None, strict=False): if encoding is not None: output = cls.getSourceWithEncoding(path, encoding) if strict and encoding != output[1]: raise Exception('The output is not encoded in ' + encoding + \ ' like expected, but in ' + output[1]) return output (...)

Attachment #348802 - Attachment is obsolete: true

Attachment #348802 - Flags: review?(gandalf)

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 6

•

16 years ago

don't we already cover the strict issue with encoding=''? If encoding is different than the given one we throw an Exception. If someone wants to TRY encoding but not fail if it's not the one he should use fallback. Am I missing sonething?

Adrian Kalla [:adriank]

Assignee

Comment 7

•

16 years ago

Attached patch (main) patch v3 (obsolete) — Details — Splinter Review

patch changed as discussed on IRC with Gandalf. The problem with the existing implementation is, that even if you do 'encoding='utf_8', reading will NOT fail if the encoding is in fact 'utf_8_sig'. This patch tries to solve that problem.

Attachment #348915 - Attachment is obsolete: true

Adrian Kalla [:adriank]

Assignee

Comment 8

•

16 years ago

Attached patch patch adding 'fallback' attribute to parsers (obsolete) — Details — Splinter Review

Adrian Kalla [:adriank]

Assignee

Comment 9

•

16 years ago

Attached patch (main) patch v4 (obsolete) — Details — Splinter Review

parallel working on multiple repos is hard... Here the patch with a few additional crash fixes.

Attachment #349265 - Attachment is obsolete: true

Adrian Kalla [:adriank]

Assignee

Comment 10

•

16 years ago

Attached patch v2 patch adding 'fallback' attribute to parsers — Details — Splinter Review

changed default fallback value from "None" to "[]", because fallback has to be a list

Attachment #349266 - Attachment is obsolete: true

Adrian Kalla [:adriank]

Assignee

Updated

•

16 years ago

Attachment #349385 - Flags: review?(gandalf)

Adrian Kalla [:adriank]

Assignee

Updated

•

16 years ago

Attachment #349386 - Flags: review?(gandalf)

Zibi Braniecki [:zbraniecki][:gandalf]

Updated

•

16 years ago

Attachment #349386 - Flags: review?(gandalf) → review-

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 11

•

16 years ago

Comment on attachment 349386 [details] [diff] [review] v2 patch adding 'fallback' attribute to parsers you should not put param=[] like this if it's not static because it will be shared between all instances. You need __init__ for this.

Adrian Kalla [:adriank]

Assignee

Comment 12

•

16 years ago

Attached patch (main) patch v5 — Details — Splinter Review

fixed a few short problems

Attachment #349385 - Attachment is obsolete: true

Attachment #349385 - Flags: review?(gandalf)

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 13

•

16 years ago

fixed with todays checkin http://hg.mozilla.org/users/zbraniecki_mozilla.com/silme/rev/50b2918343a7

Status: ASSIGNED → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Adrian Kalla [:adriank]

Assignee

Updated

•

16 years ago

Status: RESOLVED → VERIFIED

Zibi Braniecki [:zbraniecki][:gandalf]

Updated

•

16 years ago

Blocks: 478225

You need to log in before you can comment on or make changes to this bug.