Closed Bug 477987 Opened 14 years ago Closed 14 years ago

[silme] improve encoding handling

Categories

(Mozilla Localizations :: Infrastructure, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

Details

Attachments

(1 file, 3 obsolete files)

First, I modified get_source to return without encoding in case there is no fallback, no encoding and no char detector.

This scenario is important for Blobs. Imagine a jpeg file that we read as a Blob and then try to save.

Without this change it'll try to read it as utf8 or raise exception.

I'm not confident it's the best solution. Maybe we should split reading files for Blob and for L10nObject instead?

Next is similar change for writing.

Currently it'll write as utf8 even files like JPEG.

Once again, I'm not super-confident its the best possible solution.

Last change switches reading packages to read Blobs always with source. It makes sense cause there's no other goal in reading Blobs than to get the source of it.
It would be much more interesting to be able to select not to read source even for Blobs (for case scenario when I'm just reading package, not writing, and I don't want to present Blobs anyhow. Blobs are usually big. it would save time and memory not to read them) but that would require probably another parameter to get_l10npackage.

Adrian, let me know what you think about it
Attached file patch v1 (obsolete) —
Attachment #361735 - Flags: review?(akalla)
                 except Exception:
-                    l10npackage.objects[elem] = cls.get_blob(p2, source=source)
+                    l10npackage.objects[elem] = cls.get_blob(p2, source=True)
                 else:
                     if object_type=='object':
-                        l10npackage.objects[elem] = cls.get_blob(p2, source=source)
+                        l10npackage.objects[elem] = cls.get_blob(p2, source=True)
                     elif object_type=='entitylist':

I think, that always reading the source of the Blob is a bad idea: it slows things really down and it seems to be unnecessary:
-get_l10npackage() has the optional parameter "source", so you are able to force to read the source anyway
-we could try to read the source during a write attempt (if it wasn't read before)
What do you think about that?
I don't like the concept of combining write and read because I want to keep ability to transport L10nPackage as a fully "reproduceable" unit for example over the network.

So I'd much more prefer ability to read Blob's source always when you want to create a package that you will want to write.

The problem is that for Blob to be writable you *need* source. For others, not.

So I'm thinking about doing sth like this:

if source is True, get sources
if source is False, don't
is source is undefined - get source for Blob/EntityList and ignore for L10nObject.

How does that sound to you?
> if source is True, get sources
> if source is False, don't
> is source is undefined - get source for Blob/EntityList and ignore for L10nObject.


Sounds good
wait:

> is source is undefined - get source for Blob/EntityList and ignore for L10nObject.

Why for EntityLists? To write an entitylist we don't need the source, do we?
Attached patch patch, v2 (obsolete) — Splinter Review
updated patch.

split functions write_to_file and write_to_file_with_encoding and improve source handling on read
Attachment #361735 - Attachment is obsolete: true
Attachment #361934 - Flags: review?(akalla)
Attachment #361735 - Flags: review?(akalla)
Blocks: 458441
Attached patch patch, v3 (obsolete) — Splinter Review
improved patch. As Adrian noted on IRC, entitylist should use the same pattern as l10nobject, not blob.
Attachment #361934 - Attachment is obsolete: true
Attachment #361980 - Flags: review?(akalla)
Attachment #361934 - Flags: review?(akalla)
Comment on attachment 361980 [details] [diff] [review]
patch, v3

>-        cls.write_to_file(string, os.path.join(path, object.id), encoding)
>+
>+        if isinstance(object, Blob):
>+            cls.write_to_file(string,
>+                              os.path.join(path, object.id),
>+                              encoding)

write_to_file doesn't have the attribute "encoding"... 

>+        else:
>+            cls.write_to_file_with_encoding(string,
>+                                            os.path.join(path, object.id))
>         return True

...but write_to_file_with_encoding has it, and needs it here.
Attachment #361980 - Flags: review?(akalla) → review-
Attached patch patch, v3.1Splinter Review
good catch. thanks!

3.1 with this updated
Attachment #361980 - Attachment is obsolete: true
Attachment #361983 - Flags: review?(akalla)
Comment on attachment 361983 [details] [diff] [review]
patch, v3.1

looks good now. Thanks
Attachment #361983 - Flags: review?(akalla) → review+
commited, thanks
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 478225
You need to log in before you can comment on or make changes to this bug.