Because of bug 479150 I found out, that in fact, write_to_file() writes files using the default system encoding, which is for OS X Mac-Roman, for some systems ASCII and for some others UTF8. Because of that, I'm not sure if that method makes sense at all... At least it does not guarantee proper writing ob Blobs in any way - and that is why it was written in the beginning. Maybe hardcoding the ASCII encoding for Blobs would make more sense? The second way to solve this could be something like this (for Blobs): 1. Try ASCII 2. If it fails: try UTF8 What do you think about this? Axel? Gandalf?
I'm still working on the patch, but I took this case to clean up the IOClient API. Basically, current proposal is (more or less) like this: class IOClient(object): def get_blob (cls, path, source=True, blob=None): pass def get_entitylist (cls, path, source=False, code='default', parser=None): pass def get_l10nobject (cls, path, source=False, code='default', parser=None): pass def get_l10npackage (cls, path): pass def write_blob(cls, blob, path): pass def write_entitylist(cls, elist, path): pass def write_l10nobject(cls, l10nobject, path): pass def write_l10npackage(cls, l10npackage, path): pass def get_source(cls, path, encoding=None, fallback=None): """ reads source from the path """ pass def get_source_with_encoding(cls, path, encoding): """ reads source with encoding fallback """ pass def get_source_without_encoding(cls, path): """ reads source ignoring encoding (in binary compatible mode) """ pass
Status: NEW → ASSIGNED
I'm attaching initial rewrite POC. I limited the patch to what happens in silme.io.clients and silme.io.file to make it simpler to read. I cleaned whole IOClient to make it more abstract (at least for now) and reorganized API. FileClient should support whole IOClient now. I moved some methods to the private zone in order to keep the API clean. One thing I did not touch but consider to, is follow Java and split IOClient into IOClientReadable and IOClientWriteable interfaces so that each client can support one or both. My only concern here is potential performance drop due to interfaces use in scriptable language. Adrian, could you take a look at it? It's not finished so no need of testing, but it would be helpful for me to get your opinion on those changes. One thing to notice for you is that in IOClient.get_l10npackage, argument ignore may be a function now which implements your long awaited filter function :) More patches will come soon. I'll try to implement the API on other clients now to see how it matches the needs of DBClients and RCSClients.
Summary: [silme] silme.io.file.write_to_file() and the encoding used to write files → [silme] clean silme.io
this is WIP1. * clean API * silme.io.file clean * silme.io.jar almost clean
This is WIP3: * whole silme.io.file * whole silme.io.jar * partial silme.io.mysql * a lot of unittests The last thing before commit will be silme.io.hg|svn
WIP4: * silme.io.clients * silme.io.file * silme.io.jar * silme.io.mysql * silme.io.sqlite * silme.io.svn * silme.io.hg 1) SQLite support is only partial which is not a problem at the moment 2) HG support is extremely early in the game. I just made sure that proposed API supports Mercurial. This patch is feature ready. I'm cleaning the code and reducing redundancy by moving overlapping pieces of code to abstract classes in silme.io.clients. Adrian, it would be super helpful if you could take a look at the approach and let me know how it fits your needs. The ultimate goal is to be able to give path: ./path/to/package or jar:/path/to/file.jar or hg:http://hg.mozilla.org/l10n-central/pl as an argument and it should work without any change in the code. We may not be there yet, but for 0.5 I want to make sure we can achieve it.
Assignee: nobody → gandalf
Attachment #368094 - Attachment is obsolete: true
Attachment #368978 - Attachment is obsolete: true
Gandalf: could you please clean up the patch of obviously obsolete files, e.g.: pl.jar, test.jpg ? It's already big enough without them...
Gandalf: or, if you want to have them in the repo: making a separate patch with the test data would help
patch v2. minor updates and remove ./tests part to make it cleaner for review
Attachment #369098 - Attachment is obsolete: true
Comment on attachment 369511 [details] [diff] [review] patch v2 diff --git a/lib/silme/io/clients.py b/lib/silme/io/clients.py --- a/lib/silme/io/clients.py +++ b/lib/silme/io/clients.py +class FileFormatClient(IOClient): + @classmethod + def get_l10npackage(cls, path, + code='default', + object_type='l10nobject', + source=None, + ignore=['CVS','.svn','.DS_Store', '.hg']): + l10npackage = L10nPackage() + l10npackage.id = os.path.basename(path) + l10npackage.uri = path Looks like this method is not returning anything? @@ -44,17 +174,17 @@ class IOClient(object): for coding in fallback: try: return cls.get_source_with_encoding(path, coding) except IOError, e: raise except UnicodeDecodeError, e: continue # TODO: logging # if we still did not succeed, try to check if the BOM is specified - text = cls.read_without_encoding(path) + text = cls._read_without_encoding(path) Shoudn't either IOClient or FileFormatClient have a method returning "NotImplementedError()" or something else? Second problem: silme.io.file.FileClient._read_without_encoding(path) returns data in binary mode, so you cannot use it for testing text. @classmethod def get_source_with_encoding(cls, path, encoding): try: - text = cls.read_with_encoding(path, encoding) + text = cls._read_with_encoding(path, encoding) Shoudn't either IOClient or FileFormatClient have such a method returning "NotImplementedError()"? + @staticmethod + def _get_source_policy(source): please add a comment what this method is doing + @staticmethod + def _should_ignore(ignore, path, elems): please add a comment what this method is doing (ignore what?) diff --git a/lib/silme/io/file.py b/lib/silme/io/file.py --- a/lib/silme/io/file.py +++ b/lib/silme/io/file.py -class FileClient (IOClient): +class FileClient (FileFormatClient): + @classmethod + def write_object(cls, object, path, encoding=None): + if isinstance(object, L10nObject): + cls.write_l10nobject(object, path, encoding=encoding) + elif isinstance(object, EntityList): + cls.write_entitylist(object, path, encoding=encoding) + elif isinstance(object, Blob): + cls.write_blob(object, path) Error handling in an "else" case? + @classmethod + def write_l10npackage(cls, l10npack, path): + if not os.path.isdir(path): + os.mkdir(path) + for i in l10npack.packages: + if not os.path.isdir(os.path.join(path, i)): + os.mkdir(os.path.join(path, i)) + cls.write_l10npackage(l10npack.packages[i], os.path.join(path, i)) + for i in l10npack.objects: + cls.write_object(l10npack.objects[i], path) + return True I'm not sure if we should better use here mktree instead of mkdir + @classmethod + def _read_without_encoding(cls, path): + file = open(path, 'rb') + text = file.read() + file.close() + return text you are reading here in binary mode - so probably we should rename this method to avoid confusion? diff --git a/lib/silme/io/hg.py b/lib/silme/io/hg.py new file mode 100644 --- /dev/null +++ b/lib/silme/io/hg.py + @classmethod + def _read_with_encoding(cls, path, encoding): + client = pysvn.Client() + text = client.cat(path, revision=pysvn.Revision( pysvn.opt_revision_kind.head )) + return text Don't know how it works, but do we really use svn for hg...? + @classmethod + def _read_without_encoding(cls, path): + repo = hg.repository(ui.ui(), 'http://hg.mozilla.org/webtools/mcs/') + print repo + text = commands.cat(ui, repo, 'theme/html/index.html') + #text = client.cat(path, revision=pysvn.Revision( pysvn.opt_revision_kind.head )) + return text like above diff --git a/lib/silme/io/jar.py b/lib/silme/io/jar.py --- a/lib/silme/io/jar.py +++ b/lib/silme/io/jar.py def matches_path(cls, path): """ tests if the ioclient should be used for this type of path + example: jar:browser.jar!/content/branding/ """ - return path.endswith('.jar') + return path.startswith('jar:') As talked on IRC, we should return true also in the case when a path simply ends with ".jar" (we could test such a path with os.path.isdir, probably) + @classmethod + def _explode_path(cls, path): + if path.startswith('jar:'): + protocol = path[0:4] + jarpath = path[4:path.find('!')] + internalpath = path[path.find('!')+2:] what happens here if we get such a path: jar:/Users/bla.jar and what if: jar:/Users/bla.jar!/ ?
(In reply to comment #10) Thanks Adrian! I applied most of your comments in the following patch. > Second problem: silme.io.file.FileClient._read_without_encoding(path) returns > data in binary mode, so you cannot use it for testing text. My tests indicate I can. Can you provide an example? > I'm not sure if we should better use here mktree instead of mkdir I assume you mean os.makedirs? > you are reading here in binary mode - so probably we should rename this method > to avoid confusion? Same as above. I enabled binary mode, but I can read text. It's just that I ignore encoding and read it "raw". > Don't know how it works, but do we really use svn for hg...? I removed silme.io.hg from this patch. It's experimental code that is able to read blobs for now ;) Sorry for stealing your time with it. patch v3. addresses known issues.
Comment on attachment 370163 [details] [diff] [review] patch3 >+ @staticmethod >+ def _get_source_policy(source): >+ # returns two variables that manages in which >+ # case source of the file is beeing attached to the object Maybe it's just my bad English, but that sentence seems to be not correct (and: s/beeing/being/)... >+ @classmethod >+ def write_object(cls, object, path, encoding=None): >+ if isinstance(object, L10nObject): >+ cls.write_l10nobject(object, path, encoding=encoding) >+ elif isinstance(object, EntityList): >+ cls.write_entitylist(object, path, encoding=encoding) >+ elif isinstance(object, Blob): >+ cls.write_blob(object, path) > else: >+ raise TypeError('Cannot write object of suh type (%s)' % object.__class__.__name__) s/suh/such/ >+def zipfile_delete(self, file): >+ zfile = zipfile.ZipFile('/tmp/random.zip', mode='w') >+ for item in self.infolist(): >+ buf = self.read(item.filename) >+ if not item.filename==file: >+ zfile.writestr(item, buf) >+ zfile.close() >+ shutil.move(zfile.filename, self.filename) '/tmp/random.zip' ? Does this method work with Windows?
Attachment #370163 - Flags: review?(akalla) → review-
Great feedback! Thanks! Attaching the patch with fixes.
Comment on attachment 372651 [details] [diff] [review] Patch, v4 the code looks good now :) Good work. Unfortunately, I didn't have time to test it to see if it really works (e.g. with compare-locales2)...
Attachment #372651 - Flags: review?(akalla) → review+
commited! http://hg.mozilla.org/users/zbraniecki_mozilla.com/silme/rev/a0c4f0569abd I'm not going to close the bug just yet, I'd like to give you a chance with compare-locales2. Do you think you'll be able to give it a try over the next few days?
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.