Last Comment Bug 655337 - Use a JAR friendly method of loading hyphenation dictionaries
: Use a JAR friendly method of loading hyphenation dictionaries
Status: RESOLVED FIXED
mobilestartupshrink
: helpwanted, mobile, perf
Product: Core
Classification: Components
Component: Layout: Text (show other bugs)
: Trunk
: All Android
: -- normal (vote)
: mozilla10
Assigned To: Jonathan Kew (:jfkthame)
:
Mentors:
Depends on:
Blocks: 686480 685214
  Show dependency treegraph
 
Reported: 2011-05-06 12:29 PDT by Mark Finkle (:mfinkle) (use needinfo?)
Modified: 2011-10-07 04:58 PDT (History)
14 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
part 1 - use nsIURI rather than nsIFile to specify hyphenation resources (6.34 KB, patch)
2011-09-15 03:49 PDT, Jonathan Kew (:jfkthame)
mark.finkle: review+
smontagu: review+
Details | Diff | Review
part 2 - don't unpack hyphenation patterns on android, look for them in omnijar (7.41 KB, patch)
2011-09-15 10:17 PDT, Jonathan Kew (:jfkthame)
mark.finkle: review+
smontagu: review+
benjamin: review+
Details | Diff | Review

Description Mark Finkle (:mfinkle) (use needinfo?) 2011-05-06 12:29:02 PDT
Bug 253317 added support for hyphenation dictionaries loaded from the gre/hyphenation or app/hyphenation folders.

On android this requires extracting the dictionaries out of the omnijar file. This hurts startup a little and it increases the size of our footprint on the device. It also makes it impossible to load the dictionaries from a locale JAR.

Could we try loading the dictionaries using a resource:// approach? We could still load from resource://gre/hyphenation and resource://app/hyphenation, but we could also add support for locale JARs too.

And we could stop manually extracting the dictionaries on Android.
Comment 1 Jonathan Kew (:jfkthame) 2011-05-07 12:09:38 PDT
The libhyphen code we're using expects to be handed a file path, and loads the dictionary from there using standard C library functions (fopen, fgets, fclose). I assume this won't work directly with JARs and resource:// paths.

We could of course refactor the dictionary-loading code so as to read via some other API within Gecko, but this will mean maintaining some kind of patch on top of the upstream code. Probably worth doing, though. And maybe we can get upstream to accept a patch providing a more flexible API.
Comment 2 Ryan VanderMeulen [:RyanVM] 2011-05-07 12:11:05 PDT
Would be nice to do the same refactoring for the spellcheck dictionary too.
Comment 3 :Ehsan Akhgari (busy, don't ask for review please) 2011-09-02 12:15:08 PDT
Jonathan, you can use the trick I used in bug 669116, so that we wouldn't need to patch libhyphen...
Comment 4 :Ehsan Akhgari (busy, don't ask for review please) 2011-09-07 10:06:21 PDT
Benjamin was saying something about whether we should use mmap here instead of overriding fread and friends to read from JAR files.  I didn't follow the rest of the conversation...  Do we know how we want to tackle this problem yet?
Comment 5 Benjamin Smedberg [:bsmedberg] 2011-09-07 10:21:55 PDT
Reading hnj_hyphen_load, it basically reads sequentially through the file and creates its data structures. AIUI, we can mmap things from the omnijar, in which case it should be a relatively simple code change to allow that code to simply read directly from a buffer rather than using fopen/fgets. I'll also file a bug about how the current loading pattern will be completely broken on Windows if the user installs to a unicode path :-(
Comment 6 :Ehsan Akhgari (busy, don't ask for review please) 2011-09-07 11:23:49 PDT
So, here's what I think should happen (it would be great if someone can confirm that I'm not mistaken here):

1. Modify nsHyphenationManager::LoadPatternList to load the hyphenation dictionaries from resource:///gre and resource:///app.  I'm not sure if we can enumerate hyphenation dictionaries using nsIFile interfaces, but it might be good to generate a list of these files at build time and just modify nsHyphenationManager::LoadPatternListFromDir to use that list.

2. Pass the URI of the hyphenation dictionary to hnj_hyphen_load.

3. Replace fopen/fgets/fclose with equivalent functions which read the data from a URI, possibly by using APIs such as NS_OpenURI/nsInputStream::Read, etc.  Alternatively, we can mmap the jar content at this point, although I have no idea how to do that.
Comment 7 Jonathan Kew (:jfkthame) 2011-09-08 04:11:14 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #6)
> 1. Modify nsHyphenationManager::LoadPatternList to load the hyphenation
> dictionaries from resource:///gre and resource:///app.  I'm not sure if we
> can enumerate hyphenation dictionaries using nsIFile interfaces, but it
> might be good to generate a list of these files at build time and just
> modify nsHyphenationManager::LoadPatternListFromDir to use that list.

If we generate a build-time list, we'd still need a way to update it at run time, as we're almost certainly going to want to support post-install or on-demand addition of dictionaries (for languages where we can't ship them as part of the product, and/or for other reasons).
Comment 8 Benjamin Smedberg [:bsmedberg] 2011-09-08 07:12:30 PDT
I don't think comment 7 is relevant: if we were to support downloadable dictionaries, we would store them in the profile, not in the application directory, along with our other downloadable resources.
Comment 9 Mike Hommey [:glandium] 2011-09-08 07:30:45 PDT
Linux distros will most likely provide the dictionary from the openoffice dictionary packages, not in the profile, but in the application directory (or yet another directory).
Comment 10 :Ehsan Akhgari (busy, don't ask for review please) 2011-09-08 19:17:29 PDT
(In reply to Mike Hommey [:glandium] from comment #9)
> Linux distros will most likely provide the dictionary from the openoffice
> dictionary packages, not in the profile, but in the application directory
> (or yet another directory).

They can modify the list file, right?  I'm not sure if expecting them to add the file paths to a manifest file is unreasonable.
Comment 11 Mike Hommey [:glandium] 2011-09-08 22:40:20 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #10)
> (In reply to Mike Hommey [:glandium] from comment #9)
> > Linux distros will most likely provide the dictionary from the openoffice
> > dictionary packages, not in the profile, but in the application directory
> > (or yet another directory).
> 
> They can modify the list file, right?  I'm not sure if expecting them to add
> the file paths to a manifest file is unreasonable.

Anything that needs a manifest being modified when a third party package is installed is painful.
Comment 12 Jonathan Kew (:jfkthame) 2011-09-10 08:05:04 PDT
I've just posted a patch in bug 685214 that moves one step towards this - it modifies things so that we use a URL spec (as a utf8 string) to specify the hyphenation dictionary, rather than a pathname, and then uses nsIInputStream to read it, so it should work with resource:// as an alternative to file:// URLs.

(Basically, this deals with point 3 of Ehsan's strategy in comment #6.)
Comment 13 Jonathan Kew (:jfkthame) 2011-09-15 03:49:40 PDT
Created attachment 560329 [details] [diff] [review]
part 1 - use nsIURI rather than nsIFile to specify hyphenation resources

(Applies on top of bug 685214, currently -inbound.)

This modifies nsHyphenationManager and nsHyphenator to use nsIURI instead of nsIFile to specify the hyphenation resource to be loaded.

With this, it should be possible for nsHyphenationManager to instantiate hyphenators using resource:// URIs as well (though this isn't actually implemented yet).
Comment 14 Jonathan Kew (:jfkthame) 2011-09-15 10:17:35 PDT
Created attachment 560388 [details] [diff] [review]
part 2 - don't unpack hyphenation patterns on android, look for them in omnijar

I have no idea if this is a sensible approach, but it seems to work nicely on my Android device. :) Basically, it just uses an nsZipFind to look for hyphenation/*.dic entries in omnijar, and includes these in the list of available patterns.

This avoids the need to unpack the pattern dictionaries from the .apk (bug 686637), and so it includes (almost all of) the patch from there.

Presumably it would be possible to extend this to look in a locale jar file as well, though I have no idea about any of this packaging stuff really.
Comment 15 Jonathan Kew (:jfkthame) 2011-10-04 06:45:55 PDT
review ping...?
Comment 16 Mark Finkle (:mfinkle) (use needinfo?) 2011-10-04 08:02:55 PDT
Comment on attachment 560329 [details] [diff] [review]
part 1 - use nsIURI rather than nsIFile to specify hyphenation resources

I think this is a great change. Simple too. I didn'y spot any issues, but you'll need to get someone else to do the "official" review since I am not a toolkit peer.
Comment 17 Mark Finkle (:mfinkle) (use needinfo?) 2011-10-04 08:10:15 PDT
Comment on attachment 560388 [details] [diff] [review]
part 2 - don't unpack hyphenation patterns on android, look for them in omnijar

>diff --git a/xpcom/build/Omnijar.h b/xpcom/build/Omnijar.h

> #include "nsString.h"
>+#include "nsIFile.h"
> 
>-class nsIFile;

Could this be avoided by including the file in nsHyphenationManager.h/cpp ?

r+ for the Android part. You need someone else to "officially" review the nsHyphenationManager changes. smontagu looks like a good candidate.


In the meantime, I'll make an Android build with these changes and do some timing tests.
Comment 18 Jonathan Kew (:jfkthame) 2011-10-04 08:36:47 PDT
(In reply to Mark Finkle (:mfinkle) from comment #17)
> >diff --git a/xpcom/build/Omnijar.h b/xpcom/build/Omnijar.h
> 
> > #include "nsString.h"
> >+#include "nsIFile.h"
> > 
> >-class nsIFile;
> 
> Could this be avoided by including the file in nsHyphenationManager.h/cpp ?

It could, but as Omnijar.h requires the inclusion of nsIFile.h (because of the use of NS_IF_ADDREF on an nsIFile* in the inline definition of the GetPath() function), I think it's correct to #include the header here, rather than relying on anyone who wants to use Omnijar.h to have already included nsIFile.h themselves first.

I assume the fact that Omnijar.h requires nsIFile.h (and not just a forward-declaration of nsIFile) simply went unnoticed because all existing users happened to include nsIFile.h themselves already.
Comment 19 Jonathan Kew (:jfkthame) 2011-10-04 08:38:06 PDT
Comment on attachment 560388 [details] [diff] [review]
part 2 - don't unpack hyphenation patterns on android, look for them in omnijar

Also flagging bsmedberg for review, due to the Omnijar.h change (see above).
Comment 20 Mike Hommey [:glandium] 2011-10-04 08:40:09 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #18)
> I assume the fact that Omnijar.h requires nsIFile.h (and not just a
> forward-declaration of nsIFile) simply went unnoticed because all existing
> users happened to include nsIFile.h themselves already.

indeed.
Comment 21 Benjamin Smedberg [:bsmedberg] 2011-10-05 11:54:19 PDT
Comment on attachment 560388 [details] [diff] [review]
part 2 - don't unpack hyphenation patterns on android, look for them in omnijar

Oh how I hate this enumeration pattern and wish it would go away at the earliest opportunity.
Comment 23 Jonathan Kew (:jfkthame) 2011-10-06 09:31:46 PDT
Backed out pt 2 for leaking an nsZipFind:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e6b75c79c618
Comment 24 Jonathan Kew (:jfkthame) 2011-10-06 10:50:57 PDT
And re-landed, with the leak fixed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ef08991a3411
Comment 25 Ed Morley [:emorley] 2011-10-07 04:20:01 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #24)
> And re-landed, with the leak fixed:

https://hg.mozilla.org/mozilla-central/rev/ef08991a3411
Comment 26 Ed Morley [:emorley] 2011-10-07 04:58:54 PDT
Merge of part 1:
https://hg.mozilla.org/mozilla-central/rev/bacc2c7eee29

Note You need to log in before you can comment on or make changes to this bug.