Closed Bug 655337 Opened 13 years ago Closed 13 years ago

Use a JAR friendly method of loading hyphenation dictionaries

Categories

(Core :: Layout: Text and Fonts, defect)

All
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: mfinkle, Assigned: jfkthame)

References

Details

(Keywords: helpwanted, mobile, perf, Whiteboard: mobilestartupshrink)

Attachments

(2 files)

Bug 253317 added support for hyphenation dictionaries loaded from the gre/hyphenation or app/hyphenation folders.

On android this requires extracting the dictionaries out of the omnijar file. This hurts startup a little and it increases the size of our footprint on the device. It also makes it impossible to load the dictionaries from a locale JAR.

Could we try loading the dictionaries using a resource:// approach? We could still load from resource://gre/hyphenation and resource://app/hyphenation, but we could also add support for locale JARs too.

And we could stop manually extracting the dictionaries on Android.
The libhyphen code we're using expects to be handed a file path, and loads the dictionary from there using standard C library functions (fopen, fgets, fclose). I assume this won't work directly with JARs and resource:// paths.

We could of course refactor the dictionary-loading code so as to read via some other API within Gecko, but this will mean maintaining some kind of patch on top of the upstream code. Probably worth doing, though. And maybe we can get upstream to accept a patch providing a more flexible API.
Would be nice to do the same refactoring for the spellcheck dictionary too.
Whiteboard: mobilestartupshrink
Jonathan, you can use the trick I used in bug 669116, so that we wouldn't need to patch libhyphen...
Keywords: helpwanted, mobile, perf
OS: Linux → Android
Hardware: x86 → All
Version: unspecified → Trunk
Benjamin was saying something about whether we should use mmap here instead of overriding fread and friends to read from JAR files.  I didn't follow the rest of the conversation...  Do we know how we want to tackle this problem yet?
Reading hnj_hyphen_load, it basically reads sequentially through the file and creates its data structures. AIUI, we can mmap things from the omnijar, in which case it should be a relatively simple code change to allow that code to simply read directly from a buffer rather than using fopen/fgets. I'll also file a bug about how the current loading pattern will be completely broken on Windows if the user installs to a unicode path :-(
So, here's what I think should happen (it would be great if someone can confirm that I'm not mistaken here):

1. Modify nsHyphenationManager::LoadPatternList to load the hyphenation dictionaries from resource:///gre and resource:///app.  I'm not sure if we can enumerate hyphenation dictionaries using nsIFile interfaces, but it might be good to generate a list of these files at build time and just modify nsHyphenationManager::LoadPatternListFromDir to use that list.

2. Pass the URI of the hyphenation dictionary to hnj_hyphen_load.

3. Replace fopen/fgets/fclose with equivalent functions which read the data from a URI, possibly by using APIs such as NS_OpenURI/nsInputStream::Read, etc.  Alternatively, we can mmap the jar content at this point, although I have no idea how to do that.
(In reply to Ehsan Akhgari [:ehsan] from comment #6)
> 1. Modify nsHyphenationManager::LoadPatternList to load the hyphenation
> dictionaries from resource:///gre and resource:///app.  I'm not sure if we
> can enumerate hyphenation dictionaries using nsIFile interfaces, but it
> might be good to generate a list of these files at build time and just
> modify nsHyphenationManager::LoadPatternListFromDir to use that list.

If we generate a build-time list, we'd still need a way to update it at run time, as we're almost certainly going to want to support post-install or on-demand addition of dictionaries (for languages where we can't ship them as part of the product, and/or for other reasons).
I don't think comment 7 is relevant: if we were to support downloadable dictionaries, we would store them in the profile, not in the application directory, along with our other downloadable resources.
Linux distros will most likely provide the dictionary from the openoffice dictionary packages, not in the profile, but in the application directory (or yet another directory).
Blocks: 685214
(In reply to Mike Hommey [:glandium] from comment #9)
> Linux distros will most likely provide the dictionary from the openoffice
> dictionary packages, not in the profile, but in the application directory
> (or yet another directory).

They can modify the list file, right?  I'm not sure if expecting them to add the file paths to a manifest file is unreasonable.
(In reply to Ehsan Akhgari [:ehsan] from comment #10)
> (In reply to Mike Hommey [:glandium] from comment #9)
> > Linux distros will most likely provide the dictionary from the openoffice
> > dictionary packages, not in the profile, but in the application directory
> > (or yet another directory).
> 
> They can modify the list file, right?  I'm not sure if expecting them to add
> the file paths to a manifest file is unreasonable.

Anything that needs a manifest being modified when a third party package is installed is painful.
I've just posted a patch in bug 685214 that moves one step towards this - it modifies things so that we use a URL spec (as a utf8 string) to specify the hyphenation dictionary, rather than a pathname, and then uses nsIInputStream to read it, so it should work with resource:// as an alternative to file:// URLs.

(Basically, this deals with point 3 of Ehsan's strategy in comment #6.)
Blocks: 686480
(Applies on top of bug 685214, currently -inbound.)

This modifies nsHyphenationManager and nsHyphenator to use nsIURI instead of nsIFile to specify the hyphenation resource to be loaded.

With this, it should be possible for nsHyphenationManager to instantiate hyphenators using resource:// URIs as well (though this isn't actually implemented yet).
I have no idea if this is a sensible approach, but it seems to work nicely on my Android device. :) Basically, it just uses an nsZipFind to look for hyphenation/*.dic entries in omnijar, and includes these in the list of available patterns.

This avoids the need to unpack the pattern dictionaries from the .apk (bug 686637), and so it includes (almost all of) the patch from there.

Presumably it would be possible to extend this to look in a locale jar file as well, though I have no idea about any of this packaging stuff really.
Attachment #560388 - Flags: review?(mark.finkle)
Attachment #560329 - Flags: review?(mark.finkle)
review ping...?
Comment on attachment 560329 [details] [diff] [review]
part 1 - use nsIURI rather than nsIFile to specify hyphenation resources

I think this is a great change. Simple too. I didn'y spot any issues, but you'll need to get someone else to do the "official" review since I am not a toolkit peer.
Attachment #560329 - Flags: review?(mark.finkle) → review+
Comment on attachment 560388 [details] [diff] [review]
part 2 - don't unpack hyphenation patterns on android, look for them in omnijar

>diff --git a/xpcom/build/Omnijar.h b/xpcom/build/Omnijar.h

> #include "nsString.h"
>+#include "nsIFile.h"
> 
>-class nsIFile;

Could this be avoided by including the file in nsHyphenationManager.h/cpp ?

r+ for the Android part. You need someone else to "officially" review the nsHyphenationManager changes. smontagu looks like a good candidate.


In the meantime, I'll make an Android build with these changes and do some timing tests.
Attachment #560388 - Flags: review?(mark.finkle) → review+
Attachment #560329 - Flags: review?(smontagu)
Attachment #560388 - Flags: review?(smontagu)
(In reply to Mark Finkle (:mfinkle) from comment #17)
> >diff --git a/xpcom/build/Omnijar.h b/xpcom/build/Omnijar.h
> 
> > #include "nsString.h"
> >+#include "nsIFile.h"
> > 
> >-class nsIFile;
> 
> Could this be avoided by including the file in nsHyphenationManager.h/cpp ?

It could, but as Omnijar.h requires the inclusion of nsIFile.h (because of the use of NS_IF_ADDREF on an nsIFile* in the inline definition of the GetPath() function), I think it's correct to #include the header here, rather than relying on anyone who wants to use Omnijar.h to have already included nsIFile.h themselves first.

I assume the fact that Omnijar.h requires nsIFile.h (and not just a forward-declaration of nsIFile) simply went unnoticed because all existing users happened to include nsIFile.h themselves already.
Comment on attachment 560388 [details] [diff] [review]
part 2 - don't unpack hyphenation patterns on android, look for them in omnijar

Also flagging bsmedberg for review, due to the Omnijar.h change (see above).
Attachment #560388 - Flags: review?(benjamin)
(In reply to Jonathan Kew (:jfkthame) from comment #18)
> I assume the fact that Omnijar.h requires nsIFile.h (and not just a
> forward-declaration of nsIFile) simply went unnoticed because all existing
> users happened to include nsIFile.h themselves already.

indeed.
Comment on attachment 560388 [details] [diff] [review]
part 2 - don't unpack hyphenation patterns on android, look for them in omnijar

Oh how I hate this enumeration pattern and wish it would go away at the earliest opportunity.
Attachment #560388 - Flags: review?(benjamin) → review+
Attachment #560329 - Flags: review?(smontagu) → review+
Attachment #560388 - Flags: review?(smontagu) → review+
Assignee: nobody → jfkthame
Target Milestone: --- → mozilla10
Backed out pt 2 for leaking an nsZipFind:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e6b75c79c618
(In reply to Jonathan Kew (:jfkthame) from comment #24)
> And re-landed, with the leak fixed:

https://hg.mozilla.org/mozilla-central/rev/ef08991a3411
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: