Bug 1095719 (downloadable-hyphenation-dicts)

[meta] Download hyphenation dictionaries at runtime

NEW
Unassigned

Status

()

Firefox for Android
General
3 years ago
3 months ago

People

(Reporter: rnewman, Unassigned, Mentored)

Tracking

(Depends on: 2 bugs, Blocks: 4 bugs, {meta})

Trunk
All
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
Nick notes that a fairly decent chunk of our omni.ja (and thus of our APK) is hyphenation data. Indeed:

680107  hyphenation/hyph_hu.dic        Hungarian
273418  hyphenation/hyph_nn.dic        Norwegian Nyorsk
273418  hyphenation/hyph_nb.dic        Norwegian Bokmål
143200  hyphenation/hyph_af.dic        Afrikaans
138675  hyphenation/hyph_de-1901.dic   German
136060  hyphenation/hyph_de-1996.dic   German
133553  hyphenation/hyph_de-CH.dic     Swiss German
120460  hyphenation/hyph_nl.dic        Dutch
106063  hyphenation/hyph_en_US.dic     English
 90755  hyphenation/hyph_ru.dic
 76038  hyphenation/hyph_sh.dic
 65610  hyphenation/hyph_uk.dic
 60986  hyphenation/hyph_cy.dic
 42921  hyphenation/hyph_sv.dic
 39158  hyphenation/hyph_es.dic
 39089  hyphenation/hyph_pl.dic
 35105  hyphenation/hyph_et.dic
 34744  hyphenation/hyph_is.dic
 32206  hyphenation/hyph_eo.dic
 27370  hyphenation/hyph_gl.dic
 15408  hyphenation/hyph_bg.dic
 14390  hyphenation/hyph_fr.dic
 13894  hyphenation/hyph_mn.dic
 11711  hyphenation/hyph_hsb.dic
 10063  hyphenation/hyph_lt.dic
  7917  hyphenation/hyph_sl.dic
  7755  hyphenation/hyph_hr.dic
  6779  hyphenation/hyph_da.dic
  6106  hyphenation/hyph_ca.dic
  4248  hyphenation/hyph_ia.dic
  3765  hyphenation/hyph_tr.dic
  2556  hyphenation/hyph_it.dic
  2130  hyphenation/hyph_la.dic
  1969  hyphenation/hyph_kmr.dic
  1704  hyphenation/hyph_fi.dic
  1542  hyphenation/hyph_pt.dic

It seems positively criminal to subject millions of users to 1.3MB of hyphenation data for Hungarian, Norwegian, and Afrikaans alone (Norwegian twice!).

We should download these at runtime for affected users.
(Reporter)

Updated

2 years ago
Depends on: 1175555
(Reporter)

Comment 1

2 years ago
I want to do this via add-ons.
This is actually cross platform, On desktop I'd be worried less about the disk space, but I'd still want to lazy download those post install. If we can do one we can do both.  How often do t

cc-ing Laura, is there a general case bug/idea around systematically shifting parts of the download to lazy loading?  There's probably multiple opportunities like this one, just in splitting omni.jar.
(Reporter)

Comment 3

2 years ago
Nick, Margaret and I have thought about this some. Fonts, hyph dicts, locales (needs lots of work that someday I'll write down), and a bunch of dev/power user features are all good candidates. 

These dicts are good candidates because their absence degrades gracefully, and they seemingly would ship easily in an add-on. The work is in plumbing when to download them. 

I plan to chat with Nick about this next week. Very interested to know if anyone else has spent time on this.

Comment 4

2 years ago
I think it's a good idea (for our platform in general) to have an option to install hyphenation dicts at runtime. I'd like to get a firm idea on how the UX should be before we get too technical, though. Like, should there be notifications, confirmations, any kind of UI that explains to users or webdevs what's happening?

In particular for mobile, is it OK to download 600k over the data connection?

Also, noteworthy data point, fx os uses -moz-hyphens as part of their UX, so we probably want to ship the hyphenation dicts by default there.

Technically, https://dxr.mozilla.org/mozilla-central/source/intl/hyphenation/nsHyphenationManager.cpp doesn't seem to have a code path that's open to add-ons yet. 'Cause all our things called dictionaries are different.
(In reply to Axel Hecht [:Pike] from comment #4)

> Technically,
> https://dxr.mozilla.org/mozilla-central/source/intl/hyphenation/
> nsHyphenationManager.cpp doesn't seem to have a code path that's open to
> add-ons yet. 'Cause all our things called dictionaries are different.

We should look at the system added for Fonts. There is a notification, or something, sent out and the front-end decides how to handle the request. See bug 648548 for the front-end part.
(Reporter)

Updated

2 years ago
See Also: → bug 1194338
Haven't finished the addon that will hold all the hyphenation files because of my schedule. NI myself so I don't forget.
Flags: needinfo?(jonalmeida942)
Depends on: 1205423
Depends on: 1205425
See comment 2 in bug 1205423 where I've created an addon for this. To verify, it you need to grab a build that uses the flag in bug 1175555, install the addon and re-run the failing hyphenation tests.
Flags: needinfo?(jonalmeida942)
@Jonathan: Now as the "download service" is taking shape (bug 1197720) I think we could omit the add-on and let the service decide and download the "raw" hyphenation dictionaries from the CDN. What do you think?

Regardless of that: How do we notify Gecko about the downloaded dictionaries? Can we just call nsHyphenationManager::LoadPatternList() at runtime? Have you looked into this with the add-on approach?
Flags: needinfo?(jonalmeida942)
(In reply to Sebastian Kaspari (:sebastian) from comment #8)
> @Jonathan: Now as the "download service" is taking shape (bug 1197720) I
> think we could omit the add-on and let the service decide and download the
> "raw" hyphenation dictionaries from the CDN. What do you think?

I like the "raw" approach. Lees to deal with.

> Regardless of that: How do we notify Gecko about the downloaded
> dictionaries? Can we just call nsHyphenationManager::LoadPatternList() at
> runtime? Have you looked into this with the add-on approach?

Two thoughts here:
1. nsHyphenationManager has an nsIObserver it uses for memory pressure. If it receives a memory pressure, it will empty the hyphenators and it doesn't look like it tries to reload them during that session. This means maybe we don't need to worry about one missed session either. The dictionaries will be loaded on the next run.

2. As mentioned, nsHyphenationManager has an nsIObserver it uses for memory pressure. We could rename the class from MemoryPressureObserver to HyphenationObserver and add an observer for a new notification. We could send the notification when we download the dictionaries and the manager would call LoadPatternList().
(In reply to Sebastian Kaspari (:sebastian) from comment #8)
> @Jonathan: Now as the "download service" is taking shape (bug 1197720) I
> think we could omit the add-on and let the service decide and download the
> "raw" hyphenation dictionaries from the CDN. What do you think?

Sounds good to me. I'm not sure of one part: during the building/testing phase, we have to request to download all hyphenation files for all the individual reftests we have for that. We can probably think about it later, but its worth noticing now.
Flags: needinfo?(jonalmeida942)
(In reply to Jonathan Almeida (:jonalmeida) from comment #10)
> Sounds good to me. I'm not sure of one part: during the building/testing
> phase, we have to request to download all hyphenation files for all the
> individual reftests we have for that. We can probably think about it later,
> but its worth noticing now.

Yeah, I filed bug 1197716 for the font case. I don't have a solution for that yet but we should be able to use the same for fonts and hyphenation dictionaries. We can't download them from the web on the test devices so I want to look into other ways to sideload them and then move them to their destination.
(In reply to Sebastian Kaspari (:sebastian) from comment #11)
> (In reply to Jonathan Almeida (:jonalmeida) from comment #10)
> > Sounds good to me. I'm not sure of one part: during the building/testing
> > phase, we have to request to download all hyphenation files for all the
> > individual reftests we have for that. We can probably think about it later,
> > but its worth noticing now.
> 
> Yeah, I filed bug 1197716 for the font case. I don't have a solution for
> that yet but we should be able to use the same for fonts and hyphenation
> dictionaries. We can't download them from the web on the test devices so I
> want to look into other ways to sideload them and then move them to their
> destination.

I think it makes sense to make a reftest-only addon that contains the reftest-required data and installs it in first run.  Or we make the reftest harness produce a profile with the reftest-required data files already in place.
(In reply to Nick Alexander :nalexander from comment #12)
> (In reply to Sebastian Kaspari (:sebastian) from comment #11)
> > (In reply to Jonathan Almeida (:jonalmeida) from comment #10)
> > > Sounds good to me. I'm not sure of one part: during the building/testing
> > > phase, we have to request to download all hyphenation files for all the
> > > individual reftests we have for that. We can probably think about it later,
> > > but its worth noticing now.
> > 
> > Yeah, I filed bug 1197716 for the font case. I don't have a solution for
> > that yet but we should be able to use the same for fonts and hyphenation
> > dictionaries. We can't download them from the web on the test devices so I
> > want to look into other ways to sideload them and then move them to their
> > destination.
> 
> I think it makes sense to make a reftest-only addon that contains the
> reftest-required data and installs it in first run.  Or we make the reftest
> harness produce a profile with the reftest-required data files already in
> place.

We shold be able to create ZIPs and unzip the files to the required locations without *too* much trouble. IIRC, we already do some unzipping to get the preset profile data on to devices already.
(In reply to Mark Finkle (:mfinkle) from comment #13)
> (In reply to Nick Alexander :nalexander from comment #12)
> > (In reply to Sebastian Kaspari (:sebastian) from comment #11)
> > > (In reply to Jonathan Almeida (:jonalmeida) from comment #10)
> > > > Sounds good to me. I'm not sure of one part: during the building/testing
> > > > phase, we have to request to download all hyphenation files for all the
> > > > individual reftests we have for that. We can probably think about it later,
> > > > but its worth noticing now.
> > > 
> > > Yeah, I filed bug 1197716 for the font case. I don't have a solution for
> > > that yet but we should be able to use the same for fonts and hyphenation
> > > dictionaries. We can't download them from the web on the test devices so I
> > > want to look into other ways to sideload them and then move them to their
> > > destination.
> > 
> > I think it makes sense to make a reftest-only addon that contains the
> > reftest-required data and installs it in first run.  Or we make the reftest
> > harness produce a profile with the reftest-required data files already in
> > place.
> 
> We shold be able to create ZIPs and unzip the files to the required
> locations without *too* much trouble. IIRC, we already do some unzipping to
> get the preset profile data on to devices already.

The addon in bug 1205423 unzips the xpi `contents/` dir in the install() part and deletes anything it moved during the uninstall().

I haven't tested the part that checks if the hyphenation files are detected and if the failing reftests pass yet (I'll being doing that with the patches for bug 1087791 next).
Mentor: s.kaspari@gmail.com

Comment 15

2 years ago
Is this  a part of GSoC 2016 project or that project is similar to this?
(Reporter)

Comment 16

2 years ago
I'm not aware of a GSoC project around this, but perhaps Sebastian is?
Flags: needinfo?(s.kaspari)
Yeah, I proposed this as part of a GSoC project for this year:
https://wiki.mozilla.org/SummerOfCode#2016

I think I just answered you via mail: Currently I'm waiting on whether Google will accept Mozilla as participating organization this year (The decision should be made today). Meanwhile I created a draft on the Wiki to describe the project in more detail: https://wiki.mozilla.org/User:Skaspari/GSoC2016 - Feel free to use the discussion site on the Wiki to ask questions and I'll try to extended the article as needed.
Flags: needinfo?(s.kaspari)

Comment 18

2 years ago
Mozilla has been accepted as a mentor organization for GSoC 2016. I am interested in learning more about this project.
Summary: Download hyphenation dictionaries at runtime → [meta] Download hyphenation dictionaries at runtime
Alias: downloadable-hyphenation-dicts
Depends on: 1272226
Blocks: 1272226
No longer depends on: 1272226
Depends on: 1276586
Depends on: 1276588
Depends on: 1276560
Depends on: 1276589
Depends on: 1276587

Updated

a year ago
Depends on: 1280757

Updated

a year ago
Depends on: 1280769
Blocks: 1303172
(Reporter)

Updated

a year ago
Keywords: meta
Depends on: 1314019
Depends on: 1318544
(In reply to Mark Finkle (:mfinkle) (use needinfo?) from comment #9)
> Two thoughts here:
> 1. nsHyphenationManager has an nsIObserver it uses for memory pressure. If
> it receives a memory pressure, it will empty the hyphenators and it doesn't
> look like it tries to reload them during that session.

I don't think that's correct. The memory-pressure observer will discard any loaded hyphenators, but they'll be re-loaded by nsHyphenationManager::GetHyphenator() if content calls for them again, so memory-pressure doesn't have the effect of disabling hyphenation for the rest of the session, only of flushing any currently-cached hyphenators.

> 2. As mentioned, nsHyphenationManager has an nsIObserver it uses for memory
> pressure. We could rename the class from MemoryPressureObserver to
> HyphenationObserver and add an observer for a new notification. We could
> send the notification when we download the dictionaries and the manager
> would call LoadPatternList().

Yes, I think that should work just fine. Although LoadPatternList() was originally written to do a one-time initialization, it looks safe to call repeatedly. (Maybe rename it as Reload...?)
Depends on: 1323977
Thanks! I filed bug 1323977 for this.
Depends on: 1335429

Updated

8 months ago
Blocks: 1344625
Depends on: 1382589
You need to log in before you can comment on or make changes to this bug.