Closed Bug 1590167 Opened 5 years ago Closed 5 years ago

Replace libhyphen with a new implementation for better performance and smaller memory footprint

Categories

(Core :: Layout: Text and Fonts, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: jfkthame, Assigned: jfkthame)

References

Details

Attachments

(1 file)

In bug 1487212, we'd like to share hyphenation data across processes, rather than having each process load its own copy. However, the runtime structures used by libhyphen are not at all suited to sharing.

To make this possible, I have created a new implementation ("mapped_hyph") of the algorithm that uses a flat, relocatable representation of the hyphenation state machine (instead of the pointer-based graph of structs used by libhyphen), suitable for use directly from an mmap'd file or for sharing between processes using shared memory.

This implementation requires the hyphenation dictionaries (pattern files) to be precompiled into a representation of the finite state machine that can be used directly by the runtime algorithm. The resulting compiled files are typically around twice the size of the original pattern files, so this will somewhat increase the on-disk footprint; but on the other hand, runtime memory footprint is substantially smaller.

When libhyphen loads a pattern file, a dictionary such as hyph_en_US.dic which is 107K on disk expands to nearly 900K in memory (in each process that uses it). In contrast, mapped_hyph's representation is 195K on disk, but requires no additional memory at runtime.

As a bonus, the new implementation is also somewhat faster. As a testcase, I used the Inspector to add hyphens:auto to the <body> element of a long wikipedia page, and then profiled toggling this property on and off a couple of times.

With libhyphen (current trunk code), the reflows with hyphens:auto enabled take 152ms.

With mapped_hyph, the hyphenated reflow time is reduced to 134ms.

Regarding loading of hyphenation dictionaries: in unpackaged developer builds, these files are found in a /hyphenation/ directory, and so mapped_hyph can use them directly by mmap'ing the files. (This would also be the case for resources added to a Firefox installation via an add-on or similar.) However, in packaged builds as shipped to users, the hyphenation resources are stored in omnijar, where they'll be compressed, and so we cannot directly mmap them.

In this case, therefore, the initial patch here just decompresses the resource into a memory buffer (in-process), which mapped_hyph can then use. So we end up with a per-process memory footprint equivalent to the (uncompressed) size of the table. This is already a substantial reduction from the libhyphen footprint (it's around 75+% smaller in the case of en-US hyphenation, for example), but as a followup in bug 1487212, I intend to load these resources from omnijar into shared memory buffers so that all content processes can map the same buffer instead of duplicating the uncompressed data in each process.

BTW, the Rust code for the hyphenation library (including some tests etc.) is available in standalone form at https://github.com/jfkthame/mapped_hyph.

(In reply to Jonathan Kew (:jfkthame) from comment #1)

In this case, therefore, the initial patch here just decompresses the resource into a memory buffer (in-process), which mapped_hyph can then use. So we end up with a per-process memory footprint equivalent to the (uncompressed) size of the table. This is already a substantial reduction from the libhyphen footprint (it's around 75+% smaller in the case of en-US hyphenation, for example), but as a followup in bug 1487212, I intend to load these resources from omnijar into shared memory buffers so that all content processes can map the same buffer instead of duplicating the uncompressed data in each process.

Updated note re memory usage: it turns out that the omnijar resources are uncompressed these days, and the file is already mapped into memory. So the hyphenation code will simply get a pointer to the already-mapped data and use it directly, without needing to allocate any new (shared) buffer.

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/14e64e208672
Add Rust implementation of hyphenation (mapped_hyph) and hook up in place of libhyphen. r=heycam

Argh... so it turns out that when hyphenation files are packaged in omnijar, the beginning of each individual resource is not necessarily aligned in any particular way. OK, that was a bad assumption; fortunately, it's easy to remove it.

Flags: needinfo?(jfkthame)

I've removed the 4-byte alignment expectation, and pushed a try job to check that the packaged builds no longer assert:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbb00261fa794fbb3b8e11ad6c61fc6e169879e6

(Unpackaged developer builds load the resources from standalone files, so this didn't show up when I tested locally just before landing. :\ )

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0ecfae072196
Add Rust implementation of hyphenation (mapped_hyph) and hook up in place of libhyphen. r=heycam

Backed out for reftest failures on 1507661-spurious-hyphenation-after-explicit.htm

backout: https://hg.mozilla.org/integration/autoland/rev/98da6aa7a8a53528d2efc94e77b8024dd1646fd0

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=android%2C7.0%2Cx86-64%2Cdebug%2Creftests%2Ctest-android-em-7.0-x86_64%2Fdebug-geckoview-reftest-e10s-2%2Cr%28r2%29&tochange=aa04697d9a3f061726fd6acec930bfaa4062d290&fromchange=acfd160d39982d163d7fddbee91e79bf221841b9&selectedJob=275403478

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=275403444&repo=autoland&lineNumber=10772

[task 2019-11-08T21:12:02.585Z] 21:12:02 INFO - REFTEST TEST-START | http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1.html == http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1-ref.html
[task 2019-11-08T21:12:02.585Z] 21:12:02 INFO - REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1.html | 1862 / 7960 (23%)
[task 2019-11-08T21:18:45.938Z] 21:18:45 INFO - wait for org.mozilla.geckoview.test complete; top activity=org.mozilla.geckoview.test
[task 2019-11-08T21:18:46.041Z] 21:18:46 INFO - org.mozilla.geckoview.test unexpectedly found running. Killing...
[task 2019-11-08T21:18:46.041Z] 21:18:46 INFO - REFTEST TEST-INFO | started process screentopng
[task 2019-11-08T21:18:46.318Z] 21:18:46 INFO - REFTEST TEST-INFO | screentopng: exit 0
[task 2019-11-08T21:19:00.654Z] 21:19:00 WARNING - TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1.html == http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1-ref.html | application timed out after 370 seconds with no output
[task 2019-11-08T21:19:00.654Z] 21:19:00 INFO - remoteautomation.py | Application ran for: 0:13:33.861700
[task 2019-11-08T21:19:01.405Z] 21:19:01 INFO - REFTEST INFO | Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /tmp/tmpTd4D__/19d91276-6880-11b1-6bc7-7b6b36770f18.dmp /builds/worker/workspace/build/symbols
[task 2019-11-08T21:19:06.047Z] 21:19:06 INFO - REFTEST INFO | Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/19d91276-6880-11b1-6bc7-7b6b36770f18.dmp
[task 2019-11-08T21:19:06.047Z] 21:19:06 INFO - REFTEST INFO | Saved app info as /builds/worker/workspace/build/blobber_upload_dir/19d91276-6880-11b1-6bc7-7b6b36770f18.extra
[task 2019-11-08T21:19:06.051Z] 21:19:06 WARNING - REFTEST PROCESS-CRASH | http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1.html == http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1-ref.html | application crashed [@ libc.so + 0x8c66a]

Flags: needinfo?(jfkthame)

Sigh... now it's an Android-only failure, apparently. Looks like maybe we don't package the hyphenation resources in the same way there... will investigate further.

Oh - I think I know... we don't package the resources at all, they're supplied separately as downloadable content (DLC). So we have to do a bit more work to make the new-format .hyf files available via that mechanism too.

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/97b8c3759aae
Add Rust implementation of hyphenation (mapped_hyph) and hook up in place of libhyphen. r=heycam
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Assignee: nobody → jfkthame
Backout by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6387f7830d66
Backed out 2 changesets (bug 1590167, bug 1575008) for lints failures at mapped_hyph.h on a CLOSED TREE.

Backed out 2 changesets (bug 1590167, bug 1575008) for lints failures at mapped_hyph.h on a CLOSED TREE.

Backout link: Backed out 2 changesets (bug 1590167, bug 1575008) for lints failures at mapped_hyph.h on a CLOSED TREE.

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=97b8c3759aae60e6513fe0c2c7e399c0814081b3&selectedJob=276060227

Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276060227&repo=autoland&lineNumber=470

Log snippet:
[task 2019-11-13T19:28:06.061Z] ./source-repo.h.stub
[task 2019-11-13T19:28:06.289Z] intl/hyphenation/glue/mapped_hyph.h.stub
[task 2019-11-13T19:28:06.289Z] Traceback (most recent call last):
[task 2019-11-13T19:28:06.289Z] File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
[task 2019-11-13T19:28:06.291Z] "main", fname, loader, pkg_name)
[task 2019-11-13T19:28:06.292Z] File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
[task 2019-11-13T19:28:06.293Z] exec code in run_globals
[task 2019-11-13T19:28:06.294Z] File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/action/file_generate.py", line 121, in <module>
[task 2019-11-13T19:28:06.294Z] sys.exit(main(sys.argv[1:]))
[task 2019-11-13T19:28:06.295Z] File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/action/file_generate.py", line 71, in main
[task 2019-11-13T19:28:06.295Z] ret = module.dict[method](output, *args.additional_arguments, **kwargs)
[task 2019-11-13T19:28:06.295Z] File "/builds/worker/checkouts/gecko/layout/style/RunCbindgen.py", line 24, in generate
[task 2019-11-13T19:28:06.295Z] env['CARGO'] = str(buildconfig.substs['CARGO'])
[task 2019-11-13T19:28:06.295Z] File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/backend/configenvironment.py", line 286, in getitem
[task 2019-11-13T19:28:06.295Z] raise KeyError("'%s'" % key)
[task 2019-11-13T19:28:06.295Z] KeyError: "'CARGO'"
[task 2019-11-13T19:28:06.296Z] backend.mk:11: recipe for target '.deps/mapped_hyph.h.stub' failed
[task 2019-11-13T19:28:06.296Z] make[4]: *** [.deps/mapped_hyph.h.stub] Error 1
[task 2019-11-13T19:28:06.296Z] /builds/worker/checkouts/gecko/config/recurse.mk:101: recipe for target 'intl/hyphenation/glue/export' failed
[task 2019-11-13T19:28:06.296Z] make[3]: *** [intl/hyphenation/glue/export] Error 2
[task 2019-11-13T19:28:06.296Z] make[3]: *** Waiting for unfinished jobs....
[task 2019-11-13T19:28:06.297Z] xpcom/xpcom-config.h.stub
[task 2019-11-13T19:28:06.297Z] build/application.ini.stub
[task 2019-11-13T19:28:06.301Z] security/apps/xpcshell.inc.stub
[task 2019-11-13T19:28:06.458Z] xpcom/xpcom-private.h.stub
[task 2019-11-13T19:28:06.466Z] security/apps/addons-public.inc.stub
[task 2019-11-13T19:28:06.468Z] build/application.ini.h.stub
[task 2019-11-13T19:28:06.476Z] security/apps/addons-public-intermediate.inc.stub
[task 2019-11-13T19:28:06.577Z] security/apps/addons-stage.inc.stub
[task 2019-11-13T19:28:06.578Z] /builds/worker/checkouts/gecko/config/recurse.mk:32: recipe for target 'export' failed
[task 2019-11-13T19:28:06.578Z] make[2]: *** [export] Error 2
[task 2019-11-13T19:28:06.578Z] /builds/worker/checkouts/gecko/config/rules.mk:389: recipe for target 'default' failed
[task 2019-11-13T19:28:06.578Z] make[1]: *** [default] Error 2
[task 2019-11-13T19:28:06.579Z] client.mk:125: recipe for target 'build' failed
[task 2019-11-13T19:28:06.579Z] make: *** [build] Error 2

Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f0968dabe1ff
Add Rust implementation of hyphenation (mapped_hyph) and hook up in place of libhyphen. r=heycam

Backed out changeset f0968dabe1ff (bug 1590167) for build bustage at force-cargo-library-build on a CLOSED TREE.

Backout link: https://hg.mozilla.org/integration/autoland/rev/4264dbdf0850c10c6f36806461126c52bd9b74f9

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=f0968dabe1ff713ae3f28861dae1ca6524aafd80&selectedJob=276091033

Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276091033&repo=autoland&lineNumber=12921

Log snippet:

[task 2019-11-13T23:14:01.997Z] 23:14:01 INFO - /builds/worker/fetches/rustc/bin/cargo rustc --frozen --manifest-path /builds/worker/workspace/build/src/toolkit/library/gtest/rust/Cargo.toml -vv --lib --target=i686-unknown-linux-gnu --features 'gecko_debug gecko_refcount_logging quantum_render cubeb_pulse_rust simd-accel cubeb-remoting moz_memory moz_places spidermonkey_rust cranelift_x86 gecko_profiler gecko_profiler_parse_elf new_xulstore new_cert_storage webrtc' --
[task 2019-11-13T23:14:01.998Z] 23:14:01 INFO - warning: /builds/worker/workspace/build/src/security/manager/ssl/cert_storage/Cargo.toml: dependency (rkv) specification is ambiguous. Only one of branch, tag or rev is allowed. This will be considered an error in future versions
[task 2019-11-13T23:14:01.998Z] 23:14:01 INFO - Blocking waiting for file lock on package cache
[task 2019-11-13T23:14:01.999Z] 23:14:01 INFO - Blocking waiting for file lock on package cache
[task 2019-11-13T23:14:01.999Z] 23:14:01 INFO - Blocking waiting for file lock on build directory
[task 2019-11-13T23:14:01.999Z] 23:14:01 INFO - error: the listed checksum of /builds/worker/workspace/build/src/third_party/rust/mapped_hyph/tests/compound4.hyf has changed:
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - expected: 2093287bc41ee30ff9bdbf278f1f8209cb1d1a78236b46e9060af2a881572b8e
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - directory sources are not intended to be edited, if modifications are required then it is recommended that [replace] is used with a forked copy of the source
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - /builds/worker/workspace/build/src/config/makefiles/rust.mk:280: recipe for target 'force-cargo-library-build' failed
[task 2019-11-13T23:14:02.001Z] 23:14:02 ERROR - make[4]: *** [force-cargo-library-build] Error 101
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/toolkit/library/gtest/rust'
[task 2019-11-13T23:14:02.002Z] 23:14:02 INFO - /builds/worker/workspace/build/src/config/recurse.mk:74: recipe for target 'toolkit/library/gtest/rust/target' failed
[task 2019-11-13T23:14:02.002Z] 23:14:02 ERROR - make[3]: *** [toolkit/library/gtest/rust/target] Error 2
[task 2019-11-13T23:14:02.002Z] 23:14:02 INFO - make[3]: *** Waiting for unfinished jobs....
[task 2019-11-13T23:14:02.002Z] 23:14:02 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/config/external/icu/common'
[task 2019-11-13T23:14:02.002Z] 23:14:02 INFO - config/external/icu/common/servrbf.o

Huh, that's weird.

 [task 2019-11-13T23:14:01.999Z] 23:14:01 INFO - error: the listed checksum of /builds/worker/workspace/build/src/third_party/rust/mapped_hyph/tests/compound4.hyf has changed:
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - expected: 2093287bc41ee30ff9bdbf278f1f8209cb1d1a78236b46e9060af2a881572b8e
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

There hasn't been any change to that file at all recently (last modified upstream 20 days ago). And a local build on my machine worked fine just now.

Guess I'll try re-running ./mach vendor rust and see what that does.

Flags: needinfo?(jfkthame)
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a78842ddeb29
Add Rust implementation of hyphenation (mapped_hyph) and hook up in place of libhyphen. r=heycam
See Also: → 1875444
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: