Replace libhyphen with a new implementation for better performance and smaller memory footprint
Categories
(Core :: Layout: Text and Fonts, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox72 | --- | fixed |
People
(Reporter: jfkthame, Assigned: jfkthame)
References
Details
Attachments
(1 file)
In bug 1487212, we'd like to share hyphenation data across processes, rather than having each process load its own copy. However, the runtime structures used by libhyphen are not at all suited to sharing.
To make this possible, I have created a new implementation ("mapped_hyph") of the algorithm that uses a flat, relocatable representation of the hyphenation state machine (instead of the pointer-based graph of structs used by libhyphen), suitable for use directly from an mmap'd file or for sharing between processes using shared memory.
This implementation requires the hyphenation dictionaries (pattern files) to be precompiled into a representation of the finite state machine that can be used directly by the runtime algorithm. The resulting compiled files are typically around twice the size of the original pattern files, so this will somewhat increase the on-disk footprint; but on the other hand, runtime memory footprint is substantially smaller.
When libhyphen loads a pattern file, a dictionary such as hyph_en_US.dic which is 107K on disk expands to nearly 900K in memory (in each process that uses it). In contrast, mapped_hyph's representation is 195K on disk, but requires no additional memory at runtime.
As a bonus, the new implementation is also somewhat faster. As a testcase, I used the Inspector to add hyphens:auto
to the <body> element of a long wikipedia page, and then profiled toggling this property on and off a couple of times.
With libhyphen (current trunk code), the reflows with hyphens:auto
enabled take 152ms.
With mapped_hyph, the hyphenated reflow time is reduced to 134ms.
Assignee | ||
Comment 1•5 years ago
|
||
Regarding loading of hyphenation dictionaries: in unpackaged developer builds, these files are found in a /hyphenation/ directory, and so mapped_hyph can use them directly by mmap'ing the files. (This would also be the case for resources added to a Firefox installation via an add-on or similar.) However, in packaged builds as shipped to users, the hyphenation resources are stored in omnijar, where they'll be compressed, and so we cannot directly mmap them.
In this case, therefore, the initial patch here just decompresses the resource into a memory buffer (in-process), which mapped_hyph can then use. So we end up with a per-process memory footprint equivalent to the (uncompressed) size of the table. This is already a substantial reduction from the libhyphen footprint (it's around 75+% smaller in the case of en-US hyphenation, for example), but as a followup in bug 1487212, I intend to load these resources from omnijar into shared memory buffers so that all content processes can map the same buffer instead of duplicating the uncompressed data in each process.
Assignee | ||
Comment 2•5 years ago
|
||
Assignee | ||
Comment 3•5 years ago
|
||
BTW, the Rust code for the hyphenation library (including some tests etc.) is available in standalone form at https://github.com/jfkthame/mapped_hyph.
Assignee | ||
Comment 4•5 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #1)
In this case, therefore, the initial patch here just decompresses the resource into a memory buffer (in-process), which mapped_hyph can then use. So we end up with a per-process memory footprint equivalent to the (uncompressed) size of the table. This is already a substantial reduction from the libhyphen footprint (it's around 75+% smaller in the case of en-US hyphenation, for example), but as a followup in bug 1487212, I intend to load these resources from omnijar into shared memory buffers so that all content processes can map the same buffer instead of duplicating the uncompressed data in each process.
Updated note re memory usage: it turns out that the omnijar resources are uncompressed these days, and the file is already mapped into memory. So the hyphenation code will simply get a pointer to the already-mapped data and use it directly, without needing to allocate any new (shared) buffer.
Comment 6•5 years ago
|
||
Backed out changeset 14e64e208672 for causing failures in spurious-hyphenation-after-explicit.html and nsHyphenator.cpp
Backout link: https://hg.mozilla.org/integration/autoland/rev/e5afdf94e02f05b1e6fca03cb9a3104407cf04ad
Failure logs:
Comment 7•5 years ago
•
|
||
Failure on webrender jobs as well:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=275319244&repo=autoland&lineNumber=17216
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=275319671&repo=autoland&lineNumber=44437
Also seeing assertions fail: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=275319303&repo=autoland&lineNumber=10959
Assignee | ||
Comment 8•5 years ago
|
||
Argh... so it turns out that when hyphenation files are packaged in omnijar, the beginning of each individual resource is not necessarily aligned in any particular way. OK, that was a bad assumption; fortunately, it's easy to remove it.
Assignee | ||
Comment 9•5 years ago
|
||
I've removed the 4-byte alignment expectation, and pushed a try job to check that the packaged builds no longer assert:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbb00261fa794fbb3b8e11ad6c61fc6e169879e6
(Unpackaged developer builds load the resources from standalone files, so this didn't show up when I tested locally just before landing. :\ )
Comment 10•5 years ago
|
||
Comment 11•5 years ago
|
||
Backed out for reftest failures on 1507661-spurious-hyphenation-after-explicit.htm
backout: https://hg.mozilla.org/integration/autoland/rev/98da6aa7a8a53528d2efc94e77b8024dd1646fd0
failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=275403444&repo=autoland&lineNumber=10772
[task 2019-11-08T21:12:02.585Z] 21:12:02 INFO - REFTEST TEST-START | http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1.html == http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1-ref.html
[task 2019-11-08T21:12:02.585Z] 21:12:02 INFO - REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1.html | 1862 / 7960 (23%)
[task 2019-11-08T21:18:45.938Z] 21:18:45 INFO - wait for org.mozilla.geckoview.test complete; top activity=org.mozilla.geckoview.test
[task 2019-11-08T21:18:46.041Z] 21:18:46 INFO - org.mozilla.geckoview.test unexpectedly found running. Killing...
[task 2019-11-08T21:18:46.041Z] 21:18:46 INFO - REFTEST TEST-INFO | started process screentopng
[task 2019-11-08T21:18:46.318Z] 21:18:46 INFO - REFTEST TEST-INFO | screentopng: exit 0
[task 2019-11-08T21:19:00.654Z] 21:19:00 WARNING - TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1.html == http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1-ref.html | application timed out after 370 seconds with no output
[task 2019-11-08T21:19:00.654Z] 21:19:00 INFO - remoteautomation.py | Application ran for: 0:13:33.861700
[task 2019-11-08T21:19:01.405Z] 21:19:01 INFO - REFTEST INFO | Copy/paste: /builds/worker/workspace/build/linux64-minidump_stackwalk /tmp/tmpTd4D__/19d91276-6880-11b1-6bc7-7b6b36770f18.dmp /builds/worker/workspace/build/symbols
[task 2019-11-08T21:19:06.047Z] 21:19:06 INFO - REFTEST INFO | Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/19d91276-6880-11b1-6bc7-7b6b36770f18.dmp
[task 2019-11-08T21:19:06.047Z] 21:19:06 INFO - REFTEST INFO | Saved app info as /builds/worker/workspace/build/blobber_upload_dir/19d91276-6880-11b1-6bc7-7b6b36770f18.extra
[task 2019-11-08T21:19:06.051Z] 21:19:06 WARNING - REFTEST PROCESS-CRASH | http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1.html == http://10.0.2.2:8854/tests/layout/reftests/w3c-css/submitted/text3/hyphenation-control-1-ref.html | application crashed [@ libc.so + 0x8c66a]
Assignee | ||
Comment 12•5 years ago
|
||
Sigh... now it's an Android-only failure, apparently. Looks like maybe we don't package the hyphenation resources in the same way there... will investigate further.
Oh - I think I know... we don't package the resources at all, they're supplied separately as downloadable content (DLC). So we have to do a bit more work to make the new-format .hyf files available via that mechanism too.
Comment 13•5 years ago
|
||
Comment 14•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Comment 15•5 years ago
|
||
Comment 16•5 years ago
|
||
Backed out 2 changesets (bug 1590167, bug 1575008) for lints failures at mapped_hyph.h on a CLOSED TREE.
Backout link: Backed out 2 changesets (bug 1590167, bug 1575008) for lints failures at mapped_hyph.h on a CLOSED TREE.
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=97b8c3759aae60e6513fe0c2c7e399c0814081b3&selectedJob=276060227
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276060227&repo=autoland&lineNumber=470
Log snippet:
[task 2019-11-13T19:28:06.061Z] ./source-repo.h.stub
[task 2019-11-13T19:28:06.289Z] intl/hyphenation/glue/mapped_hyph.h.stub
[task 2019-11-13T19:28:06.289Z] Traceback (most recent call last):
[task 2019-11-13T19:28:06.289Z] File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
[task 2019-11-13T19:28:06.291Z] "main", fname, loader, pkg_name)
[task 2019-11-13T19:28:06.292Z] File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
[task 2019-11-13T19:28:06.293Z] exec code in run_globals
[task 2019-11-13T19:28:06.294Z] File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/action/file_generate.py", line 121, in <module>
[task 2019-11-13T19:28:06.294Z] sys.exit(main(sys.argv[1:]))
[task 2019-11-13T19:28:06.295Z] File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/action/file_generate.py", line 71, in main
[task 2019-11-13T19:28:06.295Z] ret = module.dict[method](output, *args.additional_arguments, **kwargs)
[task 2019-11-13T19:28:06.295Z] File "/builds/worker/checkouts/gecko/layout/style/RunCbindgen.py", line 24, in generate
[task 2019-11-13T19:28:06.295Z] env['CARGO'] = str(buildconfig.substs['CARGO'])
[task 2019-11-13T19:28:06.295Z] File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/backend/configenvironment.py", line 286, in getitem
[task 2019-11-13T19:28:06.295Z] raise KeyError("'%s'" % key)
[task 2019-11-13T19:28:06.295Z] KeyError: "'CARGO'"
[task 2019-11-13T19:28:06.296Z] backend.mk:11: recipe for target '.deps/mapped_hyph.h.stub' failed
[task 2019-11-13T19:28:06.296Z] make[4]: *** [.deps/mapped_hyph.h.stub] Error 1
[task 2019-11-13T19:28:06.296Z] /builds/worker/checkouts/gecko/config/recurse.mk:101: recipe for target 'intl/hyphenation/glue/export' failed
[task 2019-11-13T19:28:06.296Z] make[3]: *** [intl/hyphenation/glue/export] Error 2
[task 2019-11-13T19:28:06.296Z] make[3]: *** Waiting for unfinished jobs....
[task 2019-11-13T19:28:06.297Z] xpcom/xpcom-config.h.stub
[task 2019-11-13T19:28:06.297Z] build/application.ini.stub
[task 2019-11-13T19:28:06.301Z] security/apps/xpcshell.inc.stub
[task 2019-11-13T19:28:06.458Z] xpcom/xpcom-private.h.stub
[task 2019-11-13T19:28:06.466Z] security/apps/addons-public.inc.stub
[task 2019-11-13T19:28:06.468Z] build/application.ini.h.stub
[task 2019-11-13T19:28:06.476Z] security/apps/addons-public-intermediate.inc.stub
[task 2019-11-13T19:28:06.577Z] security/apps/addons-stage.inc.stub
[task 2019-11-13T19:28:06.578Z] /builds/worker/checkouts/gecko/config/recurse.mk:32: recipe for target 'export' failed
[task 2019-11-13T19:28:06.578Z] make[2]: *** [export] Error 2
[task 2019-11-13T19:28:06.578Z] /builds/worker/checkouts/gecko/config/rules.mk:389: recipe for target 'default' failed
[task 2019-11-13T19:28:06.578Z] make[1]: *** [default] Error 2
[task 2019-11-13T19:28:06.579Z] client.mk:125: recipe for target 'build' failed
[task 2019-11-13T19:28:06.579Z] make: *** [build] Error 2
Comment 17•5 years ago
|
||
Comment 18•5 years ago
|
||
Backed out changeset f0968dabe1ff (bug 1590167) for build bustage at force-cargo-library-build on a CLOSED TREE.
Backout link: https://hg.mozilla.org/integration/autoland/rev/4264dbdf0850c10c6f36806461126c52bd9b74f9
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=f0968dabe1ff713ae3f28861dae1ca6524aafd80&selectedJob=276091033
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276091033&repo=autoland&lineNumber=12921
Log snippet:
[task 2019-11-13T23:14:01.997Z] 23:14:01 INFO - /builds/worker/fetches/rustc/bin/cargo rustc --frozen --manifest-path /builds/worker/workspace/build/src/toolkit/library/gtest/rust/Cargo.toml -vv --lib --target=i686-unknown-linux-gnu --features 'gecko_debug gecko_refcount_logging quantum_render cubeb_pulse_rust simd-accel cubeb-remoting moz_memory moz_places spidermonkey_rust cranelift_x86 gecko_profiler gecko_profiler_parse_elf new_xulstore new_cert_storage webrtc' --
[task 2019-11-13T23:14:01.998Z] 23:14:01 INFO - warning: /builds/worker/workspace/build/src/security/manager/ssl/cert_storage/Cargo.toml: dependency (rkv) specification is ambiguous. Only one of branch
, tag
or rev
is allowed. This will be considered an error in future versions
[task 2019-11-13T23:14:01.998Z] 23:14:01 INFO - Blocking waiting for file lock on package cache
[task 2019-11-13T23:14:01.999Z] 23:14:01 INFO - Blocking waiting for file lock on package cache
[task 2019-11-13T23:14:01.999Z] 23:14:01 INFO - Blocking waiting for file lock on build directory
[task 2019-11-13T23:14:01.999Z] 23:14:01 INFO - error: the listed checksum of /builds/worker/workspace/build/src/third_party/rust/mapped_hyph/tests/compound4.hyf
has changed:
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - expected: 2093287bc41ee30ff9bdbf278f1f8209cb1d1a78236b46e9060af2a881572b8e
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - directory sources are not intended to be edited, if modifications are required then it is recommended that [replace] is used with a forked copy of the source
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - /builds/worker/workspace/build/src/config/makefiles/rust.mk:280: recipe for target 'force-cargo-library-build' failed
[task 2019-11-13T23:14:02.001Z] 23:14:02 ERROR - make[4]: *** [force-cargo-library-build] Error 101
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/toolkit/library/gtest/rust'
[task 2019-11-13T23:14:02.002Z] 23:14:02 INFO - /builds/worker/workspace/build/src/config/recurse.mk:74: recipe for target 'toolkit/library/gtest/rust/target' failed
[task 2019-11-13T23:14:02.002Z] 23:14:02 ERROR - make[3]: *** [toolkit/library/gtest/rust/target] Error 2
[task 2019-11-13T23:14:02.002Z] 23:14:02 INFO - make[3]: *** Waiting for unfinished jobs....
[task 2019-11-13T23:14:02.002Z] 23:14:02 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/config/external/icu/common'
[task 2019-11-13T23:14:02.002Z] 23:14:02 INFO - config/external/icu/common/servrbf.o
Assignee | ||
Comment 19•5 years ago
|
||
Huh, that's weird.
[task 2019-11-13T23:14:01.999Z] 23:14:01 INFO - error: the listed checksum of /builds/worker/workspace/build/src/third_party/rust/mapped_hyph/tests/compound4.hyf has changed:
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - expected: 2093287bc41ee30ff9bdbf278f1f8209cb1d1a78236b46e9060af2a881572b8e
[task 2019-11-13T23:14:02.001Z] 23:14:02 INFO - actual: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
There hasn't been any change to that file at all recently (last modified upstream 20 days ago). And a local build on my machine worked fine just now.
Guess I'll try re-running ./mach vendor rust
and see what that does.
Comment 20•5 years ago
|
||
Comment 21•5 years ago
|
||
bugherder |
Description
•