Sandbox hunspell using RLBox
Categories
(Core :: Spelling checker, enhancement)
Tracking
()
People
(Reporter: deian, Assigned: deian)
References
(Blocks 2 open bugs)
Details
(Keywords: perf-alert)
Attachments
(5 files, 3 obsolete files)
Use RLBox to WasmBox hunspell.
Assignee | ||
Comment 1•5 years ago
|
||
This patch only introduces the RLBox-wrapped hunspell (so uses the
NOOP-sandbox).
Assignee | ||
Comment 2•5 years ago
|
||
Depends on D84007
Assignee | ||
Comment 3•5 years ago
|
||
Depends on D84007
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 4•5 years ago
|
||
Depends on D86063
Updated•5 years ago
|
Assignee | ||
Comment 5•5 years ago
|
||
Sorry for the delay with wrapping up the patches. We needed one more patch to pull in a hxx file from hunspell.
Everything should be set. And I ran @jkew's experiment:
- Load https://en.wikipedia.org/wiki/History_of_Western_civilization
- Set document.body.contentEditable = true
- Turn on profiler
- Scroll to bottom of page and click
On my machine, I can't perceive the difference in the delay.
Vanilla: https://share.firefox.dev/3lctVn6
Sandboxed: https://share.firefox.dev/2GKJvaC
Assignee | ||
Comment 6•5 years ago
|
||
When we WasmBox hunspell, MOZILLA_CLIENT is undefined and, so, hunspell
needs utf_info.hxx. This patch pulls in this header file.
Depends on D86063
Updated•5 years ago
|
Comment 7•5 years ago
|
||
There are definitely some issues with the patchset unfortunately.
This try run does not enable wasmboxing, just adds the first 3 patches:
https://treeherder.mozilla.org/jobs?repo=try&revision=b98af45327616998a6b6f75d529a17445db2841b&selectedTaskRun=C-LgYRZHQsug7er7TmKkGQ.0
https://treeherder.mozilla.org/push-health?repo=try&revision=b98af45327616998a6b6f75d529a17445db2841b&testGroup=pr&selectedTest=arenatDallocSmallarenachunktvoidarenachunkmapt&selectedTaskId=
- A lot of jemalloc failures; two different signatures
- 3 Android build failures (all looking like the same issue)
Assignee | ||
Comment 8•5 years ago
|
||
Fixed the double free. Sorry about that. hunspell_destroy freed the pointer so we didn't need to free_in_sandbox.
The build failure is a template specialization issue in RLBox. Creating separate bug for that.
Assignee | ||
Comment 9•5 years ago
|
||
Henry: Any chance I can get your take on us using hunspell with MOZ_CLIENT off? This would be using hunspell's utf table instead of the existing one in Firefox.
Comment 10•5 years ago
|
||
What impact would that have on the overall size?
Assignee | ||
Comment 11•5 years ago
|
||
Looking at the profiler data from https://bugzilla.mozilla.org/show_bug.cgi?id=1653659#c5 it seems like the memory usage is comparable.
Assignee | ||
Comment 12•5 years ago
|
||
Shravan kicked off try:
patches 1 to 3: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0a44728fd15dafff056278f5f67d67d8653d4a4
patches 1 to 4: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5427fb7450933b54c095ca2a5d19faadb2b8dbd2
Comment 13•5 years ago
|
||
(In reply to Deian Stefan from comment #11)
Looking at the profiler data from https://bugzilla.mozilla.org/show_bug.cgi?id=1653659#c5 it seems like the memory usage is comparable.
I think Jonathan was talking about code size rather than memory usage.
Assignee | ||
Comment 14•5 years ago
|
||
Roughly a 1M increase in the bunzip:
$ du ./(wasm|vanilla)-build/dist/firefox-86.0a1.en-US.linux-x86_64.tar.bz2
102168 ./vanilla-build/dist/firefox-86.0a1.en-US.linux-x86_64.tar.bz2
102816 ./wasm-build/dist/firefox-86.0a1.en-US.linux-x86_64.tar.bz2
Comment 15•5 years ago
|
||
It seems unfortunate to increase the size so much (I think that'll equate to about a 2% increase in the download package size on Android, for example), if it's largely coming from duplication of character data/encoding tables that we already have in gecko. Is there some way we can avoid that?
Comment 16•5 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #15)
It seems unfortunate to increase the size so much (I think that'll equate to about a 2% increase in the download package size on Android, for example),
Particularly because I don't think we use hunspell on Android at all. We could special-case the platform, but I think we'd rather avoid a 1M increase of the Desktop installer too.
if it's largely coming from duplication of character data/encoding tables that we already have in gecko. Is there some way we can avoid that?
Yeah seems like if these are just read-only data tables, should be straightforward to expose them to the sandbox somehow?
Assignee | ||
Comment 17•5 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #16)
(In reply to Jonathan Kew (:jfkthame) from comment #15)
It seems unfortunate to increase the size so much (I think that'll equate to about a 2% increase in the download package size on Android, for example),
Particularly because I don't think we use hunspell on Android at all. We could special-case the platform, but I think we'd rather avoid a 1M increase of the Desktop installer too.
I was under the impression that we may already be doing this?
if it's largely coming from duplication of character data/encoding tables that we already have in gecko. Is there some way we can avoid that?
Yeah seems like if these are just read-only data tables, should be straightforward to expose them to the sandbox somehow?
We can do it, just have to modify hunspell. We need to expose functions like ToUpperCase. Most seem reasonable enough. The gnarly bits are in get_current_cs, but we can also just pull whole function this into a trusted callback. I can take a crack at this next week. My sense is that it might be easier to try to merge some of this back into hunspell? Our changes shouldn't really extend beyond the MOZILLA_CLIENT guards anyway.
Assignee | ||
Updated•5 years ago
|
Comment 18•5 years ago
|
||
(In reply to Deian Stefan from comment #17)
(In reply to Bobby Holley (:bholley) from comment #16)
(In reply to Jonathan Kew (:jfkthame) from comment #15)
It seems unfortunate to increase the size so much (I think that'll equate to about a 2% increase in the download package size on Android, for example),
Particularly because I don't think we use hunspell on Android at all. We could special-case the platform, but I think we'd rather avoid a 1M increase of the Desktop installer too.
I was under the impression that we may already be doing this?
if it's largely coming from duplication of character data/encoding tables that we already have in gecko. Is there some way we can avoid that?
Yeah seems like if these are just read-only data tables, should be straightforward to expose them to the sandbox somehow?
We can do it, just have to modify hunspell. We need to expose functions like ToUpperCase. Most seem reasonable enough. The gnarly bits are in get_current_cs, but we can also just pull whole function this into a trusted callback. I can take a crack at this next week. My sense is that it might be easier to try to merge some of this back into hunspell? Our changes shouldn't really extend beyond the MOZILLA_CLIENT guards anyway.
I think we should make whatever changes we need to hunspell (while obviously trying to err on the side of changing less). We can then figure out whether it's less work to either (1) upstream those changes, or (2) re-apply the changes if/when we ever pull in a hunspell update.
Comment 19•5 years ago
|
||
(In reply to Deian Stefan from comment #9)
Henry: Any chance I can get your take on us using hunspell with MOZ_CLIENT off? This would be using hunspell's utf table instead of the existing one in Firefox.
Primarily for code side reasons, my preference would be to make Gecko expose the relevant services to hunspell instead of having hunspell provide its own duplicate services.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 20•5 years ago
|
||
Got a clean try run on the first two patches and with the third.
Comment 21•5 years ago
|
||
Deian, just to confirm here, the patches do as you describe in #17, right? That is, they modify hunspell as needed so we don't duplicate the data tables and avoid increasing the binary size? If so i'll try to land patches 1 and 2.
Updated•5 years ago
|
Comment 23•5 years ago
|
||
![]() |
||
Comment 24•5 years ago
|
||
Backed out 2 changesets (bug 1653659) for Build bustage in builds/worker/workspace/obj-build/dist/include/mozilla/rlbox/rlbox_type_traits.hpp. CLOSED TREE
Log:
https://treeherder.mozilla.org/logviewer?job_id=326720431&repo=autoland&lineNumber=13208
Push with failures:
https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&revision=783310e1f5b82ef3b0c3ea9f66f564013ea1d0f0
Backout:
https://hg.mozilla.org/integration/autoland/rev/e923e68306f90f64be262b1c80ebafd30715ceea
Assignee | ||
Comment 25•5 years ago
|
||
Shravan: Can you look at this? I thought you addressed some of these
Comment 26•5 years ago
|
||
Hmm... this was fixed upstream rlbox, but the in-tree rlbox doesn't seem to have been updated. I need to follow up as to why.
But for now you can update your in-tree rlbox by running
cd mozilla-central/third_party/rlbox/
./update.sh
If you submit a build after this, the build should succeed
Comment 27•5 years ago
|
||
Oh, looks like the rlbox update hasn't been landed yet
https://phabricator.services.mozilla.com/D99989
@tom; Could you help with this?
Comment 29•5 years ago
|
||
Comment 30•5 years ago
|
||
bugherder |
Comment 31•5 years ago
|
||
I did a perf measurement
wasmbox: https://treeherder.mozilla.org/jobs?repo=try&revision=af138a2d8f5efe2aa8f15413191edee09b5eaaf8&selectedTaskRun=HQtufjEoQbyURu8Q5WlKHA.0
control: https://treeherder.mozilla.org/jobs?repo=try&revision=c7ff702e57508ec6e9f7b53364c62f49f830c6df&selectedTaskRun=Kq1m0dd9TdO0rR1eWlBZ2A.0
wasmbox control
1673 2440
1851 1944
1405 852
1460 1832
1309 1289
1539.6 1671.4
I pasted the contents of document.cpp (from searchfox) into a textarea with extensions.spellcheck.inline.max-misspellings
set to 500000. I recorded the time for DoSpellCheck in the profiler on my Macbook.
I can't entirely explain the differences here, maybe the perf differences would be exposed after a lot more trials, but it doesn't seem like wasmboxing hunspell produces a crazy difference.
Comment 32•5 years ago
|
||
I recorded the time for DoSpellCheck in the profiler on my Macbook.
Isn't DoSpellCheck
on the content-process side of things, which would be essentially unaffected here? AIUI, the actual use of hunspell happens in the parent via IPC, so the question would be what difference (if any) is there in parent-process time. Whatever extra time/work the sandboxing is costing would show up there, not in the content process. Maybe try looking at something like mozSpellChecker::CheckWord
in the parent?
Comment 33•5 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #32)
Whatever extra time/work the sandboxing is costing would show up there, not in the content process. Maybe try looking at something like
mozSpellChecker::CheckWord
in the parent?
Thanks! I checked CheckWord in the parent process and.... wasmbox is twice as fast. ¯_(ツ)_/¯
wasmbox control
48 111
55.7 98.7
39.7 101
43.7 105
33.5 118
44.12 106.74
Comment 34•5 years ago
|
||
That's curious.... I mean, that's great -- but it would be nice to understand why! (Do we end up using an entirely different allocator or something like that? I think hunspell does quite a bit of malloc/free-ing, so if the wasm'd version gets a better-performing allocator that could make a difference.)
Assignee | ||
Comment 35•5 years ago
|
||
Yep, that's probably it. In the sandbox we can't use the moz_alloc and free.
Comment 36•5 years ago
|
||
Now that we're in the new cycle, do you have any concerns about enabling this and letting it ride the trains?
Comment 37•5 years ago
|
||
Seems like it should be fine -- let's go for it.
Assignee | ||
Comment 38•5 years ago
|
||
@tjr rebased, good to go!
Comment 39•5 years ago
|
||
Comment 40•5 years ago
•
|
||
Backed out changeset for xpcshell failures on test_hunspell.js.
Failure log: https://treeherder.mozilla.org/logviewer?job_id=334031310&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/05ec5de62714ba35cfe468fcf7e49ed47081085b
Comment 41•5 years ago
•
|
||
There are also these: https://treeherder.mozilla.org/logviewer?job_id=334076976&repo=autoland&lineNumber=1052
![]() |
||
Comment 42•5 years ago
|
||
(In reply to Pulsebot from comment #39)
Pushed by tritter@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a137de215994
Part 3: Turn on Wasm sandboxing for hunspell. r=dmajor
== Change summary for alert #29385 (as of Mon, 22 Mar 2021 18:27:57 GMT) ==
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
1% | installer size | osx-cross | 79,991,997.25 -> 80,452,306.00 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29385
![]() |
||
Updated•5 years ago
|
Comment 43•5 years ago
|
||
I don't know why; but sure enough on OSX there's a consistent fail Expect word foobar is spelled correctly
with the dictionary nosuggest on OSX.
The OSX Bpgo failure seems to be a packaging issue, I can dig into that.
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Comment 44•5 years ago
|
||
![]() |
||
Comment 45•5 years ago
|
||
Backed out changeset 1bbf93c8f424 (Bug 1653659) for causing xpcshell failures in test_hunspell.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/cbae78ec871e1b698f2051dd7cf0aeba0b5a9475
Push with failures, failure log.
![]() |
||
Comment 46•5 years ago
|
||
(In reply to Pulsebot from comment #44)
Pushed by tritter@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1bbf93c8f424
Part 3: Turn on Wasm sandboxing for hunspell. r=dmajor
== Change summary for alert #29494 (as of Mon, 29 Mar 2021 14:44:23 GMT) ==
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
1% | installer size | osx-cross | 80,091,867.58 -> 80,640,593.08 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=29494
Assignee | ||
Comment 47•4 years ago
|
||
Depends on D114924
Updated•4 years ago
|
Assignee | ||
Comment 48•4 years ago
|
||
New patches should address the bug. Kicked off try: https://treeherder.mozilla.org/jobs?repo=try&revision=d5df9facfb9c4bcf711f5d5206a10d81fc7a95c8
Comment 49•4 years ago
|
||
![]() |
||
Comment 50•4 years ago
|
||
Backed out 2 changesets (bug 1653659) for Mochitest failures in mochitest/events/test_focus_general.html. CLOSED TREE
Log:
https://treeherder.mozilla.org/logviewer?job_id=339745880&repo=autoland&lineNumber=2724
Push with failures:
https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&revision=84821aa766abdebcf91be4913082218da63d96ca
Backout:
https://hg.mozilla.org/integration/autoland/rev/ef13365d8188c453af959b13198818f6dbffc833
Updated•4 years ago
|
Comment 51•4 years ago
|
||
Comment 52•4 years ago
|
||
Backed out for causing failures at test_hunspell.js.
Backout link: https://hg.mozilla.org/integration/autoland/rev/e0fcae5595f165eb92646a0fb7976580fcd6d0ed
Failure log: https://treeherder.mozilla.org/logviewer?job_id=339961716&repo=autoland&lineNumber=5278
Assignee | ||
Comment 53•4 years ago
|
||
Sorry for all the backouts. We need to land https://bugzilla.mozilla.org/show_bug.cgi?id=1710685 before we can land this.
![]() |
||
Comment 54•4 years ago
|
||
(In reply to Pulsebot from comment #49)
Pushed by tritter@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fac526423ccb
Part 3: Safely handle empty strings in RLBoxHunspell r=tjr
https://hg.mozilla.org/integration/autoland/rev/523e4bc2aa47
Part 4: Turn on Wasm sandboxing for hunspell r=dmajor
== Change summary for alert #30094 (as of Fri, 14 May 2021 16:50:48 GMT) ==
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
1% | installer size | osx-cross | 79,883,220.08 -> 80,415,397.50 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30094
![]() |
||
Comment 55•4 years ago
|
||
(In reply to Pulsebot from comment #51)
Pushed by tritter@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b187a1b158df
Part 3: Only allow non-empty paths in RLBoxHunspell r=tjr
https://hg.mozilla.org/integration/autoland/rev/018859776d85
Part 4: Turn on Wasm sandboxing for hunspell r=dmajor
== Change summary for alert #30108 (as of Mon, 17 May 2021 18:37:39 GMT) ==
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
1% | installer size | osx-cross | 79,880,551.85 -> 80,485,449.42 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=30108
Assignee | ||
Comment 56•4 years ago
|
||
Depends on D116545
Assignee | ||
Comment 57•4 years ago
|
||
Depends on D116547
Assignee | ||
Comment 58•4 years ago
|
||
https://treeherder.mozilla.org/jobs?repo=try&revision=db6cff02184d0cc1dff5285acf5eff71d074b213
Can we abandon part 3 and 4 phabs (I can't access them anymore).
Updated•4 years ago
|
Comment 59•4 years ago
|
||
Comment 60•4 years ago
|
||
bugherder |
Comment 61•4 years ago
|
||
![]() |
||
Comment 62•4 years ago
|
||
bugherder |
Comment 63•4 years ago
|
||
I think the leave-open here was just an oversight and should have been cleared. Resolving as fixed.
Description
•