Investigating we can use Rust based xi-unicode as Linebreaker

NEW
Assigned to

Status

()

defect
P3
normal
3 years ago
a year ago

People

(Reporter: m_kato, Assigned: m_kato)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(firefox50 affected)

Details

Attachments

(2 attachments)

(Assignee)

Description

3 years ago
Actually, our legacy line breaker is too old.  Although I am considering that it replace with ICU (bug 820261), it causes that binary size (including data size) is 2-3+MB.

Now, servo uses xi-unicode, so we will be able to use it for replacement.
(Assignee)

Updated

3 years ago
Depends on: oxidation
(Assignee)

Comment 1

3 years ago
xi-unicode supports Unicode 8.0.  I send a PR to update to 9.0 (rev 37).  Also, our line breaker rule isn't same as UAX#14 and pretty omplex...
What would the codesize of xi-unicode?

I wonder whether we can replace our bidi engine with unicode-bidi crate as well.
Simon, thoughts?

+1 on sharing unicode handling with servo.
(Assignee)

Comment 4

3 years ago
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #2)
> What would the codesize of xi-unicode?
> 
> I wonder whether we can replace our bidi engine with unicode-bidi crate as
> well.

This data is on Linux 64 opt.  about 30KB.

Before: libxul.so ...  93709728 bytes
After:  libxul.so ...  93738960 bytes

This data is just UAX#14 rule only.  We need additional rules for historical fix.
(Assignee)

Comment 5

3 years ago
(WIP doesn't remove old table to 8bit character. So If removing old table, it reduces increase of size.)
(Assignee)

Comment 6

3 years ago
The following test isn't same rule against UAX#14.

REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/between-whitespaces.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/chemical-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/datetime-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/emoji-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/hyphens-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/hyphens-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/leaders-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/markup-src-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/parentheses-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/quotationmarks-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/smileys-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/smileys-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/url-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/url-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/url-3.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/winpath-1.html

We need add more rules for old compatibility...
> I wonder whether we can replace our bidi engine with unicode-bidi crate as well.

I would not recommend this right now. unicode-bidi is not quite spec compliant yet, and it also is missing some significant optimizations compared to Gecko. (I consulted the Gecko code while writing the unicode-bidi crate, and there are issues filed for the optimization work.)
(Assignee)

Updated

3 years ago
Priority: -- → P3
Comment on attachment 8793196 [details] [diff] [review]
WIP: Part 1. Import xi-unicode and capi to our tree

Review of attachment 8793196 [details] [diff] [review]:
-----------------------------------------------------------------

Do we need to vendor xi-unicode inside intl/?

In case you wish to edit xi-unicode, this may make sense (but then you'd need to come up with a syncing mechanism).

If you don't (I personally prefer this -- if you do need to make a change to xi-unicode best do it upstream), using the solution for rust-url-capi and mp4parse-capi would be better -- use the regular Rust build system cargo file.

We did this with rust-url-capi (https://hg.mozilla.org/mozilla-central/rev/54a28d6fbed4, parent commit contains the vendoring and is huge). In this case we had more transitive dependencies.

To vendor, add xi-unicode-capi as a path dependency to toolkit/library/rust, add xi-unicode as a regular version dependency to xi-unicode-capi; cd to the rust  folder and run cargo update; and then run `cargo vendor ../../../third_party/rust`. May need to do something similar for gtest.
(In reply to Manish Goregaokar [:manishearth] from comment #3)
> Simon, thoughts?
> 
> +1 on sharing unicode handling with servo.

More Rust code in Gecko and more code shared with Servo sounds good in general, but no opinion beyond that. I’m not familiar with xi-unicode (other than it exists and gives better results than what Servo had before) or Gecko’s line breaking code.
(Assignee)

Comment 12

3 years ago
Like Manish's comment (comment #10), I already create capi crate as https://github.com/makotokato/xi-unicode_capi for WIP.  Big problem is that line breaker rule into Gecko isn't same as UAX#14.  Due to a lot of histrionically reason and bugs (ex. adding URL breaking rule), it is too complex.  

But we may be able to use this when implementing line-break property in Gecko.  So I need discuss Ishii-san about strict line breaker rule.
You need to log in before you can comment on or make changes to this bug.