Closed Bug 1290022 Opened 8 years ago Closed 3 years ago

Investigating we can use Rust based xi-unicode as Linebreaker

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox50 --- affected

People

(Reporter: m_kato, Assigned: m_kato)

References

Details

Attachments

(2 files)

Actually, our legacy line breaker is too old.  Although I am considering that it replace with ICU (bug 820261), it causes that binary size (including data size) is 2-3+MB.

Now, servo uses xi-unicode, so we will be able to use it for replacement.
Depends on: oxidation
xi-unicode supports Unicode 8.0.  I send a PR to update to 9.0 (rev 37).  Also, our line breaker rule isn't same as UAX#14 and pretty omplex...
What would the codesize of xi-unicode?

I wonder whether we can replace our bidi engine with unicode-bidi crate as well.
Simon, thoughts?

+1 on sharing unicode handling with servo.
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #2)
> What would the codesize of xi-unicode?
> 
> I wonder whether we can replace our bidi engine with unicode-bidi crate as
> well.

This data is on Linux 64 opt.  about 30KB.

Before: libxul.so ...  93709728 bytes
After:  libxul.so ...  93738960 bytes

This data is just UAX#14 rule only.  We need additional rules for historical fix.
(WIP doesn't remove old table to 8bit character. So If removing old table, it reduces increase of size.)
The following test isn't same rule against UAX#14.

REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/between-whitespaces.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/chemical-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/datetime-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/emoji-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/hyphens-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/hyphens-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/leaders-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/markup-src-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/parentheses-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/quotationmarks-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/smileys-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/smileys-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/url-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/url-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/url-3.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/winpath-1.html

We need add more rules for old compatibility...
> I wonder whether we can replace our bidi engine with unicode-bidi crate as well.

I would not recommend this right now. unicode-bidi is not quite spec compliant yet, and it also is missing some significant optimizations compared to Gecko. (I consulted the Gecko code while writing the unicode-bidi crate, and there are issues filed for the optimization work.)
Priority: -- → P3
Comment on attachment 8793196 [details] [diff] [review]
WIP: Part 1. Import xi-unicode and capi to our tree

Review of attachment 8793196 [details] [diff] [review]:
-----------------------------------------------------------------

Do we need to vendor xi-unicode inside intl/?

In case you wish to edit xi-unicode, this may make sense (but then you'd need to come up with a syncing mechanism).

If you don't (I personally prefer this -- if you do need to make a change to xi-unicode best do it upstream), using the solution for rust-url-capi and mp4parse-capi would be better -- use the regular Rust build system cargo file.

We did this with rust-url-capi (https://hg.mozilla.org/mozilla-central/rev/54a28d6fbed4, parent commit contains the vendoring and is huge). In this case we had more transitive dependencies.

To vendor, add xi-unicode-capi as a path dependency to toolkit/library/rust, add xi-unicode as a regular version dependency to xi-unicode-capi; cd to the rust  folder and run cargo update; and then run `cargo vendor ../../../third_party/rust`. May need to do something similar for gtest.
(In reply to Manish Goregaokar [:manishearth] from comment #3)
> Simon, thoughts?
> 
> +1 on sharing unicode handling with servo.

More Rust code in Gecko and more code shared with Servo sounds good in general, but no opinion beyond that. I’m not familiar with xi-unicode (other than it exists and gives better results than what Servo had before) or Gecko’s line breaking code.
Like Manish's comment (comment #10), I already create capi crate as https://github.com/makotokato/xi-unicode_capi for WIP.  Big problem is that line breaker rule into Gecko isn't same as UAX#14.  Due to a lot of histrionically reason and bugs (ex. adding URL breaking rule), it is too complex.  

But we may be able to use this when implementing line-break property in Gecko.  So I need discuss Ishii-san about strict line breaker rule.

I decide that I can rewrite new line breaker next year. closed as WONTFIX.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: