Open Bug 1560038 Opened 1 year ago Updated 3 hours ago

Replace Fluent.jsm with fluent-rs in Gecko

Categories

(Core :: Internationalization, task, P2)

task

Tracking

()

ASSIGNED

People

(Reporter: zbraniecki, Assigned: zbraniecki, NeedInfo)

References

(Depends on 2 open bugs, Blocks 2 open bugs)

Details

Attachments

(9 files, 7 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

As part of bug 1441035, we'd like to move away from using JS based Fluent parser/resolver to use Rust backed one.

I'm going to experiment with it here, but likely will have to spin off more specific bugs for things like Localization (fluent-fallback), and L10nRegistry (l10nregistry-rs).

Assignee: nobody → gandalf
Blocks: 1441035
Status: NEW → ASSIGNED
Priority: -- → P2

Bugbug thinks this bug is a task, but please change it back in case of error.

Type: defect → task
Attachment #9074690 - Attachment description: Bug 1560038 - Add XPIDL mozIFluentResource powered by Rust. → Bug 1560038 - Add fluent-rs backed FluentBundle and FluentResource WebIDL.

Question in phabritcator for :emilio, but NI in bugzilla. Yay.

Flags: needinfo?(emilio)

Is there any chance I could get a try build or some version of your patch I could apply locally? I can apply the first two revisions, but they seem to need a rebase. If I rebase manually and try to manually patch your last revision it still doesn't apply cleanly.

Flags: needinfo?(emilio) → needinfo?(gandalf)

According to our conversation on IRC only the last revision is needed, and I was on a slightly old central so I needed a rebase to apply it.

Flags: needinfo?(gandalf) → needinfo?(emilio)

Hopefully that does the trick.

Flags: needinfo?(emilio)

It did! I incorporated it into my patchset and today added support for l10nArgs. Seems like it all starts coming together.

The next step is to add an API to handle Intl formatting using ECMA402 from Rust.

Attached file perf.js (obsolete) —

I updated the patchset today to use an res-generic branch which I hope to land into fluent-rs 0.7.

I then constructed a mock script that emulates the experience of loading preferences. It's not a real timing because a lot of the operations happen async and don't really impact the talos numbers, so treat it more as a microbenchmark.

I'm attaching the script I'm using. To test the m-c JS version, just launch it in browser console. To test Rust, comment out the ChromeUtils.import and replace the FluentResource.fromSource with new FluentResource line.

Perf numbers:
JS: 15-24ms
Rust: 1.4-1.7ms

Here are profiles for both:

  1. JS: https://perfht.ml/32haRLp
  2. Rust: https://perfht.ml/32haHDN

What's also interesting is the memory:

  1. JS: 2.04Mb
  2. Rust: 606Kb

:emilio - would you have a moment to take a look at the Rust profile and check if it looks reasonable?
I plan to continue working on my patch (it's the https://phabricator.services.mozilla.com/D36286 ) but knowing if I'm doing some major mistakes at this point on the FFI boundry (by doing unnecessary allocations etc.) would be helpful.

Flags: needinfo?(emilio)

Commented on the patch.

Flags: needinfo?(emilio)
Depends on: 1449861
Attachment #9072728 - Attachment is obsolete: true
Attachment #9072729 - Attachment is obsolete: true
Attachment #9075418 - Attachment is obsolete: true

:emilio, :smaug - I need a bit of help here.

I need to add an opaque object named FluentPattern that gets returned from FluentBundle::GetMessage as part of FluentMessage shape, and then can be passed to FluentBundle::FormatPattern.

I have a patch that applies on central, that does almost that, but in FluentBundle.webidl, when I switch one line:

diff --git a/dom/chrome-webidl/FluentBundle.webidl b/dom/chrome-webidl/FluentBundle.webidl
--- a/dom/chrome-webidl/FluentBundle.webidl
+++ b/dom/chrome-webidl/FluentBundle.webidl
@@ -21,8 +21,8 @@ dictionary AttributeNamePattern {
 };
 
 dictionary FluentMessage {
-  boolean value;
-  // FluentPattern? value = null;
+  //boolean value;
+  FluentPattern? value = null;
   // sequence<AttributeNamePattern>? attributes = null;
 };

my build errors with pretty confusing messages: https://paste.mozilla.org/124qaUcW

It seems to start claiming that FluentMessage doesn't exist and OwningStringOrDouble is undefined.

I suspect that there's some problem with my opaque interface vs. dictionary in IDL because I can easily make my methods return FluentPattern, and this only happens when I try to make FluentPattern a field on the dictionary, but the debug log is unhelpful.

Can you help me pls?

Flags: needinfo?(emilio)
Flags: needinfo?(bugs)

You seem to be missing an include or a forward declaration. The WebIDL change probably just changed the order of includes.

Flags: needinfo?(emilio)

You seem to be missing an include or a forward declaration. The WebIDL change probably just changed the order of includes.

I don't understand how it would be possible. My FluentBundle.h includes FluentBundleBinding.h which has the FluentMessage declaration (in mozilla::dom).

And FluentBundle.h has using namespace mozilla::dom; so it should recognize FluentMessage and yet it doesn't. It looks like it should work well :/

Okay, I added forward declarations:

namespace dom {
struct FluentMessage;
struct L10nMessage;
class OwningStringOrDouble;
}

into FluentBundle.h, but it feels weird that I need that since I am including the FluentBundleBinding.h and LocalizationBinding.h :/ I'll keep a note for my reviewer.

Flags: needinfo?(bugs)
Attachment #9077006 - Attachment is obsolete: true
Attached file perf.js (obsolete) —

I did some perf measurements with the fluent-rs 0.7 which I released today.

I also added resolver measurements for using the bundle-writer branch which I'm considering for 0.8 [0].

m-c:

perf(parse): 14.223429999998189
perf(parse): 14.284325000000536
perf(parse): 14.540483999999196

perf(resolve): 5.433333999999377
perf(resolve): 5.521090999999615
perf(resolve): 5.435711000000083

Fluent-rs:
perf(parse): 1.3419030000004568
perf(parse): 1.4342740000010963
perf(parse): 1.4273840000005293

perf(resolve): 1.3891739999999118
perf(resolve): 1.404367999999522
perf(resolve): 1.1638170000001082

Writer:
perf(resolve): 1.0452879999998004
perf(resolve): 1.0862849999998616
perf(resolve): 1.0990079999996851

There's tons of unoptimized code still in the current PR with many unnecessary allocations, so I hope we'll get even faster. There are also ways to make parsing faster if we'll want to [1].

For now, I'll focus on completeness.

[0] https://github.com/projectfluent/fluent-rs/pull/127
[1] https://github.com/projectfluent/fluent-rs/pull/106

Depends on D54324

Attachment #9074690 - Attachment is obsolete: true

I'm going to work on extending this to support bindgen calls to ICU (for NumberFormat and DateTimeFormat), but majority of the code is in place, and I'd like to get your high-level take on the first patch in queue which adds FluentResource.

FluentResource on the JS side is going to be cached by L10nRegistry, and fed to FluentBundle in later patches. It takes a String retrieved from resource:// protocol and parses it into an AST which is stored in a rental driven struct with the String and its AST as a view into the String.

I'm interested in your feedback on the general implementation, the way it binds into WebIDL and perf/memory optimizations.

Flags: needinfo?(emilio)

So there's a ton of utf-16-to-utf-16 conversions that stands out. It'd be good if this was done in the C++ side and the rust type took a nsACString instead, probably, so it can be fixed by bug 1449861. We should definitely have a bug on file depending on that to fix that up.

Other than that the patch looks generally reasonable. I've left some comments in the first one that apply to the rest as well I think.

I can take profiles and such if you want, but it seems both perf and memory are much better than what we have in tree, so it shouldn't block anything, right?

Flags: needinfo?(emilio)

utf-16-to-utf8, that should be :)

Attachment #9110897 - Attachment description: Bug 1560038 - Add FluentResource and vendor in fluent-rs. → Bug 1560038 - Add FluentResource.
Blocks: 1602808
Attachment #9110930 - Attachment is obsolete: true

I still have another patch to write to use bindgen to pull:

but the rest of the code is functional and I was able to swap the JSM with Rust backend, launch Firefox, install langpack, switch locales and test pseudolocales!

Depends on: 1525869

An update on where we are:

  1. I added C++ -> Rust bindings to handle PLATFORM in line with AppConstants.platform.
  2. I added handling of basic resolver errors to be in line with Fluent.jsm
  3. We pass all intl/l10n/test and dom/l10n/test now!

What's left:

  1. Waiting for bug 1449861 and bug 1560038
  2. I need to get hooks to Intl.DateTimeFormat and Intl.NumberFormat. That will likely requite a new release of fluent-rs - https://github.com/projectfluent/fluent-rs/issues/146

I plan to work on (2) tomorrow. I don't know when (1) will get fixed, but will start poking people about the timeline (emilio and Pike respectively).

Numbers from today's m-c vs patchset:

Fluent.jsm:
parsing: [8.2, 8.4, 10.6], avg: 9.06
resolving: [4.2, 3.4, 4.3], avg: 3.96

fluent-rs:
parsing: [1.2, 1.14, 1.13], avg: 1.15 (-87.31%)
resolving: [2.4, 2.36, 2.57], avg: 2.44 (-38.38%)

Knowing that there are still a lot of improvements we can do with the rust lib (see comment 18, and the UTF8 IDL blocking bug), I expect us to be able to drive those numbers further down.

Here's the procedure I'm using now: https://gist.github.com/zbraniecki/c047476fc7435dbd0d62b04e341d56c5

Emilio: do you want to try to profile parsing/resolving using the IDL backed code?

Flags: needinfo?(emilio)
Attachment #9082464 - Attachment is obsolete: true

Profiles:

JSM parsing: https://perfht.ml/2PuwpzL
Rust parsing: https://perfht.ml/2RWQAYD
Rust resolving: https://perfht.ml/36G0OAU

Memory is similar to comment 11:

JSM: 1.55 mb
Rust: 656 kb (-57.7%)

I'm not sure I have the cycles to do that atm, but if you absolutely need to let me know and I'm happy to give it a shot.

Flags: needinfo?(emilio)

try run: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=281647759&revision=414c0a9bf35bc712d78d7a03d4d75e95ac67e0e8

Remaining items:

  • Intl.Formatters for DATETIME() and NUMBER()
  • browser_misused_characters_in_strings.js (bug 1525869)

Nothing unexpected! Yay!

No longer blocks: 1602808

Progress - I have landed all the required work into https://github.com/projectfluent/fluent-rs master branch, and was able to get NUMBER formatting to work!

Remaining work:

  • Add more options to FluentNumberOptions
  • Add DATETIME
  • Add FluentType for DateTime
  • Make arguments of type Date passed to WebIDL be stored as FluentValue::Custom(DateTime)
Depends on: 1611506

Depends on D57402

Progress:

  • NumberFormat fully functional
  • DATETIME added
  • FluentDateTime added FluentType

Remaining work:

  • Add DateTimeOptions
  • Verify that arguments from devs passed as Date work

Progress:

  • DateTimeOptions completed (in accordance to ECMA402 rev. 4, skipped hour12)

Remaining work:

  • Fix passing date as an argument to formatPattern.

Progress:

It seems like we didn't really have a good story for the FluentDateTime in JS world either, because we collapse it as fluent-args into JSON which cannot store Date.

So, in all cases where we do use DATETIME we actually passed a number, and DATETIME accepts a number as epoch - this already works in Rust.

So, the migration will regress one unused functionality - we don't currently have FluentDateTime, FluentNumber nor FluentDateTime exposed via WebIDL. The good news is that nothing in the platform used it, and when they tried (bug 1465718) they ended up just passing a number [0].

I'd like to fix that, but it'll require some additional WebIDL/C++/Rust code, and some design work to handle non-JSON types in L10nArgs.
We could also try to attack the "store args as JSON in DOM", and implement a real native solution, but that would make Fluent drift away from being able to work as a web library.

If we stick to JSON, I'd imagine we could handle argumens as UTF8String or double or object where the object would be sth like {"type": "datetime", value: 1580112783564} or in the future some other complex type.
Either way, I'll just file a follow up.

Which means, that I believe we're complete in this bug!

I'm going now to clean up the fluent-rs crate and cut out a new release, tidy up the code in the patches we have and ask for review! If all goes well, we may be able to land it early in 75!

[0] https://searchfox.org/mozilla-central/source/devtools/client/application/src/components/service-workers/Worker.js#190

See Also: → 1611754
Depends on: 1612096
Attachment #9116281 - Attachment description: Bug 1560038 - Add bindgen for Fluent. → Bug 1560038 - Add NumberFormat bindgen for Fluent.

I think this patchset is ready for review. I've been using Firefox with it for the whole day and tested different scenarios and seems to be working well.

I'd like to target for landing this early in 75 and unblock further changes (L10nRegistry, Localization.jsm) away from JS.

See Also: → 1613271

Also, I will release fluent-bundle 0.10 before landing this. I'm awaiting a review from Stas on that.

One thing I noticed is that I seem to lost some performance win in the resolver. In comment 32 I noted ~40% win in resolver perf, but I can't reproduce it today.

I tried to remove all features I added since then (mostly formatters for number/date), and it doesn't seem to make a diff.
I tried also to compare fluent-bundle I used back then in criterion benchmark compared to the latest master against the whole about:preferences context and it seems to maintain the perf.

The only other change is the switch from DOMString to UTF8String which was intended to be faster, so I don't think it would cause the perf win loss.

:emilio - would you have time to profile the Resolving portion of https://gist.github.com/zbraniecki/c047476fc7435dbd0d62b04e341d56c5 in idlBacked = false vs idlBacked = true and verify that nothing unexpected shows up in the profile for you?

Flags: needinfo?(emilio)

Well, to be clear, UTF8String is only a perf win if you were converting to UTF-8 somewhere in the middle... So far the changes in DOMLocalization are mostly in the wrong direction (adding rather than removing conversions). Where is UTF8String used to move it to rust? Depending on the hotness of that path it may be worth doing or not.

I probably don't have time to profile myself at least this week, but if you have before / after profiles I'm happy to take a look at them or what not?

Flags: needinfo?(emilio)

Well, to be clear, UTF8String is only a perf win if you were converting to UTF-8 somewhere in the middle... So far the changes in DOMLocalization are mostly in the wrong direction (adding rather than removing conversions). Where is UTF8String used to move it to rust? Depending on the hotness of that path it may be worth doing or not.

The way it works is that you have

JS (id, args) -> C++ -> Rust -> C++ -> JS (value)
DOM (id, args) -> C++ -> Rust -> C++ -> DOM (value)

the id/args and value are all UTF8 Strings in Rust, so I assumed that it makes sense to define them as UTF8String in IDL. Otherwise I have to accept UTF16, convert to UTF8, call fluent-rs, take the result UTF8, convert to UTF16, return.

Am I wrong that using UTF8 on the WebIDL if both input and output are UTF8 is the right call?

Depends on: 1613569

With bug 1525869 in review, us not waiting for bug 1613564 and fluent-rs 0.10.2 released, I think we're nearing completion of this bug.

I updated the patchset today to use 0.10.2, and found a new incompatibility.

FluentBundle has historically been Sync+Send which is something amethyst and handlebars-fluent crates use.

As we completed the feature set to close the gap between fluent-rs and fluent.js, I introduced IntlMemoizer which is helps us manage IntlFormatters instances to be reused, rather than recreated for each bundle+options combination.

In particular, with the fluent-rs 0.10, and IntlMemoizer, I was able to introduce DateTimeFormat and NumberFormat which on the Rust side are treated like a normal IntlFormatter and are memoized by IntlMemoizer, and on the C++ side are using ICU for date and number formatting.

But here's a collision - the way we implemented those DateTimeFormat and NumberFormat FFIs of course makes them not Send+Sync friendly.
In particular, we have:

pub struct DateTimeFormat {
    raw: *mut ffi::RawDateTimeFormatter,
}

impl DateTimeFormat {
    pub fn new(locale: LanguageIdentifier, options: FluentDateTimeOptions) -> Self {
        let loc: nsCString = locale.to_string().into();
        Self {
            raw: unsafe { ffi::FluentBuiltInDateTimeFormatterCreate(&loc, &options.into()) },
        }
    }

    pub fn format(&self, input: usize) -> String {
        unsafe {
            let mut output = nsCString::new();
            ffi::FluentBuiltInDateTimeFormatterFormat(self.raw, input, &mut output);
            output.as_str_unchecked().to_owned()
        }
    }
}

impl Drop for DateTimeFormat {
    fn drop(&mut self) {
        unsafe { ffi::FluentBuiltInDateTimeFormatterDestroy(self.raw) };
    }
}

and an analogous struct for NumberFormat.

When trying to hook that into IntlMemoizer which uses Send+Sync TypeMap, I get:

 0:13.36 error[E0277]: `*mut ffi::RawDateTimeFormatter` cannot be sent between threads safely
 0:13.36    --> intl/l10n/rust/fluent-ffi/src/builtins.rs:380:14
 0:13.37     |
 0:13.37 380 |             .try_get::<DateTimeFormat>((self.options.clone(),))
 0:13.37     |              ^^^^^^^ `*mut ffi::RawDateTimeFormatter` cannot be sent between threads safely
 0:13.37     |
 0:13.37     = help: within `builtins::DateTimeFormat`, the trait `std::marker::Send` is not implemented for `*mut ffi::RawDateTimeFormatter`
 0:13.37     = note: required because it appears within the type `builtins::DateTimeFormat`

Gecko, for now, doesn't need FluentBundle to be multithreaded, so one option would be to introduce two IntlMemoizers - one with Sync+Send and another without, and then make FluentBundle take either.
Another option might be (wild guess) to implement Send+Sync for those Formatters manually and regenerate the instance and the pointer on Send. Since the instances are immutable, Sync hopefully wouldn't be an issue.

There may be other approaches, and I really don't have enough experience to decide how to tackle this.
Summoning Manish and Emilio for advise.

Flags: needinfo?(manishearth)
Flags: needinfo?(emilio)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #50)

With bug 1525869 in review, us not waiting for bug 1613564 and fluent-rs 0.10.2 released, I think we're nearing completion of this bug.

I updated the patchset today to use 0.10.2, and found a new incompatibility.

FluentBundle has historically been Sync+Send which is something amethyst and handlebars-fluent crates use.

Why do they use it / require it?

Another option might be (wild guess) to implement Send+Sync for those Formatters manually

Is the underlying ICU object thread-safe? If so this is just the right thing to do.

and regenerate the instance and the pointer on Send. Since the instances are immutable, Sync hopefully wouldn't be an issue.

What does this mean? I have no idea what you want to do here. Something can either be sent or not. It seems to me that as long as the ICU API is thread-safe it should be Send.

Flags: needinfo?(emilio)

(thread-safe as "movable between threads, callable from multiple threads once constructed", to be clear)

You need to log in before you can comment on or make changes to this bug.