Closed Bug 1719747 Opened 3 years ago Closed 3 years ago

Unify ListFormat in SpiderMonkey

Categories

(Core :: Internationalization, task, P3)

task

Tracking

()

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: gregtatum, Assigned: allstars.chh)

References

Details

(Whiteboard: [i18n-unification])

Attachments

(3 files)

https://searchfox.org/mozilla-central/source/js/src/builtin/intl/ListFormat.cpp

This will most likely need a mozilla::intl::ListFormat implementation, as a replacement for UListFormatter.

Assignee: nobody → allstars.chh

FYI, we have a design document from April. I left a few comments, but I plan on updating it for the current state of things: https://docs.google.com/document/d/1Mr16-dEoQxR-nGNB6RQmiZrTBqxCxHiI8n0WJpL7T38/edit#

Move implementations to mozilla::intl::ListFormat

Still working on some tests.

According to the document from comment 1, the implementation is as following:

  1. ListFormat.format:
    SM allocates a 'list' of type ListFormatStringList and a FormatBuffer and pass them to Mozilla::intl::ListFormat, and Mozilla::intl::ListFormat will fill the result into FormatBuffer.

  2. ListFormat.formatToParts
    Like format, SM also allocates a list and the parts of type ListPartVector, and Mozilla::intl::ListFormat will append the parts into ListPartVector.
    The ListPartVector implementation is referenced from NumberPartVector from Mozilla::intl::NumberFormat

Attachment #9235759 - Attachment description: Bug 1719747 - Unify ListFormat in SM → Bug 1719747 - Unify ListFormat in SM.

Still working on some tests.

Ah I missed this while reviewing. Happy to take another looks with more granular tests.

Hi Daniel, Greg and Zibi
For the "Formatting Operation API" in https://docs.google.com/document/d/1Mr16-dEoQxR-nGNB6RQmiZrTBqxCxHiI8n0WJpL7T38
I have some questions regarding to the formatToParts.

For the ListFormat.format(), it's implemented according to the gdoc,
js::intl::ListFormat provides a FormatBuffer, and mozilla::intl::ListFormat fills the buffer with FillBufferWithICUCall API.
But for ListFormat.formatToParts is a little more complicated.

So in ListFormat.formatToParts will return an array of parts according to https://tc39.es/ecma402/#sec-formatlisttoparts

In my patch the ListPart is represented as std::pair<ListPartType, Span<const char16_t>>,
my question is, who takes the ownership of the data in the mozilla::Span in this 'formatToParts' case?

I have some approaches.

  1. As shown in my patch, the string of the part.value is from ICU (the variable 'formattedChars' in ListFormat::formatToParts), then mozilla::intl::ListFormat will pass the Span of the string (from ICU) to js::intl::ListFormat.
    The downside is now we have to make sure the ICU string is valid, so I move ulistfmt_openResult to ListFormat cstor, and ulistfmt_closeResult to ListFormat dtor. Originally it was done in https://searchfox.org/mozilla-central/rev/3a8091d1c29473a0839ad7a5810028f41363fe2e/js/src/builtin/intl/ListFormat.cpp#293-298

  2. js::intl::ListFormat will allocate a Vector<FormatBuffer<char16_t>>, and mozilla::ListFormat will fill those FormatBuffers in the vector.

  3. Add a Callback parameter in the ListFormat::formatToParts.

  4. Add another Listformat::closeParts interface and js::intl::ListFormat will call this interface after Spidermonkey has copied the data from the parts.

What do you think?

Thanks

Flags: needinfo?(zbraniecki)
Flags: needinfo?(gtatum)
Flags: needinfo?(dminor)
  1. The downside is now we have to make sure the ICU string is valid

Yeah I think this is a bit risky. I word prefer avoiding "use after free" type of bugs if possible.

  1. js::intl::ListFormat will allocate a Vector<FormatBuffer<char16_t>>, and mozilla::ListFormat will fill those FormatBuffers in the vector.

This seems like it would work, but at the cost of being a little allocation heavy, or inefficient in memory usage. Tt might be weird to inflate the stack with lots of memory to hold all of it. For short lists you could avoid heap allocations, but I wouldn't want to have the stack size be too large here. The benefit here is that it's harder to use the API incorrectly.

  1. Add a Callback parameter in the ListFormat::formatToParts.

I think a callback parameter may be a reasonable solution, and allow the caller to handle the memory. This seems like a simple and safe implementation. I prefer to have something that is hard to do the wrong thing.

  1. Add another Listformat::closeParts interface and js::intl::ListFormat will call this interface after Spidermonkey has copied the data from the parts

This seems like a way to make 1. safer, but it's still possible to do the wrong thing. You could at least add assertions to check for incorrect behavior.


I'm having a little trouble visualizing all of the flows, but can you do option 3 with a callback, where the lifetime of the original string is guaranteed valid across the callback, and then work with Span<char16_t> that point back to the valid string? Then SpiderMonkey could do the right thing from there with the Spans. Maybe that's what you are saying in option.

Flags: needinfo?(gtatum)
Flags: needinfo?(zbraniecki)

Thanks, will update the patch with method 3.

Flags: needinfo?(dminor)
Attachment #9235759 - Attachment description: Bug 1719747 - Unify ListFormat in SM. → Bug 1719747 - Part 1: Unify ListFormat in SM.

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1719747#c6,
In Part 1, the ownership of the Span in PartsVector is maintained by
ICU. This method adds a callback so js::intl::ListFormat could copy the
content of the Span to Spidermonkey, and mozilla::intl::ListFormat could
use ScopedICUObject to release the string owned by ICU earlier.

Blocks: 1719751
Blocks: 1728181
Whiteboard: [i18n-unification]

Ted, review ping.
Your review has been pending for weeks, if you are busy, can you suggest a different reviewer review it?
Or if Greg and Anba's reviews are enough, then I'll remove your r?.

I have been rebasing to the latest m-c many times to fix the conflict during my leave, now my leave is going to end however the patches are still waiting for your review.

Thanks

Flags: needinfo?(tcampbell)
Attachment #9237120 - Attachment description: Bug 1719747- Part 3: ListFormat::FormatToParts takes a callback. → Bug 1719747 - Part 3: ListFormat::FormatToParts takes a callback.

Stamped. Thanks for the tackling this bug!

Flags: needinfo?(tcampbell)
Pushed by allstars.chh@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/a83961b4491c
Part 1: Unify ListFormat in SM. r=gregtatum,anba,platform-i18n-reviewers,tcampbell
https://hg.mozilla.org/integration/autoland/rev/1aa6a3fcd632
Part 2: Use NewDenseFullyAllocatedArray. r=anba
https://hg.mozilla.org/integration/autoland/rev/0433a2711b0a
Part 3: ListFormat::FormatToParts takes a callback. r=gregtatum,tcampbell
Regressions: 1730524
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: