Open Bug 1879712 Opened 2 years ago Updated 4 months ago

menus API: selectionText is truncated to 16K characters

Categories

(WebExtensions :: Frontend, enhancement, P5)

enhancement

Tracking

(Not tracked)

People

(Reporter: tdulcet, Unassigned, NeedInfo)

References

Details

(Keywords: parity-chrome)

Per Bug 1338898 Comment 43, as of Firefox 56, the selectionText property of the OnClickData object is truncated to 16K characters, which is undocumented. Please consider reevaluating this decision and eliminating the truncation, as extensions should be able to easily access the full selection text without needing to resort to using a content script. Chromium does not truncate the selected text and there are no noticeable performance issues.

Keywords: parity-chrome
See Also: → 1338898

The current logic is at https://searchfox.org/mozilla-central/rev/9013524d23da6523a7ec4479b5682407a1323f6c/toolkit/modules/SelectionUtils.sys.mjs#135-137

If we were to raise the count, what would you consider to be a reasonable cut-off point?

To balance between risk and functionality, we could consider making this threshold configurable by preference. Patches are welcome.

Severity: -- → N/A
Type: defect → enhancement
Priority: -- → P5

Chrome does not seem to have any length limits, at least not that I could hit. If Firefox must have a limit, I would consider 1 Mi (2^20) of UTF-16 characters to be reasonable. In this case, it would be helpful for Firefox to inform the add-on that the selection text was truncated (add a new property to the OnClickData object) so that they have the opportunity to load a content script if necessary, especially if the limit is going to be configurable.

I have an add-on that needs to work with long URLs, but in the majority of cases (at least until Bug 1317166 is fixed), they should be less than 1 Mi of characters.

If we were to raise the count, what would you consider to be a reasonable cut-off point?

To balance between risk and functionality, we could consider making this threshold configurable by preference. Patches are welcome.

I had brought up the issue many years ago (also in the referenced bug). For large texts, it creates a lot of extra processing e.g.

  • On menu click check if text was truncated info.selectionText.length === 16384
  • Insert a script into the page to get the text from window.getSelection()
    • Complications with page CSP preventing script injection
    • Complications when there are frames involved
    • Requires "<all_urls>" permission (annoys users) instead of the usual "activeTab"
    • Script has to be injected into multiple frames
    • Can create errors in frames with no selection
    • Can run into errors with dynamically created content, or shadow DOM
    • Can run into problem if JavaScript on the page changes the active element & de-selects the text

The benefit of smaller temporary selectionText value is limited comparing to the extra processing.

As a suggestion:

  • Remove the limit completely
    • Does Chrome also truncate selectionText ?!
  • Add some sort of no-truncate/full-text flag to menus.create() so that developers can apply it to the menus that are likely to need the full text

I have tested it and Chrome does not truncate selectionText.
Therefore, for cross-browser compatibility reason alone, it is worth removing the 16k limit.

The truncation was introduced to reduce the overheads. However, in practice, it not only does not reduce the overheads but also increases it by many times.

Setting a limit to truncate text

  • if text length is less than the limit, the truncation servers no purpose
  • if text length is over the limit, getting the full test is passed to the extension (and from extension back to the browser) and the overheads increase by many times

Under all circumstances, there is no benefit in the truncation.
Truncation means browser already had the full text, but decided not to pass the full text to the extension, thus forcing the extension to get the full text "again" on its own.

Subsequent to further discussions, if a limit has to be set, a limit of either 48k or 64k should cover all but some edge cases.

The current limit of 16384 characters is hardcoded at https://searchfox.org/mozilla-central/rev/9c6e9ca72d4a59267d985fb35ba259fb937c8503/toolkit/modules/SelectionUtils.sys.mjs#135-137 . While I'm not opposed to a higher limit, it would be very useful if there is data (examples / use cases) supporting the need for a higher limit, as well as performance profiles (e.g. with https://profiler.firefox.com) demonstrating the lack of significant perf impact. A minor point in favor of a higher limit is that the context menu can already hold data that already has much more data than just 16k (e.g. URLs).

To get an idea what large text selections could look like, I looked up Wikipedia: List of chiropterans (ranked #1 at Wikipedia: Long pages), which consists of about 345k characters (counted by running document.body.innerText.length from the console). A smaller example is the Selection API specification, which currently sits just below 30k characters. How realistic is it for a user to knowingly select a specific part of the document consisting of so much text, with the expectation that an extension would read exactly that selection?

Although initially provided for use with add-ons, it has since been used by other internal components, so I'll ask the other teams for their preferences on supported maximum lengths.

needinfo :nordzilla
Erik do you have any concerns with increasing the maximum size of the text selection, which is used for translations of text selections, at:
https://searchfox.org/mozilla-central/rev/9c6e9ca72d4a59267d985fb35ba259fb937c8503/browser/base/content/nsContextMenu.sys.mjs#2736

needinfo :Mardak
Ed, do you have any concerns with increasing the maximum size of the text selection, which is used at:
https://searchfox.org/mozilla-central/rev/9c6e9ca72d4a59267d985fb35ba259fb937c8503/browser/components/genai/GenAI.sys.mjs#649

Flags: needinfo?(enordin)
Flags: needinfo?(edilee)

(In reply to Rob Wu [:robwu] from comment #7)

To get an idea what large text selections could look like, I looked up Wikipedia: List of chiropterans (ranked #1 at Wikipedia: Long pages), which consists of about 345k characters (counted by running document.body.innerText.length from the console). A smaller example is the Selection API specification, which currently sits just below 30k characters. How realistic is it for a user to knowingly select a specific part of the document consisting of so much text, with the expectation that an extension would read exactly that selection?

Just a point of clarification here, .length on JavaScript string will return the number of UTF-16 code units, which may not align 1:1 with characters. Though at the magnitude of the string lengths that we are discussing, that distinction isn't really important.


Insights into Translating Selections

(In reply to Rob Wu [:robwu] from comment #7)

needinfo :nordzilla
Erik do you have any concerns with increasing the maximum size of the text selection, which is used for translations of text selections, at:
https://searchfox.org/mozilla-central/rev/9c6e9ca72d4a59267d985fb35ba259fb937c8503/browser/base/content/nsContextMenu.sys.mjs#2736

I had a look at the actual Translations data to make an informed decision here:

99% of all translated selections are 5,000 code units or less. Though, 0.25% of that top 1% is greater than or equal to 16,375 code units, which signals to me that there is some interest in translating large selections (either that or people are just trying it for fun to see what happens).

Based on the data, I wouldn't even mind limiting the selection for Translations to 5,000 code units, though I will have to run that by my team.


Outlook on Translating Selections

I'm certainly not against increasing the limit for the context menu itself, though I would caution against increasing the size of what is sent to the SelectTranslationsPanel in its current state.

If we do increase the size, I would like to add logic to keep the translations limit either the same, or I will bring up the idea in our team meeting tomorrow about reducing the limit to 5,000 code units.

If someone is trying to translate an entire Wikipedia page, for example, then they should be using Full-Page Translations, which selectively parses the DOM and translates DOM nodes incrementally.

The design and intent of the SelectTranslationsPanel is to translate smaller snippets of text on the page, and it sends the whole string to be translated in one go. I imagine that even the current limit of 16k could make lower-end machines take an unreasonably long time to compute, however we were fine with that limit for the initial release of the feature.

We do have plans to look into another Translations UI surface that may be able to more easily handle larger snippets of selected text, possibly with the configuration option to open translated selections from the context menu there as well.

Flags: needinfo?(enordin)
See Also: → 1995281
You need to log in before you can comment on or make changes to this bug.