Pass initial options to content scripts without async messaging (contentScriptOptions equivalent)

NEW
Assigned to
(NeedInfo from)

Status

()

Toolkit
WebExtensions: Untriaged
P2
normal
4 months ago
3 days ago

People

(Reporter: The 8472, Assigned: rpl, NeedInfo)

Tracking

(Blocks: 2 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox53 affected)

Details

(Whiteboard: [design-decision-approved] triaged)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

4 months ago
Scripts running during `document-start` that want to ensure they can apply some user-configrable behavior before any page script runs and before the page is painted cannot wait for asynchronous messaging or storage APIs. They need to be passed some initial set of options which can be changed from the background page or panel. In case there is no guarantee that background pages load before the first tab does this would also have to be persisted across restarts so there's no race between background script initialization and content scripts reading that data.

I think currently the only way to achieve something like that is to use synchronous XHRs and intercepting them with webrequest, which is pretty ugly and not great for load times.

The addon-sdk provides `contentScriptOptions`[1], bootstrapped extensions have many ways to achieve this such as preloading the options in a per-process JSM, reading preferences, using `nsIContentProcessMessageManager.initialProcessData` (which makes this 0-RTT even during startup) or using synchronous messages.

bug 1323433 proposes a superset of this but has been deemed low-prio due to complexity. So one-way API to preload data for content script might be simpler to implement.

[1] https://developer.mozilla.org/en-US/Add-ons/SDK/Guides/Content_Scripts#Passing_configuration_options

Updated

4 months ago
Whiteboard: [design-decision-needed] triaged
To be discussed at 2/21 WebExtensions triage meeting. 

Agenda: https://docs.google.com/document/d/1H4sjnRFc87NZXZsM6XIgwWaoV2bdRFepiG28G7WzgPs/edit

Comment 2

3 months ago
This would be hugely useful to NoScript, even if it could be polyfilled on non supporting platforms with an obvious performance and complexity cost.
Blocks: 1214733
Approved; P2. Put on tracking schedule for 57.
Flags: needinfo?(amckay)

Updated

3 months ago
webextensions: --- → ?
Flags: needinfo?(amckay)
Priority: -- → P2
Whiteboard: [design-decision-needed] triaged → [design-decision-approved] triaged
(In reply to The 8472 from comment #0)
> In case there is no guarantee that background pages load before the first tab
> does this would also have to be persisted across restarts so there's no race
> between background script initialization and content scripts reading that data.
There is no such guarantee.  Additionally, content scripts are only loaded after the background page is done, so you are not guaranteed to be able to run your code before the page's scripts anyway, and must code with that in mind (luckily, only an issue for the first tab in practice).

> I think currently the only way to achieve something like that is to use
> synchronous XHRs and intercepting them with webrequest, which is pretty ugly
> and not great for load times.
Even this is not guaranteed to work for the first tab, as all the webRequest.*.addListener methods are async anyway, and could end up missing the requests at startup.

> bootstrapped extensions have many ways to achieve this such as preloading the
> options in a per-process JSM, reading preferences, using `.initialProcessData`
> (which makes this 0-RTT even during startup) or using synchronous messages.
All of these necessarily affect the startup performance (of the content process), and it could be especially bad if we allow unbound extension data, which can for some addons (ad/tracker/xx blockers) measure in megabytes.

Because of this concern, my current plan is to only guarantee sync data availability with some sane limit on data size (something like 4k per extension).  Any extension that needs more than that could check for availability of the config data in sync, with an event as a fallback.  (Again, this would only be an issue for the first tab loaded per process, as we would obviously preload/cache the data for subsequent tabs).

So the API would look something like:

  from content script:
    browser.runtime.configData;   // ready only property 
    browser.runtime.onConfigDataAvailable.addListener(...);

  from background page:
    browser.runtime.setConfigData(value);

Comment 5

2 months ago
(In reply to Tomislav Jovanovic :zombie from comment #4)
> (In reply to The 8472 from comment #0)

> Even this is not guaranteed to work for the first tab, as all the
> webRequest.*.addListener methods are async anyway, and could end up missing
> the requests at startup.
> 
This sounds incredibly bad for any adblocker and, of course, NoScript.
Are you sure of this? If so, this *must* be fixed ASAP.
Flags: needinfo?(tomica)

Comment 6

2 months ago
> Because of this concern, my current plan is to only guarantee sync data
> availability with some sane limit on data size (something like 4k per extension).

Arbitrary limits like this make for seemingly broken features.  For Greasemonkey, I can't guarantee that (the sum of all installed) scripts' source are any particular size.  As a user, I'd prefer a little slowness to features I depend on sometimes working and sometimes not.

Alternately: if there's a circumstance in which I (any extension author) can guarantee a tiny thing works, but not other things, I'm likely to apply a really dirty hack like: notice that I'm in a start-up case and can only run 4k of code, make that small code "block for a while and reload the page" so the document_start event happens again, then be outside of the startup case and able to make my feature work.  Which is obviously worse than just letting me pass more data to begin with.  The user experience will still be that start up takes a long time, but it's likely to be even worse than just letting me pass some data directly.
(Reporter)

Comment 7

2 months ago
> For Greasemonkey, I can't guarantee that (the sum of all installed) scripts' source are any particular size. 

In my opinion scripts or similarly large data should not be stored in such configuration storage as strings. Blobs (if structured clone is allowed) or Blob URIs would be far better. Blob URIs can load their content into the process lazily thus avoiding the bloat and they can use shared memory.

We need a way to eval scripts from URIs synchronously, like loading a <script> tag, but in the sandbox of an addon or in terms of a ES6 Realm[1][2] instead of the content page.
The easiest approach here would be to expose the subscript loader[3] to addons in some way.

I have put some effort into loading things from URIs in greasemonkey and it improved performance and reduced footprint. I would hate to see webextensions undoing those gains.

[1] http://www.ecma-international.org/ecma-262/6.0/#sec-code-realms
[2] https://gist.github.com/dherman/7568885
[3] https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/mozIJSSubScriptLoader#loadSubScript()

Comment 8

2 months ago
> This sounds incredibly bad for any adblocker and, of course, NoScript.
> Are you sure of this? If so, this *must* be fixed ASAP.

Agree, for Greasemonkey, this will break features.  (From the user's perspective: sometimes things work, sometimes (first tab?) things don't work or break in confusing ways.)

> In my opinion scripts or similarly large data should not be stored in such configuration storage as strings.

Probably off-topic so please email me directly, but: what option do I have under WebExtension besides string?  AFAICT I only get storage.local, which only stores strings, and I can only talk to content processes with sendMessage() (and alikes) which only send strings?

I'd like to be able to store (in background) a blob, later get (only) its URL and pass that URL (synchronously) to content script(s), but I know of no way to do any of those things.
(In reply to Giorgio Maone [:mao] from comment #5)
> This sounds incredibly bad for any adblocker and, of course, NoScript.
> Are you sure of this? If so, this *must* be fixed ASAP.

From my understanding and from bug 1343940 comment #3, that is the case, at least theoretically.  However, I'm not sure if this is ever an issue in practice.

(Also, from my limited understanding how *blockers work, they already must account for situations like this because of a number of hack techniques that web pages employ).


(In reply to Anthony Lieuallen from comment #6)
> Arbitrary limits like this make for seemingly broken features.  

I agree the number is arbitrary, but any limit would be.  It is not ideal, but if the only alternative is unbound amount of data (from unbound number of extensions), I don't think that would be acceptable when firefox starts using 4-8 processes, considering both memory and performance implications.

> Greasemonkey, I can't guarantee that (the sum of all installed) scripts'
> source are any particular size.  As a user, I'd prefer a little slowness to
> features I depend on sometimes working and sometimes not.

Content scripts already don't guarantee that your code will be executed when you specify, so if you don't account for that, it already works only sometimes.

> I'm likely to apply a really dirty hack [...] Which is obviously worse 
> than just letting me pass more data to begin

Not necessarily.  If I have userscripts enabled for only a few domains (common?), then slowing down *every* process load on the odd chance that the first tab in the process is one of those is not better IMO.
Flags: needinfo?(tomica)

Updated

2 months ago
webextensions: ? → +

Updated

2 months ago
Assignee: nobody → lgreco
Duplicate of this bug: 1352917

Comment 11

2 months ago
4K also isn't nearly enough for my own use-case either. I need to be able to specify each frame's initial cookie, localStorage, indexedDB, etc state before their own script(s) run, and cookies alone can be 4K.

I'm not against a small default limit if the extension could specify a permission allowing for unlimited use (or a very high limit). I should be able to keep the config data clear except in the cases where I absolutely need to pass data to each frame.

However, I am concerned that this would not be a useful API for my case, where each frame needs to be sent specific information (not just each top frame). Would set config data be able to target a specific tabId and frameId? Otherwise, I worry that this approach would race or have locking issues to resolve (since my background script could try to setConfigData for a new frame before the old one manages to read the onConfigData intended for it).

Because of these concerns I'm not sure why it wouldn't be better (at least for my use-case) to simply have an webNavigation.onBeforePageScriptsRun(tabId,frameId) type of event where I can insertScript(frameId), and seed an arbitrary amount of data using window.wrappedJSObject in that script.
Flags: needinfo?(tomica)
Comment hidden (mozreview-request)
(Assignee)

Comment 13

2 months ago
Comment on attachment 8854479 [details]
Bug 1332273 - Implement browser.contentScriptOptions API.

The attached patch contains a prototype of the API mostly as described by :zombie in Comment 4:

- the browser.tabs.setContentScripOptions({...}) method, provided only in the privileged extension pages (e.g. the background page, options ui page and popup pages), allows an extension to broadcast a javascript object (an object can be serialized to string and sent to the child processes) to all the child processes.

- the browser.runtime.contentScriptOptions getter, provided only in the content scripts, allows to a content script to synchronously retrieve the object value that has been set.

The property of the browser.runtime.contentScriptOptions property is not persisted across browser restarts and it is cloned when provided to a content script that retrieve the property value (and so every content script running in a process receives a different copy and the changes applied to the cloned value are not shared between different content scripts)

Once received by a child process, the broadcasted data is stored in a property of the initialProcessData, and so I'm a bit concerned as :zombie about how this transient storage can be used (and abused in some cases) by an extension (e.g. would be reasonable to pre-serialize the object into a string and refusing to broadcast an object bigger than a certain amount of data? or remove all the contentScriptOptions stored in the initialProcessData on an memory-pressure notification? should we turn the contentScriptOptions getter into a lazy getter and then return the value cloned when it is accessed for the first time?).


Let me know what you think about it.
Attachment #8854479 - Flags: feedback?(tomica)
Attachment #8854479 - Flags: feedback?(mixedpuppy)
Attachment #8854479 - Flags: feedback?(kmaglione+bmo)

Comment 14

2 months ago
mozreview-review
Comment on attachment 8854479 [details]
Bug 1332273 - Implement browser.contentScriptOptions API.

https://reviewboard.mozilla.org/r/126418/#review128920

The proposed API is bad for performance, because it allows only one object per extension, shared globally with all content scripts.

In real-world use cases (e.g. rule-based content modifiers such as ad blockers or user script managers), the set of rules can easily grow to multiple megabytes of data. The current API proposal does not allow for future optimizations because of its global nature.

At the very least, extensions should be allowed to assign options to multiple keys. Add-ons can use this to access information in a more efficient way (access a small subset of all options without cloning megabytes of data).

This API is "powerful" because it can significantly affect the performance of Firefox, and therefore it should be locked behind a manifest key or permission. This enables easy static analysis of manifest.json (a highly structured document) to assist in identifying potential performance issues.
At the bottom of https://bugzil.la/1323433, I proposed the `browser.contentScriptOptions` API, which is extensible and auditable by design. If you think that the API looks nice, you can adopt parts of it and the internal implementation will look largely similar to yours (ignore the "matches" and "tabId" parameters in the `browser.contentScriptOptions.set` method for now and the API behaves identical to yours).

::: browser/components/extensions/schemas/tabs.json:910
(Diff revision 1)
>              "parameters": []
>            }
>          ]
>        },
>        {
> +        "name": "setContentScriptOptions",

The `tabs` namespace is not the best place for this API. `runtime` or a new namespace such as `contentScriptOptions` (see my notes preceding the review) sounds like a better candidate.

::: toolkit/components/extensions/ExtensionContent.jsm:123
(Diff revision 1)
> +function getProcessContentScriptOptions(extensionId) {
> +  let optionsMap = Services.cpmm.initialProcessData["Extension:ContentScriptOptions"] || {};
> +  return optionsMap[extensionId];
> +}
> +
> +function clearProcessContentScriptOptions(...args) {

Why are you using `...args` instead of an explicitly named `extensionId` parameter?

And AFAICS, `clearProcessContentScriptOptions` is never called without arguments, so the `args.length` branch is never taken.

::: toolkit/components/extensions/ext-c-runtime.js:102
(Diff revision 1)
>          getURL: function(url) {
>            return extension.baseURI.resolve(url);
>          },
> +
> +        get contentScriptOptions() {
> +          return Cu.cloneInto(extension.contentScriptOptions, context.cloneScope);

`browser.runtime.contentScriptOptions === browser.runtime.contentScriptOptions` is false with this implementation.

When writing documentation, make sure to document that the value of `browser.runtime.contentScriptOptions` should be stored in a local variable before reading properties for performance reasons.


Or change the implementation to be a function, which "feels" heavier than a getter and may deter add-on devs from carelessly killing performance by ignorance.
If you follow my request to have key-value pairs, then you need a method anyway.
See again the sample code at the bottom of https://bugzil.la/1323433.
(Assignee)

Comment 15

2 months ago
(In reply to Rob Wu [:robwu] from comment #14)

Thanks a lot for the very helpful feedback Rob, these are the same concerns that I have on this API (some of them are exactly the one that I mentioned in Comment 13).

The changes to the API  that you suggest sound reasonable to me (and they were definitely in the options that I had in mind), I will proceed with this changes in the meantime.

Updated

2 months ago
Blocks: 1334174
Another thought is to create browser.storage.memory.  storage already has onchanged.  memory would be a key/value store.  already has a well defined api, but maybe could be modified to have more synchronous reads.

But, if you want to go down the other road...

From the top level documentation on browser.runtime and browser.extension, it seems like browser.extension is the more appropriate place for the api:

runtime:

This module provides information about your add-on and the environment it's running in.

vs. extension:

Utilities related to your add-on. Get URLs to resources packages with your add-on, get the Window object for your add-on's pages, ** get the values for various settings **.

I'm also not clear on why we would limit access to the config to content scripts.  I can see potential use for this in other areas.  Going back to comment 4, I would suggest:

  from any script:
    browser.extension.globalData;   // ready only property 
    browser.extension.onGlobalDataAvailable.addListener(...); // called any time data is changed in background

  from background page:
    browser.extension.setGlobalData(value); // maybe key/value pair instead

Comment 17

2 months ago
I apologize if I'm missing something obvious (it's busy on my end lately), but my concern with these APIs is that I'm not 100% sure how frames' content scripts will be able to determine which part of an addon-specific memory store is "intended for them". Will each content script now know its related tabId and frameId, or be able to query for that info synchronously? Or will it perhaps be possible to specify that a given part of the content-store is specific to a particular window, tab, or frame, and the browser will free it when those ids are no longer used?
(Assignee)

Comment 18

2 months ago
(In reply to Shane Caraveo (:mixedpuppy) from comment #16)
> Another thought is to create browser.storage.memory.  storage already has
> onchanged.  memory would be a key/value store.  already has a well defined
> api, but maybe could be modified to have more synchronous reads.

While I was thinking on an initial approach to propose, I was thinking about extend the storage API for it
but then I chose to not use the storage namespace for the following reason:

- all the storage APIs in the storage namespace provide the same API and none of them has any synchronous API methods (and for very good reasons)

- I would prefer to not provide this synchronous API as an additional storage API available to every one of the other extension pages, but only something restricted to the content scripts

also, the following are other details of the proposed approach that I'm thinking of: 

- I would like to limit this API for storing only very small amounts of data

- I would prefer to not provide any real time update mechanisms on it (it would can receive the data asynchronously there is no need to use this API, you can use the storage API or the messaming APIs already provided in every extension page or content scripts)

> 
> But, if you want to go down the other road...
> 
> From the top level documentation on browser.runtime and browser.extension,
> it seems like browser.extension is the more appropriate place for the api:
> 
> runtime:
> 
> This module provides information about your add-on and the environment it's
> running in.
> 
> vs. extension:
> 
> Utilities related to your add-on. Get URLs to resources packages with your
> add-on, get the Window object for your add-on's pages, ** get the values for
> various settings **.

I agree that the 'runtime' namespace wasn't a great pick, in the meantime 
I moved everything related to it in its own 'contentScriptOptions' namespace.
 
> I'm also not clear on why we would limit access to the config to content
> scripts.  I can see potential use for this in other areas.  

The main reason is that I would like to prevent this API from becoming one additional storage available to any
arbitrary extension page, especially given that is expose a synchronous API and the data is stored in memory.

I don't think that most of the other extension pages need this additional synchronous
in-memory storage API, they can use the messaging API or the other storages available,
or in the worst case they can already access the other views using `extension.getViews` 
(https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/extension/getViews) and retrieve
whatever they want from the other extension pages' globals.

> Going back to comment 4, I would suggest:
> 
>   from any script:
>     browser.extension.globalData;   // ready only property 
>     browser.extension.onGlobalDataAvailable.addListener(...); // called any
> time data is changed in background
> 
>   from background page:
>     browser.extension.setGlobalData(value); // maybe key/value pair instead

In my opinion this API doesn't need events:
if an extension can achieve what it needs from an event listener, you don't need this synchronous in-memory
storage, you can use the existent storage API or the messaging APIs.

Does the above reasons sound reasonable to you?
(Assignee)

Comment 19

2 months ago
(In reply to Thomas Wisniewski from comment #17)
> I apologize if I'm missing something obvious (it's busy on my end lately),
> but my concern with these APIs is that I'm not 100% sure how frames' content
> scripts will be able to determine which part of an addon-specific memory
> store is "intended for them". Will each content script now know its related
> tabId and frameId, or be able to query for that info synchronously? Or will
> it perhaps be possible to specify that a given part of the content-store is
> specific to a particular window, tab, or frame, and the browser will free it
> when those ids are no longer used?

This issue is currently focused on exploring how to provide the contentScriptOptions as a small amount of data synchronously available to the content scripts, mostly to "seed" their initial behavior when the document is starting to load (and then try to defer the loading on bigger amounts of data through the usage of the asynchronous data storage and messaging APIs).

Make the API able to set a property on a single tab is technically possible, I would not exclude it completely.

For an API to let the content scripts able to detect their own tabId and frameId, I think that it would be better to file a separate issue.

About the size limitations of this storage, I agree with :zombie that the size of this storage really needs to be limited, even more limited of the storage APIs.
As a API able to move bigger amounts of data between the two processes, e.g. images, bigger text files content (e.g. user scripts) etc. (basically anything like a Blob), I agree with :robwu that a nicer solution could be to provide an API to programmatically make some of the Blob URLs generated from an extension as "web accessible" (currently a blob URL generated from the extension pages will fail to load in a content webpage with a "Source URI is not allowed in this document" security error).

I would like to know more about your use case (e.g. to check if the "web accessible" blob URLs solution described above could fit your use case as well).
Comment hidden (mozreview-request)
(Assignee)

Comment 21

2 months ago
As anticipated in Comment 18, the updated patch contains a new proposal, 
the most important changes are the following:

- all the API is contained in its own namespace (and a permission, at least currently): browser.contentScriptOptions

- the API limits the maximum number of properties that a single extension can set and the maximum length of the string values (which are currently const defined in the API module, but we can also turn them into values read from the about:config preferences if we think that this could be a reasonable approach)

- the API allows the extension pages (e.g. a background page) to set single named properties, without send the entiry configuration object at every change

- the type of the property values can only be number, boolean or string

Comment 22

2 months ago
Thanks, :rpl. What I need is to essentially be able to pass data to a frame's content script from the background script, so that the content script can use it to seed the frame's state BEFORE other scripts in the frame have a chance to run.

As such a blob-based solution *could* work, if the content script can read the data out of it synchronously (using an XMLHttpRequest I suppose). If it has to use an async FileReader, or some other callback that might only trigger after the frame's scripts begin running, then it's a no-go.

Three solutions that I can see working for such a use case are:

1. Allowing to executeScript(document_start, frameId), in an early listener like webNavigation.onCommitted, and have that run as part of the content script for the frame, but before the ones listed in manifest.js for document_start (which should also ensure that it runs before the frame's other scripts). (This was basically what I requested in bug 1352917, which was duped to this bug).

2. Allowing to setFrameSpecificData(frameId) in the background script, and then readFrameSpecificData() from the content script, where the read is synchronous and guaranteed to call back before the other scripts on the page run. Whether that uses blob URLs or some other mechanism is just an implementation detail in that case.

3. Allowing to runtime.sendMessage() from the content script, with a guarantee that the synchronous callback will run before the rest of the frame's scripts (which doesn't happen right now).

Bear in mind that I don't mind helping to implement something for this use-case, if that helps.

Comment 23

2 months ago
mozreview-review
Comment on attachment 8854479 [details]
Bug 1332273 - Implement browser.contentScriptOptions API.

https://reviewboard.mozilla.org/r/126418/#review130366

I wonder whether the limits are useful. To work around the limits, I can imagine an add-on dev splitting their big value over multiple keys and use getPropertyNames / getProperty to retrieve all stored values. This results in extra work for the dev and a degraded performance for add-on users.
How about choosing a generous yet sane limit and enforce it for all data combined?

::: toolkit/components/extensions/ext-c-contentScriptOptions.js:12
(Diff revision 2)
> +    return {
> +      contentScriptOptions: {
> +        getProperty(propertyName) {
> +          if (context.envType !== "content_child") {
> +            throw new Error('This method is only supported in content scripts');
> +          }

Use the "allowedContexts" key in the schema for access control.

Currently there is no key to restrict APIs to content scripts only, so you can add a new one, e.g. by putting the following at the end of [shouldInject][shouldInject] in ExtensionChild.jsm

```
if (this.context.envType !== "content_child" &&
    allowedContexts.includes("content_only")) {
  return false;
}
```

And then add `"allowedContexts": ["content", "content_only"]` to the specific methods  in content_script_options.json. That will make sure that the methods only show up in contexts where these APIs are supported.

The documentation of `shouldInject` is still accurate: https://wiki.mozilla.org/WebExtensions/Implementing_APIs_out-of-process#shouldInject

[shouldInject]: http://searchfox.org/mozilla-central/rev/fcd9f1480d65f1a5df2acda97eb07a7e133f6ed4/toolkit/components/extensions/ExtensionChild.jsm#747-770

::: toolkit/components/extensions/ext-c-contentScriptOptions.js:30
(Diff revision 2)
> +          if (context.envType !== "addon_child") {
> +            throw new Error('This method is not support in content scripts and devtools pages');
> +          }
> +          return context.childManager
> +                        .callParentAsyncFunction("contentScriptOptions.getUsageInfo", []);
> +        }

If `allowedContexts` and `defaultContexts` do not contain "content" in the schema JSON, then they are not enabled in content scripts by default.

If you follow my above suggestion of enforcing access control through the schema, then you can remove getUsageInfo here because the default implementation in the child is to delegate the call to the parent.

::: toolkit/components/extensions/ext-contentScriptOptions.js:26
(Diff revision 2)
> +      usageInfo = contentScriptOptionsUsage.get(extension);
> +    }
> +
> +    return {
> +      contentScriptOptions: {
> +        setProperty(propertyName, propertyValue) {

You should also enforce a maximum length on the key name. Otherwise one can encode a large chunk of data in the key and use getPropertyNames to obtain the values.

If the maximum length is static, you can use the maxLength key in the JSON schema, otherwise (e.g. if you want to allow it to be configured through preferences later), put the check here.
(Assignee)

Comment 24

2 months ago
Thanks Rob for the new round of feedback, I agree that the limits to apply and how it would be better to enforce them is one of the aspects that need more evaluations, besides that I'm still not sure that I like the general approach that I'm proposing
(in particular I do not like that it basically provides one more storage API, I would  prefer to just provide something stored in one of the existent storage APIs).

An alternative approach that I'm thinking of is the following:

- provide an API method that allow a privileged extension page (e.g. the background page) to configure which browser.storage.local/browser.storage.sync keys that are going to be preloaded when the content scripts declared in the manifest are injected

- an optional match parameter can restrict when the preloading should happen (e.g. restrict it to a particular content script / url pattern / tabId / topframe / subframe etc.)

- an additional parameter of tabs.executeScript can configure the keys to preload for dinamically injected content scripts

- pause the document loading while we are preloading the configured keys from the extension storages, using the same approach that we are already using here: http://searchfox.org/mozilla-central/rev/fcd9f1480d65f1a5df2acda97eb07a7e133f6ed4/toolkit/components/extensions/ExtensionContent.jsm#405
(Assignee)

Comment 25

a month ago
Today I spent some additional time on this issue and I'm really not convinced that a synchronous storage is a good solution for the described needs and, especially based on Comment 22, this contentScriptOptions seems a lot more a workaround for another missing feature ("not being currently able to dinamically register content scripts"), than a feature on its own.

What I'm wondering is:

If, on the contrary, we make it possible to programmatically register new content scripts from the background page, and allowing the definition on this new content scripts also from extension Blob URLs (which doesn't need to be "blessed" because they are going only be loaded as extension content scripts instead of being injected into the page), this content scripts will be able to run at document_start (as a content script registered from the manifest) and load everything else asynchronously (e.g. there is probably no really need to load other kind of assets, like images, to inject them at document_start, they can be loaded asynchronously by the registered content script).

I also see that "being able to programmatically register new content script" is a feature that has been already requested more than once (e.g. Bug 1305856 and Bug 1323433) and Kris has shown to be in favor of allowing content script to be registered programmatically (https://bugzilla.mozilla.org/show_bug.cgi?id=1305856#c4).

Given that, as I mentioned in my previous comment, I really don't like the approach that I proposed in the attached patch and that any synchronous storage API would have mostly the same issue (e.g. which is the reasonable amount of data that we can preload? what happens when an extension tries to preload too much data? what we should do in case of "memory pressure"? etc.), I'm going to close the current mozreview request and open a new one with the new proposal:

a prototype of extending the tabs API to support "programmatically registered content scripts" and a related small test case (an xpcshell test case) for an initial feedback.
(Assignee)

Updated

a month ago
Attachment #8854479 - Attachment is obsolete: true
(Reporter)

Comment 26

a month ago
(In reply to Luca Greco [:rpl] from comment #25)
> Today I spent some additional time on this issue and I'm really not
> convinced that a synchronous storage is a good solution for the described
> needs and, especially based on Comment 22, this contentScriptOptions seems a
> lot more a workaround for another missing feature ("not being currently able
> to dinamically register content scripts"), than a feature on its own.

Mostly. I stated as much in comment 0. But on the other hand if dynamically loading content scripts is the *only* feature available then loading large hunks of data that they want to use synchronously as jsonp might also be inefficient. E.g. an adblocker that wants to load a selector-based ruleset or a localstorage shim for sites that break when cookies are blocked.

Ideally, to keep memory footprint and script execution time down, such things should exist in as immutable data in a shared compartment that can be read by multiple content script.

But that's mostly a performance optimization. If I had to choose then the ability to register content scripts at runtime would be my preferred option.

> also from extension Blob URLs

This on the other hand would be somewhat limiting. Filesystem URLs are preferred since it allows the user to develop a script, e.g. via an IDE and have the userscript pick up  the current copy whenever it is changed. But I guess filesystem access is orthogonal to the ability to register scripts.
Comment hidden (mozreview-request)
(Assignee)

Comment 28

a month ago
(In reply to The 8472 from comment #26)
> Mostly. I stated as much in comment 0. But on the other hand if dynamically
> loading content scripts is the *only* feature available then loading large
> hunks of data that they want to use synchronously as jsonp might also be
> inefficient. E.g. an adblocker that wants to load a selector-based ruleset
> or a localstorage shim for sites that break when cookies are blocked.

Yeah, I agree that it would not cover every scenario, what I hope is that "being able to create content scripts programmatically" will help to greatly reduce both the scenarios that needs such preload and also the amount of the data and then, if such API is still needed, we can more likely be able to set reasonably low storage limits to it.

Besides that, the adblocker scenario described above should be possible with the proposed patch using something like the following snippet:

     async function updateAdBlockingContentScript(oldContentScript) {
       // Remove the previous content script version
       await browser.tabs.unregisterContentScript(oldContentScript.scriptId);
       URL.revokeObjectURL(oldContentScript.ruleSetURL);
       
       // Load the updated rule set
       const ruleSet = await browser.storage.local.get("adblock_ruleset");

       // Create a new blob URL which provides the updated ruleset
       const ruleSetBlob = new Blob([`const ruleSet = ${JSON.stringify(ruleSet)};`], {type: "text/javascript"});
       const ruleSetURL = URL.createObjectURL(ruleSetBlob);

       // Register the update content script composed of the updated rule set and the common (static) content script.
       const contentScriptId = await browser.tabs.registerContentScript({
         js: [ruleSetURL, "/adblock_content_script.js"], 
         matches: ["<all_urls>"],
         run_at: "document_start",
       });

       return {ruleSetBlob, ruleSetURL, contentScriptId};
     }

 
> This on the other hand would be somewhat limiting. Filesystem URLs are
> preferred since it allows the user to develop a script, e.g. via an IDE and
> have the userscript pick up  the current copy whenever it is changed. But I
> guess filesystem access is orthogonal to the ability to register scripts.

Being able to register a content script using a blob URL should make this possible:

- user scripts edited in an external editor can be provided to the extension using
  an input element or drag and drop
- editing user scripts in an editor embedded in the extension itself can be provided
  by storing the blobs into indexedDB (by readind the Blob content and saving new Blobs, or by using a IDBMutableFile)
(Assignee)

Comment 29

a month ago
Comment on attachment 8862577 [details]
Bug 1332273 - Support programmatically registered content scripts.

Hi Rob, Hi Shane,
like described in Comment 25 this patch do not provide a contentScripOptions API but an API to register and unregister content scripts programmatically, the javascript and css assets associated to the dinamically registered content scripts can be a relative extension URL (like the content scripts defined in the manifest) or a blob URL created by the same extension.

The two url types can also be mixed in the same registered content script, which should also allow, as described in Comment 28, a way to preload some data into a content script.

I'd really like to get your initial feedback on the proposed approach (and its rationale).

I would be still open to allow some form of data preloading to it (e.g. by specify an additional property to the browser.tabs.registerContentScript and the manifest content_script options, where an extension can specify a set of browser.storage.local keys to preload before the script is injected)
Attachment #8862577 - Flags: feedback?(rob)
Attachment #8862577 - Flags: feedback?(mixedpuppy)

Comment 30

a month ago
mozreview-review
Comment on attachment 8862577 [details]
Bug 1332273 - Support programmatically registered content scripts.

https://reviewboard.mozilla.org/r/134442/#review137634

This approach looks great! It is much closer to the actual use case than the previous solution, and it can also be used to implement sync storage if an add-on author really wants to. Furthermore, it has more potential for being efficient in Firefox's resource usage.

`tabs` is an odd namespace for this. How about a new namespace, `declarativeContentScript`?

::: browser/components/extensions/ext-tabs.js:515
(Diff revision 1)
>            let tab = await promiseTabWhenReady(tabId);
>  
>            return tab.sendMessage(context, "Extension:DetectLanguage");
>          },
>  
> +        async registerContentScript(options) {

You should also check whether the add-on has the permission to register a script for the given URLs.

::: toolkit/components/extensions/Schemas.jsm:846
(Diff revision 1)
> +  strictExtensionBlobUrl(string, context) {
> +    // Do not accept a string which is not a blob URL with the origin of the current extension.
> +    let blobURL = new URL(string);
> +
> +    if (blobURL.protocol == "blob:") {
> +      if (!context.checkLoadURL(blobURL.origin)) {

I think that this check is too permissive, since it allows remote URLs to declaratively be loaded.

Based on
http://searchfox.org/mozilla-central/rev/068e6f292503df13515a52321fc3844e237bf6a9/toolkit/components/extensions/Schemas.jsm#303

I ran the following snippet in the global JS console

```
Services.scriptSecurityManager.checkLoadURIStrWithPrincipal(Services.scriptSecurityManager.createCodebasePrincipal(NetUtil.newURI('moz-extension://aaa/'),{}), 'http://example.com/', Services.scriptSecurityManager.DISALLOW_INHERIT_PRINCIPAL)
```

This snippet did not throw, so that suggests that `checkLoadURL` would return true and allow remote URLs.

Try checking whether the inner URL of the blob URL matches the extension's origin.

::: toolkit/components/extensions/extension-process-script.js:551
(Diff revision 1)
>      this.instanceId = data.instanceId;
>      this.manifest = data.manifest;
>  
>      this.scripts = data.content_scripts.map(scriptData => new ScriptMatcher(this, scriptData));
>  
> +    // A map of dinamically registered content scripts

s/dinamically/dynamically/

::: toolkit/components/extensions/test/xpcshell/test_ext_contentscript_registering.js:59
(Diff revision 1)
> +
> +    let blobScript = new Blob([`(${userScript})()`], {type: "text/javascript"});
> +    let blobScriptURL = URL.createObjectURL(blobScript);
> +
> +    const blobScriptId = await browser.tabs.registerContentScript({
> +      js: [blobScriptURL],

Please add a test for "css" too.

Updated

a month ago
Attachment #8862577 - Flags: feedback?(rob) → feedback+

Comment 31

a month ago
One thing more: For user script managers, the add-on should be allowed the possibility of running the script in separate sandboxes (or even directly in the page, a la @grant none in Greasemonkey).
Is this possible or considered out of scope?

See also bug 1353468, where I am requesting such a feature.
Flags: needinfo?(lgreco)
(Reporter)

Comment 32

a month ago
I'm surprised this works. Last time I tried to implement a similar solution in greasemonkey the subscript loader didn't accept a blob URI. There even is bug 1221148 for this. Has something changed?

(In reply to Rob Wu [:robwu] from comment #31)
> One thing more: For user script managers, the add-on should be allowed the
> possibility of running the script in separate sandboxes (or even directly in
> the page, a la @grant none in Greasemonkey).
> Is this possible or considered out of scope?

|@grant none| in GM does not actually execute things in the page context, it still uses a sandbox, but with xrays turned off. That has the effect that the user script still gets a set of pristine primordial objects that haven't their prototypes manipuldated by the page content and it also does not pollute the page global.

Comment 33

29 days ago
Re comment #32

And keeps scripts isolated from each other.  They each get a `const GM_info = {...};` with different contents.  Today if I `tabs.executeScript()` that, only the first works and the second breaks (redeclared constant).  Distinct configurable (possibly xray-off) per-content-script sandboxes are desirable.
Flags: needinfo?(tomica)
You need to log in before you can comment on or make changes to this bug.