Closed Bug 1385832 Opened 8 years ago Closed 7 months ago

getBytesInUse() is not supported in `local` and `managed` storage areas

Categories

(WebExtensions :: Compatibility, defect, P3)

54 Branch
defect

Tracking

(firefox144 fixed)

RESOLVED FIXED
144 Branch
Tracking Status
firefox144 --- fixed

People

(Reporter: u580826, Assigned: nate)

References

Details

(Keywords: dev-doc-complete, Whiteboard: [storage])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0
Build ID: 20170628075643

Steps to reproduce:

I used uBlock Origin 1.13.9.0 (dev channel) and noticed that "Storage used" in the uBlock Origin "Dashboard" shows "? bytes". I opened an issue, and gorhill (the author of the add-on) explained that this is because Firefox doesn't support browser.storage.local.getBytesInUse(). The issue is at https://github.com/gorhill/uBlock/issues/2812

Actual results:

uBlock Origin 1.13.9.0 (dev channel, webext-hybrid) doesn't show storage usage. Version 1.13.8 (stable channel, legacy add-on) shows it correctly.

Expected results:

Both the webext-hybrid and the legacy add-on should show the correct value. I would appreciate it if you could implement the API.
Component: Untriaged → WebExtensions: Compatibility
Product: Firefox → Toolkit
Priority: -- → P3
Whiteboard: [storage]
In case this matters, uBlock Origin only uses the getBytesInUse(null, ...) version of the API, it is just interested in reporting the storage used as a whole.
Product: Toolkit → WebExtensions
The documentation is confusing on this topic. Looking at https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/storage/local gives one the impression that getBytesInUse() is supported since it's listed as one of the methods and the browser compatibility table below shows Firefox support since v45. However, if you click on this method and go to https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/storage/StorageArea/getBytesInUse you will see that Firefox doesn't support this method. I wish the Mozilla team would give a clear indication that this method is not supported from the parent page (https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/storage/local). Any plans to add support for this method? It's preventing me from porting over my Chrome extension also.
For now you can accomplish the same thing by doing this: browser.storage.local.get(function(items) { console.log(JSON.stringify(items).length); });

I checked in firefox 65.0.1. browser.storage.local.getBytesInUse is still undefined.

Since the implementation is so trivial, maybe it's worth marking this as Good First Bug?

One question though (for you, Tom): your implementation seems to give different numbers in Chrome:

browser.storage.sync.get().then(function(items) { console.log(JSON.stringify(items).length); });
2915
await browser.storage.sync.getBytesInUse()
2961

Any ideas why?

For the record, as part of bug 1634615 etc, browser.storage.sync does have this API. I'm not going to close this because IIUC, browser.storage.local still does not.
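
Because support differs between storage areas, an extension can check for the method before calling it. The following is a minimal sketch, not code from this bug: `bytesInUse`, `roughEstimate`, and the two stand-in area objects are hypothetical names invented so the example runs outside an extension. In a real extension the first argument would be something like browser.storage.local.

```javascript
// Hypothetical wrapper: prefer the native method, otherwise use a
// caller-supplied estimate over the full contents of the area.
async function bytesInUse(area, fallbackEstimate) {
  if (typeof area.getBytesInUse === "function") {
    return area.getBytesInUse(null);
  }
  // No native support: read everything and let the fallback measure it.
  const items = await area.get(null);
  return fallbackEstimate(items);
}

// Stand-in objects so the sketch runs outside an extension (both made up).
const areaWithMethod = {
  getBytesInUse: async () => 123,
  get: async () => ({}),
};
const areaWithoutMethod = {
  get: async () => ({ a: 1 }),
};

// Deliberately crude fallback, in the spirit of the workaround above.
const roughEstimate = (items) => JSON.stringify(items).length;

bytesInUse(areaWithMethod, roughEstimate).then((n) => console.log("native:", n));
bytesInUse(areaWithoutMethod, roughEstimate).then((n) => console.log("fallback:", n));
```

The fallback is passed in as a parameter so that a more accurate estimator can be swapped in without changing the detection logic.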

See also bug 1637166 comment #7 and https://wiki.developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/storage/StorageArea/getBytesInUse

getBytesInUse
Firefox
Full support 78
Notes
Only supported by the sync storage area.

Can confirm that this is still an issue. I'm running into this with my extension Stock Inspector, which caches data to reduce the user's network requests and data usage. I'd love to use this feature to show users the amount of cache storage used, as uBlock Origin does.

@Ray I've been studying this deeply and now have a working formula :) that you can use as a workaround:

// Docs: "The maximum amount (in bytes) of data that can be stored in local storage, as measured by the JSON stringification of every value plus every key's length."
Object.entries(await browser.storage.sync.get()).map(([key, value]) => key.length + JSON.stringify(value).length).reduce((acc, x) => acc + x, 0)
// will give you same result as:
await browser.storage.sync.getBytesInUse()

But it will also work on "local" storage, where the docs say the same thing.

Note that this will work correctly ONLY in Firefox. Chromium has a bug where the "<" symbol is stored escaped and takes more space out of your quota than it should:
https://bugs.chromium.org/p/chromium/issues/detail?id=1115239

(In reply to juraj.masiar from comment #9)

Object.entries(await browser.storage.sync.get()).map(([key, value]) => key.length + JSON.stringify(value).length).reduce((acc, x) => acc + x, 0)

Since JavaScript stores strings as UTF-16, and the string.length property returns the number of UTF-16 code units[1], wouldn't you have to multiply the value by 2 to get the number of bytes used?

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length#Description

(In reply to juraj.masiar from comment #9)

// Docs: "The maximum amount (in bytes) of data that can be stored in local storage, as measured by the JSON stringification of every value plus every key's length."

juraj, can you tell me where you found that quote? I do not find it in the docs I am viewing[1]

[1]https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/storage/StorageArea/getBytesInUse

Does StorageArea.getBytesInUse use backend computation which is more efficient than the polyfill suggested by juraj.masiar? Or is it just a convenience method that is actually doing the same thing?

Flags: needinfo?(markh)

(In reply to JulianHofstadter from comment #10)

(In reply to juraj.masiar from comment #9)

Object.entries(await browser.storage.sync.get()).map(([key, value]) => key.length + JSON.stringify(value).length).reduce((acc, x) => acc + x, 0)

Since JavaScript stores strings as UTF-16, and the string.length property returns the number of UTF-16 code units[1], wouldn't you have to multiply the value by 2 to get the number of bytes used?

The data is stored to disk as JSON.

(In reply to JulianHofstadter from comment #11)

juraj, can you tell me where you found that quote? I do not find it in the docs I am viewing[1]

[1]https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/storage/StorageArea/getBytesInUse

See the quota descriptions at https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/storage/sync.

(In reply to JulianHofstadter from comment #12)

Does StorageArea.getBytesInUse use backend computation which is more efficient than the polyfill suggested by juraj.masiar? Or is it just a convenience method that is actually doing the same thing?

That's exactly what it does, because that's what chrome's spec says to do. We can't look at the bytes stored in the database column because that wouldn't conform to the spec. The implementation is at https://searchfox.org/mozilla-central/source/third_party/rust/webext-storage/src/api.rs#333

Flags: needinfo?(markh)

(In reply to Mark Hammond [:markh] [:mhammond] from comment #13)

The data is stored to disk as JSON.

As the JSON data can include strings, and in some cases the value passed to storage is only a string, I'm not clear how you would always end up with 1 byte per code unit, but I'll just have to take your word for it that taking the length property of the JSON.stringified object will give the total number of bytes.

Sorry, I missed the point - I think you are correct that the polyfill doesn't take this into account - while utf-8 is stored, that can be > 1 byte. So it probably should be utf-8 encoding after stringifying.

(In reply to Mark Hammond [:markh] [:mhammond] from comment #15)

while utf-8 is stored, that can be > 1 byte. So it probably should be utf-8 encoding after stringifying.

If I understand correctly, JS only deals in UTF-16 as far as strings go, and will convert UTF-8 to UTF-16. So it seems to me the strings you would be dealing with even after stringifying would be UTF-16, as far as length property goes.

Again, https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length#Description

A non-ascii character in a JS string will have a length of 1. If you utf-8 encode that you will get an array of (typically) 2 - which is how many bytes the character consumes on disk. JS uses utf-16 in memory, but that's not relevant here.

"\u00DF".length
1
(new TextEncoder("utf8")).encode("\u00DF").length
2
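
The same comparison can be run for a few more characters. This is an illustrative sketch only: the helper name `utf8ByteLength` and the sample strings are made up for this example, and it assumes `TextEncoder` is available as a global, as it is in browsers and current Node.js.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Sample strings chosen for illustration (not taken from any extension):
// an ASCII letter, a Latin-1 letter, the euro sign, and an emoji.
const samples = ["a", "\u00DF", "\u20AC", "\u{1F600}"];

for (const s of samples) {
  // String.prototype.length counts UTF-16 code units, not bytes.
  console.log(JSON.stringify(s), "length:", s.length, "UTF-8 bytes:", utf8ByteLength(s));
}
```

Neither "length" nor "length times 2" matches the UTF-8 byte count in general, which is why the string has to be encoded before it is measured.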

(In reply to Mark Hammond [:markh] [:mhammond] from comment #17)

A non-ascii character in a JS string will have a length of 1. If you utf-8 encode that you will get an array of (typically) 2 - which is how many bytes the character consumes on disk. JS uses utf-16 in memory, but that's not relevant here.

"\u00DF".length
1
(new TextEncoder("utf8")).encode("\u00DF").length
2

I see. So then back to the polyfill, in order to be accurate, would you need to do something like:

(new TextEncoder("utf8")).encode(key + JSON.stringify(value)).length

yes, that looks correct.

Thanks @JulianHofstadter for fixing my bug! I guess my "deep" study wasn't deep enough.
So something like this should yield correct results:

new TextEncoder().encode(
  Object.entries(await browser.storage.sync.get())
    .map(([key, value]) => key + JSON.stringify(value))
    .join('')
).length

Note that TextEncoder no longer takes an encoding parameter and supports only UTF-8:
https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder
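
The formula above can also be written as a reusable function that accepts the same kinds of `keys` argument as getBytesInUse() (null, a single key, or an array of keys). This is a sketch under assumptions rather than Firefox's implementation: `estimateBytesInUse` is a hypothetical name, the sample data is invented, and the result only approximates "UTF-8 bytes of each key plus UTF-8 bytes of each JSON-stringified value".

```javascript
// Hypothetical helper: estimate bytes as UTF-8(key) + UTF-8(JSON.stringify(value)).
// `items` is a plain object such as the one storage.<area>.get() resolves with.
// `keys` mirrors getBytesInUse(): null/undefined = everything, a string, or an array.
function estimateBytesInUse(items, keys = null) {
  const encoder = new TextEncoder();
  const wanted =
    keys === null || keys === undefined
      ? Object.keys(items)
      : Array.isArray(keys)
        ? keys
        : [keys];

  let total = 0;
  for (const key of wanted) {
    // Keys that are not stored contribute nothing.
    if (!Object.prototype.hasOwnProperty.call(items, key)) continue;
    total += encoder.encode(key).length;
    total += encoder.encode(JSON.stringify(items[key])).length;
  }
  return total;
}

// Example data, invented for illustration.
const items = { name: "Stra\u00DFe", count: 42 };
console.log("all keys:", estimateBytesInUse(items));
console.log("one key:", estimateBytesInUse(items, "count"));
```

In an extension, the first argument would typically be the result of `await browser.storage.local.get()`. Whether the number matches what a given browser reports depends on how that browser measures usage for the storage area in question.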

Severity: normal → S3

7 years later and browser.storage.local.getBytesInUse() is still not implemented. Come on devs... pretty please?

See Also: → 1908925
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: getBytesInUse() is not supported in WebExtentions API → getBytesInUse() is not supported in `local` and `managed` storage areas

Since I feel like I'm on a roll working on the Storage API for WebExtensions I'll take this up.

Rob, would you be the appropriate contact for this bug as well? And would you mind assigning it to me (unless you think I shouldn't work on it for some reason)?

I've got a patch working for local and managed, but in developing it I came across a strange discrepancy in the existing getBytesInUse() methods: for the same values in storage, it looks like session storage returns a much higher number of bytes. I made a little temporary extension that just sets some values in storage of each area and then runs a few different getBytesInUse() and logs the results. I pasted the results into a gist: https://gist.github.com/GrossNate/4768d102f429d9ab64c06d3851aca938. The methods I used for calculating local and managed match sync exactly. I'll dig into this a little more tomorrow and then let you know if I have questions, but just raising this now in case it rings a bell and you just know that the values should be very different right off the top of your head.

Also I wrote some tests, but they're not 100% working yet. It seems like there's an opportunity to clean up and streamline things because some of the sections of tests were written specifically for storage areas that don't support getBytesInUse() and now that all areas will I think they can be eliminated. I'm still wrapping my head around the structure of testing among the different storage areas and then I'll come up with something. However, if there's a reason I shouldn't do that cleanup (or should put it off to a different bug to keep this revision size more manageable) please let me know.

Thanks!

Flags: needinfo?(rob)

(In reply to Nate Gross from comment #24)

Rob, would you be the appropriate contact for this bug as well?

Yes. An alternative could be to add r=#extension-reviewers, which tags the whole group of reviewers (of which I am a part).

And would you mind assigning it to me (unless you think I shouldn't work on it for some reason)?

I think that a bot will do that automatically when the patch is not in WIP state. I've nevertheless set the assignee since it is obvious that you're working on it.

I've got a patch working for local and managed, but in developing it I came across a strange discrepancy in the existing getBytesInUse() methods: for the same values in storage, it looks like session storage returns a much higher number of bytes. I made a little temporary extension that just sets some values in storage of each area and then runs a few different getBytesInUse() and logs the results. I pasted the results into a gist: https://gist.github.com/GrossNate/4768d102f429d9ab64c06d3851aca938. The methods I used for calculating local and managed match sync exactly. I'll dig into this a little more tomorrow and then let you know if I have questions, but just raising this now in case it rings a bell and you just know that the values should be very different right off the top of your head.

The implementation of quota for sync storage is simply the length of the values: https://searchfox.org/mozilla-central/rev/4fd0d5e4669bfa2d0888b730684d8adea061fd30/third_party/rust/webext-storage/src/api.rs#185-187

whereas the size in storage.session is the memory usage of the data: https://searchfox.org/mozilla-central/rev/4fd0d5e4669bfa2d0888b730684d8adea061fd30/toolkit/components/extensions/ExtensionStorage.sys.mjs#514-518

The mechanisms to calculate quota are somewhat arbitrary. In case of session storage we (as browser vendors) had a discussion, some of which is captured at https://github.com/w3c/webextensions/issues/350. The intention there was to choose a limit so that extensions had enough, while still being a realistic measure of the memory usage.

Also I wrote some tests, but they're not 100% working yet. It seems like there's an opportunity to clean up and streamline things because some of the sections of tests were written specifically for storage areas that don't support getBytesInUse() and now that all areas will I think they can be eliminated. I'm still wrapping my head around the structure of testing among the different storage areas and then I'll come up with something. However, if there's a reason I shouldn't do that cleanup (or should put it off to a different bug to keep this revision size more manageable) please let me know.

Based on your description, I would create a patch stack of two patches associated with this bug (or two separate bugs, it doesn't really matter): one with the functional changes, and one that is purely a refactor.

Assignee: nobody → nate
Status: NEW → ASSIGNED
Flags: needinfo?(rob)
Attachment #9504425 - Attachment description: WIP: Bug 1385832 - getBytesInUse() is not supported in local and managed storage areas → Bug 1385832 - Added support for getBytesInUse() to local and managed storage areas. r=#extension-reviewers

I decided to keep this focused on adding the functionality and related tests. Refactoring testing for storage is a big enough task that I think it should be handled as a separate bug.

Following your patch and my observation of the special size estimation logic, I was wondering where it came from, and I think it is comment 20 here. Before I saw that comment I also thought of the storage.sync.getBytesInUse implementation (get_quota_size_of), and/or Chrome documentation, but these did not match what your patch proposed.

I should have mentioned earlier what the purpose of getBytesInUse is, beyond "The mechanisms to calculate quota are somewhat arbitrary.", because the patch proposed a reasonable implementation, but different from the existing mechanisms and of limited practical use. It's not really a matter of counting the number of bytes that could be utilized for some reasonable interpretation, but specifically the number of bytes that count toward the quota enforcement, relevant if the extension wanted to consider writing to the storage.

This bug mentions the lack of getBytesInUse for storage.managed and storage.local.

I'll start with managed since that is easier:

  • There is no quota enforcement for storage.managed. This area is read-only, so the concept of quota is not applicable here. Chrome's implementation simply returns 0 (I tested this manually and saw it in Chromium's source code). I like this simplicity, but I am also willing to accept a reasonable interpretation that has a low maintenance cost.

On storage.local (whose quota characteristics are also documented on MDN), there are two implementations in Firefox:

  • The "JSONFile" implementation, where the data is serialized and stored in a JSON file on disk. This implementation is old and not used by default (since Firefox 66, 7 years ago, in bug 1488825). I've seen some users preferring this implementation because of the perception of having human-readable files in the profile directory (just noting for context, this is not a scenario we should optimize for).
    • In theory, estimating the quota used by a JSON file is straightforward since the relation between the input and the file on disk is obvious. Comment 20 describes such a way.
  • The IndexedDB implementation in ExtensionStorageIDB. The quota is enforced by IndexedDB, but the exact limits and how they are measured are not well-defined.
    • The limit itself is not a fixed value. That diminishes the value of the getBytesInUse method, for anything other than estimating how much storage is occupied by the extension.

The relation between input and the storage quota for IDB is not obvious. Even if we were to reverse-engineer its implementation, it could change and be invalid in the future.

It is however possible to look up the overall byte usage. Here is an example:

  1. Install an extension that has the storage permission and stores something there. For example uBlock Origin: https://addons.mozilla.org/en-US/firefox/addon/ublock-origin/
  2. Open the Browser Console (if you don't see a way to input code, enable devtools.chrome.enabled, see https://firefox-source-docs.mozilla.org/devtools-user/browser_console/index.html)
  3. Run the following code snippet:
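// Look up the installed extension by its add-on ID (uBlock Origin here).
var extension = WebExtensionPolicy.getByID("uBlock0@raymondhill.net").extension;
var { ExtensionStorageIDB } = ChromeUtils.importESModule("resource://gre/modules/ExtensionStorageIDB.sys.mjs");
// Get the principal under which the extension's storage.local data is stored.
var principal = ExtensionStorageIDB.getStoragePrincipal(extension);
// Ask the quota manager for the usage of that principal and log it.
Services.qms.getUsageForPrincipal(principal, req => console.log(req.result.usage));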

The last method is documented at https://searchfox.org/mozilla-central/rev/a981ec4d8cbcb3bdc289c2ffb57018368dea4f6c/dom/quota/nsIQuotaManagerService.idl#259-271
and req.result is https://searchfox.org/mozilla-central/rev/a981ec4d8cbcb3bdc289c2ffb57018368dea4f6c/dom/quota/nsIQuotaResults.idl#48-65

All together, I think that we could at the very least have a reliable way to support getBytesInUse(null), i.e. getting the number of bytes in use by the full extension. Getting the quota for individual keys could be possible only if there is a reliable way to estimate the quota.

Side note: In Chrome, the getBytesInUse method is paired with the QUOTA_BYTES constant, exposed as storage.local.QUOTA_BYTES and currently defined to be 10 MB (used to be 5 MB). In Firefox we don't have a fixed quota, so it would be difficult to support this constant. Therefore I advise against supporting it in the storage.local API. We don't even expose storage.sync.QUOTA_BYTES, despite enforcing a quota there (webext-storage Rust source).

Attachment #9504425 - Attachment description: Bug 1385832 - Added support for getBytesInUse() to local and managed storage areas. r=#extension-reviewers → Bug 1385832 - Added support for getBytesInUse() to local and managed storage areas.
Attachment #9504425 - Attachment description: Bug 1385832 - Added support for getBytesInUse() to local and managed storage areas. → WIP: Bug 1385832 - Added support for getBytesInUse() to local and managed storage areas.

So, for managed storage it seems like the following would make sense:

  • I'll make the function return 0 for all input.
  • We'll update the documentation with a note like those for Chrome and Edge ("Always resolves with a value of 0.")

For local JSONFile implementation:

  • Either use the implementation I already built based on Comment 20 or do something that keeps it consistent with the IndexedDB implementation (TBD).

For local IndexedDB implementation:

  • I tried out the method you shared to look up overall byte usage, but rather than use uBlock Origin, I used AI to quickly build an extension that would let me add and remove key-value pairs in local storage. I was hoping to get a lead on how to estimate the quota, but of course it's not that simple. Simply running the extension without adding or removing any items from storage bumps the usage from 0 to 49152. Adding four different key-value pairs that each have a 5,000-character value doesn't increase the reported usage. However, adding two KVPs that each have a 10,000-character value does increase the reported usage to 57344. I also noted that 49152 bytes is 48 KiB and 57344 bytes is 56 KiB, and those seem inefficient.
  • My understanding is that under the hood, Firefox is using SQLite. I could dig in and figure out a method for estimating the use of a particular item on that basis. The problem is that it seems to be fairly theoretical if we believe the results from my previous experiment using getUsageForPrincipal() because usage seems to be a step function rather than continuous. (Maybe this is because of how SQLite has page sizes?)
  • It seems like the best options are:
    1. Implement it only for getBytesInUse(null) to return the usage of the whole StorageArea.
    2. Implement an approximation based on how SQLite3 file format works.

I guess my underlying question is what's the point of this function? What are the intended use cases? From what I'm seeing in this bug, developers seem to care about usage by the extension in total, not the individual items.

Flags: needinfo?(rob)

(In reply to Nate Gross from comment #28)

So, for managed storage it seems like the following would make sense:

  • I'll make the function return 0 for all input.
  • We'll update the documentation with a note like those for Chrome and Edge ("Always resolves with a value of 0.")

Sounds reasonable.

For local JSONFile implementation:

  • Either use the implementation I already built based on Comment 20 or do something that keeps it consistent with the IndexedDB implementation (TBD).

If we want it to reflect reality, we can indeed do that.
Or, if we don't support storage estimations of keys, we could even just return the size of the file itself.

For local IndexedDB implementation:
Adding four different key-value pairs that each have a 5,000-character value doesn't increase the reported usage. However, adding two KVPs that each have a 10,000-character value does increase the reported usage to 57344. I also noted that 49152 bytes is 48 KiB and 57344 bytes is 56 KiB, and those seem inefficient.

  • My understanding is that under the hood, Firefox is using SQLite. I could dig in and figure out a method for estimating the use of a particular item on that basis. The problem is that it seems to be fairly theoretical if we believe the results from my previous experiment using getUsageForPrincipal() because usage seems to be a step function rather than continuous. (Maybe this is because of how SQLite has page sizes?)

I don't know exactly how the quotas are computed, but it is indeed backed by SQLite. In general (not sure if applicable here), SQLite can contain "unused" space, which can be minimized with VACUUM. However, these are merely implementation details and I don't know if the enforced quota is ultimately based on the files on disk or on the sum of disks. Even if we implement something that works with the current implementation, we're one refactor away from becoming inaccurate.

  • It seems like the best options are:
    1. Implement it only for getBytesInUse(null) to return the usage of the whole StorageArea.

Sounds good to me. This is something we can already do with what we have.

    2. Implement an approximation based on how SQLite3 file format works.

Rather than trying to pierce down to the lowest layers here, I'd prefer to use a higher level method if available. Naively, at some point there is a database read that returns the data from the database. Somewhere between the API call and this layer, it might be feasible to get a decent estimate of the byte usage.

I guess my underlying question is what's the point of this function? What are the intended use cases? From what I'm seeing in this bug, developers seem to care about usage by the extension in total, not the individual items.

I can think of two clear use cases:

  • In this bug, the desire for an extension to show how much storage space is used by a particular feature within the extension.
  • In browsers that have fixed quota limits (e.g. Chrome), knowing the number of bytes in use can help with estimating how much more space the extension can use. As written in my last comment, we don't have fixed quota limits, which makes it rather hard to do something useful with this, other than displaying information.
    • In Firefox we don't have fixed quotas. There may even be a bug in the extension framework implementation with handling data removal (that happens as part of quota enforcement due to disk space pressure), which we recently witnessed in bug 1979997.
Flags: needinfo?(rob)
See Also: → 1979997
Attachment #9504425 - Attachment description: WIP: Bug 1385832 - Added support for getBytesInUse() to local and managed storage areas. → Bug 1385832 - Added support for getBytesInUse() to local and managed storage areas. r=#extension-reviewers

I've implemented it so that for local storage, getBytesInUse() uses the same logic (and returns the same values) as sync when using JSONFile and uses the same logic (and returns the same values) as session when using IndexedDB.

Keywords: dev-doc-needed
Pushed by rob@robwu.nl: https://github.com/mozilla-firefox/firefox/commit/41372fac2943 https://hg.mozilla.org/integration/autoland/rev/54a170b0d86d Added support for getBytesInUse() to local and managed storage areas. r=extension-reviewers,robwu
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 144 Branch