Bug 733207 (open) - Opened 13 years ago, updated 2 months ago

Sync failure due to Adblock Plus add-on when data size exceeds 1 MB

Categories

(Firefox :: Sync, defect)


People

(Reporter: jimmyjoet, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 6.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
Build ID: 20120215223356

Steps to reproduce:
Tried to sync.

Actual results:
Sync reported: "Sync encountered an error while syncing: Unknown error. Sync will automatically retry this action." A complete sync never occurs.

Expected results:
A complete sync occurred.
I want to add that I have tried with varying sizes of blocked items in the Adblock Plus list; the error is size-related and appears when my total sync data reaches a little over 1 MB.
The problem is that Mozilla's server imposes a per-record size limit, which Adblock Plus is exceeding. I /think/ this might be the first report of Adblock running into our record size limits. I see the following possible outcomes:

1) Adblock rewrites its Sync engine to not rely on such large records.
2) Mozilla increases the per-record size limit.
3) You run your own Sync server that isn't subject to these limits.

I'm sure others will chime in here...
Mozilla operations says #2 is off the table: we will not be increasing the per-record size for the Mozilla-hosted Sync service.
At some point Adblock will have to break up records, but a quick workaround would be to gzip the subscription filters before encryption and upload. [1] Maybe compression would work for any feasible filter set size, but I don't have any data on that. The current JSON object structure looks like:

{
  id: UUID,
  subscriptions: [
    {
      url: SUBSCRIPTION_URL,
      disabled: BOOL,
      filters: [
        { text: FILTER, disabled: BOOL }
      ]
    }
  ]
}

Compression should work well in this case.

[1] - https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIStreamConverter
(In reply to Gregory Szorc [:gps] from comment #2)
> The problem is Mozilla's server imposes a per-record size limit, which
> Adblock Plus is exceeding.

What is the size limit?

> I /think/ this might be the first report that Adblock is running into our
> record size limits.

It is a very uncommon scenario - normally, Adblock Plus data shouldn't exceed a hundred kB, even in highly customized setups. We don't sync filters from subscriptions, only the custom filters added to the personal list. One would need lots of them to hit the limit.

> 1) Adblock rewrites its Sync engine to not rely on such large records

The original engine used lots of small records, but I was told that this would be slow (I don't remember who I talked to back then; this was almost four years ago). I was actually recommended to sync everything in one record.
(In reply to Wladimir Palant from comment #5)
> What is the size limit?

1MB, as of Bug 692372. That's supposed to be a sane upper limit for 100 records! Note that this is the limit for a single POST request; *indirectly* that limits the size of a single record. (Sorry, this isn't documented anywhere that I know of.)

> It is a very uncommon scenario - normally, Adblock Plus data shouldn't
> exceed a hundred kB, even in highly customized setups. We don't sync filters
> from subscriptions, only the custom filters added to the personal list. One
> would need lots of them to hit the limit.

Understood.

> > 1) Adblock rewrites its Sync engine to not rely on such large records
>
> The original engine used lots of small records but I was told that this
> would be slow (don't remember who I talked to back then, this was almost
> four years ago). I was actually recommended to sync everything in one record.

It shouldn't be terrible to use lots of small records (after all, we use one per history item, form field entry, password, and bookmark…). If you can strike a balance whereby records that change together can be bundled together, it'll probably help, but it's better to use small records than ones that are hundreds of KB.

How many records are we talking about when you say "lots"? And presumably you only upload the ones that have changed, right?
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to Richard Newman [:rnewman] from comment #6)
> If you can strike a balance whereby records that change together can be
> bundled together, it'll probably help,

There are no "records that change together" in my case.

> but it's better to use small records than ones that are hundreds of KB.

I just checked, and the total size of the Adblock Plus data is 2445 bytes for me. That should be the typical scenario.

> How many records are we talking about when you say "lots"?

If the total size is 1 MB - something like 20,000.

> And presumably you only upload ones that have changed, right?

Well, the entire engine logic would have to be changed again. But yes - only the ones that change should be uploaded. Unless it is the first sync, of course.
(In reply to Richard Newman [:rnewman] from comment #6)
> (In reply to Wladimir Palant from comment #5)
> > What is the size limit?
>
> 1MB, as of Bug 692372. That's supposed to be a sane upper limit for 100
> records!

There's some confusing terminology being used here. The 1MB limit is actually a limit on the total record size we will process in a single request - 10 records of 101K will trip it. However, the size limit for a single record according to the spec (http://docs.services.mozilla.com/storage/apis-1.1.html) is 256K.

> > The original engine used lots of small records but I was told that this
> > would be slow (don't remember who I talked to back then, this was almost
> > four years ago). I was actually recommended to sync everything in one record.

Not sure why this would be slow at this point (four years ago it was a totally different thing - DAV is limiting!). Sync is designed to work with history, and unless you are constantly changing every single one of the records, individual updates should be fine.
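The two limits described in this comment (256K per record, 1MB per POST) imply a client-side batching step before upload. A minimal Node.js sketch of that idea; the limit constants come from this comment, but `batchRecords`, the record shape, and the exact byte interpretation of "1MB" are assumptions, not Sync API names.

```javascript
// Sketch: split serialized records into POST batches that respect both the
// per-record limit (256K) and the per-request limit (1MB) discussed above.
// batchRecords() and the record shape are hypothetical, not real Sync APIs.
const MAX_RECORD_BYTES = 256 * 1024;
const MAX_POST_BYTES = 1024 * 1024;

function batchRecords(records) {
  const batches = [];
  let current = [];
  let currentBytes = 0;

  for (const record of records) {
    const size = Buffer.byteLength(JSON.stringify(record), 'utf8');
    if (size > MAX_RECORD_BYTES) {
      throw new Error(`record ${record.id} exceeds the per-record limit`);
    }
    // Close the current batch when adding this record would exceed the
    // per-request limit.
    if (currentBytes + size > MAX_POST_BYTES && current.length > 0) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(record);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each batch would then become one POST; a real client would also have to handle partial failures and retries per batch.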
Heh, I guess that I will have to make another attempt at syncing a complex structure as individual records... My main problem back then (and the reason why I asked about it at the Summit) was generating record IDs. The "canonical" IDs cannot be used for privacy reasons, not even as hashes, and storing additional GUIDs for everybody when only a few people use Sync seems wasteful.
One potential workaround would be to define a specifically named GUID which contains a different payload that can be used to coordinate GUID usage across clients. This is how the preferences engine currently works. If your Sync service credentials (but not your Sync Key) are compromised, someone can already see you have an adblock collection, so that cat is out of the bag: an additional statically-named GUID doesn't reveal anything more.

As for the payload, the idea that first entered my head was to generate a cryptographically secure random value, say 512 or 1024 bytes, base64 encode it, and stuff it in the payload. This value would be used as a salt when computing the hashes of the "canonical" IDs. Each client could store this salt locally, preferably in the login manager. There are some details to work out regarding the sync semantics (particularly around when the salt changes). But I think this would preserve the existing security model just fine.

dchan: can you weigh in on the security/privacy aspect?
rnewman: what do you think from the Sync perspective?
(In reply to Gregory Szorc [:gps] from comment #10)
> rnewman: what do you think from the Sync perspective?

I think storing a salt as a magic record is a really interesting idea. Assuming that record is synced first and ignored during normal data fetches, with appropriate wiping of the collection, this is a great way to sync records with non-GUID primary keys. (I should have thought of that before adding GUIDs to favicons…)

Arguably we should have a 'salts' collection and manage that automatically…
(In reply to Gregory Szorc [:gps] from comment #10)
> dchan: can you weigh in on the security/privacy aspect?

This is my understanding of the proposal for the ABP case:

- ABP filters have a canonical ID.
- Storing the ID or a regular hash does not address privacy concerns.
- The proposal is to create a new collection for storing salts:
  - generate a secure value of 512-1024 bytes;
  - store b64(SALT) + delim + b64(engine-name)? in the encrypted payload for this collection.
- To check whether a filter exists:
  - the client retrieves the sync-engine salt from the special collection;
  - the client checks that the engine name / value of the payload matches;
  - the sync engine generates H(canonical id + salt) and checks ID existence to sync/update/etc.

The security of the proposal appears to be sound. An attacker who has compromised a user's Sync credentials will see:

- the existence of the special collection;
- a list of encrypted payloads;

but they won't know the values of the salts or the associated sync engines.

The only privacy concern I can think of is the limited number of add-ons that may be using this special collection.
dchan: what is your evaluation of the simpler version, which is to store the salt within the ABP collection itself? That is, assuming AdBlock Plus uses the "abp" collection:

abp/salts
abp/abcdef
abp/ghijkl
abp/...

where "salts" contains the salt value, encrypted?

I believe this is isomorphic, but best to consider what's most likely to be implemented up-front…
(In reply to Richard Newman [:rnewman] from comment #13)
> dchan: what is your evaluation of the simpler version, which is to store the
> salt within the ABP collection itself? That is, assuming AdBlock Plus uses
> the "abp" collection:
>
> abp/salts
> abp/abcdef
> abp/ghijkl
> abp/...
>
> where "salts" contains the salt value, encrypted?

Storing the salt in the abp collection provides the same amount of privacy as gps' suggestion, in my opinion. An attacker could infer that a user has the ABP add-on from the existence of the collection, but that can already be done. The record IDs aren't meaningful to an attacker since they don't have the unencrypted salts.
Cool! As for having a separate collection and Sync client API for managing salts, I too like that idea! It opens up some interesting possibilities for managing data without having to store a separate, private per-client identifier. rnewman already mentioned this could be used by favicon sync. Jetpack is considering how to create APIs to allow extensions to easily sync data and I feel this could be leveraged there. That being said, the timetable on that would likely be months (given the current focus is BrowserID + Protocol 2.0 updates). So, my recommendation for ABP is to manage a salt in its own collection until something better comes along. Or, ABP can ignore this conversation and do something else/more appropriate.
(In reply to Gregory Szorc [:gps] from comment #15)
> That being said, the timetable on that would likely be months (given the
> current focus is BrowserID + Protocol 2.0 updates). So, my recommendation
> for ABP is to manage a salt in its own collection until something better
> comes along.

That's what I would do anyway - it would take months until I could rely on that change even if it landed on trunk today. I'm not really certain about the "sync a single record before anything else can be done" part, but I cannot claim to know the code; I will try to work this out.
(In reply to Wladimir Palant from comment #16)
> I'm not really certain about the "sync a single record before anything else
> can be done" part but I cannot claim to know the code, will try to work this
> out.

In your engine, do something like the following:

_processIncoming: function _processIncoming() {
  let coll = new Collection(this.engineURL, this._recordObj);
  coll.ids = "salts"; // Or whatever you name your special record.
  coll.full = true;
  coll.recordHandler = function(record) {
    // record is a decrypted WBORecord just like you manage in your engine.
    // Do stuff with the record here.
  };

  let response = coll.get();
  if (!response.success) {
    // ...
  }

  // Validate your recordHandler did the right thing.

  // Call the main _processIncoming implementation.
  SyncEngine.prototype._processIncoming.call(this);
}

Alternatively, you can keep state in your engine on whether the salt record has been retrieved. The above logic could be installed in a custom applyIncoming implementation which just-in-time fetches the salt record if an incoming record is seen. This would save you from doing salt record processing on every sync.

rnewman probably has better ideas. The #sync channel on IRC would be a great place to hash this out...
Salts: Bug 824188.
Component: General → Firefox Sync: Backend
Depends on: 824188
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
Severity: normal → S3