Closed Bug 466173 Opened 17 years ago Closed 14 years ago

RFE: per-profile firefox anti-phishing urlclassifier3.sqlite is wasteful, want system-wide copy

Categories: Toolkit :: Safe Browsing (enhancement)
Platform: x86 Linux

RESOLVED WONTFIX

People

(Reporter: mcepl, Unassigned)


User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; cs-CZ; rv:1.9.0.4) Gecko/2008111217 Fedora/3.0.4-1.fc10 Firefox/3.0.4
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; cs-CZ; rv:1.9.0.4) Gecko/2008111217 Fedora/3.0.4-1.fc10 Firefox/3.0.4

(originally filed in the Red Hat bugzilla)

Firefox 3 contains a new anti-phishing mechanism which relies on a regularly updated database (from Google). This database (urlclassifier3.sqlite) sits in each profile (so a single user can have multiple copies), and its expected size nowadays (there were some bugs earlier) is ~55 MB. For space-constrained home directories this is a problem: we now regularly have users running out of home-directory quota thanks to this database. (The total disk space wasted over here is not so bad, just 1 TB of raw disk... but fixing user quota problems costs manpower.)

We would like an option to move this database to a centrally updated (and, for the users, read-only) location. Ideally this would happen automatically, i.e. by checking a system-wide database first (for freshness, i.e. less than 2 days old) before deciding to download the data to the local profile.

Known workarounds:
* Anti-phishing can be turned off and the file deleted (http://kb.mozillazine.org/Urlclassifier2.sqlite). Requires intervention per user and turns off a useful security feature => bad.
* The file size can be temporarily reduced by "vacuum"ing the database. Requires intervention per user, gives temporary relief only, and is supposed to eventually be done automatically (? unclear, since this causes fragmentation, see [1]). But even a nicely cleaned database is expected to be in the 40 MB range. => no real gain.

Reproducible: Always

Steps to Reproduce:
1. Just use Firefox on a server with many, many users.

Actual Results: tons of disk space is wasted by duplicate storage of read-only information

Expected Results: it shouldn't be

Additional information:
* It looks like the mechanism isn't actually turned on yet (so the space is really wasted)? https://bugzilla.redhat.com/show_bug.cgi?id=463157
* Somewhat related upstream bug: https://bugzilla.mozilla.org/show_bug.cgi?id=383031 (but people seem to be happy that this is no longer in the "roaming profile", i.e. Windows-only)
* [1] https://bugzilla.mozilla.org/show_bug.cgi?id=385834#c20 - auto-vacuum is bad and off (but for the history DB only?); a full vacuum will lock up the UI?
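The "vacuum" workaround described above can be scripted. A minimal sketch, assuming the sqlite3 command-line tool is installed and Firefox is not running; the demonstration at the end runs against a throwaway database, not a real profile:

```shell
#!/bin/sh
# Compact an urlclassifier3.sqlite-style database in place with VACUUM.
# Temporary relief only, as the report notes: the file grows back as
# updates arrive. Firefox must not be running while this executes.
vacuum_db() {
    db=$1
    before=$(wc -c < "$db")
    sqlite3 "$db" 'VACUUM;'
    after=$(wc -c < "$db")
    echo "$db: $before -> $after bytes"
}

# Demonstration on a throwaway database padded with deleted rows,
# so VACUUM has free pages to reclaim:
demo_db=$(mktemp)
sqlite3 "$demo_db" <<'SQL'
CREATE TABLE t(x TEXT);
WITH RECURSIVE c(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM c WHERE i < 5000)
INSERT INTO t SELECT hex(randomblob(100)) FROM c;
DELETE FROM t;
SQL
vacuum_db "$demo_db"
rm -f "$demo_db"
```

To use it against real profiles, one would loop over `~/.mozilla/firefox/*/urlclassifier3.sqlite` instead of the demo file.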
A system-wide copy would be valuable for Windows users too, not solely as a platform-specific need or want.
I think this would be useful, but what is updating the database? It churns very rapidly.
It would be nice if only one copy of the file were maintained, or if the file could be redirected to some location (other than the roaming profile) via a setting or some other means.
Matej, can we get a response on the question in comment #2?
(In reply to comment #2)
> I think this would be useful, but what is updating the database? It churns
> very rapidly.

What do you mean by this question? The database is managed by the url-classifier component; you can find it in toolkit/components/url-classifier.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Ahh, sorry, you're talking about updating the shared database from its origin at Google. Well, what about some binary tool?
Depends on: 359145
Is there a way to find out what percentages of users run multiple profiles, for the purpose of prioritizing this? Having a separate url-classifier database for every profile is wasteful, but I think the majority of users don't run multiple profiles. Shrinking the file might give us more bang for the buck.
(In reply to comment #7)
> Is there a way to find out what percentages of users run multiple profiles, for
> the purpose of prioritizing this? Having a separate url-classifier database for
> every profile is wasteful, but I think the majority of users don't run multiple
> profiles. Shrinking the file might give us more bang for the buck.

For us, this is not so much about people with multiple profiles as about some of our enterprise customers who have $HOME directories on network drives and are quite sad to have (as of now) 58 MB of data uselessly copied many hundreds of times on the server.
Severity: normal → enhancement
(In reply to comment #8)
> For us, this is not that much about people with multiple profiles, but about
> some our enterprise customers who have $HOME directories on network drives and
> are quite sad to have (as of now) 58MB stuff many-hundred times uselessly
> copied on the server.

Again, this is very much an edge case, as the vast majority of users only ever have one profile. Are you running this on a Linux system? Perhaps you can write a shell script that deletes the urlclassifier sqlite file and symlinks to a "central" db? This may not work either, as there may be race conditions writing to the database, since the url-classifier is using the synchronous sqlite api.
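The shell-script idea above can be sketched as follows. All paths are hypothetical, and the race-condition caveat stands: Firefox knows nothing about the symlink, so concurrent writes through it can corrupt the shared copy. The demonstration at the end uses a throwaway profile tree rather than a real one:

```shell
#!/bin/sh
# Replace each per-profile urlclassifier3.sqlite with a symlink to one
# shared copy. Unsupported hack; see the race-condition caveat above.
link_shared_db() {
    shared_db=$1
    profile_root=$2
    find "$profile_root" -name urlclassifier3.sqlite -type f |
    while read -r db; do
        rm -f "$db"
        ln -s "$shared_db" "$db"
    done
}

# Demonstration on a throwaway profile tree (hypothetical layout):
demo=$(mktemp -d)
mkdir -p "$demo/abc123.default"
: > "$demo/abc123.default/urlclassifier3.sqlite"
: > "$demo/shared.sqlite"
link_shared_db "$demo/shared.sqlite" "$demo"
ls -l "$demo/abc123.default/urlclassifier3.sqlite"
```

For real use one would call `link_shared_db /path/to/shared.sqlite "$HOME/.mozilla/firefox"`, e.g. from a new-user setup script.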
I meant to say the "synchronous MozStorage API". And with that in mind, this should be a WONTFIX until we re-write the database connectivity to use 100% async MozStorage API. I don't see that happening for some time.
Would this be possible short term:

-- allow a 'read-only' setting in FF which would access the db but turn off the update daemon

The thought would be that we could then:
1) set up all normal users in read-only mode
2) put in a symlink to a shared copy of the db
3) run some sort of cron job to fire off every midnight and start a regular FF to pull down any updates

This gets us 99% of the benefit without having to wait for a complete re-write of the internals. Steps (1) and (2) could be scripted pretty easily as part of normal new-user setup.

This is becoming a problem for us -- we tend to have a larger number of relatively unsophisticated users, and this DB is taking more space than all of their other files combined. On the other hand, we also really need the anti-phishing feature.
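Step (3) of the proposal might look like the following crontab fragment. This is only a sketch: the file path, updater account name, and refresh script are hypothetical, and a 24-hour cycle gives up the protocol's much shorter freshness guarantee.

```
# /etc/cron.d/safebrowsing-update (hypothetical): a dedicated updater
# account runs one ordinary Firefox at midnight to refresh the shared
# database; the per-user symlinked copies then pick it up read-only.
0 0 * * *  sbupdate  /usr/local/sbin/refresh-safebrowsing.sh
```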
(In reply to comment #11)
> Would this be possible short term:
>
> -- allow a 'read-only' setting in FF which would access the db but turn off
> the update daemon

I doubt it. The "freshness guarantee" would go out the window in this kind of scenario. There is some talk of a re-write in JS - although nothing is planned or approved at this point - in which case the shared db should be considered.
Understood, and I agree with your concern re freshness. I'm just looking for a short-term hack that will tide us over until a better-architected solution is available. At least this would let each site admin take control of the DB size and decide whether 24-hour updates (or 8 or 6, depending on how you set up the cron jobs) are sufficient. Better than only having the choice between a lot of duplicate DBs (and update traffic) or turning the feature off completely.

I was also hoping that the actual code change would be trivial: add one more check-box to the options panel. If off, the current control flow. If on, then open the DB read-only and don't start the update task.
(In reply to comment #13)
> At least this would let each site admin get control of the DB size, and make a
> determination if 24-hour updates (or 8 or 6 depending on how you setup the
> chron jobs) are sufficient. Better than only having the choice between a lot
> of duplicate DBs (and update traffic) or turn the feature off completely.

The freshness guarantee is 45 minutes. We have to adhere to this.

> Was also hoping that the actual code change would be trivial: add one more
> check-box to the options panel. If off, the current control flow. If on, then
> open the DB read-only and don't start the update task.

The code in question is overly complex, so there are no trivial changes to it.
(In reply to comment #13)
> If on, then open the DB read-only and don't start the update task.

Also, it turns out there is no read-only mozStorage API.
Nuts. The freshness guarantee wouldn't bother me as long as it was documented so that if a sysadmin played the symlink game then they were making an informed choice to lose this guarantee for their user community. But the code complexity and lack of a read-only API are killers. Can we encourage a higher priority for this re-write then? It would help any multi-user site, not to mention reducing the update load on the source of the data. For example, we expect to have 1,000s of users! Thanks for the insight on the technical side of this.
Hello,

At some point, this was considered by some as "an edge case, as the vast majority of users only ever have 1 profile". The requested change would also benefit enterprises and home PCs where multiple users share the same PC. Say, a home PC with 3 users. As far as I know, an antivirus shares its virus database among all users of the computer. FF should work the same way with urlclassifier3.sqlite, which has similar usage and update frequency.

Thank you very much in advance.

Cheers,
O.
I added some comments in bug 359145 about the problems with a shared-file approach. The only way I can see this working is by moving the SafeBrowsing updates into a separate process that runs as administrator, and then making the Firefoxes using it read-only. This is, I presume, how the virus scanners with shared DBs work. But it's a significant amount of engineering.

It's necessary to split out the caching of completion requests and results and *do* store those per profile, because it's a hard requirement in the protocol that these be cached whenever they happen, and they can happen at any time. These are small, but in the old code they're stored in the same urlclassifier3.sqlite database. The rework in bug 673470 does this.
I don't think any amount of complexity is worth dealing with multiple profile space considerations here.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
Benjamin, I guess you decided this way only because you haven't stepped into any of the bug reporters' shoes. Here is info from two of my hosts:

[s2@katleriai ~]$ find ~/.mozilla/firefox/ -name urlclassifier3.sqlite -ls | awk '{printf("%d+",$7)} END {printf("0\n")}' | bc
924577792

winetester@letta:~$ find ~/.mozilla/firefox/ -name urlclassifier3.sqlite -ls | awk '{printf("%d+",$7)} END {printf("0\n")}' | bc
1852481536

One host has 0.8 GiB of redundant data, while the other has 1.7 GiB! Another reason you decided so is probably the insufficient manpower of Firefox hackers. But if I implement this, the WONTFIX resolution means my code won't be accepted a priori. That's harsh :). Please reopen. I'm not going to live with this situation ignorantly.
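For what it's worth, the same total can be computed without the `-ls`/`bc` round-trip used above. A sketch assuming GNU find's `-printf`; the demonstration runs against a throwaway tree, not a real profile directory:

```shell
#!/bin/sh
# Sum the sizes (in bytes) of all urlclassifier3.sqlite files under a
# directory. Assumes GNU findutils for the -printf format directive.
total_db_bytes() {
    find "$1" -name urlclassifier3.sqlite -type f -printf '%s\n' |
    awk '{ sum += $1 } END { print sum + 0 }'
}

# Demonstration on a throwaway tree with two files of known size:
demo=$(mktemp -d)
mkdir -p "$demo/p1" "$demo/p2"
printf '12345' > "$demo/p1/urlclassifier3.sqlite"       # 5 bytes
printf '1234567890' > "$demo/p2/urlclassifier3.sqlite"  # 10 bytes
total_db_bytes "$demo"   # prints 15
```

Against a real machine one would call `total_db_bytes ~/.mozilla/firefox/` (or a parent directory covering all users' homes).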
Product: Firefox → Toolkit
As a workaround, it's always possible to use Hermann Schirnagl's Hard Link Shell Extension (schirnagl.at) and replace the copies with hardlinks. As an additional solution, I suggest a preference for the location so users can point to a custom /cache/ directory that can be shared and doesn't eat system space.
The new SB database is now far less than 1/10th the size of what the old one was when this bug was filed, and we honor the cache directory from the environment for storing it. There isn't even a point in working around it.
For the bots, I'll correct the above link:

(In reply to melchior blausand from comment #21)
> As a workaround, it's always possible to use Hermann Schinagl's Hard Link Shell Extension (http://schinagl.priv.at/) and replace copies by hardlinks.

Mind: HLSE is an Explorer extension for Windows systems.