Closed
Bug 466173
Opened 17 years ago
Closed 14 years ago
RFE: per-profile firefox anti-phishing urlclassifier3.sqlite is wasteful, want system-wide copy
Categories
(Toolkit :: Safe Browsing, enhancement)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: mcepl, Unassigned)
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; cs-CZ; rv:1.9.0.4) Gecko/2008111217 Fedora/3.0.4-1.fc10 Firefox/3.0.4
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; cs-CZ; rv:1.9.0.4) Gecko/2008111217 Fedora/3.0.4-1.fc10 Firefox/3.0.4
(originally filed in the Red Hat bugzilla)
Firefox-3 contains a new anti-phishing mechanism which relies on a
regularly-updated database (from Google). This database (urlclassifier3.sqlite)
sits in each profile (so a single user can have multiple copies), and its
expected size nowadays (there were some bugs earlier) seems to be ~55 MB.
For space-constrained home directories this is a problem: users now regularly
run out of home directory quota because of this database. (The total disk space
wasted over here is not so bad, just 1 TB of raw disk... but fixing user quota
problems costs manpower.)
We would like an option to move this database to a centrally updated location
that is read-only for users. Ideally, this would happen automatically, i.e.
Firefox would first check a system-wide database for freshness (e.g. less than
2 days old) before deciding to download the data into the local profile.
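The automatic behaviour requested above could look roughly like the following: prefer a system-wide copy when it is recent enough, otherwise fall back to today's per-profile download. This is a minimal Python illustration, not Firefox code; the path `/var/lib/firefox/urlclassifier3.sqlite` and the function `choose_database` are hypothetical, and the file's modification time is assumed to be an adequate freshness signal.

```python
import os
import time

# Hypothetical system-wide location; the request does not name one.
SYSTEM_DB = "/var/lib/firefox/urlclassifier3.sqlite"
MAX_AGE_SECONDS = 2 * 24 * 60 * 60  # "less than 2 days old", per the request


def choose_database(profile_dir, system_db=SYSTEM_DB, now=None):
    """Return (path, needs_download) for the classifier database to use."""
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(system_db)
    except OSError:
        age = None  # no system-wide copy installed
    if age is not None and 0 <= age < MAX_AGE_SECONDS:
        return system_db, False  # fresh shared copy: use it, skip the download
    # Fall back to current behaviour: a private copy inside the profile.
    return os.path.join(profile_dir, "urlclassifier3.sqlite"), True
```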
Known workarounds:
* anti-phishing can be turned off and the file deleted
(http://kb.mozillazine.org/Urlclassifier2.sqlite). Requires intervention per
user, turns off a useful security feature => bad.
* the file size can be temporarily reduced by "vacuum"ing the database.
Requires intervention per user, gives only temporary relief, and is supposed to
eventually be done automatically (? unclear, since this causes fragmentation
[1]). But even a freshly vacuumed database is expected to be in the 40 MB
range => no real gain.
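For reference, the "vacuum" workaround amounts to running SQLite's `VACUUM` statement on the file while Firefox is closed. The sketch below uses Python's standard `sqlite3` module (`vacuum_database` is a hypothetical helper name) and reports the size before and after, which makes it easy to check the "temporary relief only" observation against a real profile.

```python
import os
import sqlite3


def vacuum_database(path):
    """Rebuild an SQLite file with VACUUM; return (size_before, size_after)."""
    before = os.path.getsize(path)
    # Autocommit mode: VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(path)
```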
Reproducible: Always
Steps to Reproduce:
1. Just use Firefox on a server with many, many users.
Actual Results:
Tons of disk space is wasted by duplicate storage of read-only information.
Expected Results:
Disk space shouldn't be wasted on duplicate copies.
Additional information:
* looks like the mechanism isn't actually turned on yet (so the space is really
wasted)?
https://bugzilla.redhat.com/show_bug.cgi?id=463157
* somewhat related upstream bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=383031 (but people seem to be
happy that this is no longer in the "roaming profile", i.e. Windows-only)
* [1] https://bugzilla.mozilla.org/show_bug.cgi?id=385834#c20 - auto-vacuum is
bad and off (but for the history DB only?), full vacuum will lock up UI?
Comment 1•17 years ago
A system-wide copy would be valuable for Windows users, too; this is not a platform-specific need.
Comment 2•17 years ago
I think this would be useful, but what is updating the database? It churns very rapidly.
It would be nice if only one copy of the file were maintained, or if the file could be redirected to some location (other than the roaming profile) via a setting or some other means.
Comment 4•16 years ago
Matej, can we get a response to the question in comment #2?
Comment 5•16 years ago
(In reply to comment #2)
> I think this would be useful, but what is updating the database? It churns
> very rapidly.
What do you mean by this question? The database is managed by the url-classifier component, which you can find in toolkit/components/url-classifier.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 6•16 years ago
Ahh, sorry, you're talking about updating the shared database from its origin at Google. Well, what about some binary tool?
Comment 7•16 years ago
Is there a way to find out what percentages of users run multiple profiles, for the purpose of prioritizing this? Having a separate url-classifier database for every profile is wasteful, but I think the majority of users don't run multiple profiles. Shrinking the file might give us more bang for the buck.
Comment 8•16 years ago (Reporter)
(In reply to comment #7)
> Is there a way to find out what percentages of users run multiple profiles, for
> the purpose of prioritizing this? Having a separate url-classifier database for
> every profile is wasteful, but I think the majority of users don't run multiple
> profiles. Shrinking the file might give us more bang for the buck.
For us, this is not so much about people with multiple profiles as about some of our enterprise customers who have $HOME directories on network drives and are quite sad to have (as of now) 58 MB of data copied uselessly many hundreds of times on the server.
Severity: normal → enhancement
Comment 9•16 years ago
(In reply to comment #8)
> For us, this is not that much about people with multiple profiles, but about
> some our enterprise customers who have $HOME directories on network drives and
> are quite sad to have (as of now) 58MB stuff many-hundred times uselessly
> copied on the server.
Again, this is very much an edge case, as the vast majority of users only ever have 1 profile. Are you running this on a Linux system? Perhaps you can write a shell script that deletes the urlclassifier sqlite file and replaces it with a symlink to a "central" DB? This may not work either, as there may be race conditions writing to the database, since the url-classifier uses the synchronous SQLite API.
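A minimal Python sketch of the symlink script suggested above, assuming a Linux-style profiles directory (e.g. `~/.mozilla/firefox/`) and a central copy maintained by some other process. `link_profiles_to_central` is a hypothetical name. It should only be run while Firefox is closed, it does nothing about the concurrent-write race just described, and it leaves any SQLite journal files next to the database alone.

```python
import os

DB_NAME = "urlclassifier3.sqlite"


def link_profiles_to_central(profiles_root, central_db):
    """Replace each profile's classifier DB with a symlink to central_db.

    Returns the list of profile directories that were changed.
    """
    changed = []
    for entry in sorted(os.listdir(profiles_root)):
        profile = os.path.join(profiles_root, entry)
        target = os.path.join(profile, DB_NAME)
        if not os.path.isdir(profile) or os.path.islink(target):
            continue  # not a profile directory, or already linked
        if not os.path.exists(target):
            continue  # this profile has no database yet
        os.remove(target)
        os.symlink(central_db, target)
        changed.append(profile)
    return changed
```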
Comment 10•16 years ago
I meant to say the "synchronous MozStorage API".
With that in mind, this should be a WONTFIX until we rewrite the database connectivity to use the 100% async MozStorage API. I don't see that happening for some time.
Comment 11•15 years ago
Would this be possible short term:
-- allow a 'read-only' setting in FF which would access the db but turn off the update daemon
The thought would be that we could then:
1) setup all normal users in read-only mode
2) put in a symlink to a shared copy of the db
3) run some sort of cron job that fires every midnight and starts a regular FF instance to pull down any updates
This gets us 99% of the benefit without having to wait for a complete rewrite of the internals. Steps (1) and (2) could be scripted pretty easily as part of normal new-user setup.
This is becoming a problem for us: we tend to have a large number of relatively unsophisticated users, and this DB takes more space than all of their other files combined. On the other hand, we also really need the anti-phishing feature.....
Comment 12•15 years ago
(In reply to comment #11)
> Would this be possible short term:
>
> -- allow a 'read-only' setting in FF which would access the db but turn off the
> update daemon
I doubt it. The "freshness guarantee" would go out the window in this kind of scenario.
There is some talk of a re-write in JS - although nothing is planned or approved at this point - in which case the shared db should be considered.
Comment 13•15 years ago
Understand and agree with your concern re freshness.
I'm just looking for a short-term hack that will tide us over until a better architected solution is available.
At least this would let each site admin get control of the DB size and decide whether 24-hour updates (or 8 or 6, depending on how you set up the cron jobs) are sufficient. That is better than having only the choice between a lot of duplicate DBs (and update traffic) and turning the feature off completely.
I was also hoping that the actual code change would be trivial: add one more checkbox to the options panel. If it is off, keep the current control flow. If it is on, open the DB read-only and don't start the update task.
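The control flow proposed here is sketched below in Python against plain SQLite, which can open a file read-only through a `mode=ro` URI. This is only an illustration of the idea, not Firefox code: mozStorage itself did not offer a read-only API at the time, and the `read_only` flag and the `open_classifier` function are hypothetical stand-ins for the proposed checkbox and the component's startup path.

```python
import pathlib
import sqlite3


def open_classifier(db_path, read_only):
    """Return (connection, start_update_task) for the classifier database.

    read_only models a hypothetical "use shared database" checkbox.
    """
    if read_only:
        # Build a file: URI; as_uri() needs an absolute path and percent-encodes it.
        uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)  # any write attempt raises an error
        return conn, False  # the shared copy is refreshed elsewhere, e.g. by cron
    # Current behaviour: writable private copy plus the periodic update task.
    return sqlite3.connect(db_path), True
```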
Comment 14•15 years ago
(In reply to comment #13)
> At least this would let each site admin get control of the DB size, and make a
> determination if 24-hour updates (or 8 or 6 depending on how you setup the
> chron jobs) are sufficient. Better than only having the choice between a lot
> of duplicate DBs (and update traffic) or turn the feature off completely.
The freshness guarantee is 45 minutes. We have to adhere to this.
>
> Was also hoping that the actual code change would be trivial: add one more
> check-box to the options panel. If off, the current control flow. If on, then
> open the DB read-only and don't start the update task.
The code in question is overly-complex, so there are no trivial changes to it.
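The arithmetic behind the 45-minute objection is simple: just before a scheduled refresh, the shared copy is a whole interval old, so any cron interval longer than 45 minutes breaks the guarantee. A small Python check (hypothetical `meets_guarantee`), assuming download and apply time can be ignored:

```python
from datetime import timedelta

# Freshness guarantee quoted in comment 14.
FRESHNESS_GUARANTEE = timedelta(minutes=45)


def meets_guarantee(update_interval, guarantee=FRESHNESS_GUARANTEE):
    """True if data refreshed every update_interval never exceeds the guarantee.

    Worst case: right before the next refresh the copy is one full interval old.
    """
    return update_interval <= guarantee


for hours in (24, 8, 6):
    print(f"every {hours} h: {meets_guarantee(timedelta(hours=hours))}")
```

With the 24-, 8- and 6-hour schedules mentioned in comment 13, every line prints `False`; only an interval of 45 minutes or less passes.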
Comment 15•15 years ago
(In reply to comment #13)
> If on, then open the DB read-only and don't start the update task.
Also, it turns out there is no read-only mozStorage API.
Comment 16•15 years ago
Nuts.
The freshness guarantee wouldn't bother me as long as it was documented, so that a sysadmin playing the symlink game would be making an informed choice to give up this guarantee for their user community.
But the code complexity and lack of a read-only API are killers.
Can we encourage a higher priority for this rewrite, then? It would help any multi-user site, not to mention reduce the update load on the source of the data. For example, we expect to have thousands of users!
Thanks for the insight on the technical side of this.
Comment 17•15 years ago
Hello,
At some point, this was considered by some as "an edge case, as the vast majority of users only ever have 1 profile".
The requested change would also benefit enterprises and home PCs where multiple users share the same machine, say a home PC with 3 users.
As far as I know, antivirus software shares its virus database among all users of the computer. FF should handle urlclassifier3.sqlite the same way, since it has similar usage and update frequency.
Thank you very much in advance
Cheers
O.
Comment 18•14 years ago
I added some comments in bug 359145 about the problems with a shared-file approach.
The only way I can see this working is by moving the SafeBrowsing updates into a separate process that runs as administrator, and then making the Firefox instances that use it read-only. This is, I presume, how virus scanners with shared DBs work. But it's a significant amount of engineering.
It's necessary to split out the caching of completion requests and results and *do* store those per profile, because it's a hard requirement in the protocol that these be cached whenever they happen, and they can happen at any time. These are small, but in the old code they're stored in the same urlclassifier3.sqlite database. The rework in bug 673470 does this.
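The split described above, with a shared read-only prefix database plus a small writable completion cache per profile, might be organized like the Python sketch below. Everything here is hypothetical and heavily simplified: URL canonicalization and the real update and completion protocol are omitted, `fetch_full_hashes` stands in for the network request, and only the SHA-256 hash truncated to a 4-byte prefix loosely mirrors the Safe Browsing scheme.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SharedPrefixStore:
    """Stand-in for the system-wide, read-only set of hash prefixes."""
    prefixes: frozenset


@dataclass
class ProfileCompletionCache:
    """Per-profile, writable cache of full-hash completion results."""
    results: dict = field(default_factory=dict)


def lookup(url, shared, cache, fetch_full_hashes, prefix_len=4):
    """Return True if url matches a listed full hash."""
    digest = hashlib.sha256(url.encode()).digest()
    prefix = digest[:prefix_len]
    if prefix not in shared.prefixes:
        return False  # no prefix hit: URL is not listed, no network needed
    if prefix not in cache.results:
        # Completions must be cached whenever they happen, so this write goes
        # to the profile's own store, never to the shared file.
        cache.results[prefix] = fetch_full_hashes(prefix)
    return digest in cache.results[prefix]
```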
Comment 19•14 years ago
I don't think any amount of complexity is worth dealing with multiple profile space considerations here.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
Comment 20•14 years ago
Benjamin, I guess you decided this only because you have never been in the bug reporter's shoes. Here is data from two of my hosts:
[s2@katleriai ~]$ find ~/.mozilla/firefox/ -name urlclassifier3.sqlite -ls | awk '{printf("%d+",$7)} END {printf("0\n")}' | bc
924577792
winetester@letta:~$ find ~/.mozilla/firefox/ -name urlclassifier3.sqlite -ls | awk '{printf("%d+",$7)} END{printf("0\n")}' | bc
1852481536
One host has 0.86 GiB of redundant data, while the other has 1.7 GiB!
Another reason you decided this is probably the limited manpower of Firefox hackers. But if I implement this myself, the WONTFIX resolution means my code won't be accepted a priori. That's harsh :).
Please reopen.
I am not going to silently put up with this situation.
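For hosts without `bc`, the same totals as the `find ... | awk ... | bc` pipelines above can be computed with a short Python sketch (the helper names are hypothetical), which also converts bytes to GiB: 924577792 bytes is about 0.86 GiB and 1852481536 bytes about 1.73 GiB.

```python
import os

DB_NAME = "urlclassifier3.sqlite"


def total_classifier_bytes(root):
    """Sum the sizes of every urlclassifier3.sqlite below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        if DB_NAME in filenames:
            total += os.path.getsize(os.path.join(dirpath, DB_NAME))
    return total


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    root = os.path.expanduser("~/.mozilla/firefox")
    print(f"{to_gib(total_classifier_bytes(root)):.2f} GiB")
```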
Updated•11 years ago (Assignee)
Product: Firefox → Toolkit
Comment 21•11 years ago
As a workaround, it's always possible to use Hermann Schirnagl's Hard Link Shell Extension (schirnagl.at) and replace copies by hardlinks.
As an additional solution, I suggest a preference for the location, so users can point it at a custom /cache/ directory that can be shared and doesn't eat system space.
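The hard-link workaround can also be scripted. The Python sketch below (hypothetical `hardlink_duplicates`) only links files that are byte-for-byte identical and assumes they sit on the same filesystem. A caveat worth stating: hard links share one inode, so every profile then writes to the same file, with the same concurrency concerns raised in comment 9, and any update that replaces the file rather than modifying it in place would silently break the link.

```python
import filecmp
import os


def hardlink_duplicates(paths):
    """Replace byte-identical copies with hard links to the first such file.

    Returns the number of files replaced. All paths must be on one filesystem.
    """
    replaced = 0
    canonical = []
    for path in paths:
        for keep in canonical:
            if os.path.samefile(path, keep):
                break  # already the same inode, nothing to do
            if filecmp.cmp(path, keep, shallow=False):
                tmp = path + ".tmp-link"
                os.link(keep, tmp)
                os.replace(tmp, path)  # swap the copy for the link
                replaced += 1
                break
        else:
            canonical.append(path)  # first file with this content
    return replaced
```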
Comment 22•11 years ago
The new SB database is now less than a tenth of the size the old one had when this bug was filed, and we honor the cache directory from the environment when storing it. There isn't even any point in working around it.
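On Linux, "the cache directory from the environment" presumably follows the XDG Base Directory convention. The Python sketch below assumes exactly that: `$XDG_CACHE_HOME` if it is set to an absolute path, otherwise `~/.cache`. The `mozilla/firefox/<profile>/safebrowsing` suffix and the `safebrowsing_dir` function are illustrative assumptions, not a documented layout. Under that assumption, an administrator could point `XDG_CACHE_HOME` at local, non-quota storage to keep the database out of network home directories.

```python
import os


def safebrowsing_dir(profile_name, environ=None):
    """Guess where a profile's Safe Browsing data would live on Linux."""
    env = os.environ if environ is None else environ
    cache_home = env.get("XDG_CACHE_HOME", "")
    # The XDG spec says relative paths must be ignored, so fall back to ~/.cache.
    if not os.path.isabs(cache_home):
        cache_home = os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(cache_home, "mozilla", "firefox", profile_name, "safebrowsing")
```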
Comment 23•8 years ago
For the bots, I'll correct the link above:
(In reply to melchior blausand from comment #21)
> As a workaround, it's always possible to use Hermann Schinagl's Hard Link Shell Extension (http://schinagl.priv.at/) and replace copies by hardlinks.
Note: HLSE is an Explorer extension for Windows systems.