Bug 402469 - urlclassifier database takes forever to load
Status: RESOLVED WORKSFORME
[external dependency]
Keywords: regression
Product: Toolkit
Classification: Components
Component: Safe Browsing
Version: 3.0 Branch
Platform: All All
Importance: P2 major with 4 votes
Target Milestone: Firefox 3
Assigned To: Dave Camp (:dcamp)
Duplicates: 435365
Depends on: 432490
Reported: 2007-11-04 14:22 PST by Bill Gianopoulos [:WG9s]
Modified: 2014-05-27 12:25 PDT (History)
45 users (show)
mbeltzner: blocking‑firefox3-
mbeltzner: blocking1.9.0.1-
mbeltzner: wanted1.9.0.x+


Description Bill Gianopoulos [:WG9s] 2007-11-04 14:22:09 PST
Under Firefox 2, if I remove the urlclassifier2.sqlite file and launch Firefox 2, it is rebuilt in seconds.  If I try the same with the trunk and the urlclassifier3.sqlite file, after an hour the file is only 212992 bytes long and none of the preferences for table version are populated.

Same happens with a new profile.
Comment 1 Mike Beltzner [:beltzner, not reading bugmail] 2007-11-05 08:41:45 PST
Blocking for investigation.
Comment 2 Dave Camp (:dcamp) 2007-11-07 14:43:52 PST
Google feeds us this data in small chunks every half hour.  It can take a little while to build up the complete list.

This is the intended behavior, but maybe we can get them to be more aggressive sending the initial data.
Comment 3 Bill Gianopoulos [:WG9s] 2007-11-07 15:13:37 PST
(In reply to comment #2)
> Google feeds us this data in small chunks every half hour.  It can take a
> little while to build up the complete list.
> 
> This is the intended behavior, but maybe we can get them to be more aggressive
> sending the initial data.
> 

Perhaps then we could post a recent file on the Mozilla mirror servers we use for release distributions and update that weekly or something and then have the browser load that if there is no database and then go to Google for updates from that.
Comment 4 Bill Gianopoulos [:WG9s] 2007-11-08 06:32:31 PST
Based on the amount of data I am downloading per 1/2 hour chunk, and the size of the urlclassifier.sqlite file on a system which appears to be doing anti-phishing protection correctly, it will take over 200 hours for the initial download to complete.

This means that after you upgrade from Firefox 2 to Firefox 3 you will have zero anti-phishing protection for your first 200 hours of usage.

This would seem to be entirely unacceptable.
Comment 5 Bill Gianopoulos [:WG9s] 2007-11-08 06:46:06 PST
It appears that once it really gets going the chunk size is more in the 90KB range than the 30KB I used above, so I guess it is more on the order of 75 hours.
Comment 6 Mike Connor [:mconnor] 2007-11-08 07:57:16 PST
One added note is that the recently added entries are sent first, and those are generally the active phishes, so it's not incredibly bad, but we should figure out how to do initial seeding faster.
Comment 7 Bill Gianopoulos [:WG9s] 2007-11-08 10:41:15 PST
The final size of the urlclassifier3.sqlite file was 17.728MB, which at 90KB per transfer, or 180KB per hour, comes out to 98.5 hours.

None of the examples anyone could give me seemed to trigger the phishing detection until the entire file was loaded.  But, I suppose it is possible those were all old entries. 
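
The estimate above can be reproduced with a quick calculation. This is only a sketch of the arithmetic, not anything taken from the Firefox code: the function name is made up, 17.728MB is assumed to mean 17,728KB, and the rate is assumed to be constant at one transfer every 30 minutes.

```python
def hours_to_fill(db_size_kb: float, kb_per_update: float,
                  minutes_between_updates: float = 30) -> float:
    """Estimate the hours needed to download db_size_kb at a fixed update rate."""
    updates_per_hour = 60 / minutes_between_updates
    return db_size_kb / (kb_per_update * updates_per_hour)


# Assumption: 17.728MB is read as 17,728KB (decimal units).
final_size_kb = 17_728

estimate_90kb = hours_to_fill(final_size_kb, 90)  # 90KB per transfer (comment 5)
estimate_30kb = hours_to_fill(final_size_kb, 30)  # 30KB per transfer (comment 4)

print(f"{estimate_90kb:.1f} hours at 90KB per transfer")
print(f"{estimate_30kb:.1f} hours at 30KB per transfer")
```

At 90KB per transfer this gives roughly 98.5 hours, matching the figure quoted above; the 30KB case is shown only to illustrate how sensitive the estimate is to the transfer size.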
Comment 8 Reed Loden [:reed] (use needinfo?) 2007-11-17 18:51:00 PST
-rw-r--r-- 1 reed reed  41M 2007-11-17 20:29 urlclassifier3.sqlite

Why in the world is my urlclassifier3.sqlite 41MB large? Do we really make every user download a 41MB file in order to be secure? That seems crazy compared to 9.2M for urlclassifier2.sqlite and 1.7M for urlclassifier.sqlite.

Also, http://rrnryspace.com/index.cfm-fuseaction314Dlogin.process8526MyTokens79843964886883084155.htm shows up as a phish on branch but not on trunk.
Comment 9 Bill Gianopoulos [:WG9s] 2007-11-18 00:29:25 PST
(In reply to comment #8)
> -rw-r--r-- 1 reed reed  41M 2007-11-17 20:29 urlclassifier3.sqlite
> 
> Why in the world is my urlclassifier3.sqlite 41MB large? Do we really make
> every user download a 41MB file in order to be secure? That seems crazy
> compared to 9.2M for urlclassifier2.sqlite and 1.7M for urlclassifier.sqlite.

This is kind of the whole point of this bug.  The file will eventually grow to a ridiculous size despite the fact that it still does not have all the data.  I finally got a complete file loaded, and my urlclassifier3.sqlite is under 19MB.

-rw-r--r-- 1 wag wag 18640896 Nov 18 03:17 urlclassifier3.sqlite

> 
> Also,
> http://rrnryspace.com/index.cfm-fuseaction314Dlogin.process8526MyTokens79843964886883084155.htm
> shows up as a phish on branch but not on trunk.
> 

That URL is blocked for me. So, evidently, despite the fact that your urlclassifier3.sqlite file is over twice as large as mine, it is not the complete file.
Comment 10 Bill Gianopoulos [:WG9s] 2007-11-18 07:31:38 PST
After looking more closely at this, it is not just the initial load of the database that the current strategy does not work for.

If you only use the browser an hour or 2 per day, soon it will get hopelessly behind even if you seed it initially with a completely up-to-date database.

Given the current chunk size, the inter chunk delay needs to be more on the order of 1 or 2 minutes than 30 minutes in order for there to be any hope of maintaining an up-to-date database.

I think the longer delay needs to be between attempts to initiate an update of a given table.

The way I would envision this working is you start to load a table, and use a 2 minute delay between chunks until that table is up-to-date and then wait at least 30 minutes before attempting to refresh that table again.
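
The policy described above could be sketched roughly as follows. This is a hypothetical illustration, not Firefox code: the constant and function names are invented, transfer time is ignored, and the figure of 197 pending requests is only an assumption derived from the 17.728MB / 90KB numbers in comment 7.

```python
FAST_DELAY_S = 2 * 60    # delay between requests while a table is still catching up
SLOW_DELAY_S = 30 * 60   # delay before refreshing a table that is already current


def next_delay(table_up_to_date: bool) -> int:
    """Seconds to wait before the next request for a given table."""
    return SLOW_DELAY_S if table_up_to_date else FAST_DELAY_S


def catch_up_seconds(pending_requests: int, delay_s: int) -> int:
    """Total waiting time to work through a backlog, ignoring transfer time.

    The first request is sent immediately, so only the gaps between
    requests contribute to the total.
    """
    return max(pending_requests - 1, 0) * delay_s


# Assumption: about 197 requests of 90KB are needed for a 17,728KB database.
backlog = 197
proposed = catch_up_seconds(backlog, next_delay(table_up_to_date=False))
current = catch_up_seconds(backlog, SLOW_DELAY_S)

print(f"proposed policy: {proposed / 3600:.1f} hours")
print(f"flat 30-minute delay: {current / 3600:.1f} hours")
```

Under these assumptions the backlog clears in a few hours rather than several days, while a table that is already current is still polled only every 30 minutes.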
Comment 11 Bill Gianopoulos [:WG9s] 2007-11-20 11:49:47 PST
Well, a couple of things.  First of all, I have been using bad terminology here based on a misunderstanding of the code I was reading.  It is not an inter-chunk delay that is the issue here.  The delay is between update connections to the service.  Each connection can return thousands of chunks, so the chunk size is not an issue.

The second thing is that all of a sudden today, with no code changes on the Mozilla side of things, getting an up-to-date database, which up through yesterday took about 100 hours, now seems to take more on the order of 3 hours.

The thing that seems to have changed is that suddenly in a single connection there are orders of magnitude more entries being added to the database.

My guess is that now that Firefox has gone beta, Google is dedicating more resources to the new format than they were when the browser was in an alpha state.

So, perhaps there is really no issue here at all.
Comment 12 ourasi 2007-11-21 06:53:40 PST
(In reply to comment #11)
The issue now is that, with these bigger data blocks, it is almost impossible to create a new profile or update an older one.

Building the urlclassifier3.sqlite file in a new profile freezes Firefox 3.0 beta 1 (and trunk) completely.
Comment 13 AndrewM 2007-11-21 17:17:53 PST
I just noticed something that at first glance seems alarming in relation to phishing protection (although not exactly this bug). While I was checking the size of my urlclassifier3.sqlite file (it's 14.1 MB), I noticed that the Last Modified stamp on the file is November 15 2007 1013. That is when I closed Firefox and restarted it.

So my question is: is this expected behavior? For places.sqlite, the last modified stamp updates as I visit new pages. If the phishing protection database is being kept up-to-date, shouldn't the last modified stamp on the urlclassifier3.sqlite file be more recent? Or perhaps the updates from Google are cached in memory or something and only written to disk at the end of a session?
Comment 14 AndrewM 2007-12-08 12:59:58 PST
The download of the urlclassifier database seems to be completely and utterly broken for me on the trunk. I have had Firefox open for days, which should be ample time to download the whole database, and yet it is stuck at 6KB. :/

By comparison, on the branch the database downloads within about a minute (although I'm not sure how complete it is, but it seems mostly complete because after that the file size grows very slowly.)

Also, the Target Milestone should be changed to M11 since this hasn't been fixed for Beta 2.
Comment 15 AndrewM 2007-12-19 16:37:08 PST
Can anyone reproduce the behavior I experienced in comment 14? In other words, with a new profile, the urlclassifier3.sqlite database grows to 6KB and then gets stuck there.

> Also, the Target Milestone should be changed to M11 since this hasn't been
> fixed for Beta 2.

Could someone please update the Target Milestone? I don't want this to fall off the radar for Beta 3 :)
Comment 16 Reed Loden [:reed] (use needinfo?) 2007-12-19 16:42:42 PST
(In reply to comment #15)
> Could someone please update the Target Milestone? I don't want this to fall off
> the radar for Beta 3 :)

TM doesn't really matter much... priority means more about when something will be fixed.
Comment 17 Jo Hermans 2007-12-20 01:15:01 PST
(In reply to comment #14)
> The download of the urlclassifier database seems to be completely and utterly
> broken for me on the trunk. I have had Firefox open for days, which should be
> ample time to download the whole database, and yet it is stuck at 6KB. :/
> 

As far as I know, updates for Firefox 3 are currently disabled as an emergency fix for bug 404645. There have been several improvements for beta 2, so it's possible that the updates will start again soon, after everyone has upgraded to beta 2.

I don't work for either Mozilla or Google, so this is pure speculation, of course.
Comment 18 Jo Hermans 2007-12-21 15:43:47 PST
Note that updates seem to have been started again - my database is back at 827KB.
Comment 19 AndrewM 2008-01-03 18:15:09 PST
So I left my computer on and Minefield open (it was fighting cancer at the same time on BOINC so it wasn't a complete waste of energy ;) for most of the time over the holidays, and this is how the size of the urlclassifier3.sqlite file grew:

Size    Time  Date
602KB   1020  Thursday December 20
971KB   2245  Thursday December 20
2492KB  2340  Friday December 21
4221KB  2340  Saturday December 22
5036KB  2335  Sunday December 23
6617KB  2230  Monday December 24
7600KB  2330  Tuesday December 25
10266KB 2345  Wednesday December 26
12971KB 2000  Thursday December 27
17236KB 2340  Friday December 28
19216KB 2110  Saturday December 29
19259KB 1100  Sunday December 30
19334KB 1725  Monday December 31
19388KB 2330  Wednesday January 2

So for me it still seemed to take at least 150 hours (being conservative given that my computer was not on 100% of the time) of Minefield being open continuously before the database size seemed to start leveling off at about 19 MB.
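
The leveling-off point can be read from the measurements above by looking at the growth between consecutive samples. The sketch below just re-enters the numbers from the list; the 100KB cutoff used to decide that growth has "leveled off" is an arbitrary assumption, as are the variable names.

```python
# (sample label, file size in KB), copied from the measurements above
samples = [
    ("Thu Dec 20 1020", 602),
    ("Thu Dec 20 2245", 971),
    ("Fri Dec 21 2340", 2492),
    ("Sat Dec 22 2340", 4221),
    ("Sun Dec 23 2335", 5036),
    ("Mon Dec 24 2230", 6617),
    ("Tue Dec 25 2330", 7600),
    ("Wed Dec 26 2345", 10266),
    ("Thu Dec 27 2000", 12971),
    ("Fri Dec 28 2340", 17236),
    ("Sat Dec 29 2110", 19216),
    ("Sun Dec 30 1100", 19259),
    ("Mon Dec 31 1725", 19334),
    ("Wed Jan 2 2330", 19388),
]

# Growth in KB since the previous sample, labeled with the later sample.
deltas = [
    (label, size - prev_size)
    for (_, prev_size), (label, size) in zip(samples, samples[1:])
]

largest = max(deltas, key=lambda item: item[1])

# Assumption: growth under 100KB between samples counts as "leveled off".
LEVEL_OFF_KB = 100
leveled_off = next(label for label, grown in deltas if grown < LEVEL_OFF_KB)

for label, grown in deltas:
    print(f"{label}: +{grown}KB")
print("largest jump:", largest)
print("first sample with little growth:", leveled_off)
```

With that cutoff, the biggest jump is the one ending Friday December 28 and growth first drops off at the Sunday December 30 sample, which is consistent with the database settling at about 19MB.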

To me this seems indeed like a big regression from Fx 2; very few users keep Firefox open for that long continuously, which means that their phishing protection would never seem to be complete.
Comment 20 cmtalbert 2008-01-08 14:26:53 PST
Beltzner: is QA still wanted on this bug?  I don't see what new questions need to be answered here or what more information is needed.  Please let us know, or if you feel the problem is well-understood, please remove the QAWanted keyword.  Thanks.
Comment 21 Mike Beltzner [:beltzner, not reading bugmail] 2008-02-11 09:00:30 PST
Dave/Bill: was this fixed by the move to the new protocol in beta 3 and beyond? If so can we get a RESO on it?
Comment 22 Bill Gianopoulos [:WG9s] 2008-02-11 17:04:49 PST
This seems much better to me.  Loads in under an hour.
Comment 23 Dave Camp (:dcamp) 2008-02-29 14:39:48 PST
Google has fixed some problems in the list that were wasting some bandwidth, and seem to be feeding us significantly more data per update.  I'd like to keep this bug open a bit longer to keep it on my radar, but I don't think it needs to block release anymore.
Comment 24 Johnathan Nightingale [:johnath] 2008-03-11 11:31:49 PDT
(In reply to comment #23)
> Google has fixed some problems in the list that were wasting some bandwidth,
> and seem to be feeding us significantly more data per update.  I'd like to keep
> this bug open a bit longer to keep it on my radar, but I don't think it needs
> to block release anymore.

Re-nom'ng to make sure drivers see it leave the blocker list.
Comment 25 Mike Beltzner [:beltzner, not reading bugmail] 2008-03-12 22:49:22 PDT
Dave, the right way to do this is keep it as a blocker and resolve it when you're confident that it's fixed. That way if it becomes an issue again, it will get re-opened and re-inherit blocking status.

I trust that you'll continue to monitor.
Comment 26 Mike Connor [:mconnor] 2008-03-27 11:51:03 PDT
Dave, if this issue is resolved to your satisfaction, please resolve by April 2nd so it's out of the way before the final push.  Seems to me we can safely resolve it now, and file bugs on any new issues that arise...
Comment 27 Dave Camp (:dcamp) 2008-04-01 11:08:29 PDT
I'm happy with the current state, will open bugs on new issues.
Comment 28 Dave Camp (:dcamp) 2008-04-09 20:14:21 PDT
OK, the server seems to have regressed a bit.  It appears that we get some of the list fairly quickly, but it takes way too long to really get the complete list.  I discussed it with the Google guys, and apparently it's related to how often the list is updated.

Google is aware and working on fixing this; I'm reopening this bug to track it.
Comment 29 Dave Camp (:dcamp) 2008-04-09 20:21:10 PDT
And to clarify something a bit - the file still grows to roughly its expected size reasonably quickly as it adds the freshest information.  But once you have the freshest information, the older updates have less of an impact on database size, as more of the data is expired.
Comment 30 Garrett Casto 2008-04-14 15:44:09 PDT
Status Update: We basically have a design fleshed out which should increase the throughput of the redirects. We should have the changes made by the beginning of next week, and might need a few more days for testing.  I'll be around at the meeting tomorrow morning if people are interested in more information.
Comment 31 Mike Beltzner [:beltzner, not reading bugmail] 2008-04-28 23:00:23 PDT
Status update here? Last comment was April 14th - have we achieved satisfactory resolution?
Comment 32 Marria Nazif 2008-04-29 11:17:04 PDT
Just wanted to give you an update on the status here.  We are wrapping up the server changes that Garrett mentioned in Comment 30.  There should be a new server available for testing today or tomorrow.

So, this is still an outstanding problem and we have not reached a resolution yet.
Comment 33 Garrett Casto 2008-05-01 17:29:57 PDT
We have a new server ready, but we are waiting on bug 430530 so that we don't totally destroy Linux users.  Should be pushed tomorrow morning.
Comment 34 Rick Stockton 2008-05-01 18:30:00 PDT
(In reply to comment #33)
> We have a new server ready, but we are waiting on bug 430530 so that we don't
> totally destroy Linux users.  Should be pushed tomorrow morning.
> 

I like this plan, Garrett-- I commented earlier this afternoon (over on that bug) that I was hoping for a little test window before/after this one was pushed out on Google.

But that code still hasn't actually landed on the Trunk yet, Dave is still working on it. (Dave's comment 5:30 PM, nearly identical time as yours.) If he doesn't do it real early, even us 'Tinderbox' people will have only a few hours-- and tomorrow morning's nightly will definitely be concurrent with your change.

Maybe wait until 5/2 for the Google push, so "Nightly" users have a full day of the 430530 before your "push"? There's hardly anyone pulling from the Tinderbox builds, I wouldn't be surprised if it turned out that I was the ONLY Linux "non-expert, external to Mozilla" user to actually try it.

OTOH, if you would LIKE them to happen together, then your timing (tomorrow AM) is nearly perfect: It sounds like the 430530 change scraps database "Version 3", implementing "Version 4" (from my reading of https://bugzilla.mozilla.org/show_bug.cgi?id=430530#c26.) I haven't read the entire update and wouldn't understand it even if I tried, but I think this means that all the FF3 users are going to be starting over with a new "urlclassifier4.sqlite" database.
Comment 35 Rick Stockton 2008-05-01 18:44:53 PDT
No, the filename will not be changed. But the schema is Version 4, and the mismatch will cause the "old version" which we FF3 users have now to be scrapped. (All the DB content will be replaced, using the new schema).
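
As a rough illustration of the "scrap the old version on mismatch" behavior described above, here is a minimal sketch using SQLite's `PRAGMA user_version`. Everything in it is an assumption for illustration: the thread does not say how Firefox stores or checks the schema version, and the function name and constant are hypothetical.

```python
import os
import sqlite3

EXPECTED_SCHEMA_VERSION = 4  # the "Version 4" schema mentioned above


def open_database(path: str,
                  expected_version: int = EXPECTED_SCHEMA_VERSION) -> sqlite3.Connection:
    """Open the database, discarding it first if its schema version differs."""
    conn = sqlite3.connect(path)
    (found,) = conn.execute("PRAGMA user_version").fetchone()
    if found != expected_version:
        # Mismatch: throw the old file away and start from an empty database
        # under the same file name, stamped with the new schema version.
        conn.close()
        if os.path.exists(path):
            os.remove(path)
        conn = sqlite3.connect(path)
        conn.execute(f"PRAGMA user_version = {int(expected_version)}")
        conn.commit()
    return conn
```

The point of the sketch is that the file name stays the same while the contents are rebuilt from scratch, so the whole list has to be downloaded again after the schema change.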
Comment 36 Dave Camp (:dcamp) 2008-05-02 09:45:43 PDT
430530 made it in to the nightlies last night (except for the x86_64 builds, which apparently aren't auto-updated anyway), so I think we should go ahead and try this today.  The disk thrashing in 430530 gets worse as you get a bigger file, and this fix will make sure you get to a bigger file more quickly.
Comment 37 Rick Stockton 2008-05-02 20:49:15 PDT
I agree with Dave.... Go ahead and do it at your EARLIEST convenience. I'll watch this bug and verify that a Linux update works properly after your changes. If desired, I can also create a new naked profile on Linux and verify that the database pretty quickly fills in and matches my 'main' profile's file, which is up to date and running with 430530 already present.
Comment 38 Marria Nazif 2008-05-03 10:35:50 PDT
fyi, the new server has been running since 5/2 at around 4pm.
Comment 39 Rick Stockton 2008-05-04 16:32:50 PDT
For both me (on Linux) and Windows users, urlclassifier3.sqlite is being updated with no quantifiable difficulties. (That's good, it was disastrous on Linux before 430530 was done.)

OT: My urlclassifier3.sqlite file is now over 40MB in size. How large is it expected to get ??? That's a lot of raw data for dial-up users, even if we try to send it as carefully as possible.
Comment 40 Mike Beltzner [:beltzner, not reading bugmail] 2008-05-05 03:59:12 PDT
Dave: let me know if we can close this out; not sure how to measure/test it.
Comment 41 Johnathan Nightingale [:johnath] 2008-05-26 06:05:41 PDT
*** Bug 435365 has been marked as a duplicate of this bug. ***
Comment 42 juan becerra [:juanb] 2008-05-26 23:31:07 PDT
It appears that on first launch, fresh profile, the urlclassifier3.sqlite file on Linux (Ubuntu) doesn't get updated, and it stays at 32kb unless you restart the application. See bug #434624.

However, I also tried installing afresh, with livehttpheaders, and restarting and at times I could also see that the urlclassifier file did not get updated after one restart. Right now, on my vm installation, I see a GET key request, but hours later, no further update to file.
Comment 43 Garrett Casto 2008-05-27 11:08:01 PDT
We have been trying to keep track of these issues on the server side, and the numbers unfortunately don't look as good as they should.  Theoretically people should be updating in a few hours (~4 last time I checked).  However it looks like it's actually taking on the order of 20 hours or so.  We are investigating the reasons for this, but we haven't figured it out yet.  Dave asked for QA to help us investigate, but I haven't heard back from them yet.  We should really get these numbers down before launch.
Comment 44 Garrett Casto 2008-05-28 13:38:46 PDT
Juan, to clarify, are you saying that you are not seeing a GET downloads request within an hour after startup?  
Comment 45 juan becerra [:juanb] 2008-05-28 14:44:10 PDT
I'm not getting a GET downloads request after several hours on Linux on a fresh profile, first session. After a restart of the browser (or two), then I start getting data. I'll get some numbers this evening for Linux, but on Mac I observed this (time - malware data % / phishing data %):

30mins - 15% / 18%
1 hour - 22% / 29%
1.5 hour - 31% /  43%
2 hours - 37% /  57%
2.5 hours - 44% /  67%
3 hours - 53% / 80%
3.5 hours (after session resumed) - 52% /  80%
4 hours - 58% / 94%
4.5 hours - 64% /  97%
5 hours - 76% /  98%
5.5 hours -  84% /  99%
6 hours -  92% / 99%
6.5 hours - 100% / 100%
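
To compare how quickly the two lists fill in, the Mac numbers above can be scanned for the first sample at or above a given completeness. This is just a sketch over the figures as reported; the 90% threshold and the helper name are arbitrary choices.

```python
# (hours elapsed, malware %, phishing %) as reported for the Mac run above
progress = [
    (0.5, 15, 18), (1.0, 22, 29), (1.5, 31, 43), (2.0, 37, 57),
    (2.5, 44, 67), (3.0, 53, 80), (3.5, 52, 80), (4.0, 58, 94),
    (4.5, 64, 97), (5.0, 76, 98), (5.5, 84, 99), (6.0, 92, 99),
    (6.5, 100, 100),
]

MALWARE, PHISHING = 1, 2  # column indexes into each row


def hours_to_reach(threshold: int, column: int):
    """Return the first sample time at which the list reached the threshold."""
    for row in progress:
        if row[column] >= threshold:
            return row[0]
    return None  # never reached in the recorded samples


malware_90 = hours_to_reach(90, MALWARE)
phishing_90 = hours_to_reach(90, PHISHING)

print(f"malware list at 90%: {malware_90} hours")
print(f"phishing list at 90%: {phishing_90} hours")
```

In this run the phishing list passes 90% at the 4-hour sample and the malware list at the 6-hour sample, with both complete at 6.5 hours.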
Comment 46 AndrewM 2008-05-28 16:54:38 PDT
(In reply to comment #45)
> I'm not getting a GET downloads request after several hours on Linux on a fresh
> profile, first session. After a restart of the browser (or two), then I start
> getting data. I'll get some numbers this evening for Linux, but on Mac I
> observed this (time - malware data % / phishing data %):
> 
> 30mins - 15% / 18%
> 1 hour - 22% / 29%
> 1.5 hour - 31% /  43%
> 2 hours - 37% /  57%
> 2.5 hours - 44% /  67%
> 3 hours - 53% / 80%
> 3.5 hours (after session resumed) - 52% /  80%
> 4 hours - 58% / 94%
> 4.5 hours - 64% /  97%
> 5 hours - 76% /  98%
> 5.5 hours -  84% /  99%
> 6 hours -  92% / 99%
> 6.5 hours - 100% / 100%

I'm curious, how do you find out how complete the malware and phishing data are at a given point in time?
Comment 47 juan becerra [:juanb] 2008-05-28 17:03:50 PDT
Andrew, you can try installing the extension mentioned here https://bugzilla.mozilla.org/show_bug.cgi?id=429263#c3

Then type about:safebrowsing in the location bar to see some numbers.
Comment 48 Mike Beltzner [:beltzner, not reading bugmail] 2008-06-15 15:22:26 PDT
We're going to need more analysis on this, so I'm keeping it on the branch blocker nomination list.
Comment 49 Mike Beltzner [:beltzner, not reading bugmail] 2008-11-09 12:52:12 PST
Dave: did we ever finish the analysis loop on this? Can we close this out?
Comment 50 Dave Camp (:dcamp) 2008-11-10 12:07:48 PST
Yeah, I believe that we concluded that stuff is happening at roughly the expected rate.
Comment 51 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2008-11-10 15:41:33 PST
There is no patch around. In such cases we mark bugs as WFM.
Comment 52 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2009-07-27 13:23:58 PDT
Based on the age of QAWANTED request on this bug, is QAWANTED still wanted?
Comment 53 Aakash Desai [:aakashd] 2009-07-27 13:42:13 PDT
Judging by comments #50 and #51, we can remove the qawanted status.
