Closed Bug 280603 Opened 20 years ago Closed 19 years ago

"New Updates Avail" popup in bottom right-hand corner pops up endlessly (random occurrence)

Categories

(Toolkit :: Application Update, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: ben, Assigned: mconnor)

References

Details

(Keywords: fixed-aviary1.0.1)

Attachments

(3 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8b) Gecko/20050122 Firefox/1.0+
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8b) Gecko/20050122 Firefox/1.0+

Happened once so far. The "New Updates Available" (green jigsaw) box is popping
up in an infinite loop. It is chewing a fair amount of CPU in the process.
Clicking on the box/window just pops them up faster and increases CPU load.
Four windows open, with 5-20 tabs each. Nothing I can do will get rid of this
window.

Reproducible: Didn't try




Process Explorer notes firefox.exe is in state Wait:WrUserRequest, and
context-switching 300-1000 times a second.
MSVCRT.DLL is also performing a lot of cswitches, cycling between
Wait:UserRequest and Ready.
This is the only easy way to describe what is happening.
Suspected of DoS'ing UMO.  Issue to be determined.
Severity: normal → critical
Version: unspecified → Trunk
update.mozilla.org is currently down, and based on network traffic I highly
suspect it's because of this bug.

We've effectively been under a DDoS attack since exactly midnight GMT on Feb 1.

The following seems to be at fault:

http://lxr.mozilla.org/mozilla/source/toolkit/mozapps/update/src/nsUpdateService.js.in#489

Note the use of getUTCDay (which is day of the week) instead of getUTCDate
(which is day of the month)

This means update checks aren't happening at all after the first week of the
month is over, and can potentially behave REALLY weird during that first week of
the month if the day of the month and the day of the week line up just right.
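For anyone following along, here is a minimal sketch of the difference between the two calls. It is a hypothetical reconstruction rather than the code at the lxr link above; the function names buggyNow and fixedNow are made up for illustration.

```javascript
// Hypothetical reconstruction of the faulty "now" calculation. The names
// buggyNow and fixedNow are illustrative and not taken from the source file.
function buggyNow(d) {
  return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(),
                  d.getUTCDay(),   // BUG: day of week, 0 (Sunday) to 6 (Saturday)
                  d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds());
}

function fixedNow(d) {
  return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(),
                  d.getUTCDate(),  // day of month, 1 to 31
                  d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds());
}

// 1 Feb 2005 was a Tuesday, so getUTCDay() is 2 while getUTCDate() is 1.
const feb1 = new Date(Date.UTC(2005, 1, 1, 1, 0, 0));
// 20 Feb 2005 was a Sunday, so getUTCDay() is 0 while getUTCDate() is 20.
const feb20 = new Date(Date.UTC(2005, 1, 20, 1, 0, 0));

console.log(new Date(buggyNow(feb1)).toISOString());  // 2005-02-02T01:00:00.000Z
console.log(new Date(buggyNow(feb20)).toISOString()); // 2005-01-31T01:00:00.000Z
console.log(new Date(fixedNow(feb20)).toISOString()); // 2005-02-20T01:00:00.000Z
```

With the day of the week in the day-of-month slot, the computed time never gets past the 6th of the month, and on Sundays (day 0) it rolls back to the last day of the previous month.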
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking-aviary1.1?
Flags: blocking-aviary1.0.1?
Version: Trunk → unspecified
much thanks to mconnor for finding the chunk of code where this lived.
Version: unspecified → Trunk
Attached patch use getUTCDate correctly (obsolete) — Splinter Review
Assignee: bugs → mconnor
Status: NEW → ASSIGNED
There may be more to this bug than just the date calculation...

Why did Firefox think there was an update available when there wasn't?
And why does it think there's one available when the server is unreachable (bug
280607)?
Comment on attachment 173049 [details] [diff] [review]
use getUTCDate correctly

who knows, but r=ben@mozilla.org on the patch. I think asa is managing branch
approvals.
Attachment #173049 - Flags: review+
Attachment #173049 - Flags: approval-aviary1.0.1?
*** Bug 280607 has been marked as a duplicate of this bug. ***
(In reply to comment #6)
> Why did Firefox think there was an update available when there wasn't?

Or was there?  The reporter mentioned it was the green jigsaw icon that was
popping up...  that's the extension updates, not the application update, right?
 Extensions and themes can have their own update URLs.
OS: Windows 2000 → All
Hardware: PC → All
This is a little bit of a longshot, but I'll throw it out anyway:

Could this be more fallout (in some way) related to the switch to namespaced expat?
Flags: blocking-aviary1.1?
Flags: blocking-aviary1.1+
Flags: blocking-aviary1.0.1?
Flags: blocking-aviary1.0.1+
Comment on attachment 173049 [details] [diff] [review]
use getUTCDate correctly

a=asa.
Attachment #173049 - Flags: approval-aviary1.0.1? → approval-aviary1.0.1+
278274 is a dupe of this
*** Bug 278274 has been marked as a duplicate of this bug. ***
landed on 1.0.1 branch
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
The trunk still has this problem (Mozilla/5.0 (Windows; U; Windows NT 5.1;
en-US; rv:1.8b) Gecko/20050206 Firefox/1.0+) It would be nice if this patch
would be checked in to the trunk as well. Requesting reopening.
lxr shows that this is fixed on trunk.
I searched bonsai for the checkin, but it's not there in the Seamonkey trunk.
And even if the fix is checked in, it's not working: the bug still appears in
yesterday's build.
*sigh* 

> And even if the fix is checked in, it's not working: the bug still appears in
> yesterday's build.

Or does it work for you?
This is not fixed in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b)
Gecko/20050211 Firefox/1.0+ for me and other testers. Reopening
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
ok, so we're not continuously checking anymore, but we're still repeatedly
notifying.  Will investigate further.
*** Bug 282411 has been marked as a duplicate of this bug. ***
To test, set the machine's clock to just before the problem time period
(midnight GMT of the 1st day of any month, so that would be 4pm PST), e.g.
01-Feb-2005 at 15:55. Then launch Firefox and see what happens.

please correct me if this test case isn't the right way to verify this bug.
PST would be the day before, 4pm on Jan 31.
Whiteboard: need patch
I have not been able to reproduce this.  If anyone has detailed steps to
increase my odds of seeing this, please post them here. Setting the time isn't
working for me and the few extensions that needed updates did not seem to be
checking for them automatically.
Attached file Testing extension
Sample extension to test with.

Steps to reproduce:
0. Make a new profile, just in case
1. Install this extension
2. Restart Firefox (to finish the install)
3. Go to about:config and set update.interval to 500
4. Wait half a second for the updates available notification
   (This bug should manifest - the notification will show up
   again right after going away)
5. Reset update.interval

Note that day of month, etc. do not seem to matter - tested 2005-02-17
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b) Gecko/20050217
Firefox/1.0+

(This extension has a custom update.rdf with a chrome:// URI, so no servers to
worry about)
(In reply to comment #27)
> Steps to reproduce:
> 0. Make a new profile, just in case
> 1. Install this extension
> 2. Restart Firefox (to finish the install)
> 3. Go to about:config and set update.interval to 500

Thinking intuitively, I agree that setting update.interval to such a small value
(from 3600000 to 500) will cause update notifications to fire very rapidly.  But
why is changing the user pref to what I would assume is an obviously insane
value the proper way to reproduce what's supposed to be a legitimate bug?  Is it
the only way to reproduce it?  If so, I'd hesitate to call the problem legitimate.

In conflict with this line of thinking, though, is that the m.o sysadmin group
reported seeing an extraordinary increase in the amount of traffic to UMO during
the first days of the month (beginning at approximately midnight UTC 2/1).

> Note that day of month, etc. do not seem to matter - tested 2005-02-17
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b) Gecko/20050217
> Firefox/1.0+

Here are some questions for you:

  * When you reset your update.interval to the default value, do you see this 
    problem?
  * When you set your system's date to the first day of the month do you see 
    it?
  * Does this problem go away when you change your system's date to a later day 
    in the month?

We need more data and feedback about system configurations that hold this bug
and what effect it causes, both on the client side and on the server side.  And,
really, we need the data soon!  We're near the end of the line for Firefox 1.0.1
fixes and this one's big on our radar.

The original reporter, beryan, filed this bug at 18:55 1/31 (which was past 2/1
UTC).  To beryan:

  * What was your update.interval set to at that time?
  * What was app.update.interval set to?
  * What are they set to now?
  * Did you have any extensions installed?
      * If so, did any of those legitimately have new versions available then?
  * What was/is your app.version set to in about:config?

We haven't been able to reproduce the endless popup bug locally.  What setting
triggers the popup slider to appear for users?

Also, even with mconnor's patch we see a number of accesses to UMO, and we aren't
certain that his patch, while reducing the number of accesses to UMO, cuts those
accesses down to an acceptable load level for us.

There are aviary1.0.1 builds available right now.  These can be found in:

  http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-aviary1.0.1/

We'd appreciate it if beryan, Anton, Nickolay, and Mook tested those builds and
let us know if they show the bug for them or not (without changing the update
intervals from their default).  Even if you've tested against trunk builds, it
helps us to know the problem exists on the aviary1.0.1 branch for you still.

Thanks.
Sorry, better steps to reproduce (to force an update check):

1. Set extensions.update.enabled to false (default true)
2. Set extensions.update.enabled to true

Ethereal reports one hit to the server (per extension/theme) only.  I.e., the
problem (the notifier showing up immediately after going away) does not depend
on update requests to the server.  So something is wrong independently of
checking too often.

Interestingly, I can only reproduce this on the trunk - the 1.0.1 branch does
not have the problem with the notifier.  So if everyone else agrees on this
point, at least it won't need to hold 1.0.1 back.

Occurs on: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b)
Gecko/20050217 Firefox/1.0+

Does not occur on: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)
Gecko/20050217 Firefox/1.0
(In reply to comment #29)
> Ethereal reports one hit to the server (per extension/theme) only.  I.e., the
> problem (the notifier showing up immediately after going away) does not depend
> on update requests to the server.  So something is wrong independently of
> checking too often.

Thanks for providing this data point, Mook.  Could you try reproducing this bug
using the build at:

  http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2005-02-03-15-aviary1.0.1/

This build is from before mconnor's patch was committed.  Specifically I'm
interested in hearing if your ethereal trace shows more than just the one access
to the UMO service.
No matter what I do I can't get it to access the update site more than once (per
reset of *.update.enabled).  I had changed it locally to check www.example.com
instead; easier to filter.  I do see one access each time I set/reset
*.update.enabled prefs.

That's with the old build; and yes I did try resetting the clock to Feb 1 23:xx
PST.  It seems to be blocked by *.update.interval (independent of
update.interval, which seems to control how often the decision to check or not
check is made).

Also, the bug (as described in the summary, and as I've been seeing it) does not
occur in 20050203-1.0.1branch either.

(From reading the code - wouldn't the old code just force the app to check in
the first week of the month, but not allow more checks than normal, anyway?
I.e., the second time it checks would always be within the first seven days of
the month.  But then again, I know nothing :p)

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20050203 Firefox/1.0
Okay, this is a little scary.  The fix in the bug, while correct, will mean that
we'll start seeing MORE traffic to UMO, however the initial spike at the start
of the month should go away.

If you've been using the app, your last updated date will be set to something in
the first week of the previous month.  So with update.interval (the interval at
which Firefox decides whether to check for updates) set to one hour, within an
hour of the new month starting, you will hit the updates URL/URLs because
Firefox thinks it's been three weeks.  In reality, it could have been only an
hour ago, if you started using Firefox in the last week of the month.  So as we
tick across timezones into the new month, we compress 24 hours of potential
traffic into an hour, since while in theory we'd be staggered by the 24 hour
interval, it comes up for everyone at the same time (the only thing saving us
here is that not everyone is online at the same time).  Then, fortunately,
things start to decline until after the first week, where most people have an
established last updated date that's late enough in the week that they won't
update again that month, barring a late Saturday session, for example.
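A rough sketch of the month rollover described above, under assumptions: the last check time is stored using the same skewed clock, the check interval is the 24 hours mentioned above, and skewedNow and shouldCheck are hypothetical names, not functions from the update service.

```javascript
const HOUR = 60 * 60 * 1000;
const DAY = 24 * HOUR;

// Hypothetical stand-in for the faulty clock: day of week used as day of month.
function skewedNow(d) {
  return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDay(),
                  d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds());
}

// Hypothetical check: is the last recorded check at least one interval old?
function shouldCheck(now, lastCheck, interval) {
  return now - lastCheck >= interval;
}

// A user whose last check really happened at 23:00 UTC on Monday 31 Jan 2005.
const lastReal = new Date(Date.UTC(2005, 0, 31, 23, 0, 0));
// The next hourly decision, at 00:30 UTC on Tuesday 1 Feb 2005.
const nowReal = new Date(Date.UTC(2005, 1, 1, 0, 30, 0));

const realElapsedHours = (nowReal - lastReal) / HOUR;
const skewedElapsedDays = (skewedNow(nowReal) - skewedNow(lastReal)) / DAY;

console.log(realElapsedHours);  // 1.5
console.log(skewedElapsedDays); // 31.0625

// The skewed clock triggers a check; the real clock would not.
console.log(shouldCheck(skewedNow(nowReal), skewedNow(lastReal), DAY));  // true
console.log(shouldCheck(nowReal.getTime(), lastReal.getTime(), DAY));    // false
```

In this sketch the stored time lands on 1 Jan (Monday is day 1) and the new "now" lands on 2 Feb (Tuesday is day 2), so a client that checked 90 minutes ago believes a month has passed. Every client online at the rollover would make the same jump within its first hourly decision.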

There's also the extensions factor, since due to this bug, we'll probably only
update once a month, because of the one week interval for extension update
checking.  However, this is N requests per client, where N is the number of
extensions/themes installed.  So in addition to the theoretical time bomb of
millions of users hitting UMO for app update requests, we also have N requests
on top of that for users with extensions.  Taking an estimate of 3 million users
using an average of 5 extensions/themes per client, that's another 15 million
requests that'll hit the server in that week, and probably most/all in the first
day.
That first spike will get saved as their last update time, so the next week
we'll get hammered by an echo spike.

But none of this explains how the original reporter beryan got his slider flood.
When I experienced this (1.0 final, now using trunk builds), I had lots of tabs
open at the time, so just ignored it for a while. But whilst that notification
was going off, I couldn't change panels within options.
See bug 278016 for UMO being able to receive multiple items in one call
See bug 278014 for Firefox sending a single request instead of multiple addon checks.

Please note that this isn't the same as application update checking.  
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b) Gecko/20050212
Firefox/1.0+ (MOOX M2)
Windows XP

I am able to reproduce this every time without additional settings on one
profile (but the profile is partially damaged).

Here it is: when I start the profile, I get the green arrow for updates. If I
surf, but do not update, after a certain time I get the sliding message, as
described in this bug.

After a time, this profile became more problematic - now when I start it with
firefox.exe -p, Firefox is locked, so I must close it and start Firefox normally.
It is probably related to the bug.

I have done more tests on this profile, and I see that this is connected
with some of the extensions. First, starting in safe mode did not unlock
Firefox. Then, when I disabled all extensions, Firefox always started locked up.

I think I had 4 extensions, but these are all I can remember:
Undo Close tab
Text link
Google image (the name could be a bit different - it lets you view images
directly by clicking on thumbnails)

One more datum: when I click on the arrow for updates, it claims that there are
updates for Undo Close Tab, but it is impossible to update.

I hope some of this explanation helps to find the possible cause of the error
on the reporter's computer.
(In reply to comment #36)
> After a time, this profile became more problematic - now when I start it with
> firefox.exe -p, Firefox is locked, so I must close it and start Firefox normally.
> It is probably related to the bug.

That's true: Once the popup starts sliding in over and over again, you can't
close firefox normally. The window will disappear, but the process will remain
running. There's no way to shut down firefox except for killing it.
*** Bug 282773 has been marked as a duplicate of this bug. ***
My results are the same as Mook's.

The bug as I see it happens with trunk builds both before and after mconnor's
checkin:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b) Gecko/20050128 Firefox/1.0+
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b) Gecko/20050218 Firefox/1.0+

But not with 1.0 branch builds:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20050218 Firefox/1.0
(from
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-aviary1.0.1/firefox-1.0.en-US.win32.installer.exe)
(Although I did have some issues with that popup, those may be specific to my
system)

I was trying to reproduce by toggling extensions.update.autoUpdateEnabled (and
later just extensions.update.enabled) on a clean profile with the testcase
extension installed.

This bug is reproducible, no matter what date is set.

I haven't yet tried to reproduce this by setting date, so I can't verify that
the original bug was fixed.
Attached patch Possible patch (obsolete) — Splinter Review
(I'm working off the summary here, not the download spike to UMO)

Fallout from bug 267089

nsIAlertsService was changed to use nsIObserver, which means that the
nsUpdateObserver was getting alertfinished / alertclickcallback; but it assumed
that it was only observing the update stuff, and proceeded to show the alert
again.
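To make the loop concrete, here is a small simulation. It is only a sketch under assumptions: the fake alerts service, the observer factories, and the topic "example-update-ended" are hypothetical; only the topic strings "alertfinished" and "alertclickcallback" come from this bug.

```javascript
// All names here are hypothetical; only the topic string "alertfinished"
// is taken from the description of the problem.
const UPDATE_TOPIC = "example-update-ended"; // assumed placeholder topic

function makeAlertsService(maxAlerts) {
  // Fake alerts service: shows an alert, then notifies the listener that the
  // alert finished, the way an observer-based callback would.
  const service = {
    shown: 0,
    show(listener) {
      if (service.shown >= maxAlerts) return; // cap so the simulation ends
      service.shown++;
      listener.observe(null, "alertfinished", null);
    },
  };
  return service;
}

function makeNaiveObserver(alerts) {
  const observer = {
    // Assumes every notification is an update notification.
    observe(subject, topic, data) {
      alerts.show(observer);
    },
  };
  return observer;
}

function makeFilteringObserver(alerts) {
  const observer = {
    observe(subject, topic, data) {
      if (topic !== UPDATE_TOPIC) return; // ignore the alert callbacks
      alerts.show(observer);
    },
  };
  return observer;
}

const naiveAlerts = makeAlertsService(10);
makeNaiveObserver(naiveAlerts).observe(null, UPDATE_TOPIC, null);

const filteredAlerts = makeAlertsService(10);
makeFilteringObserver(filteredAlerts).observe(null, UPDATE_TOPIC, null);

console.log(naiveAlerts.shown, filteredAlerts.shown); // 10 1
```

The naive observer keeps re-showing the alert until the artificial cap stops it, which matches the notifier reappearing right after it goes away; the filtering observer shows it once.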
Attachment #174781 - Flags: review?(mconnor)
(In reply to comment #32)

To clarify from mconnor here, the app looks to see if it needs to update again
or not once an hour.  During the first week of the month, it was using the day
of the week instead of the day of the month as "now".  The days of the week in
this function are numbered from 0 to 6 with 0 being Sunday.  February 1st fell
on a Tuesday.  The value for Tuesday is 2.  So when it would do the date check,
it would grab the last time it did a date check "Feb 1 at midnight" and compare
it to its fake version of "now" using the day of the week, so it thinks "now"
is "Feb 2 at 1 AM", and says "oh, more than a day has passed since the last
update" and it does another one.  So as long as that person was online, and as
long as the day of the week value was more than the day of the month value, they
were hitting us once an hour.

As for the capacity of the UMO server, don't worry about March.  We have more
than enough capacity in place now to handle a spike four times the size of the
one that hit us in February.

I'm also suspecting that this bug, as reported, is actually a separate issue,
and the timing of it being filed and the parity of symptoms between the client
behavior and server behavior caused us to errantly hijack this bug for the
day-of-the-week issue when it probably wasn't related.
Comment on attachment 174781 [details] [diff] [review]
Possible patch

woo, I suck!  thanks for cleaning up after my most excellent reviewage.
Attachment #174781 - Flags: review?(mconnor) → review+
Is this patch something that would fix a problem that occurs on the aviary
branch, or just on the trunk?  Is it something we want to consider for Firefox
1.0.1?
Whiteboard: need patch
OK, I see now.  Bug 267089 landed on the trunk, but didn't update the one
implementation of nsIAlertListener in JS -- but the interface change was such
that the JS code still worked, but called the observe method instead of the
methods that were implementing nsIAlertListener, and the observe method was not
set up to handle this (since it had no default case in the switch, which it
probably should have -- with a dump and return -- like many observers have
assertions in C++ implementations).  So this patch is not relevant to the aviary
1.0(.1) branches.
Comment on attachment 174781 [details] [diff] [review]
Possible patch

As I said in my previous comment, it would probably be good if the switch had a
default case that does whatever the JS equivalent of an assertion is (probably
dump and return or throw).  Being a little more defensive in methods like this
is a good thing (although in C++ we have the ability to do it without any
runtime cost in non-DEBUG builds).  This is why assertions are good and we try
to write a lot of them to document and enforce expectations.
(That said, if you write such a default case, you need to ensure that there
aren't any other topics that *are* expected.)
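A minimal sketch of that kind of defensive switch, assuming hypothetical update topic names ("example-update-started" and "example-update-ended" are placeholders) and using console.warn as a stand-in for dump. It is not the code that was checked in.

```javascript
// Hypothetical topic names; "alertfinished" and "alertclickcallback" are the
// only topics actually named in this bug.
function observeUpdate(topic, handlers) {
  switch (topic) {
    case "example-update-started":
      handlers.started();
      return true;
    case "example-update-ended":
      handlers.ended();
      return true;
    case "alertfinished":
    case "alertclickcallback":
      // Expected topics with nothing to do; listed so they do not trip the default.
      return true;
    default:
      // Stand-in for an assertion: report the surprise and bail out.
      console.warn("observeUpdate: unexpected topic " + topic);
      return false;
  }
}

// Usage: count how often each handler runs.
const calls = { started: 0, ended: 0 };
const handlers = {
  started() { calls.started++; },
  ended() { calls.ended++; },
};

observeUpdate("example-update-started", handlers);
observeUpdate("alertfinished", handlers); // handled quietly, no handler runs
observeUpdate("some-other-topic", handlers); // warns and returns false
```

Listing the expected no-op topics explicitly keeps the default case reserved for notifications nobody anticipated.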
Comment on attachment 174781 [details] [diff] [review]
Possible patch

Asking for SR

(If this gets SR+, please check in for me; I don't have CVS access)

dbaron: This was never landed on the branch, so it's not applicable.  The
blocking+ is for the download spike problem (which is independent).
Attachment #174781 - Flags: superreview?(bugs)
Comment on attachment 174781 [details] [diff] [review]
Possible patch

SR isn't needed for toolkit.  I'll land this with dbaron's suggestion.
Attachment #174781 - Flags: superreview?(bugs)
*** Bug 283179 has been marked as a duplicate of this bug. ***
Attached patch patch checked in — Splinter Review
Attachment #173049 - Attachment is obsolete: true
Attachment #174781 - Attachment is obsolete: true
previous patch includes a fix for bug 282752, somewhat related and replacing the
initial patch (attachment 173049 [details] [diff] [review]) with a much faster call.  Landed only on
trunk, the initial patch will do for the 1.0.1 branch.
Status: REOPENED → RESOLVED
Closed: 20 years ago → 19 years ago
Resolution: --- → FIXED
Product: Firefox → Toolkit