Closed Bug 464778 Opened 16 years ago Closed 12 years ago

Improve bouncer logging

Categories

(Webtools :: Bouncer, defect)

x86
macOS
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: oremj, Assigned: morgamic)

References

Details

Right now bouncer runs an update against the database each time a download is initiated.  During releases we end up turning LOGGING off, because it overloads the database.  I'm not sure what the best way to improve logging is, but my ideas are:

* parse logs and bulk insert counts ( would lose mirror counts )
* insert counts in to memcache and flush them out every x seconds e.g.:
    # This algorithm doesn't address race conditions, so would need to figure out how to handle the case of two processes seeing the expire time and updating the database twice.
    if foo_count in memcache:
      if memcache[foo_count_expire] - now > INSERT_TIME:
        update database with foo= foo + memcache[foo_count]
        memcache[foo_count] = 0
      memcache[foo_count]++
* make a call to a non-blocking external API that logs the requests
The next release is mid-December, so we'll have to hash this out before then.

For option 1, it would be a better long-term solution because we wouldn't have make a code change in the app.

To avoid losing mirror counts would it be possible to update download.mozilla.org logging so that the 302 destination as logged as well?  It would then be possible to backfill mirror counts as well.

Questions:
1) is it possible to add the 302 destination to the DMO logs
2) what download stats do we already have available from metrics work
Download stats: https://stats.mozilla.com/dashboard/ff_downloads.html

We can record headers in the apache logs, so we can either just record the "Location" header or create a new header.

Ex using location header: 
<?php header("Location: http://google.com/"); ?>

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Location}o\""  custom

Ex new header:
<?php header("X-Bouncer-Redirect: http://www.foofoo.com"); ?>

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{X-Bouncer-Redirect}o\""  custom
*nudge* - a smidge over two weeks until the next set of releases is due.
Anyone know where this is?  Comments seem to suggest "we need this right now before the next release"  but we've had several since.
aiui, this doesnt block us from doing a release; it just means that we can never tell how many people download in the first initial days of a release because logging is turned off to keep load down. Fixing this would still be good. 

(If anyone has more details, and I missed anything, please add comments!)
Blocks: 489905
Is there enough in here for Metrics to take over processing and inserts into the db?  Or is there some webdev effort needed?
We did something similar with AMO - if we could get a feed or cron to update numbers here from the metrics db, it would solve the issue.  We could then refactor scripts for any feeds to point to the new table(s).
Who can drive this?  Metrics or webdev?
(In reply to comment #7)
> We did something similar with AMO - if we could get a feed or cron to update
> numbers here from the metrics db, it would solve the issue.  We could then
> refactor scripts for any feeds to point to the new table(s).

Is it easier for Metrics to insert/update the bouncer db directly so no dependent scripts/processes have to change?  

The biggest change would be that instead of near instant download updates, it's be updated once an hour.

Mike, can you sign off on that and move this bug over to Metrics?
Once an hour should be acceptable imo.  I'd suggest setting up a new table and we can create a simple feed that uses it and outputs the data consistent with how it's currently done.
App changes will need to be made to maintain the mirror data.  If we don't care about that we can go with the new table approach.
Wouldn't we just lose this data altogether?  I don't think metrics has this granularity since they don't have mirror logs.  Is the 302 destination recorded in the apache log?
I just mean mirror counts within the app.  If we modified the app we could stick the mirror in the access logs (see comment 2).
I have reported problems with the download counter several times, 3 of which turned out to be caused by the bouncer being turned off, so I am very interested in seeing this problem resolved. Ideally I would like to be able to see the increased downloads during the upcoming release of Fx 3.5 show in the download counter numbers. It's an awesome marketing tool to show how popular Firefox really is.

But what's really important to me and probably other users of the download counter feed, is that is stays as close to real time as possible. I realise this might be hard to achieve, and personally I could live with the suggested hourly update, but it really shouldn't be less often than that. Every 15 minutes would be better. Would that be possible?
We have a release again and logging is off once again. That means I can throw away all my SeaMonkey download counting for the current beta as we miss everything happening now in any case, and as bouncer is the only data we get at all, we don't know anything any more about what uptake we have of this release.

Can't anything be done to move forward on at least back-feeding data from webserver logs into the DB? Then we'd get at least something relatively consistent (even though we know there are different ways to download builds, and a number go via channel not caught by bouncer, but that's at least something and it's comparable between different releases).
A solution to this would be to host a separate bouncer instance for Seamonkey or to write a patch that doesn't skip download counts for non-Firefox products (basically ignoring the config).

Are we not doing anything w/ the metrics data feed?
Status: NEW → ASSIGNED
In whatever way we solve it, our yurrent download numbers are little more than lies, e.g. we released a new SeaMonkey Beta more than a week ago and it has no download yet as logging has been disabled all that time - I guess as usual, nobody thought to re-enable it in a timely manner.
If I'd put up a random number generator, it's numbers would probably be about as useful as the current bouncer download numbers.
Actually, if I had access to the httpd logs, I probably could filter them myself for SeaMonkey downloads and get some stats, but for now, we have no clue at all on SeaMonkey metrics - the download feed is the only metric whatsoever I have access to right now, and that's _very_ unsatisfactory.
Is there any chance of getting something that is somewhat reliable, hopefully in a timely manner?
This continues to bite us on every release.  Now that we're not doing the mirrorbrain thing, we need to get this back on the radar.  This is probably the number one complaint about bouncer currently (yes, it's hurting us more than the lack of geodns is).
Severity: normal → critical
We shipped FF3.6b2 yesterday. After release, we discovered a problem with WinCE FF3.6b2 builds. Because bouncer logging was turned off, we have no idea how many WinCE users (if any) have upgraded to the broken FF3.6b2 users. 

We also discovered that none of last week's FF3.5.5 release has yet been tracked, as bouncer logging was still off for the week.


morgamic/mrz: 
a) in our current configuration, what is the *shortest* time that this bouncer-off is still needed? We can make turning this off, and turning it *on* a part of the release process.

b) would the suggestion in comment#13 be best way to fix this?
> morgamic/mrz: 
> a) in our current configuration, what is the *shortest* time that this
> bouncer-off is still needed? We can make turning this off, and turning it *on*
> a part of the release process.

Usually the morning after a Firefox release when we see the "hump" of traffic.
Metrics processes the dmo access logs on an hourly basis with a three hour lag.  Once we have processed an hour, we can provide the number of download requests per minute fairly easily.  We could even push those in to the bouncer DB although that would take more work so we'd have to schedule out time to do it.

Going back in time, after the current day UTC, we only have download information aggregated up to the daily level.  We wouldn't be able to tell you how many downloads there were for a particular hour on a date in the past.

There is no information in the access logs regarding the mirror the user was directed to so we can't provide that information.  Even if you were to change the log file format, it would take us time to accommodate that change and it would be a non-trivial amount of development work because we'd have to change our data warehouse to store it.

As far as just having an asynchronous DB to store this information on a real time basis, we should look into possibly using MongoDB for this.  It would be easy to integrate in to the php code and you could either report directly from it or you could have a collector that queries it and dumps into the existing bouncer db.
I was just wondering, when the bouncer logging will be enabled again? Or are we waiting for the final solution of this bug until it gets re-enabled?
Bouncer logging is now off for 16 days FWIW.
justdave - we're supposed to have nagios monitoring to remind us when logging is off for greater than some interval.  Is that check broken?
FWIW it's now been 20 days that bouncer download logging has been off. 

I've filed bug 530751 to get it re-enabled, since the slight push into the right direction that I intended to make with comment 23 obviously didn't work ;-)
(In reply to comment #22)
> I was just wondering, when the bouncer logging will be enabled again? Or are we
> waiting for the final solution of this bug until it gets re-enabled?

It was re-enabled on Nov 11th at 7:39am pacific.  It's been on since then.  If you're not seeing numbers advancing, then either there haven't been downloads on whatever you're watching, or something else is broken.
(In reply to comment #26)
> It was re-enabled on Nov 11th at 7:39am pacific. It's been on since then. 
> If you're not seeing numbers advancing, then either there haven't been 
> downloads on whatever you're watching, or something else is broken.

All I can say is that http://download.mozilla.org/rss/download-counts.php 
shows no download numbers at all and therefore no longer serves its purpose 
for me.

It became that way when bouncer logging was disabled on Nov 4th. Since it 
hasn't changed since then, I assumed that bouncer logging was still off.
Is something else broken then?
> It was re-enabled on Nov 11th at 7:39am pacific.

I have a script that reads the download counter feed every 5 minutes.
feed: http://feeds.spreadfirefox.com/downloads/firefox

It's been reporting me this for the last 20 days:

WARNING: New feed contains older, same or no data!
Old: Timestamp: 1257373801 Counter: 1123352262
New: Timestamp: 1259100541 Counter: 1123352262
Ignoring this update.

The same script also tweets each time it finds a new update:
http://twitter.com/FirefoxCounter
You can see the last tweet is from:
1,123,352,262 downloads at 19.4 per second on 2009-11-04 22:30 UTC

So I don't know what was re-enabled, but it certainly isn't giving us any new data.
OK, I've reopened bug 530751 changing it to specifically be that the RSS feed is broken.  There were other changes made around that time unrelated to the bouncer logging that could have affected it.
Simon, ugh, we weren't supposed to publish the URL you put into comment #27 publicly as it directly reads from the bouncer DB on every reload AFAIK and therefore lots of requests to it will slow bouncer down!
The one on comment #28 for example is being cached and therefore doesn't put heavy load on bouncer itself.


(In reply to comment #21)
>  Once we have processed an hour, we can provide the number of download requests
> per minute fairly easily.  We could even push those in to the bouncer DB
> although that would take more work so we'd have to schedule out time to do it.

I think that would be helpful to anyone processing the total download number stuff reported in a few places - even if that update was only done daily. Your numbers are much better than what the bouncer DB has right now.

> There is no information in the access logs regarding the mirror the user was
> directed to so we can't provide that information.

I didn't even know that bouncer recorded that ;-)
In any case, the total downloads per entry (product-plus-type/version/os/locale) are that thing most of us who process some numbers off bouncer really want and would love to be somewhat meaningful - contrary to the random numbers bouncer has in the DB right now.
(In reply to comment #30)
> Simon, ugh, we weren't supposed to publish the URL you put into comment #27
> publicly as it directly reads from the bouncer DB on every reload AFAIK and
> therefore lots of requests to it will slow bouncer down!

... which bug 530751 should have fixed, yay.
(In reply to comment #31)
> ... which bug 530751 should have fixed, yay.

well, not completely, but at least it's using the shadow database now.
How soon can we do comment #7?  

In two weeks I want to disable bouncer logging.  Who/what will be adversely impacted?

[I expect to bring a new instance of bouncer up in Phoenix, one which in no way depends on San Jose for database writes.  In fact, I want the Phoenix bouncer instance to run as an island.]

[Yes, an island with only sentry updates.]
(In reply to comment #33)
> How soon can we do comment #7?  
> 
> In two weeks I want to disable bouncer logging.  Who/what will be adversely
> impacted?

Anything that wants live download counters, I guess, as even the comment #7 solution doesn't give us that - and anything using the current bouncer data feed, which is the only semi-public source of any metrics, as the other data is only accessible to people with MoCo LDAP accounts.
It's the -live- counters that are hard/impossible to scale.  Delayed counters are easy and making download counts or other metrics to non-Firefox projects is easier.

Could you live with counter data that's an hour behind real time?  Anyone know who depends on live data?
I'd agree - up to the hour results are fine, don't need to architect for up to the minute.
I think an hour behind real time should be fine, esp. if we can rebuild what the current bouncer data feed delivers with metrics data, as that would sync every consumer back to data that is actually cleaned from the holes we had with the disabling of internal logging. Would rebuilding that feed with an-hour-behind metrics data be possible?
See comments 11 through 13. Do we care about losing mirror counts?
(In reply to comment #35)
> It's the -live- counters that are hard/impossible to scale. Delayed counters 
> are easy and making download counts or other metrics to non-Firefox projects 
> is easier.
> 
> Could you live with counter data that's an hour behind real time?

I can certainly live with hourly data updates. I don't need real-time data.

(In reply to comment #38)
> See comments 11 through 13. Do we care about losing mirror counts?

I don't fully understand the impact of losing the mirror counts. Would losing 
them mean, that not all downloads are accurately counted? If that's the case 
then I would certainly care about the mirror counts.
Am I correct in that bouncer logging is off again and hasn't been turned on since Firefox 3.5.8? Could we have it back on and the stats backfilled again?
Daniel: What's the status of this?
Looks like we no longer use the LOGGING feature.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.