Closed Bug 1006601 Opened 11 years ago Closed 11 years ago

Size of HTTP logs for update tests are exploding since SPDY data is contained

Categories

(Mozilla QA Graveyard :: Mozmill Tests, defect)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

Attachments

(1 obsolete file)

Today we noticed that our Mozmill-CI master was nearly filled up with data. As it turned out all the update tests had a massive increase of the HTTP log size. Form formerly about 10MB it went up to 140MB and more. A check into one of those logs revealed that it may be related to safebrowsing:

2014-05-06 14:41:15.112177 UTC - 173375488[1003349f0]: ReportSpdyConnection safebrowsing.google.com  ent=10dedfd30 preferred=0
2014-05-06 14:41:15.112190 UTC - 173375488[1003349f0]: STS dispatch [10e2580c0]
2014-05-06 14:41:15.112203 UTC - 173375488[1003349f0]:   writing transaction request stream
2014-05-06 14:41:15.112211 UTC - 173375488[1003349f0]: SpdySession31::ReadSegments 10fea5800
[..]
2014-05-06 14:41:15.112456 UTC - 173375488[1003349f0]: 00000000: 80 03 00 01 00 00 02 41 00 00 00 01 00 00 00 00 
2014-05-06 14:41:15.112463 UTC - 173375488[1003349f0]: 00000010: 60 00 78 3F E3 C6 A7 C2 00 27 02 D8 FD 00 00 00 
2014-05-06 14:41:15.112473 UTC - 173375488[1003349f0]: 00000020: 0C 00 00 00 07 3A 6D 65 74 68 6F 64 00 00 00 04 
2014-05-06 14:41:15.112492 UTC - 173375488[1003349f0]: 00000030: 50 4F 53 54 00 00 00 05 3A 70 61 74 68 00 00 00 
2014-05-06 14:41:15.112500 UTC - 173375488[1003349f0]: 00000040: 75 2F 73 61 66 65 62 72 6F 77 73 69 6E 67 2F 64 
2014-05-06 14:41:15.112506 UTC - 173375488[1003349f0]: 00000050: 6F 77 6E 6C 6F 61 64 73 3F 63 6C 69 65 6E 74 3D 
2014-05-06 14:41:15.112512 UTC - 173375488[1003349f0]: 00000060: 6E 61 76 63 6C 69 65 6E 74 2D 61 75 74 6F 2D 66 
2014-05-06 14:41:15.112518 UTC - 173375488[1003349f0]: 00000070: 66 6F 78 26 61 70 70 76 65 72 3D 33 30 2E 30 61 
2014-05-06 14:41:15.112524 UTC - 173375488[1003349f0]: 00000080: 31 26 70 76 65 72 3D 32 2E 32 26 6B 65 79 3D 41 
2014-05-06 14:41:15.112530 UTC - 173375488[1003349f0]: 00000090: 49 7A 61 53 79 44 2D 73 2D 6D 58 4C 34 6D 42 7A 
2014-05-06 14:41:15.112536 UTC - 173375488[1003349f0]: 000000A0: 46 37 4B 4D 52 6B 68 54 43 49 62 47 32 52 4B 6E 

I talked with Gian-Carlo on IRC and he second this feeling. The regressor in our tests is bug 997872.

So what we will have to do is to disable safebrowsing for update tests. The following two preferences would have be set to false:

browser.safebrowsing.enabled
browser.safebrowsing.malware.enabled
Attachment #8418108 - Flags: review?(dave.hunt)
Comment on attachment 8418108 [details] [diff] [review]
disable safebrowsing for update tests v1

Gian-Carlo, would be great if you could do a sanity check if those prefs are enough or if something is missing. Thanks.
Attachment #8418108 - Flags: feedback?(gpascutto)
Comment on attachment 8418108 [details] [diff] [review]
disable safebrowsing for update tests v1

Review of attachment 8418108 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #8418108 - Flags: feedback?(gpascutto) → feedback+
Hm, it still doesn't seem to work. Maybe something else is interfering here? I still get about 100MB with the patch applied. So maybe something else is causing it?

wo/ the patch:
-rw-rw-r--  1 henrik henrik  46014764 May  6 18:21 2014-05-04-03-02-05-mozilla-central-firefox-32.0a1.en-US.linux-x86_64.tar.bz2
-rw-rw-r--  1 henrik henrik 134531652 May  6 18:24 http.log

w/ the patch:
-rw-rw-r--  1 henrik henrik  46014764 May  6 18:21 2014-05-04-03-02-05-mozilla-central-firefox-32.0a1.en-US.linux-x86_64.tar.bz2
-rw-rw-r--  1 henrik henrik 100298860 May  6 18:30 http.log
Comment on attachment 8418108 [details] [diff] [review]
disable safebrowsing for update tests v1

Review of attachment 8418108 [details] [diff] [review]:
-----------------------------------------------------------------

I will have to look tomorrow for that. Too late for today.
Attachment #8418108 - Flags: review?(dave.hunt)
The strange thing is that this problem only happens for aurora, nightly, and current releases. I still see small logs for esr24 updates, whether if those are nightlies or releases. I'm currently working with releng to figure out what's wrong here.

Steps:
1. Grab the latest env fro your platform: https://mozqa.com/mozmill-env/
2. Unzip and execute "source mozmill-env/run"
3. export NSPR_LOG_MODULES=timestamp,nsHttp:5,nsSocketTransport:5,nsHostResolver:5,append
   export NSPR_LOG_FILE=http.log
4. Run "mozdownload -t daily --date 2014-05-05" to download a nightly from 05/05
5. Run "testrun_update --no-fallback 2014-05-05-03-02-02-mozilla-central-firefox-32.0a1.en-US.mac.dmg"

Afterward simply check the http.log in the same folder.
Looking at this now
As nthomas mentioned to me on IRC there are a lot of SPDY and STS entries in the HTTP log now. All those features have been added after Firefox 24. So we wonder if that could be the reason.

An example log file can be found here: https://www.dropbox.com/s/3mznqzckt3k2bnq/http.zip

Patrick, could you please have a look at it? Do you see why it might be that large nowadays?
Flags: needinfo?(mcmanus)
that's spdy data all right.
Flags: needinfo?(mcmanus)
(In reply to Patrick McManus [:mcmanus] from comment #9)
> that's spdy data all right.

So how does it come that we see an increase of the HTTP log by 10x since SPDY is enabled? Are all those network connections necessary? If that is the case and nothing can be changed, do you know how to reduce logging so we don't get all this data?
Flags: needinfo?(mcmanus)
it logs the transfer data - the logs optimize for debuggability.

you experiment with changes your log levels if you like.
Flags: needinfo?(mcmanus)
So given that we want to catch transmission failures when downloading updates for Firefox, which log level would you think should be best for us? Is there documentation which information goes into which level? I cannot really find that.
Flags: needinfo?(mcmanus)
personally I think you're expecting too much from nspr logging. Its there to help you debug a particular case.

log levels are not defined in one place, at least for necko, nor are they particularly consistently applied. The SPDY IO is logged at level 4. (so 3 or lower in the http category would not log it.)
Flags: needinfo?(mcmanus)
So we have HTTP logging enabled because we had a lot of update failures in the past, and that was a request from RelEng to being able to investigate what went wrong. Given that we haven't had those problems for a long time we might indeed be able to turn this off completely or lower the level.

My proposal would be:

1. Lower the level to 3 as Patrick proposed, which should reduce the size drastically and we are still able to see the requests and response headers.

2. Turn off HTTP logging for daily builds (Nightly, Aurora, ESR24) given that we only serve updates via the FTP server. For release updates we will keep those enabled, so we could identify broken CDN servers.

Ben, given that we were working on this in the past, what do you think? Would that be fine for you?
Flags: needinfo?(bhearsum)
(In reply to Henrik Skupin (:whimboo) from comment #14)
> So we have HTTP logging enabled because we had a lot of update failures in
> the past, and that was a request from RelEng to being able to investigate
> what went wrong. Given that we haven't had those problems for a long time we
> might indeed be able to turn this off completely or lower the level.
> 
> My proposal would be:
> 
> 1. Lower the level to 3 as Patrick proposed, which should reduce the size
> drastically and we are still able to see the requests and response headers.
> 
> 2. Turn off HTTP logging for daily builds (Nightly, Aurora, ESR24) given
> that we only serve updates via the FTP server. For release updates we will
> keep those enabled, so we could identify broken CDN servers.
> 
> Ben, given that we were working on this in the past, what do you think?
> Would that be fine for you?

If at all possible, I would strongly advise keeping update logs enabled everywhere. Is it often extremely difficult to diagnose update issues without them, especially those that can't be reproduced outside of automation. Perhaps you could grep for the update stuff and toss away the rest?
Flags: needinfo?(bhearsum)
Ok, so Ben and I talked on IRC and we came to the following agreement:

* We can reduce the clutter in the HTTP log to only list headers for requests and responses. So level 3 should be totally sufficient.
* We should be able to also remove nsSocketTransport:5 because we are not interested in that low-level information
* We keep HTTP logging enabled for ALL update jobs to get details for mirror servers especially for the CDN network

I will run some tests with that and will file an mozmill-ci issue on github to get this implemented.
Comment on attachment 8418108 [details] [diff] [review]
disable safebrowsing for update tests v1

Making patch obsolete given that it might not be needed anymore.
Attachment #8418108 - Attachment is obsolete: true
Patrick, with nsHTTP:5 I still see a lot of those entries. Are those actually failures, where a bug has to be logged?

2014-05-07 20:58:23.079905 UTC - 497022720[7fcf32f22920]: SpdySession31 7fcf04f0f400 buffering frame header read failure 80470007
Flags: needinfo?(mcmanus)
that's WOULDBLOCK - which is to be expected. It would have to be debugged if it was a different value.
Flags: needinfo?(mcmanus)
So it looks like we have to live with all that clutter then. There is no lower value to use for nsHTTP, where we could get rid of it.
Talked to Patrick and that is our wanted entry:

> NSPR_LOG_MODULES=timestamp,nsHttp:3,nsHostResolver:5,append

So I filed issue: https://github.com/mozilla/mozmill-ci/issues/435
Mozmill CI has been updated for the above mentioned changes. The code has been pushed to staging and production, and both systems are looking fine now. HTTP log file sizes are around 2-5MB now.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Summary: Size of HTTP logs for update tests are exploding since we immediately download safebrowsing data → Size of HTTP logs for update tests are exploding since SPDY data is contained
Product: Mozilla QA → Mozilla QA Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: