Closed Bug 992485 Opened 6 years ago Closed 3 years ago

Reduce the size of debug browser-chrome logs, exceeding the maximum log size and thus closing trunk and aurora trees

Categories

(Core :: General, defect, blocker)

x86
Windows 8
defect
Not set
blocker

Tracking

()

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

References

(Depends on 2 open bugs)

Details

Attachments

(1 file)

On mozilla-inbound, green Win7 browser-chrome logs are sizes like 52413446 bytes.

The maximum log size is 52428800 bytes.

On fx-team, because https://hg.mozilla.org/integration/fx-team/rev/e3564d9cd015 added four tests, we went from one or two of five Win7 debug b-c runs exceeding the maximum size to five of five.

That's it, we're done. If, as is apparently the case with the bug 983948 thing with Linux debug browser-chrome timing out constantly, we're not going to do anything other than wait until the devtools tests get moved out of browser-chrome into their own suite, then we're not going to have any open desktop Firefox trunk trees until that happens. And no, you can't land your not-obviously-browser-chrome related patches because your thing isn't obviously related so you should be allowed to just ignore the problem.
Randomly, https://tbpl.mozilla.org/php/getParsedLog.php?id=37313795&tree=Mozilla-Inbound, a Win8 debug log exceeding the maximum size. Mac and Linux seem to be at a "comfortable" 45-47MB, but then, we have a lot of tests disabled on Mac and Linux.
Why do we have a maximum log size?
Buildbot something something bad things will happen to you if you increase it beyond that.
I'm pretty sure Dustin's actual explanation doesn't have the handwaving, can't remember whether there's an actual hard upper limit, or just the certainty that increasing it will eventually end with buildbot masters going up in flames.
I pulled a random win7 debug log. The most common line was
INFO -  [Parent 4056] WARNING: NS_ENSURE_TRUE(gIMM32Handler) failed: file c:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/widget/windows/nsIMM32Handler.cpp, line 254
with 20k, the next most common had 5k occurences, and the rest of the top 10 looked cross platform. Do we need that to be NS_ENSURE_TRUE? Can we just replace it with a quiet early return that doesn't note it in the log?
Flags: needinfo?(masayuki)
Flags: needinfo?(m_kato)
What do you think?
Attachment #8402171 - Flags: review?(masayuki)
Attachment #8402171 - Flags: review?(m_kato)
This line
http://hg.mozilla.org/mozilla-central/annotate/5fa70bd90a8b/browser/devtools/netmonitor/test/head.js#l214
is responsible for about 595 dumps that produce about 3465 bytes each for about 2mb total.
Depends on: logspam
Depends on: 984930
Depends on: 819963
That got us about two thirds of the way back to where aurora is, so if we're careful it maybe bought us a couple of weeks.
(In reply to Timothy Nikkel (:tn) from comment #7)
> This line
> http://hg.mozilla.org/mozilla-central/annotate/5fa70bd90a8b/browser/devtools/
> netmonitor/test/head.js#l214
> is responsible for about 595 dumps that produce about 3465 bytes each for
> about 2mb total.

Seems to me this output should more appropriately be created under control of a DEBUG_XXX define.  It seems inappropriate for the automated tests.
Severity: blocker → critical
The max log length is configurable (logMaxSize) and the current setting is pretty arbitrary.  The problem is that the version of Buildbot Mozilla uses keeps that log in RAM, so increasing it to an arbitrarily large value may lead to OOMs on buildmasters when lots of builds are blabbering tens of megabytes at the same time.

So: releng could increase this in the short term, but code changes to keep log lengths reasonable are the better long-term solution.  I'd argue that anything outside of try shouldn't be more than, say, 5MB, but what do I know!
Attachment #8402171 - Flags: review?(m_kato) → review+
Flags: needinfo?(m_kato)
Attachment #8402171 - Flags: review?(masayuki)
Flags: needinfo?(masayuki)
Here is one thought:

In the whole log file, majority of the log size is coming from the full path of the test file before each info/is/ok assertion. This can be cleaned up as those full paths do not provide any extra information. So something like :

22:42:27     INFO -  TEST-START | chrome://mochitests/content/browser/toolkit/devtools/server/tests/browser/browser_storage_dynamic_windows.js
22:42:28     INFO -  console.log: GETDBNAME : file exists : true
22:42:28     INFO -  console.log: opening file c:\users\cltbld~1.t-w\appdata\local\temp\tmpmntor9\storage\persistent\http+++test1.example.org\idb\1031932777i1db.sqlite
22:42:28     INFO -  console.log: opening file c:\users\cltbld~1.t-w\appdata\local\temp\tmpmntor9\storage\persistent\http+++test1.example.org\idb\3686368546i2db.sqlite
22:42:28     INFO -  console.log: returning db names for host http://test1.example.org idb1,idb2
22:42:28     INFO -  console.log: GETDBNAME : file exists : false
22:42:28     INFO -  TEST-PASS | chrome://mochitests/content/browser/toolkit/devtools/server/tests/browser/browser_storage_dynamic_windows.js | cookies storage actor is present
22:42:28     INFO -  TEST-PASS | chrome://mochitests/content/browser/toolkit/devtools/server/tests/browser/browser_storage_dynamic_windows.js | Number of hosts for cookiesmatch
22:42:28     INFO -  TEST-PASS | chrome://mochitests/content/browser/toolkit/devtools/server/tests/browser/browser_storage_dynamic_windows.js | Host test1.example.org is present

...


22:42:29     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/devtools/server/tests/browser/browser_storage_dynamic_windows.js | Everything to be received is received.
22:42:29     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/devtools/server/tests/browser/browser_storage_dynamic_windows.js | in clear 0
22:42:29     INFO -  TEST-INFO | chrome://mochitests/content/browser/toolkit/devtools/server/tests/browser/browser_storage_dynamic_windows.js | in clear 1
22:42:32     INFO -  TEST-INFO | MEMORY STAT vsize after test: 1333383168
22:42:32     INFO -  TEST-INFO | MEMORY STAT vsizeMaxContiguous after test: 1893662720
22:42:32     INFO -  TEST-INFO | MEMORY STAT residentFast after test: 412942336
22:42:32     INFO -  TEST-INFO | MEMORY STAT heapAllocated after test: 159515792
22:42:32     INFO -  INFO TEST-END | chrome://mochitests/content/browser/toolkit/devtools/server/tests/browser/browser_storage_dynamic_windows.js | finished in 5211ms

can be converted to:

22:42:27     INFO -  TEST-START | chrome://mochitests/content/browser/toolkit/devtools/server/tests/browser/browser_storage_dynamic_windows.js
22:42:28     INFO -  console.log: GETDBNAME : file exists : true
22:42:28     INFO -  console.log: opening file c:\users\cltbld~1.t-w\appdata\local\temp\tmpmntor9\storage\persistent\http+++test1.example.org\idb\1031932777i1db.sqlite
22:42:28     INFO -  console.log: opening file c:\users\cltbld~1.t-w\appdata\local\temp\tmpmntor9\storage\persistent\http+++test1.example.org\idb\3686368546i2db.sqlite
22:42:28     INFO -  console.log: returning db names for host http://test1.example.org idb1,idb2
22:42:28     INFO -  console.log: GETDBNAME : file exists : false
22:42:28     INFO -  TEST-PASS | cookies storage actor is present
22:42:28     INFO -  TEST-PASS | Number of hosts for cookiesmatch
22:42:28     INFO -  TEST-PASS | Host test1.example.org is present

...


22:42:29     INFO -  TEST-INFO | Everything to be received is received.
22:42:29     INFO -  TEST-INFO | in clear 0
22:42:29     INFO -  TEST-INFO | in clear 1
22:42:32     INFO -  TEST-INFO | MEMORY STAT vsize after test: 1333383168
22:42:32     INFO -  TEST-INFO | MEMORY STAT vsizeMaxContiguous after test: 1893662720
22:42:32     INFO -  TEST-INFO | MEMORY STAT residentFast after test: 412942336
22:42:32     INFO -  TEST-INFO | MEMORY STAT heapAllocated after test: 159515792
22:42:32     INFO -  INFO TEST-END | chrome://mochitests/content/browser/toolkit/devtools/server/tests/browser/browser_storage_dynamic_windows.js | finished in 5211ms


This will potentially reduce the log file size to half, without loss of any meaningful information.
(In reply to Girish Sharma [:Optimizer] from comment #14)
> Here is one thought:
> 
> In the whole log file, majority of the log size is coming from the full path
> of the test file before each info/is/ok assertion. This can be cleaned up as
> those full paths do not provide any extra information. So something like :

Agree. Sometimes tests which fail to clean-up properly end up running listeners and checks during other tests, so it's useful to know which test is generating each output line. But having the test leaf name instead of the full path is probably enough and would still save a lot.
(In reply to :Felipe Gomes from comment #16)
> (In reply to Girish Sharma [:Optimizer] from comment #14)
> > Here is one thought:
> > 
> > In the whole log file, majority of the log size is coming from the full path
> > of the test file before each info/is/ok assertion. This can be cleaned up as
> > those full paths do not provide any extra information. So something like :
> 
> Agree. Sometimes tests which fail to clean-up properly end up running
> listeners and checks during other tests, so it's useful to know which test
> is generating each output line. But having the test leaf name instead of the
> full path is probably enough and would still save a lot.

iianw, the assertion methods is/ok/info etc are not available to the test after it is complete. So the test would simply go on giving out exceptions, which anyways will have the full url of the js file.
Wouldn't that make it more of a pain for sheriffs to figure out which components to file bugs in for test failures?
the test start and test end log still have names.
(In reply to Girish Sharma [:Optimizer] from comment #19)
> the test start and test end log still have names.

Yes, but that requires opening the log, sometimes the full log to find them
This should be an easy fix in the getLogExcerpt.php [0] file which shows the summary of failure when you click on failed run.

[0] https://mxr.mozilla.org/webtools-central/source/tbpl/php/
Oh, convenient, this is still open, I'll borrow it for closure number two. Now it's Win8 debug, starting nearly simultaneously across mozilla-aurora and trunk trees, smelling a great deal like it's the result of one of the network dependencies that we keep letting slip in. We went from around 47.6MB on aurora to (so far) permanently over 52 across the span of a mobile/ push, a gaia push, and a push only touching LinuxNativeApp.js, so not a code change. Now we're running over (or only running a few thousand bytes under) on trunk trees too.

Once again, everything's closed.
Severity: critical → blocker
OS: Windows 7 → Windows 8
Summary: Reduce the size of debug browser-chrome logs, exceeding the maximum log size and thus closing trunk trees → Reduce the size of debug browser-chrome logs, exceeding the maximum log size and thus closing trunk and aurora trees
So this seems to be tests suddenly starting to hit the network. Causing stuff like this:

16:56:24     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/base/content/test/general/browser_aboutHome.js | Console message: [JavaScript Warning: "Error in parsing value for 'unicode-bidi'.  Declaration dropped." {file: "https://www.google.com/search?q=a+search&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=firefox-a&channel=np&source=hp" line: 11 column: 63 source: "[dir='ltr'],[dir='rtl']{unicode-bidi:-moz-isolate;unicode-bidi:isolate}bdo[dir='ltr'],bdo[dir='rtl']{unicode-bidi:bidi-override;unicode-bidi:-moz-isolate-override;unicode-bidi:isolate-override}"}]


and various other messages that echo all of google's minified CSS, and therefore fill up a lot of space really quickly and effectively. :-|


Was there any change on infra which might explain why previously well-running tests are suddenly hitting the network without any code or test changes in the product they run on?
They've never been well-running (those are tests which perform a Google search, and then count on hitting browser.stop() quickly enough to prevent the search from actually happening), but the only possible infra change I can think of that would cause an increased frequency of not hitting stop() in time would be a very amusing "we made the network faster, so now you have less time to stop a request that you never should have created in the first place."

I don't know what the actual cause for them is, but logs which manage to sneak under the limit seem to have a couple thousand instances of "WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file c:/builds/moz2_slave/m-aurora-w32-d-000000000000000/build/netwerk/base/src/nsFileStreams.cpp" while ones over the limit have more like 19000 of them.
I filed bug 995041 to track fixing and re-enabling browser_aboutHome.js.
The tree has been reopened for quite a while. If we start exceeding the size again, please needinfo me on the bug.
Status: NEW → RESOLVED
Closed: 3 years ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.