Closed
Bug 379484
Opened 17 years ago
Closed 17 years ago
move leak box off Firefox page to MozillaTest until it can stay green
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: sayrer, Assigned: rhelmer)
Attachments
(4 files)
Comment 1 (Reporter) • 17 years ago
I should add: is it orange because shipping Firefox code is buggy? (If so, it should stay up on the front page.)
Comment 2 • 17 years ago
We need leak numbers on the main tinderbox. If this tinderbox is unacceptable, we need to close the tree until we have one that is.
Comment 3 • 17 years ago
That said, somebody should debug why it's actually orange. Is it because multiple VMs are running on the same machine causing the time to run out? Is it because Firefox is hanging? If so, where is it hanging? Somebody with access to the machine should investigate.
Some other possible solutions: bug 374822, bug 376874.
Comment 4 (Reporter) • 17 years ago
(In reply to comment #2)
> We need leak numbers on the main tinderbox. If this tinderbox is unacceptable,
> we need to close the tree until we have one that is.
>
Let's not play chicken. As of now, we don't consistently have leak numbers on the main tinderbox because the box we have now is orange half the time.
Comment 5 • 17 years ago
We have perfectly usable leak numbers -- it's just a long cycle time sometimes.
Comment 6 (Reporter) • 17 years ago
(In reply to comment #5)
> We have perfectly usable leak numbers -- it's just a long cycle time sometimes.
>
We would still have perfectly usable leak numbers if the 50%-orange box reported to MozillaTest, and we wouldn't have an orange box on the front page at all times.
Updated (Assignee) • 17 years ago
Assignee: nobody → rhelmer
Updated (Assignee) • 17 years ago
Status: NEW → ASSIGNED
Comment 7 (Assignee) • 17 years ago
(In reply to comment #3)
> That said, somebody should debug why it's actually orange. Is it because
> multiple VMs are running on the same machine causing the time to run out? Is
> it because Firefox is hanging? If so, where is it hanging? Somebody with
> access to the machine should investigate.
I'm going to stop tinderbox when the current cycle is complete, and see if I can reproduce the problem.
> Some other possible solutions: bug 374822, bug 376874.
It's a single-CPU machine (it's a VM, so more correctly it has only one CPU assigned). Does that rule out bug 376874?
Comment 8 (Assignee) • 17 years ago
Hm, there are old firefox-bin processes hanging around on this machine. Not doing anything in particular according to strace:
futex(0x9a1e734, FUTEX_WAIT, 2, NULL
Not much memory free, which could certainly be slowing things down.
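Editorial sketch: to check a similar Linux box for leftover browser processes, something like the following could be used. It is not from this bug. `find_processes` is a hypothetical helper, and it assumes a `/proc/<pid>/comm` file, which modern kernels provide but the machine discussed here may not have had.

```python
import os


def find_processes(name, proc_root="/proc"):
    """Return sorted PIDs whose /proc/<pid>/comm equals `name`.

    Assumes a Linux-style proc filesystem. `proc_root` is a parameter
    only so the function can be exercised against a fake directory.
    Note that the kernel truncates comm to 15 characters.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        comm_path = os.path.join(proc_root, entry, "comm")
        try:
            with open(comm_path) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited, or we lack permission
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)
```

Calling `find_processes("firefox-bin")` would list candidate PIDs to inspect (for example with strace, as above) before killing anything.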
Comment 9 (Reporter) • 17 years ago
mmm, borrow pskill.exe from robcee?
Comment 10 (Assignee) • 17 years ago
(In reply to comment #9)
> mmm, borrow pskill.exe from robcee?
>
This is tinderbox client on Linux. It's supposed to do this on timeout:
http://mxr.mozilla.org/mozilla/source/tools/tinderbox/build-seamonkey-util.pl#1571
I'm going to run several test-only cycles and see if I can get a hang like this.
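For illustration only, the kill-on-timeout behavior described above can be sketched as follows. This is not the Perl code at the linked line of build-seamonkey-util.pl. `run_with_timeout` is a hypothetical helper, and it assumes a POSIX system where the test can be started in its own session so that any children it forks are killed along with it.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv; on timeout, kill its whole process group.

    Returns (exit_code, timed_out). POSIX only: relies on sessions and
    process groups so that children forked by the test die as well.
    """
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        # The child leads its own session, so its PID is also its
        # process-group ID; signal the group, not just the child.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        return proc.wait(), True
```

Signaling the group rather than the single PID matters for cases like the one in this bug, where the browser may have spawned another process that would otherwise be left behind.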
Comment 11 (Reporter) • 17 years ago
(In reply to comment #10)
> (In reply to comment #9)
> > mmm, borrow pskill.exe from robcee?
> >
>
> This is tinderbox client on Linux.
oh right, duh.
Comment 12 (Assignee) • 17 years ago
Hm. I am running the tests via tinderbox (so the profile creation/pref settings should all be ok), and every time so far bloatcycle.html never closes after completing all tests. Made sure that browser.dom.window.dump.enabled was set correctly in prefs.js. I'm going to put some debug statements in the test to see what's up.
Comment 13 (Assignee) • 17 years ago
Have now seen a couple of runs where the above does not happen, but when running with --trace-malloc, firefox-bin invokes another copy of firefox-bin (made sure NO_EM_RESTART=1). This has happened every time I've tried so far (reduced it down to just --trace-malloc).
Comment 14 (Assignee) • 17 years ago
Actually it looks like after a while the child process went away, and the test completed. Will check the log.
Updated (Assignee) • 17 years ago
Attachment #263529 -
Attachment mime type: text/x-log → text/plain
Comment 15 (Assignee) • 17 years ago
Looks like the problem in comment #12 happens about half the time. The firefox-bin being invoked twice looks like a different problem.
ajschult helped track that down to mozilla/security/nss/lib/freebl/unix_rand.c:1018, which is trying to fork and run netstat. I've got a stack trace of the child process; attaching.
Comment 16 (Assignee) • 17 years ago
Comment 17 • 17 years ago
I asked rhelmer for some locals from that stack:
frame 7
<rhelmer> p bp is $1 = (void **) 0xbff04668
<rhelmer> p *bp is $2 = (void *) 0x0
frame 6
>p depth
<rhelmer> $3 = 1
> p bp
<rhelmer> $4 = (void **) 0x8b8eec0
> p bpdown
<rhelmer> $5 = (void **) 0x7374656e
> p bpdown[0]
<rhelmer> Cannot access memory at address 0x7374656e
So as far as I can tell, we call |calltree| with a pointer to null. Then the execution works as follows, I think:
First time through the loop:
1047 bpdown = (void**)(0)
1048 (*0xbff04668) = NULL
1049 Test false because RHS is 0.
1051 bpup = 0xbff04668
1052 bp = 0
Second time through the loop:
1047 bpdown = (void**)(*(void**)0) (whatever that is!)
1048 (*0) = 0xbff04668 (not sure why this works)
1049 Test crashes because bpdown is some random pointer
So it seems like the "simple" thing to do would be to either have calltree bail out if its arg points to null or have backtrace() not call calltree with such an arg.
Now the question is why we have such an arg in the first place... dbaron, any ideas?
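Editorial sketch of the "bail out" idea from this comment, using a toy memory model in Python. It is not the |calltree| code and does not reproduce its in-place list reversal; `walk_frames` and `pointer_bytes_as_ascii` are hypothetical names. The second helper is included because the bad `bpdown` value 0x7374656e happens to decode to the ASCII bytes "nets" in little-endian order. That might mean the walk wandered into string data (comment 15 mentions a fork to run netstat), but that reading is a guess and is not confirmed anywhere in this bug.

```python
def walk_frames(memory, bp, max_depth=64):
    """Follow a chain of saved frame pointers in a toy memory model.

    `memory` maps an address to the word stored there. The walk stops,
    instead of dereferencing, when the next pointer is null, unmapped,
    or not above the current frame (stacks grow downward on x86).
    """
    frames = []
    while bp in memory and len(frames) < max_depth:
        frames.append(bp)
        saved = memory[bp]
        if saved == 0 or saved not in memory or saved <= bp:
            break  # bail out before touching a bogus frame
        bp = saved
    return frames


def pointer_bytes_as_ascii(value, width=4):
    """Render a pointer-sized value as little-endian ASCII, if printable."""
    raw = value.to_bytes(width, "little")
    if all(32 <= b < 127 for b in raw):
        return raw.decode("ascii")
    return None
```

With the values from the gdb session, `walk_frames({0xbff04668: 0}, 0xbff04668)` visits only the first frame and stops at the null saved pointer, and `walk_frames({0x8b8eec0: 0x7374656e}, 0x8b8eec0)` stops because 0x7374656e is not a mapped address in the model.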
Comment 18 (Assignee) • 17 years ago
Stacktrace from bloatcycle.html hanging. This test is checked in here:
http://mxr.mozilla.org/mozilla/source/build/bloatcycle.html
Command line was:
firefox-bin -P default resource:///res/bloatcycle.html
Comment 19 (Assignee) • 17 years ago
bz and ajschultz thought this might help. Also did a little cleanup.
I think a better solution here would be to use e.g. an onload handler to determine when the pages are done loading instead of a timeout, and to do something more like http://test.bclary.com/bin/quit.js to quit instead of "window.close()" on the parent window.
However, those will require more prefs to be set and more testing, so let's see if this patch helps for right now.
Attachment #263550 -
Flags: review?
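Editorial sketch of the event-driven alternative suggested in comment 19: quit once every page has reported that it loaded, instead of after a fixed timeout. This is not the attached patch and not browser code; it is a language-neutral model written in Python, and `LoadTracker`, `page_loaded`, and `on_done` are hypothetical names. In the real test the load notifications would presumably come from onload handlers.

```python
class LoadTracker:
    """Call `on_done` once every expected page has reported loading."""

    def __init__(self, urls, on_done):
        self._pending = set(urls)
        self._on_done = on_done
        self._finished = False

    def page_loaded(self, url):
        """Record one load event; fire the callback after the last one."""
        self._pending.discard(url)
        if not self._pending and not self._finished:
            self._finished = True  # guard against duplicate events
            self._on_done()

    @property
    def pending(self):
        """Pages that have not reported loading yet."""
        return frozenset(self._pending)
```

A harness would typically still keep an overall deadline as a fallback, so a page that never fires its load event cannot hang the run indefinitely.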
Updated (Assignee) • 17 years ago
Attachment #263550 -
Flags: review? → review?(bzbarsky)
Updated • 17 years ago
Attachment #263550 -
Flags: review?(bzbarsky) → review+
Comment 20 (Assignee) • 17 years ago
Landed:
Checking in bloatcycle.html;
/cvsroot/mozilla/build/bloatcycle.html,v <-- bloatcycle.html
new revision: 1.3; previous revision: 1.2
done
Comment 21 (Assignee) • 17 years ago
Looks like we've stopped the bleeding at least. I am reassigning this back to the general build alias for now. I'll file a separate bug on the improvements to bloatcycle.html that I suggested, as I'd like to make the same kind of change for a lot of our tests and it could be done more generically.
Leaving the bug open to deal with comment #17.
Assignee: rhelmer → build
Status: ASSIGNED → NEW
Updated (Assignee) • 17 years ago
Assignee: build → nobody
Comment 22 • 17 years ago
Boris, any chance you could file a separate bug on comment 17 with a little more context? (And then we can resolve this one.)
Comment 23 • 17 years ago
Filed bug 417872.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Updated • 16 years ago
Assignee: nobody → robert
Component: Testing → Release Engineering
Product: Core → mozilla.org
QA Contact: testing → release
Version: Trunk → other
Updated • 11 years ago
Product: mozilla.org → Release Engineering