Frequent hangs in dromaeo_sunspider during sunspider-access-nsieve.html (after sunspider-access-nbody.html)

RESOLVED DUPLICATE of bug 617505

Status


--
critical
8 years ago
6 years ago

People

(Reporter: philor, Assigned: gwagner)

Tracking

({intermittent-failure, regression})

Trunk
Points:
---

Firefox Tracking Flags

(blocking2.0 betaN+)

Details

Attachments

(1 attachment)

(Reporter)

Description

8 years ago
I don't have any feeling for how this could be, since it feels like it's been going on since long before numerous merges to mozilla-central, but dromaeo_sunspider hangs quite often on TraceMonkey and does not on mozilla-central. On Linux the hang disguises itself as a crash, which is really the harness crashing the browser in an attempt to show where it hung.

From the last 12 hours on TM:

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288840835.1288844159.12858.gz
Rev3 Fedora 12 tracemonkey talos dromaeo on 2010/11/03 20:20:35 
s: talos-r3-fed-053

Running test dromaeo_sunspider: 
		Started Wed, 03 Nov 2010 20:47:44
	Screen width/height:1280/1024
	colorDepth:24
	Browser inner width/height: 1024/681

NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-morph.html (next: http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nsieve.html)
NOISE: 
NOISE: __FAILbrowser frozen__FAIL
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-morph.html (next: http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nsieve.html)
NOISE: 
NOISE: __FAILbrowser frozen__FAIL
NOISE: Found crashdump: /tmp/tmp8KwSdg/profile/minidumps/1c339def-6b29-b54f-0dc34568-76736310.dmp


http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288836283.1288839609.28223.gz
Rev3 Fedora 12 tracemonkey talos dromaeo on 2010/11/03 19:04:43 
s: talos-r3-fed-052
Running test dromaeo_sunspider: 
		Started Wed, 03 Nov 2010 19:31:51
	Screen width/height:1280/1024
	colorDepth:24
	Browser inner width/height: 1024/681

NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-morph.html (next: http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nsieve.html)
NOISE: 
NOISE: __FAILbrowser frozen__FAIL
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-morph.html (next: http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nsieve.html)
NOISE: 
NOISE: __FAILbrowser frozen__FAIL
NOISE: Found crashdump: /tmp/tmpcSA5E_/profile/minidumps/702a9df3-032f-5c1d-2b95bb99-331c4a1c.dmp

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288820268.1288823768.26592.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/03 14:37:48 
s: talos-r3-w7-015
Running test dromaeo_sunspider: 
		Started Wed, 03 Nov 2010 15:06:54
	Screen width/height:1280/1024
	colorDepth:24
	Browser inner width/height: 1008/675

NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-morph.html (next: http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nsieve.html)
NOISE: 
NOISE: __FAILbrowser frozen__FAIL
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-morph.html (next: http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nsieve.html)
NOISE: 
NOISE: __FAILbrowser frozen__FAIL
Failed dromaeo_sunspider: 

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288821577.1288825085.31571.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/03 14:59:37 
s: talos-r3-w7-039
Running test dromaeo_sunspider: 
		Started Wed, 03 Nov 2010 15:28:48
	Screen width/height:1280/1024
	colorDepth:24
	Browser inner width/height: 1008/675

NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-morph.html (next: http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nsieve.html)
NOISE: 
NOISE: __FAILbrowser frozen__FAIL
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-morph.html (next: http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-3d-raytrace.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-binary-trees.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-fannkuch.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nsieve.html)
NOISE: 
NOISE: __FAILbrowser frozen__FAIL
Failed dromaeo_sunspider:
(Reporter)

Comment 1

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288843658.1288847424.24753.gz
s: talos-r3-w7-022

FAIL: Busted: dromaeo_sunspider
FAIL: browser frozen
(Reporter)

Comment 2

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288845009.1288848801.29548.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/03 21:30:09 
s: talos-r3-w7-029

FAIL: Busted: dromaeo_sunspider
FAIL: browser frozen
(Reporter)

Comment 3

8 years ago
I guess the other possibility is that someone broke it recently, and people have broken it before, and that's why it seems like I've seen it before.

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288847377.1288850895.4789.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/03 22:09:37 
s: talos-r3-w7-005

FAIL: Busted: dromaeo_sunspider
FAIL: browser frozen
(Reporter)

Comment 4

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288865498.1288869037.23187.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/04 03:11:38 
s: talos-r3-w7-041

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288872217.1288875561.24177.gz
Rev3 Fedora 12 tracemonkey talos dromaeo on 2010/11/04 05:03:37 
s: talos-r3-fed-026
Might be a dup of bug 604961.
(Reporter)

Comment 6

8 years ago
Could well be, though the closest tie there seems to be RyanVM saying he sees it in sunspider-access-nsieve. That would match if I'm misunderstanding where the hang is, and sunspider-access-nbody finishes fine but sunspider-access-nsieve hangs without saying anything.

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288906353.1288909714.16349.gz
Rev3 Fedora 12 tracemonkey talos dromaeo on 2010/11/04 14:32:33 
s: talos-r3-fed-033
Yeah, we don't know for sure so we should keep this open. I just wanted to make sure you (and other followers of this bug) knew there was a related one out there.
(Reporter)

Comment 8

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288911710.1288915245.9871.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/04 16:01:50 
s: talos-r3-w7-029

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288913348.1288916763.16593.gz
Rev3 Fedora 12 tracemonkey talos dromaeo on 2010/11/04 16:29:08 
s: talos-r3-fed-030
(Reporter)

Comment 9

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288920485.1288923995.17524.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/04 18:28:05 
s: talos-r3-w7-042
(Reporter)

Comment 10

8 years ago
And whether or not it's the same hang that's been around since April, something has certainly happened recently to turn it from rare to nearly-constant - even with my enormous ability to ignore Talos orange, there is absolutely no way I could have been ignoring one to two instances of this on every single run for more than a couple of weeks, tops. Nor is there any way for it to be TM-only other than by being caused by something landed since the last merge to mozilla-central.

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288921965.1288925515.24510.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/04 18:52:45 
s: talos-r3-w7-019
blocking2.0: --- → ?
(Reporter)

Comment 11

8 years ago
We need a blocking-next-merge flag :)
(Reporter)

Comment 12

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288970383.1288973906.14178.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/05 08:19:43 
s: talos-r3-w7-005
(Reporter)

Comment 13

8 years ago
There have been four pushes since the last one where neither Windows nor Linux failed, so it could have been caused by something other than the first push where it hit, but it seems pretty suspicious that the first instance was on the push for bug 598650.
Blocks: 598650
Keywords: regression
(Reporter)

Comment 14

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289059297.1289062807.19687.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/06 09:01:37 
s: talos-r3-w7-045
(Assignee)

Comment 15

8 years ago
What is this test doing? Similar to http://dromaeo.com/?sunspider ?
I don't know why increasing the malloc trigger should result in a timeout. I looked at the working set size for this page and it looks the same as before. 
For the Prime Number Computation test we go up to 1GB but access-nbody is no high-throughput benchmark afaik.
(Reporter)

Comment 16

8 years ago
I may well be wrong about access-nbody - if the only output comes at the _end_ of a page's cycle, then the significant part of "Cycle 1: loaded http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html (next: http://localhost/page_load_test/dromaeo/sunspider-access-nsieve.html)" would be the "next:" rather than the "loaded".

Files are in http://hg.mozilla.org/build/talos/file/tip/page_load_test/dromaeo, but I'm not sure where the harness bits are, or what decides on __FAILbrowser frozen__FAIL.
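
For anyone triaging these logs by script, here is a minimal sketch of the reading suggested above: treat the "next:" URL of the last "loaded" line before a freeze marker as the page that was in flight. This is not taken from the Talos harness; the regular expression and the function name `suspect_pages` are assumptions based only on the log lines pasted in this bug.

```python
import re

# Matches harness lines of the shape pasted in this bug, e.g.
#   NOISE: Cycle 1: loaded http://.../a.html (next: http://.../b.html)
LOADED_RE = re.compile(r"Cycle \d+: loaded (\S+) \(next: (\S+)\)")
FROZEN_MARKER = "__FAILbrowser frozen__FAIL"


def suspect_pages(log_text):
    """Return the 'next' page that was in flight at each freeze marker.

    Assumes output is printed only once a page has finished, so the page
    named after 'next:' is the one that was running when the freeze hit.
    """
    suspects = []
    last_next = None
    for line in log_text.splitlines():
        match = LOADED_RE.search(line)
        if match:
            # Remember which page the harness moved on to.
            last_next = match.group(2)
        elif FROZEN_MARKER in line and last_next is not None:
            suspects.append(last_next)
            last_next = None
    return suspects
```

Run over the logs in the description, this would point at sunspider-access-nsieve.html rather than sunspider-access-nbody.html, provided the assumption about when output is printed holds.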
(Reporter)

Comment 17

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289217283.1289220634.17403.gz
Rev3 Fedora 12 tracemonkey talos dromaeo on 2010/11/08 03:54:43 
s: talos-r3-fed-051

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289205002.1289208539.19610.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/08 00:30:02 
s: talos-r3-w7-020
(Reporter)

Comment 18

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289248063.1289251402.28653.gz
Rev3 Fedora 12 tracemonkey talos dromaeo on 2010/11/08 12:27:43 
s: talos-r3-fed-028
(Reporter)

Comment 19

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289240344.1289243869.24834.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/08 10:19:04 
s: talos-r3-w7-041
(Reporter)

Comment 20

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289255973.1289259763.7655.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/08 14:39:33 
s: talos-r3-w7-005
(Reporter)

Comment 21

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289257377.1289260725.12090.gz
Rev3 Fedora 12 tracemonkey talos dromaeo on 2010/11/08 15:02:57 
s: talos-r3-fed-033
(Reporter)

Comment 22

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289260194.1289263794.26757.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/08 15:49:54
s: talos-r3-w7-014
(Reporter)

Comment 23

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289264684.1289268186.14899.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/08 17:04:44 
s: talos-r3-w7-050

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289265830.1289269364.20234.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/08 17:23:50 
s: talos-r3-w7-027

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289270567.1289273912.8326.gz
Rev3 Fedora 12 tracemonkey talos dromaeo on 2010/11/08 18:42:47 
s: talos-r3-fed-027
(Reporter)

Comment 24

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289329845.1289333358.22942.gz
Rev3 WINNT 6.1 tracemonkey talos dromaeo on 2010/11/09 11:10:45 
s: talos-r3-w7-009
>  I may well be wrong about access-nbody - if the only output comes at the _end_
>  of a page's cycle, then the significant part of "Cycle 1: loaded
>  http://localhost/page_load_test/dromaeo/sunspider-access-nbody.html (next:
>  http://localhost/page_load_test/dromaeo/sunspider-access-nsieve.html)" would be
>  the "next:" rather than the "loaded".

The output comes at the end of a page's cycle (once the page has fully loaded and timed) so we would be more interested in the 'next' page.  

Timeouts occur after 20 minutes with no output from the browser (longer for mobile tests).
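
A rough sketch of the "no output for 20 minutes" rule described above, for readers who have not seen the harness. This is not the Talos code; the class name `OutputWatchdog` and its methods are hypothetical, and the injectable clock exists only so the logic can be checked without waiting.

```python
import time


class OutputWatchdog:
    """Declare the browser frozen after `timeout_s` seconds of silence."""

    def __init__(self, timeout_s=20 * 60, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_output = clock()

    def saw_output(self):
        # Call whenever the browser prints anything; resets the timer.
        self.last_output = self.clock()

    def is_frozen(self):
        # True once the silence has lasted at least the timeout.
        return self.clock() - self.last_output >= self.timeout_s
```

Because the timer only resets on output, a page that prints nothing until it finishes is indistinguishable from a hung one until the full timeout has elapsed.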
(Reporter)

Comment 26

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289343919.1289347276.25741.gz
Summary: Frequent hangs in dromaeo_sunspider during dromaeo/sunspider-access-nbody.html on TraceMonkey → Frequent hangs in dromaeo_sunspider during sunspider-access-nsieve.html (after sunspider-access-nbody.html) on TraceMonkey
What does it take to get a stack trace from the hung browser at that point?  Can we force a crash in a way that causes breakpad to trigger, or use cdb/ntsd?  That would probably go a long way towards finding the problem.
Alice/Ted: can we hack the scripts on windows so that when the browser hangs we run a command like this?

/c/windows/system32/ntsd.exe -pv -pn firefox.exe -g -noio -c ".dump /ma /u c:\temp\hang.dmp;q"
(cd /c/temp; echo *.dmp; zip -p `ls *.dmp | sed s/dmp/zip/` *.dmp)

That should give us a minidump for the hung process, at a cost of < 100MB accumulated disk space per failure.
(ntsd should be on win2k3 in that location by default, but if not we should probably install the debugging tools on the slaves anyway)
The unit test harnesses already use crashinject ( http://mxr.mozilla.org/mozilla-central/source/build/win32/crashinject.cpp ) to produce minidumps and get stacks, it probably wouldn't be hard to make Talos do the same thing.
(Assignee)

Comment 43

8 years ago
Created attachment 489816 [details] [diff] [review]
patch

This patch (reducing MAX_MALLOC_BYTES to 80 MB) fixes the hang (on tryserver) and still avoids GC runs during SS.
It is not ideal since we still don't know what the real problem is.
Attachment #489816 - Flags: review?(gal)
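
For readers unfamiliar with the term, a toy model of a malloc-byte trigger: count bytes allocated since the last GC and collect once the count reaches a threshold. This is only an illustration of the idea, not SpiderMonkey's implementation; `MallocCounter`, `note_malloc`, and the reset-on-collect behaviour are assumptions, and only the 80 MB figure comes from the patch description.

```python
MB = 1024 * 1024


class MallocCounter:
    """Toy model: trigger a GC once enough bytes have been malloc'ed."""

    def __init__(self, max_malloc_bytes=80 * MB):
        self.max_malloc_bytes = max_malloc_bytes
        self.bytes_since_gc = 0
        self.gc_runs = 0

    def note_malloc(self, nbytes):
        # Every allocation counts toward the trigger.
        self.bytes_since_gc += nbytes
        if self.bytes_since_gc >= self.max_malloc_bytes:
            self.collect()

    def collect(self):
        # Stand-in for a real GC: count the run and reset the counter.
        self.gc_runs += 1
        self.bytes_since_gc = 0
```

In this model a higher threshold means fewer GC runs for the same allocation volume, which is the trade-off the patch is tuning.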

Comment 44

8 years ago
I think you said yourself why I can't r+ this. We have to know what's going on.
(Reporter)

Comment 45

8 years ago
I suspect this is no help at all, but while taras was trying yet again to land the switch to GCC 4.5 this morning, he got a hang in dromaeo_sunspider during nsieve, and unlike us he got to have a stack. http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1289489121.1289492488.19861.gz

Updated

8 years ago
Blocks: 559964

Comment 46

8 years ago
(In reply to comment #45)
> I suspect this is no help at all, but while taras was trying yet again to land
> the switch to GCC 4.5 this morning, he got a hang in dromaeo_sunspider during
> nsieve, and unlike us he got to have a stack.
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1289489121.1289492488.19861.gz

Hopefully this is the same bug, so a fix here is the final thing needed for gcc 4.5.

Comment 47

8 years ago
Is it reasonable to expect this bug to get fixed soon or should we put off GCC 4.5 deployment for a few weeks?
Gregor: could you comment on the prognosis here?
Assignee: general → anygregor
From the build provided for me in bug 612445, I'm attempting to isolate a freeze and grab a minidump.
Depends on: 612445
(Reporter)

Comment 67

8 years ago
Not sure what to make of http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1290034182.1290037568.32085.gz: from the outside it appears to be the same hang, but it's on m-c on the rev *before* TM merged, that is, before we should be seeing this hang there.
(Reporter)

Comment 69

8 years ago
And http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1290039066.1290042465.22039.gz is post-merge, so it should be this, but the stack in it doesn't look very obviously helpful.

Comment 70

8 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1290041016.1290044664.31016.gz&fulltext=1
Summary: Frequent hangs in dromaeo_sunspider during sunspider-access-nsieve.html (after sunspider-access-nbody.html) on TraceMonkey → Frequent hangs in dromaeo_sunspider during sunspider-access-nsieve.html (after sunspider-access-nbody.html)

Comment 73

8 years ago
Gregor needs access to a windows machine. Filing an IT bug.

Comment 78

8 years ago
I filed an IT bug (bug 613402) for this. If there is no easy way for IT to do this, I can give you access to a box with vmware + vnc, but you have to install windows and the necessary tools.
(Reporter)

Comment 116

8 years ago
I hadn't quite thought of it this way, but now that this has been merged to mozilla-central, this bug really does block bug 559964 even if there's no connection - we can't turn on GCC 4.5, which bounced out of mozilla-central because it made dromaeo_sunspider hang in sunspider-access-nsieve.html, until after we're not hanging in sunspider-access-nsieve.html from this, so we can tell whether or not we're hanging from that.
(In reply to comment #78)
> I filed an IT bug (bug 613402) for this. If there is no easy way for IT to do
> this, I can give you access to a box with vmware + vnc, but you have to install
> windows and the necessary tools.

Any progress here? You merged a very frequent intermittent orange to mozilla-central, comment 61 has a full memory dump that may assist in analysis, but there's been no visible progress here. This is not a great place to be, and someone needs to fix it.

Updated

8 years ago
blocking2.0: ? → beta9+
(Assignee)

Comment 130

8 years ago
(In reply to comment #121)
> (In reply to comment #78)
> > I filed an IT bug (bug 613402) for this. If there is no easy way for IT to do
> > this, I can give you access to a box with vmware + vnc, but you have to install
> > windows and the necessary tools.
> 
> Any progress here? You merged a very frequent intermittent orange to
> mozilla-central, comment 61 has a full memory dump that may assist in analysis,
> but there's been no visible progress here. This is not a great place to be, and
> someone needs to fix it.

I am still waiting to get access to a windows machine. IT is working on it.

Comment 150

8 years ago
This is really driving people crazy on mozilla-central.  Raising the priority in hopes that it gets some review love soon.
Severity: normal → critical
(In reply to comment #150)
> This is really driving people crazy on mozilla-central.  Raising the priority
> in hopes that it gets some review love soon.

Let's land it, then, and we can try to figure out the underlying problem locally. Andreas, what do you think?
(Assignee)

Comment 174

8 years ago
(In reply to comment #162)
> (In reply to comment #150)
> > This is really driving people crazy on mozilla-central.  Raising the priority
> > in hopes that it gets some review love soon.
> 
> Let's land it, then, and we can try to figure out the underlying problem
> locally. Andreas, what do you think?

Yeah I am also for this temporary solution.
(Assignee)

Comment 176

8 years ago
It seems we get stuck in the cycle collector:

 	nspr4.dll!_PR_MD_WAIT_CV(_MDCVar * cv, _MDLock * lock, unsigned int timeout)  Line 280 + 0x14 bytes	C
 	nspr4.dll!_PR_WaitCondVar(PRThread * thread, PRCondVar * cvar, PRLock * lock, unsigned int timeout)  Line 204 + 0x17 bytes	C
 	nspr4.dll!PR_WaitCondVar(PRCondVar * cvar, unsigned int timeout)  Line 547 + 0x17 bytes	C
 	xul.dll!mozilla::CondVar::Wait(unsigned int interval)  Line 373 + 0x11 bytes	C++
>	xul.dll!nsCycleCollectorRunner::Collect(nsICycleCollectorListener * aListener)  Line 3362	C++
 	xul.dll!nsCycleCollector_collect(nsICycleCollectorListener * aListener)  Line 3473 + 0xf bytes	C++
 	xul.dll!nsJSContext::CC(nsICycleCollectorListener * aListener)  Line 3635 + 0x9 bytes	C++
 	xul.dll!nsDOMWindowUtils::GarbageCollect(nsICycleCollectorListener * aListener)  Line 657 + 0x9 bytes	C++
 	xul.dll!NS_InvokeByIndex_P(nsISupports * that, unsigned int methodIndex, unsigned int paramCount, nsXPTCVariant * params)  Line 103	C++
 	xul.dll!CallMethodHelper::Invoke()  Line 3058 + 0x1c bytes	C++
 	xul.dll!CallMethodHelper::Call()  Line 2320 + 0x8 bytes	C++
Comment on attachment 489816 [details] [diff] [review]
patch

(In reply to comment #174)
> (In reply to comment #162)
> > (In reply to comment #150)
> > > This is really driving people crazy on mozilla-central.  Raising the priority
> > > in hopes that it gets some review love soon.
> > 
> > Let's land it, then, and we can try to figure out the underlying problem
> > locally. Andreas, what do you think?
> 
> Yeah I am also for this temporary solution.

OK, let's do it.

We should probably create a new bug for the underlying problem--the comments here are a mess.
Attachment #489816 - Flags: review?(gal) → review+
FYI, I just told ehsan it would be fine to land this patch in m-c, since it's got review and the orange is driving people crazy and all.

Comment 183

8 years ago
http://hg.mozilla.org/mozilla-central/rev/5d4678e9fc37

Let's tentatively call this fixed, and reopen if it happens on future builds!
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Looks to me like it happened again:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1291169503.1291172990.13400.gz

On

http://hg.mozilla.org/mozilla-central/rev/824f8a023254

Which has that revision.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Backed out the patch for being in the range of bug 615736
This doesn't look like it affects gcc 4.5 deployment at all.  Can we remove this as a blocker for bug 578880?
(In reply to comment #197)
> This doesn't look like it affects gcc 4.5 deployment at all.  Can we remove
> this as a blocker for bug 578880?

The test went from intermittent orange to permaorange last time we tried to deploy it, and it's pretty hard to discern if there's a difference without this getting fixed first.
(Reporter)

Comment 201

8 years ago
Did it go to permaorange, or was it the way I remember it, that we landed the switch to GCC 4.5, got a single build and a single test run, had a single hang, and backed out? Personally, if I was the one who wanted GCC 4.5, I'd be pushing the switch to the tryserver and then asking releng to run ten sets of dromaeo on it.

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1291301879.1291305511.30391.gz
(In reply to comment #201)
> Personally, if I was the one who wanted GCC 4.5, I'd be pushing
> the switch to the tryserver and then asking releng to run ten sets of dromaeo
> on it.

Sure, that sounds great.  I don't really know how to push on that significantly, though.  If gcc 4.5 didn't cause this to go permaorange (as phil remembers, that we only had one build), I would say it's worth doing the switch again even on non-tryserver if it's possible.

Comment 206

8 years ago
(In reply to comment #205)
> (In reply to comment #201)
> > Personally, if I was the one who wanted GCC 4.5, I'd be pushing
> > the switch to the tryserver and then asking releng to run ten sets of dromaeo
> > on it.
> 
> Sure, that sounds great.  I don't really know how to push on that
> significantly, though.  If gcc 4.5 didn't cause this to go permaorange (as phil
> remembers, that we only had one build), I would say it's worth doing the switch
> again even on non-tryserver if it's possible.

GCC 4.5 triggered the same permaorange as was present on tracemonkey branch. We then backed out 4.5 and tracemonkey-merge caused mc to go permaorange.

I later checked on try and gcc 4.5 (without tm) causes the same permaorange bug on try given enough runs. https://bugzilla.mozilla.org/show_bug.cgi?id=590181#c101
(Assignee)

Comment 207

8 years ago
I guess statistics were against me with this fix. I got 3 green tryserver runs with it.
We can also reset the MAX_MALLOC_BYTES to 64 again and take the SS regression if you want to be on the safe side with the GCC switch.
I got access to a windows machine yesterday but I can only start serious debugging next week.
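
To put a number on how little 3 green runs prove for an intermittent hang, here is a small sketch. It assumes each run hangs independently with the same probability; the 20% rate used in the note below is a made-up figure for illustration, not a measurement from this bug.

```python
import math


def prob_all_green(hang_rate, runs):
    """Chance that `runs` independent runs all pass despite the bug."""
    return (1.0 - hang_rate) ** runs


def runs_needed(hang_rate, confidence=0.95):
    """Smallest number of runs with at least `confidence` chance of a hang."""
    if not 0.0 < hang_rate < 1.0:
        raise ValueError("hang_rate must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_rate))
```

With a hypothetical 20% hang rate, three runs come back all green about half the time (0.8 cubed is 0.512), and it takes 14 runs before an all-green result is less than 5% likely.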
If we're deadlocked in the CC, maybe bent can help before next week?  I think we have to take the regression for now, though.
(In reply to comment #176)
> It seems we get stuck in the cycle collector:

Since CC is off the main thread now we will block while we wait for CC to finish, so the stack you posted is expected. Is it really deadlocked? What is the CC thread doing?

Sorry, I wasn't CC'd so I didn't know this was happening until today.
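
A minimal sketch of the pattern described above, where the caller blocks on a condition variable while a separate thread does the collection. It is not the nsCycleCollectorRunner code; `CollectorRunner` and its `timeout_s` parameter are hypothetical. The point is that a stack showing the caller inside the wait looks the same whether the worker is slow or stuck.

```python
import threading


class CollectorRunner:
    """Run `work` on a worker thread while the caller waits for it."""

    def __init__(self, work):
        self._work = work
        self._cond = threading.Condition()
        self._done = False

    def _run(self):
        self._work()
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def collect(self, timeout_s=None):
        """Block until the worker finishes; return False on timeout."""
        self._done = False
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()
        with self._cond:
            # The caller sits here for the whole collection, so seeing it
            # in a condvar wait does not by itself prove a deadlock.
            return self._cond.wait_for(lambda: self._done, timeout=timeout_s)
```

Telling a long collection from a deadlock therefore needs the worker thread's stack as well, which is what the question about the CC thread is asking for.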
(Assignee)

Comment 246

8 years ago
I haven't seen the hang on tracemonkey for a few days now (there were about 13 builds). Phil you know more about the frequency of this bug. Do you think it's more luck or maybe it got fixed?
(Reporter)

Comment 248

8 years ago
Maybe, but it's hard to be confident with the limited number of pushes (zero Sunday, one Saturday, though seven Friday). I'm trying to get a bunch of runs on the last finished TM rev, so we can see.

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1291661655.1291665279.22944.gz
(Reporter)

Comment 249

8 years ago
Fun fact: I accidentally failed to say that I wanted extra runs of Win7, so I got WinXP, which... hey, wait, why doesn't WinXP ever hang?

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1291664536.1291668261.3836.gz
(Assignee)

Comment 257

8 years ago
We are able to reproduce this hang now. 
I noticed 3 different hangs:

1) The browser freezes in the nsieve benchmark. The main thread waits in the constructor of AutoGCSession. (Probably most common).

2) The dromaeo tab hangs in the nsieve benchmark and doesn't render properly any more but the browser is still responsive and I can open another tab for example.

3) The browser stops for a second at the nsieve benchmark and continues to always execute 2 benchmarks in parallel afterwards.
Depends on: 617505
Status: NEW → RESOLVED
Last Resolved: 8 years ago
No longer depends on: 612445, 617505
Resolution: --- → DUPLICATE
Duplicate of bug: 617505

Comment 264

8 years ago
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if you believe it MUST be in the next beta (ie: trunk is in an unshippable state without this)
No longer blocks: 438871, 559964, 598650
blocking2.0: beta9+ → betaN+
Keywords: regression

Updated

8 years ago
Keywords: regression

Updated

8 years ago
Blocks: 559964, 598650
Keywords: intermittent-failure
Whiteboard: [orange]