Open Bug 467742 Opened 16 years ago Updated 2 years ago

Frequent random orange in Windows: test_acid3_test46.html times out

Categories

(Core :: General, defect)

x86
Windows XP
defect

People

(Reporter: roc, Unassigned)

Details

http://tinderbox.mozilla.org/showlog.cgi?tree=Firefox&errorparser=unittest&logfile=1228266174.1228270184.26607.gz&buildtime=1228266174&buildname=WINNT%205.2%20mozilla-central%20moz2-win32-slave07%20dep%20unit%20test&fulltext=1

Changeset 09195bdb8ff7 looks OK. There are failing builds with changeset c6b884676c0d. So something in that window caused it, but I've no idea what. Bug 458898 is my current best guess; I'll try backing that out. But then I need to sleep.
I'm a bit suspicious of bug 463289 though. Can't it cause deadlocks, if we try to load a component while holding a lock which the main thread is waiting on?
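To make the worry concrete, here is a minimal Python sketch of the lock pattern being asked about. It is not Gecko code: `simulate`, `main_queue`, and both timeouts are hypothetical stand-ins, and whether bug 463289 actually does this is unconfirmed. A worker holds a lock while it waits for the main thread to service a request, and the main thread is blocked on that same lock, so neither side makes progress. Timeouts stand in for the indefinite waits so the sketch terminates.

```python
import queue
import threading


def simulate(main_timeout=0.1, worker_timeout=0.5):
    """Model a worker that waits on the main thread while holding a lock."""
    lock = threading.Lock()
    main_queue = queue.Queue()          # stand-in for the main thread's event queue
    worker_has_lock = threading.Event()
    result = {}

    def worker():
        with lock:
            worker_has_lock.set()
            done = threading.Event()
            # Ask the "main thread" to do some work (e.g. load a component)
            # and wait for it while still holding the lock.
            main_queue.put(done)
            result["worker_progressed"] = done.wait(worker_timeout)

    thread = threading.Thread(target=worker)
    thread.start()
    worker_has_lock.wait()

    # The "main thread" needs the same lock before it gets back to its queue.
    got_lock = lock.acquire(timeout=main_timeout)
    result["main_progressed"] = got_lock
    if got_lock:
        lock.release()
        # Only a main thread that is not stuck can service pending requests.
        while not main_queue.empty():
            main_queue.get().set()

    thread.join()
    return result


if __name__ == "__main__":
    print(simulate())
```

With the default timeouts both entries in the result come back false, which is the timed-out equivalent of a hang. In a real build the same situation would show up in a debugger as two threads parked in wait calls, each holding what the other needs.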
I've backed out my patch for bug 458898. If that doesn't work, I recommend backing out bug 463289 next.
Did anyone catch the deadlock in a debugger? We aren't supposed to hold locks while loading components...
Did one of the backouts fix things? If so, which?
Apparently they didn't fix it. Windows tboxes have been randomly orange for some time, maybe since branching.
Is that random orange all related to this test?
I don't think so. The end result is a timeout error, but it may also happen elsewhere than after test_acid3_test46.html
Are the unit test boxes on more-heavily-loaded VMs now that the branch tinderboxes are running too?
In a scan of the 12 hours before c6b884676c0d landed, I didn't see any hangs in test_acid3_test46.html. There are lots from c6b884676c0d on. So I think something did change around then.
(In reply to comment #8)
> I don't think so. The end result is a timeout error, but it may also happen
> elsewhere than after test_acid3_test46.html

Yeah, the point of failure keeps moving, although it's often in the same test on different runs, which is interesting --- the failure point is not completely random.

But this definitely looks like it started in the window ending at c6b884676c0d and probably (but not necessarily) starting after 09195bdb8ff7.
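The window reasoning above can be sketched as a small Python helper. This is only an illustration: `regression_window` and the run list are hypothetical, and the pass/fail values simply restate what this bug reports for the two changesets rather than being pulled from Tinderbox.

```python
from typing import List, Optional, Tuple


def regression_window(
    runs: List[Tuple[str, bool]]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (last_good, first_bad) for runs listed oldest first.

    Each run is (changeset, passed). first_bad is the earliest changeset
    with a failing run; last_good is the latest passing changeset seen
    before it. Either value is None if no such run exists.
    """
    last_good = None
    for changeset, passed in runs:
        if not passed:
            return last_good, changeset
        last_good = changeset
    return last_good, None


if __name__ == "__main__":
    # Hypothetical run list restating the observations in this bug.
    runs = [
        ("09195bdb8ff7", True),   # "looks OK"
        ("c6b884676c0d", False),  # failing builds seen
    ]
    print(regression_window(runs))
```

For an intermittent failure a single green run is weak evidence, so `last_good` is only an estimate of where the window starts. That matches the caveat that the window probably, but not necessarily, begins after 09195bdb8ff7.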
These are the non-merge changesets in that window that haven't been backed out yet:

ce72e9a5dca0	Michael Ventnor — Bug 458031. Take dirty rect into account to limit box-shadow computation. r+sr=roc
1d84189da181	Robert Longson — Bug 465996. Use Ellipse instead of Arc to draw circles. r+sr=roc
9e1eab6135e2	Jonathan Kew — Bug 467228. Disable line start/end swashes on Mac since we don't support line-boundary shaping properly yet. r=roc (Tests)
7baaa800925d	Jonathan Kew — Bug 467228. Disable line start/end swashes on Mac since we don't support line-boundary shaping properly yet. r=roc
992000e45526	Robert O'Callahan — Bug 467283. Ignore dirty rect when doing any image resampling --- it will lead to artifacts. r+sr=dbaron,r=vlad
aaeb20c61fca	Robert O'Callahan — Bug 455826. Don't reconstruct textruns just because we deleted an empty nsContinuingTextFrame. r=smontagu
885dc81bc31b	Robert O'Callahan — Bug 442633. Detect removal of href attribute on SVG <use> elements. r=longsonr,sr=mats
f603fec24bf7	Josh Aas — fix a drawing order glitch in the mac default plugin. b=467580 sr=jst
49a032846a3a	Oleg Romashin — Bug 463872 - Cairo-qpainter build is broken after latest cairo update. missing part. r=vladimir.
4b4ee8b2dc54	Benjamin Smedberg — Bug 467579: --with-static-checking is broken in spidermonkey. There is currently no useful static checking infrastructure for spidermonkey, so disable it for the time being, r=jimb
67f8a5b06156	Peter Weilbacher — [OS/2] No Bug: add minor change and comment to gfxOS2FontGroup::FontCallback; fix debug output for missing fonts
6156d0a39763	Peter Weilbacher — Bug 466956: fix alias check in gfxFontconfigUtils::ResolveFontName for correct return value, r=karlt, sr=roc
bbf7d0e42c09	Arno Renevier — Fix npruntime sample compile problem, npupp.h -> npfunctions.h. b=464481 r=josh sr=jst
52488eb15168	Benjamin Smedberg — Change the stack-class analysis to a warning instead of an error, at least temporarily: the analysis was buggy when originally landed, and there are some heap-allocated autostrings outstanding through the tree.
211c2be2fa1e	Benjamin Smedberg — Bug 466492 - test for the existence of jar.mn in make, rather than in a shell script: this allows us to avoid launching the subshell in the common case where a jar.mn is not present r=ted
9f3807b5e936	Benjamin Smedberg — Bug 442012 - Allocating more than 2GB of memory in mozilla is never a good idea. On 64-bit systems PRSize and size_t are 64-bit and so truncation from PRSize to PRUint32 could cause weird behavior errors. Prevent these huge allocations. r=wtc sr=dveditz
dcd1373d1dff	Benjamin Smedberg — Bug 463420 - SIMPLE_PROGRAMS leads to bustage with generated.pdb r=ted
Most of those patches are trivially safe, not part of the build, or don't affect Windows. That leaves only:

ce72e9a5dca0    Michael Ventnor — Bug 458031. Take dirty rect into account to
limit box-shadow computation. r+sr=roc
1d84189da181    Robert Longson — Bug 465996. Use Ellipse instead of Arc to draw
circles. r+sr=roc
992000e45526    Robert O'Callahan — Bug 467283. Ignore dirty rect when doing
any image resampling --- it will lead to artifacts. r+sr=dbaron,r=vlad
aaeb20c61fca    Robert O'Callahan — Bug 455826. Don't reconstruct textruns just
because we deleted an empty nsContinuingTextFrame. r=smontagu
885dc81bc31b    Robert O'Callahan — Bug 442633. Detect removal of href
attribute on SVG <use> elements. r=longsonr,sr=mats

Of those, I'd guess bug 455826 as being the most risky.
The failures might be related to bug 467634, especially if the test is particularly resource intensive.
Backing out bug 455826 seems to have fixed it. Also, nthomas got a stack for a mochitest crash in bug 467150 that implicates the same patch.

I don't understand why this manifested as a timeout on Tinderbox. Maybe once I look more deeply into the bug, I'll figure that out.
Severity: normal → S3