Open Bug 1167155 Opened 5 years ago Updated Last year

Intermittent Win7 any test in writing-mode/abspos/ like s71-abs-pos-non-replaced-vrl-078-ref.xht | load failed: timed out waiting for test to complete (waiting for onload scripts to complete)

Categories

(Core :: Graphics, defect, P3)

Unspecified
Windows 7
defect

Tracking

Tracking Status
firefox44 --- fixed

People

(Reporter: cbook, Unassigned)

Details

(Keywords: intermittent-failure, leave-open, Whiteboard: [gfx-noted][tests disabled on Win7])

Windows 7 32-bit mozilla-inbound opt test reftest-no-accel

https://treeherder.mozilla.org/logviewer.html#?job_id=10037911&repo=mozilla-inbound

02:33:14 INFO - REFTEST TEST-UNEXPECTED-FAIL | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/writing-mode/abspos/s71-abs-pos-non-replaced-vrl-078-ref.xht | load failed: timed out waiting for test to complete (waiting for onload scripts to complete)
This is a recently added test (May 19) in bug 1079151.

The error "waiting for onload scripts to complete" is a bit odd considering
there are no onload scripts in the test/reference.  There's no script at all
AFAICT.
Blocks: 1079151
Keywords: regression
Yes, I don't see how the error message can possibly be true!

However, looking in the log immediately preceding the failure, there's a suspicious-looking gfx failure:

02:28:14 INFO - REFTEST TEST-START | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/writing-mode/abspos/s71-abs-pos-non-replaced-vrl-078.xht
02:28:14 INFO - REFTEST TEST-LOAD | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/writing-mode/abspos/s71-abs-pos-non-replaced-vrl-078.xht | 12040 / 12432 (96%)
02:28:14 INFO - REFTEST TEST-LOAD | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/writing-mode/abspos/s71-abs-pos-non-replaced-vrl-078-ref.xht | 12040 / 12432 (96%)
02:28:14 INFO - ??????????p: N[GFX1]: Failed to create similar cairo surface! Size: Size(800,1000) Status: 1
02:28:14 INFO - JavaScript error: chrome://reftest/content/reftest.js, line 1393: NS_ERROR_FAILURE:
02:28:14 INFO - JavaScript error: chrome://reftest/content/reftest-content.js, line 995: TypeError: ret is undefined
02:33:14 INFO - REFTEST TEST-UNEXPECTED-FAIL | file:///C:/slave/test/build/tests/reftest/tests/layout/reftests/writing-mode/abspos/s71-abs-pos-non-replaced-vrl-078-ref.xht | load failed: timed out waiting for test to complete (waiting for onload scripts to complete) 

My guess is that the reported failure is really fallout from an internal gfx problem, perhaps lack of resources of some kind?
I agree, the root cause is likely the GFX failure:
http://mxr.mozilla.org/mozilla-central/source/gfx/2d/DrawTargetCairo.cpp?rev=8538bc4d2cbd#1518
No longer blocks: 1079151
Component: Layout → Graphics
Keywords: regression
(In reply to Jonathan Kew (:jfkthame) from comment #3)
> 02:28:14 INFO - ??????????p: N[GFX1]: Failed to create similar cairo
> surface! Size: Size(800,1000) Status: 1
> 02:28:14 INFO - JavaScript error: chrome://reftest/content/reftest.js, line
> 1393: NS_ERROR_FAILURE:
> 02:28:14 INFO - JavaScript error:
> chrome://reftest/content/reftest-content.js, line 995: TypeError: ret is
> undefined

These seem like the real problem.  The second JS error is presumably in this code in SendInitCanvasWithSnapshot:
    var ret = sendSyncMessage("reftest:InitCanvasWithSnapshot")[0];

    gHaveCanvasSnapshot = ret.painted;
    return ret.painted;

I'm not sure where the first JS error is, but it seems likely that it is the result of the "Failed to create similar cairo surface" error, which is in DrawTargetCairo::CreateSimilarDrawTarget.
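
To illustrate how the first error could lead to the second, here is a minimal standalone sketch. It assumes (this is not confirmed by the log) that when the receiving side throws, the sync message comes back with no reply, so indexing [0] yields undefined. The makeSendSyncMessage stub and the guarded variant are hypothetical and are not part of reftest-content.js.

```javascript
// Hypothetical stand-in for the message manager call. It models
// "no listener produced a reply" as an empty array (an assumption).
function makeSendSyncMessage(replies) {
  return function sendSyncMessage(name) {
    return replies;
  };
}

// Same shape as the quoted code: [0] on an empty array is undefined.
function initCanvasUnguarded(sendSyncMessage) {
  var ret = sendSyncMessage("reftest:InitCanvasWithSnapshot")[0];
  return ret.painted; // throws TypeError when ret is undefined
}

// Hypothetical guarded variant: treat a missing reply as "not painted".
function initCanvasGuarded(sendSyncMessage) {
  var ret = sendSyncMessage("reftest:InitCanvasWithSnapshot")[0];
  if (ret === undefined) {
    return false;
  }
  return ret.painted;
}

// Exercise both variants against a failing and a working reply.
function demo() {
  var failing = makeSendSyncMessage([]);
  var working = makeSendSyncMessage([{ painted: true }]);
  var threw = false;
  try {
    initCanvasUnguarded(failing);
  } catch (e) {
    threw = e instanceof TypeError;
  }
  return {
    unguardedThrew: threw,
    guardedOnFailure: initCanvasGuarded(failing),
    guardedOnSuccess: initCanvasGuarded(working)
  };
}
```

A guard like this would only turn the TypeError into a cleaner failure path; it would not address the underlying surface allocation failure.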
Blocks: 1168401
Whiteboard: [gfx-noted]
Blocks: 1079151
My theory is that this or some other bug does still block bug 1079151, because no matter whose fault it is that these 96 tests fail to run on Win no-accel, it's still significant to that bug that its 96 (or 192, or whatever) tests are going to get disabled if nobody does anything about making them not fail.
Component: Graphics → Layout: Text
Wups, didn't mean to move the bug, just to make threatening noises at it.
Component: Layout: Text → Graphics
See also bugs 1169165, 1171767, 1172376, and 1174379, which are the same thing happening in slightly different tests. (Even in this bug report, the logs show various different tests from the set being affected.)

This looks like a graphics failure of some kind (see comments 3-5), and these tests just happen to come at roughly the point in the overall test run where something in gfx finally rolls over and dies.

Milan, any chance someone could look into this from the Windows/gfx side?
Flags: needinfo?(milan)
See Also: → 1169165, 1171767, 1172376, 1174379
Bug 1019563 is the original bug for the Win7 Ru "onload" failures, FWIW. I'm beginning to suspect that this issue is tied to OOM conditions (Win7 32-bit tends to be one of our canary platforms for such things). It's happening really frequently on PGO builds these days, though :(
The failure certainly could be from OOM.  If this test were run on a debug build, that error message would assert, and I guess we're not seeing that?  Anyway, let's see the stack when it crashes:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=53ef539cb54e
Flags: needinfo?(milan)
Given the frequency, I'm likely going to have to hide Ru on Win7 pgo before too long. Bug 1019563 is still collecting the bulk of the stars and you can see how frequent it remains. Can we please get some eyes on this soon?
Flags: needinfo?(milan)
See Also: → 1019563
Not sure it's graphics, given comment 5, but let's see if we can gain traction.  BenWa, can you take a look?
Flags: needinfo?(milan) → needinfo?(bgirard)
Blocks: 1171767
Blocks: 1172376
Blocks: 1220784
Blocks: 1224858
Keywords: leave-open
Whiteboard: [gfx-noted] → [gfx-noted][tests disabled on Win7]
Which disables the entire writing-mode/abspos/ directory on Win7, since they've started randomly timing out the same way in accelerated reftests as well, on trunk and on aurora, and I'm bored with being the only one who knows enough about the problem to actually star them. On the bright side, we didn't add any other bustage to Ru on Win7 during the months it was hidden.
Summary: Intermittent s71-abs-pos-non-replaced-vrl-078-ref.xht | load failed: timed out waiting for test to complete (waiting for onload scripts to complete) → Intermittent Win7 any test in writing-mode/abspos/ like s71-abs-pos-non-replaced-vrl-078-ref.xht | load failed: timed out waiting for test to complete (waiting for onload scripts to complete)
Duplicate of this bug: 1226370
Duplicate of this bug: 1220784
Duplicate of this bug: 1172376
Duplicate of this bug: 1171767
Duplicate of this bug: 1174379
Duplicate of this bug: 1169165
Duplicate of this bug: 1227410
Duplicate of this bug: 1224858
Pushed by mchang@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6f6ae4f2c283
Skip reftest/writing-mode/abspos/ on Windows 7. r=philor
Duplicate of this bug: 1299127
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Duplicate of this bug: 1299303
Flags: needinfo?(bgirard)
I suspect this may have just transformed the problem into bug 1300355.