<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 1

•

12 years ago

Attached patch skip another WebGL test on WinXP (obsolete) — Details — Splinter Review

So among the WebGL test pages run just before the OOM, the only one to do anything weird with huge 'size' arguments and also the last one run just before the OOM, is: conformance/more/functions/bufferSubDataBadArgs.html Indeed, it does: gl.bufferSubData(gl.ELEMENT_ARRAY_BUFFER, 2400000, new Uint16Array([1,2,3])); Now that shouldn't cause any buffer allocation, but it's not unconceivable that a buggy driver would OOM as a result of that call, and that's my only theory anyway.

Attachment #739763 - Flags: review?(jgilbert)

Comment 2

•

12 years ago

Have you tested that this fixes it? It would seem like this at-worst allocates ~2.4MB for the buffer.

Comment 3

•

12 years ago

I haven't tested it. You're right, we should push this to try and trigger M1 a lot of times.

https://tbpl.mozilla.org/?tree=Try&rev=373f4ce540fe

Comment 4

•

12 years ago

Reporter

Comment 5

•

12 years ago

Why does the Try run show quickCheckAPI-B2.html still running (and hanging)?

Comment 6

•

12 years ago

Ouch. Good catch. Needs debugging.

Comment 7

•

12 years ago

This try push, if I didn't make a syntax error, will tell us something in the logs. https://tbpl.mozilla.org/?tree=Try&rev=747c3aa9ef7f

Comment 8

•

12 years ago

Indeed, "testsToSkip = <html><head><title>404 Not Found</title></head><body><h1>404 Not Found</h1>", pushed https://hg.mozilla.org/integration/mozilla-inbound/rev/6cfec8d6ce46 as a bug 862520 followup.

Comment 9

•

12 years ago

Haha, yeah, that would do it :-) sorry about this omission.

Comment 10

•

12 years ago

Looks like now we want to skip quickCheckAPI-B3.html and bufferSubDataBadArgs.html.

Comment 11

•

12 years ago

bufferSubDataBadArgs is already under review here. Let me update the patch to add quickCheckAPI-B3.

Comment 12

•

12 years ago

Attached patch skip more tests — Details — Splinter Review

Attachment #739763 - Attachment is obsolete: true

Attachment #739763 - Flags: review?(jgilbert)

Attachment #744151 - Flags: review?(jgilbert)

Updated

•

12 years ago

Attachment #744151 - Flags: review?(jgilbert) → review+

Comment 13

•

12 years ago

Attached patch patch for landing — Details — Splinter Review

Please land this for me, inbound is closed now.

Attachment #744268 - Flags: review+

Attachment #744268 - Flags: checkin?

Reporter

Comment 14

•

12 years ago

Comment on attachment 744268 [details] [diff] [review] patch for landing https://hg.mozilla.org/integration/mozilla-inbound/rev/3b76894efafa

Attachment #744268 - Flags: checkin? → checkin+

Comment 15

•

12 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=22476141&tree=Mozilla-Inbound - quickCheckAPI-B4.html

Comment 16

•

12 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=22481103&tree=Mozilla-Inbound - deleteBufferBadArgs.html

Comment 17

•

12 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=22482553&tree=Mozilla-Inbound - quickCheckAPI-B4.html

Comment 18

•

12 years ago

So could we actually be leaking? (for some definition of 'leak') Does some accumulation of memory accrue while running the test suite, such that we really do hit an OOM?

Comment 19

•

12 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=22483709&tree=Mozilla-Inbound - quickCheckAPI-B4.html, whom I'm already ready to see the backside of.

Comment 20

•

12 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=22484721&tree=Mozilla-Inbound - deleteBufferBadArgs.html

https://hg.mozilla.org/mozilla-central/rev/3b76894efafa

Comment 21

•

12 years ago

Assignee: nobody → bjacob

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla23

Comment 22

•

12 years ago

This smells a lot like we're bailing water instead of plugging the leaks. If we split the conformance suite tests in half, I wouldn't be surprised to see these failures go away.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Whiteboard: [keep open]

Comment 23

•

12 years ago

Maybe "we" are leaking, but it's fairly likely to be a leak in a driver rather than our own --- this is mature code that runs well elsewhere and hasn't changed much recently --- so disabling tests may still be the only thing to do about it.

Comment 24

•

12 years ago

Attached patch skip yet more tests (obsolete) — Details — Splinter Review

Attachment #745260 - Flags: review?(jgilbert)

Comment 25

•

12 years ago

(In reply to Benoit Jacob [:bjacob] from comment #23) > Maybe "we" are leaking, but it's fairly likely to be a leak in a driver > rather than our own --- this is mature code that runs well elsewhere and > hasn't changed much recently --- so disabling tests may still be the only > thing to do about it. Maybe, but I think we're just going to end up disabling all tests after these ones, as they come back OOM one after the other. Would we still OOM if we did half the tests, killed the tab, then reloaded it to do the second half?

https://tbpl.mozilla.org/php/getParsedLog.php?id=22574566&tree=Mozilla-Inbound

Comment 26

•

12 years ago

Summary: WebGL mochitests cause frequent Windows XP mochitest-1 crashes → WinXP M1 "remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion." from WebGL mochitests

Comment hidden (Legacy TBPL/Treeherder Robot)

Comment 29

•

12 years ago

Comment 28 would be attachment 745260 [details] [diff] [review] moving the failure on to quickCheckAPI-C.html.

Comment hidden (Legacy TBPL/Treeherder Robot)

Comment 38

•

12 years ago

Final tally: out of 100, 8 did hit this, 6 in quickCheckAPI-C.html and 2 in quickCheckAPI-D_G.html, but on the other hand, twice I hit a complex of media/ timeouts that we almost never hit because you eat every single slow run, and twice I hit a complex of XHR timeouts that we've never hit before because you eat every single slow run. So I'm starting another try run with those two skipped as well, because all too often I've waited for the perfect through a couple thousand failures. If someone can write a patch that lets us run those tests without killing buildbot, sweet, all the more so if they write it while we're still young, but I'm not waiting.

Comment hidden (Legacy TBPL/Treeherder Robot)

Comment 46

•

12 years ago

Or, I guess I could just fix it.

Assignee: bjacob → philringnalda

Comment hidden (Legacy TBPL/Treeherder Robot)

Comment 51

•

12 years ago

Alas, those last two, in or around bufferSubDataBadArgs.html, mean I'm not actually fixing it, just making it much less frequent.

Assignee: philringnalda → nobody

Whiteboard: [keep open] → [leave open]

Target Milestone: mozilla23 → ---

Comment hidden (Legacy TBPL/Treeherder Robot)

Comment 54

•

12 years ago

Pushed https://hg.mozilla.org/integration/mozilla-inbound/rev/b109e2dbf03b to at least take us down from around 10% failure to around 1%. I didn't fix the comment to explain the double GC yet, because I haven't yet found anyone who will explain the why behind what I was cargo-culting from other uses.

Comment hidden (Legacy TBPL/Treeherder Robot)

Comment 57

•

12 years ago

And because the world is harsh and cruel, that's a deleteBufferBadArgs.html on my push.

https://hg.mozilla.org/mozilla-central/rev/b109e2dbf03b

Comment 58

•

12 years ago

Comment hidden (Legacy TBPL/Treeherder Robot)

Comment 60

•

12 years ago

(In reply to Jeff Gilbert [:jgilbert] from comment #25) > (In reply to Benoit Jacob [:bjacob] from comment #23) > > Maybe "we" are leaking, but it's fairly likely to be a leak in a driver > > rather than our own --- this is mature code that runs well elsewhere and > > hasn't changed much recently --- so disabling tests may still be the only > > thing to do about it. > > Maybe, but I think we're just going to end up disabling all tests after > these ones, as they come back OOM one after the other. We're still talking about bufferData / bufferSubData tests here. It's still possible that we're dealing with a driver issue that's limited to these functions. For that reason, I still believe that the present patch is worth a try. In later comments Philor also mentioned failures in other pages, but they are pages just after the pages we are disabling here, so it could be just a deferred effect of this OOM bug. > > Would we still OOM if we did half the tests, killed the tab, then reloaded > it to do the second half? I don't know how we would do that automatically on test slaves. Maybe xpcshell can do that, but that sounds like a lot of work.

Updated

•

12 years ago

Flags: needinfo?(jgilbert)

Comment hidden (Legacy TBPL/Treeherder Robot)

Reporter

Comment 81

•

12 years ago

Jeff - review ping. I know this isn't a great solution, but neither is the current situation.

Comment hidden (Legacy TBPL/Treeherder Robot)

Comment 122

•

12 years ago

Please can you disable some more tests asap (the patch awaiting review seems out of date given comment 54) - since this is soon going to have things disabled by sheriffs, who may not be so discerning as to what we switch off. Thank you :-)

Flags: needinfo?(bjacob)

Comment hidden (Legacy TBPL/Treeherder Robot)

Comment 124

•

12 years ago

So first of all, it seems that this changeset accidentally removed from the skip-list two of the three tests that we had added to that skip-list in attempts to fix this bug: https://hg.mozilla.org/mozilla-central/rev/b109e2dbf03b So I just landed on inbound a cset re-adding them to the skip-list: https://hg.mozilla.org/integration/mozilla-inbound/rev/a644a0c9f92d Other than that I don't see how the patch awaiting review here is obsoleted by comment 54, but looking at recent starrings, if we are going to take them at face value, they would seem to imply that we have to skip also some of the other *BadArgs tests, namely the texture-related ones. But it's hard to tell if that isn't just an artifact of having removed the above-mentioned two tests from the skip-list.

Flags: needinfo?(bjacob)

Comment 125

•

12 years ago

Attached patch add more tests to skip — Details — Splinter Review

We want to wait a little bit to see if my inbound push (previous comment), re-adding the two tests to the skip-list, is enough. If not, then this patch takes a heavy-handed approach to adding a bunch more tests to it, namely all the *BadArgs ones that pertain to buffers or textures.

Attachment #745260 - Attachment is obsolete: true

Attachment #745260 - Flags: review?(jgilbert)

Attachment #749862 - Flags: review?(jgilbert)

Comment 126

•

12 years ago

(In reply to Benoit Jacob [:bjacob] from comment #124) > Other than that I don't see how the patch awaiting review here is obsoleted > by comment 54 I meant more since the comment 54 changeset removed items from the disable list (as though the approach there was supposed to have fixed things), but that appears to have been an oversight now :-)

Comment 127

•

12 years ago

Thank you :-)

Comment 128

•

12 years ago

It wasn't an accidental removal, with an empty skiplist and 100 runs on try (multiple times, as I tried slightly different things, so say 400-500 runs) I was seeing zero quickCheckAPI failures.

Comment hidden (Legacy TBPL/Treeherder Robot)

https://hg.mozilla.org/mozilla-central/rev/a644a0c9f92d

Reporter

Comment 130

•

12 years ago

Comment hidden (Legacy TBPL/Treeherder Robot)

Comment 193

•

12 years ago

Comment on attachment 749862 [details] [diff] [review] add more tests to skip As much as I love disabling tests (and that's *really really a lot*), you don't need to bother with this. If you expand the tbplbot comments and look at the slave names, you'll see that every single one of them includes "r3-xp" and that none of them includes "xp32-ix". That means that this only happens on the old WinXP Mac mini slaves, and it doesn't happen on the new slaves. Just let it ride, the old slaves (and their insufficient memory or cruddy driver or whatever the actual problem is) will be gone before too much longer.

Attachment #749862 - Flags: feedback-

Comment hidden (Legacy TBPL/Treeherder Robot)

Carsten Book [:Tomcat]

Updated

•

12 years ago

Depends on: 925285

Updated

•

12 years ago

Whiteboard: [leave open]

Comment 249

•

12 years ago

Comment 248 was a mis-star. The others are on release branches etc, likely for the reason in comment 193.

Status: REOPENED → RESOLVED

Closed: 12 years ago → 12 years ago

Resolution: --- → WORKSFORME

Updated

•

12 years ago

Flags: needinfo?(jgilbert)