Closed
Bug 1019071
Opened 11 years ago
Closed 11 years ago
Frequent OS X 10.6 mochitest-4 timeouts
Categories
(Core :: General, defect)
Tracking
()
RESOLVED
FIXED
mozilla33
Tracking | Status | |
---|---|---|
firefox31 | --- | unaffected |
firefox32 | --- | fixed |
firefox33 | --- | fixed |
firefox-esr24 | --- | unaffected |
b2g-v1.4 | --- | fixed |
b2g-v2.0 | --- | fixed |
b2g-v2.1 | --- | fixed |
People
(Reporter: emorley, Assigned: dougt)
Details
Attachments
(3 files)
971 bytes,
patch
|
dougt
:
review+
|
Details | Diff | Splinter Review |
5.23 KB,
text/plain
|
Details | |
1.29 KB,
patch
|
garvan
:
review+
|
Details | Diff | Splinter Review |
Not sure what the pattern is here, but over the last week or two, there has been a distinct increase in the number of OS X timeouts, particularly in mochitest-4.
https://tbpl.mozilla.org/php/getParsedLog.php?id=40647707&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=40664803&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=40692307&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=40727732&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=40727039&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=40759316&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=40821611&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=40856317&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=40768304&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=40779603&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=40807498&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=40822348&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=40835000&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=40837845&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=40839708&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=40862348&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=40865152&tree=Mozilla-Inbound
Reporter | ||
Comment 1•11 years ago
|
||
Comment 2•11 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=40882522&tree=B2g-Inbound
Steven, as our resident OSX guru, does anything stand out to you in these stacks? :)
Flags: needinfo?(smichaud)
Comment 3•11 years ago
|
||
These are all on OS X 10.6.8. Has the 10.6-specific test machine setup changed recently?
Is the increase "distinct" enough to pin it down to a particular day?
Whereever there's a crash stack, you can tell that the app was killed -- it didn't crash on its own.
Flags: needinfo?(smichaud)
Comment 4•11 years ago
|
||
I highly doubt anything has changed with our OSX 10.6 test slaves in some time now :). But no, I don't see any recent OSX 10.6 changes in the RelEng notes from the past couple weeks.
OSX 10.6 has always been quite timeout-prone, so it's very difficult to pin down exactly when things took a turn for the worse. Going back through the history on m-c, the first M4 orange I see was on May 21.
Comment 5•11 years ago
|
||
I'm going to run retriggers from around May 21 timeframe to see if I can narrow this down a bit.
Comment 6•11 years ago
|
||
Comment 7•11 years ago
|
||
Looks like this is where things took a dive:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8e1ec45ff14d&tochange=83849e385943
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=Rev4%20MacOSX%20Snow%20Leopard%2010.6%20mozilla-inbound%20opt%20test%20mochitest-4&fromchange=7ba8c658dc58&tochange=83849e385943
I'm strongly leaning towards the WebCrypto patches being at fault given that a MIPS patch is NPOTB for OSX and there's nothing media-related on the stack to implicate bug 921622 (plus that's been uplifted to other branches without any hangs). I think the NSS guys are going to have to look into this next.
Flags: needinfo?(rlb)
Reporter | ||
Comment 10•11 years ago
|
||
Reporter | ||
Comment 11•11 years ago
|
||
Reporter | ||
Comment 12•11 years ago
|
||
Reporter | ||
Comment 13•11 years ago
|
||
Reporter | ||
Comment 14•11 years ago
|
||
Comment 15•11 years ago
|
||
Comment 16•11 years ago
|
||
Comment 17•11 years ago
|
||
Comment 18•11 years ago
|
||
Comment 19•11 years ago
|
||
Comment 20•11 years ago
|
||
Assigning this to myself.
I suspect that the issue is with the threading model used by WebCrypto -- every crypto operation creates a new, short-lived thread, and the WebCrypto mochitests do this several times. It seems plausible that that could be causing weirdness. In fact, I ran a test that did 4,096 iterations of digest() calls, and things did get weird. I'll try forcing things over to the main thread and see if that makes things better.
Assignee: nobody → rlb
Flags: needinfo?(rlb)
Comment 21•11 years ago
|
||
Comment 23•11 years ago
|
||
Well, I tried removing the threading issues, and it's still not happy.
https://tbpl.mozilla.org/?tree=Try&rev=aa814cb7a141
Any suggestions on how to try to debug this? Assuming it's the WebCrypto patches, I can't think of anything besides the threading that would have this kind of global impact.
I suppose I could try repeating the test with the individual patches added/removed. Might try that in the morning.
And anyway, who's still using 10.6? ;)
Flags: needinfo?(ryanvm)
Comment 24•11 years ago
|
||
Revert to rev 21d19c772468 (AudioStream rework):
https://tbpl.mozilla.org/?tree=Try&rev=1919cfc9f280
Revert to rev 7ea38414bb31 (Add WebCrypto WebIDL interfaces):
https://tbpl.mozilla.org/?tree=Try&rev=5c2f07af074f
Revert to rev 43114cb99480 (Ensure that NSS is initialzed for CryptoTasks):
https://tbpl.mozilla.org/?tree=Try&rev=bcc473802175
Revert to rev 739a582c405c (Add support for SHA-1 and SHA-2 digests to WebCrypto API):
https://tbpl.mozilla.org/?tree=Try&rev=11f9ee68e91a
Flags: needinfo?(ryanvm)
Reporter | ||
Comment 25•11 years ago
|
||
Reporter | ||
Comment 26•11 years ago
|
||
Reporter | ||
Comment 27•11 years ago
|
||
Comment 28•11 years ago
|
||
> And anyway, who's still using 10.6? ;)
About 25% of our current Mac users.
Comment 29•11 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #24)
> Revert to rev 21d19c772468 (AudioStream rework):
> https://tbpl.mozilla.org/?tree=Try&rev=1919cfc9f280
>
> Revert to rev 7ea38414bb31 (Add WebCrypto WebIDL interfaces):
> https://tbpl.mozilla.org/?tree=Try&rev=5c2f07af074f
>
> Revert to rev 43114cb99480 (Ensure that NSS is initialzed for CryptoTasks):
> https://tbpl.mozilla.org/?tree=Try&rev=bcc473802175
>
> Revert to rev 739a582c405c (Add support for SHA-1 and SHA-2 digests to
> WebCrypto API):
> https://tbpl.mozilla.org/?tree=Try&rev=11f9ee68e91a
These show a pretty convincing spike starting with rev 7ea38414bb31. Not sure what to make of even-bigger spike on rev 43114cb99480 that went away with rev 739a582c405c, though.
Comment 30•11 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #29)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #24)
> > Revert to rev 21d19c772468 (AudioStream rework):
> > https://tbpl.mozilla.org/?tree=Try&rev=1919cfc9f280
> >
> > Revert to rev 7ea38414bb31 (Add WebCrypto WebIDL interfaces):
> > https://tbpl.mozilla.org/?tree=Try&rev=5c2f07af074f
> >
> > Revert to rev 43114cb99480 (Ensure that NSS is initialzed for CryptoTasks):
> > https://tbpl.mozilla.org/?tree=Try&rev=bcc473802175
> >
> > Revert to rev 739a582c405c (Add support for SHA-1 and SHA-2 digests to
> > WebCrypto API):
> > https://tbpl.mozilla.org/?tree=Try&rev=11f9ee68e91a
>
> These show a pretty convincing spike starting with rev 7ea38414bb31. Not
> sure what to make of even-bigger spike on rev 43114cb99480 that went away
> with rev 739a582c405c, though.
Agreed. Going through that commit now, trying to isolate the problem. Stand by.
Comment 31•11 years ago
|
||
Reporter | ||
Comment 34•11 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=41106398&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=41109789&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=41111478&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=41116661&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=41117555&tree=Mozilla-Inbound
Reporter | ||
Comment 35•11 years ago
|
||
Reporter | ||
Comment 36•11 years ago
|
||
Reporter | ||
Comment 37•11 years ago
|
||
Updated•11 years ago
|
Assignee: rlb → nobody
Comment 38•11 years ago
|
||
OK, I think we're making progress on this. Richard ran a Try push with the dom/crypto tests disabled (but leaving the code itself in place) and M4 is solid green. What makes this interesting is that the dom/crypto tests don't run in M4! This immediately points to changes in test chunking. Given that we chunk-by-dir in mochitest-plain (meaning that only the addition/subtraction of test directories will affect what tests run in a given chunk) and dom/crypto runs in M3, it's a plausible theory.
And in fact, if I compare a before and after for M4, I can see that dom/geolocation was pushed from the end of M3 to the beginning of M4! What I don't know is whether this is an issue with the Geolocation tests themselves or a consequence of just running more stuff overall in M4. And even if it was an issue with the Geolocation tests not cleaning up or something, it stands to reason that we wouldn't have seen problems before with them running at the end of the chunk.
Doug, can you please help with getting someone to look into what might be going on here? This is a very frequent and difficult to star orange, so any attention is greatly appreciated :)
Flags: needinfo?(dougt)
Comment 39•11 years ago
|
||
Comment 40•11 years ago
|
||
Comment 41•11 years ago
|
||
Comment 42•11 years ago
|
||
Comment 43•11 years ago
|
||
I have seen failures when running mochitest --repeat n with the geo mochitests, where the mochitest http server would stop responding. I am having trouble reproducing this ATM.
Is this a correct interpretation of the log: after the last geo test [test_worseAccuracyDoesNotBlockCallback.html], one of the geo tests appears to still be running and calling the mochitest http server (i.e. one of the geo mochitest web pages staying open).
Reporter | ||
Comment 45•11 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=41169917&tree=Mozilla-Central
https://tbpl.mozilla.org/php/getParsedLog.php?id=41201153&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=41199355&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=41185545&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=41185852&tree=Mozilla-Inbound
Reporter | ||
Comment 46•11 years ago
|
||
Reporter | ||
Comment 47•11 years ago
|
||
Reporter | ||
Comment 48•11 years ago
|
||
Comment 49•11 years ago
|
||
comment 43: there is no issue here, the geo provider gets shut down normally after 5 seconds of being idle.
Comment 51•11 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=41251957&tree=Fx-Team
https://tbpl.mozilla.org/php/getParsedLog.php?id=41267655&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=41252284&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=41275296&tree=Mozilla-Central
Comment 52•11 years ago
|
||
Comment 53•11 years ago
|
||
Comment 54•11 years ago
|
||
Comment 55•11 years ago
|
||
Comment 56•11 years ago
|
||
Comment 57•11 years ago
|
||
Reporter | ||
Comment 58•11 years ago
|
||
Reporter | ||
Comment 59•11 years ago
|
||
Reporter | ||
Comment 60•11 years ago
|
||
Reporter | ||
Comment 61•11 years ago
|
||
Reporter | ||
Comment 62•11 years ago
|
||
Reporter | ||
Comment 63•11 years ago
|
||
Comment 64•11 years ago
|
||
Comment 65•11 years ago
|
||
Comment 66•11 years ago
|
||
Comment 67•11 years ago
|
||
Comment 68•11 years ago
|
||
Comment 69•11 years ago
|
||
Reporter | ||
Comment 70•11 years ago
|
||
Reporter | ||
Comment 71•11 years ago
|
||
Reporter | ||
Comment 72•11 years ago
|
||
Reporter | ||
Comment 73•11 years ago
|
||
Reporter | ||
Comment 74•11 years ago
|
||
Reporter | ||
Comment 75•11 years ago
|
||
Some of the recent timeouts (was trying to see if any that came up repeatedly that we could file and so not have to manually star):
/tests/content/base/test/test_document_constructor.html | Test timed out.
/tests/dom/imptests/html/dom/nodes/test_getElementsByClassName-02.htm | application timed out after 330 seconds with no output
/tests/dom/tests/mochitest/localstorage/test_embededNulls.html | Test timed out.
/tests/dom/tests/mochitest/localstorage/test_localStorageKeyOrder.html | application timed out after 330 seconds with no output
/tests/dom/tests/mochitest/storageevent/test_storageSessionStorageEventCheckNoPropagation.html | Test timed out.
/tests/dom/tests/mochitest/whatwg/test_postMessage_origin.xhtml | Test timed out.
/tests/dom/workers/test/test_errorwarning.html | Test timed out.
/tests/dom/workers/test/test_eventDispatch.html | Test timed out.
/tests/dom/workers/test/test_instanceof.html | Test timed out.
/tests/dom/workers/test/test_url.html | Test timed out.
/tests/dom/workers/test/test_xhr2.html | Test timed out.
/tests/dom/xbl/test/test_bug591198.html | Test timed out.
/tests/dom/xslt/tests/mochitest/test_bug551654.html | Test timed out.
/tests/editor/libeditor/base/tests/test_bug586662.html | Test timed out.
/tests/editor/libeditor/html/tests/test_bug414526.html | Test timed out.
/tests/editor/libeditor/html/tests/test_bug550434.html | Test timed out.
/tests/editor/libeditor/html/tests/test_bug599322.html | Test timed out.
/tests/embedding/test/test_nsFind.html | application timed out after 330 seconds with no output
/tests/js/jsd/test/test_bug507448.html | application timed out after 330 seconds with no output
/tests/js/xpconnect/tests/mochitest/test_bug772288.html | Test timed out.
/tests/js/xpconnect/tests/mochitest/test_bug772288.html | Test timed out.
/tests/layout/base/tests/test_bug435293-scale.html | Test timed out.
/tests/layout/base/tests/test_bug441782-4e.html | Test timed out.
/tests/layout/base/tests/test_bug467672-1a.html | Test timed out.
/tests/layout/base/tests/test_bug467672-2g.html | Test timed out.
/tests/layout/base/tests/test_bug467672-2g.html | Test timed out.
/tests/layout/base/tests/test_bug467672-3g.html | Test timed out.
/tests/layout/base/tests/test_bug467672-4f.html | Test timed out.
/tests/layout/base/tests/test_bug570378-arabic-5d.html | Test timed out.
/tests/layout/base/tests/test_bug570378-persian-2b.html | Test timed out.
/tests/layout/base/tests/test_bug570378-persian-2e.html | Test timed out.
/tests/layout/base/tests/test_bug570378-persian-5f.html | Test timed out.
/tests/layout/base/tests/test_bug582771.html | Test timed out.
/tests/layout/base/tests/test_bug749186.html | Test timed out.
/tests/layout/generic/test/test_bug288789.html | application timed out after 330 seconds with no output
Reporter | ||
Comment 76•11 years ago
|
||
Have added some to the summary of bug 928309.
Reporter | ||
Comment 77•11 years ago
|
||
Reporter | ||
Comment 78•11 years ago
|
||
Reporter | ||
Comment 79•11 years ago
|
||
636 INFO TEST-UNEXPECTED-FAIL | /tests/dom/workers/test/test_dataURLWorker.html | Test timed out.
https://tbpl.mozilla.org/php/getParsedLog.php?id=41442167&tree=Mozilla-Inbound
Reporter | ||
Comment 80•11 years ago
|
||
TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/bugs/test_bug593174.html | application timed out after 330 seconds with no output
https://tbpl.mozilla.org/php/getParsedLog.php?id=41450592&tree=Mozilla-Central
Comment 81•11 years ago
|
||
Comment 82•11 years ago
|
||
Reporter | ||
Comment 83•11 years ago
|
||
1691 INFO TEST-UNEXPECTED-FAIL | /tests/gfx/tests/mochitest/test_acceleration.html | Test timed out.
https://tbpl.mozilla.org/php/getParsedLog.php?id=41494179&tree=Mozilla-Central
Reporter | ||
Comment 84•11 years ago
|
||
1836 INFO TEST-UNEXPECTED-FAIL | /tests/intl/uconv/tests/test_bug959058-1.html | Test timed out.
https://tbpl.mozilla.org/php/getParsedLog.php?id=41508411&tree=Mozilla-Inbound
2031 INFO TEST-UNEXPECTED-FAIL | /tests/js/xpconnect/tests/mochitest/test_bug655297-2.html | Test timed out.
https://tbpl.mozilla.org/php/getParsedLog.php?id=41491269&tree=Mozilla-Inbound
1878 INFO TEST-UNEXPECTED-FAIL | /tests/intl/uconv/tests/test_utf8_overconsumption.html | Test timed out.
https://tbpl.mozilla.org/php/getParsedLog.php?id=41514301&tree=Mozilla-Inbound
Reporter | ||
Comment 85•11 years ago
|
||
Reporter | ||
Comment 86•11 years ago
|
||
782 INFO TEST-UNEXPECTED-FAIL | /tests/dom/workers/test/test_sharedWorker.html | Test timed out.
https://tbpl.mozilla.org/php/getParsedLog.php?id=41532009&tree=Mozilla-Inbound
Reporter | ||
Comment 87•11 years ago
|
||
2036 INFO TEST-UNEXPECTED-FAIL | /tests/js/xpconnect/tests/mochitest/test_bug661980.html | Test timed out.
https://tbpl.mozilla.org/php/getParsedLog.php?id=41534086&tree=Mozilla-Inbound
Comment 88•11 years ago
|
||
Comment 89•11 years ago
|
||
As of today, m-c tip is currently timing out ~16% of the time on mochitest-4. Per the Try link below, skipping Geolocation resulted in 50/50 runs being green.
https://tbpl.mozilla.org/?tree=Try&rev=cb49372a7ec6
Per the job disabling policy, I'm to wait 2 business days before disabling tests after giving notification, so if I don't hear anything by Friday, I will go ahead push this patch.
https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy
Attachment #8438665 -
Flags: review?(dougt)
Assignee | ||
Comment 91•11 years ago
|
||
Comment on attachment 8438665 [details] [diff] [review]
Skip Geolocation tests on OSX
i'll back this out when we figure out what the heck is going one.
Attachment #8438665 -
Flags: review?(dougt) → review+
Flags: needinfo?(dougt)
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → dougt
Comment 92•11 years ago
|
||
Keywords: leave-open
Whiteboard: [tests disabled on OSX]
Assignee | ||
Comment 94•11 years ago
|
||
something might be funny with ipc:
09:52:49 INFO - 113 INFO TEST-START | /tests/dom/tests/mochitest/localstorage/test_appIsolation.html
09:52:49 INFO - ############################### browserElementPanning.js loaded
09:52:49 INFO - ######################## BrowserElementChildPreload.js loaded
09:52:49 INFO - JavaScript error: chrome://browser/content/tabbrowser.xml, line 3230: tab is null
Reporter | ||
Comment 96•11 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #89)
> As of today, m-c tip is currently timing out ~16% of the time on
> mochitest-4. Per the Try link below, skipping Geolocation resulted in 50/50
> runs being green.
> https://tbpl.mozilla.org/?tree=Try&rev=cb49372a7ec6
Great spot Ryan! :-)
Comment 97•11 years ago
|
||
Doug, you asked about mochitest repeat,
I ran mochitest --repeat 2, on my OS X 10.9.3.
Here is the output, failures in tests/content/canvas and timeouts in
tests/content/html/content.
It took 2 hours, I could run overnight with more repeats, I just have to remember to do so.
Changing the timing on the requests from 200ms to longer (as ggp mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=957580), may magically work around the problem. That particular bug indicates that the mochitest httpd has a bug, and stops responding. I realize that there is no output in the logs that indicate this is the problem (it would be easy to see in the log if the mochitest http stopped responding).
Comment 98•11 years ago
|
||
Landed on Aurora too.
https://hg.mozilla.org/releases/mozilla-aurora/rev/f978315a6f2e
status-firefox31:
--- → unaffected
status-firefox32:
--- → disabled
status-firefox33:
--- → disabled
status-firefox-esr24:
--- → unaffected
Assignee | ||
Comment 99•11 years ago
|
||
Assignee | ||
Comment 100•11 years ago
|
||
Attachment #8440135 -
Flags: review+
Assignee | ||
Comment 101•11 years ago
|
||
we need to backout 9ccdc477061e, and land attachment 8440135 [details] [diff] [review].
Assignee | ||
Comment 102•11 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/79c6d543f29f
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/b6ab581b1547
Keywords: checkin-needed
Comment 103•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/79c6d543f29f
https://hg.mozilla.org/mozilla-central/rev/b6ab581b1547
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Comment 104•11 years ago
|
||
This is test-only, so I'll get it uplifted to Aurora tomorrow if nobody beats me to it.
Whiteboard: [tests disabled on OSX]
Comment 105•11 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•