Note that we've also had other issues on Linux with the -O3 non-PGO builds... glandium might recall details.
We've had float rounding issues, but nothing crashy that I remember.
BTW, non-PGO should build with -Os on aurora by now.
(In reply to Mike Hommey [:glandium] from comment #3)
> BTW, non-PGO should build with -Os on aurora by now.

Confirmed, that was bug 655003.
Not quite literally permaorange on m-b: there have been three builds. The two that the scrape says were clobbers hit this; the one that doesn't say it was a clobber did not.
So, Luke pushed https://hg.mozilla.org/integration/mozilla-inbound/rev/4c0e6b3f7791, removing a totally empty header from the JS engine, and browser_webconsole_bug_595934_message_categories.js went permaorange on 10.6 opt.

The only thing I've ever seen behave in that same senseless, impossible-in-a-reasonable-world way is code accessing memory it totally shouldn't be touching, where a nothing change to the JS engine suddenly affects a test that couldn't possibly be affected. A previous case with a11y only happened in some PGO'd builds, when PGO happened to move things around in a way that exposed the badness; apparently the Aurora Linux case is the opposite, where PGO covers our tracks but non-PGO leaves us busted.

Given that the Aurora case involves both devtools and workers, does that mean both of them are misbehaving, or is the devtools test being bitten by bad workers? Would running these tests under Valgrind lead to it shrieking in horror at the things we're doing?
The devtools hang is the only devtools test that uses a worker, so I wouldn't be too suspicious of the devtools code. The way these two failing tests work is that they register an error console listener and then use a worker to generate an error. The tests wait until they see the error appear in the console before continuing. Sometimes they randomly time out without anything else appearing to fail (at least, I don't see anything else in the logs).

I'm not ready to blame the worker code yet, though. I've looked over the worker error-generation code a bunch now and can't see anything amiss. On a whim I decided to take a look at the error console listener implementation, and I ran across this comment (nsConsoleService.cpp#161):

  * Iterate through any registered listeners and tell them about
  * the message. We use the mListening flag to guard against
  * recursive message logs. This could sometimes result in
  * listeners being skipped because of activity on other threads,
  * when we only care about the recursive case.

That seems pretty bad. If we're unlucky enough to have an error reported on another thread immediately before the worker code generates the error that the tests are waiting on, we'll miss the report and the test will never finish. I'll see what we can do to fix this behavior (or the tests as a bandaid, maybe by polling the console message list every second or so).

Basically this problem boils down to a bad test mechanism, not a bug in workers. However, I don't think we can simply wave this off, because of the crashes, and I'm pretty sure the crashes are entirely due to the console service using proxies of JS objects to make its listeners get called back on the correct thread. Proxies are no longer safe to use from JS on other threads, so this is a big red flag.

In short, I blame the console service for all my woes ;) CC'ing luke and bsmedberg since they have a plan to nuke proxies and may have a patch to fix that part. Anyone think my hunches are bonkers?
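To make the race concrete, here is a minimal C++ sketch of the guard pattern that comment describes. This is an illustrative assumption, not the actual nsConsoleService code: the class, method, and member names are invented and the real implementation differs. The point is that a single mListening flag shared by all threads cannot distinguish "a listener recursively logged a message on this thread" from "another thread happens to be mid-notification":

  // Hypothetical sketch only: names and structure are illustrative
  // assumptions, not the real nsConsoleService.cpp. A single boolean,
  // shared by every thread, guards against re-entrant notification.
  #include <functional>
  #include <mutex>
  #include <string>
  #include <vector>

  class ConsoleService {
  public:
    using Listener = std::function<void(const std::string&)>;

    void RegisterListener(Listener aListener) {
      std::lock_guard<std::mutex> guard(mLock);
      mListeners.push_back(std::move(aListener));
    }

    void LogMessage(const std::string& aMessage) {
      {
        std::lock_guard<std::mutex> guard(mLock);
        mMessages.push_back(aMessage);
        if (mListening) {
          // Intended to catch a listener logging while being notified on
          // this same thread. But the flag is process-wide, so it also
          // fires when a *different* thread is mid-notification: aMessage
          // is stored yet never delivered to any listener.
          return;
        }
        mListening = true;
      }
      // (Locking around the listener loop is elided for brevity.)
      for (Listener& listener : mListeners) {
        listener(aMessage);  // may re-enter LogMessage(); the flag stops that
      }
      std::lock_guard<std::mutex> guard(mLock);
      mListening = false;
    }

  private:
    std::mutex mLock;
    bool mListening = false;  // shared across threads: the source of the race
    std::vector<std::string> mMessages;
    std::vector<Listener> mListeners;
  };

With this pattern, if thread A is inside the listener loop when thread B calls LogMessage(), thread B sees mListening as true, stores its message, and returns without notifying anyone, so a test waiting for that notification times out. A per-thread guard, or a test that polls the stored message list instead of relying on delivery (the bandaid suggested above), would avoid mistaking cross-thread activity for recursion.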
Inconveniently, backing out Luke's push did not fix inbound's permaorange. His 10.6 opt build was a clobber, the two before it were not (and the four before those were burning), so either of those two pushes, or Waldo's backout not really backing everything out, remains a possibility if the effect only showed up after a clobber; so does some releng change. Not sure what part of that, if any, fits with blaming the console service.
(In reply to ben turner [:bent] from comment #85)
> Anyone think my hunches are bonkers?

Well, I don't get it, but that's not an unusual thing for me. How does the theory that an error sometimes gets reported on another thread right before yours fit the evidence? In Linux32 PGO builds on mozilla-aurora it never ever happens; in Linux32 non-PGO builds on aurora it happens every single time for both tests; and in Linux32 non-PGO builds on mozilla-beta it happens every single time for the devtools test only, not the workers test. And what part of sfink's pushes caused an error to be reported on another thread right before yours, every single time, only on 10.6 opt?
(In reply to Phil Ringnalda (:philor) from comment #90)

Dunno, race conditions like this are always pretty strange, subject to weird tiny variations in timing and whatever OS scheduler you're dealing with. I'm reasonably certain this will go away once bug 675221 removes the proxies and makes the listener logic more sane. In the meantime I believe that we're just going to disable this test.
Created attachment 566979 [details] [diff] [review]
Patch to disable console listener tests

This is rs=sicking.
Comment on attachment 566979 [details] [diff] [review]
Patch to disable console listener tests

This just disables the small parts of the two tests that are problematic. The real fix will land on trunk soon-ish, but that patch isn't really suitable for the branches. Disabling these test pieces in the interim seems the smartest path forward.
Can we turn this off on aurora, or is this a real bug?
OK, per comment #111, we can create a patch to turn off the test on aurora. We'll do the work in this bug until it lands.
And, since I'm talking to myself in this bug now: can we get approval to land BenT's patch? :)
I think we leave this FIXED, since it has landed on trunk, and then wait for them to approve the patch for aurora.
Maybe so; I just wanted the aurora-approver-gods to see that there was still something to be done here. Hopefully my incessant prattling will convince them of that! :)
Comment on attachment 566979 [details] [diff] [review]
Patch to disable console listener tests

Make it so!
Heads up: having revised the infrastructure for console-monitoring tests, I turned these tests back on in bug 663291 (specifically, https://hg.mozilla.org/integration/mozilla-inbound/rev/b36eaac9ecf8). I may not have gotten all the race conditions.
... and they were backed out again because yep, still race conditions. Filed new bug 812714 to get them fixed properly.