Closed Bug 1140394 Opened 6 years ago Closed 6 years ago

b2g desktop reftest suite reporting WARNING when all tests were successfully executed but shutdown spew is interleaved in pass/fail/todo log line

Categories

(Testing :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(firefox40 fixed)

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: gerard-majax, Assigned: gerard-majax)

References

(Depends on 1 open bug)

Details

(Whiteboard: [systemsfe])

Attachments

(3 files)

I ran into this intermittent issue when working on mulet reftest. One try that exposes the issue is: https://treeherder.mozilla.org/#/jobs?repo=try&revision=13e6c3ec48da&exclusion_profile=false
Blocks: 1094369
In effect, that's a test of how quickly your product shuts down, and congratulations, it's intermittently getting faster.

The decision about whether the job step passed or failed comes from looking in the log for the pass/fail/todo line in a particular format (literally, matching a particular regex), and that format is not

13:55:10     INFO -  REFTEST INFO | UnexpectREFTEST INFO | b2g_desktop.py | Running tests: end.
13:55:10     INFO -  ed: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)

like it is in the failure cases. As was the case with bug 967647, some one of the multitude of harness layers isn't buffering output like it ought to, so writing the summary and writing shutdown spew get intermingled when the shutdown spew comes faster than it usually does. What that layer is, dunno, though I'd bet it's named mozsomething.
Component: Mozharness → General
Product: Release Engineering → Testing
QA Contact: jlund
Summary: Reftest suite reporting WARNING when all tests were successfully executed → b2g desktop reftest suite reporting WARNING when all tests were successfully executed but shutdown spew is interleaved in pass/fail/todo log line
Thanks for the explanation!
Phil, there are data about this on https://treeherder.allizom.org/#/jobs?repo=try&revision=7b52379388d1. I don't know if it's luck or not, but R2 and R4 are not impacted, and R5 and R6 have only one. R1 and R3 are more impacted.
Flags: needinfo?(philringnalda)
Flags: needinfo?(philringnalda)
This would be fixed by bug 798300 (but we need to fix another issue before we can re-land that).
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> This would be fixed by bug 798300 (but we need to fix another issue before
> we can re-land that).

Thanks ted. Any way I can help speeding up those ?
Flags: needinfo?(ted)
Depends on: 798300
bug 798300 comment 18 describes the fix needed. It's straightforward but unfortunately maybe not simple.
Flags: needinfo?(ted)
Bug 798300 is about changing the default behavior of mozprocess. If this is
Sorry, entered that too early.

If this is acutely a problem for b2g reftests, it looks like we could pass arguments from the b2g destkop runner to mozprocess to split the stdout/stderr streams. These tests aren't using the shutdown leaks detector so would not require the same invasive changes discussed there.
Looking at b2g_desktop.py and comparing with comment 5, the interleaving looks like it's between python sources (https://hg.mozilla.org/mozilla-central/annotate/dd32e3ff3717/layout/tools/reftest/b2g_desktop.py#l107 and https://hg.mozilla.org/mozilla-central/annotate/dd32e3ff3717/layout/tools/reftest/b2g_desktop.py#l164). If these are in different threads that could be the source of the problem, and sharing a lock in this file would be enough to fix it.
(In reply to Chris Manchester [:chmanchester] from comment #13)
> Looking at b2g_desktop.py and comparing with comment 5, the interleaving
> looks like it's between python sources
> (https://hg.mozilla.org/mozilla-central/annotate/dd32e3ff3717/layout/tools/
> reftest/b2g_desktop.py#l107 and
> https://hg.mozilla.org/mozilla-central/annotate/dd32e3ff3717/layout/tools/
> reftest/b2g_desktop.py#l164). If these are in different threads that could
> be the source of the problem, and sharing a lock in this file would be
> enough to fix it.

Thanks, I'm going to give this a try.
(In reply to Chris Manchester [:chmanchester] from comment #13)
> Looking at b2g_desktop.py and comparing with comment 5, the interleaving
> looks like it's between python sources
> (https://hg.mozilla.org/mozilla-central/annotate/dd32e3ff3717/layout/tools/
> reftest/b2g_desktop.py#l107 and
> https://hg.mozilla.org/mozilla-central/annotate/dd32e3ff3717/layout/tools/
> reftest/b2g_desktop.py#l164). If these are in different threads that could
> be the source of the problem, and sharing a lock in this file would be
> enough to fix it.

Andrew, it's in code you hacked, do you want to fix it? :)
Flags: needinfo?(ahalberstadt)
Chris, I have cross checked and the interleaving seems not to be like you said, but I see:
> 20:37:56     INFO -  REFTEST INFO | Successful: 1801 (1798 pass, 3 load only)
> 20:37:57     INFO -  REFTEST INFO | UnexpectREFTEST INFO | b2g_desktop.py | Running tests: end.
> 20:37:57     INFO -  ed: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)
> 20:37:57     INFO -  REFTEST INFO | Known problems: 261 (23 known fail, 0 known asserts, 22 random, 216 skipped, 0 slow)

The part "REFTEST INFO | b2g_desktop.py | Running tests: end." seems to be coming from https://hg.mozilla.org/mozilla-central/annotate/dd32e3ff3717/layout/tools/reftest/b2g_desktop.py#l107 as you suggested.

But the part "REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)" (with the interleaving fixed) does seems to come from https://hg.mozilla.org/mozilla-central/annotate/dd32e3ff3717/layout/tools/reftest/reftest.js#l1367
Chris, am I missing something in comment 16 or the issue you pointed at in comment 13 is unrelated ?
Flags: needinfo?(cmanchester)
(In reply to Alexandre LISSY :gerard-majax from comment #17)
> Chris, am I missing something in comment 16 or the issue you pointed at in
> comment 13 is unrelated ?

I don't have first hand knowledge of this implementation, but the way this is usually done is the python harness reads output from the firefox process (managed with mozprocess) and forwards it to its own stdout or another log file. The line in reftest.js is delivering the summary to the firefox process' stdout, which would then be passed to the on_output callback I linked in comment 13. Python's print isn't thread safe, so if mozprocess is calling this callback on another thread, and I think it is, this could be our interleaving scenario. I think it's worth testing, but maybe Andrew has another take on this.
Flags: needinfo?(cmanchester)
Right, it makes sense, I missed this :). Let's try something around threading.Lock().
Not sure if it was what you had in mind, but this do not works: https://hg.mozilla.org/try/rev/efa4c9f3da95
Flags: needinfo?(cmanchester)
So I lost my sunday on trying several ways to fix this, nothing ever worked.
(In reply to Alexandre LISSY :gerard-majax from comment #20)
> Not sure if it was what you had in mind, but this do not works:
> https://hg.mozilla.org/try/rev/efa4c9f3da95

That's pretty much what I had in mind. Can you provide a link to the try results?
Flags: needinfo?(cmanchester) → needinfo?(lissyx+mozillians)
Here: https://treeherder.allizom.org/#/jobs?repo=try&revision=737d6de2d8ee

I'm giving a try to using sys.stdout.write() only and in combination with sys.stdout.flush() in:
https://treeherder.allizom.org/#/jobs?repo=try&revision=dde031e66745
https://treeherder.allizom.org/#/jobs?repo=try&revision=0b9f05b77f54
Flags: needinfo?(lissyx+mozillians)
(In reply to Alexandre LISSY :gerard-majax from comment #23)
> Here: https://treeherder.allizom.org/#/jobs?repo=try&revision=737d6de2d8ee
> 
> I'm giving a try to using sys.stdout.write() only and in combination with
> sys.stdout.flush() in:
> https://treeherder.allizom.org/#/jobs?repo=try&revision=dde031e66745

And according to:
> https://treeherder.allizom.org/#/jobs?repo=try&revision=0b9f05b77f54

sys.stdout.write() + sys.stdout.flush() helps a lot!
Attachment #8591396 - Flags: review?(ahalberstadt)
Blocks: 1153574
Comment on attachment 8591396 [details] [diff] [review]
Protect standard output from interleaving

Review of attachment 8591396 [details] [diff] [review]:
-----------------------------------------------------------------

Lgtm.
Attachment #8591396 - Flags: review?(ahalberstadt) → review+
Flags: needinfo?(ahalberstadt)
Assignee: nobody → lissyx+mozillians
Whiteboard: [systemsfe]
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/2c35669aeb6f
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Blocks: 1150579
You need to log in before you can comment on or make changes to this bug.